In this module we will create embeddings for movie plots. The data can be downloaded from Kaggle (Wikipedia Movie Plots, file wiki_movie_plots_deduped.csv).
The goal of this tutorial is to embed the movie plots with OpenAI and use those embeddings to recommend similar plots.
To make the tutorial easier to follow, use a Jupyter Notebook.
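Import Libraries
First, we import the libraries we will use. If you have followed the tutorial from the beginning, the only new library you have not installed yet is tenacity. Note that the code uses the legacy OpenAI Python API (openai.Embedding and openai.embeddings_utils), which was removed in openai 1.0, so it needs an openai version below 1.0.
To install the library, use pip: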
$ pip install tenacity
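import pickle

import openai  # legacy API (openai<1.0): openai.Embedding, openai.embeddings_utils
import pandas as pd  # used below for read_csv and read_pickle
import tiktoken
from dotenv import dotenv_values
from tenacity import retry, stop_after_attempt, wait_random_exponential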
Load the API Key and Data
The API key is read from the OPENAI_KEY entry in a .env file. In this tutorial we will use films of American origin, keeping only the 2,000 most recent ones.
config = dotenv_values(".env")
openai.api_key = config["OPENAI_KEY"]

data_path = "./wiki_movie_plots_deduped.csv"
df = pd.read_csv(data_path)

# Keep American films only, sort newest first, and take the first 2,000 rows
movies = (
    df[df["Origin/Ethnicity"] == "American"]
    .sort_values("Release Year", ascending=False)
    .head(2000)
)
movie_plots = movies["Plot"].values
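Cost Estimation Code
Before calling the API, we can estimate the cost by counting the tokens in every plot with tiktoken and multiplying by the price per 1,000 tokens. The code uses $0.0004 per 1,000 tokens for text-embedding-ada-002; check OpenAI's current pricing before relying on the estimate.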
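# Count the tokens in every plot with the tokenizer used by text-embedding-ada-002
enc = tiktoken.encoding_for_model("text-embedding-ada-002")
total_tokens = sum(len(enc.encode(plot)) for plot in movie_plots)

# Price used in this tutorial: $0.0004 per 1,000 tokens
cost = total_tokens * (0.0004 / 1000)
print(f"Estimated cost ${cost:.2f}")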
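To make the arithmetic concrete, here is a minimal sketch of the same formula without tiktoken. The function name estimate_embedding_cost and the average of 500 tokens per plot are hypothetical, chosen only for illustration; the rate of $0.0004 per 1,000 tokens is the one used in the code above and may not match OpenAI's current pricing.

```python
# Minimal sketch of the cost formula: cost = total_tokens * (price_per_1k / 1000)
# PRICE_PER_1K_TOKENS is the rate assumed in this tutorial, not a live price.
PRICE_PER_1K_TOKENS = 0.0004  # USD per 1,000 tokens


def estimate_embedding_cost(token_counts, price_per_1k=PRICE_PER_1K_TOKENS):
    """Return the estimated cost in USD for embedding texts with the given token counts."""
    total_tokens = sum(token_counts)
    return total_tokens * (price_per_1k / 1000)


# Hypothetical example: 2,000 plots averaging 500 tokens each = 1,000,000 tokens
cost = estimate_embedding_cost([500] * 2000)
print(f"Estimated cost ${cost:.2f}")  # prints "Estimated cost $0.40"
```

In other words, one million tokens at this rate cost about 40 cents, so the 2,000 plots used here are cheap to embed.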
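Helper Functions
A function to get an embedding from the model. The @retry decorator from tenacity retries failed requests with random exponential backoff (waits capped at 20 seconds) and gives up after 6 attempts.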
@retry(wait=wait_random_exponential(min=1, max=20), stop=stop_after_attempt(6))
def get_embedding(text, model="text-embedding-ada-002"):
    # Replace newlines, which can negatively affect performance.
    text = text.replace("\n", " ")
    return openai.Embedding.create(input=text, model=model)["data"][0]["embedding"]
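If you are curious what the @retry decorator does, the following is a rough, stdlib-only sketch of retrying with random exponential backoff. It is not tenacity's implementation: the decorator name retry_with_random_backoff and its parameters are hypothetical, the exact wait formula differs from wait_random_exponential, and on the final failure it re-raises the original exception, whereas tenacity raises RetryError by default.

```python
import random
import time
from functools import wraps


def retry_with_random_backoff(max_attempts=6, min_wait=1.0, max_wait=20.0, sleep=time.sleep):
    """Hypothetical decorator: retry on any exception, sleeping a random time
    from a window whose upper bound doubles each attempt (capped at max_wait)."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == max_attempts:
                        raise  # give up and re-raise the last error
                    # Upper bound grows 1, 2, 4, 8, ... seconds, capped at max_wait
                    upper = min(max_wait, max(min_wait, 2 ** (attempt - 1)))
                    sleep(random.uniform(min_wait, upper))
        return wrapper
    return decorator


# Usage: a function that fails twice before succeeding (sleep disabled for the demo)
calls = {"n": 0}


@retry_with_random_backoff(max_attempts=3, sleep=lambda seconds: None)
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


print(flaky())  # prints "ok" after two failed attempts
```

The random jitter spreads out retries from many clients so they do not all hit the API again at the same moment, which is why backoff is usually randomized rather than fixed.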
Code to store embeddings in a cache, plus a function that returns the embedding of a string.
# Establish a cache of embeddings to avoid recomputing.
# The cache is a dict of tuples (text, model) -> embedding, saved as a pickle file.

# Set the path to the embedding cache
embedding_cache_path = "movie_embeddings_cache2.pkl"

# Load the cache if it exists, and save a copy to disk
try:
    embedding_cache = pd.read_pickle(embedding_cache_path)
except FileNotFoundError:
    embedding_cache = {}
with open(embedding_cache_path, "wb") as embedding_cache_file:
    pickle.dump(embedding_cache, embedding_cache_file)


# Retrieve embeddings from the cache if present; otherwise request them via the API
def embedding_from_string(
    string, model="text-embedding-ada-002", embedding_cache=embedding_cache
):
    """Return embedding of given string, using a cache to avoid recomputing."""
    if (string, model) not in embedding_cache.keys():
        embedding_cache[(string, model)] = get_embedding(string, model)
        print(f"GOT EMBEDDING FROM OPENAI FOR {string[:20]}")
        with open(embedding_cache_path, "wb") as embedding_cache_file:
            pickle.dump(embedding_cache, embedding_cache_file)
    return embedding_cache[(string, model)]
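The cache logic can be tried offline with a stand-in for the API. In this minimal sketch, load_cache, cached_embedding and fake_embedding are hypothetical names, and fake_embedding only imitates get_embedding so that no API key is needed. It shows that a repeated (text, model) key never triggers a second call and that the pickle file survives a reload.

```python
import os
import pickle
import tempfile


def load_cache(path):
    """Load the (text, model) -> embedding dict from disk, or start with an empty one."""
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        return {}


def cached_embedding(text, model, cache, path, embed_fn):
    """Return the cached embedding; call embed_fn and save to disk only on a cache miss."""
    key = (text, model)
    if key not in cache:
        cache[key] = embed_fn(text, model)
        with open(path, "wb") as f:
            pickle.dump(cache, f)
    return cache[key]


# Hypothetical stand-in for the OpenAI call; records every "API" request
api_calls = []


def fake_embedding(text, model):
    api_calls.append(text)
    return [float(len(text)), 1.0]


path = os.path.join(tempfile.mkdtemp(), "demo_cache.pkl")
cache = load_cache(path)
cached_embedding("a plot", "text-embedding-ada-002", cache, path, fake_embedding)
cached_embedding("a plot", "text-embedding-ada-002", cache, path, fake_embedding)
print(len(api_calls))  # prints 1: the second call is a cache hit

reloaded = load_cache(path)
print(reloaded[("a plot", "text-embedding-ada-002")])  # prints [6.0, 1.0]
```

Like embedding_from_string, this rewrites the whole pickle file on every cache miss. That is simple and keeps progress if a long run is interrupted; for much larger datasets you might prefer to save periodically instead.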
Computing the Embeddings
To compute the embeddings, we call embedding_from_string for every plot in movie_plots, using the text-embedding-ada-002 model.
plot_embeddings = [embedding_from_string(plot, model="text-embedding-ada-002") for plot in movie_plots]
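This step can take quite a while, depending on how many movies you use. Because every new embedding is written to the cache, re-running it later only calls the API for plots that are not cached yet.
Creating the Recommendation Function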
# Legacy helpers (openai<1.0); distances_from_embeddings uses cosine distance by default
from openai.embeddings_utils import (
    distances_from_embeddings,
    indices_of_nearest_neighbors_from_distances,
)


def print_recommendations_from_strings(
    strings, index_of_source_string, k_nearest_neighbors=3, model="text-embedding-ada-002"
):
    # Get all of the embeddings (from the cache where possible) with the requested model
    embeddings = [embedding_from_string(string, model=model) for string in strings]
    # Get the embedding for our specific query string
    query_embedding = embeddings[index_of_source_string]
    # Get distances between our embedding and all other embeddings
    distances = distances_from_embeddings(query_embedding, embeddings)
    # Get indices of the nearest neighbors, sorted from closest to farthest
    indices_of_nearest_neighbors = indices_of_nearest_neighbors_from_distances(distances)

    query_string = strings[index_of_source_string]
    match_count = 0
    for i in indices_of_nearest_neighbors:
        # Skip any string identical to the query (including the query itself)
        if query_string == strings[i]:
            continue
        # Stop once k matches have been printed
        if match_count >= k_nearest_neighbors:
            break
        match_count += 1
        print(f"Found {match_count} closest match: ")
        print(f"Distance of: {distances[i]} ")
        print(strings[i])
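distances_from_embeddings uses cosine distance (1 minus cosine similarity) by default, and indices_of_nearest_neighbors_from_distances sorts the indices from smallest to largest distance. The pure-Python sketch below reproduces that idea so it runs without the OpenAI library; the function names cosine_distance and nearest_neighbors and the toy 2-D vectors are hypothetical (real text-embedding-ada-002 vectors have 1,536 dimensions).

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def nearest_neighbors(query, embeddings, k):
    """Return (index, distance) pairs for the k closest embeddings, nearest first."""
    distances = [cosine_distance(query, e) for e in embeddings]
    order = sorted(range(len(distances)), key=lambda i: distances[i])
    return [(i, distances[i]) for i in order[:k]]


# Hypothetical toy "embeddings"
toy = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.7, 0.7]]
print(nearest_neighbors(toy[0], toy, k=3))  # indices 0, 1, 3 in that order
```

The query vector itself comes back first with distance 0, which is why print_recommendations_from_strings skips strings equal to the query.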
Use the function above to recommend movies based on a given plot. The second argument is the index of the source plot in movie_plots; here index 2 selects the third movie.
print_recommendations_from_strings(movie_plots, 2)
The command above prints recommended plots like the following:
Found 1 closest match:
Distance of: 0.08445958228190453
As a spacecraft departs a planet, a humanoid alien drinks an iridescent liquid and then dissolves. The remains of the alien cascade into a waterfall. The alien's DNA strands mix with the water.
In 2089, archaeologists Elizabeth Shaw and Charlie Holloway discover a star map in Scotland that matches others from several unconnected ancient cultures. They interpret this as an invitation from humanity's forerunners, the "Engineers". Peter Weyland, the elderly CEO of Weyland Corporation, funds an expedition, aboard the scientific vessel Prometheus, to follow the map to the distant moon LV-223. The ship's crew travels in stasis while the android David monitors their voyage. Arriving in December 2093, mission-director Meredith Vickers informs them of their mission to find the Engineers and not to make contact without her permission.
The Prometheus lands on the barren, mountainous surface near a large, artificial structure, which a team explores. Inside, they find stone cylinders, a monolithic statue of a humanoid head, and the decapitated corpse of a large alien, thought to be an Engineer; Shaw recovers its head. The crew finds other bodies, leading them to surmise the species is extinct. Crew members Millburn and Fifield grow uncomfortable with the discoveries and attempt to return to Prometheus, but become stranded in the structure when they get lost. The expedition is cut short when a storm forces the crew to return to the ship. David secretly takes a cylinder from the structure, while the remaining cylinders begin leaking a dark liquid. In the ship's lab, the Engineer's DNA is found to match that of humans. David investigates the cylinder and the liquid inside. He intentionally taints a drink with the liquid and gives it to the unsuspecting Holloway, who had stated he would do anything for answers. Shortly after, Shaw and Holloway have sex.
Inside the structure, a snake-like creature kills Millburn and sprays a corrosive fluid that melts Fifield's helmet. Fifield falls face-first into a puddle of dark liquid. When the crew returns, they find Millburn's corpse. David separately discovers a control room containing a surviving Engineer in stasis, and a large 3D holographic star map highlighting Earth. Meanwhile, Holloway sickens rapidly. He is rushed back to Prometheus, but Vickers refuses to let him aboard, and at his urging, burns him to death with a flamethrower. Later, a medical scan reveals that Shaw, despite being sterile, is pregnant. Fearing the worst, she uses an automated surgery table to extract a squid-like creature from her abdomen. Shaw then discovers that Weyland has been in stasis aboard Prometheus. He explains that he wants to ask the Engineers to prevent his death from old age. As Weyland prepares to leave for the structure, Vickers addresses him as "Father".
A monstrous, mutated Fifield returns to the Prometheus and kills several crew members before he is killed. The Prometheus' captain, Janek, speculates that the structure was an Engineer military base that lost control of a virulent biological weapon, the dark liquid. He also determines that the structure houses a spacecraft. Weyland and a team return to the structure, accompanied by Shaw. David wakes the Engineer from stasis and speaks to him in an attempt to explain what Weyland wants. The Engineer responds by decapitating David and killing Weyland and his team, before reactivating the spacecraft. Shaw flees and warns Janek that the Engineer is planning to release the liquid on Earth, convincing him to stop the spacecraft. Janek and the remaining crew sacrifice themselves by ramming the Prometheus into the alien craft, ejecting the lifeboat in the process, while Vickers flees in an escape pod. The Engineer's disabled spacecraft crashes onto the ground, killing Vickers. Shaw goes to the lifeboat and finds her alien offspring is alive and has grown to gigantic size. David's still-active head warns Shaw that the Engineer is pursuing her. The Engineer forces open the lifeboat's airlock and attacks Shaw, who releases her alien offspring onto the Engineer; it thrusts an ovipositor down the Engineer's throat, subduing him. Shaw recovers David's remains, and with his help, launches another Engineer spacecraft. She intends to reach the Engineers' homeworld in an attempt to understand why they wanted to destroy humanity.
In the lifeboat, an alien creature bursts out of the Engineer's chest.
Found 2 closest match:
Distance of: 0.13632821786235982
In 2004, a satellite detects a mysterious heat bloom beneath Bouvetøya, an island about one thousand miles off the coast of Antarctica. Wealthy industrialist Charles Bishop Weyland (Lance Henriksen) discovers through thermal imaging that there is a pyramid buried 2000 feet beneath the ice. He attempts to claim it for his multinational communications company, Weyland Industries, a subsidiary of the Weyland Corporation, and assembles a team of experts to investigate. The team includes archaeologists, linguistic experts, drillers, mercenaries, and a guide named Alexa Woods (Sanaa Lathan).
As a Predator ship reaches Earth's orbit, it fires a beam that creates a passage through the ice towards the source of the heat bloom. When the team arrives at the abandoned whaling station above the heat source, they find the passage and descend beneath the ice. They locate the mysterious pyramid and begin to explore it, finding evidence of a prehistoric civilization and what appears to be a sacrificial chamber filled with human skeletons with ruptured rib cages.
Meanwhile, three Predators consisting of Scar, Celtic and Chopper arrive and kill all the humans on the surface. They make their way down to the pyramid and arrive just as the team unwittingly activates the structure. The Alien Queen awakes from cryogenic stasis and begins to produce eggs. When the eggs hatch, several facehuggers attach themselves to humans trapped in the sacrificial chamber. Chestbursters emerge from the humans and quickly grow into adult Aliens. Conflict erupts between the Predators, Aliens, and humans, resulting in several deaths. Celtic and Chopper are killed by an Alien, and Weyland buys Alexa and Italian archaeologist Sebastian De Rosa (Raoul Bova) enough time to escape from Scar, giving his life in the process. The two witness Scar kill a facehugger and an Alien with a shuriken before unmasking and marking himself with the blood of the facehugger. After Alexa and Sebastian leave, another facehugger attaches itself to Scar due to him not wearing his mask.
Through translation of the pyramid's hieroglyphs, Alexa and Sebastian learn that the Predators have been visiting Earth for thousands of years. It was they who taught early human civilizations how to build pyramids, and were worshiped as gods. Every 100 years they visited Earth to take part in a rite of passage by which several humans sacrifice themselves as hosts for the Aliens, creating the "ultimate prey" for the Predators to hunt while being able to survive in the pyramid; if overwhelmed, the Predators would activate a self-destruct device to eliminate the Aliens and themselves. The two deduce that this is why the current Predators are at the pyramid, and that the heat bloom was to attract humans for the sole purpose of making new Aliens to hunt.
Alexa and Sebastian decide that the Predators must be allowed to succeed in their hunt so that the Aliens do not escape to the surface. Sebastian is captured by an Alien, leaving only Alexa and Scar to fight the Aliens. Scar uses parts of a dead Alien to fashion weapons for Alexa and the two form an alliance. The Queen Alien, using her acid blood, is freed from her restraints and begins pursuing, along with the other Aliens, Alexa and Scar. Just as they are about to escape, they use a self-destruct device to destroy the pyramid and the remaining Aliens. Alexa and Scar reach the surface, however the Alien Queen has survived and continues chasing them. They defeat the Queen by attaching its chain to a water tower and pushing her over a cliff, dragging the Queen to the ocean floor. Scar, however, had been impaled by the Alien Queen's tail and succumbs to his wounds and dies.
A Predator ship uncloaks and several Predators appear. They retrieve their fallen comrade and an elite Predator presents Alexa with one of their spear weapons as a gift. The other Predators recognize her for her skill as a warrior symbolized by the alien blood Scar burned on her cheek before he died. As the Predators retreat into space, a chestburster with a hybrid form of an Alien and a Predator erupts from Scar's chest.
Found 3 closest match:
Distance of: 0.1365166895341312
In the near future, the unmanned Pilgrim 7 space probe returns from Mars to Earth orbit with soil samples potentially containing evidence of extraterrestrial life. The probe is captured and its samples retrieved by the International Space Station and its six-member crew. Exobiologist Hugh Derry, who is paralyzed from the waist down, revives a dormant cell from the sample, which quickly grows into a multi-celled organism that American school children name "Calvin". Hugh realizes that Calvin's cells can change their specialisation, acting as muscle, sensor, and neuron cells all at once.
An accident in the lab causes Calvin to become dormant; Hugh attempts to revive Calvin with electric shocks, but Calvin immediately becomes hostile and attacks Hugh, crushing his hand. While Hugh lies unconscious from Calvin's attack, Calvin uses Hugh's electric shock tool to escape its enclosure; now free in the laboratory, Calvin devours a lab rat by absorbing it, growing in size. Engineer Rory Adams enters the lab to rescue Hugh; he is locked in by fellow crew member and physician David Jordan, however, to keep Calvin contained. Calvin latches onto Rory's leg; after Rory unsuccessfully attacks Calvin with a portable rocket thruster, Calvin enters his mouth, devouring his organs from the inside and killing him. Emerging from Rory's mouth even larger, Calvin escapes through a fire-control vent.
Finding their communication with Earth cut off due to overheating of the communication systems, ISS commander Ekaterina Golovkina performs a space walk to find and fix the problem. She discovers that Calvin has breached the ISS's cooling system; soon after, Calvin attacks her, rupturing her spacesuit's water coolant system in the process and causing the water in the system to fill her suit. She blindly makes her way back to the airlock; however, she and the crew realize that if she re-enters, Calvin will also be able to re-enter the ISS. Hence, she refuses to open the hatch and stops David from helping her do so; this keeps Calvin out of the station for the time being but also causes her to drown and die in her spacesuit.
Calvin then attempts to re-enter the station through its maneuvering thrusters. The crew try to fire the thrusters to blast Calvin away from the spacecraft, but their attempts fail, using up too much fuel and causing the ISS to enter a decaying orbit where it will burn up in Earth's atmosphere. Pilot Sho Murakami informs the crew that they need to use the station's remaining fuel to get back into a safe orbit, although this allows Calvin to re-enter the station. The crew plan to make Calvin dormant by sealing themselves into one module and venting the atmosphere from the rest of the station.
After the remaining crew finalize preparations to do so, Hugh enters cardiac arrest. The crew then discover that Calvin has been feeding off of Hugh's paralyzed leg. Calvin attacks the remainder of the crew; Sho seals himself in a sleeping pod. As Calvin attempts to break its glass, David and the quarantine officer Miranda North use Hugh's corpse as bait to lure Calvin away from Sho and trap it in another module to deprive it of oxygen. Having received a distress call prior to the damage to the ISS's communication system, Earth sends a Soyuz spacecraft to the station as a fail-safe plan to push the station into deep space. Believing the Soyuz to be on a rescue mission for the ISS crew, Sho leaves his pod and moves to board it, forcing open its hatch; Calvin then attacks him and the Soyuz crew, causing the craft's docking mechanism to fail and resulting in the capsule crashing into the ISS, killing Sho and the Soyuz crew and causing the ISS to once again enter a decaying orbit.
The only two remaining survivors, David and Miranda, aware that Calvin could survive re-entry, plan for David to lure Calvin into one of the two remaining escape pods attached to the ISS for David to manually pilot the pod into deep space, isolating Calvin and allowing Miranda to return to Earth in the second pod. David manages to lure Calvin into his pod while Miranda enters her pod; as they simultaneously undock their pods from the ISS, one of the pods hits debris and is damaged, veering off course. In David's pod, Calvin attacks him as he struggles to manually pilot the pod; in Miranda's pod, she records a black box message in case of her death during re-entry informing the world of her colleagues' deaths and not to trust Calvin nor any extraterrestrial life from Mars as well as to destroy Calvin at any cost should he make his way to Earth.
The two pods separate, one earthbound, the other spiraling away from Earth. The earthbound pod lands in the ocean; two nearby Vietnamese fishermen approach it. As they look into the pod, it is revealed to be David's, the astronaut now encased in a web-like substance. Meanwhile, Miranda's pod's navigation system fails due to damage sustained from the debris, sending her flying away from Earth out of control, much to her horror. On Earth, despite David attempting to warn the fishermen, the fishermen open the pod's hatch. Meanwhile, more boats arrive.
The program above is still rough and is not a production-ready app. You can extend it to display the titles of the recommended movies.
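One way to show titles is to keep the title column next to the plots (for example titles = movies["Title"].values, assuming the CSV's Title column) and look up titles by index instead of printing plots. A minimal sketch, with a hypothetical recommend_titles function and made-up titles and distances standing in for the output of distances_from_embeddings:

```python
def recommend_titles(titles, distances, index_of_source, k=3):
    """Return up to k (title, distance) pairs closest to the source, excluding the source itself."""
    # Sort indices from closest to farthest, like indices_of_nearest_neighbors_from_distances
    order = sorted(range(len(distances)), key=lambda i: distances[i])
    results = []
    for i in order:
        if i == index_of_source:
            continue  # skip the query movie itself
        if len(results) >= k:
            break
        results.append((titles[i], distances[i]))
    return results


# Hypothetical titles and precomputed distances from movie 0 to every movie
titles = ["Movie A", "Movie B", "Movie C", "Movie D"]
distances = [0.0, 0.20, 0.08, 0.14]
for title, dist in recommend_titles(titles, distances, index_of_source=0, k=2):
    print(f"{title}: {dist:.2f}")
# prints:
# Movie C: 0.08
# Movie D: 0.14
```

Returning a list instead of printing also makes the result easier to reuse, for example in a web page or an API response.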
The goal of this module is to show how to use embeddings to make recommendations.