Semantic Search Over a Movie Knowledge Graph
Combine graph traversals with semantic search to query knowledge that isn't explicitly stored in the database.
CogDB lets you combine standard graph traversals with semantic search in a single query chain. In this setup, the graph handles the explicit structure (who directed what, who starred in what), while the vector model brings in external world knowledge that isn't actually stored in the database, for example, an actor's nationality or awards.
Here's a look at how they work together.
Setup
Install CogDB:
pip install cogdbWe're going to use a basic dataset of 100 films from 15 countries. The following code fetches the dataset, loads it into a graph, and vectorizes all nodes:
import requests
from cog.torque import Graph
url = "https://raw.githubusercontent.com/arun1729/data/refs/heads/master/movies.json"
data = requests.get(url).json()
g = Graph("movies")
triples = [(s, p, o) for s, p, o in data["triples"]]
g.put_batch(triples)
g.vectorize()The data is just triples. Every film has a director, genre, language, country, cast, and a one-sentence plot description. Importantly, there is no external metadata attached to the actors. There is no birthplaces, nationalities or awards data.
Queries
1. Finding British actors (without nationality data)
g.v().has("type","actor").k_nearest("British actor", k=3).tag("actor").inc().has("type","film").all()
# Output:
# {'result': [{'id': "The King's Speech", 'actor': 'Colin Firth'},
# {'id': 'Trainspotting', 'actor': 'Ewan McGregor'},
# {'id': 'Skyfall', 'actor': 'Daniel Craig'}]}Because the graph lacks a nationality field for actors, a standard graph query would fail here. However, the embedding model already knows who Colin Firth, Ewan McGregor, and Daniel Craig are. The query uses the vector model to identify names semantically similar to "British actor," and then uses the graph to traverse back to the films they starred in.
2. Searching plots by concept to find the director
g.v().has("type", "film").out("plot").k_nearest("scifi related to aliens", k=1).inc("plot").tag("film").out("directed_by").all()
# Output:
# {'result': [{'id': 'Denis Villeneuve', 'film': 'Arrival'}]}This query filters down to films, follows the edges to their plot descriptions, runs a semantic search on those descriptions, traces back to the film, and finally follows the edge to the director.
None of the plots in the dataset actually contain the words "scifi" or "aliens." The plot for Arrival is simply: "A linguist races against time to communicate with extraterrestrial visitors before global tensions explode." The embedding model bridges the vocabulary gap between the query and the raw data.
3. Finding films with Oscar-winning actors (without awards data)
g.v().has("type", "film").k_nearest("with oscar winning actors", k=3).tag("movie").out().has("type","actor").all()
# Output:
# {'result': [{'id': 'Al Pacino', 'movie': 'The Godfather'},
# {'id': 'Marlon Brando', 'movie': 'The Godfather'},
# {'id': 'Ray Liotta', 'movie': 'Goodfellas'},
# {'id': 'Robert De Niro', 'movie': 'Goodfellas'},
# {'id': 'Shelley Duvall', 'movie': 'The Shining'},
# {'id': 'Jack Nicholson', 'movie': 'The Shining'}]}Just like the nationality example, there is no "Oscar" or "awards" field in this graph. The model relies on its pre-trained knowledge to identify films featuring Oscar-winning actors (like Brando, Pacino, De Niro, and Nicholson). Once the model flags the right films, the graph takes over to pull the full cast lists for those specific movies.
What's happening under the hood
The default embedding model (BGE-small-en-v1.5) acts as a stand-in for missing metadata (like cultural associations and career history), while the graph maintains the rigid structural relationships between the nodes.
Here is a breakdown of the query components used in the examples above:
| Component | What it does |
|---|---|
g.v().has("k", "v") | Filters nodes by a specific property. |
.out("edge") / .inc("edge") | Traverses the graph relationships (outgoing or incoming). |
.k_nearest("text", k=N) | Runs the semantic similarity search via the vector model. |
.tag("label") | Tags a specific node so it's remembered in the final output. |