Word Embeddings & Vector Search
Store word embeddings and perform k-nearest neighbor search in CogDB.
Word embeddings represent words as dense numeric vectors and are useful for text classification, similarity search, and clustering. Popular pre-trained embeddings include GloVe, word2vec, and FastText.
Word Embeddings API
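All examples on this page assume a CogDB Graph instance; the graph name here is illustrative:
from cog.torque import Graph
g = Graph("my_embeddings")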
put_embedding
g.put_embedding("orange", [0.1, 0.2, 0.3, 0.4, 0.5])put_embeddings_batch
embeddings = [
    ("apple", [0.1, 0.2, 0.3]),
    ("banana", [0.3, 0.4, 0.5]),
    ("orange", [0.7, 0.8, 0.9]),
]
g.put_embeddings_batch(embeddings)
Use put_embeddings_batch() for bulk loading; it is much faster than calling put_embedding() individually.
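As a sketch, a plain-text vector file (one word per line followed by its vector components; the file name vectors.txt is hypothetical) can be turned into the tuple list that put_embeddings_batch() expects:
batch = []
with open("vectors.txt") as f:  # hypothetical file: "<word> <f1> <f2> ..."
    for line in f:
        parts = line.split()
        batch.append((parts[0], [float(x) for x in parts[1:]]))
g.put_embeddings_batch(batch)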
get_embedding
g.get_embedding("orange")
# [0.1, 0.2, 0.3, 0.4, 0.5]
delete_embedding
g.delete_embedding("orange")
K-Nearest Neighbor Search
g.v().k_nearest(word, k).all()
Find the k most similar embeddings using cosine similarity.
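For intuition, cosine similarity is the dot product of two vectors divided by the product of their norms; a minimal reference implementation (not CogDB's internal code):
import math

def cosine_similarity(a, b):
    # dot(a, b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

cosine_similarity([0.1, 0.2, 0.3], [0.1, 0.2, 0.4])
# ~0.9915, which is why vec2 ranks closest to vec1 in the example below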
g = Graph("embeddings_graph")
g.put("vec1", "type", "vector")
g.put("vec2", "type", "vector")
g.put("vec3", "type", "vector")
g.put_embedding("vec1", [0.1, 0.2, 0.3])
g.put_embedding("vec2", [0.1, 0.2, 0.4])
g.put_embedding("vec3", [0.9, 0.8, 0.7])
result = g.v().k_nearest("vec1", k=2).all()
# {'result': [{'id': 'vec1'}, {'id': 'vec2'}]}
Loading Pre-trained Embeddings
load_glove
Loads pre-trained GloVe vectors from a plain-text file (one word per line, followed by its vector components); limit caps how many are loaded.
g = Graph("glove")
count = g.load_glove("glove.6B.100d.txt", limit=10000)
result = g.v().k_nearest("king", k=5).all()load_gensim
from gensim.models import KeyedVectors
model = KeyedVectors.load_word2vec_format("word2vec.bin", binary=True)
g = Graph("w2v")
g.load_gensim(model, limit=50000)
Similarity Search
sim
sim(word, operator, value)
Filter vertices by their cosine similarity to word's embedding.
Operators: >, <, >=, <=, ==, !=, in [min, max]
g.v().sim("orange", ">", 0.35).all()
# {'result': [{'id': 'clementines'}, {'id': 'tangerine'}, {'id': 'orange'}]}
g.v().sim("orange", "in", [0.25, 0.35]).all()
# {'result': [{'id': 'banana'}, {'id': 'apple'}]}
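To sanity-check a sim() filter, you can recompute the score with get_embedding() and the cosine_similarity() helper sketched earlier; this assumes the fruit embeddings above were already stored:
a = g.get_embedding("orange")
b = g.get_embedding("tangerine")
cosine_similarity(a, b)  # should satisfy the "> 0.35" filter used above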