Word Embeddings & Vector Search
Store word embeddings and perform semantic filtering in CogDB.
Word embeddings represent words as numeric vectors and are useful for text classification, similarity search, and clustering. Popular pre-trained embeddings include GloVe and FastText.
Word Embeddings API
put_embedding
g.put_embedding("orange", [0.1, 0.2, 0.3, 0.4, 0.5])put_embeddings_batch
put_embeddings_batch

embeddings = [
("apple", [0.1, 0.2, 0.3]),
("banana", [0.3, 0.4, 0.5]),
("orange", [0.7, 0.8, 0.9]),
]
g.put_embeddings_batch(embeddings)

Use put_embeddings_batch() for bulk loading - much faster than individual puts.
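For example, a bulk load can build the whole (word, vector) list first and hand it over in a single call; a sketch with placeholder random vectors (graph name and dimensions are illustrative):

import random
from cog.torque import Graph

g = Graph("bulk_demo")
words = ["apple", "banana", "orange", "grape", "mango"]
batch = [(w, [random.random() for _ in range(100)]) for w in words]   # 100-d placeholder vectors
g.put_embeddings_batch(batch)   # one batched call instead of five individual put_embedding calls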
get_embedding
g.get_embedding("orange")
# [0.1, 0.2, 0.3, 0.4, 0.5]

delete_embedding
g.delete_embedding("orange")K-Nearest Neighbor Search
g.v().k_nearest(word, k).all()

Find the k most similar embeddings using cosine similarity.
g = Graph("embeddings_graph")
g.put("vec1", "type", "vector")
g.put("vec2", "type", "vector")
g.put("vec3", "type", "vector")
g.put_embedding("vec1", [0.1, 0.2, 0.3])
g.put_embedding("vec2", [0.1, 0.2, 0.4])
g.put_embedding("vec3", [0.9, 0.8, 0.7])
result = g.v().k_nearest("vec1", k=2).all()
# {'result': [{'id': 'vec1'}, {'id': 'vec2'}]}
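To see why vec2 ranks ahead of vec3, here is a plain-Python check of the cosine similarities involved (ordinary arithmetic, not part of CogDB's API):

import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

v1, v2, v3 = [0.1, 0.2, 0.3], [0.1, 0.2, 0.4], [0.9, 0.8, 0.7]
print(cosine(v1, v2))   # ~0.99 -> vec2 is the nearest neighbor of vec1
print(cosine(v1, v3))   # ~0.88 -> vec3 ranks below vec2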
Loading Pre-trained Embeddings

load_glove
g = Graph("glove")
count = g.load_glove("glove.6B.100d.txt", limit=10000)
result = g.v().k_nearest("king", k=5).all()load_gensim
load_gensim

from gensim.models import KeyedVectors
model = KeyedVectors.load_word2vec_format("word2vec.bin", binary=True)
g = Graph("w2v")
g.load_gensim(model, limit=50000)
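If you don't have a local word2vec file, gensim's downloader can fetch a pre-trained KeyedVectors model by name; a possible variant of the example above (the model choice and graph name are illustrative):

import gensim.downloader as api
from cog.torque import Graph

model = api.load("glove-wiki-gigaword-100")   # downloads and returns a KeyedVectors instance
g = Graph("w2v_downloaded")
g.load_gensim(model, limit=50000)
result = g.v().k_nearest("king", k=5).all()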
Similarity Search

sim
sim(word, operator, value)

Filter vertices by the cosine similarity between their embeddings and the given word's embedding.
Operators: >, <, >=, <=, ==, !=, in [min, max]
g.v().sim("orange", ">", 0.35).all()
# {'result': [{'id': 'clementines'}, {'id': 'tangerine'}, {'id': 'orange'}]}
g.v().sim("orange", "in", [0.25, 0.35]).all()
# {'result': [{'id': 'banana'}, {'id': 'apple'}]}
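A self-contained variant of the fruit example, using hand-made vectors rather than real GloVe values (graph name and vectors are illustrative):

from cog.torque import Graph

g = Graph("fruits")
g.put("orange", "is", "fruit")
g.put("tangerine", "is", "fruit")
g.put("banana", "is", "fruit")
g.put_embedding("orange", [0.9, 0.1, 0.0])
g.put_embedding("tangerine", [0.8, 0.2, 0.1])
g.put_embedding("banana", [0.1, 0.9, 0.2])
result = g.v().sim("orange", ">", 0.9).all()
# "tangerine" (cosine ~0.98 to "orange") passes the filter; "banana" (~0.21) does not.
# As in the GloVe-based example above, the query word itself can also appear in the result.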