Get documents back from vector store #9392
-
🤖 Hello @steve36871! 😊 I'm Dosu, a friendly bot here to help with your LlamaIndex questions. A human maintainer will be with you soon; in the meantime, I can help with troubleshooting, questions, or contributions you might be considering. 🦙🎉

Yes, it is possible to retrieve LlamaIndex Document objects from the Qdrant vector store. The relevant fields are 'document_id', 'node_id', 'start_char_idx', and 'end_char_idx'.
To retrieve the stored nodes, you can query the vector store directly. Here is an example:

# Import path for llama-index 0.9.x; newer releases use llama_index.core.vector_stores.types
from llama_index.vector_stores.types import VectorStoreQuery

# my_embedding is your query embedding; qdrant_vector_store is an existing QdrantVectorStore
query = VectorStoreQuery(query_embedding=my_embedding, similarity_top_k=10)
result = qdrant_vector_store.query(query)
for node in result.nodes:
    print(node.node_id, node.start_char_idx, node.end_char_idx)

Regarding your question about the 'start_char_idx' and 'end_char_idx' fields: these are handled when the vector store response is converted to a VectorStoreQueryResult in the 'parse_to_query_result' method. Each point in the response is processed in turn. The point's payload, which contains the node's metadata, is converted to a node using the 'metadata_dict_to_node' function. If this conversion fails, the code falls back to legacy logic in which 'start_char_idx' and 'end_char_idx' are read directly from the node_info dictionary derived from the payload. Here is the relevant code snippet:

def parse_to_query_result(self, response: List[Any]) -> VectorStoreQueryResult:
    """Convert vector store response to VectorStoreQueryResult.

    Args:
        response: List[Any]: List of results returned from the vector store.
    """
    from qdrant_client.http.models import Payload

    nodes = []
    similarities = []
    ids = []
    for point in response:
        payload = cast(Payload, point.payload)
        try:
            node = metadata_dict_to_node(payload)
        except Exception:
            # NOTE: deprecated legacy logic for backward compatibility
            logger.debug("Failed to parse Node metadata, fallback to legacy logic.")
            metadata, node_info, relationships = legacy_metadata_dict_to_node(
                payload
            )
            node = TextNode(
                id_=str(point.id),
                text=payload.get("text"),
                metadata=metadata,
                start_char_idx=node_info.get("start", None),
                end_char_idx=node_info.get("end", None),
                relationships=relationships,
            )
        nodes.append(node)
        similarities.append(point.score)
        ids.append(str(point.id))

    return VectorStoreQueryResult(nodes=nodes, similarities=similarities, ids=ids)

In this code, 'start_char_idx' and 'end_char_idx' are read from the 'node_info' dictionary under the keys 'start' and 'end' respectively. If a key is missing from the dictionary, the corresponding value is set to None. You can find more details in the llama_index/vector_stores/qdrant.py file.
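As a rough sketch of how these fields might be used to stitch node texts back into full documents: the code below groups chunks by document ID, sorts them by start offset, and drops the overlapping characters. It is not LlamaIndex code. `Chunk` and `rebuild_documents` are hypothetical names, and the sketch assumes that 'start_char_idx' and 'end_char_idx' are character offsets into the original document text, that each chunk's text is exactly the slice between those offsets (some splitters strip whitespace, in which case this does not hold), and that every chunk of the document has been fetched.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class Chunk:
    """Hypothetical stand-in for a retrieved node (not a LlamaIndex class)."""
    doc_id: str
    text: str
    start_char_idx: Optional[int]
    end_char_idx: Optional[int]


def rebuild_documents(chunks: Iterable[Chunk]) -> Dict[str, str]:
    """Stitch chunk texts back together per document, dropping overlap."""
    by_doc = defaultdict(list)
    for chunk in chunks:
        # Offsets may be None (e.g. legacy payloads); such chunks cannot be placed.
        if chunk.start_char_idx is None or chunk.end_char_idx is None:
            continue
        by_doc[chunk.doc_id].append(chunk)

    documents = {}
    for doc_id, doc_chunks in by_doc.items():
        doc_chunks.sort(key=lambda c: c.start_char_idx)
        parts = []
        covered = 0  # document offset up to which text has been emitted
        for chunk in doc_chunks:
            if chunk.start_char_idx > covered:
                raise ValueError(f"gap in document {doc_id!r} at offset {covered}")
            if chunk.end_char_idx > covered:
                # Skip the characters already emitted by earlier chunks.
                parts.append(chunk.text[covered - chunk.start_char_idx:])
                covered = chunk.end_char_idx
        documents[doc_id] = "".join(parts)
    return documents


# Usage: three overlapping chunks (out of order) plus one without offsets.
text = "The quick brown fox jumps over the lazy dog."
chunks = [
    Chunk("doc-1", text[15:30], 15, 30),
    Chunk("doc-1", text[0:20], 0, 20),
    Chunk("doc-1", text[25:], 25, len(text)),
    Chunk("doc-1", "orphan", None, None),
]
rebuilt = rebuild_documents(chunks)
print(rebuilt["doc-1"] == text)
```

Raising on a gap rather than silently joining is a deliberate choice here: if chunks are missing, the result would not be the original document, and it seems safer to surface that.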
-
@dosu-bot can Qdrant act as a document store also?
-
Hello! I am happily using llama-index with Qdrant as a vector store, but would like to get llama-index Document objects back from the vector store. I realize this might not be easy (considering chunking and overlap), but I still want to attempt it.
I noticed that there is a document_id field stored as well as a node_id for each node. I also saw that there are 'start_char_idx' and 'end_char_idx' fields. I thought these indicated which part of the document the node referred to, but it seems I got that wrong, because:
Is there enough information stored to make it possible? If so: how should I interpret the fields?
Thanks!