This repository has been archived by the owner on Nov 1, 2023. It is now read-only.

Add a short definition of mentions to the guide #16

Open
wants to merge 2 commits into
base: master
from


Contributor

@tillprochaska tillprochaska commented Jun 29, 2022

Mentions, and especially the difference between mentions and entities, seem to be hard to grasp for many Aleph users. While this doesn’t go into the details (e.g. regarding XREF), I thought it might be a good idea to briefly explain this concept in the user guide.

Closes #15

@@ -26,6 +26,10 @@ Each entity type contains a fixed set of possible properties to describe relevan

This structured vocabulary allows entities to be more easily searched, filtered, and cross-referenced with other data sources to find relevant co-occurrences and further enrich your investigation.

## Mentions

When you upload unstructured documents (for example, PDF documents) to Aleph, Aleph tries to extract names, locations, IBAN account numbers, and more from the document contents. While entities contain structured data, mentions are simple text parts Aleph recognized in a document. You can search Aleph for other datasets and documents with matching mentions. For example, if an uploaded document mentions an IBAN, Aleph allows you to search for other datasets and documents that mention the same IBAN.
Contributor

Would probably say:

When you upload unstructured data (for example, PDF documents), Aleph will try to extract names, locations, IBANs, and more from the document contents and store these as mentions. Mentions are different from entities in that they're stored as text and not FtM entities.

You can search Aleph for other datasets and documents with matching mentions. For example, if an uploaded document mentions an IBAN, Aleph allows you to search for other datasets and documents that mention the same IBAN.
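To make the IBAN example concrete, here is a minimal toy sketch of the idea: pull IBAN-like strings out of plain text and find other documents that contain the same string. This is not how Aleph implements extraction or search; the regular expression, the function names, and the sample documents are all assumptions made up for illustration.

```python
import re
from collections import defaultdict

# Simplified IBAN-like pattern (an assumption, not Aleph's extractor):
# two uppercase letters, two check digits, then 11 to 30 alphanumerics.
IBAN_RE = re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b")


def extract_iban_mentions(text):
    """Return the set of IBAN-like strings found in free text."""
    return set(IBAN_RE.findall(text))


def index_mentions(documents):
    """Map each mention to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for mention in extract_iban_mentions(text):
            index[mention].add(doc_id)
    return index


def documents_sharing_mentions(index, doc_id):
    """Return the other documents that share at least one mention with doc_id."""
    related = set()
    for doc_ids in index.values():
        if doc_id in doc_ids:
            related |= doc_ids - {doc_id}
    return related


# Hypothetical sample documents, already converted to plain text.
documents = {
    "contract.pdf": "Payment to DE89370400440532013000 due in May.",
    "invoice.pdf": "Please transfer to DE89370400440532013000.",
    "memo.pdf": "No account details here.",
}
index = index_mentions(documents)
```

In this sketch, `documents_sharing_mentions(index, "contract.pdf")` finds `invoice.pdf` because both contain the same IBAN string, while `memo.pdf` has no mentions and matches nothing. The mentions stay plain strings throughout, which is the distinction from structured entities that the wording tries to convey.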

Contributor

Do we want to mention the limitation of mentions in XREF?

Contributor Author

> Would probably say:
>
> When you upload unstructured data (for example, PDF documents), Aleph will try to extract names, locations, IBANs, and more from the document contents and store these as mentions. Mentions are different from entities in that they're stored as text and not FtM entities.

I have updated the wording according to your suggestion, but have removed the reference to FtM. We do not explain what FtM is in the user-facing documentation, so users would probably be confused by it. Does that look good to you?

Contributor Author

> Do we want to mention the limitation of mentions in XREF?

Yes, I think that would be sensible. There are two reasons why I didn’t add that to the documentation:

  1. The only documentation we currently have is this page, which basically only contains a link to slides for a presentation Kirk gave some time ago. We might want to consider turning that into a docs article at some point. Once we have that, I think it would be sensible to add details on XREF, including the limitations regarding mentions.

  2. To be honest, it’s still a little unclear to me in what cases mentions are considered during XREF. Based on what Eric explained, my understanding was the following:

    • Mentions in the current dataset are matched against entities from other datasets.
    • Mentions in the current dataset are not matched against mentions in other datasets.
    • Entities in the current dataset are not matched against mentions in other datasets.

    However, Jan suggested in a Wiki comment that this might not be 100% true. I wanted to confirm this with Eric.
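To make the three cases above easier to check with Eric, my current understanding can be written down as a small lookup table. This is only a sketch of the unconfirmed rules listed above, not Aleph's actual XREF code; the names `Kind`, `XREF_COMPARED`, and `is_compared` are made up for illustration, and the entity-to-entity row is my assumption about the regular cross-referencing case.

```python
from enum import Enum


class Kind(Enum):
    ENTITY = "entity"
    MENTION = "mention"


# (kind in the current dataset, kind in the other dataset) -> compared during XREF?
# The three mention-related rows restate the unconfirmed bullet points above.
XREF_COMPARED = {
    (Kind.MENTION, Kind.ENTITY): True,    # mentions vs. entities elsewhere
    (Kind.MENTION, Kind.MENTION): False,  # mentions vs. mentions elsewhere
    (Kind.ENTITY, Kind.MENTION): False,   # entities vs. mentions elsewhere
    (Kind.ENTITY, Kind.ENTITY): True,     # assumption: the regular XREF case
}


def is_compared(current, other):
    """Return whether this pairing is considered, per the table above."""
    return XREF_COMPARED[(current, other)]
```

If the behaviour turns out to differ, only the corresponding row in the table would need to change, which should also make it easy to turn into a table in a future docs article.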

Development

Successfully merging this pull request may close these issues.

Add more information on mentions
2 participants