
Support for other common formats #2798

Open
KiARC opened this issue Sep 26, 2024 · 2 comments

Comments

KiARC commented Sep 26, 2024

I recently migrated to Docspell from paperless-ngx and it's been pretty great so far. Unfortunately, I'm running into one issue: while Docspell can store my PowerPoint presentations, it can't index or display them. Paperless solved this by using Apache Tika as an optional extension, so to speak, and this seems like something Docspell could do as well, especially since it is already designed to work with external services (Solr). I would love Tika (and, by extension, more supported formats) to be integrated with Docspell, either as a core feature or as an addon. If such an addon already exists, I would appreciate a pointer to it; the addon system is a bit opaque to me.

KiARC (Author) commented Sep 26, 2024

A quick search shows that Tika is already partially used in Docspell, which is great. I saw a note by a maintainer who mentioned that the full Tika package is too big to be bundled with Docspell, but maybe a config option could be added to use an external Tika instance instead of the internal stripped-down one.
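To make the external-instance idea concrete: Tika Server exposes a REST endpoint, `PUT /tika`, that takes the raw document as the request body and returns the extracted text when asked for `text/plain`. Below is a minimal Python sketch of such a client, assuming a standalone Tika Server on its default port 9998. The function name `extract_text_via_tika` and its parameters are hypothetical; this is not Docspell code and not an existing Docspell config option.

```python
import urllib.request

# Assumption: a standalone Tika Server (e.g. run separately from Docspell)
# listening on its default port, 9998.
DEFAULT_TIKA_URL = "http://localhost:9998"


def extract_text_via_tika(data: bytes,
                          tika_url: str = DEFAULT_TIKA_URL,
                          timeout: float = 30.0) -> str:
    """Hypothetical client: PUT the raw document to Tika Server's /tika
    endpoint and return the extracted plain text."""
    request = urllib.request.Request(
        url=tika_url.rstrip("/") + "/tika",
        data=data,
        method="PUT",
        headers={
            # Request plain-text output.
            "Accept": "text/plain",
            # Generic type so Tika does its own format detection; without
            # this header urllib would send a form-encoding content type.
            "Content-Type": "application/octet-stream",
        },
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Fall back to UTF-8 if the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)
```

Docspell itself is written in Scala, so a real integration would look different; the sketch only shows that the protocol boils down to one HTTP request per file, which is what would make a "use this external Tika URL" option plausible.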

eikek (Owner) commented Sep 27, 2024

Hi @KiARC, PowerPoint is unfortunately not among the file formats Docspell supports. While Docspell can work with Solr for full-text search, adding another external service would still increase complexity a lot. I think there are two options: 1) it could be done as an addon outside of Docspell that is maintained separately; 2) since Docspell includes the POI library, there is a good chance it supports at least some PowerPoint "dialects", in which case it could be done directly in Docspell rather than as an addon.

Neither option is likely to happen soon, though, unless someone who is not me :-) gives it a try.
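On option 2: a `.pptx` file (the Office Open XML "dialect" of PowerPoint) is a ZIP archive whose slides live in `ppt/slides/slideN.xml`, with visible text stored in DrawingML `<a:t>` runs grouped into `<a:p>` paragraphs. The Python sketch below pulls that text out using only the standard library, just to show how little the XML-based format requires. `pptx_slide_texts` is a hypothetical helper, not Apache POI and not Docspell code; inside Docspell, POI's `XMLSlideShow` (XSLF) would be the natural route, and the older binary `.ppt` format would need POI's HSLF module, which this sketch does not handle.

```python
import re
import xml.etree.ElementTree as ET
import zipfile

# DrawingML namespace that holds paragraphs (<a:p>) and text runs (<a:t>).
A_NS = "{http://schemas.openxmlformats.org/drawingml/2006/main}"
SLIDE_RE = re.compile(r"^ppt/slides/slide(\d+)\.xml$")


def pptx_slide_texts(source) -> list[str]:
    """Return the text of each slide in a .pptx (path or binary file object).

    Slides are ordered by the number in their file name; the real
    presentation order lives in ppt/presentation.xml and may differ.
    """
    with zipfile.ZipFile(source) as zf:
        numbered = []
        for name in zf.namelist():
            match = SLIDE_RE.match(name)
            if match:  # skips _rels/*.rels and other parts
                numbered.append((int(match.group(1)), name))
        texts = []
        for _, name in sorted(numbered):
            root = ET.fromstring(zf.read(name))
            paragraphs = []
            for para in root.iter(A_NS + "p"):
                # Runs inside one paragraph can split words, so join them
                # without a separator.
                line = "".join(t.text or "" for t in para.iter(A_NS + "t"))
                if line:
                    paragraphs.append(line)
            texts.append("\n".join(paragraphs))
        return texts
```

The resulting per-slide strings are the kind of plain text a full-text index such as Solr could ingest.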
