I recently migrated to Docspell from paperless-ngx and it's been pretty great so far. Unfortunately I'm running into one issue: while Docspell can store my PowerPoint presentations, it can't index or display them. Paperless solved this by using Apache Tika as an optional extension, so to speak, and this seems like something Docspell could do as well, especially since it is already designed to work with external services (Solr). I would love for Tika (and by extension, more supported formats) to be integrated with Docspell, either as a core feature or as an addon (and if such an addon exists, I would appreciate a pointer to it; the addon system is a bit opaque to me).
A quick search shows that Tika is already partially used in Docspell, which is great. I saw a note by a maintainer who mentioned that the full Tika package is too big to be bundled with Docspell, but maybe a config option could be added to use an external Tika instance instead of the internal stripped-down one.
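To illustrate what "use an external Tika instance" could look like from the client side, here is a minimal sketch in Python. It is not Docspell code and says nothing about how Docspell would wire this in; it only assumes a standalone Tika server is running and reachable at `http://localhost:9998` (Tika server's default port) and uses its `PUT /tika` endpoint with `Accept: text/plain`. The function names `build_tika_request` and `extract_text` are made up for this example.

```python
import urllib.request

# Assumption: a standalone Tika server is reachable here (9998 is its default port).
TIKA_URL = "http://localhost:9998/tika"


def build_tika_request(data: bytes, tika_url: str = TIKA_URL) -> urllib.request.Request:
    """Build a PUT request asking the Tika server for the plain text of `data`."""
    return urllib.request.Request(
        tika_url,
        data=data,
        method="PUT",
        headers={"Accept": "text/plain"},
    )


def extract_text(path: str, tika_url: str = TIKA_URL, timeout: float = 60.0) -> str:
    """Send the file at `path` to the Tika server and return the extracted text."""
    with open(path, "rb") as f:
        request = build_tika_request(f.read(), tika_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

With a server running, `extract_text("slides.pptx")` should return the text Tika manages to pull out of the presentation, which is the piece that would then need to be fed into the fulltext index.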
Hi @KiARC, PowerPoint is unfortunately not among the file formats Docspell supports. While Docspell can work with Solr for fulltext search, adding another external service would still increase complexity a lot. I see two options: 1) It could be done as an addon outside of Docspell that is maintained separately. 2) Since Docspell already includes the POI library, there is a good chance it supports at least some PowerPoint "dialects". In that case it could be done directly in Docspell and not as an addon.
Neither variant is likely to happen soon, though, unless someone other than me :-) gives it a try.
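For anyone who wants to experiment before either option lands, here is a rough sketch of how text can be pulled out of one PowerPoint "dialect", the XML-based `.pptx` format, using only the Python standard library. This deliberately does not use POI and is not how Docspell would do it; it relies only on the fact that a `.pptx` file is a ZIP archive whose slides live under `ppt/slides/slideN.xml`, with the visible text in DrawingML `a:t` elements. The function name `pptx_slide_texts` is made up, slides are ordered by the number in their file name (which may differ from the presentation order), and legacy binary `.ppt` files are not handled at all.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# DrawingML namespace; text runs on a slide are stored in <a:t> elements.
A_NS = "{http://schemas.openxmlformats.org/drawingml/2006/main}"
SLIDE_RE = re.compile(r"ppt/slides/slide(\d+)\.xml")


def pptx_slide_texts(source):
    """Return one string per slide, given a .pptx path or a binary file object.

    Assumption: slides are sorted by the number in their file name, which is
    not guaranteed to match the order shown in the presentation.
    """
    with zipfile.ZipFile(source) as archive:
        slides = []
        for name in archive.namelist():
            match = SLIDE_RE.fullmatch(name)
            if match:
                slides.append((int(match.group(1)), name))

        texts = []
        for _, name in sorted(slides):
            root = ET.fromstring(archive.read(name))
            runs = [node.text or "" for node in root.iter(A_NS + "t")]
            texts.append(" ".join(runs))
        return texts
```

Joining the returned strings gives a plain-text blob that could be handed to a fulltext index; rendering a preview of the slides is a separate and harder problem.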