--fts ignores --parameters, --field, --sort #593

Open
gingerbeardman opened this issue Apr 26, 2023 · 5 comments

Comments


gingerbeardman commented Apr 26, 2023

Hi,

I am doing ia search --parameters="..."

...but I do not know what parameters it accepts.

Is there a list or documentation anywhere?

My goal is to return a small number of results sorted by most recently "added" first.

  • on the website that is sort=-publicdate
  • and in advanced search it is sort createdate desc
  • this page says sort_by=-addeddate

But those do not seem to work with ia search, or maybe I am doing it wrong?

I have also tried

  • ia search --parameters="rows=10" --sort="addeddate desc" "hanafuda"
  • ia search --parameters="rows:10" --sort="created_on desc" "hanafuda"

Any help appreciated.

Thanks!

Author

gingerbeardman commented Apr 27, 2023

OK, I figured it out: these options work with a regular search, but support seems to be missing when --fts is used, so I will rename the issue.

ia search 'hanafuda' --parameters rows:10 --field addeddate --sort "addeddate desc"

  • returns expected results (GOOD)

But...

ia search 'hanafuda' --fts --parameters rows:10 --field addeddate --sort "addeddate desc"

  • returns more rows than requested (BAD)
  • returns unsorted results (BAD)

I am using:

  • pip install internetarchive
  • version 3.4.0

@gingerbeardman gingerbeardman changed the title List of valid search parameters? FTS ignores parameters, field and sort Apr 27, 2023
@gingerbeardman gingerbeardman changed the title FTS ignores parameters, field and sort --fts ignores parameters, field and sort Apr 27, 2023
@gingerbeardman gingerbeardman changed the title --fts ignores parameters, field and sort --fts ignores parameters, field, sort Apr 27, 2023
@gingerbeardman gingerbeardman changed the title --fts ignores parameters, field, sort --fts ignores --parameters, --field, --sort Apr 27, 2023
Owner

jjjake commented Apr 28, 2023

The confusion here is that ia search uses different endpoints depending on the options given. It uses the Scrape API by default, Advanced Search when either the rows or page parameter is specified, and our beta FTS API when either --fts or --dsl-fts is specified.

The reason is that the Advanced Search API is not designed for scraping or retrieving full result sets (it's capable of doing so, but it's not designed for it). The Scrape API is designed for dumping full result sets. I assume that most people want full result sets when using ia search, and that's why the Scrape API is the default. When a user specifies that they only want a subset of the results (i.e. via the page or rows params), Advanced Search is used instead.
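
As a rough illustration of the routing described above, here is a minimal sketch. It is not the library's actual code: the function name `choose_endpoint` and the returned labels are hypothetical, and giving --fts priority over rows/page is an assumption based on the behavior reported earlier in this issue.

```python
def choose_endpoint(params=None, fts=False, dsl_fts=False):
    """Return a label for the backend `ia search` would use, per this thread.

    Hypothetical helper for illustration only. Assumption: FTS takes
    precedence over rows/page, matching the report that `rows` is
    ignored when --fts is given.
    """
    params = params or {}
    if fts or dsl_fts:
        # Beta full-text search API (undocumented, subject to change).
        return "fts"
    if "rows" in params or "page" in params:
        # The user asked for a subset of results.
        return "advanced_search"
    # Default: dump the full result set.
    return "scrape"


print(choose_endpoint())
print(choose_endpoint({"rows": 10}))
print(choose_endpoint({"rows": 10}, fts=True))
```

Under these assumptions, passing rows together with --fts still lands on the FTS backend, which would explain why rows had no effect in the report above.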

Then there's the FTS API. This is in beta, is not currently documented publicly, and is subject to change. The parameter you're after there is size rather than rows:

» ia search 'hanafuda' --fts --parameters size:10 | wc -l
      10

--field is not currently supported with --fts; all indexed fields are returned by default. addeddate is not returned, but publicdate is (under .fields.meta_publicdate). Sorting is not supported in the beta FTS API at this time.
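
Since the beta FTS API neither sorts nor limits fields, one possible workaround is to sort and trim on the client. The sketch below assumes each hit is a dict with a nested `fields` mapping that carries `meta_publicdate` as an ISO 8601 string (or a list containing one); the exact response shape is not documented, and the function name `newest_first` and the sample hits are hypothetical.

```python
def newest_first(hits, limit=10):
    """Sort FTS hits by publicdate, newest first, and keep at most `limit`.

    Assumption: dates are ISO 8601 strings in a uniform format, so plain
    string comparison orders them chronologically. Hits without a date
    sort last.
    """
    def publicdate(hit):
        value = hit.get("fields", {}).get("meta_publicdate", "")
        if isinstance(value, list):
            # Tolerate a list-valued field by taking its first entry.
            value = value[0] if value else ""
        return value

    return sorted(hits, key=publicdate, reverse=True)[:limit]


# Hypothetical sample data, not real API output.
sample_hits = [
    {"fields": {"identifier": "a", "meta_publicdate": "2021-05-01T00:00:00Z"}},
    {"fields": {"identifier": "b", "meta_publicdate": ["2023-04-20T00:00:00Z"]}},
    {"fields": {"identifier": "c"}},
]
for hit in newest_first(sample_hits, limit=2):
    print(hit["fields"]["identifier"])
```

Note that this orders by publicdate rather than addeddate, because addeddate is not returned by the FTS API.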

Sorry for the confusion. We hope to consolidate these endpoints in the future!

Author

gingerbeardman commented Apr 28, 2023

Thanks @jjjake, very informative. I'll keep an eye on progress.

It seems very wasteful to query the whole set when I only want the X most recent items (for example, any new items since the last time I ran the query). But maybe I'm overthinking it!? I prefer to keep things lean and save time and electricity on this earth.
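
For the non-FTS case, a small sketch of the "only items since last time" idea: fetch a short, sorted page with the command that worked above (`ia search 'hanafuda' --parameters rows:10 --field addeddate --sort "addeddate desc"`) and keep only entries newer than a stored timestamp. This assumes the output is one JSON object per line and that `addeddate` values are ISO 8601 strings in a uniform format; the function name `new_since` and the sample lines are hypothetical.

```python
import json


def new_since(json_lines, last_seen):
    """Yield results whose addeddate is strictly later than `last_seen`.

    Assumption: dates are uniformly formatted ISO 8601 strings, so string
    comparison matches chronological order.
    """
    for line in json_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the output
        item = json.loads(line)
        if item.get("addeddate", "") > last_seen:
            yield item


# Hypothetical sample output lines, not real search results.
sample_lines = [
    '{"identifier": "x", "addeddate": "2023-04-20T10:00:00Z"}',
    '{"identifier": "y", "addeddate": "2023-04-27T09:30:00Z"}',
    "",
]
for item in new_since(sample_lines, "2023-04-25T00:00:00Z"):
    print(item["identifier"])
```

In practice the lines could be read from `sys.stdin` by piping the `ia search` output into the script.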


chgans commented May 12, 2023

The "beta FTS API" doesn't seem to point to the right endpoint.
Results from "ia search" are not the same as those returned by https://archive.org/search?query=...
The JS on that page uses https://archive.org/services/search/beta/page_production/, which returns cleaner results.

Is there any plan to switch to that endpoint?

Owner

jjjake commented May 19, 2023

@chgans be-api.us.archive.org/ia-pub-fts-api is the current recommendation from the developers of our FTS beta API. We do hope to consolidate our search endpoints in the future though. Thanks for checking!
