ny scraped opinions need docket number disambiguation #4469

Open
grossir opened this issue Sep 17, 2024 · 0 comments


@grossir
Contributor

grossir commented Sep 17, 2024

An example docket with docket number "No. 86" has 2 clusters assigned: one from 1983 and another from 2024.

This happens because the docket number we scrape seems to be an ordinal, which repeats across years. So we should probably disambiguate with a secondary field.
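A minimal sketch of what such disambiguation could look like, assuming the secondary field is the year of the opinion's filing date. The issue does not name the field, so `ScrapedOpinion`, `docket_key`, and the choice of year are all hypothetical. Year alone may not be enough if a case is filed in one year and decided in the next.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ScrapedOpinion:
    # Hypothetical minimal record for a scraped ny opinion
    docket_number: str
    date_filed: date


def docket_key(op: ScrapedOpinion) -> tuple:
    """Hypothetical lookup key: the ordinal docket number alone is ambiguous,
    so pair it with the filing year as an assumed secondary field."""
    return (op.docket_number.strip(), op.date_filed.year)


# Illustrative, made-up dates: same ordinal, different years
old = ScrapedOpinion("No. 86", date(1983, 1, 1))
new = ScrapedOpinion("No. 86", date(2024, 1, 1))
print(docket_key(old) == docket_key(new))  # False: no longer collide
```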

Using the API, I fetched all ny dockets that have a cluster with a date filed after 2020-01-01, and found that 118 of the 377 have more than 1 cluster. Some of these are indeed proper matches; some are not. This count is an underestimate since, as the example above shows, the problem also happens with older dockets. A direct DB query may give an accurate count.
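The filtering step behind that count can be sketched over records shaped like the ones listed in this issue (`id`, `clusters`, `docket_number`). This is only an illustration: fetching and paginating the API results is omitted, and `multi_cluster_dockets` and the sample records are hypothetical.

```python
from typing import Any, Iterable, List, Mapping


def multi_cluster_dockets(
    dockets: Iterable[Mapping[str, Any]], min_clusters: int = 2
) -> List[Mapping[str, Any]]:
    """Return the dockets whose 'clusters' list has at least min_clusters entries."""
    return [d for d in dockets if len(d.get("clusters", [])) >= min_clusters]


# Hypothetical sample records in the same shape as the API output below
sample = [
    {"id": 1, "docket_number": "22", "clusters": ["a", "b", "c"]},
    {"id": 2, "docket_number": "5", "clusters": ["d"]},
    {"id": 3, "docket_number": "86", "clusters": ["e", "f"]},
]
flagged = multi_cluster_dockets(sample)
print(f"{len(flagged)} of {len(sample)} dockets have more than 1 cluster")
# prints "2 of 3 dockets have more than 1 cluster"
```

Passing `min_clusters=3` selects the "more than 2 clusters" subset listed below.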

Related to #4256

Dockets with more than 2 clusters:

[{'id': 3038338,
  'clusters': ['https://www.courtlistener.com/api/rest/v3/clusters/9394763/',
   'https://www.courtlistener.com/api/rest/v3/clusters/9485213/',
   'https://www.courtlistener.com/api/rest/v3/clusters/3179488/'],
  'docket_number': '22'},
 {'id': 3037451,
  'clusters': ['https://www.courtlistener.com/api/rest/v3/clusters/9392746/',
   'https://www.courtlistener.com/api/rest/v3/clusters/9485216/',
   'https://www.courtlistener.com/api/rest/v3/clusters/3178601/'],
  'docket_number': '17'},
 {'id': 2656105,
  'clusters': ['https://www.courtlistener.com/api/rest/v3/clusters/9433816/',
   'https://www.courtlistener.com/api/rest/v3/clusters/9475205/',
   'https://www.courtlistener.com/api/rest/v3/clusters/2800016/'],
  'docket_number': '59'},
 {'id': 2633904,
  'clusters': ['https://www.courtlistener.com/api/rest/v3/clusters/9393783/',
   'https://www.courtlistener.com/api/rest/v3/clusters/9476468/',
   'https://www.courtlistener.com/api/rest/v3/clusters/2779006/'],
  'docket_number': '13'},
 {'id': 2633903,
  'clusters': ['https://www.courtlistener.com/api/rest/v3/clusters/9385269/',
   'https://www.courtlistener.com/api/rest/v3/clusters/2779005/',
   'https://www.courtlistener.com/api/rest/v3/clusters/9475202/'],
  'docket_number': '12'},
 {'id': 2395115,
  'clusters': ['https://www.courtlistener.com/api/rest/v3/clusters/9433812/',
   'https://www.courtlistener.com/api/rest/v3/clusters/9494594/',
   'https://www.courtlistener.com/api/rest/v3/clusters/2531090/'],
  'docket_number': '69'},
 {'id': 2351791,
  'clusters': ['https://www.courtlistener.com/api/rest/v3/clusters/9392747/',
   'https://www.courtlistener.com/api/rest/v3/clusters/9483870/',
   'https://www.courtlistener.com/api/rest/v3/clusters/2484382/'],
  'docket_number': '25'},
 {'id': 2351656,
  'clusters': ['https://www.courtlistener.com/api/rest/v3/clusters/9392745/',
   'https://www.courtlistener.com/api/rest/v3/clusters/9451989/',
   'https://www.courtlistener.com/api/rest/v3/clusters/9486392/',
   'https://www.courtlistener.com/api/rest/v3/clusters/2484257/'],
  'docket_number': '28'},
 {'id': 2351139,
  'clusters': ['https://www.courtlistener.com/api/rest/v3/clusters/9434706/',
   'https://www.courtlistener.com/api/rest/v3/clusters/9494598/',
   'https://www.courtlistener.com/api/rest/v3/clusters/2483841/'],
  'docket_number': '70'},
 {'id': 2313430,
  'clusters': ['https://www.courtlistener.com/api/rest/v3/clusters/9401133/',
   'https://www.courtlistener.com/api/rest/v3/clusters/2441111/',
   'https://www.courtlistener.com/api/rest/v3/clusters/9495617/',
   'https://www.courtlistener.com/api/rest/v3/clusters/9496761/'],
  'docket_number': '41'},
 {'id': 2311070,
  'clusters': ['https://www.courtlistener.com/api/rest/v3/clusters/9385266/',
   'https://www.courtlistener.com/api/rest/v3/clusters/2438725/',
   'https://www.courtlistener.com/api/rest/v3/clusters/9443343/',
   'https://www.courtlistener.com/api/rest/v3/clusters/9483871/'],
  'docket_number': '27'},
 {'id': 2310926,
  'clusters': ['https://www.courtlistener.com/api/rest/v3/clusters/9385267/',
   'https://www.courtlistener.com/api/rest/v3/clusters/2438615/',
   'https://www.courtlistener.com/api/rest/v3/clusters/9476470/'],
  'docket_number': '11'},
 {'id': 2310883,
  'clusters': ['https://www.courtlistener.com/api/rest/v3/clusters/9385265/',
   'https://www.courtlistener.com/api/rest/v3/clusters/9477380/',
   'https://www.courtlistener.com/api/rest/v3/clusters/2438563/'],
  'docket_number': '14'},
 {'id': 2309587,
  'clusters': ['https://www.courtlistener.com/api/rest/v3/clusters/9393784/',
   'https://www.courtlistener.com/api/rest/v3/clusters/9485212/',
   'https://www.courtlistener.com/api/rest/v3/clusters/2437247/'],
  'docket_number': '21'},
 {'id': 1963852,
  'clusters': ['https://www.courtlistener.com/api/rest/v3/clusters/9393782/',
   'https://www.courtlistener.com/api/rest/v3/clusters/9483869/',
   'https://www.courtlistener.com/api/rest/v3/clusters/2114345/'],
  'docket_number': '26'},
 {'id': 1895406,
  'clusters': ['https://www.courtlistener.com/api/rest/v3/clusters/9405974/',
   'https://www.courtlistener.com/api/rest/v3/clusters/2077737/',
   'https://www.courtlistener.com/api/rest/v3/clusters/9494597/'],
  'docket_number': '71'},
 {'id': 1889099,
  'clusters': ['https://www.courtlistener.com/api/rest/v3/clusters/9401131/',
   'https://www.courtlistener.com/api/rest/v3/clusters/9494593/',
   'https://www.courtlistener.com/api/rest/v3/clusters/2053948/'],
  'docket_number': '40'},
 {'id': 1889083,
  'clusters': ['https://www.courtlistener.com/api/rest/v3/clusters/9459188/',
   'https://www.courtlistener.com/api/rest/v3/clusters/9401132/',
   'https://www.courtlistener.com/api/rest/v3/clusters/2053886/'],
  'docket_number': '43'},
 {'id': 1816633,
  'clusters': ['https://www.courtlistener.com/api/rest/v3/clusters/9434707/',
   'https://www.courtlistener.com/api/rest/v3/clusters/9502673/',
   'https://www.courtlistener.com/api/rest/v3/clusters/1987501/'],
  'docket_number': '48'},
 {'id': 1451336,
  'clusters': ['https://www.courtlistener.com/api/rest/v3/clusters/9401170/',
   'https://www.courtlistener.com/api/rest/v3/clusters/9495620/',
   'https://www.courtlistener.com/api/rest/v3/clusters/2742889/'],
  'docket_number': '42'},
 {'id': 1137365,
  'clusters': ['https://www.courtlistener.com/api/rest/v3/clusters/9400044/',
   'https://www.courtlistener.com/api/rest/v3/clusters/9495619/',
   'https://www.courtlistener.com/api/rest/v3/clusters/2440006/'],
  'docket_number': '38'},
 {'id': 1082102,
  'clusters': ['https://www.courtlistener.com/api/rest/v3/clusters/9401134/',
   'https://www.courtlistener.com/api/rest/v3/clusters/2059909/',
   'https://www.courtlistener.com/api/rest/v3/clusters/9496757/'],
  'docket_number': '39'},
 {'id': 712101,
  'clusters': ['https://www.courtlistener.com/api/rest/v3/clusters/9502672/',
   'https://www.courtlistener.com/api/rest/v3/clusters/9406624/',
   'https://www.courtlistener.com/api/rest/v3/clusters/2012198/'],
  'docket_number': '46'},
 {'id': 176080,
  'clusters': ['https://www.courtlistener.com/api/rest/v3/clusters/9502671/',
   'https://www.courtlistener.com/api/rest/v3/clusters/9406627/',
   'https://www.courtlistener.com/api/rest/v3/clusters/2483632/'],
  'docket_number': '50'},
 {'id': 101352,
  'clusters': ['https://www.courtlistener.com/api/rest/v3/clusters/9485214/',
   'https://www.courtlistener.com/api/rest/v3/clusters/9486393/',
   'https://www.courtlistener.com/api/rest/v3/clusters/2435378/'],
  'docket_number': '1'}]
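In every entry above, one cluster ID (all below 3,200,000) is far lower than the others (all above 9,380,000), which is consistent with an older cluster sharing a docket with recently scraped ones. A small helper that pulls the numeric IDs out of the cluster URLs makes this easier to inspect; `cluster_ids` and its regex are illustrative, not part of CourtListener.

```python
import re

# Matches the trailing numeric ID in URLs like .../api/rest/v3/clusters/9394763/
CLUSTER_URL_RE = re.compile(r"/clusters/(\d+)/?$")


def cluster_ids(urls):
    """Extract the numeric cluster IDs from cluster URLs, sorted ascending."""
    ids = []
    for url in urls:
        match = CLUSTER_URL_RE.search(url)
        if match is None:
            raise ValueError(f"unexpected cluster URL: {url}")
        ids.append(int(match.group(1)))
    return sorted(ids)


# First entry from the list above (docket 3038338, docket number '22')
entry = {
    "id": 3038338,
    "clusters": [
        "https://www.courtlistener.com/api/rest/v3/clusters/9394763/",
        "https://www.courtlistener.com/api/rest/v3/clusters/9485213/",
        "https://www.courtlistener.com/api/rest/v3/clusters/3179488/",
    ],
    "docket_number": "22",
}
print(cluster_ids(entry["clusters"]))  # [3179488, 9394763, 9485213]
```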