SOPN Parsing: Table Extraction Errors #1727

Open
VirginiaDooley opened this issue Jan 27, 2022 · 14 comments · Fixed by #1836
Comments

@VirginiaDooley
Contributor

VirginiaDooley commented Jan 27, 2022

This issue is exclusively to track issues with SOPN Table Extraction.
For SOPN Parsing: Table Parsing Errors, go here: #1728
For SOPN Parsing: Page Extraction Errors, go here: #1726

Table extraction errors are typically found after a successful SOPN upload, during a bot parse. The bot fails to parse completely and the result is no pre-filled info in the bulk add form.

Please add these types of issues in the comments below, including:

  • ballot_paper_id or link to candidates site
  • description of the error
  • screenshot
@VirginiaDooley
Contributor Author

Missing first names #1426 (comment)

@VirginiaDooley VirginiaDooley changed the title from "WIP; SOPN Parsing: Table Extraction Errors" to "SOPN Parsing: Table Extraction Errors" on Feb 1, 2022
@michaeljcollinsuk
Contributor

Having looked into #1728 (comment) a little locally, I think this is a bug in table extraction. It seems that the "surname" and "other name" values are parsed together, as if they were in the same column. Below is a debug print of the row parsed for this candidate: the "surname" field holds both values and "other name" is blank (you need to scroll to see it):

candidates surname                                                                                                                                                                             RICHARDSON  HARRY
other name                                                                                                                                                                                                      
address                                                                                                                             2 Manor Farm \nCottage Ashfield \nRoad Norton  \nBury St. Edmunds \nIP31 3NN
description                                                                                                                                                                         Conservative Party Candidate
decision of returning officer that nomination paper is invalid or other reason why a person nominated no longer stands nominated                                                                                
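
A minimal sketch of one possible workaround for the merged row above, not the project's actual fix. It assumes the merged cell keeps a run of two or more spaces between the surname and the other names (as in the debug print), and otherwise falls back to treating leading upper-case tokens as the surname. The function name `split_combined_name` is hypothetical.

```python
import re


def split_combined_name(surname_cell, other_names_cell):
    """Return (surname, other_names), splitting the surname cell only
    when the other-names cell is empty. Hypothetical helper."""
    surname = surname_cell.strip()
    other = other_names_cell.strip()
    if other or not surname:
        # Nothing to repair: the columns were extracted separately.
        return surname, other

    # Assumption: a run of 2+ whitespace characters marks the lost column gap.
    parts = re.split(r"\s{2,}", surname, maxsplit=1)
    if len(parts) == 2:
        return parts[0], parts[1]

    # Fallback assumption: surnames are printed in upper case, other names are not.
    tokens = surname.split()
    i = 0
    while i < len(tokens) and tokens[i].isupper():
        i += 1
    if 0 < i < len(tokens):
        return " ".join(tokens[:i]), " ".join(tokens[i:])

    # Ambiguous (for example everything upper case, single spaces): leave as-is.
    return surname, other


print(split_combined_name("RICHARDSON  HARRY", ""))
```

When the cell is ambiguous the helper returns it unchanged, so a human can still correct it in the bulk add form rather than the bot guessing wrongly.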

@symroe symroe pinned this issue Feb 11, 2022
@sjorford
Contributor

sjorford commented Mar 14, 2022

Example of a SOPN that was published as HTML. I printed it to PDF and the bot failed to parse it: https://candidates.democracyclub.org.uk/elections/local.dorset.lyme-charmouth.by.2022-04-07/sopn/

We don't get many of these, but posting this here in case it is useful

@symroe
Member

symroe commented Apr 5, 2022

No name guess for Index(['surname other names', 'home address', 'description',
       'proposers name seconders name', ''],
      dtype='object')

for https://candidates.democracyclub.org.uk/elections/local.north-east-lincolnshire.croft-baker.2022-05-05/sopn/

@symroe
Member

symroe commented Apr 5, 2022

No name guess for Index(['surname / cyfenw', 'other names / enwau eraill', 'home address1',
       'description', 'statement',
       'decision of returning officer that nomination paper is invalid or other reason why a person nominated no longer stands nominated penderfyniad y swyddog canlyniadau fod y papur yn ddirym neu reswm arall paham na chaiff person a enwebwyd barhau i fod felly'],
      dtype='object')

https://candidates.democracyclub.org.uk/elections/local.pembrokeshire.pembroke-st-michael.2022-05-05/sopn/

@symroe
Member

symroe commented Apr 5, 2022

Error attempting to parse a table for local.watford.central.2022-05-05
No name guess for Index(['surname other names in full', 'home address', 'description',
       'proposers name', ''],
      dtype='object')

https://candidates.democracyclub.org.uk/elections/local.watford.central.2022-05-05/sopn/

@symroe
Member

symroe commented Apr 5, 2022

Error attempting to parse a table for local.cheltenham.lansdown.2022-05-05
No name guess for Index(['surname other names in full', 'home address', 'description',
       'proposers name seconders name', ''],
      dtype='object')

@symroe
Member

symroe commented Apr 6, 2022

Successfully added the Statement of Persons Nominated for local.swansea.gower.2022-05-05
Error attempting to parse a table for local.swansea.gower.2022-05-05
No name guess for Index(['enwr ymgeisydd candidate name',
       'disgrifiad or ymgeisydd description of candidate', '',
       'gwybodaeth am gyfeiriad cartref home address information',
       'gwybodaeth o ddatganiad aelodaeth plaid information from statement of party membership',
       'rheswm pam nad ywr ymgeisydd wedii enwebu mwyach reason why candidate no longer nominated'],
      dtype='object')
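
The "No name guess" errors reported in this thread share a pattern: the name header is either merged with another header ("surname other names"), bilingual ("surname / cyfenw"), or worded as "candidate name". A rough sketch of more tolerant header matching follows. It is not the parser's real logic, and the hint lists and the function name `guess_name_columns` are assumptions made for illustration.

```python
# Assumed substrings that suggest a candidate-name column.
NAME_HINTS = ("surname", "other name", "candidate name", "name of candidate")
# Assumed substrings that rule a column out even if it mentions a name.
EXCLUDE_HINTS = ("proposer", "seconder", "nominated", "description")


def guess_name_columns(headers):
    """Return the indexes of headers that look like candidate-name columns."""
    matches = []
    for i, header in enumerate(headers):
        # Lower-case and collapse whitespace before substring matching.
        h = " ".join(header.lower().split())
        if any(word in h for word in EXCLUDE_HINTS):
            continue
        if any(hint in h for hint in NAME_HINTS):
            matches.append(i)
    return matches


# Headers copied from the Watford error earlier in this thread.
watford = ["surname other names in full", "home address", "description",
           "proposers name", ""]
print(guess_name_columns(watford))
```

Substring matching is used instead of exact equality so that a Welsh or merged header still matches on its English part. A single match on a merged header would still need the cell contents split into surname and other names afterwards.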

@symroe
Member

symroe commented Apr 6, 2022

Successfully added the Statement of Persons Nominated for local.southwark.old-kent-road.2022-05-05
Couldn't find party for Description (if any).
Closest is The Justice & Anti-Corruption Party with similarity 0.12765957

@symroe
Member

symroe commented Apr 6, 2022

Successfully added the Statement of Persons Nominated for local.runnymede.chertsey-st-anns.2022-05-05
Pages for table not known for document, extract page numbers first

@symroe
Member

symroe commented Apr 6, 2022

Successfully added the Statement of Persons Nominated for local.wolverhampton.bilston-north.2022-05-05
No ParsedSOPN for local.wolverhampton.bilston-north.2022-05-05

@michaeljcollinsuk
Contributor

michaeljcollinsuk commented Apr 6, 2022

Successfully added the Statement of Persons Nominated for local.southwark.old-kent-road.2022-05-05
Couldn't find party for Description (if any).
Closest is The Justice & Anti-Corruption Party with similarity 0.12765957

I think this is actually a good thing. The table on the second page repeats the header row, and the parser skips that row because it can't find a party matching "Description (if any)". The rest of the candidates are still parsed. It would be better, though, if we could remove the repeated header row earlier in the process.
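
One way to drop a repeated header row before party matching could look like the sketch below. It assumes rows are available as plain lists of cell strings, which may not match the parser's internal data structures; `drop_repeated_headers` is a hypothetical name and the sample rows are made up.

```python
def normalise(cell):
    """Lower-case a cell and collapse internal whitespace for comparison."""
    return " ".join(str(cell).lower().split())


def drop_repeated_headers(header, rows):
    """Return rows without any row that repeats the header.

    Comparison is done on normalised cells, so differences in case or
    spacing between pages do not hide a repeated header.
    """
    target = [normalise(c) for c in header]
    return [row for row in rows if [normalise(c) for c in row] != target]


# Made-up sample data: the second row is a header repeated on a later page.
header = ["Name of Candidate", "Home Address", "Description (if any)"]
rows = [
    ["DOE, Jane", "(address in Southwark)", "Example Party"],
    ["Name of  Candidate", "Home Address", "description (if any)"],
    ["ROE, Richard", "(address in Southwark)", "Independent"],
]
print(drop_repeated_headers(header, rows))
```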

@michaeljcollinsuk
Contributor

Successfully added the Statement of Persons Nominated for local.runnymede.chertsey-st-anns.2022-05-05
Pages for table not known for document, extract page numbers first

Looking at this SOPN, unfortunately we have no hope of parsing it.

@VirginiaDooley VirginiaDooley unpinned this issue Oct 25, 2022
@symroe symroe pinned this issue Apr 4, 2023
@symroe
Member

symroe commented Apr 4, 2023

https://candidates.democracyclub.org.uk/elections/local.cambridgeshire.arbury.by.2023-05-04/sopn/ Marked OLIVER, Sam and BLACK, Mike as Independent incorrectly.

4 participants