SOPN Parsing: Page Extraction Errors #1726

Open
VirginiaDooley opened this issue Jan 27, 2022 · 16 comments

Comments

@VirginiaDooley
Contributor

VirginiaDooley commented Jan 27, 2022

This issue is exclusively to track issues with SOPN Page Extraction.
For SOPN Parsing: Table Parsing Errors, go here: #1728
For SOPN Parsing: Table Extraction Errors, go here: #1727

Page extraction errors typically occur when uploading a SOPN. The most common errors include:

  • A multipage/multi-ward SOPN fails to match pages, resulting in the entire document being uploaded rather than only the page(s) for that particular ward
  • An image-based file, or another file type that isn't currently supported

Please report these types of issues in the comments below, including:

  • a ballot_paper_id or a link to the candidates site
  • a description of the error
  • a screenshot
@VirginiaDooley
Contributor Author

Page matching error: #1426 (comment)

@VirginiaDooley changed the title from "WIP; SOPN Parsing: Page Extraction Errors" to "SOPN Parsing: Page Extraction Errors" on Feb 1, 2022
@symroe symroe pinned this issue Feb 11, 2022
@symroe
Member

symroe commented Mar 30, 2022

https://candidates.democracyclub.org.uk/elections/local.west-lothian.livingston-south.2022-05-05/sopn/ (and other SOPNs for that election) don't match pages. Chances are this is because the ward names are in the table header.

@boothym

boothym commented Mar 31, 2022

Hi, it seems the Fife Council one has problems as each table is spread over two pages in the PDF. https://candidates.democracyclub.org.uk/elections/local.fife.burntisland-kinghorn-and-western-kirkcaldy.2022-05-05/sopn/
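
For illustration only (this is not the project's parser code): one way to handle a table that runs over two pages is to treat any page that names no known ward as a continuation of the previous ward. The sketch below assumes a hypothetical `assign_pages_to_wards` helper, plain-text page contents, and a known list of ward names.

```python
from typing import Dict, List, Optional


def assign_pages_to_wards(
    page_texts: List[str], ward_names: List[str]
) -> Dict[str, List[int]]:
    """Map each ward name to the 0-based page indexes assigned to it.

    Assumption for this sketch: a page that mentions no known ward
    continues the most recently matched ward.
    """
    pages_by_ward: Dict[str, List[int]] = {name: [] for name in ward_names}
    current: Optional[str] = None
    for index, text in enumerate(page_texts):
        lowered = text.lower()
        matched = [name for name in ward_names if name.lower() in lowered]
        if matched:
            # If several names match, prefer the longest one.
            current = max(matched, key=len)
        # Pages seen before any ward has matched are left unassigned.
        if current is not None:
            pages_by_ward[current].append(index)
    return pages_by_ward
```

With made-up page text, a call such as `assign_pages_to_wards(["Ward: Kirkcaldy North", "(continued)", "Ward: Kirkcaldy East"], ["Kirkcaldy North", "Kirkcaldy East"])` would put pages 0 and 1 under Kirkcaldy North and page 2 under Kirkcaldy East.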

@jf1

jf1 commented Apr 6, 2022

Wigan strangeness - the correct pages have been used by the parser for all the LA (so far) but the link in the Ashton ward goes to another ward's SoPN
https://candidates.democracyclub.org.uk/elections/local.wigan.ashton.2022-05-05/sopn/

@jf1

jf1 commented Apr 6, 2022

It's also joined the Hindley and Hindley Green wards, suggesting it's not strict enough when considering if a ward stretches onto two pages of a SoPN.
https://candidates.democracyclub.org.uk/bulk_adding/sopn/local.wigan.hindley.2022-05-05/?edit=1
I wonder if page splitting was offset by one as a result.

...it then processed the Hindley Green page (again) for that ward without issue
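
To illustrate how this kind of join could happen (an assumption about the cause, not a confirmed diagnosis): a plain substring test finds "Hindley" on the Hindley Green page too. A minimal sketch of a stricter check, using a hypothetical `page_mentions_ward` helper that ignores longer ward names containing the one being searched for:

```python
import re
from typing import List


def page_mentions_ward(page_text: str, ward: str, all_wards: List[str]) -> bool:
    """Return True if the page names `ward` itself, not just a longer ward
    name that happens to contain it (e.g. "Hindley" vs "Hindley Green")."""
    text = page_text.lower()
    target = ward.lower()
    # Longer ward names that contain the target as a substring.
    longer = [w.lower() for w in all_wards if w.lower() != target and target in w.lower()]
    # Blank those out first so they cannot count as a mention of the target.
    for other in sorted(longer, key=len, reverse=True):
        text = re.sub(r"\b" + re.escape(other) + r"\b", " ", text)
    return re.search(r"\b" + re.escape(target) + r"\b", text) is not None
```

Under this rule a page headed "Hindley Green" would match only the Hindley Green ward, while a page headed "Hindley" would still match Hindley.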

@jf1

jf1 commented Apr 6, 2022

Wigan Winstanley ward - it offered the wrong candidate names and linked to the wrong (page of the) SoPN
https://candidates.democracyclub.org.uk/bulk_adding/sopn/local.wigan.winstanley.2022-05-05/

@gregorywilliams

https://candidates.democracyclub.org.uk/elections/local.oxford.cowley.2022-05-05/
Should have been Cowley ward, but the extracted page was for Littlemore ward. The correct ward is available in the linked PDF:
https://www.oxford.gov.uk/download/downloads/id/7948/statement_as_to_persons_nominated_-_city_elections_on_5_may_2022.pdf

@gregorywilliams

@jf1

jf1 commented Apr 11, 2022

This 4-page single ward PDF incorrectly generated a "Watch out! The original document contains candidate info for 2 areas." warning https://candidates.democracyclub.org.uk/elections/local.tower-hamlets.bethnal-green-west.2022-05-05/sopn/

@jf1

jf1 commented Apr 11, 2022

Same with https://candidates.democracyclub.org.uk/elections/local.tower-hamlets.bethnal-green-east.2022-05-05/sopn/
Both were .docx on their website and initially DC had PDFs with different formatting so I re-did these two, and got the "2 areas" message after uploading each one.

@VirginiaDooley VirginiaDooley unpinned this issue Oct 25, 2022
@symroe symroe pinned this issue Apr 4, 2023
@sjorford
Contributor

sjorford commented Apr 6, 2023

local.lichfield.boney-hay-central.2023-05-04 - the pages for Boney Hay & Central and Bourne Vale wards have been combined

@it3986
Contributor

it3986 commented Apr 6, 2023

Exeter SOPNs don't appear to have been parsed by the bot - I've looked at the first 3 so far.
https://candidates.democracyclub.org.uk/elections/local.exeter.alphington.2023-05-04/

image

@it3986
Contributor

it3986 commented Apr 6, 2023

DocX file for Torbay Council doesn't appear to have been understood by the bot.
Again I've checked the first 3 wards and they all have the same symptoms. Pages are matched but tables not extracted and no bot suggestions on the bulk adding screen.

https://candidates.democracyclub.org.uk/bulk_adding/sopn/local.torbay.churston-with-galmpton.2023-05-04/

[Edit] Later wards within this SOPN document have not been page matched by the bot, and I had to search manually (Ctrl + F) just to find the correct page of the SOPN before adding the candidates by hand.

image

@Bekabyx
Contributor

Bekabyx commented Apr 6, 2023

Parser fail for Mapperley in Nottingham. Haven't checked the other wards yet but it seems to have picked up the wrong page when parsing.

Screenshot 2023-04-06 at 22 53 26

Screenshot 2023-04-06 at 22 53 46

Screenshot 2023-04-06 at 22 53 54

@VirginiaDooley
Contributor Author

Sandwell St. Paul’s is in a limbo half-broken state. The page extraction failed but the table parsing succeeded (albeit in a slightly janky format). The SOPN uploaded is the entire combined PDF file. The suspect for this strange breakage was the backtick in the ward name although Virginia has checked this out and can’t see a problem with it. https://candidates.democracyclub.org.uk/elections/local.sandwell.st-pauls.2023-05-04/sopn/
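
If punctuation in ward names ever needs ruling out as a factor, one option is to fold apostrophe look-alikes to a single character before comparing names. A minimal sketch, assuming a hypothetical `normalise_ward_name` helper (not the project's code) and an illustrative list of variants:

```python
# Characters sometimes used in place of a plain apostrophe (illustrative list):
# right/left single quote, backtick, acute accent, modifier letter apostrophe.
_APOSTROPHE_VARIANTS = "\u2019\u2018\u0060\u00b4\u02bc"
_APOSTROPHE_TABLE = str.maketrans({ch: "'" for ch in _APOSTROPHE_VARIANTS})


def normalise_ward_name(name: str) -> str:
    """Lower-case a ward name, fold apostrophe variants to "'" and
    collapse runs of whitespace, so near-identical spellings compare equal."""
    folded = name.translate(_APOSTROPHE_TABLE).lower()
    return " ".join(folded.split())
```

Comparing `normalise_ward_name(a) == normalise_ward_name(b)` would then treat a curly apostrophe, a backtick and a straight apostrophe in "St. Paul's" as the same name.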
