Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Duplicate value for merged cell instead of None #422

Open
wants to merge 4 commits into
base: develop
Choose a base branch
from

Conversation

tungph
Copy link

@tungph tungph commented Apr 22, 2021

See #420

@codecov
Copy link

codecov bot commented Apr 22, 2021

Codecov Report

Merging #422 (c9073fa) into develop (4407362) will decrease coverage by 0.01%.
The diff coverage is 100.00%.

❗ Current head c9073fa differs from pull request most recent head 828799b. Consider uploading reports for the commit 828799b to get more accurate results
Impacted file tree graph

@@             Coverage Diff             @@
##           develop     #422      +/-   ##
===========================================
- Coverage    98.28%   98.26%   -0.02%     
===========================================
  Files           10       10              
  Lines         1226     1213      -13     
===========================================
- Hits          1205     1192      -13     
  Misses          21       21              
Impacted Files Coverage Δ
pdfplumber/table.py 100.00% <100.00%> (ø)
pdfplumber/cli.py 100.00% <0.00%> (ø)
pdfplumber/convert.py 100.00% <0.00%> (ø)
pdfplumber/container.py 100.00% <0.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4407362...828799b. Read the comment docs.

@tungph tungph force-pushed the issue-420 branch 2 times, most recently from 1391153 to 0c4c7f2 Compare April 22, 2021 12:58
@tungph tungph changed the base branch from stable to develop April 26, 2021 04:04
@tungph
Copy link
Author

tungph commented Apr 26, 2021

I saw #387:

  • That PRs should be submitted against develop, not stable
  • That PRs should contain tests
  • Run linter and reformat code
  • Adding oneself to the list of contributors
  • Update CHANGELOG.md

@jsvine
Copy link
Owner

jsvine commented Apr 26, 2021

Hi @tungph, thanks for your interest in this library and thank you for the PR. I agree that pdfplumber's current approach to complex tables is not optimal, but it is the result of not wanting to over-impose assumptions about any given table. The handling of merged cells in tables is a tricky topic, and one with a lot of edge cases — what may be a good solution for one set of tables may be a poor one for others. So I'll need to consider this suggestion and PR somewhat carefully, and think about (and test) other PDFs.

Relatedly: I have an idea for providing a richer representation of complex tables — one that would make the structure of the tables much clearer — but it would be a substantial change and so it will probably have to wait for v0.6.0.

@tungph
Copy link
Author

tungph commented May 4, 2021

Thank you for the consideration, @jsvine. Please let me know if you found bug with my PR. I'm happy to provide fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants