Skip to content

nikhilponnuru/DistillTable

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

6 Commits
 
 
 
 

Repository files navigation

A pdf table extracter and presents the extracted data in csv format. getting output in the form of json, xml is currently in testing stage.

Dependencies 1)pdftohtml ---this tool must be installed and must be used for making the given pdf to xml command:-- pdftohtml filename.pdf -xml output--filename.xml
2)lxml parser is required
3)beautiful soup 4 is required

steps:--

1)After converting given pdf to xml using pdftohtml tool using above command then

2)use command : -- (change directory to where code.py is placed)
python code.py -f filename.xml > /path/to/destination_filename.csv

i.e the output csv is redirected to destination filename using ">" operator and /path/to/ :-is the path where final csv output must be copied to

[ the above I have tested and used in linux --ubuntu 14.04 ]

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages