Extract data from PDF files
pdf-gold-digger
PDF information extraction library based on pdf.js and Node.js.
Work in progress
Supports:
- extract text
- split text by page
- split text by line
- report font information separately
- report bounding box positions
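
To illustrate what splitting text by line with font and bounding box information can look like, here is a minimal sketch. It is not this library's actual code. It assumes input items shaped like the text items that pdf.js returns from `getTextContent()` (`str`, `transform`, `width`, `height`, `fontName`), and it assumes unrotated text. The function name `groupIntoLines` and the `tolerance` parameter are hypothetical.

```javascript
// Sketch only: groupIntoLines and tolerance are hypothetical names,
// not part of pdf-gold-digger. Items mimic pdf.js getTextContent() items.

/**
 * Group text items into lines and compute a bounding box per line.
 * @param {Array<{str: string, transform: number[], width: number, height: number, fontName: string}>} items
 * @param {number} tolerance max baseline difference (in PDF units) within one line
 */
function groupIntoLines(items, tolerance = 2) {
  const lines = [];
  for (const item of items) {
    // transform is [a, b, c, d, e, f]; e and f are the x/y origin of the text
    const x = item.transform[4];
    const y = item.transform[5];
    // Reuse an existing line if its baseline is close enough
    let line = lines.find((l) => Math.abs(l.baseline - y) <= tolerance);
    if (!line) {
      line = { baseline: y, spans: [] };
      lines.push(line);
    }
    line.spans.push({
      text: item.str,
      font: item.fontName,
      bbox: { x, y, width: item.width, height: item.height },
    });
  }
  // PDF y grows upward, so a larger baseline means higher on the page
  lines.sort((a, b) => b.baseline - a.baseline);
  return lines.map((line) => {
    const spans = line.spans.sort((a, b) => a.bbox.x - b.bbox.x);
    const left = Math.min(...spans.map((s) => s.bbox.x));
    const right = Math.max(...spans.map((s) => s.bbox.x + s.bbox.width));
    const bottom = Math.min(...spans.map((s) => s.bbox.y));
    const top = Math.max(...spans.map((s) => s.bbox.y + s.bbox.height));
    return {
      text: spans.map((s) => s.text).join(''),
      bbox: { x: left, y: bottom, width: right - left, height: top - bottom },
      spans, // per-span font name and bounding box stay available
    };
  });
}
```

Each returned line keeps its spans, so font information stays separate from the text while the line-level bounding box covers all spans on that baseline.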
TODO:
- specify output format and output directory
- output to XML format
- output to JSON format
- extract images to files
- extract fonts
- extract tables
- advanced font information
- extract forms
- extract drawings