vane/pdf-gold-digger

Michal Szczepanski b3d9b10317 Add output formatter and json output

2019-07-22 22:46:05 +02:00


pdf-gold-digger

PDF information extraction library based on pdf.js and Node.js.

Work in progress

Usage

git clone https://github.com/vane/pdf-gold-digger
cd pdf-gold-digger
node gd.js -f some_file.pdf

Supports:

extract text
- separate each page
- separate each line
- separate font information
- bounding box position
output to text: -o text (default)
output to JSON: -o json
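
To make the text extraction features above more concrete, here is a minimal sketch of how text fragments could be grouped into lines with font names and a bounding box, then written out as text or JSON. It is not the code used by pdf-gold-digger. The helper names `groupIntoLines` and `format`, the `tolerance` parameter, and the output shape are assumptions made for illustration. The input items are assumed to look like the items returned by pdf.js `getTextContent()` (`str`, `transform`, `width`, `height`, `fontName`); loading the PDF itself is left out.

```javascript
// Hypothetical helpers for illustration only; not part of pdf-gold-digger.
// Each input item is assumed to follow the shape of a pdf.js text item:
// { str, transform: [a, b, c, d, x, y], width, height, fontName }.

function groupIntoLines(items, tolerance = 2) {
  const lines = [];
  for (const item of items) {
    const x = item.transform[4];
    const y = item.transform[5];
    // Reuse an existing line when the baseline is within the tolerance.
    let line = lines.find((l) => Math.abs(l.y - y) <= tolerance);
    if (!line) {
      line = { y, parts: [] };
      lines.push(line);
    }
    line.parts.push({
      text: item.str,
      font: item.fontName,
      x,
      y,
      width: item.width,
      height: item.height,
    });
  }

  // PDF user space starts at the bottom-left, so a larger y is higher on the page.
  lines.sort((a, b) => b.y - a.y);

  return lines.map((line) => {
    // Order fragments left to right within the line.
    const parts = line.parts.sort((a, b) => a.x - b.x);
    const left = Math.min(...parts.map((p) => p.x));
    const right = Math.max(...parts.map((p) => p.x + p.width));
    const bottom = Math.min(...parts.map((p) => p.y));
    const top = Math.max(...parts.map((p) => p.y + p.height));
    return {
      // Joining with a single space is a simplification; real PDFs may
      // already carry spaces inside the fragments.
      text: parts.map((p) => p.text).join(" "),
      fonts: [...new Set(parts.map((p) => p.font))],
      bbox: { x: left, y: bottom, width: right - left, height: top - bottom },
    };
  });
}

// Mirrors the idea of the -o text / -o json switch: plain lines or full JSON.
function format(lines, output = "text") {
  if (output === "json") {
    return JSON.stringify(lines, null, 2);
  }
  return lines.map((l) => l.text).join("\n");
}
```

Calling `format(groupIntoLines(items))` gives one line of text per detected line, while `format(groupIntoLines(items), "json")` keeps the font names and bounding boxes. Matching baselines with a fixed tolerance is a simple choice that works for horizontal text; rotated or multi-column text would need more care.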

TODO:

specify output format and output directory
output to XML format
~~output to JSON format~~
extract images to files
extract font
extract tables
advanced font information
extract forms
extract drawings