pdf-gold-digger
====
PDF information extraction library based on [pdf.js](https://mozilla.github.io/pdf.js/)
and [Node.js](https://nodejs.org).
## Work in progress
### Usage
``git clone https://github.com/vane/pdf-gold-digger``

``gd -f some.pdf``
### Supports:
- extract text
- separate each page
- separate each line
- separate font information
- bounding box position
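
To illustrate what "separate each line" and "bounding box position" can mean in practice, here is a minimal sketch, not taken from this project's source. It assumes text items shaped like those returned by pdf.js `page.getTextContent()` (`str`, `transform`, `width`, `height`, `fontName`). The `groupIntoLines` helper, its `tolerance` parameter, and the sample data are hypothetical.

```javascript
// Hypothetical helper, NOT part of pdf-gold-digger's actual API.
// Each item mimics a pdf.js text item:
//   { str, transform: [a, b, c, d, x, y], width, height, fontName }
function groupIntoLines(items, tolerance = 2) {
  const lines = [];
  for (const item of items) {
    const x = item.transform[4]; // horizontal position of the item
    const y = item.transform[5]; // baseline position of the item
    // Reuse an existing line whose baseline is within `tolerance` units.
    let line = lines.find((l) => Math.abs(l.y - y) <= tolerance);
    if (!line) {
      line = { y, items: [] };
      lines.push(line);
    }
    line.items.push({ ...item, x, y });
  }
  // PDF coordinates grow upward, so descending y reads top to bottom.
  lines.sort((a, b) => b.y - a.y);
  return lines.map((line) => {
    const sorted = line.items.slice().sort((a, b) => a.x - b.x);
    const left = Math.min(...sorted.map((i) => i.x));
    const right = Math.max(...sorted.map((i) => i.x + i.width));
    const bottom = Math.min(...sorted.map((i) => i.y));
    const top = Math.max(...sorted.map((i) => i.y + i.height));
    return {
      // Joining with a space is a simplification for this sketch.
      text: sorted.map((i) => i.str).join(" "),
      fonts: [...new Set(sorted.map((i) => i.fontName))],
      // Approximate box: measured from the baseline, ignoring descenders.
      bbox: { x: left, y: bottom, width: right - left, height: top - bottom },
    };
  });
}

// Made-up sample items, given out of reading order on purpose.
const sampleItems = [
  { str: "world", transform: [12, 0, 0, 12, 110, 700], width: 30, height: 12, fontName: "g_d0_f1" },
  { str: "Hello", transform: [12, 0, 0, 12, 72, 700], width: 32, height: 12, fontName: "g_d0_f1" },
  { str: "Second line", transform: [10, 0, 0, 10, 72, 680], width: 55, height: 10, fontName: "g_d0_f2" },
];

console.log(JSON.stringify(groupIntoLines(sampleItems), null, 2));
```

Grouping by baseline with a small tolerance is only one possible approach; rotated text or multi-column layouts would need more careful handling.
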
### TODO:
- specify output format and output directory
- output to xml format
- output to json format
- extract images to files
- extract font
- extract tables
- advanced font information
- extract forms
- extract drawings