pdf-gold-digger
====

PDF information extraction library based on [pdf.js](https://mozilla.github.io/pdf.js/) and [node.js](https://nodejs.org).

### Install

``npm install -g pdf-gold-digger``

### Usage

``pdfdig -i some_file.pdf``

For a list of all options, run ``pdfdig -h``.

### Documentation

Documentation is available at [pdf-gold-digger](https://vane.pl/pdf-gold-digger/).

#### Or try it by cloning the repository

``git clone https://github.com/vane/pdf-gold-digger``

then run ``sh demo.sh`` and check the results in the ``out`` directory.

## Work in progress

### Supports:

- extract text
- separate each page
- separate each line
- separate font information
- bounding box position (may still be buggy)
- extract images
- output to text: ``-f text`` (default)
- output to json: ``-f json``

### TODO:

- specify output directory
- output to xml format
- ~~output to json format~~
- ~~extract images to files~~
- extract font
- extract tables
- advanced font information
- extract forms
- extract drawings
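If you want to call the command line tool from your own Node.js code, a minimal sketch is shown below. It only uses the `-i` and `-f` options from the Usage and Supports sections above. `buildPdfdigArgs` and `runPdfdig` are hypothetical helpers written for this example; they are not part of pdf-gold-digger. The sketch also assumes that the results you care about are printed to stdout. If the tool writes files instead (``demo.sh`` collects its results in ``out``), read them from that location.

```javascript
// Minimal sketch: drive the pdfdig CLI from Node.js.
// buildPdfdigArgs and runPdfdig are hypothetical helpers, not part of pdf-gold-digger.
const { execFileSync } = require("child_process");

// Output formats listed under "Supports" (-f text is the default).
const SUPPORTED_FORMATS = ["text", "json"];

// Build the argument list for pdfdig, validating input before spawning anything.
function buildPdfdigArgs(inputFile, format = "text") {
  if (typeof inputFile !== "string" || inputFile.length === 0) {
    throw new Error("inputFile must be a non-empty path");
  }
  if (!SUPPORTED_FORMATS.includes(format)) {
    throw new Error(
      `unsupported format "${format}", expected one of: ${SUPPORTED_FORMATS.join(", ")}`
    );
  }
  return ["-i", inputFile, "-f", format];
}

// Run the globally installed pdfdig binary and return what it prints to stdout.
// Assumption: results appear on stdout; adjust if the tool writes files instead.
function runPdfdig(inputFile, format = "text") {
  return execFileSync("pdfdig", buildPdfdigArgs(inputFile, format), {
    encoding: "utf8", // return a string instead of a Buffer
  });
}
```

For example, `runPdfdig("some_file.pdf", "json")` runs the same command as ``pdfdig -i some_file.pdf -f json``. It requires the package to be installed globally with ``npm install -g pdf-gold-digger``. The function uses `execFileSync` with an argument array instead of a shell string. This means file names that contain spaces or shell metacharacters are passed through unchanged.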