From b75b5c73c671f4415306bd6d35dc3d54c692a9ba Mon Sep 17 00:00:00 2001
From: Michal Szczepanski
Date: Wed, 24 Jul 2019 21:21:37 +0200
Subject: [PATCH] Update README.md

---
 README.md | 44 +++++++++++++++++++++++---------------------
 1 file changed, 23 insertions(+), 21 deletions(-)

diff --git a/README.md b/README.md
index 420eb6f..310ce95 100644
--- a/README.md
+++ b/README.md
@@ -4,15 +4,18 @@ pdf-gold-digger
 Pdf information extraction library based on [pdf.js](https://mozilla.github.io/pdf.js/) and [node.js](https://nodejs.org).
 
 
-### Install
+## Install
 ```npm install -g pdf-gold-digger```
 
-
-### Usage
-```pdfdig -i some_file.pdf```
-for help use :
-```pdfdig -h```
+## Usage
 ```bash
+pdfdig -i some_file.pdf
+```
+
+## Avaliable commands
+
+```bash
+pdfdig -h
 ex. pdfdig -i input-file -o output_directory -f json
 
 --input or -i pdf file location (required)
@@ -22,36 +25,35 @@ ex. pdfdig -i input-file -o output_directory -f json
 --help or -h display this help message
 ```
 
-
-#### or test by clonning repository
-```git clone https://github.com/vane/pdf-gold-digger```
-then run
-```sh demo.sh```
+## Advanced usage
+```bash
+git clone https://github.com/vane/pdf-gold-digger
+sh demo.sh
+```
 and see results in ```out``` directory
 
-
-### Documentation url
+## Documentation
 [pdf-gold-digger](https://vane.pl/pdf-gold-digger/)
 
-
-## Work in progress
-
-### Supports:
+## Features:
 - extract text
   - separate each page
   - separate each line
   - separate font information
-  - bounding box position (probably buggy now)
 - extract images
-- output to text ```-f text (default)```
-- output to json ```-f json```
+- output formats
+  - text ```-f text (default)```
+  - json ```-f json```
 - specify output directory
 
-### TODO:
+## TODO:
+- extract text
+  - bounding box position
 - load pdf from remote location
   - from url
 - output to xml format
 - output to html format
+- output to markdown format
 - output to zip
 - extract font
 - extract tables