Update README.md

Michal Szczepanski 2019-07-24 21:21:37 +02:00 committed by GitHub
parent 2211abb068
commit b75b5c73c6
GPG Key ID: 4AEE18F83AFDEB23

@@ -4,15 +4,18 @@ pdf-gold-digger
 Pdf information extraction library based on [pdf.js](https://mozilla.github.io/pdf.js/)
 and [node.js](https://nodejs.org).
-### Install
+## Install
 ```npm install -g pdf-gold-digger```
-### Usage
-```pdfdig -i some_file.pdf```
-for help use :
-```pdfdig -h```
+## Usage
 ```bash
+pdfdig -i some_file.pdf
+```
+## Avaliable commands
+```bash
+pdfdig -h
 ex. pdfdig -i input-file -o output_directory -f json
 --input or -i pdf file location (required)
@@ -22,36 +25,35 @@ ex. pdfdig -i input-file -o output_directory -f json
 --help or -h display this help message
 ```
-#### or test by clonning repository
-```git clone https://github.com/vane/pdf-gold-digger```
-then run
-```sh demo.sh```
+## Advanced usage
+```bash
+git clone https://github.com/vane/pdf-gold-digger
+sh demo.sh
+```
 and see results in ```out``` directory
-### Documentation url
+## Documentation
 [pdf-gold-digger](https://vane.pl/pdf-gold-digger/)
-## Work in progress
-### Supports:
+## Features:
 - extract text
 - separate each page
 - separate each line
 - separate font information
-- bounding box position (probably buggy now)
 - extract images
-- output to text ```-f text (default)```
-- output to json ```-f json```
+- output formats
+- text ```-f text (default)```
+- json ```-f json```
 - specify output directory
-### TODO:
+## TODO:
+- extract text
+- bounding box position
 - load pdf from remote location
 - from url
 - output to xml format
 - output to html format
+- output to markdown format
 - output to zip
 - extract font
 - extract tables