Update README.md
parent
2211abb068
commit
b75b5c73c6
44
README.md
@@ -4,15 +4,18 @@ pdf-gold-digger
|
|||||||
Pdf information extraction library based on [pdf.js](https://mozilla.github.io/pdf.js/)
|
Pdf information extraction library based on [pdf.js](https://mozilla.github.io/pdf.js/)
|
||||||
and [node.js](https://nodejs.org).
|
and [node.js](https://nodejs.org).
|
||||||
|
|
||||||
### Install
|
## Install
|
||||||
```npm install -g pdf-gold-digger```
|
```npm install -g pdf-gold-digger```
|
||||||
|
|
||||||
|
## Usage
|
||||||
### Usage
|
|
||||||
```pdfdig -i some_file.pdf```
|
|
||||||
for help use :
|
|
||||||
```pdfdig -h```
|
|
||||||
```bash
|
```bash
|
||||||
|
pdfdig -i some_file.pdf
|
||||||
|
```
|
||||||
|
|
||||||
|
## Available commands
|
||||||
|
|
||||||
|
```bash
|
||||||
|
pdfdig -h
|
||||||
ex. pdfdig -i input-file -o output_directory -f json
|
ex. pdfdig -i input-file -o output_directory -f json
|
||||||
|
|
||||||
--input or -i pdf file location (required)
|
--input or -i pdf file location (required)
|
||||||
@@ -22,36 +25,35 @@ ex. pdfdig -i input-file -o output_directory -f json
|
|||||||
--help or -h display this help message
|
--help or -h display this help message
|
||||||
```
|
```
|
||||||
|
|
||||||
|
## Advanced usage
|
||||||
#### or test by cloning repository
|
```bash
|
||||||
```git clone https://github.com/vane/pdf-gold-digger```
|
git clone https://github.com/vane/pdf-gold-digger
|
||||||
then run
|
sh demo.sh
|
||||||
```sh demo.sh```
|
```
|
||||||
and see results in ```out``` directory
|
and see results in ```out``` directory
|
||||||
|
|
||||||
|
## Documentation
|
||||||
### Documentation url
|
|
||||||
[pdf-gold-digger](https://vane.pl/pdf-gold-digger/)
|
[pdf-gold-digger](https://vane.pl/pdf-gold-digger/)
|
||||||
|
|
||||||
|
## Features:
|
||||||
## Work in progress
|
|
||||||
|
|
||||||
### Supports:
|
|
||||||
- extract text
|
- extract text
|
||||||
- separate each page
|
- separate each page
|
||||||
- separate each line
|
- separate each line
|
||||||
- separate font information
|
- separate font information
|
||||||
- bounding box position (probably buggy now)
|
|
||||||
- extract images
|
- extract images
|
||||||
- output to text ```-f text (default)```
|
- output formats
|
||||||
- output to json ```-f json```
|
- text ```-f text (default)```
|
||||||
|
- json ```-f json```
|
||||||
- specify output directory
|
- specify output directory
|
||||||
|
|
||||||
### TODO:
|
## TODO:
|
||||||
|
- extract text
|
||||||
|
- bounding box position
|
||||||
- load pdf from remote location
|
- load pdf from remote location
|
||||||
- from url
|
- from url
|
||||||
- output to xml format
|
- output to xml format
|
||||||
- output to html format
|
- output to html format
|
||||||
|
- output to markdown format
|
||||||
- output to zip
|
- output to zip
|
||||||
- extract font
|
- extract font
|
||||||
- extract tables
|
- extract tables
|
||||||
|
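The flags documented in this README (`-i`, `-o`, `-f`) can be combined in a small wrapper to process several files in one go. The sketch below is only an illustration: the function name `dig_all`, the `PDFDIG` override variable, and the one-subdirectory-per-file layout are assumptions, not part of pdf-gold-digger. Whether `pdfdig` creates a missing output directory on its own is not stated in the README, so check that before relying on it.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not shipped with pdf-gold-digger):
# run pdfdig on every *.pdf file in a directory.
# Only the README's documented flags are used: -i, -o, -f.
dig_all() {
  local src_dir="$1" out_dir="$2" format="${3:-json}"
  local pdf
  for pdf in "$src_dir"/*.pdf; do
    # Skip the literal pattern when the glob matches nothing.
    [ -e "$pdf" ] || continue
    # Assumed layout: one output subdirectory per input, named after the PDF.
    # PDFDIG may point at another executable or function; defaults to pdfdig.
    "${PDFDIG:-pdfdig}" -i "$pdf" -o "$out_dir/$(basename "$pdf" .pdf)" -f "$format"
  done
}
```

For example, `dig_all ./invoices ./out text` would call `pdfdig -i ./invoices/<name>.pdf -o ./out/<name> -f text` once per file; omitting the third argument makes the wrapper pass `-f json`.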