pdf-gold-digger
====

PDF information extraction library based on [pdf.js](https://mozilla.github.io/pdf.js/)
and [node.js](https://nodejs.org).

## Install
```npm install -g pdf-gold-digger```

## Usage
```bash
pdfdig -i some_file.pdf
```  

## Available commands

```bash
pdfdig -h
ex. pdfdig -i input-file -o output_directory -f json
  
  --input  or  -i   pdf file location (required)
  --output or  -o   output directory (optional - default "out")
  --debug  or  -d   show debug information (optional - default "false")
  --format or  -f   output format, "text" or "json" (optional - default "text")
  --help   or  -h   display this help message
```
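
The options above can be combined to process a whole directory of PDFs. The following is a minimal sketch, not part of pdf-gold-digger: the `dig_all` helper, the `PDFDIG` override variable, and the one-output-directory-per-file layout are hypothetical. It uses only the `-i`, `-o` and `-f` flags listed in the help output, and it creates each output directory up front because it is an assumption, not a confirmed behaviour, that `pdfdig` would create missing directories itself.

```bash
#!/usr/bin/env bash
# Sketch: run pdfdig on every PDF in a directory, one output directory per file.
# dig_all, PDFDIG and the directory layout are illustrative assumptions.

dig_all() {
  local src_dir="$1"            # directory containing *.pdf files
  local out_root="${2:-out}"    # root for the per-file output directories
  local format="${3:-text}"     # "text" or "json", as listed in the help output
  local pdf name

  for pdf in "$src_dir"/*.pdf; do
    [ -e "$pdf" ] || continue   # skip the literal pattern when nothing matches
    name="$(basename "$pdf" .pdf)"
    # Create the target directory first (assumption: pdfdig may not do this).
    mkdir -p "$out_root/$name" || return 1
    # PDFDIG lets you point at a different executable; it defaults to pdfdig.
    "${PDFDIG:-pdfdig}" -i "$pdf" -o "$out_root/$name" -f "$format" || return 1
  done
}

# Usage: dig_all ./pdfs out json
```

Stopping at the first failing file (`|| return 1`) is a deliberate choice for the sketch; replace it with a logged warning if you prefer to continue with the remaining files.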

## Advanced usage
```bash
git clone https://github.com/vane/pdf-gold-digger
# enter the cloned repository before running the demo script
cd pdf-gold-digger
sh demo.sh
```
The results are written to the ```out``` directory.

## Documentation
[pdf-gold-digger](https://vane.pl/pdf-gold-digger/)

## Features:
- extract text
  - separate each page
  - separate each line
  - separate font information
- extract images
- output formats
  - text ```-f text (default)```
  - json ```-f json```
- specify output directory

## TODO:
- extract text
  - bounding box position
- load pdf from remote location
  - from url    
- output to xml format
- output to html format
- output to markdown format
- output to zip
- extract font
- extract tables
- advanced font information
- extract forms
- extract drawings