2e17443c5f
Version 0.1.1
master
0.1.1
Michal Szczepanski
2020-11-20 21:46:38 +0100
dfb6282d6c
Version 0.1.0
Michal Szczepanski
2020-11-20 21:42:20 +0100
34e04ae212
Update README.md with latest changes about password parameter
Michal Szczepanski
2020-11-20 21:41:37 +0100
edb7ddd318
Add ability to provide password as command line argument, closes #16 - catch errors and display formatted output (optional stack trace for debug) - bump pdfjs-dist and minimist to latest version - use 'pdfjs-dist/es5/build/pdf' for node v14.5.0
Michal Szczepanski
2020-11-20 21:39:09 +0100
4f2e28b557
Fix missing pdf font object toUnicode
Michal Szczepanski
2020-11-20 21:36:06 +0100
e0f7cedd76
Update README.md with link to difference between last release and master
Michal Szczepanski
2019-08-17 02:50:41 +0200
f1e6fbfcf7
Fix Extract text bounding box position
Michal Szczepanski
2019-08-17 02:45:47 +0200
3edb27c12a
Version 0.0.8
0.0.8
Michal Szczepanski
2019-08-10 06:24:05 +0200
b1ad478e85
Fix json formatting - ability to omit fonts
Michal Szczepanski
2019-08-10 06:21:45 +0200
45c70c7d7d
Update README.md
Michal Szczepanski
2019-08-05 23:22:14 +0200
0604c419e5
Add more information to FontObject - opentype.js font - loaded pdf font name - prepare for ocr of font data
Michal Szczepanski
2019-08-05 23:11:07 +0200
7df0d1543d
Change formatter argument from 'text' to 'txt'
Michal Szczepanski
2019-08-05 23:09:33 +0200
f3ab15d374
Disable console output for text formatter
Michal Szczepanski
2019-08-05 23:08:40 +0200
1c3c5461d5
Reformat command line utility and add version information
Michal Szczepanski
2019-07-29 00:02:03 +0200
89bc5a9657
Update package.json keywords to match github project
Michal Szczepanski
2019-07-28 22:12:54 +0200
eeef0bd126
Version 0.0.7
0.0.7
Michal Szczepanski
2019-07-28 22:08:45 +0200
2b1cf8c08f
Add html output to demo.sh
Michal Szczepanski
2019-07-28 22:08:22 +0200
753a8afdb5
Add font file information to formatters xml / json and more about font - optimise font extract - set unknown style to normal
Michal Szczepanski
2019-07-28 22:05:54 +0200
0395eb34e2
Remove some information about font
Michal Szczepanski
2019-07-28 21:49:22 +0200
1a6b16b53c
Format font from font cache for xml / json formatter closes #14
Michal Szczepanski
2019-07-28 21:43:53 +0200
6788ac6093
Add opentype.js and extract more font information, closes #5
Michal Szczepanski
2019-07-28 20:57:29 +0200
90e9e04153
Add title from metadata to FormatterHTML output
Michal Szczepanski
2019-07-28 19:41:49 +0200
2bf5a5eb54
dummy
Michal Szczepanski
2019-07-28 19:31:15 +0200
a85a55a6da
Update README.md
Michal Szczepanski
2019-07-28 19:26:29 +0200
09b80bd792
Merge branch 'master' of github.com:vane/pdf-gold-digger
Michal Szczepanski
2019-07-28 19:25:49 +0200
e4f44a5642
Add FormatterHTML for html output, closes #15
Michal Szczepanski
2019-07-28 19:25:37 +0200
1a4751caf1
Update README.md
Michal Szczepanski
2019-07-28 17:45:42 +0200
dedc3cb12e
Update LICENSE
Michal Szczepanski
2019-07-28 17:44:45 +0200
ea6a658d9c
Update README.md
Michal Szczepanski
2019-07-28 17:44:10 +0200
9b0574777f
Update demo.sh with xml output and extract font
Michal Szczepanski
2019-07-28 17:34:16 +0200
a115debd41
Update package.json keywords to match github project
Michal Szczepanski
2019-07-28 17:32:55 +0200
fd9058c099
Update README.md
Michal Szczepanski
2019-07-28 17:28:08 +0200
0519406a25
Add image data information to output xml, json formatters
Michal Szczepanski
2019-07-28 17:27:02 +0200
e8685c78af
Merge branch 'master' of github.com:vane/pdf-gold-digger
Michal Szczepanski
2019-07-28 16:53:26 +0200
ca3071c0a0
dummy
Michal Szczepanski
2019-07-28 16:53:20 +0200
af25b4ad9d
Update README.md
Michal Szczepanski
2019-07-28 16:52:28 +0200
9d4672ccb8
Update README.md
Michal Szczepanski
2019-07-28 16:46:31 +0200
2434562da0
Save font files as ttf option, closes #3
Michal Szczepanski
2019-07-28 16:44:00 +0200
f60eebb02d
Update README.md
Michal Szczepanski
2019-07-28 15:38:50 +0200
4fac64de45
Update README.md
Michal Szczepanski
2019-07-28 15:32:13 +0200
2b0c5350eb
Update README.md
Michal Szczepanski
2019-07-28 15:20:15 +0200
8ae1cee785
Version 0.0.6
0.0.6
Michal Szczepanski
2019-07-28 15:12:38 +0200
a28f9d997f
Add more text information, closes #8
Michal Szczepanski
2019-07-28 15:10:06 +0200
c74c037d2a
Add xml formatter for xml output closes #1
Michal Szczepanski
2019-07-28 14:57:11 +0200
255de1f3c2
Text information changes - remove width, x from textline - sort text lines before output - add text font x,y position to json formatter
Michal Szczepanski
2019-07-28 14:36:31 +0200
618500a269
Merge pull request #13 from vane/feature/bbox-calculate
Michal Szczepanski
2019-07-28 09:43:08 +0200
dea317eda8
Add eslint standard with small modifications - semi always - comma-dangle always-multiple
Michal Szczepanski
2019-07-28 09:40:39 +0200
4fd9b6024c
Fix json formatter / add width calculation
Michal Szczepanski
2019-07-27 10:09:37 +0200
5c83ef37d6
Version 0.0.5
0.0.5
Michal Szczepanski
2019-07-26 00:41:01 +0200
892a8c7bb1
Documentation update / move formatter, visitor to formatters, visitors
Michal Szczepanski
2019-07-26 00:38:51 +0200
d1400175a9
Merge pull request #12 from vane/feature/bbox
Michal Szczepanski
2019-07-26 00:06:04 +0200
14819dd714
Fix JSON formatter
Michal Szczepanski
2019-07-26 00:04:52 +0200
1aae2e7425
Remove unused Geometry classes
Michal Szczepanski
2019-07-26 00:04:40 +0200
c3dbfc0e8b
Heuristic method to determine if difference between two points is space
Michal Szczepanski
2019-07-26 00:02:05 +0200
877c14a6e1
Rewrite text extraction - calculate new line Closes #11
Michal Szczepanski
2019-07-25 23:00:04 +0200
f5441748bf
Add vertical for font object
Michal Szczepanski
2019-07-25 01:19:19 +0200
bac5ac0f48
Comment out newLine logic to move it to Extract
Michal Szczepanski
2019-07-25 01:08:35 +0200
ffc7a38175
More refactoring of moving data between constructors - move more data into PdfPage object
Michal Szczepanski
2019-07-25 00:49:23 +0200
477539c527
VisitorText add more text handling methods placeholder - remove stale debug constructor passing - add PdfPage object - change pdf page to pageData - add some more attributes to TextObject
Michal Szczepanski
2019-07-24 23:15:45 +0200
b75b5c73c6
Update README.md
Michal Szczepanski
2019-07-24 21:21:37 +0200
2211abb068
Rename page pdf to pageData and add PdfPage object
Michal Szczepanski
2019-07-24 20:58:01 +0200
38b46e0b94
Visitor refactoring, simplify code, add Geometry - create visitor on each page - pass dependencies and page in constructor - add Geometry for text position measurements
Michal Szczepanski
2019-07-24 18:35:47 +0200
20d839f447
Fix character spacing use -250 value for now (need to measure glyphs)
Michal Szczepanski
2019-07-23 19:45:33 +0200
ac8e1c9d01
Update README.md move documentation link
Michal Szczepanski
2019-07-23 08:59:59 +0200
4c969b79ae
Add example command to help
Michal Szczepanski
2019-07-23 08:58:34 +0200
9b314bda4a
Update README.md
Michal Szczepanski
2019-07-23 08:56:13 +0200
b58f364fa0
Update README.md
Michal Szczepanski
2019-07-23 08:55:39 +0200
57e594402e
Update README.md with new todo / done
Michal Szczepanski
2019-07-23 06:21:52 +0200
068f56db5c
Version 0.0.4
0.0.4
Michal Szczepanski
2019-07-23 06:18:00 +0200
c2ebe7f526
Add todo to Extract
Michal Szczepanski
2019-07-23 06:12:55 +0200
3810d2fcc0
Fix text extraction based on pdf.js samples
Michal Szczepanski
2019-07-23 06:04:55 +0200
263b318029
Update README.md TODO list
Michal Szczepanski
2019-07-23 05:47:58 +0200
30d673a455
Add missing OPS beginAnnotations, endAnnotations
Michal Szczepanski
2019-07-23 05:45:16 +0200
9c2baab2a6
Fix Unimplemented operator message
Michal Szczepanski
2019-07-23 05:44:02 +0200
d0a2e44cdf
Update README.md with correct package url
Michal Szczepanski
2019-07-23 05:34:59 +0200
10443b009b
Update package.json repository url
0.0.3
Michal Szczepanski
2019-07-23 05:28:39 +0200
44e59dfe8f
Version 0.0.2
0.0.2
Michal Szczepanski
2019-07-23 05:16:41 +0200
c2f597d899
Update README.md with documentation location and package.json keywords
Michal Szczepanski
2019-07-23 05:16:28 +0200
2dee872d40
Change output directory structure / Closes #2
0.0.1
Michal Szczepanski
2019-07-23 05:00:53 +0200
cc2002078b
Extract images with VisitorImage #2 - output to file based on format data.{format} - more documentation
Michal Szczepanski
2019-07-23 04:53:17 +0200
1b6bfbbe13
Documentation FormatterJSON
Michal Szczepanski
2019-07-23 04:35:41 +0200
94c05cf064
Commandline change -i for file input -f for format
Michal Szczepanski
2019-07-23 04:05:48 +0200
63710c2f8b
Add demo.sh and test.sh for automating stuff
Michal Szczepanski
2019-07-23 01:59:55 +0200
23c4586d28
VisitorBase for common visitors constructor
Michal Szczepanski
2019-07-23 01:43:57 +0200
bf4698df59
dummy
Michal Szczepanski
2019-07-23 01:22:54 +0200
ab4e0ffdae
Update README.md
Michal Szczepanski
2019-07-23 01:22:16 +0200
81f4de1c29
Documentation generate scripts
Michal Szczepanski
2019-07-23 01:19:17 +0200
5cff7ee4c0
Fix after move lib to src
Michal Szczepanski
2019-07-23 01:05:11 +0200
5b2c8eded3
Move lib to src
Michal Szczepanski
2019-07-23 01:04:28 +0200
4a01a382cf
Documentation
Michal Szczepanski
2019-07-23 00:58:58 +0200
12536bbc21
Text elements move to separate files
Michal Szczepanski
2019-07-23 00:44:49 +0200
ac54beceba
Documentation for Visitor
Michal Szczepanski
2019-07-23 00:37:13 +0200
4aea804772
dummy GoldDigger comment
Michal Szczepanski
2019-07-23 00:34:20 +0200
263fe310f2
Update README.md with simplified usage information
Michal Szczepanski
2019-07-23 00:31:09 +0200
cefca38fa9
Fix missing pdfdig shebang
Michal Szczepanski
2019-07-23 00:26:04 +0200
b797a2f192
Fix package.json
Michal Szczepanski
2019-07-23 00:22:42 +0200
c141efb7a6
Formatter move each formatter to separate file
Michal Szczepanski
2019-07-23 00:17:48 +0200
b638f75bca
FormatterText return empty output as print is handled elsewhere
Michal Szczepanski
2019-07-23 00:14:17 +0200
a2b9bcb1d7
Fix Visitor dependency paths
Michal Szczepanski
2019-07-23 00:09:16 +0200
e3d25d2e77
Visitor classes move to separate files make universal visit method
Michal Szczepanski
2019-07-23 00:07:23 +0200