Convert paper documents to PDF
Goal: Store scanned paper documents in PDF format, with a small file size (a few KB) and a printable resolution (300 dpi). The method works under Linux, Mac OS X and Windows.

Examples: handwritten or typed-and-printed notes, postcards, administrative documents, etc.

1/ Building a PDF - Scanned documents processing

- Scan at 300 dpi, in a raw format such as TIFF or BMP, without any compression
- Open the file with the free software GIMP

If it is a photo (gradients and complex colors):
- Save in JPG format compressed at 15%
- If the file is still too big (KB), reduce the image size (pixels)
- Then adjust the resolution (dpi) accordingly, in order to keep the same physical size (mm)
- Save again in JPG format compressed at 15%
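
The resolution adjustment in the photo steps is plain arithmetic: when the pixel count shrinks by some factor, the dpi value has to shrink by the same factor so that the printed size in mm stays the same. A minimal Python sketch of that relation (the function names are illustrative and not part of GIMP):

```python
MM_PER_INCH = 25.4

def printed_size_mm(pixels, dpi):
    """Physical length in mm of `pixels` pixels printed at `dpi`."""
    return pixels / dpi * MM_PER_INCH

def adjusted_dpi(old_pixels, old_dpi, new_pixels):
    """Resolution that keeps the printed size unchanged after resizing."""
    return old_dpi * new_pixels / old_pixels

# Example: an A4-wide scan (2480 px at 300 dpi) reduced to half its width.
new_dpi = adjusted_dpi(2480, 300, 1240)
print(new_dpi)  # 150.0
print(printed_size_mm(2480, 300), printed_size_mm(1240, new_dpi))
```

Both printed sizes come out equal, which is the point of the adjustment: only the pixel density changes, not the size on paper.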

If it is a document (simple colors and lines):
- Filter noise and colormap with "Layer/Colors/Levels"
- Filter single pixels with "Filters/Enhance/NL Filter"
- Filter noise and colormap again with "Layer/Colors/Levels"
- Convert to indexed colors (2 to 32) or to black and white with "Image/Mode"
- Save in PNG format at compression level 9 (lossless)

2/ Building a PDF - Exporting the image into PDF format

- Linux command: sam2p -m:dpi:17.27 (72*72/17.27 gives about 300 dpi, thus a page of about 209.9 x 297 mm)
- or Adobe Acrobat Reader or other shareware (Windows)
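
The arithmetic behind the value 17.27 can be checked with a short Python sketch. It assumes that the value passed to -m:dpi: relates to the effective resolution through the expression 72*72/value quoted in the sam2p line above; that relation is taken from this page, not from the sam2p documentation, and the function names are illustrative.

```python
MM_PER_INCH = 25.4

def effective_dpi(sam2p_value):
    """Effective resolution, assuming the 72*72/value relation from this page."""
    return 72 * 72 / sam2p_value

def sam2p_value_for(target_dpi):
    """Inverse of the same assumed relation: value to pass for a target dpi."""
    return 72 * 72 / target_dpi

def page_size_mm(width_px, height_px, dpi):
    """Printed page size in mm for an image of the given pixel dimensions."""
    return (width_px / dpi * MM_PER_INCH, height_px / dpi * MM_PER_INCH)

dpi = effective_dpi(17.27)
width_mm, height_mm = page_size_mm(2480, 3508, dpi)
print(round(dpi, 2), round(width_mm, 1), round(height_mm, 1))
print(sam2p_value_for(300))
```

Under this assumption, 17.27 is a slightly rounded-down form of 72*72/300 = 17.28, which is why the resulting page is a fraction of a millimetre narrower than a true A4 sheet.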

Tests of different PNG/JPG-to-PDF export procedures:
- Ghostscript crashes (a bit tricky to use?)
- GIMP/GS export to PS: JPG compression and A4 format are unavoidable
- Inkscape produces very big PDF files
- OpenOffice Draw: problems with indexed colors
- a2ping compresses in JPG before calling sam2p
- convert (ImageMagick) keeps the image size/resolution but produces bigger files
- sam2p -m:dpi:17.27 produces a 300 dpi PDF at the original file size

3/ Building a PDF - Assembling pages

- Linux command: pdftk file1.pdf file2.pdf cat output file.pdf
- or "pdfjoin" from the package "pdfjam", which depends on the package "pdflatex" (Linux)
- or Adobe Acrobat Reader or other shareware (Windows)

To print several pages on a single sheet:
- Rotate and concatenate 2 landscape pages: pdftk file.pdf cat 1S 2S output file-rotated.pdf
- Merge the 2 pages into a single sheet: pdfnup --landscape --a4paper --suffix 2up file-rotated.pdf
- Rotate the sheet back to portrait page: pdftk file-rotated-2up.pdf cat 1E output file-rotated-2up-derotated.pdf
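
The three steps above chain through intermediate file names, which is easy to get wrong by hand. The following Python sketch only builds the command lines for a given input file, using exactly the options shown above; it does not run anything. The helper name two_up_commands is hypothetical, and the sketch assumes the input file sits in the current directory.

```python
from pathlib import Path

def two_up_commands(pdf_name):
    """Return the three command lines from the steps above as argv lists.

    Nothing is executed here; each list could be passed to subprocess.run().
    """
    stem = Path(pdf_name).stem
    rotated = f"{stem}-rotated.pdf"
    two_up = f"{stem}-rotated-2up.pdf"  # name resulting from --suffix 2up
    final = f"{stem}-rotated-2up-derotated.pdf"
    return [
        ["pdftk", pdf_name, "cat", "1S", "2S", "output", rotated],
        ["pdfnup", "--landscape", "--a4paper", "--suffix", "2up", rotated],
        ["pdftk", two_up, "cat", "1E", "output", final],
    ]

for argv in two_up_commands("file.pdf"):
    print(" ".join(argv))
```

Called with "file.pdf", it prints the same three commands as the list above.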

4/ Unbuilding a PDF - Disassembling pages

- Linux command: pdftk file.pdf cat 2 output file2.pdf
- or Adobe Acrobat Reader or other shareware (Windows)

5/ Unbuilding a PDF - Extracting pictures

- "pdfimages" from package "xpdf-utils" (Linux), or Gimp/GS
- or Acrobat Reader or Photoshop or other shareware (Windows)

6/ Editing a PDF

- If the PDF is locked, the password may be retrieved using the package "pdfcrack" (Linux), or the file may be converted to raw PostScript
- "pdfedit" can be used to edit the PDF file, but editing was too slow in my case
- "flpsed" can be used to edit PS files; it is fast and the result stays editable, but PDF files then have to be converted back and forth to PS format for editing:
pdftops file.pdf     then edit, and:     ps2pdf file.ps

7/ Information related to A4 paper size

- Metric Size: 210 x 297 mm
- Digital Size: 2480 x 3508 pixels at 300 dpi
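
The pixel dimensions follow from the metric size by converting millimetres to inches (25.4 mm per inch) and multiplying by the resolution. A minimal Python sketch of that conversion, with an illustrative function name:

```python
MM_PER_INCH = 25.4

def mm_to_pixels(mm, dpi):
    """Number of pixels covering `mm` millimetres at `dpi`, rounded to nearest."""
    return round(mm / MM_PER_INCH * dpi)

print(mm_to_pixels(210, 300), mm_to_pixels(297, 300))  # 2480 3508
```

The same function gives the size to expect at other scan resolutions, for example mm_to_pixels(210, 150) for a 150 dpi scan.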
Last Update
02/10/2023
💗 2003-2024 by S. MARLIERE. Copying is an act of love. Love is not subject to law. Please copy and share.