Extract text from PDF


#1

The best tool for this is Ghostscript:

gs -sDEVICE=txtwrite -o output.txt file.pdf

That is simple indeed, although it might not handle Unicode characters well.

Apparently there are some PDFs that Ghostscript cannot decode correctly. In that case you should use pdftotext from the Xpdf tools, which is really easy to use. By default it writes the text to a file named after the PDF, here file.txt.

pdftotext file.pdf
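
If you want to combine the two, here is a rough sketch of a wrapper that tries Ghostscript first and falls back to pdftotext when Ghostscript fails or produces an empty file. The function name extract_text and the "empty output means failure" check are my own assumptions, not something either tool provides, so adjust to taste.

```shell
# Hypothetical helper: extract_text INPUT.pdf OUTPUT.txt
# Tries Ghostscript's txtwrite device first, then pdftotext.
extract_text() {
    pdf=$1
    out=$2

    # -q silences the startup banner; -o sets the output file.
    # [ -s FILE ] is true only if the file exists and is not empty.
    if gs -q -sDEVICE=txtwrite -o "$out" "$pdf" 2>/dev/null && [ -s "$out" ]; then
        echo "ghostscript"
    # pdftotext takes the output file as an optional second argument.
    elif pdftotext "$pdf" "$out" 2>/dev/null && [ -s "$out" ]; then
        echo "pdftotext"
    else
        echo "could not extract text from $pdf" >&2
        return 1
    fi
}
```

Call it as extract_text file.pdf output.txt and it prints the name of the tool that produced the output. Note that a non-empty file is only a crude success check: it will not catch text that was extracted but garbled.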

Done :)