OCR (Optical Character Recognition) converts an image of text into machine-readable text. This page lists ways to do OCR on Linux.
- Convert a PDF to XML - Using pdftohtml, a PDF file can be converted to an XML file that includes the location information for the text on each page.
- How to extract one page of a PDF as an image
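As a rough illustration of the two items above, the following Python sketch builds the command lines for pdftohtml and pdftoppm (both from poppler-utils, which is assumed to be installed). The function names, file names, and the 150 DPI default are hypothetical choices made for this example and do not come from the linked pages; pdftoppm is only one possible tool for rendering a page as an image.

```python
import shutil
import subprocess


def pdf_to_xml_command(pdf_path, out_base):
    """Build a pdftohtml command that writes XML instead of HTML.

    Hypothetical helper for illustration. With -xml the output is
    expected to land in out_base + ".xml"; -i skips embedded images.
    """
    return ["pdftohtml", "-xml", "-i", pdf_path, out_base]


def page_to_image_command(pdf_path, page, out_base, dpi=150):
    """Build a pdftoppm command that renders one page as a PNG.

    Hypothetical helper for illustration. -f and -l select the first
    and last page (here the same page); -singlefile writes
    out_base + ".png" without a page-number suffix.
    """
    if page < 1:
        raise ValueError("page numbers start at 1")
    return [
        "pdftoppm",
        "-f", str(page),
        "-l", str(page),
        "-r", str(dpi),
        "-png",
        "-singlefile",
        pdf_path,
        out_base,
    ]


def run(command):
    """Run a command built above, failing early if the tool is missing."""
    if shutil.which(command[0]) is None:
        raise FileNotFoundError(command[0] + " not found; is poppler-utils installed?")
    subprocess.run(command, check=True)
```

For example, `run(page_to_image_command("scan.pdf", 3, "page3"))` would render page 3 of a hypothetical `scan.pdf` to an image that can then be passed to an OCR program, while `run(pdf_to_xml_command("scan.pdf", "scan"))` would produce the XML with text positions.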