How to grab text in PDF files

Question: I have a PDF document I have downloaded and I would like to save the information in another document. How can I do this without losing the file or document format? –J.B.

Answer: There’s a cheap way to do this and a more costly way.

First, a little explanation. PDF (Portable Document Format) is a file format developed by Adobe, and PDF documents are commonly created with Adobe Acrobat, a product that allows a document to be stored digitally without any changes to the layout.

Publishers love this tool because they can store their printed pages in a small computer file, exactly as they appeared on the newsstands.

Some of the text in PDF files is preserved by the program separately from the images, so it can be copied and pasted out of the PDF file into a word processor.
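
For the curious, here is a rough Python sketch of what "text stored separately from the images" can look like inside a PDF page. The sample page content below is made up for illustration, and the extract_text helper is a hypothetical name, not part of any Adobe product. Real PDF files usually compress this data and use font encodings that this toy example does not handle, so treat it as a picture of the idea rather than a working converter.

```python
import re

# A made-up, uncompressed page content stream. Text is drawn with the
# "Tj" operator; the image is drawn separately with the "Do" operator.
SAMPLE_STREAM = rb"""
BT
/F1 12 Tf
72 720 Td
(Hello from a PDF page) Tj
0 -14 Td
(Text lives apart from images \(usually\)) Tj
ET
q 200 0 0 100 72 500 cm /Im1 Do Q
"""

# Matches a literal string in parentheses followed by Tj.
# Assumption: no nested unescaped parentheses inside the string.
TEXT_SHOW = re.compile(rb"\(((?:\\.|[^\\()])*)\)\s*Tj")


def extract_text(stream: bytes) -> list[str]:
    """Return the strings shown by Tj operators in a toy content stream."""
    lines = []
    for match in TEXT_SHOW.finditer(stream):
        raw = match.group(1)
        # Undo only the simplest escapes: \( \) and \\
        unescaped = re.sub(rb"\\([()\\])", rb"\1", raw)
        lines.append(unescaped.decode("latin-1"))
    return lines


for line in extract_text(SAMPLE_STREAM):
    print(line)
```

Because the text sits in the file as text, a program (or your mouse) can pick it out without touching the image. Pages that were scanned in as pictures have no such text to grab.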

Here’s the cheap way of extracting text data from a PDF file:

  1. Open the PDF file with Adobe Acrobat Reader, a free program available for download from Adobe at http://www.adobe.com/products/acrobat/readstep.html. If you’re downloading Acrobat Reader for the first time, or upgrading to a newer version, read the download page carefully before you begin, because you may be agreeing to receive extras you didn’t intend to. Look for boxes that are already checked and make sure you want what they’re associated with; if you don’t, uncheck those boxes before you download Acrobat Reader.
  2. With the document open, select the text tool (it looks like a T on the toolbar; in newer versions, it may look more like a capital I). Drag it over the text in the PDF document to highlight it, then choose Copy from the Edit menu.
  3. Open the program you want to put the text into (for example, a word processor). Go to that program’s Edit menu and select Paste.

If you want to preserve the layout as well as the data, there is a way that does more but costs money. If you plan to use the text in a Microsoft product, you might consider a program called BCL Drake. It is an application that automatically converts PDF documents into RTF documents (RTF is Microsoft’s Rich Text Format). The resulting RTF page structure will match the page structure in the original PDF file.
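
To give a sense of what an RTF document is under the hood, here is a minimal Python sketch that wraps plain text in RTF markup. This is not how BCL Drake works, and the text_to_rtf function and the choice of Times New Roman are assumptions made for the example. It produces only bare paragraphs, with none of the layout a real converter would preserve.

```python
def text_to_rtf(text: str) -> str:
    """Wrap plain text in a minimal RTF document (illustrative only)."""
    out = []
    for ch in text:
        if ch in "\\{}":
            # Backslash and braces are RTF syntax, so they must be escaped.
            out.append("\\" + ch)
        elif ch == "\n":
            # \par ends a paragraph in RTF.
            out.append("\\par\n")
        elif ord(ch) < 128:
            out.append(ch)
        else:
            # Non-ASCII: emit \uN with a signed 16-bit value per UTF-16
            # code unit, followed by "?" as the fallback character.
            data = ch.encode("utf-16-le")
            for i in range(0, len(data), 2):
                unit = int.from_bytes(data[i:i + 2], "little")
                if unit > 32767:
                    unit -= 65536
                out.append(f"\\u{unit}?")
    header = "{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 Times New Roman;}}\\f0 "
    return header + "".join(out) + "}"


if __name__ == "__main__":
    # Write a small file that a word processor should be able to open.
    with open("sample.rtf", "w", encoding="ascii") as handle:
        handle.write(text_to_rtf("First line\nSecond line"))
```

The output is plain ASCII text with formatting commands mixed in, which is why so many word processors can read it.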

There’s one catch: You won’t be able to use BCL Drake with just the Acrobat Reader. You’ll need a fully functioning version of Adobe Acrobat installed on your machine, because the BCL software works with it. Unfortunately, full Adobe Acrobat isn’t cheap.

One more thing to be aware of: The product is designed for Windows 95/98/NT, and won’t work on a Macintosh.

So it’s your call as to how you do it, but now you know you can!