Convert PDF to text - Mac Applications & Software

  1. #1

    Convert PDF to text

    Is there any way to extract the text from a PDF file?

    --
    Dennis M. Marks
    Do not reply with e-mail to yahoo. I do not monitor mailbox. It is for
    collecting spam.
    You can use the following address (rot 13) arg


    -----= Posted via Newsfeeds.Com, Uncensored Usenet News =-----
    http://www.newsfeeds.com - The #1 Newsgroup Service in the World!
    -----== Over 100,000 Newsgroups - 19 Different Servers! =-----
    Dennis Guest

  2. #2

    Re: Convert PDF to text

    On Sat, 11 Oct 2003 17:35:40 -0700,
    Dennis M. Marks (com) wrote: 

    I'm sure there are plenty of ways. Here's one that works. Download
    the code to Xpdf (ftp://ftp.foolabs.com/pub/xpdf/xpdf-2.03.tar.gz)

    Compile and install it. One of the tools that it installs is `pdftotext`.
    This extracts the text out of a PDF file. There are other useful tools
    there as well -- pdftops, pdfimages, pdffonts, and pdfinfo.

    They work fine under OS X.

    Bev
    --
    Bev A. Kupf
    "The lyfe so short, the craft so long to lerne" -- Chaucer
    Sus - Division One Champions 2003!
    Bev Guest
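For anyone who does have Xpdf installed, here is a minimal sketch of driving `pdftotext` from a script. The helper names `build_pdftotext_command` and `pdf_to_text` are made up for this example, and it assumes `pdftotext` is on the PATH. The `-f`, `-l`, and `-layout` options exist in current Xpdf and Poppler builds; on an older install, check `pdftotext -h` first.

```python
import shutil
import subprocess
from typing import List, Optional


def build_pdftotext_command(pdf_path: str, txt_path: str,
                            first_page: Optional[int] = None,
                            last_page: Optional[int] = None,
                            keep_layout: bool = False) -> List[str]:
    """Assemble the argument list for pdftotext (hypothetical helper)."""
    cmd = ["pdftotext"]
    if first_page is not None:
        cmd += ["-f", str(first_page)]   # first page to convert
    if last_page is not None:
        cmd += ["-l", str(last_page)]    # last page to convert
    if keep_layout:
        cmd.append("-layout")            # try to preserve the page layout
    cmd += [pdf_path, txt_path]
    return cmd


def pdf_to_text(pdf_path: str, txt_path: str, **options) -> None:
    """Run pdftotext; raises if the tool is missing or exits with an error."""
    if shutil.which("pdftotext") is None:
        raise FileNotFoundError("pdftotext not found on PATH")
    subprocess.run(build_pdftotext_command(pdf_path, txt_path, **options),
                   check=True)
```

A call such as `pdf_to_text("manual.pdf", "manual.txt", keep_layout=True)` would then write the extracted text next to the PDF (the file names are placeholders).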

  3. #3

    Re: Convert PDF to text

    In article <111020031735405950%com>,
    "Dennis M. Marks" <com> wrote:

    Open it in Adobe Acrobat. Select All. Copy. Switch to your word
    processor and paste into a document.

    If you only want part of the text, click on the Text Select Tool (the
    "T" button next to the magnifying glass tool) and use it to select the
    text you want to copy.

    To copy graphics, use the Graphics Select Tool which is next to the Text
    Select Tool. (It can also copy a graphical image of the text.)
    Wayne Guest

  4. #4

    Re: Convert PDF to text

    In article <net>, Bev A. Kupf
    <net> wrote:
    >
    > I'm sure there are plenty of ways. Here's one that works. Download
    > the code to Xpdf (ftp://ftp.foolabs.com/pub/xpdf/xpdf-2.03.tar.gz)
    >
    > Compile and install it. One of the tools that it installs is `pdftotext`.
    > This extracts the text out of a PDF file. There are other useful tools
    > there as well -- pdftops, pdfimages, pdffonts, and pdfinfo.
    >
    > They work fine under OS X.
    >
    > Bev

    I don't have OS X and I have no way to compile.
    Thank you anyway.

    Dennis Guest

  5. #5

    Re: Convert PDF to text

    In article
    <wp.shawcable.net>, Wayne
    C. Morris <is.invalid> wrote:
    >
    > Open it in Adobe Acrobat. Select All. Copy. Switch to your word
    > processor and paste into a document.
    >
    > If you only want part of the text, click on the Text Select Tool (the
    > "T" button next to the magnifying glass tool) and use it to select the
    > text you want to copy.
    >
    > To copy graphics, use the Graphics Select Tool which is next to the Text
    > Select Tool. (It can also copy a graphical image of the text.)

    Thank you. I tried it without the text tool and it didn't work. Problem
    solved.

    Dennis Guest

  6. #6

    Re: Convert PDF to text

    On Sun, 12 Oct 2003, Dennis M. Marks wrote:
    > >
    > > Open it in Adobe Acrobat. Select All. Copy. Switch to your word
    > > processor and paste into a document.
    > >
    > > If you only want part of the text, click on the Text Select Tool
    > > (the "T" button next to the magnifying glass tool) and use it to
    > > select the text you want to copy.
    > >
    > > To copy graphics, use the Graphics Select Tool which is next to the
    > > Text Select Tool. (It can also copy a graphical image of the text.)
    >
    > Thank you. I tried it without the text tool and it didn't
    > work. Problem solved.

    The plugin I use in my browser (PDF Viewer, at least on OS X) has a
    "save as text" option.

    joe
    Joe Guest

  7. #7

    Re: Convert PDF to text

    In article <121020030822129547%com>,
    "Dennis M. Marks" <com> wrote:

    I only have Acrobat Reader. It has those buttons and they work for
    some PDFs, but I find there are quite a few where it's impossible to
    select text and copy it into a word processing program. Preview doesn't
    give you any options for selecting and saving text. Is there some other
    way to do this?
    I often like to convert PDFs to word processing so I can use
    highlighting marks. It's really a pain not to be able to do this.

    --
    d49ot
    d49ot Guest

  8. #8

    Re: Convert PDF to text

    In article <client.attbi.com>,
    Joe Davison <net> wrote:
     

    Which plugin and which browser? I'm looking for something that will
    do this in Safari.
    I looked up PDF Viewer, but it appears to be a stand-alone app and
    I couldn't see anything about converting pdf files to text in its
    feature list.

    --
    d49ot
    d49ot Guest

  9. #9

    Re: Convert PDF to text

    d49ot <invalid> wrote:
    >
    > I only have Acrobat Reader. It has those buttons and they work for
    > some PDFs, but I find there are quite a few where it's impossible to
    > select text and copy it into a word processing program. Preview doesn't
    > give you any options for selecting and saving text. Is there some other
    > way to do this?
    > I often like to convert PDFs to word processing so I can use
    > highlighting marks. It's really a pain not to be able to do this.

    It all depends on the PDF. In some PDFs you may SEE text, but in
    reality it's just pixels. Text on a screenshot, for example. If there
    isn't any real text in the PDF, you will need an OCR package to convert
    it back into real text again.


    --
    Johan W. Elzenga johan<<at>>johanfoto.nl
    Editor / Photographer http://www.johanfoto.nl/
    Johan Guest

  10. #10

    Re: Convert PDF to text

    On Mon, 13 Oct 2003 23:53:04 -0400, d49ot wrote:

    My guess is your problem is that the PDF files you're trying to get text
    from are images of text documents. If that is indeed the case, the only
    way to get the text out of them is to use an OCR program like OmniPage.

    If it is not an image of text, you should be able to select in Acrobat
    Reader, or convert it with a command-line utility such as ps2ascii.

    GSN
    Greg Guest
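One rough way to tell the two cases apart is to run a text extractor first and see whether anything comes out. The sketch below assumes the input string is `pdftotext` output, in which each page ends with a form feed character. The function name and the 20-character threshold are arbitrary choices for illustration, not part of any tool.

```python
from typing import List


def pages_without_text(extracted: str, min_chars: int = 20) -> List[int]:
    """Return 1-based numbers of pages whose extracted text is nearly empty.

    `extracted` is assumed to be pdftotext output, where each page ends
    with a form feed. Pages listed here are likely scanned images and
    would need OCR instead.
    """
    pages = extracted.split("\f")
    if pages and pages[-1].strip() == "":
        pages.pop()  # drop the empty piece after the final form feed
    return [number for number, text in enumerate(pages, start=1)
            if len(text.strip()) < min_chars]
```

If every page shows up in the result, the PDF probably contains only images of text. If only a few do, those may just be blank pages or figures.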

  11. #11

    Re: Convert PDF to text

    In article <1g2teqo.1w6371a1578ku0N%invalid>,
    invalid (Johan W. Elzenga) wrote:
    >
    > It all depends on the PDF. In some PDFs you may SEE text, but in
    > reality it's just pixels. Text on a screenshot, for example. If there
    > isn't any real text in the PDF, you will need an OCR package to convert
    > it back into real text again.

    Yeah, that's what I've been doing, but I was hoping there was a
    more efficient solution. The problem is that my OCR program requires
    images with a resolution of at least 200 dpi and most PDFs are below
    this. So I have to use an image program to convert them, one page at a
    time, to 200 dpi. The quality is usually not that great, so I also have
    to do a lot of massaging of the text in the OCR program. It's tedious
    and time-consuming so I only do it when absolutely necessary.

    --
    d49ot
    d49ot Guest
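The arithmetic behind the 200 dpi requirement is easy to sketch. PDF page sizes are measured in points, 72 to the inch, so the pixel size of a page at a given resolution follows directly. The function names below are invented for this example.

```python
from typing import Tuple

POINTS_PER_INCH = 72  # PDF page sizes are measured in points


def pixels_at_dpi(width_pt: float, height_pt: float,
                  dpi: int = 200) -> Tuple[int, int]:
    """Pixel dimensions of a page rendered at `dpi`."""
    return (round(width_pt / POINTS_PER_INCH * dpi),
            round(height_pt / POINTS_PER_INCH * dpi))


def upscale_factor(current_dpi: float, target_dpi: float = 200) -> float:
    """How much a page image must be enlarged to reach `target_dpi`."""
    return target_dpi / current_dpi
```

A US Letter page (612 x 792 points) comes out at 1700 x 2200 pixels at 200 dpi, and a 100 dpi scan needs to be doubled in size. Enlarging a low-resolution scan only interpolates pixels, though, so it may satisfy the OCR program's minimum without improving recognition much.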

  12. #12

    Re: Convert PDF to text

    In article <newsguy.com>,
    d49ot <invalid> wrote:
    >
    > Which plugin and which browser? I'm looking for something that will
    > do this in Safari.
    > I looked up PDF Viewer, but it appears to be a stand-alone app and
    > I couldn't see anything about converting pdf files to text in its
    > feature list.

    See <http://www.schubert-it.com/> for the PDF browser plugin. It's free and works wonderfully.

    --
    James Meiss
    <http://amath.colorado.edu/faculty/jdm>
    James Guest

  13. #13

    Re: Convert PDF to text

    In article <colorado.edu>,
    James Meiss <invalid> wrote:
    >
    > see <http://www.schubert-it.com/> for the PDF browser plugin. Free and works
    > wonderfully

    I just installed that in Safari and it does show PDFs in the
    browser window, but I don't see any "save as text" option. When I right-
    click to get the "Save As" command, it only gives me the option to save
    as pdf.

    --
    d49ot
    d49ot Guest

  14. #14

    Re: Convert PDF to text

    I believe the PDF author can intentionally make it copy-protected,
    right? Maybe that is your problem.
    Philo Guest

  15. #15

    Re: Convert PDF to text

    On Mon, 13 Oct 2003, invalid wrote:
    >
    > I only have Acrobat Reader. It has those buttons and they work
    > for some PDFs, but I find there are quite a few where it's
    > impossible to select text and copy it into a word processing
    > program. Preview doesn't give you any options for selecting and
    > saving text. Is there some other way to do this? I often like
    > to convert PDFs to word processing so I can use highlighting
    > marks. It's really a pain not to be able to do this.

    I guess you're right -- looking more closely, I see the "save as text"
    option is actually in Adobe Reader 6.0 (recent upgrade), not in the
    browser plugin.

    joe
    Joe Guest

  16. #16

    Re: Convert PDF to text

    In article <com>,
    "Greg Nyquist" <com> wrote:
    >
    > My guess is your problem is that the PDF files you're trying to get text
    > from are images of text documents. If that is indeed the case, the only
    > way to get the text out of them is to use an OCR program like OmniPage.
    >
    > If it is not an image of text, you should be able to select in Acrobat
    > Reader,

    I'm no authority, but as I understand it, when a PDF is created, it's
    possible to set flags to allow text copying, or forbid it. Some PDFs are
    truly "read only".

    I think that a PDF is kind of "encrypted PostScript". In general,
    looking inside a PDF file reveals no readable text for a converter to
    convert. It takes Acrobat Reader to decrypt it.

    Isaac
    Isaac Guest
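The flags mentioned here can be illustrated with a short sketch. In the PDF specification's standard security handler, permissions are stored as an integer (the `/P` entry of the encryption dictionary) whose bits grant individual rights. Bit 5 covers copying or extracting text and graphics. The function name and the sample values are assumptions for illustration, and reading `/P` out of an actual file is left out.

```python
from typing import Dict

# Bit positions are 1-based in the PDF specification,
# so bit n has the value 1 << (n - 1).
PERMISSION_BITS = {
    "print": 3,
    "modify": 4,
    "copy_text": 5,
    "annotate": 6,
}


def decode_permissions(p_value: int) -> Dict[str, bool]:
    """Map a /P integer to the rights it grants (a set bit means allowed)."""
    return {name: bool(p_value & (1 << (bit - 1)))
            for name, bit in PERMISSION_BITS.items()}
```

With a sample value of -44, printing and copying come out as allowed while modifying and annotating do not. With -3904, all four are denied. These flags are honoured by the viewing program, which is why one viewer can refuse to copy text that is otherwise present in the file.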

  17. #17

    Re: Convert PDF to text

    In article <attbi.com>,
    Isaac Wingfield <com> wrote:
    >
    > I'm no authority, but as I understand it, when a PDF is created, it's
    > possible to set flags to allow text copying, or forbid it. Some PDFs are
    > truly "read only".

    And, unfortunately, I'm finding that more and more of them are like
    that. I wonder if that option is the default because I see a lot of
    documents where it doesn't really seem necessary -- e.g. press releases
    and other documents intended for wide public dissemination.
    I wish there was a way to highlight text within a PDF document,
    which would be a lot less troublesome than trying to extract the text
    into word processing.

    --
    d49ot
    d49ot Guest

  18. #18

    Re: Convert PDF to text

    In article <newsguy.com>, d49ot
    <invalid> wrote:
    > >
    > > I'm no authority, but as I understand it, when a PDF is created, it's
    > > possible to set flags to allow text copying, or forbid it. Some PDFs are
    > > truly "read only".
    >
    > And, unfortunately, I'm finding that more and more of them are like
    > that. I wonder if that option is the default because I see a lot of
    > documents where it doesn't really seem necessary -- e.g. press releases
    > and other documents intended for wide public dissemination.
    > I wish there was a way to highlight text within a PDF document,
    > which would be a lot less troublesome than trying to extract the text
    > into word processing.

    Using GuaPDF (PC or Linux only, but if you have Virtual PC it's easy
    to run) you can strip out the PDF flags. Another version can even
    break the encryption.
    erwan Guest
