Extracting text from pdf

  1. #1

    Extracting text from pdf

    Hi,

    I have to index the text of a pdf document.

    Do any of you know of a PHP script, extension, or binary that can
    extract the text?

    The PDF extension described in the php.net docs seems to be for
    _creation_ of documents only. Is that so? The same goes for all the
    PHP classes I have found.

    Regards,
    Johnny

    --
    Never express yourself more clearly than you are able to think.
    - Niels Bohr
    JustinCase Guest

  3. #2

    Re: Extracting text from pdf

    *** JustinCase wrote (25 Oct 2004 16:09:36 GMT):

    There's a Unix program that might help you: ps2ascii

    --
    -- Álvaro G. Vicario - Burgos, Spain
    -- Thank you for not e-mailing me your questions
    --
    Alvaro Guest
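
The ps2ascii suggestion above could be scripted roughly as follows. This is only a minimal sketch in Python, not something taken from the thread: the function names `build_command` and `extract_text` are hypothetical, and it assumes that `ps2ascii` is on the PATH and writes the extracted text to standard output when given a single input file.

```python
import shutil
import subprocess


def build_command(pdf_path, tool="ps2ascii"):
    """Return the argument list used to run the extractor on one file."""
    return [tool, pdf_path]


def extract_text(pdf_path, tool="ps2ascii"):
    """Run the tool and return its stdout, or None if the tool is not installed."""
    if shutil.which(tool) is None:
        return None
    result = subprocess.run(
        build_command(pdf_path, tool),
        capture_output=True,
        text=True,
        errors="replace",  # tolerate bytes that are not valid in the locale encoding
        check=True,        # raise CalledProcessError if the tool exits with an error
    )
    return result.stdout
```

The returned string can then be handed to whatever indexing step is in use. The same idea should carry over to PHP by running the binary from the script and capturing its output.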

  4. #3

    Re: Extracting text from pdf

    On 25-10-2004 Alvaro G Vicario wrote:

    > There's a Unix program that might help you: ps2ascii

    Thanks for the pointer,
    I'll have a look

    /Johnny

    --
    He's turned his life around. He used to be depressed and miserable. Now
    he's miserable and depressed.
    - David Frost
    JustinCase Guest

  5. #4

    Re: Extracting text from pdf

    On 25-10-2004 Alvaro G Vicario wrote:

    > There's a Unix program that might help you: ps2ascii

    Does anyone know of any other tool for PDF text extraction?


    ps2ascii does not seem to be able to parse the whole PDF file. I tried
    the pstotext tool too, but with the same result.
    I suspect it has something to do with my Ghostscript version being
    too old (7.05; the newest is 8.14).

    Unfortunately, I have no experience installing or upgrading Unix
    software (I spent half an evening trying, in vain and in confusion).



    Regards,
    Johnny

    --
    In the beginning the Universe was created. This has made a lot of
    people very angry and been widely regarded as a bad move.
    - Douglas Adams
    JustinCase Guest
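
One way to check the suspicion about an outdated Ghostscript, and to try the two tools mentioned in this thread one after the other, is sketched below. It is an illustration under assumptions rather than a tested recipe: the helper names are hypothetical, `gs --version` is assumed to print a plain dotted version number, and both `ps2ascii` and `pstotext` are assumed to write their text to standard output.

```python
import shutil
import subprocess


def parse_version(text):
    """Turn a version string such as '7.05' into a comparable tuple (7, 5)."""
    return tuple(int(part) for part in text.strip().split("."))


def ghostscript_version():
    """Return the installed Ghostscript version as a tuple, or None if `gs` is missing."""
    if shutil.which("gs") is None:
        return None
    out = subprocess.run(["gs", "--version"], capture_output=True, text=True).stdout
    return parse_version(out)


def first_working_tool(pdf_path, tools=("ps2ascii", "pstotext")):
    """Try each installed tool in turn and return (tool, text) for the first
    one that exits cleanly with non-empty output, or (None, "") otherwise."""
    for tool in tools:
        if shutil.which(tool) is None:
            continue
        result = subprocess.run(
            [tool, pdf_path], capture_output=True, text=True, errors="replace"
        )
        if result.returncode == 0 and result.stdout.strip():
            return tool, result.stdout
    return None, ""
```

Comparing `ghostscript_version()` against `parse_version("8.14")` shows whether the installed copy is older than the release mentioned in the post. A clean exit with non-empty output does not prove that the whole document was parsed, so the result is still worth checking against the PDF by eye.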

  6. #5

    Re: Extracting text from pdf


     

    Adobe's website will convert a PDF file that is hosted on a website to
    HTML. Try
    http://www.adobe.com/products/acrobat/access_onlinetools.html


    Kurt Guest
