Converting pdf to text


  1. #1

    Converting pdf to text

    Hello all,

    Problem:

    I need to extract text information from a PDF file and write the text
    to a file for a hardware project.
    The text is contained in a table and holds the width and height
    information of different layers for a chip.
    The width and height information will be used to create test layouts
    for different layers using Cadence SKILL.


    OS: HP-UX

    Other tools used: Cadence SKILL



    I wanted to do this initial pdf parsing in Perl because:

    - it comes with the OS
    - there is no point in writing the PDF parsing tool myself (which would
    be an independent project then)
    - someone must have experienced the parsing problem before

    I hope I'm clear so far.


    Searching:

    I tried a module search on search.cpan.org, but as far as I have seen,
    there isn't one that extracts the text information from a PDF file.


    I also tried searching on Google, but there only seems to be pdf2text
    for Linux.



    Solutions:

    - I would appreciate it if someone could point me to a module/script
    that converts PDF to text

    - any other suggestions for tackling the problem are welcome



    Many thanks
    CM
    Chandramohan Neelakantan Guest

  3. #2

    Re: Converting pdf to text

    On 10 Sep 2003, Chandramohan Neelakantan <knchandramohan@yahoo.com> wrote:
    > I also tried searching on Google, but there only seems to be pdf2text
    > for Linux.

    My system calls it pdf2ascii, which is one of the utilities included with
    ghostscript (PostScript and PDF language interpreter and previewer). You
    might see if 'gs' is either on your system or if ghostscript could be
    compiled for HP-UX. See if 'apropos pdf' (or ghostscript) turns up
    anything.

    Whether that would work depends on whether the PDF was created from a
    text-based source. If the text is in an image (scanned, etc.), you would
    need some sort of OCR software to interpret the graphical text.
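
    One rough way to tell the two cases apart is to check how much readable
    text the extractor actually returned. A minimal sketch in Python,
    assuming the extractor's output is already held in a string; the
    function name and the 20-character threshold are arbitrary choices made
    for illustration, not part of any tool mentioned here:

```python
def has_extractable_text(extracted: str, min_chars: int = 20) -> bool:
    """Heuristic: True if the output holds at least `min_chars` letters or digits.

    A scanned (image-only) PDF typically yields little more than whitespace
    and form feeds, so a low count suggests OCR is needed instead.
    """
    readable = sum(1 for ch in extracted if ch.isalnum())
    return readable >= min_chars
```

    A result of False does not prove the PDF is scanned; it only suggests
    that plain text extraction found next to nothing.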

    --
    David Efflandt - All spam ignored http://www.de-srv.com/
    http://www.autox.chicago.il.us/ http://www.berniesfloral.net/
    http://cgi-help.virtualave.net/ http://hammer.prohosting.com/~cgi-wiz/
    David Efflandt Guest

  4. #3

    Re: Converting pdf to text

    Chandramohan Neelakantan <knchandramohan@yahoo.com> wrote:
    > Hello all,
    >
    > Need to extract text information from a PDF file and write the text
    > to a file for a hardware project.

    You could try the command-line utility pdftotext from the xpdf
    distribution. I've had better results with that tool than with
    pdf2ascii (which comes with ghostscript).
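
    A minimal sketch of how pdftotext could be driven from a script and its
    output split into table rows. It is written in Python for brevity; the
    same approach (run the tool, split each line on whitespace) carries over
    to Perl. The row layout "<layer> <width> <height>" and the function
    names are assumptions, since the actual table format is not shown in
    this thread. The -layout option asks pdftotext to preserve the physical
    layout, and "-" as the output name sends the text to stdout.

```python
import shutil
import subprocess


def pdf_to_text(pdf_path: str) -> str:
    """Run pdftotext on pdf_path and return the extracted text.

    Assumes a pdftotext binary (xpdf) is installed and on PATH.
    """
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found on PATH")
    result = subprocess.run(
        ["pdftotext", "-layout", pdf_path, "-"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def parse_layer_table(text: str) -> dict:
    """Collect rows shaped like '<layer> <width> <height>' (assumed layout).

    Returns {layer_name: (width, height)}; lines that do not have exactly
    three fields, or whose last two fields are not numbers, are skipped.
    """
    layers = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue
        name, width, height = fields
        try:
            layers[name] = (float(width), float(height))
        except ValueError:
            continue  # header row or other non-numeric line
    return layers
```

    The two pieces would be chained as parse_layer_table(pdf_to_text(path)),
    with the resulting values then written out in whatever form the SKILL
    script expects.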

    Just my two cents,
    --
    Vlad
    Vlad Tepes Guest

  5. #4

    Re: Converting pdf to text

    Many thanks for the tips.


    -CM



    Chandramohan Neelakantan Guest
