
Extract Character By Character in PDF - Adobe Acrobat SDK


  1. #1

    Extract Character By Character in PDF

    Hi,

    I have a requirement to read the content of a PDF file. I achieved it using the Adobe SDK and VB.NET, using GetJsObject, getPageNthWord and getPageNthWordQuads to read the content.

    The issue I now face is that the "getPageNthWord" method doesn't return the text properly, so I looked for an option to read character by character.

    Is there any way I can read a PDF file character by character and find each character's font style, font size, quads, etc.?

    If so, can anyone give me an idea of how to proceed? It would be great if I could do it without resorting to a plug-in.

    Many thanks in advance.

    -- Krish
    s.r.krish@adobeforums.com Guest

  2. #2

    Re: Extract Character By Character in PDF

    > If so, can any one give me an idea as how to proceed on that. It would
    > be great if i can do it without going in for a Plug-In.

    With a plug-in you can receive the font information.
    Bernd Alheit Guest

  3. #3

    Re: Extract Character By Character in PDF

    Hi Bernd,

    Thanks for your reply. I tried creating a basic plug-in using C++, but I was not successful, as I am not that familiar with C++. So I moved to VB.NET, in which I am very comfortable.

    If a plug-in is the only way, can you please guide me on how to get started and proceed?
    s.r.krish@adobeforums.com Guest

  4. #4

    Re: Extract Character By Character in PDF

    The plugin APIs are the only ones that expose that level of detail (e.g. font and size).

    You could, of course, build a basic plugin that does just the minimum necessary to get the info you need, and then sends that info over to a VB app where you process it.
    Leonard_Rosenthol@adobeforums.com Guest

  5. #5

    Re: Extract Character By Character in PDF

    Hi Leonard,

    Thanks for your reply. From your replies and from many forums, it seems that the only way is a plug-in.

    But now a new question arises from your guidance: how can I send the info from VC++ to VB.NET? Do you mean that I should
    1) create a plug-in, write the necessary information to a text or other file, and then process the file using VB.NET,
    or
    2) call the VC++ app from VB.NET and get the information that is required? If so, how can we reference the VC++ binary in VB.NET? I tried this first, but when I try to add the binary to the VB.NET application it throws an error saying that it is not a valid binary.

    Please advise.
    s.r.krish@adobeforums.com Guest

  6. #6

    Re: Extract Character By Character in PDF

    You could do either 1 or 2. For 2, look at standard IAC/IPC mechanisms such as COM...
    Leonard_Rosenthol@adobeforums.com Guest
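
Option 1 from the posts above (the plug-in writes a file, the VB.NET app reads it) could be sketched roughly as follows. This is only an illustration in standard C++, not code from the Acrobat SDK: the `CharRecord` struct, its field names, and the tab-separated layout are all assumptions made for the example, and the plug-in code that would fill the records is not shown.

```cpp
#include <iomanip>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record for one extracted character. The field names are
// illustrative and do not come from the Acrobat SDK.
struct CharRecord {
    int page;
    std::string text;      // the character, e.g. UTF-8 encoded
    std::string fontName;
    double fontSize;
    double quad[8];        // x1 y1 x2 y2 x3 y3 x4 y4 in page units
};

// Replace tabs and line breaks so that each record stays on one line.
std::string sanitizeField(const std::string& s) {
    std::string out;
    for (char c : s) {
        out += (c == '\t' || c == '\n' || c == '\r') ? ' ' : c;
    }
    return out;
}

// Write one tab-separated line per character to any output stream
// (a std::ofstream in the plug-in, a std::ostringstream in a test).
void writeRecords(std::ostream& os, const std::vector<CharRecord>& recs) {
    os << std::fixed << std::setprecision(2);
    for (const CharRecord& r : recs) {
        os << r.page << '\t' << sanitizeField(r.text) << '\t'
           << sanitizeField(r.fontName) << '\t' << r.fontSize;
        for (double q : r.quad) {
            os << '\t' << q;
        }
        os << '\n';
    }
}
```

On the VB.NET side, each line can then be split on the tab character to recover the page number, character, font name, font size and the eight quad coordinates. A plain text file keeps the two programs independent of each other, at the cost of a disk round trip.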

  7. #7

    Re: Extract Character By Character in PDF

    Hi Leonard,

    I tried opening a PDF using VC++, with the sample given in "plugin_apps_developer_guide.pdf", but it gives me the following errors.

    Error :
    1) AVDoc (ASPathName,ASFileSys,char *)' : cannot convert parameter 3 from 'ASText' to 'char *.
    2) AVDoc (ASPathName,ASFileSys,char *)' : cannot convert parameter 1 from 'ASFileSys' to 'ASPathName'.

    I have used the same code given in the guide. I don't know why this is happening.

    Can you please help me out with this?

    Thanks in advance.

    -- Krish
    s.r.krish@adobeforums.com Guest

  8. #8

    Re: Extract Character By Character in PDF

    This is the code I use...

    const char* myPath = "C:\\PurchaseOrder.pdf";
    ASAtom pathType = ASAtomFromString("Cstring");
    //Create an ASText object
    ASText titleText = ASTextNew();
    ASTextSetPDText(titleText, "This PDF was opened by using the Acrobat SDK");
    //Create an ASPathName object
    ASFileSys fileSys = ASGetDefaultFileSysForPath(pathType, myPath);
    ASPathName pathName = ASFileSysCreatePathName(fileSys, pathType, myPath, NULL);
    //Open the PDF file
    AVDoc myDoc = AVDocOpenFromFile(pathName, fileSys, titleText);
    //Do some clean up
    ASFileSysReleasePath(fileSys, pathName);
    ASTextDestroy(titleText);
    s.r.krish@adobeforums.com Guest

  9. #9

    Re: Extract Character By Character in PDF

    What version of the SDK are you using? It almost seems like you are mixing older headers with newer code...
    Leonard_Rosenthol@adobeforums.com Guest

  10. #10

    Re: Extract Character By Character in PDF

    I am using the latest SDK, "sdk9_v1_win".
    s.r.krish@adobeforums.com Guest

  11. #11

    Re: Extract Character By Character in PDF

    Hi,

    I created a plug-in that adds a menu item under the "Window" menu, which iterates through every word present in the PDF document.

    I developed this .api file using VC++. It compiles fine without any errors. I copied the file into the Acrobat plug-ins folder, and on opening Acrobat Professional it throws the following error:

    "There was an error while loading the plug-in 'DisplayWords.api'. The Plug-in is incompatible with this version of Acrobat."

    I am using Acrobat Professional 7 on my machine.

    Can anyone help me out with this?

    -- Krish
    s.r.krish@adobeforums.com Guest

  12. #12

    Re: Extract Character By Character in PDF

    You can't mix and match versions of the SDK with versions of Acrobat. If you are using version 9 of the SDK, you have to use version 9 of Acrobat. If you only have Acrobat 7, you need version 7 of the SDK (which is no longer available, since we no longer support development around Acrobat 7).
    Leonard_Rosenthol@adobeforums.com Guest

  13. #13

    Re: Extract Character By Character in PDF

    Hi Leonard,

    Thanks for all your replies, which helped me create a basic plug-in that gets all the words from a PDF file.

    But now I am stuck on getting the font information, such as the size, width and height of the found word/character.

    Can you please provide some samples or guidance so that I can try it on my own?

    Any link or reference file for this will do.

    Thanks

    -- Krish
    s.r.krish@adobeforums.com Guest

  14. #14

    Re: Extract Character By Character in PDF

    For getting styling, look at the PDEdit APIs.
    Leonard_Rosenthol@adobeforums.com Guest
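
One point that may help when reading font sizes out of page content: in PDF, the size at which text is drawn is the nominal font size scaled by the text matrix, and some producers set the nominal size to 1 and put the whole scale in the matrix. The sketch below shows only that arithmetic in standard C++. It assumes the nominal size and the six matrix entries have already been obtained inside the plug-in and converted to doubles; the `TextMatrix` struct and function names are hypothetical, and the page's own transformation matrix is ignored.

```cpp
#include <cmath>

// Hypothetical plain-double mirror of a PDF text matrix [a b c d h v].
// It maps (x, y) to (a*x + c*y + h, b*x + d*y + v).
struct TextMatrix {
    double a, b, c, d, h, v;
};

// Vertical scale of the matrix: the length of the transformed unit
// y-vector (0, 1), which maps to the direction (c, d).
double verticalScale(const TextMatrix& m) {
    return std::hypot(m.c, m.d);
}

// Size in text space = nominal font size times the vertical scale.
double effectiveFontSize(double nominalSize, const TextMatrix& m) {
    return nominalSize * verticalScale(m);
}
```

For example, a nominal size of 1 with the matrix [12 0 0 12 72 700] gives an effective size of 12, the same result as a nominal size of 12 with an unscaled matrix. Using the vector length rather than `d` alone keeps the result correct for rotated text.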

  15. #15

    Re: Extract Character By Character in PDF

    Hi Krish,

    I have the same requirement. If you were able to extract the font style, font size, etc. of each word, can you please guide me on how to do it?

    Thanks,
    Kiranmai
    kiranmai, Junior Member (joined Mar 2013, Hyderabad)
