Ask a Question related to Adobe Acrobat SDK, Design and Development.
-
s.r.krish@adobeforums.com #1
Extract Character By Character in PDF
Hi
I have a requirement to read the content of a PDF file. I achieved this using the Adobe SDK and VB.NET, with GetJSObject, getPageNthWord and getPageNthWordQuads.
But the issue I now face is that the "getPageNthWord" method doesn't return the text properly, so I looked for a way to read the file character by character.
Is there any way to read a PDF file character by character and find each character's font style, font size, quads, etc.?
If so, can anyone give me an idea of how to proceed? It would be great if I could do it without resorting to a plug-in.
Many thanks in advance.
-- Krish
s.r.krish@adobeforums.com Guest
-
Bernd Alheit #2
Re: Extract Character By Character in PDF
If so, can any one give me an idea as how to proceed on that. It would
be great if i can do it without going in for a Plug-In.
With a plug-in you can receive the font information.
Bernd Alheit Guest
-
s.r.krish@adobeforums.com #3
Re: Extract Character By Character in PDF
Hi Bernd
Thanks for your reply. I tried creating a basic plug-in in C++, but I was not successful, as I am not that familiar with C++. So I switched to VB.NET, which I am very comfortable with.
If a plug-in is the only way, can you please guide me on how to get started and proceed?
s.r.krish@adobeforums.com Guest
-
Leonard_Rosenthol@adobeforums.com #4
Re: Extract Character By Character in PDF
The plugin APIs are the only ones that expose that level of detail (e.g. font and size)...
You could, of course, build a basic plugin that just does the minimal necessary to get the info you need - and then sends that info over to a VB app where you process it.
Leonard_Rosenthol@adobeforums.com Guest
-
s.r.krish@adobeforums.com #5
Re: Extract Character By Character in PDF
Hi Leonard,
Thanks for your reply. From your replies and from many forums, it seems that the only way is a plug-in.
But a new question arises from your guidance: how can I send the info from VC++ to VB.NET? Do you mean that I should
1) create a plug-in, write the necessary information to a text file or some other file, and then process that file using VB.NET
or
2) call the VC++ app from VB.NET and get the information that is required? If so, how can I reference the VC++ binary in VB.NET? I tried this first, but when I try to add the binary to the VB.NET application, it throws an error saying that it is not a valid binary.
Please advise.
s.r.krish@adobeforums.com Guest
-
Leonard_Rosenthol@adobeforums.com #6
Re: Extract Character By Character in PDF
You could do either 1 or 2. For 2, look at standard IAC/IPC mechanisms such as COM...
Leonard_Rosenthol@adobeforums.com Guest
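Option 1 above (the plug-in writes the information to a file and another application reads it) could look roughly like the sketch below. It is only an illustration and uses no Acrobat SDK calls: the CharRecord struct, its field names, and the tab-separated layout are all assumptions made for this example, and the plug-in would have to fill the records from whatever the SDK returns. The reader is written in C++ only to keep the sketch in one language; a VB.NET application could parse the same lines by splitting on tabs.

```cpp
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record for one character; the fields are illustrative,
// not taken from the Acrobat SDK.
struct CharRecord {
    std::string text;      // the character itself (assumed to contain no tab or newline)
    std::string fontName;  // font name (assumed to contain no tab or newline)
    double fontSize;       // font size in points
    double quad[8];        // four corner points: x1 y1 x2 y2 x3 y3 x4 y4
};

// Writer side (what the plug-in would do): one record per line,
// fields separated by tabs.
void writeRecords(std::ostream& out, const std::vector<CharRecord>& recs) {
    for (const CharRecord& r : recs) {
        out << r.text << '\t' << r.fontName << '\t' << r.fontSize;
        for (double v : r.quad) out << '\t' << v;
        out << '\n';
    }
}

// Reader side (what the consuming application would do): parse each line
// back into a record, skipping lines that are incomplete.
std::vector<CharRecord> readRecords(std::istream& in) {
    std::vector<CharRecord> recs;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        CharRecord r;
        if (!std::getline(fields, r.text, '\t')) continue;
        if (!std::getline(fields, r.fontName, '\t')) continue;
        if (!(fields >> r.fontSize)) continue;
        bool ok = true;
        for (double& v : r.quad) {
            if (!(fields >> v)) { ok = false; break; }
        }
        if (ok) recs.push_back(r);
    }
    return recs;
}
```

In practice the plug-in would pass a std::ofstream opened on an agreed path to writeRecords. Tab and newline characters in the text field would need escaping, which this sketch does not handle.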
-
s.r.krish@adobeforums.com #7
Re: Extract Character By Character in PDF
Hi Leonard
I tried opening a PDF using VC++, with the sample given in "plugin_apps_developer_guide.pdf", but it gives me the following errors.
Errors:
1) AVDoc (ASPathName,ASFileSys,char *)' : cannot convert parameter 3 from 'ASText' to 'char *.
2) AVDoc (ASPathName,ASFileSys,char *)' : cannot convert parameter 1 from 'ASFileSys' to 'ASPathName'.
I have used the same code given in the guide, and I don't know why this is happening.
Can you please help me out with this?
Thanks in advance.
-- Krish
s.r.krish@adobeforums.com Guest
-
s.r.krish@adobeforums.com #8
Re: Extract Character By Character in PDF
This is the code I use...
const char* myPath = "C:\\PurchaseOrder.pdf";
ASAtom pathType = ASAtomFromString("Cstring");
//Create an ASText object
ASText titleText = ASTextNew();
ASTextSetPDText(titleText, "This PDF was opened by using the Acrobat SDK");
//Create an ASPathName object
ASFileSys fileSys = ASGetDefaultFileSysForPath(pathType, myPath);
ASPathName pathName = ASFileSysCreatePathName(fileSys, pathType, myPath, NULL);
//Open the PDF file
AVDoc myDoc = AVDocOpenFromFile(pathName, fileSys, titleText);
//Do some clean up
ASFileSysReleasePath(fileSys, pathName);
ASTextDestroy(titleText);
s.r.krish@adobeforums.com Guest
-
Leonard_Rosenthol@adobeforums.com #9
Re: Extract Character By Character in PDF
What version of the SDK are you using? It almost seems like you are mixing older headers with newer code...
Leonard_Rosenthol@adobeforums.com Guest
-
s.r.krish@adobeforums.com #10
Re: Extract Character By Character in PDF
I am using the latest SDK, "sdk9_v1_win".
s.r.krish@adobeforums.com Guest
-
s.r.krish@adobeforums.com #11
Re: Extract Character By Character in PDF
Hi
I created a plug-in that adds a menu item under the "Window" menu; the menu item iterates through every word in the PDF document.
I developed this .api file using VC++. It compiles fine without any errors. I copied the file into the Acrobat plug-ins folder, and when I open Acrobat Professional it throws the following error.
"There was an error while loading the plug-in 'DisplayWords.api'. The Plug-in is incompatible with this version of Acrobat."
I am using Acrobat Professional 7 on my machine.
Can anyone help me out with this?
-- Krish
s.r.krish@adobeforums.com Guest
-
Leonard_Rosenthol@adobeforums.com #12
Re: Extract Character By Character in PDF
You can't mix and match versions of the SDK with versions of Acrobat. If you are using version 9 of the SDK, you have to use version 9 of Acrobat. If you only have Acrobat 7, you need version 7 of the SDK (which is no longer available, since we no longer support development around Acrobat 7).
Leonard_Rosenthol@adobeforums.com Guest
-
s.r.krish@adobeforums.com #13
Re: Extract Character By Character in PDF
Hi Leonard,
Thanks for all your replies, which helped me create a basic plug-in that gets all the words from a PDF file.
But now I am stuck on getting the font information, such as the size, width and height of the found word/character.
Can you please provide some samples or guidance so that I can try it on my own?
Any link or reference file for this will do.
Thanks
-- Krish
s.r.krish@adobeforums.com Guest
-
Leonard_Rosenthol@adobeforums.com #14
Re: Extract Character By Character in PDF
For getting styling, look at the PDEdit APIs.
Leonard_Rosenthol@adobeforums.com Guest




