Ask a Question related to Adobe Acrobat SDK, Design and Development.
-
Zamdrist #1
Identifying image-only vs. image+text PDF files
I have a large collection of PDFs in a large directory/subdirectory structure; we are talking 366K PDFs.
I need to find some way of distinguishing the image-only files from all the others.
I've tried various things, and I have both Windows and Linux at my disposal. I was trying to parse the PDF source and search for "Font" or "Annot", which I believe would identify anything other than image-only PDFs. Barring that, I'm hoping someone knows of a better way, or of a tool that already exists.
Any help would be greatly appreciated. Thanks.
Zamdrist Guest
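The raw-source search described in post #1 could be sketched in Python roughly as follows. This is only a heuristic under stated assumptions: the function name `looks_image_only` and the marker list are illustrative and do not come from any tool mentioned in this thread, and the check assumes the `/Font` and `/Annot` name tokens appear uncompressed in the file. PDFs that keep their dictionaries in compressed object streams (PDF 1.5 and later) can hide those tokens, so a file that does contain text may still be reported as image-only.

```python
# Heuristic sketch: treat a PDF as "image only" when its raw bytes contain
# neither a /Font nor an /Annot name token.
# Assumption: the tokens are stored uncompressed, which is not guaranteed.
MARKERS = (b"/Font", b"/Annot")
OVERLAP = max(len(m) for m in MARKERS) - 1  # bytes carried between chunks


def looks_image_only(pdf_path, chunk_size=1 << 20):
    """Return True if no marker token is found anywhere in the file."""
    tail = b""
    with open(pdf_path, "rb") as fh:
        while chunk := fh.read(chunk_size):
            # Prepend the end of the previous chunk so a token that
            # straddles a chunk boundary is still seen whole.
            window = tail + chunk
            if any(marker in window for marker in MARKERS):
                return False
            tail = window[-OVERLAP:]
    return True
```

Reading in chunks keeps memory use flat even for very large scans. Because of the compression caveat, a result of True is best treated as "probably image-only" and spot-checked on a sample.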
-
Zamdrist #2
Re: Identifying image-only vs. image+text PDF files
Does anyone have any thoughts on this? The need is urgent, and I would greatly appreciate any feedback. Thank you.
Zamdrist Guest
-
Leonard_Rosenthol@adobeforums.com #3
Re: Identifying image-only vs. image+text PDF files
There are a number of 3rd party solutions that offer this functionality for both Windows and Linux. A web search should turn them up.
If you want to use Acrobat, you would need to write a custom plugin OR you could try to use either IAC or JavaScript to do text extraction (though that's not 100% reliable since you could have a map w/o text).
Leonard
Leonard_Rosenthol@adobeforums.com Guest
-
Zamdrist #4
Re: Identifying image-only vs. image+text PDF files
Thank you, Leonard, for your reply. Despite my reasonably good Googling skills, I've not been able to find anything in the way of a plug-in or third-party tool.
Ironically, I keep finding my own posted question, for which no one seems to have an answer.
Keep in mind I need a tool which will search and return ALL PDFs in a large directory/sub-directory structure and their path names.
Not looking for a tool that will just let me know if the open PDF is image only or image+text, which is quite easy to ascertain.
Thank you.
Zamdrist Guest
-
Leonard_Rosenthol@adobeforums.com #5
Re: Identifying image-only vs. image+text PDF files
I don't know of a company with an "off the shelf" directory walker, but a variety of companies have the necessary component to do the checking on a single PDF (which you could then connect up to the folder walker).
Check with companies such as Apago (http://www.apago.com), Traction Software (http://www.traction-software.co.uk/) and Glyph and Cog (http://www.glyphandcog.com).
Leonard
Leonard_Rosenthol@adobeforums.com Guest
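The "connect a single-PDF check up to a folder walker" idea from post #5 might look something like this minimal Python sketch. The names `find_matching_pdfs` and `is_match` are hypothetical: `is_match` stands in for whatever per-file check is chosen (a vendor component wrapped in a function, or a home-grown test), and nothing here reflects the actual interface of any product named above.

```python
import os
import sys


def find_matching_pdfs(root, is_match):
    """Yield the full path of every .pdf under root for which is_match(path) is true.

    is_match is a caller-supplied callable (hypothetical placeholder for
    the real single-file check); it receives one path and returns a bool.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith(".pdf"):
                continue
            path = os.path.join(dirpath, name)
            try:
                if is_match(path):
                    yield path
            except OSError as exc:
                # An unreadable file should not stop a long run; note it and move on.
                print(f"skipped {path}: {exc}", file=sys.stderr)
```

Writing it as a generator means paths can be printed or logged as they are found, which matters when the tree holds hundreds of thousands of files. A call such as `for p in find_matching_pdfs(top_dir, my_check): print(p)` would list the full path names, where `top_dir` and `my_check` are supplied by the caller.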
-
Carsten Hammer #6
Re: Identifying image-only vs. image+text PDF files
You could have a look at the Adobe IFilter plugin for Windows. It allows Windows to index PDFs. The Windows Platform SDK contains example binaries showing how to use this Windows API. It is very easy to examine pages of PDF files using the samples in the SDK. I guess a PDF page containing only images gives back no text.
Something like
filtdump testfile.pdf
should extract text once you have installed the IFilter plugin. filtdump is part of the Platform SDK.
http://www.adobe.com/support/downloads/detail.jsp?ftpID=2611
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/indexsrv/html/ixrefint_9sfm.asp
Zamdrist wrote:
> Thank you Leonard for your reply. Despite my reasonably good Googling skills, I've not been able to find anything in the sorts of a plug-in or third party tool.
>
> Ironically I keep finding my own posted question, for which no one seems to have an answer for.
>
> Keep in mind I need a tool which will search and return ALL PDFs in a large directory/sub-directory structure and their path names.
>
> Not looking for a tool that will just let me know if the open PDF is image only or image+text, which is quite easy to ascertain.
>
> Thank you.
Carsten Hammer Guest