I have been attempting to use the pdf2xml program without any success.
I am running SWISH-E on a Linux 7.3 OS and copied a program (the first
item immediately below) from
[url]www.linuxjournal.com/articles.php?sid=6652[/url], which uses
pdf2xml to index PDF files. SWISH-E works properly from both a browser
and the command line when indexing regular text files.

(howto-pdf-prog.pl file)
#!/usr/bin/perl -w
use pdf2xml;
my @files =
system (`find /var/www/html/ccsp/docs/ -name *.pdf -print`);
for (@files) {
chomp();
my $xml_record_ref = pdf2xml($_);
print $$xml_record_ref;
}

The following is the configuration file I am using:

(howto-pdf-conf file)
IndexDir ./howto-pdf-prog.pl
# prog file to hand us XML docs
IndexFile ./howto-pdf.index
# Index to create
UseStemming yes
MetaNames swishtitle swishdocpath+

When executed, the following is the result:

[root@DPA2 ccsp]# swish-e -c howto-pdf.conf -S prog
Indexing Data Source: "External-Program"
Indexing "./howto-pdf-prog.pl"
Error: Couldn't open file `65280'
../howto-pdf-prog.pl: Failed close on pipe to pdfinfo for 65280: 256 at pdf2xml.pm line 129.
Removing very common words...
no words removed.
Writing main index...
err: No unique words indexed!

My tech support and I cannot figure out what "..file 65280.." is.
There is no file with that name anywhere on the server, and it is not
one of the PDF files in our test directory (../ccsp/docs/). We are at
a loss as to what to do next.
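
One possibility we have not been able to rule out is that 65280 is not
a filename at all but a process wait status, since 65280 = 255 x 256
and the 256 later in the same output has the same shape. Below is a minimal
shell sketch for decoding such values. It assumes the usual Unix layout
(exit code in the high byte, signal number in the low seven bits), and
decode_status is a hypothetical helper name, not part of SWISH-E or
pdf2xml:

```shell
# Assumption: the numbers in the error output are wait statuses,
# not file names. decode_status is a hypothetical helper.
decode_status() {
    status=$1
    # High byte holds the exit code, low seven bits the signal number.
    echo "status $status: exit code $(( status >> 8 )), signal $(( status & 127 ))"
}

decode_status 65280   # the "file" name from the error message
decode_status 256     # the value reported for the pdfinfo pipe
```

If that reading is right, the script is handing a status number to
pdf2xml instead of a list of PDF paths.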

Thank you for any assistance you can provide.