Thank you, Brian, for that Perl code. I have been using wvware to do my
conversions but have run across some files that cause it to hang. The
wvware pages on SourceForge are not helpful for figuring out how to use
it, and the help forum there has 90% of its queries unanswered, so I'm on
the verge of giving up on wvware and wv2.



However, your Perl code yields results every bit as good as wvware does.
I just had to make two changes.



1. I got rid of $doc =~ s/\s+//sg;

You say it was for HTML formatting. I can't see the point of it, as it
removes all the whitespace, and in 99% of .docs that just joins up
all the words!



2. I changed $doc =~ s/\x0D//g;

to $doc =~ s/\x0D/\x0A/g;
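
For anyone following along, here is a minimal sketch of the two changes above applied to a string. It assumes the text extracted from the .doc is already held in a scalar; the helper name normalize_doc_text and the sample string are made up for illustration and are not from Brian's script.

```perl
use strict;
use warnings;

# Hypothetical helper: the name and signature are illustrative only and
# are not taken from the original script.
sub normalize_doc_text {
    my ($doc) = @_;

    # Change 1: the line  $doc =~ s/\s+//sg;  is simply left out, because
    # it deletes every run of whitespace and joins the words together.

    # Change 2: Word marks the end of a paragraph with a carriage return
    # (0x0D); turn each one into a line feed (0x0A) instead of deleting it.
    $doc =~ s/\x0D/\x0A/g;

    return $doc;
}

# Made-up sample input standing in for text pulled out of a .doc file.
my $sample = "First paragraph.\x0DSecond paragraph.\x0D";
print normalize_doc_text($sample);
```

Note the backslash in \x0A: without it, each carriage return would be replaced by the three literal characters x0A rather than by a line feed.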



As a result I get beautifully clear text from .docs! It is
sufficiently formatted to make it easy to understand the text,
visually or by machine.



Thanks again, John


--
Posted via http://dbforums.com