
Need a full-text indexing application for PDF, DOC, TXT and HTM files - Adobe Acrobat Windows


  1. #1

    Need a full-text indexing application for PDF, DOC, TXT and HTM files

    I hope someone can help in pointing me in the right direction.

    I have a large collection of PDF, DOC, TXT and HTM documents (articles
    and papers for research into my MA) for which I'd like to create and
    update full-text indexes with search capability.

    This is a fairly low-tech PC (home PC only) running Windows Me and I'm
    not keen to splash out tons of money on the solution - say maximum of
    $100.

    Has anyone got any suggested packages?


    Cheers

    Martin (sandylane.d.c.u)
    --
    Remove ".spam." from my address to email
    Vulpes Argenteus Guest

  2. #2

    Re: Need a full-text indexing application for PDF, DOC, TXT and HTM files

    Vulpes Argenteus <v.spam.ulpesdubolly.co.uk> wrote:
    >I hope someone can help in pointing me in the right direction.
    >
    >I have a large collection of PDF, DOC, TXT and HTM documents (articles
    >and papers for research into my MA) for which I'd like to create and
    >update full-text indexes with search capability.
    >
    Software is no more capable of creating indexes than word processors
    are of writing books. You might have some luck generating a
    "concordance", which is nothing more than a list of every word in the
    document and the page it appears on, but the only road to a true index
    is through a human indexer.

    Richard T. Evans
    Infodex Indexing Services, Inc.
    Web: www.mindspring.com/~infodex
    Richard Evans Guest
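    The concordance Richard describes (every word plus the pages it appears on) is easy to generate mechanically, which is exactly why it falls short of a real index. A minimal Python sketch, assuming each document has already been split into a list of page texts and that a "word" is any run of letters or apostrophes (both simplifying assumptions):

```python
import re
from collections import defaultdict


def build_concordance(pages):
    """Map each lower-cased word to the sorted page numbers it appears on.

    `pages` is a list of page texts; page numbers start at 1.
    A "word" here is any run of letters or apostrophes (a simplifying assumption).
    """
    concordance = defaultdict(set)
    for page_no, text in enumerate(pages, start=1):
        for word in re.findall(r"[a-z']+", text.lower()):
            concordance[word].add(page_no)
    # Words in alphabetical order, each with its pages in ascending order
    return {word: sorted(concordance[word]) for word in sorted(concordance)}


if __name__ == "__main__":
    pages = [
        "Full-text search is not indexing.",
        "A concordance lists every word.",
        "An index needs a human indexer.",
    ]
    for word, page_list in build_concordance(pages).items():
        print(f"{word}: {', '.join(map(str, page_list))}")
```

    Note that "index" and "indexer" come out as unrelated entries: the program has no notion of concepts, synonyms or sub-headings, which is the work a human indexer adds.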

  3. #3

    Re: Need a full-text indexing application for PDF, DOC, TXT and HTM files

    On Thu, 22 Apr 2004 16:39:58 GMT, Richard Evans <infodexmindspring.com>
    wrote:
    >Vulpes Argenteus <v.spam.ulpesdubolly.co.uk> wrote:
    >
    >>I hope someone can help in pointing me in the right direction.
    >>
    >>I have a large collection of PDF, DOC, TXT and HTM documents (articles
    >>and papers for research into my MA) for which I'd like to create and
    >>update full-text indexes with search capability.
    >>
    >
    >Software is no more capable of creating indexes than word processors
    >are of writing books. You might have some luck generating a
    >"concordance", which is nothing more than a list of every word in the
    >document and the page it appears on, but the only road to a true index
    >is through a human indexer.
    Ah no, my bad! Not indexing - as in the tables at the back of a book -
    but the ability to do a multi-document search on the full-text content
    of the documents.

    Thanks

    Cheers

    Martin (sandylane.d.c.u)
    --
    Remove ".spam." from my address to email
    Vulpes Argenteus Guest
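    The multi-document search Martin describes is usually built on an inverted index: a map from each word to the documents that contain it, so a query looks up its words instead of rescanning every file. Lucene is built around this kind of structure, but the following is only a toy Python illustration, not Lucene's API. It assumes the text of each PDF, DOC or HTM file has already been extracted into a plain string, and it treats a multi-word query as "every word must appear". The file names in the example are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lower-case word tokens (a deliberately simple rule)."""
    return re.findall(r"\w+", text.lower())


def build_index(documents):
    """Build an inverted index: word -> set of document names containing it.

    `documents` maps a document name to its already-extracted plain text.
    """
    index = defaultdict(set)
    for name, text in documents.items():
        for word in tokenize(text):
            index[word].add(name)
    return index


def search(index, query):
    """Return the sorted names of documents containing every word in `query`."""
    words = tokenize(query)
    if not words:
        return []
    # .get() avoids adding empty entries to the defaultdict for unknown words
    hits = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(hits)


if __name__ == "__main__":
    docs = {  # hypothetical file names and contents
        "smith2003.txt": "Lucene is a text retrieval library",
        "notes.htm": "Full text search across documents",
    }
    index = build_index(docs)
    print(search(index, "text"))
    print(search(index, "text retrieval"))
```

    The index is built once and then reused for every query; when files are added or changed, only their entries need updating, which is the "create and update" part of the original question.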

  4. #4

    Re: Need a full-text indexing application for PDF, DOC, TXT and HTM files

    On Thu, 22 Apr 2004 17:29:21 +0100, Vulpes Argenteus <v.spam.ulpesdubolly.co.uk> wrote:
    > I hope someone can help in pointing me in the right direction.
    >
    > I have a large collection of PDF, DOC, TXT and HTM documents (articles
    > and papers for research into my MA) for which I'd like to create and
    > update full-text indexes with search capability.
    >
    > This is a fairly low-tech PC (home PC only) running Windows Me and I'm
    > not keen to splash out tons of money on the solution - say maximum of
    > $100.
    >
    > Has anyone got any suggested packages?
    What about Apache's Jakarta project's Lucene text retrieval system? It
    is written in Java, so it might be possible to run it on a Windows
    machine. Jakarta has PDF and RTF plug-ins, but what functionality they
    include I don't know; my requirement for evaluating Lucene is retrieval
    of SGML/XML documents (in a Linux environment).

    As Lucene is covered by the Apache licence, you can download a copy from
    http://jakarta.apache.org/lucene/

    You might also want to investigate Apache's Xindice project, though it
    is geared toward XML. Try http://xml.apache.org/xindice/ (same licence,
    same cost ;-)

    One caveat: I have not used either of these products on a single Windows Me
    machine, so you'll have to decide for yourself whether supporting a
    complex Java application yourself is your bag or not. It depends how
    mission-critical you see it, I suppose.

    Regards, Trevor

    <>< Re: deemed!
    Trevor Jenkins Guest

  5. #5

    Re: Need a full-text indexing application for PDF, DOC, TXT and HTM files

    Get a good text editor that can search within files on disk. I use Boxer,
    but many others will do it. The only hard one is PDF, where the text is
    usually stored compressed and encoded rather than as plain text. You could
    extract the text first and use that for the searches.

    --
    Regards,

    Adrian Jansen
    J & K MicroSystems
    Microcomputer solutions for industrial control
    "Vulpes Argenteus" <v.spam.ulpesdubolly.co.uk> wrote in message
    news:g2uf80tc1q9nb1pii00bmed416g8i7o4h44ax.com...
    > On Thu, 22 Apr 2004 16:39:58 GMT, Richard Evans <infodexmindspring.com>
    > wrote:
    >
    > >Vulpes Argenteus <v.spam.ulpesdubolly.co.uk> wrote:
    > >
    > >>I hope someone can help in pointing me in the right direction.
    > >>
    > >>I have a large collection of PDF, DOC, TXT and HTM documents (articles
    > >>and papers for research into my MA) for which I'd like to create and
    > >>update full-text indexes with search capability.
    > >>
    > >
    > >Software is no more capable of creating indexes than word processors
    > >are of writing books. You might have some luck generating a
    > >"concordance", which is nothing more than a list of every word in the
    > >document and the page it appears on, but the only road to a true index
    > >is through a human indexer.
    >
    > Ah no, my bad! Not indexing - as in the tables at the back of a book -
    > but the ability to do a multi-document search on the full-text content
    > of the documents.
    >
    > Thanks
    >
    > Cheers
    >
    > Martin (sandylane.d.c.u)
    > --
    > Remove ".spam." from my address to email

    Adrian Jansen Guest
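    Adrian's suggestion amounts to a brute-force "find in files": open every file and check each line, with no index at all. On a large collection this is slower than an indexed search, but it needs no setup. A minimal Python sketch follows. The set of extensions treated as plain text and the UTF-8 decoding are assumptions; PDF and DOC files are skipped because their text has to be extracted first.

```python
from pathlib import Path

# Extensions treated as plain text (an assumption for this sketch).
# PDF and DOC are excluded because their text must be extracted first.
TEXT_EXTENSIONS = {".txt", ".htm", ".html"}


def find_in_files(folder, phrase):
    """Yield (path, line_number, line) for every line containing `phrase`.

    The match is case-insensitive and walks `folder` recursively.
    """
    needle = phrase.lower()
    for path in sorted(Path(folder).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in TEXT_EXTENSIONS:
            continue
        # errors="replace" keeps the scan going if a file is not valid UTF-8
        with path.open(encoding="utf-8", errors="replace") as handle:
            for line_number, line in enumerate(handle, start=1):
                if needle in line.lower():
                    yield path, line_number, line.rstrip("\n")


if __name__ == "__main__":
    for path, line_number, line in find_in_files(".", "full-text"):
        print(f"{path}:{line_number}: {line}")
```

    Every search rereads every file, so the cost grows with the size of the collection on each query; the inverted-index approach pays that cost once, at indexing time.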
