Full text Boolean search - AIX

  1. #1

    Full text Boolean search

    Hi all,

    I have an interesting requirement I'm not sure how to solve -
    searching a large directory (150,000+ files) for two search terms,
    returning only the filename as the result.

    A positive match equals both search terms in the same file, regardless
    of how many times each term appears in that file. The key is that
    both are there.

    I've tried cat * |grep <term1> |grep <term2> which gave me the "too
    many arguments" error due to directory size. I was wondering if
    anyone had a suggestion on a command I could use to do this search?

    Thanks!
    Dave Saunders
    David Guest

  2. #2

    Re: Full text Boolean search

    I believe the answer may be found in "find"

    ummm try:
    find . -exec grep -c search1 {} \; -exec grep -c search2 {} \; -print

    hmmm.. that also prints out a bunch of numbers (counts of occurrences).
    perhaps you can pipe that output through another grep to keep only the filenames:

    find . -exec grep -c search1 {} \; -exec grep -c search2 {} \; -print \
    | grep '^\./'

    you could also do a dual grep -l using the output of the first as a
    search filelist for the second.

    hope this helps
    --
    be safe.
    flip
    Out! Out! Demons of ignorance.


    Philip Guest

  3. #3

    Re: Full text Boolean search

    If you are doing this only a few times, you can use:

    find . -type f -exec grep -q search1 {} \; -exec grep -q search2 {} \; -print
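
    To see why this prints only files containing both terms: find treats
    each -exec ... \; as a test that is true only when the command exits 0,
    and it stops evaluating a file at the first false test. A minimal
    sketch on a throwaway directory (the directory name, the file names,
    and the terms "alpha" and "beta" are made up for illustration):

    ```shell
    # Hypothetical scratch directory and sample files, for the demo only.
    dir=${TMPDIR:-/tmp}/boolsearch_a.$$
    mkdir -p "$dir"
    printf 'alpha\nbeta\n' > "$dir/both.txt"
    printf 'alpha\n'       > "$dir/only1.txt"
    printf 'beta\n'        > "$dir/only2.txt"

    # grep -q prints nothing and exits 0 only if the term is found.
    # -print is reached only when both -exec tests succeed.
    matches=$(cd "$dir" && find . -type f \
        -exec grep -q alpha {} \; \
        -exec grep -q beta {} \; -print)
    echo "$matches"

    # Remove the scratch directory again.
    rm -rf "$dir"
    ```

    Only both.txt passes both tests. only2.txt costs a single grep, since
    the first test already fails on it and the second grep is never run.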

    For efficiency, put the rarer search term in the first grep, so the second
    grep runs on as few files as possible. Still, this executes grep at least
    once per file, so 150,000+ times.

    So, you can get a slight improvement with:

    find . -type f -print | xargs -n 50 grep -l search1 | xargs grep -l search2
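
    Here is the same pipeline on a small scratch directory, as a rough
    sketch. The directory, file names, and terms are invented, and -n 2
    replaces -n 50 only to force several grep batches on a handful of
    files:

    ```shell
    # Hypothetical scratch directory and sample files, for the demo only.
    dir=${TMPDIR:-/tmp}/boolsearch_b.$$
    mkdir -p "$dir"
    printf 'alpha\nbeta\n' > "$dir/also.txt"
    printf 'beta\nalpha\n' > "$dir/both.txt"
    printf 'alpha\n'       > "$dir/only1.txt"
    printf 'beta\n'        > "$dir/only2.txt"

    # Stage 1: grep -l lists the files that contain the first term,
    #          two files per grep invocation.
    # Stage 2: the second grep -l is run only on those survivors.
    matches=$(cd "$dir" && find . -type f -print \
        | xargs -n 2 grep -l alpha \
        | xargs grep -l beta \
        | sort)
    echo "$matches"

    # Remove the scratch directory again.
    rm -rf "$dir"
    ```

    With -n 50 on 150,000 files, the first stage starts grep about 3,000
    times instead of 150,000, and the second stage only touches files
    that already matched search1.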

    If the files have names that might contain spaces or other special
    characters, you need to use the GNU or BSD versions of the utilities, with
    special flags:

    # (first set PATH so that the GNU or BSD tools are found first)
    find . -type f -print0 \
    | xargs -0 -n 50 grep -lZ -- search1 \
    | xargs -0 grep -l -- search2

    The alternate commands are available from IBM's Linux Toolkit site:
    <http://www-1.ibm.com/servers/aix/products/aixos/linux/download.html>
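
    If installing those tools is not an option, one possible workaround
    is to let find hand the names straight to a small sh loop, so they
    never pass through a pipe. This is only a sketch: it assumes your
    find supports the POSIX "-exec ... {} +" form (not verified here for
    any particular AIX level), and the scratch directory, file names, and
    terms are made up:

    ```shell
    # Hypothetical scratch directory; note the spaces in the file names.
    dir=${TMPDIR:-/tmp}/boolsearch_c.$$
    mkdir -p "$dir"
    printf 'alpha\nbeta\n' > "$dir/both terms.txt"
    printf 'alpha\n'       > "$dir/only one.txt"

    # find passes batches of file names as arguments to sh.
    # $1 and $2 are the two search terms; the rest are file names.
    matches=$(cd "$dir" && find . -type f -exec sh -c '
        t1=$1; t2=$2; shift 2
        for f in "$@"; do
            if grep -q -- "$t1" "$f" && grep -q -- "$t2" "$f"; then
                printf "%s\n" "$f"
            fi
        done' sh alpha beta {} +)
    echo "$matches"

    # Remove the scratch directory again.
    rm -rf "$dir"
    ```

    This still runs up to two greps per file, so it is slower than the
    xargs pipeline, but the names are never split on whitespace.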

    If you need to do many searches, look into using the "glimpse" package
    to build an index of the files: <http://webglimpse.net>. Glimpse is
    now commercial software. I think there used to be a free version you
    might be able to find in some net archive.

    I'm sure there are other content indexers that could also be used.

    --
    Dale Talcott, IT Research Computing Services, Purdue University
    cc.purdue.edu http://quest.cc.purdue.edu/~aeh/
    Dale Guest
