count word repetitions for each pair of lines

  1. #1

    count word repetitions for each pair of lines

    I wonder if anyone has a script that could do the following?

    - Read a file containing ordinary text (e.g. a news story);
    - Count word type repetitions for each pair of lines, disregarding
    numbers and ignoring case. A valid repetition is a word type that
    occurs in both lines of the pair. Words that occur many times in only
    one line of the pair must be disregarded;
    - Print out the count of word repetitions for each pair of lines in
    the formats shown below.

    For instance:

    input:
    cat: cat is sitting on 2 mats.
    Dog: dog is sitting.
    dog, CAT and 2 mATs.

    output format 1:
    #format:
    #[line][line][repetitions][words repeated]
    [1][2][2][is,sitting]
    [1][3][2][cat,mats]
    [2][3][1][dog]

    output format 2:
    #matrix format:
    #[line 1]:[1 & 1][1 & 2][1 & 3]
    #[line 2]:[2 & 1][2 & 2][2 & 3]
    #[line 3]:[3 & 1][3 & 2][3 & 3]
    [1]:[0][2][2]
    [2]:[2][0][1]
    [3]:[2][1][0]

    thanks very much indeed

    tony berber

    Catholic University of Sao Paulo, Brazil
    Applied Linguistics Postgraduate Program
    tony4 at uol.com.br
    tony Guest

  2. #2

    Re: count word repetitions for each pair of lines

    tony wrote:
    > I wonder if anyone has a script that could do the following?
    >
    > - Read a file containing ordinary text (e.g. a news story);
    > - Count word type repetitions for each pair of lines, disregarding
    > numbers and ignoring case. A valid repetition is a word type that
    > occurs in both lines of the pair. Words that occur many times in only
    > one line of the pair must be disregarded;
    > - Print out the count of word repetitions for each pair of lines in
    > the formats shown below.

    Pretty simple:
    - split() each line into an array of words, lowercase them, and filter
    out numbers and similar unwanted stuff
    - then apply the solution from the FAQ (perlfaq4, see
    "perldoc -q intersection"):
    "How do I compute the difference of two arrays? How do I compute
    the intersection of two arrays?"
    - and then just print the result
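
The steps above can be sketched roughly as follows. This is a minimal illustration, not a definitive script: it assumes a "word" is a run of ASCII letters (so numbers and punctuation drop out, but accented characters would need extra handling), and the sub names word_types, shared_words, pair_report and matrix_report are made up for this example. The intersection uses a hash lookup in the spirit of the FAQ entry rather than its exact code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the distinct word types of one line as a hash ref.
# Assumption: a word is a run of ASCII letters; case is ignored.
sub word_types {
    my ($line) = @_;
    my %seen;
    $seen{ lc($1) } = 1 while $line =~ /([A-Za-z]+)/g;
    return \%seen;
}

# Words present in both sets (hash-based intersection), sorted.
sub shared_words {
    my ($first, $second) = @_;
    my @common = sort grep { exists $second->{$_} } keys %$first;
    return @common;
}

# Format 1: one row per unordered pair of lines.
sub pair_report {
    my @sets = map { word_types($_) } @_;
    my @rows;
    for my $i (0 .. $#sets - 1) {
        for my $j ($i + 1 .. $#sets) {
            my @common = shared_words($sets[$i], $sets[$j]);
            push @rows, sprintf('[%d][%d][%d][%s]',
                $i + 1, $j + 1, scalar(@common), join(',', @common));
        }
    }
    return @rows;
}

# Format 2: square matrix of counts, with 0 on the diagonal.
sub matrix_report {
    my @sets = map { word_types($_) } @_;
    my @rows;
    for my $i (0 .. $#sets) {
        my $row = sprintf '[%d]:', $i + 1;
        for my $j (0 .. $#sets) {
            my @common = $i == $j ? () : shared_words($sets[$i], $sets[$j]);
            $row .= sprintf '[%d]', scalar(@common);
        }
        push @rows, $row;
    }
    return @rows;
}

# Sample text from the question, used when no file is given.
my @sample = (
    'cat: cat is sitting on 2 mats.',
    'Dog: dog is sitting.',
    'dog, CAT and 2 mATs.',
);

# Read the file named on the command line if there is one.
my @lines = @ARGV ? <> : @sample;
chomp @lines;

print "$_\n" for pair_report(@lines), matrix_report(@lines);
```

Run without arguments, it uses the three sample lines from the question and prints the pair rows followed by the matrix rows. Run with a file name, it reads that file instead. Every pair of lines is compared, so the work grows with the square of the number of lines.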

    Where's the problem?

    jue


    Jürgen Exner Guest
