tony #1
count word repetitions for each pair of lines
I wonder if anyone has a script that could do the following?
-Read a file containing a regular text (e.g. a news story);
-Count word type repetitions for each pair of lines, disregarding
numbers and ignoring case. A valid repetition is a word type that
occurs in both lines of the pair. Words that occur many times in one
line of the pair only must be disregarded;
-Print out the count of word repetitions for each pair of lines in
the two formats shown below.
For instance:
input:
cat: cat is sitting on 2 mats.
Dog: dog is sitting.
dog, CAT and 2 mATs.
output format 1:
#format:
#[line][line][repetitions][words repeated]
[1][2][2][is,sitting]
[1][3][2][cat,mats]
[2][3][1][dog]
output format 2:
#matrix format:
#[line 1]:[1 & 1][1 & 2][1 & 3]
#[line 2]:[2 & 1][2 & 2][2 & 3]
#[line 3]:[3 & 1][3 & 2][3 & 3]
[1]:[0][2][2]
[2]:[2][0][1]
[3]:[2][1][0]
thanks very much indeed
tony berber
Catholic University of Sao Paulo, Brazil
Applied Linguistics Postgraduate Program
tony4 at uol.com.br
Jürgen Exner #2
Re: count word repetitions for each pair of lines
tony wrote:
> I wonder if anyone has a script that could do the following?
>
> -Read a file containing a regular text (e.g. a news story);
> -Count word type repetitions for each pair of lines, disregarding
> numbers and ignoring case. A valid repetition is a word type that
> occurs in both lines of the pair. Words that occur many times in one
> line of the pair only must be disregarded;
> -Print out the count of word repetitions for each pair of lines in
> the two formats shown below.
Pretty simple:
- split() both lines into arrays of words, lowercase them, and filter out
numbers and similar unwanted stuff
- then apply the solution from the FAQ (perlfaq4, see "perldoc -q intersection")
"How do I compute the difference of two arrays? How do I compute
the intersection of two arrays?"
- and then just print the result
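
Roughly, those steps could be sketched as below. This is only a sketch under a few assumptions, not a definitive implementation: the sub names (word_types, shared_words, pair_report) are made up for illustration, the text is assumed to be plain ASCII, a "word" is whatever split /\W+/ yields (so "don't" becomes "don" and "t"), and pairs with no repetitions are still listed in format 1. Each line is first reduced to a hash of its unique word types, so the intersection becomes a plain key lookup, a variation on the hash-based idea in the FAQ answer.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Reduce one line to a hash of its lowercased word types.
# Tokens made up only of digits are dropped (assumed meaning of "numbers").
sub word_types {
    my ($line) = @_;
    my %seen;
    for my $w (split /\W+/, lc $line) {
        next if $w eq '' or $w =~ /^\d+$/;
        $seen{$w} = 1;
    }
    return \%seen;
}

# Word types present in both hashes, sorted for stable output.
sub shared_words {
    my ($x, $y) = @_;
    my @common = sort grep { exists $y->{$_} } keys %$x;
    return @common;
}

# Build the format 1 rows (each pair once) and the format 2 matrix.
sub pair_report {
    my @types = map { word_types($_) } @_;
    my (@rows, @matrix);
    for my $i (0 .. $#types) {
        for my $j (0 .. $#types) {
            if ($i == $j) { $matrix[$i][$j] = 0; next; }   # diagonal is 0 in the example
            my @common = shared_words($types[$i], $types[$j]);
            $matrix[$i][$j] = scalar @common;
            push @rows, sprintf('[%d][%d][%d][%s]',
                $i + 1, $j + 1, scalar @common, join(',', @common))
                if $i < $j;
        }
    }
    return (\@rows, \@matrix);
}

# Read the file(s) named on the command line; with no arguments,
# fall back to the sample from the question.
my @lines;
if (@ARGV) {
    @lines = <>;
} else {
    @lines = (
        'cat: cat is sitting on 2 mats.',
        'Dog: dog is sitting.',
        'dog, CAT and 2 mATs.',
    );
}
chomp @lines;

my ($rows, $matrix) = pair_report(@lines);
print "$_\n" for @$rows;
for my $i (0 .. $#$matrix) {
    print '[', $i + 1, ']:', join('', map("[$_]", @{ $matrix->[$i] })), "\n";
}
```

Run without arguments, it prints the three format 1 rows and the 3x3 matrix for the sample input; run as "perl script.pl story.txt" (both names hypothetical), it processes that file instead. Every line is compared with every other line, so the work grows with the square of the number of lines.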
Where's the problem?
jue