
counting & sorting HTML tags - PERL Beginners


  1. #1

    counting & sorting HTML tags

    Hi. I'm still very much a Perl beginner, but I'm reading "Mastering
    Regular Expressions" by Friedl, so I expect to be able to at least help the
    group out with regexes when I'm a few more chapters in. ;o)

    I've just created a script which, when run like this:

    tags.pl filename.html

    produces results like this:

    #snipped for brevity
    <title>, 1 occurrences
    <td>, 81 occurrences
    </form>, 1 occurrences
    </tr>, 23 occurrences
    </font>, 10 occurrences
    </title>, 1 occurrences
    </strong>, 1 occurrences
    <applet>, 1 occurrences
    <script>, 6 occurrences
    <hr>, 2 occurrences
    <h3>, 1 occurrences
    <img>, 3 occurrences
    <h1>, 1 occurrences
    <body>, 1 occurrences
    </td>, 81 occurrences
    <head>, 1 occurrences
    </option>, 10 occurrences

    Is there an easy way to have this print the matching opening and
    closing tags on one line? For example: <td> 81, </td> 81

    If not, is there an easy way to sort the hash before printing, so that
    it is sorted either by value (hopefully <td> and </td> will then not be
    too far apart in the output), or by key, disregarding the optional "/"?

    As always, if my beginner script is "functional but less-than-elegant" in
    any regard, please feel free to educate me.

    Thank you,
    Shawn


    tags.pl:
    ========================================================================
    #!/usr/bin/perl
    use warnings;
    use strict;

    my %files;
    my $inFile = $ARGV[0];

    open(my $in, '<', $inFile)
        or die "It blew up:\n$!\nCould not open file $inFile.\n\n";

    while (<$in>) {
        # Split on ">" so each piece holds at most one tag,
        # then put back the ">" that split() removed.
        my @tags = split />/;

        foreach (@tags) {
            $_ .= '>';

            #table tags only
            #$files{lc $1}++ if /<(\/?t(?:d|r|able))[^>]*>/i;

            #all tags
            $files{lc $1}++ if /<(\/?\w+)[^>]*>/i;
        }
    }
    close($in);

    my @values = values %files;
    my @keys   = keys %files;

    # keys and values come back in the same order, so popping both stays in step
    while (@keys) {
        print "<" . pop(@keys) . ">, ", pop(@values), " occurrences\n";
    }
    ========================================================================
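    On the "less-than-elegant" front: instead of splitting each line on ">" and gluing the ">" back on, a /g match inside a while loop can walk through the text one tag at a time. This is a minimal sketch, not Shawn's method, assuming the whole file fits in memory (it is slurped with local $/) and that tags look like <name ...> or </name ...>; the sub name count_tags is hypothetical, and comments, <!DOCTYPE> and <script> bodies get no special treatment. Tag names are lowercased as in tags.pl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count opening and closing tags in a string of HTML.
# Returns a hash ref keyed like %files in tags.pl ("td", "/td", ...).
sub count_tags {
    my ($html) = @_;
    my %count;
    # In scalar context, /g resumes from where the last match ended,
    # so this loop visits every tag once; [^>]* also crosses newlines.
    while ($html =~ /<(\/?\w+)[^>]*>/g) {
        $count{ lc $1 }++;
    }
    return \%count;
}

# Only run the file-reading part when a filename is given.
if (@ARGV) {
    my $file = $ARGV[0];
    open(my $in, '<', $file) or die "Could not open $file: $!\n";
    my $html = do { local $/; <$in> };    # slurp the whole file
    close($in);

    my $count = count_tags($html);
    for my $tag (sort keys %$count) {
        print "<$tag>, $count->{$tag} occurrences\n";
    }
}
```

    Run it the same way as tags.pl (for example, perl count_tags.pl filename.html); with no argument it defines count_tags and exits quietly.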



    Shawn Milochik

  2. #2

    Re: counting & sorting HTML tags

    On Thursday, August 14, 2003, at 07:56 AM, Shawn Milochik wrote:
    > Is there an easy way to have this print out the matching opening and
    > closing tags on one line:
    > Example: <td> 81, </td> 81
    Sure. What about something like:

    foreach (sort keys %files) {
        next if substr($_, 0, 1) eq '/';    # closing tags are printed alongside their opener
        if (exists $files{"/$_"}) {
            print "<$_> $files{$_} occurrences, </$_> ", $files{"/$_"}, " occurrences\n";
        }
        else { print "<$_> $files{$_} occurrences\n"; }
    }
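    The loop above skips every key that starts with "/", so a closing tag with no matching opening tag (a stray </p>, say) is never printed. A minimal sketch that folds both sides into one row per tag name instead, defaulting a missing side to 0; the sub name pair_counts and the sample counts are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: take a %files-style hash ("td" => 81, "/td" => 81, ...)
# and return one "<name> N, </name> M" line per tag name, sorted by name.
sub pair_counts {
    my (%files) = @_;
    my %pair;
    for my $key (keys %files) {
        (my $name = $key) =~ s{^/}{};             # strip the leading "/"
        my $side = $name eq $key ? 'open' : 'close';
        $pair{$name}{$side} += $files{$key};
    }
    # A side that never appeared is reported as 0 instead of being skipped.
    return map {
        sprintf "<%s> %d, </%s> %d",
            $_, $pair{$_}{open} || 0, $_, $pair{$_}{close} || 0
    } sort keys %pair;
}

print "$_\n" for pair_counts(td => 81, '/td' => 81, img => 3, '/p' => 1);
```

    With the sample data this prints one line each for img, p and td; void elements such as <img> simply show 0 closing tags.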
    > If not, is there an easy way to sort the hash before printing, so that
    > either it's sorted by the value ,
    > (hopefully the <td> and </td> will not be too far apart in the output
    > then),
    foreach (sort { $files{$a} <=> $files{$b} } keys %files) {    # smallest count first
        print "<$_> $files{$_} occurrences\n";
    }

    or

    foreach (sort { $files{$b} <=> $files{$a} } keys %files) {    # largest count first
        print "<$_> $files{$_} occurrences\n";
    }
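    Sorting by count alone leaves tags with equal counts in hash order, which is arbitrary, so <td> and </td> are not guaranteed to end up next to each other. A minimal sketch that adds two tie-breakers (tag name without the "/", then opening before closing); the sample data and the sub name by_count_then_name are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical sample counts, shaped like %files in tags.pl.
my %files = (td => 81, '/td' => 81, '/tr' => 23, tr => 23, img => 3, hr => 2);

# Highest count first; for equal counts compare the name with any leading
# "/" removed, and put the opening tag before its closing tag.
sub by_count_then_name {
    (my $x = $a) =~ s{^/}{};
    (my $y = $b) =~ s{^/}{};
    return $files{$b} <=> $files{$a}
        || $x cmp $y
        || ($a =~ m{^/} ? 1 : 0) <=> ($b =~ m{^/} ? 1 : 0);
}

my @sorted = sort by_count_then_name keys %files;
print "<$_> $files{$_} occurrences\n" for @sorted;
```

    With the sample data the order is td, /td, tr, /tr, img, hr, so each pair with equal counts stays together.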
    > or sorted by key disregarding the optional "/"?
    # Move a leading "/" to the end, sort, then move it back to the front.
    my @order = map  { substr($_, -1, 1) eq '/' ? '/' . substr($_, 0, length($_) - 1) : $_ }
                sort
                map  { substr($_, 0, 1) eq '/' ? substr($_, 1) . '/' : $_ } keys %files;
    print "<$_> $files{$_} occurrences\n" foreach @order;
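    The map/sort/map above works by rewriting each key so that a plain string sort groups <td> with </td>. The same idea can be written as an explicit Schwartzian transform that keeps the bare name and a closing-tag flag next to each key. A minimal sketch, assuming keys shaped like "td" and "/td" as in tags.pl; the sample counts are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical sample data, shaped like %files in tags.pl.
my %files = ('/td' => 81, td => 81, table => 4, '/table' => 4, '/form' => 1, form => 1);

# Schwartzian transform: wrap each key as [key, bare name, is_closing],
# sort on the name and then the flag (0 = opening), and unwrap the key.
my @order =
    map  { $_->[0] }
    sort { $a->[1] cmp $b->[1] or $a->[2] <=> $b->[2] }
    map  { my ($name) = m{^/?(.*)}s; [ $_, $name, (m{^/} ? 1 : 0) ] }
    keys %files;

print "<$_> $files{$_} occurrences\n" foreach @order;
```

    The explicit flag makes the "opening before closing" rule visible, rather than relying on "td" sorting before "td/".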
    James Edward Gray II

  3. #3

    Re: counting & sorting HTML tags

    On Thursday 14 August 2003 05:56, Shawn Milochik wrote:
    > Is there an easy way to have this print out the matching opening and
    > closing tags on one line:
    > Example: <td> 81, </td> 81
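    Yes there is.


    use strict;
    use warnings;
    use HTML::TokeParser;

    my $p = HTML::TokeParser->new( 'index.html' )
        or die "Cannot open index.html: $!\n";

    my %data;
    while ( my $token = $p->get_token() ) {
        next unless $token->[ 0 ] =~ /^[SE]$/;    # start and end tags only
        $data{ $token->[ 1 ] }{ $token->[ 0 ] }++;
    }

    for my $key ( sort keys %data ) {
        # Void elements such as <img> have no end tag, so default to 0.
        printf "<%s> %d, </%s> %d\n", $key, $data{$key}{S} || 0, $key, $data{$key}{E} || 0;
    }

    __END__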

    > If not, is there an easy way to sort the hash before printing, so
    > that either it's sorted by the value,
    Change the line:

    for my $key ( sort keys %data ) {

    To:

    for my $key ( sort { ( $data{$a}{S} || 0 ) <=> ( $data{$b}{S} || 0 ) } keys %data ) {



    John
    --
    use Perl;
    program
    fulfillment

    John W. Krahn
