Sorting HTML tables - PERL Beginners

  1. #1

    Sorting HTML tables

    I wrote some code to identify and print HTML tables below:

    use strict;

    my @table;
    my $logfile;
    my $counter;
    my $inc;
    my @array;

    die "You must enter an argument. \n" if $#ARGV <0;
    $logfile = chomp ($ARGV);
    foreach my $line(<>) {

        if ($line =~ /(TABLE)(.+)/) {
            $inc++;
        }

        $table[$inc] .= $line; # append each line to the array element

        $counter++;
    }

    # Print out each table to show that the script works.

    foreach my $lineno (0..$#table) {
        print "TABLE: $lineno \n";
        print $table[$lineno];
        print "\n\n\n";
        sleep 1;
    }



    The problem I am stuck with is that now I want to sort the tables based
    on a Priority (which range from 1-3). There may be several tables with
    the same priority numbers. An example of a Priority 3 would be:

    <td valign=top style='width:23%;padding:3.75pt; '><p
    class=MsoNormal><span
    style='font-size:10.0pt;font-family:Verdana;mso-bidi-font-family:Arial'>
    Priority</span><span
    style='font-size:10.0pt;font-family:Verdana'><o:p></o:p></span></p></td>
    <td valign=top style='width:76%;padding:3.75pt; '><p
    class=MsoNormal><span
    style='font-size:10.0pt;font-family:Verdana;mso-bidi-font-family:Arial'>
    3</span><span
    style='font-size:10.0pt;font-family:Verdana'><o:p></o:p></span></p></td>

    I need help in understanding the methodology in how to extract these 2
    items and then sort the tables in Priority order (all the 1's, 2's and
    3's).

    Thanks.

    --Paul

    Perl Guest

  2. #2

    Re: Sorting HTML tables

    On Wed, 4 Aug 2004, Perl wrote:
     

    Don't do that.

    HTML is tremendously difficult to analyze properly with tools like
    regular expressions.

    You're much, much better off using a proper parser library that can
    build up a tree model of the html that you can analyze as you like.

    The standard libraries for this are probably HTML::Parser and
    HTML::TreeBuilder. You may also like HTML::TableContentParser.

    <http://search.cpan.org/~gaas/HTML-Parser-3.36/Parser.pm>
    <http://search.cpan.org/~sburke/HTML-Tree-3.18/lib/HTML/TreeBuilder.pm>
    <http://search.cpan.org/~sdrabble/HTML-TableContentParser-0.13/TableContentParser.pm>

    This may point you in a useful direction:

    use HTML::TableContentParser;
    $p = HTML::TableContentParser->new();
    $html = read_html_from_somewhere();
    $tables = $p->parse($html);
    for $t (@$tables) {
        for $r (@{$t->{rows}}) {
            print "Row: ";
            for $c (@{$r->{cells}}) {
                print "[$c->{data}] ";
            }
            print "\n";
        }
    }

    Something like this should work even for godawful ms-html :-)
     

    It looks like HTML::TableContentParser makes sorting through the
    structure of the table pretty easy; HTML::Parser could go farther by
    reducing it down to just the printable text -- some combination of the
    two may be useful here.

    Once you've stripped out all the junk (all the span tags, the paragraph
    tags, the "<o:p></o:p>" type debris, etc), you just need to convert
    the html structure into some kind of populated data structure.
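
    A minimal sketch of that last step, assuming the parser has already
    reduced each table to rows of plain-text cells. The @tables layout, the
    sample values, and the priority_of() helper are made up for illustration
    and are not part of any of the modules above:

```perl
use strict;
use warnings;

# Hypothetical layout: each table is an array of rows, and each row is
# an array of plain-text cells (markup already stripped by the parser).
my @tables = (
    [ [ 'Name', 'Table A' ], [ 'Priority', '3' ] ],
    [ [ 'Name', 'Table B' ], [ 'Priority', '1' ] ],
    [ [ 'Name', 'Table C' ], [ 'Priority', '2' ] ],
    [ [ 'Name', 'Table D' ], [ 'Priority', '1' ] ],
);

# Return the number in the cell next to the "Priority" label,
# or 99 so that tables without a priority sort last.
sub priority_of {
    my ($table) = @_;
    for my $row (@$table) {
        my ($label, $value) = @$row;
        next unless defined $label && defined $value;
        if ($label =~ /^\s*Priority\s*$/i && $value =~ /^\s*(\d+)\s*$/) {
            return $1;
        }
    }
    return 99;
}

# Numeric sort on the extracted priority: all the 1's, then 2's, then 3's.
my @sorted = sort { priority_of($a) <=> priority_of($b) } @tables;

for my $table (@sorted) {
    printf "Priority %d: %s\n", priority_of($table), $table->[0][1];
}
```

    With many tables it may be worth computing each priority once and
    sorting on the cached value instead of calling the helper inside the
    comparison.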

    You didn't give enough of the html to suggest how the rest of the table
    is structured -- it was really just one big hairy table cell -- so
    it's hard to guess how the other pieces fit together.

    Can you post a simpler example of what the table is built like, e.g.:

    +------------+-------+---------------+----------------+
    | priority 1 | field | another field | some more      |
    +------------+-------+---------------+----------------+
    | priority 3 | field | any data here | other things   |
    +------------+-------+---------------+----------------+
    | priority 2 | field | stuff stuff   | whatever       |
    +------------+-------+---------------+----------------+

    Or is it more complicated than that?



    --
    Chris Devers
    http://devers.homeip.net:8080/blog/

    np: 'Lujon'
    by Henry Mancini
    from 'The Best Of Mancini'
    Chris Guest

  3. #3

    RE: Sorting HTML tables

    The table is fairly complicated. I'll take a look at those modules
    though. Thanks!

    Paul Guest

  4. #4

    Re: Sorting HTML tables

    On Aug 4, Perl said:
     

    Instead of testing $#ARGV < 0, I'd suggest using:

    die ... if @ARGV == 0;

    because it's easier to read and understand.

    Is "$logfile = chomp ($ARGV);" really your code? That doesn't make any
    sense to me. I think you meant:

    $logfile = $ARGV[0];

    There's no reason to chomp() here, and besides, chomp() doesn't return the
    modified string; it returns the number of characters it removed.
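
    Both points can be checked with a short sketch. The @args list is a
    made-up stand-in for @ARGV so the example runs without command-line
    arguments, and the trailing newline is added only to give chomp()
    something to remove:

```perl
use strict;
use warnings;

# Stand-in for @ARGV (hypothetical value, for illustration only).
my @args = ("report.html\n");

# An array in numeric context yields its element count, so this is
# the more readable equivalent of "$#args < 0".
die "You must enter an argument.\n" if @args == 0;

# chomp() modifies its argument in place and returns the number of
# characters it removed, not the cleaned-up string.
my $logfile = $args[0];
my $removed = chomp($logfile);

print "removed=$removed logfile=$logfile\n";   # prints "removed=1 logfile=report.html"
```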

    --
    Jeff "japhy" Pinyan % How can we ever be the sold short or
    RPI Acacia Brother #734 % the cheated, we who for every service
    http://japhy.perlmonk.org/ % have long ago been overpaid?
    http://www.perlmonks.org/ % -- Meister Eckhart


    Jeff Guest
