HTML-Parser / SGML-Parser


  1. #1

    HTML-Parser / SGML-Parser

    Ok, silly question.

    I am writing a script to determine my router's WAN IP address and then to
    email me once an hour in case it changes. Currently I am running a web
    server at work that returns a page with the client's IP address. I need to
    parse out the info on the page so I can extract the IP address of my router
    when my script/program connects.

    I am using the html-parser, sgml-parser and formatter Ruby libraries
    from the RAA, and I have made the changes to the regexp regarding image
    width and height. So I'm good there.

    In my test.rb file I say:
    ------------------------------------------------
    h = Net::HTTP.new('www.zachstestip.com' , 80 )
    resp,data = h.get('/index.php' , nil )

    w = DumbWriter.new
    f = AbstractFormatter.new(w)
    p = HTMLParser.new(f)
    p.feed(data)
    p.close
    ------------------------------------------------

    Here comes the silly part. The function "feed" is inherited by html-parser
    from sgml-parser. It passes "data" along to the sgml-parser function
    "goahead". Everything gets printed to stdout or stderr (I don't know which,
    but it makes it to my screen =), yet there is no print, puts, etc. call to
    send it there! I can't for the life of me determine where the feed or
    goahead functions output my parsed results from data. This is damn
    silly of me to ask, I know, but how is it getting to my CLI?

    In the "goahead" function there is a giant while loop. If I place a print or
    puts statement right before the loop and right after the loop, then
    nothing is output (except for my explicit print/puts statements).

    Am I losing it?

    Zach


    Zach Dennis Guest

  3. #2

    Re: HTML-Parser / SGML-Parser

    Zach Dennis wrote:
    > Ok, silly question.
    >
    > I am writing a script to determine my router's WAN ip address and then to
    > email me once an hour in case it changes. Currently I am running a web
    > server at work that returns a page with the client's ip address. I need to
    > parse out the info on the page so I can extract the ip address of my router
    > when my script/program connects.
    >
    > I am using the html-parser, sgml-parser and formatter ruby libraries
    > provided from raa and I have made the changes to the regexp regarding image
    > width and height. So I'm good there.
    >
    > In my test.rb file I say:
    > ------------------------------------------------
    > h = Net::HTTP.new('www.zachstestip.com' , 80 )
    > resp,data = h.get('/index.php' , nil )
    >
    > w = DumbWriter.new
    > f = AbstractFormatter.new(w)
    > p = HTMLParser.new(f)
    > p.feed(data)
    > p.close
    > ------------------------------------------------
    >
    > Here comes the silly part. The function "feed" is inherited by sgml-parser
    > to html-parser. It passes "data" along to the sgml-parser function
    > "goahead". It prints everything to stdout or stderr( i dont know, but it
    > makes it to my screen =), but there is no print, put, etc... etc... call to
    > send it there!!! I cant for the life of me determine where in the feed or
    > goahead functions are outputting my parsed results from data! This is damn
    > silly of me to ask I know, but how is it getting to my CLI?
    >
    > In the "goahead" function there is a giant while loop. If i place a print or
    > puts statement at the right before the loop and right after the loop, then
    > nothing is outputted( except for my explicit print/puts statements).
    >
    > Am I losing it?
    Why not just mark your IP address on the page with something like
    >>>>IP<<<< and then you can regex for it like this:

    match = />>>>(.+?)<<<</.match(html)

    match[1]  # => your IP address
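
    A minimal sketch of the marker idea, assuming the page wraps the address
    as >>>>203.0.113.7<<<<. The sample page string and the helper name
    extract_marked_ip are made up for illustration; the non-greedy (.+?)
    is a small safety choice in case the page contains more than one marker.

```ruby
# Return the text between the >>>> and <<<< markers, or nil if absent.
def extract_marked_ip(html)
  match = />>>>(.+?)<<<</.match(html)
  match && match[1]
end

# Hypothetical page body; a real one would come from the HTTP response.
page = "<html><body>Your address is >>>>203.0.113.7<<<<</body></html>"
puts extract_marked_ip(page)
```

    Returning nil when the markers are missing lets the caller decide whether
    a failed lookup should be retried or reported.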

    Sean O'Dell


    Sean O'Dell Guest

  4. #3

    Re: HTML-Parser / SGML-Parser

    On Wed, 1 Oct 2003, Zach Dennis wrote:
    > Ok, silly question.
    >
    > I am writing a script to determine my router's WAN ip address and then to
    > email me once an hour in case it changes. Currently I am running a web
    > server at work that returns a page with the client's ip address. I need to
    > parse out the info on the page so I can extract the ip address of my router
    > when my script/program connects.
    check out dyndns.org - they have scripts for just about every router that does
    this.
    > I am using the html-parser, sgml-parser and formatter ruby libraries
    > provided from raa and I have made the changes to the regexp regarding image
    > width and height. So I'm good there.
    >
    > In my test.rb file I say:
    > ------------------------------------------------
    > h = Net::HTTP.new('www.zachstestip.com' , 80 )
    > resp,data = h.get('/index.php' , nil )
    >
    > w = DumbWriter.new
    > f = AbstractFormatter.new(w)
    > p = HTMLParser.new(f)
    > p.feed(data)
    > p.close
    > ------------------------------------------------
    one thing i might point out here - i myself have spent hours trying to figure
    out weird bugs after naming a variable 'p'. worth a check...
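
    To illustrate the kind of surprise a local named p can cause (a small
    sketch; the two method names are made up): once p is a local variable,
    "p -1" is parsed as a subtraction rather than a call to Kernel#p.

```ruby
# With a local variable named p in scope, "p -1" means "p minus 1".
def with_local_p
  p = 10
  p -1
end

# Without the local, the same text calls Kernel#p with the argument -1,
# which prints -1 and returns it.
def without_local_p
  p -1
end

puts with_local_p
without_local_p
```

    Renaming the variable (for example to parser) avoids the ambiguity.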
    > Here comes the silly part. The function "feed" is inherited by sgml-parser
    > to html-parser. It passes "data" along to the sgml-parser function
    > "goahead". It prints everything to stdout or stderr( i dont know, but it
    > makes it to my screen =), but there is no print, put, etc... etc... call to
    > send it there!!! I cant for the life of me determine where in the feed or
    > goahead functions are outputting my parsed results from data! This is damn
    > silly of me to ask I know, but how is it getting to my CLI?
    >
    > In the "goahead" function there is a giant while loop. If i place a print or
    > puts statement at the right before the loop and right after the loop, then
    > nothing is outputted( except for my explicit print/puts statements).
    you could also try something like this to track the problem:

    alias __p p
    alias __print print
    alias __puts puts

    # each wrapper prints the call stack to STDERR, then calls the original
    def p(*args); STDERR.puts(caller.join("\n")); __p(*args); end
    def print(*args); STDERR.puts(caller.join("\n")); __print(*args); end
    def puts(*args); STDERR.puts(caller.join("\n")); __puts(*args); end

    i'm not sure you'd need all three but... you get the picture.

    -a
    ====================================
    | Ara Howard
    | NOAA Forecast Systems Laboratory
    | Information and Technology Services
    | Data Systems Group
    | R/FST 325 Broadway
    | Boulder, CO 80305-3328
    | Email: ara.t.howard@noaa.gov
    | Phone: 303-497-7238
    | Fax: 303-497-7259
    | The difference between art and science is that science is what we understand
    | well enough to explain to a computer. Art is everything else.
    | -- Donald Knuth, "Discover"
    | ~ > /bin/sh -c 'for lang in ruby perl; do $lang -e "print \"\x3a\x2d\x29\x0a\""; done'
    ====================================
    Ara.T.Howard Guest

  5. #4

    Re: HTML-Parser / SGML-Parser

    This doesn't answer your questions about Ruby, but most of what you want
    exists already.

    Look at http://www.dyndns.org. I've been using them for a year or so.
    Every 5 minutes, a Perl daemon (ddclient) on my system wakes up, grabs
    the WAN address from my Linksys box, and if it's changed, updates
    dyndns. I can ssh into my system at home using the name
    'tidal.dyndns.org', even though the address actually belongs to my ISP.
    It works great, and it's free.

    Steve


    Steven Jenkins Guest

  6. #5

    Re: HTML-Parser / SGML-Parser

    Thank you all for your replies, but I do not want to work around the
    problem; I would like to work through it. When attempting to learn a new
    language I try to think of small projects that I can do to use different
    parts of the language. This is just one of them. I am going to try out
    what Ara mentioned. Sometimes I have to reinvent little
    wheels so I know I am using the language right when it comes to building
    modular pieces of big 99^nth polygon shapes with 147 sides.

    -Zach


    Zach Dennis Guest

  7. #6

    Re: HTML-Parser / SGML-Parser

    Zach Dennis wrote:
    > I am using the html-parser, sgml-parser and formatter ruby libraries
    > provided from raa and I have made the changes to the regexp regarding image
    > width and height. So I'm good there.
    I think the HTML parser might be abandoned (RAA says the last update was
    2001-07-10 13:35:40 GMT).

    You might have better luck using (my) htmltokenizer. It has a really
    simple interface, and it might be more what you need:

    http://raa.ruby-lang.org/list.rhtml?name=htmltokenizer

    If you really want to use the html-parser, sorry, I can't help you. I
    never managed to understand how to work it, which is why I ported the
    htmltokenizer.

    Ben


    Ben Giddings Guest

  8. #7

    Re: HTML-Parser / SGML-Parser

    I'll check it out! When you say simple, can I extract data from forms in
    an HTML page, by chance?

    Thanks,

    -Zach


    Zach Dennis Guest

  9. #8

    Re: HTML-Parser / SGML-Parser

    Zach Dennis wrote:
    > I'll check it out! When you say simple, can I extract data from forms in
    > html page by chance?
    You should be able to.

    If you say:

    while token = tokenizer.getTag('input')
      next unless 'ip_addr' == token.attr_hash['name']

      puts token.attr_hash['value']
    end

    I think that will do what you want. I'm not sure if the syntax is perfect
    since I'm doing this from memory, but it should be close enough to get you
    started.
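
    For comparison, a rough stdlib-only sketch of the same lookup. It does
    not use htmltokenizer; the helper name input_values and the sample form
    are made up, and the regexes only cope with simple, quoted attributes.

```ruby
# Collect name => value pairs from <input> tags using only String#scan.
def input_values(html)
  values = {}
  html.scan(/<input\b[^>]*>/i) do |tag|
    attrs = {}
    # Each match yields an attribute name and its quoted value.
    tag.scan(/([\w\-]+)\s*=\s*["']([^"']*)["']/) do |key, val|
      attrs[key.downcase] = val
    end
    values[attrs['name']] = attrs['value'] if attrs['name']
  end
  values
end

# Hypothetical form markup standing in for the fetched page.
form = '<form><input type="hidden" name="ip_addr" value="203.0.113.7">' \
       '<input type="submit" value="Go"></form>'
puts input_values(form)['ip_addr']
```

    Inputs without a name attribute (such as the submit button here) are
    skipped, since there is nothing to key them by.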

    Ben



    Ben Giddings Guest

  10. #9

    Re: HTML-Parser / SGML-Parser

    > ------------------------------------------------
    > h = Net::HTTP.new('www.zachstestip.com' , 80 )
    > resp,data = h.get('/index.php' , nil )
    >
    > w = DumbWriter.new
    > f = AbstractFormatter.new(w)
    > p = HTMLParser.new(f)
    > p.feed(data)
    > p.close
    > ------------------------------------------------
    >
    > Here comes the silly part. The function "feed" is inherited by
    > sgml-parser to html-parser. It passes "data" along to the sgml-parser
    > function "goahead". It prints everything to stdout or stderr( i dont
    > know, but it makes it to my screen =), but there is no print, put,
    > etc... etc... call to send it there!!! I cant for the life of me
    > determine where in the feed or goahead functions are outputting my
    > parsed results from data! This is damn silly of me to ask I know, but
    > how is it getting to my CLI?
    Through the DumbWriter. Check its implementation in
    ....\Ruby\lib\ruby\site_ruby\formatter.rb
    that's where the "write" statements live.
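
    The delegation described here can be sketched with stand-in classes.
    SketchParser, SketchFormatter and SketchWriter are hypothetical and much
    simpler than the RAA libraries; they only show how output can reach the
    screen while the parser itself contains no print or puts call.

```ruby
require 'stringio'

# The writer is the only object that touches an IO; it defaults to $stdout.
class SketchWriter
  def initialize(io = $stdout)
    @io = io
  end

  def send_text(text)
    @io.write(text)
  end
end

# The formatter normalizes whitespace and forwards text to the writer.
class SketchFormatter
  def initialize(writer)
    @writer = writer
  end

  def add_text(text)
    @writer.send_text(text.gsub(/\s+/, ' '))
  end
end

# The parser splits tags from text and hands text to the formatter.
class SketchParser
  def initialize(formatter)
    @formatter = formatter
  end

  def feed(data)
    data.scan(/<[^>]*>|[^<]+/) do |token|
      @formatter.add_text(token) unless token.start_with?('<')
    end
  end
end

# Passing a StringIO instead of $stdout captures the output in a string.
buffer = StringIO.new
parser = SketchParser.new(SketchFormatter.new(SketchWriter.new(buffer)))
parser.feed('<p>IP: <b>203.0.113.7</b></p>')
captured = buffer.string
puts captured
```

    In this sketch, swapping the IO handed to the writer is enough to
    redirect or capture the text, which is why print statements placed in
    the parser never show where the output comes from.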

    Often times when you want to parse HTML, it is simpler to use
    the (misleadingly named) SGMLParser. Anyway these libraries are
    direct ports of python modules, and can only be understood by
    checking the documentation of the originals.
    See e.g.: http://www.python.org/doc/1.5.2/lib/module-sgmllib.html
    And usage examples (in python ;-)
    http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/52281
    http://www.oreilly.com/catalog/pythonsl/chapter/ch05.html#t4

    Cheers,

    Bernard.



    Bernard Delmée Guest
