Lydia Shawn #1
extract from html
hi,
how can i extract the number between text1 and text2 in input.html
only the first time it occurs ignoring the rest?
preferably input.html would be a URL that stops downloading once a
match has occurred, that would save a lot of bandwidth..
i guess HTML::Parser would provide an option to work with a file while
it's downloading (?)
example
----
input.html:
bla..
text1 555 text2
bla
bla
text1 6000 text2
bla
EOF
output.txt
555
thanks for your help,
peter
Lydia Shawn Guest
-
Simon Taylor #2
Re: extract from html
Hello Peter,
> how can i extract the number between text1 and text2 in input.html
> only the first time it occurs ignoring the rest?
> preferably input.html would be a URL that stops downloading once a
> match has occurred, that would save a lot of bandwidth..
> i guess html::parser would provide an option to work with a file while
> it's downloading (?)
>
> example
> ----
>
> input.html:
>
> bla..
> text1 555 text2
> bla
> bla
> text1 6000 text2
> bla
> EOF
>
>
> output.txt
> 555

Assuming you mean 'text1' and 'text2' are HTML tags, then the following
example (which is straight out of the HTML::Parser documentation) will
do it for you. This example prints out the title text of an HTML page if
you supply the page as a filename on the command line, so just change
the word "title" to the tag name you require:
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser ();

# Called for every start tag; ignores everything except <title>.
sub start_handler
{
    return if shift ne "title";
    my $self = shift;
    # Inside <title>: print the decoded text as it is parsed.
    $self->handler(text => sub { print shift }, "dtext");
    # Stop parsing as soon as the closing </title> is seen.
    $self->handler(end  => sub { shift->eof if shift eq "title"; },
                           "tagname,self");
}

my $p = HTML::Parser->new(api_version => 3);
$p->handler( start => \&start_handler, "tagname,self");
$p->parse_file(shift || die) || die $!;
print "\n";
Simon Taylor Guest
-
Tad McClellan #3
Re: extract from html
[ comp.lang.perl is not a Newsgroup. Removed ]
Lydia Shawn <apfeloma@hotmail.com> wrote:
> Subject: extract from html
Your post is not about extracting from HTML at all, so that
seems a strange choice of Subject...
> how can i extract the number between text1 and text2 in input.html
> only the first time it occurs ignoring the rest?

> input.html:
>
> bla..
> text1 555 text2
> bla
> bla
> text1 6000 text2
> bla
> EOF
No HTML there!
If you read it all into a scalar, then you can just do this pattern
match on the scalar:
/text1 (.*?) text2/
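A minimal sketch of that approach, assuming the whole input is small enough to hold in one scalar. The sub name first_between and the tightening of the pattern to digits (\d+) are illustrative choices, not something from the thread; the sample data is copied from the original question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the first number found between the literal
# markers "text1" and "text2", or undef if there is no match.
sub first_between {
    my ($content) = @_;
    # Without /g, a match in list context returns only the first hit.
    my ($number) = $content =~ /text1\s+(\d+)\s+text2/;
    return $number;
}

# Sample data copied from the original question.
my $content = join "\n", 'bla..', 'text1 555 text2', 'bla', 'bla',
                         'text1 6000 text2', 'bla', '';

my $first = first_between($content);
print defined $first ? "$first\n" : "no match\n";
```

Because the match is done without the /g modifier, the second occurrence (6000) is never looked at, so the script prints 555 for this input.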
--
Tad McClellan SGML consulting
tadmc@augustmail.com Perl programming
Fort Worth, Texas
Tad McClellan Guest
-
Michael Korte #4
Re: extract from html
"Lydia Shawn" <apfeloma@hotmail.com> wrote in message
news:1240b4dc.0308051647.685dde59@posting.google.com...
> hi,
> how can i extract the number between text1 and text2 in input.html
> only the first time it occurs ignoring the rest?

I would solve this problem by using a hash. You can just put a unique key
into it; if the same term is found again it will be overwritten, or you can
ask the hash whether the term already exists:
# $term is taken from your text, in between text1 / text2
if (exists $myHash{$term})
{
    # ignore
}
else
{
    $myHash{$term} = $value;
}
The rest of your question: I don't know ... sorry

> thanks for your help,
> peter

no prob... but what is your real name?
"Lydia Shawn" or Peter :-)
HTH
greets Michael
Michael Korte Guest
-
Lydia Shawn #5
Re: extract from html
> Your post is not about extracting from HTML at all, so that
> seems a strange choice of Subject...

> No HTML there!
>
> If you read it all into a scalar, then you can just do this pattern
> match on the scalar:
>
> /text1 (.*?) text2/

yes there is no html in my example,
my question is more about the function of HTML::Parser working with a
file and matching things as the file is coming in, and stopping after
the first match has occurred.. to prevent needless downloading. how can
i do that?
thanks a lot,
peter
Lydia Shawn Guest
-
Lydia Shawn #6
Re: extract from html
> Assuming you mean 'text1' and 'text2' are html tags, then the following
> example, (which is straight out of the HTML::Parser documentation), will
> do it for you. This example prints out the title text of a html page if
> you supply the page as a filename on the command line, so just change
> the word "title" to the tag name you require:

thanks simon,
but the text1/text2 is actual text that occurs within <TD> tags which
are all over the document...
the reason i would like to use HTML::Parser to do the job is its
feature, at least the way i understand it, to start matching things
before the whole file is read in. then, after the match, it should
stop in order to save bandwidth,
thanks again,
peter
Lydia Shawn Guest
-
Brian Helterline #7
Re: extract from html
"Lydia Shawn" <apfeloma@hotmail.com> wrote in message
news:1240b4dc.0308051647.685dde59@posting.google.com...
> hi,
> how can i extract the number between text1 and text2 in input.html
> only the first time it occurs ignoring the rest?
> preferably input.html would be a URL that stops downloading once a
> match has occured, that would save a lot of bandwidth..
> i guess html::parser would provide an option to work with a file while
> it's downloading (?)

Take a look at the lwp-download script (in your perl bin directory)
as an example of a program that incrementally downloads a URL.
You can then search the contents for your text1 and text2 and stop if found.
The script uses LWP::UserAgent to do the download.
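A rough sketch of that idea, not taken from lwp-download itself: LWP::UserAgent's get() accepts a ':content_cb' callback that receives the body chunk by chunk, and dying inside that callback aborts the request. The helper names make_matcher and fetch_first_number, the \d+ pattern and the 1024-byte size hint are assumptions made for illustration. How much bandwidth is really saved depends on how much data is already in flight when the transfer is aborted.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build a callback that buffers incoming chunks and
# dies with a marker string as soon as the pattern matches.
sub make_matcher {
    my ($result_ref) = @_;
    my $buffer = '';
    return sub {
        my ($chunk) = @_;
        $buffer .= $chunk;    # keep everything: a match may span two chunks
        if ($buffer =~ /text1\s+(\d+)\s+text2/) {
            $$result_ref = $1;
            die "match found\n";    # dying in the callback aborts the transfer
        }
        return;
    };
}

# Hypothetical helper: fetch $url incrementally and return the first
# number between text1 and text2, or undef if it never appears.
sub fetch_first_number {
    my ($url) = @_;
    require LWP::UserAgent;    # loaded only when a download is requested
    my $found;
    my $ua = LWP::UserAgent->new;
    # ':content_cb' hands each chunk to the callback as it arrives;
    # ':read_size_hint' suggests a chunk size in bytes.
    $ua->get($url,
        ':content_cb'     => make_matcher(\$found),
        ':read_size_hint' => 1024,
    );
    return $found;
}

# Usage: perl first_number.pl http://example.com/input.html
if (@ARGV) {
    my $number = fetch_first_number($ARGV[0]);
    print defined $number ? "$number\n" : "no match\n";
}
```

The buffer is kept whole rather than matched chunk by chunk because "text1 555 text2" can be split across two reads. If the text has to be located relative to tags such as <TD>, the same callback could pass each chunk to HTML::Parser's parse() method instead of a regex.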
--
brian
Brian Helterline Guest




