Bart Van Loon #1
regular expression question
Hello,
is it possible to make a regular expression match for the following
situation:
I have a string, looking like 'foobarbarbar'. I don't know what foo is,
nor what bar is, the only thing I know is that I have a string X,
concatenated an undefined number of times with a string Y. My goal is to
find out how many times this string Y (bar) is repeated, without knowing
what it exactly is.
Something like ^.*(.*)*$ maybe?
greetings,
BBBart
Bart Van Loon Guest
-
Greg Bacon #2
Re: regular expression question
In article <UbNVa.749247$V9.63522550@amsnews02.chello.com>,
Bart Van Loon <bbbart@kotnet.org> wrote:
: I have a string, looking like 'foobarbarbar'. I don't know what foo
: is, nor what bar is, the only thing I know is that I have a string X,
: concatenated an undefined number of times with a string Y. My goal is
: to find out how many times this string Y (bar) is repeted, without
: knowing what it exactly is.
% cat try
#! /usr/local/bin/perl
use warnings;
use strict;
my $X = 'foo';
my $Y = 'bar';
my $re = qr/^ $X ($Y)+ $/x;
for (qw/ foo foobarbarbar foobar barbarfoo /) {
    if (/$re/) {
        print "Match for [$_]\n";
    }
    else {
        print "No match for [$_]\n";
    }
}
% ./try
No match for [foo]
Match for [foobarbarbar]
Match for [foobar]
No match for [barbarfoo]
You may need to use quotemeta, depending on how you want to match.
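For readers following along outside Perl, here is a minimal Python sketch of the same idea. It assumes, as the script above does, that X and Y are already known; `re.escape` stands in for quotemeta, and the helper name `count_repeats` is made up for illustration.

```python
import re

def count_repeats(text, x, y):
    """Return how many times y follows x in text, or None if text is not x + y*n (n >= 1)."""
    if not y:
        raise ValueError("y must be non-empty")
    # re.escape neutralises regex metacharacters in x and y (Perl: quotemeta).
    pattern = re.compile(re.escape(x) + "(?:" + re.escape(y) + ")+")
    if pattern.fullmatch(text) is None:
        return None
    # Everything after x is made of whole copies of y, so divide the lengths.
    return (len(text) - len(x)) // len(y)

for s in ["foo", "foobarbarbar", "foobar", "barbarfoo"]:
    print(s, count_repeats(s, "foo", "bar"))
```

Without the escaping step, a Y such as `.b` would be treated as a pattern rather than as literal text.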
Hope this helps,
Greg
--
I'm proud to be a taxpaying American, but I could be just as proud for
half the money!
-- Will Rogers
Greg Bacon Guest
-
Sam Holden #3
Re: regular expression question
On Wed, 30 Jul 2003 10:35:32 GMT, Bart Van Loon <bbbart@kotnet.org> wrote:
> Hello,
>
> is it possible to make a regular expression match for the following
> situation:
>
> I have a string, looking like 'foobarbarbar'. I don't know what foo is,
> nor what bar is, the only thing I know is that I have a string X,
> concatenated an undefined number of times with a string Y. My goal is to
> find out how many times this string Y (bar) is repeted, without knowing
> what it exactly is.
>
> Something like ^.*(.*)*$ maybe?
What's the point of a regular expression which matches all
strings (assuming /s is used)?
And do you use (.*)* because you want some extra added slowness?
My naive approach would be to use something like (assuming "undefined
number" means 2 or more - since otherwise it doesn't make sense):
$_='foobarbarbar';
if (/^(.+?)((.+)\3+)$/s) {
print "X is $1\n";
print "Y is $3\n";
print "There are " . length($2)/length($3) . " repetitions of Y\n";
}
But that fails to match. However, /^(foo)((.+)\3+)$/ matches (and is what
I tested the code with).
Since (.+?) can match "foo", I think the first regular expression should
match (after a fair bit of backtracking - guess the second .+ might be
more efficient as .+?, but that's beside the point).
echo "foobarbarbar" | egrep '^(.+)((.+)\3+)$'
outputs "foobarbarbar" (I realise that \1 will be "foobar" and \2 "barbar"
so the match is slightly different), so it seems to do what I would expect.
Is this a bug in perl? Or is my knowledge of regexes faulty?
I suspect the second, but (other than embarrassment) it can't hurt to ask.
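As a cross-check in another backtracking engine, here is a small Python sketch using the same pattern. It only shows what Python's `re` module does with it and says nothing about the perl build used above; the function name `split_xy` is hypothetical.

```python
import re

# Same pattern as above: lazy X, then a unit Y followed by one or more copies of itself.
PATTERN = re.compile(r"^(.+?)((.+)\3+)$", re.DOTALL)

def split_xy(text):
    """Return (X, Y, repetitions of Y) or None when the pattern does not match."""
    m = PATTERN.match(text)
    if m is None:
        return None
    x, repeated, y = m.group(1), m.group(2), m.group(3)
    return x, y, len(repeated) // len(y)

print(split_xy("foobarbarbar"))
```

Because the third group is greedy, a string with four copies of "bar" comes back as two copies of "barbar" rather than four of "bar".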
--
Sam Holden
Sam Holden Guest
-
Janek Schleicher #4
Re: regular expression question
Bart Van Loon wrote at Wed, 30 Jul 2003 10:35:32 +0000:
> is it possible to make a regular expression match for the following
> situation:
>
> I have a string, looking like 'foobarbarbar'. I don't know what foo is,
> nor what bar is, the only thing I know is that I have a string X,
> concatenated an undefined number of times with a string Y. My goal is to
> find out how many times this string Y (bar) is repeted, without knowing
> what it exactly is.
>
> Something like ^.*(.*)*$ maybe?
perl -e '$_ = "foobarbarbar"; /((.*)\2+)$/; print length($1)/length($2)'
Greetings,
Janek
Janek Schleicher Guest
-
Jay Flaherty #5
Re: regular expression question
Janek Schleicher wrote:
> perl -e '$_ = "foobarbarbar"; /((.*)\2+)$/; print length($1)/length($2)'
perl -e '$_ = "foobarbarbarbar"; /((.*)\2+)$/; print length($1)/length($2)'
returns 2 (barbar x 2)
So it seems to work with odd number of concats only.
Jay
Jay Flaherty Guest
-
Janek Schleicher #6
Re: regular expression question
Jay Flaherty wrote at Thu, 31 Jul 2003 13:14:50 -0400:
> Janek Schleicher wrote:
>> perl -e '$_ = "foobarbarbar"; /((.*)\2+)$/; print length($1)/length($2)'
> perl -e '$_ = "foobarbarbarbar"; /((.*)\2+)$/; print length($1)/length($2)'
> returns 2 (barbar x 2)
> So it seems to work with odd number of concats only.
No, it only returns prime numbers.
In fact "barbarbarbar" is "barbar" x 2.
And only prime number occurrences can't be divided:
But if it is meant that the shortest repeating part should be counted,
you can use instead the non greedy version:
perl -e '$_ = "foobarbarbarbar"; /((.*?)\2+)$/; print length($1)/length($2)'
^
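The difference between the two one-liners can be reproduced with a short Python sketch. The function name `count_tail_repeats` is an assumption for illustration. It also guards the case where nothing repeats, in which the pattern matches the empty string at the end and the division would be by zero.

```python
import re

def count_tail_repeats(text, lazy):
    """Count repetitions of the unit that ends the string; None if nothing repeats."""
    unit = "(.*?)" if lazy else "(.*)"
    # Group 1 is the whole repeated tail, group 2 is one unit of it.
    m = re.search("(" + unit + r"\2+)$", text)
    if m is None or not m.group(2):
        return None
    return len(m.group(1)) // len(m.group(2))

for s in ["foobarbarbar", "foobarbarbarbar"]:
    print(s, count_tail_repeats(s, lazy=False), count_tail_repeats(s, lazy=True))
```

The greedy form prefers the longest unit and so reports the smallest divisor of the true count, while the lazy form prefers the shortest unit and reports the full count.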
Greetings,
Janek
Janek Schleicher Guest
-
Dan Phiffer #7
Regular expression question
So I want to grab the attributes out of an HTML element. The following
works, except in the case that the attribute's value includes the character
">":
if (preg_match_all("/<tag([^>]*)>/i", $subject, $matches))
print_r($matches);
A $subject of "<tag attr=\"value\">" gives:
Array
(
[0] => Array
(
[0] =>
)
[1] => Array
(
[0] => attr="value"
)
)
A $subject of "<tag attr=\">\">" gives:
Array
(
[0] => Array
(
[0] =>
Thanks for any help,
-Dan
Dan Phiffer Guest
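One common way around the problem described above is to let the attribute section consume whole quoted strings, so that a ">" inside quotes does not end the tag. The Python sketch below illustrates that idea only; it is not taken from the thread, the name `tag_attributes` is hypothetical, and it assumes attribute values are quoted with matching single or double quotes.

```python
import re

# Inside the tag, accept either a complete quoted string or any single
# character that is not ">" and not a quote.
TAG = re.compile(r"""<tag((?:"[^"]*"|'[^']*'|[^>"'])*)>""", re.IGNORECASE)

def tag_attributes(subject):
    """Return the raw attribute text of every <tag ...> element in subject."""
    return TAG.findall(subject)

print(tag_attributes('<tag attr="value">'))
print(tag_attributes('<tag attr=">">'))
```

For anything beyond simple, well-formed input, an HTML parser is likely to be more reliable than a regular expression.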
-
Simon Taylor #8
Re: Regular Expression Question
Awrigh01 wrote:
> I have been working with regular expressions and was wondering if
> anyone can help me tackle this problem.
>
> I was wondering if anyone could help me with a regular expression to
> strip the following two text patterns:
>
> Bob Jones Comm'n College, Inc.,
> Tufts University,
>
> I believe that this pattern should strip words that begin with capital
> letters:
> [A-Z]\w*(\s+[A-Z]\w*)*
>
> Could someone help me with a pattern than would get both words that
> begin with capital letters or a non-word character like a "," or "."
I think it would help if you were to refine the question a little more.
Do I understand correctly that you want to strip away lines that begin
with text like:
Bob Jones Comm'n College, Inc.,
Tufts University,
Put more generally, do you want to strip away lines that contain
words that always begin with uppercase letters?
Do these kind of lines occur at the same place in each file?
What kind of data do you *not* want to strip away?
Hope this helps,
Look forward to seeing your follow up.
Simon Taylor
Simon Taylor Guest
-
Awrigh01 #9
Re: Regular Expression Question
Sorry about not being clear. I appreciate the help so far. I am
searching through a text file read through line by line.
I want to be able to identify word patterns that occur consecutively
in the same line, such as: A Man Walks Down The Street. (including the
spaces)
This regular expression identifies this pattern:
[A-Z]\w*(\s+[A-Z]\w*)*.
However, there is another twist, sometimes the pattern will include a
comma, dash, hyphen, an "&", or a period, such as Johnson & Johnson
CauLife Ins. Co.
I can't figure out how to generally identify both "A Man Walks Down
The Street" and "Johnson & Johnson CauLife Ins. Co." in one regular
expression.
I hope this is clearer, and thanks again for any help.
Thanks.
Simon Taylor <simon@unisolve.com.au> wrote in message news:<bhsl32$ni6$1@otis.netspace.net.au>...
> Awrigh01 wrote:
> > I have been working with regular expressions and was wondering if
> > anyone can help me tackle this problem.
> >
> > I was wondering if anyone could help me with a regular expression to
> > strip the following two text patterns:
> >
> > Bob Jones Comm'n College, Inc.,
> > Tufts University,
> >
> > I believe that this pattern should strip words that begin with capital
> > letters:
> > [A-Z]\w*(\s+[A-Z]\w*)*
> >
> > Could someone help me with a pattern than would get both words that
> > begin with capital letters or a non-word character like a "," or "."
> I think it would help if you were to refine the question a litle more.
>
> Do I understand correctly that you want to strip away lines that begin
> with text like:
>
> Bob Jones Comm'n College, Inc.,
> Tufts University,
>
> Put more generally, do you want to strip away lines that contain
> words that always begin with uppercase letters?
>
> Do these kind of lines occur at the same place in each file?
>
> What kind of data do you *not* want to strip away?
>
> Hope this helps,
>
> Look forward to seeing your follow up.
>
> Simon Taylor
Awrigh01 Guest
-
Simon Taylor #10
Re: Regular Expression Question
Awrigh01 wrote:
> Sorry about not being clear. I appreciate the help so far. I am
> searching through a text file read through line by line.
>
> I want to be able to identify word patterns that occur consecutively
> in the same line, such as: A Man Walks Down The Street. (including the
> spaces)
>
> This regular expression identifies this pattern:
> [A-Z]\w*(\s+[A-Z]\w*)*.
>
> However, there is another twist, sometimes the pattern will include a
> comma, dash, hyphen, an "&", or a period, such as Johnson & Johnson
> CauLife Ins. Co.
>
> I can't figure out how to generally identify both "A Man Walks Down
> The Street" and "Johnson & Johnson CauLife Ins. Co." in one regular
> expression.
Here is a snippet of code that matches lines in a file that contain only
words that start with an upper-case character, as well as the other cases
you've mentioned.
#!/usr/bin/perl -w
use strict;
while (<>) {
    print " line: $_";
    print "matched: $_" if ($_ =~ m/^([A-Z&]\S*\s*)+$/);
}
and here is the sample data file I used:
Cats
Cats And Dogs
Ignore this line?
A Man Walks Down The Street
Johnson & Johnson CauLife Ins. Co.
johnson & johnson CauLife Ins. Co. - Gets Ignored
Ignore this line, also?
Should, See This Line
Should-see This Line
It works as I think you intend.
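For anyone who wants to try the same pattern outside Perl, here is a minimal Python sketch run against the sample data above. The names `CAPITALISED_LINE`, `SAMPLE` and `is_capitalised_line` are illustrative, and the lines are assumed to have had their trailing newline removed.

```python
import re

# Every whitespace-separated chunk must start with an upper-case letter or "&".
CAPITALISED_LINE = re.compile(r"^(?:[A-Z&]\S*\s*)+$")

SAMPLE = [
    "Cats",
    "Cats And Dogs",
    "Ignore this line?",
    "A Man Walks Down The Street",
    "Johnson & Johnson CauLife Ins. Co.",
    "johnson & johnson CauLife Ins. Co. - Gets Ignored",
    "Ignore this line, also?",
    "Should, See This Line",
    "Should-see This Line",
]

def is_capitalised_line(line):
    """True when the whole line consists of chunks starting with A-Z or '&'."""
    return CAPITALISED_LINE.match(line) is not None

for line in SAMPLE:
    print("matched:" if is_capitalised_line(line) else "   line:", line)
```

Note that "Should-see This Line" matches because `\S*` swallows the hyphen and the lower-case letters that follow it.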
However, I can't help but wondering whether we are building a regular
expression that is in danger of being too-specific and therefore a bit
fragile.
Sometimes it pays to think negatively with RE's. Identify the stuff that
you don't want to match and do away with it first. This often results in
code that runs faster and the analysis that it forces you to do helps
clear up what it is you're really after.
Hope this helps.
Also check out
perldoc perlre
Regards,
Simon Taylor
Simon Taylor Guest
-
Purl Gurl #11
Re: Regular Expression Question
Simon Taylor wrote:
(snipped)
> Awrigh01 wrote:
> > I want to be able to identify word patterns that occur consecutively
> > in the same line, such as: A Man Walks Down The Street. (including the
> > spaces)
> > This regular expression identifies this pattern:
> > [A-Z]\w*(\s+[A-Z]\w*)*.
> > However, there is another twist, sometimes the pattern will include a
> > comma, dash, hyphen, an "&", or a period, such as Johnson & Johnson
> > CauLife Ins. Co.
> Here is a snippet of code
> while (<>) {
> print " line: $_";
> print "matched: $_" if ($_ =~ m/^([A-Z&]\S*\s*)+$/);
> }
> However, I can't help but wondering whether we are building a regular
> expression that is in danger of being too-specific and therefore a bit
> fragile.
Yours is a rather robust regex within parameters stated.
I really had to grasp at straws to find a code breaker,
at least reasonably within parameters. Nice regex is yours.
Beneath my signature is a "thinking outside the box" model
which is klunky and certainly less efficient than yours.
Mine is a mathematical model.
This counting method would be much easier if Perl's /regex/
match function could return a count instead of a match.
My method will fail if leading or trailing spaces are present,
or if multiple spaces are present. Within parameters, it does
ok but is not very efficient.
You will note I changed your data around a bit. Your method
and my method return slightly different results. Nonetheless,
your regex is a preferred method along with being easy-to-read.
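In a language where a match function does return a list that can be counted, the counting model is short. The Python sketch below is only an illustration of that idea: `looks_capitalised` is a made-up name, and the line is assumed to have no trailing newline.

```python
import re

def looks_capitalised(line):
    """Counting model: every space must be followed by a non-lower-case character."""
    if not re.match(r"[A-Z]", line):
        return False
    spaces = line.count(" ")
    # findall returns all non-overlapping matches, so len() gives the count directly.
    followed_by_non_lower = len(re.findall(r" [^a-z]", line))
    return spaces == followed_by_non_lower

for line in ["Cats - Dogs", "Ignore this Line, also?", "Should-see This Line"]:
    print(looks_capitalised(line), line)
```

As noted above, doubled spaces break the model: the second space is consumed as the "non-lower-case character" of the first, so the two counts no longer agree.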
Purl Gurl
--
#!perl
while (<DATA>)
{
if (($_ =~ /^[A-Z]/) && ($_ =~ tr/ // == (() = $_ =~ / [^a-z]/g)))
{ print $_; }
}
__DATA__
Cats
Cats AND Dogs
Cats - Dogs
Cats-Dogs
Ignore this line?
A Man Walks Down The Street
Johnson & Johnson CauLife Ins. Co.
johnson & johnson CauLife Ins. Co. - Gets Ignored
Ignore this Line, also?
Should, See This Line
Should-see This Line
Purl Gurl Guest
-
Harry Ohlsen #12
Regular expression question
I'm sure this is trivial, but I don't have my Mastering Regular Expressions handy (and I haven't put sufficient effort into getting through it!).
I have a string that looks something like ...
>>"Guido van Rossum", "Larry Wall", "Matz"<<
(ie, it's the set of quoted strings between the markers). What I'd like to do is split the string up into an array containing the three quoted items (with or without the quote marks).
At some point, I'd probably also like to use something like \" to represent a quote within a string.
I'm sorry to say that I'm actually having to do this in Java (it's a work thing), so I'll have to do some mangling of whatever the "correct" regex is, but that's OK ... well, it's not, but I'll have to live with it :-).
Cheers,
H.
Harry Ohlsen Guest
-
Woodhouse, Mike (ANTS) #13
Re: Regular expression question
Typing straight from Mr Friedl, this is the java.util.regex version for
CSV...
Pattern pCSVMain = Pattern.compile(
"\\G(?:^|,) \n"+
"(?: \n"+
" # Either a double-quoted field... \n"+
" \" #field's opening quote \n"+
" ( (?> [^\"]*+ ) (?> \"\" [^\"]*+ ) *+ ) \n"+
" \" # field's closing quote \n"+
" # ... or ... \n"+
" | \n"+
" # ... some non-quote/non-comma text ... \n"+
" ( [^\",]*+ ) \n"+
") \n",
Pattern.COMMENTS);
I hope I haven't made any transcription errors; I'm not very
Java-literate...
While I love the book, I sometimes think some kind of "Regexp Cookbook"
might be a big timesaver as I hunt for the thing I need, not being enough of
a regular Regexp user or having enough brainpower to keep stuff in my head.
HTH,
Mike
> -----Original Message-----
> From: Harry Ohlsen [mailto:harryo@qiqsolutions.com]
> Sent: 21 August 2003 08:54
> To: ruby-talk@ruby-lang.org
> Subject: Regular expression question
>
>
> I'm sure this is trivial, but I don't have my Mastering
> Regular Expressions handy (and I haven't put sufficient
> effort into getting through it!).
>
> I have a string that looks something like ...
>
> >>"Guido van Rossum", "Larry Wall", "Matz"<<
>
> (ie, it's the set of quoted strings between the markers).
> What I'd like to do is split the string up into an array
> containing the three quoted items (with or without the quote marks).
>
> At some point, I'd probably also like to use something like
> \" to represent a quote within a string.
>
> I'm sorry to say that I'm actually having to do this in Java
> (it's a work thing), so I'll have to do some mangling of
> whatever the "correct" regex is, but that's OK ... well, it's
> not, but I'll have to live with it :-).
>
> Cheers,
>
> H.
>
>
>
Woodhouse, Mike (ANTS) Guest
-
Xavier Noria #14
Re: Regular expression question
On Thursday 21 August 2003 13:23, dblack@superlink.net wrote:
> Something like this (tweaked as needed) should work:
>
> irb(main):006:0> str
> => "\"Guido van Rossum\", \"Larry Wall\", \"Matz\""
> irb(main):007:0> str.scan(/".+?"/)
> => ["\"Guido van Rossum\"", "\"Larry Wall\"", "\"Matz\""]
Harry asked for support for \"s inside strings as well. For that a bit
more of work is needed:
test = '>>"Guido", "Larry", "Matz", "foo \"bar\""<<'
puts test.scan(/"(?:\\.|[^"\\])*"/)
gives
"Guido"
"Larry"
"Matz"
"foo \"bar\""
A post-process of the result would unescape those \" easily if wanted.
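A minimal Python sketch of the same scan plus the unescaping step mentioned above, for comparison. The function name `quoted_items` is hypothetical, and the sketch assumes a backslash always escapes exactly the one character that follows it.

```python
import re

# A quoted item: an opening quote, then escaped characters or anything that
# is neither a quote nor a backslash, then the closing quote.
QUOTED = re.compile(r'"((?:\\.|[^"\\])*)"')

def quoted_items(text):
    """Return the quoted items without their quote marks, with escapes removed."""
    return [re.sub(r"\\(.)", r"\1", body) for body in QUOTED.findall(text)]

test = r'>>"Guido", "Larry", "Matz", "foo \"bar\""<<'
print(quoted_items(test))
```

Capturing the text between the quote marks, rather than the whole match, covers the "with or without the quote marks" part of the original question.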
-- fxn
Xavier Noria Guest
-
Harry Ohlsen #15
Re: Regular expression question
Woodhouse, Mike (ANTS) wrote:
>Typing straight from Mr Friedl, this is the java.util.regex version for
>CSV...
>
>Pattern pCSVMain = Pattern.compile(
>"\\G(?:^|,) \n"+
>"(?: \n"+
>" # Either a double-quoted field... \n"+
>" \" #field's opening quote \n"+
>" ( (?> [^\"]*+ ) (?> \"\" [^\"]*+ ) *+ ) \n"+
>" \" # field's closing quote \n"+
>" # ... or ... \n"+
>" | \n"+
>" # ... some non-quote/non-comma text ... \n"+
>" ( [^\",]*+ ) \n"+
>") \n",
>Pattern.COMMENTS);
>
Thanks.
Whew! I remember reading the (probably perl) equivalent of that when I
made my first pass through the book, but had forgotten how tricky it was
:-).
Thanks for that. I'll give it a go when I get to work this morning.
>I hope I haven't made any transcription errors; I'm not very
>Java-literate...
>
Not to worry, I'll hunt out my copy of the book over the weekend (it's
in one of the many boxes of books in the garage since my last move).
>While I love the book, I sometimes think some kind of "Regexp Cookbook"
>might be a big timesaver as I hunt for the thing I need, not being enough of
>a regular Regexp user or having enough brainpower to keep stuff in my head.
>
Yes, regexes are a classic case where a cookbook would probably be
invaluable. I must admit, though, that generally I can get a regex for
most things with minimal effort. It's just when you have things like
this where the alternative is splitting the problem into a bunch of
regexes that having a recipe like that would be very very handy.
Cheers,
Harry O.
Harry Ohlsen Guest
-
Harry Ohlsen #16
Re: Regular expression question
Xavier Noria wrote:
>On Thursday 21 August 2003 13:23, dblack@superlink.net wrote:
>
>>Something like this (tweaked as needed) should work:
>>
>> irb(main):006:0> str
>> => "\"Guido van Rossum\", \"Larry Wall\", \"Matz\""
>> irb(main):007:0> str.scan(/".+?"/)
>> => ["\"Guido van Rossum\"", "\"Larry Wall\"", "\"Matz\""]
>>
>Harry asked for support for \"s inside strings as well. For that a bit
>more of work is needed:
>
> test = '>>"Guido", "Larry", "Matz", "foo \"bar\""<<'
> puts test.scan(/"(?:\\.|[^"\\])*"/)
>
>gives
>
> "Guido"
> "Larry"
> "Matz"
> "foo \"bar\""
>
Thanks to both of you. I like that a lot better than the long one from
MRE that Mike posted, but I guess I'll have to try some complex cases to
see that it works 100%.
Cheers,
Harry O.
Harry Ohlsen Guest
-
Tad McClellan #17
Re: Regular Expression Question
Mike Flannigan <mikeflan@earthlink.net> wrote:
> Simon Taylor wrote:
>> print "matched: $_" if ($_ =~ m/^([A-Z&]\S*\s*)+$/);
> I just have to ask, what is that '&' in [A-Z&] doing?
Matching an ampersand character.
The whole character class will match a single character, as long
as it is one of those 27 characters.
> Is it just matching anything that starts with '&' in addition
> to A-Z? I suspect not - I suspect it's doing something else.
No, the ampersand is not doing that.
The ampersand and the character class and the anchor are doing that though.
What a component of a regex does depends on the entire regex.
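The "27 characters" point is easy to check mechanically. This small Python sketch uses illustrative variable names and assumes ASCII input; it lists every character that the class accepts and then shows the class combined with an anchor.

```python
import re

CLASS = re.compile(r"[A-Z&]")

# Test each ASCII character on its own against the character class.
members = [chr(i) for i in range(128) if CLASS.fullmatch(chr(i))]
print(len(members), "".join(members))

# With the anchor in front, the class decides what the string may start with.
print(bool(re.match(r"^[A-Z&]", "& Sons")))
print(bool(re.match(r"^[A-Z&]", "a & b")))
```

On its own the class only ever matches one character; it is the surrounding pattern that turns it into "starts with".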
--
Tad McClellan SGML consulting
tadmc@augustmail.com Perl programming
Fort Worth, Texas
Tad McClellan Guest
-
regular expression question
In the following regular expression:
$_="BCADeFGHIJKL";
if (/\L[\w]\E/) { print "true"} else { print "false\n" };
if (/\U[\w]\E/) { print "true"} else { print "false\n" };
why does the first one print true while the second one prints false?
Can someone explain \L \U?
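One reading, offered as a sketch and not as an answer confirmed in this thread: in Perl, \L and \U are case-conversion escapes applied to the pattern text as it is interpolated, up to the \E. Under that reading, `\U[\w]\E` is compiled as `[\W]`, which matches only non-word characters, and the string contains none. Python has no such escapes, so the conversion is simulated below with `str.lower` and `str.upper`.

```python
import re

subject = "BCADeFGHIJKL"
pattern_text = r"[\w]"

# Simulate \L...\E and \U...\E by changing the case of the pattern text itself.
lowered = pattern_text.lower()   # still [\w]: any word character
uppered = pattern_text.upper()   # becomes [\W]: any NON-word character

print(lowered, bool(re.search(lowered, subject)))
print(uppered, bool(re.search(uppered, subject)))
```

If the goal was to match a lower-case or upper-case letter, a class such as `[a-z]` or `[A-Z]` says that directly.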
Guest
-
EllenM #19
Regular Expression Question
Hello, I need to have my span tags on one line. I presently have something
like this: <span class='whatever'> text1 text2 </span> when I'd like: <span
class='whatever'>text1 text2</span> I'm using DW MX 2004. Thank you in
advance, Ellen :)
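The transformation being asked for can be sketched as a regular-expression substitution. The Python below is only an illustration of the pattern, not Dreamweaver's own find-and-replace syntax; the function name `tidy_spans` is made up, and it assumes span elements are not nested.

```python
import re

# Opening span tag, the shortest possible content, then the closing tag.
SPAN = re.compile(r"(<span\b[^>]*>)(.*?)(</span>)", re.DOTALL | re.IGNORECASE)

def tidy_spans(html):
    """Put each span on one line: trim its content and collapse inner whitespace."""
    def fix(match):
        inner = " ".join(match.group(2).split())
        return match.group(1) + inner + match.group(3)
    return SPAN.sub(fix, html)

messy = "<span class='whatever'>\n   text1\n   text2\n</span>"
print(tidy_spans(messy))
```

Collapsing whitespace this way would also alter spans whose spacing is significant, so it is worth trying on a copy of the file first.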
EllenM Guest




