Regular expression help


  1. #1

    REGULAR EXPRESSION HELP

    Hi,

    I'm pretty new to regular expressions. Until now I have written long-winded
    and buggy segments of code with PHP's string functions to extract text. I
    want to learn how to use Perl-style regexes, as they seem useful to know, so
    I ordered "Mastering Regular Expressions". It hasn't arrived yet, so I'm
    asking for help...

    I need to match a pattern, not in a single line but in an HTML page, which
    obviously has loads of lines. I need to match 2 lines from this HTML page:
    1) <HTML><TITLE>FirstVariable - Second Variable</TITLE></HTML>........
    2) <TABLE><TD><TR>(newline)
    ThirdVariable</TR></TD></TABLE>...

    I tried this code:
    1) preg_match("/<HTML><TITLE>(\S+) - (\S+)</TITLE></HTML>/", $html_page,
    $variables);
    2) preg_match("/<TABLE><TD><TR>\n(\S+)</TR></TD></TABLE>/", $html_page,
    $variables);

    The first 2 variables are matched into the $variables array but not the
    third one. Sometimes when the third one is matched, it starts from where I
    want it to start but takes all the text to the end of the HTML document!

    Any ideas? Are there any characters that I should have escaped but
    didn't? All I can think of is that the first line I want to match is on
    the FIRST LINE of the HTML page, so that matches, but the regex can't
    handle the next line because it's way down the page?

    TIA


    John Guest
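
As posted, the patterns use `/` as the delimiter while the HTML itself contains `/` (in `</TITLE>`), so in PHP the pattern ends at the first unescaped slash unless another delimiter is chosen or the slashes are escaped. In addition, `\S+` cannot match a value that contains a space, such as `Second Variable`, and the line break after `<TR>` may be `\r\n` rather than `\n`. A minimal sketch of the same extraction follows, written in Python rather than PHP purely for illustration; the sample page and all variable names are assumptions.

```python
import re

# Hypothetical sample page; note the Windows-style \r\n line endings.
html_page = (
    "<HTML><TITLE>FirstVariable - Second Variable</TITLE></HTML>\r\n"
    "<TABLE><TD><TR>\r\n"
    "ThirdVariable</TR></TD></TABLE>\r\n"
)

# .+? is lazy, so each group stops at the first place the rest can match.
title_re = re.compile(r"<TITLE>(.+?) - (.+?)</TITLE>", re.IGNORECASE | re.DOTALL)

# \s* absorbs whatever line ending follows <TR>.
cell_re = re.compile(
    r"<TABLE><TD><TR>\s*(.+?)\s*</TR></TD></TABLE>", re.IGNORECASE | re.DOTALL
)

title_match = title_re.search(html_page)
cell_match = cell_re.search(html_page)

first, second = title_match.groups()
third = cell_match.group(1)
print(first, "|", second, "|", third)
```

The lazy quantifier is what keeps the third value from running on to the end of the document once the dot is allowed to match newlines.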

  3. #2

    Re: Regular Expression help

    In article <d216aa8d.0307171157.5e63c194@posting.google.com>,
    Joseph <joseph.vanquakebeke@ngc.com> wrote:

    : I am having trouble with a regular expression not working the way I
    : expect it to. My code:
    :
    : [reformatted]
    : if (/^\s*<!--\s*\w*\s*-->|^\s*<!--\s*\w*\s*\w*\s*-->$/) {
    : if (DEBUG) {
    : print "<!-- Some words and spaces --> ",
    : "comment line pattern single line : \n",
    : "\n$_";
    : }
    : # match a multi-line comment that spans 1 or lines.
    : $count++;
    : next;
    : }
    :
    : [snip output]

    In general with Usenet questions, it's almost always better to give a
    complete -- but BRIEF! -- example. The less effort people have to
    expend in understanding your post, the more likely you are to get good
    answers.

    Where did the blank line pattern come from?

    : where are the lines with more than one word. I really do not want to
    : have to write a separate expression for 1 .. N words. I thought the
    : asterisk was supposed to be greedy. I have tried the + but it has the
    : same effect.

    You could write your pattern as

    if (/^\s*<!--\s*[\w\s]*\s*-->$/) {
    ...
    }

    This will look for alternating runs of any length of either whitespace
    or word characters within your comments.

    IMPORTANT: this regular expression will not correctly recognize all XML
    comments and could yield false positives. If you want to parse XML, use
    an XML parser. :-)

    : Can someone explain why this is not working? I think I can fix this
    : if I try to use a character class but I am not sure. I am still
    : reading the camel book to find out.

    The Kleene star is greedy in Perl's regular expression matcher, but
    you're anchoring your match and allowing at most two runs of word
    characters.

    Consider the following comment:

    <!-- a b c -->

    The matcher will try your first alternative (^\s*<!--\s*\w*\s*-->), but
    will only get as far as indicated:

    substring  subpattern
    =========  ==========
    BEGIN      ^
    SPACES     \s*
    <!--       <!--
    SPACE      \s*
    a          \w*
    SPACE      \s*
    b          -->  FAIL!

    Then it'll try the other alternative (^\s*<!--\s*\w*\s*\w*\s*-->$):

    substring  subpattern
    =========  ==========
    BEGIN      ^
    SPACES     \s*
    <!--       <!--
    SPACE      \s*
    a          \w*
    SPACE      \s*
    b          \w*
    SPACE      \s*
    c          -->  FAIL!

    With the pattern I gave (/^\s*<!--\s*[\w\s]*\s*-->$/), it'll match:


    substring  subpattern
    =========  ==========
    BEGIN      ^
    SPACES     \s*
    <!--       <!--
    SPACE      \s*
    a b c      [\w\s]*
    SPACE      \s*
    -->        -->
    END        $

    Hope this helps,
    Greg
    --
    The whole aim of practical politics is to keep the populace alarmed -- and
    thus clamorous to be led to safety -- by menacing it with an endless
    series of hobgoblins, all of them imaginary.
    -- H.L. Mencken
    Greg Bacon Guest
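
The difference between the two patterns can be checked directly. The sketch below uses Python's `re` module instead of Perl (the `\w`, `\s`, `*` and anchor syntax used here behaves the same way in both); the sample comments are made up for illustration.

```python
import re

# The two-alternative pattern from the question and the character-class rewrite.
original = re.compile(r"^\s*<!--\s*\w*\s*-->|^\s*<!--\s*\w*\s*\w*\s*-->$")
revised = re.compile(r"^\s*<!--\s*[\w\s]*\s*-->$")

samples = ["<!-- a -->", "<!-- a b -->", "<!-- a b c -->", "<!-- a-b -->"]

# Map each sample to (matched by original, matched by revised).
results = {
    text: (bool(original.search(text)), bool(revised.search(text)))
    for text in samples
}

for text, (old, new) in results.items():
    print(f"{text!r}: original={old} revised={new}")
```

The last sample shows the limitation mentioned above: a comment containing anything other than word characters and whitespace is rejected by both patterns.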

  4. #3

    Regular Expression Help

    Hi everyone. I have never worked with regular expressions
    before and here is my dilemma:

    I have a textbox that I want to only accept this type of
    input:

    1,2,3-8,10,16 ( Comma separated numbers or Hyphenated
    ranges )

    I want to use a validator for this, so I assume I use a
    RegularExpressionValidator? If so, any help on what I
    would write for a regular expression? Thank you


    Jorell
    Jorell Guest

  5. #4

    Re: Regular Expression Help

    Well, you can interrogate it on a number of levels - it really depends on
    how thoroughly you want to make your validation.

    I mean, you could validate the characters themselves:

    ^[0-9\-,]*$

    What this means: you will have zero or more of the characters 0 through 9, a
    dash, or a comma. This would filter out any other character; however, it
    would also approve the string ,-,,-,,-9-,,

    If you want to get a little bit closer, here is a better regex:

    ^([0-9]+(\-[0-9]+)*,*)+$

    Now, you are required to have one or more numbers, followed by zero or more
    instances of a dash followed by one or more numbers, followed by zero or
    more instances of a comma - this entire sequence being repeated one or more
    times.

    Now, you will match 1,2-9,13. You also guarantee that you will get at least
    one number put in. However, this doesn't prevent you from putting in 9-2,
    which may throw your parser off. It will also validate 0-2-4.

    So, you have to sit and play with exactly what you will allow, and gradually
    keep expanding what you are doing until you have a great validation in place
    that covers every instance that you can come up with.

    If your parser is very particular about how the input comes in, then you may
    want to write a custom validator in code.

    --
    Chris Jackson
    Software Engineer
    Microsoft MVP - Windows XP
    Windows XP Associate Expert
    --

    Chris Jackson Guest
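
To make the trade-offs concrete, here is a sketch in Python of the two patterns above next to a small "custom validator in code". The `STRICT` grammar and the `parse_ranges` and `is_valid` helpers are assumptions made for this example, not part of any validator control; they show one way to reject inputs such as `9-2` and `0-2-4` that the regex alone lets through.

```python
import re

LOOSE = re.compile(r"^[0-9\-,]*$")                # allowed characters only
BETTER = re.compile(r"^([0-9]+(\-[0-9]+)*,*)+$")  # numbers, dashes, commas in order

# Hypothetical stricter grammar: items separated by single commas,
# each item either N or N-M.
STRICT = re.compile(r"[0-9]+(?:-[0-9]+)?(?:,[0-9]+(?:-[0-9]+)?)*")


def parse_ranges(text):
    """Return a list of (start, end) tuples, or raise ValueError."""
    if not STRICT.fullmatch(text):
        raise ValueError(f"malformed range list: {text!r}")
    ranges = []
    for item in text.split(","):
        start, _, end = item.partition("-")
        low = int(start)
        high = int(end) if end else low  # a single number is a one-element range
        if low > high:
            raise ValueError(f"descending range: {item!r}")
        ranges.append((low, high))
    return ranges


def is_valid(text):
    """True when parse_ranges accepts the text."""
    try:
        parse_ranges(text)
    except ValueError:
        return False
    return True
```

For example, `parse_ranges("1,2,3-8,10,16")` yields one tuple per item, while `is_valid("9-2")` is false because the range runs backwards.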

  6. #5

    Re: Regular Expression Help

    Thank you very much!!! Do you happen to know where a
    person can learn a little more about creating those??
    Thanks in advance!!

    Jorell

    Jorell Guest

  7. #6

    Re: Regular Expression Help

    Here are a couple articles to help you get going.

    Look closely at the EvaluationFunction attribute and the EvaluateIsValid
    method (or function)

    http://www.codeproject.com/useritems/DateValidator.asp

    http://www.c-sharpcorner.com/Code/2002/May/DateCustomValidator.asp

    HTH,

    bill

    "Jorell" <zwaricgj@nospam.powervisionsw.com> wrote in message
    news:029e01c35541$2d4a5ff0$a401280a@phx.gbl...
    > Thank you very much!!! Do you happen to know where a
    > person can learn a little more about creating those??
    > Thanks in advance!!
    >
    > Jorell

    William F. Robertson, Jr. Guest

  8. #7

    Regular expression help

    Hi,

    I tried to write a script to extract data from the given DATA but I can't
    find the right regular expression to do that.
    RULE: I need to catch everything between quotes (single or double), and if
    a repeated quote (single or double) appears inside, it must not be seen as
    the end of the match.

    #!/usr/bin/perl -w

    use strict;

    my $begin = '[\",\']';    # matches a double quote, a comma, or a single quote
    my $end = '[\",\']';      # same class for the closing delimiter
    my $pattern = "${begin}(.*?)${end}";

    while (<DATA>) {
        print "line content $_ \n";
        while ( /$pattern/g ) {
            print "line $. found : $1\n";
        }
    }


    __DATA__
    ' data11 '' data12' 'data13'
    " data21 "" data22" "data23"
    " data31 '' data32" "" "data34"
    "data41" "data42" "data43" "__--data44"
    """" "''" '""' ''''



    Result should be
    1:( data11 '' data12)
    2:(data13)
    3:( data21 "" data22)
    4:(data23)
    5:(data31 '' data32)
    6:()
    7:(data34)
    8:(data41)
    9:(data42)
    10:(data43)
    11:(__--data44)
    12:("")
    13:('')
    14:("")
    15:('')


    It could be resolved by replacing the \"{2} or \'{2} with a "~@~" and
    afterwards replacing it back with the previous value, but that is not very
    smart.



    Many thanks in advance

    Michel
    Eurospace Szarindar Guest

  9. #8

    RE: Regular expression help


    Thank you, it works fine. Could you explain why you added the \1?
    I will have a look at Text::CSV.

    Michel

    -----Original Message-----
    From: Rob Anderson [mailto:rjanderson@uk2.net]
    Date: Monday, 18 August 2003 16:39
    To: beginners@perl.org
    Subject: Re: Regular expression help



    "Eurospace Szarindar" <EUROSPACE.SZARINDAR@launchers.eads.net> wrote in
    message news:F3174A44D16DE7478BEFD32644C5C7D601A5D018@mspc 5748.bs.as....
    > Hi,
    >
    > I tried to write a script to extract data from the given DATA but I can't
    > find the right regular expression to do that.
    Hi Szarindar,

    Firstly why are you using this format and trying to parse it yourself? If
    you've got control of your data, use something like Text::CSV. The guy who
    maintains your code will thank you for it.

    Otherwise, this seems to work (you'll have to print with $2 though)...

    /(['"])(((''|"")|[^'"])*?)\1(\s|$)/g

    HTH

    Rob Anderson

    --
    To unsubscribe, e-mail: beginners-unsubscribe@perl.org
    For additional commands, e-mail: beginners-help@perl.org
    Eurospace Szarindar Guest

  10. #9

    Re: Regular expression help

    >
    >"Eurospace Szarindar" <EUROSPACE.SZARINDAR@launchers.eads.net> wrote in
    message >news:F3174A44D16DE7478BEFD32644C5C7D601A5D01C@msp c5748.bs.as...
    >
    >Thank you, it works fine. Could you explain why you added the \1?
    Hi,

    A quick breakdown...


    /(['"])(((''|"")|[^'"])*?)\1(\s|$)/g
     ^^^^^^
       1   ^^^^^^^^^^^^^^^^^^^
                    2
                              ^^
                              3

    1) This picks up either a single or a double quote.

    2) This scoops up any quote pairs, and is careful not
    to match quotes on their own.

    3) In a regex, \1, \2 etc. match the same sequence of characters
    matched earlier by the same-numbered pair of parentheses. In this case
    it means that we're making sure the opening and closing quotes are the
    same, by matching the same character as in part 1.

    Explaining it like this has made me spot a problem. The regex you really
    want is

    m/(['"])(((''|"")|[^\1])*?)\1(\s|$)/g

    The intent is that our 'scooping up' part matches any paired quotes or
    anything other than our closing quote.


    The following test data wouldn't have worked...

    ' data11 '' dat"a12' 'data13'
    >
    >I will have a look at Text::CSV
    >
    Please do; my mistake above shows why you shouldn't reinvent this stuff
    if you can get away with it. :-)

    Ta

    Rob





    Rob Anderson Guest
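
One caveat about the second pattern: in Python's `re` module a numeric escape inside square brackets is read as a character code, not as a backreference, so `[^\1]` means "any character except chr(1)"; to my knowledge Perl treats `\1` inside a bracketed class the same way. A negative lookahead, `(?!\1).`, is one common way to say "any character that is not the opening quote". The sketch below is in Python rather than Perl, and the `FIELD` pattern and `quoted_fields` helper are hypothetical names for this example. It also checks the trailing whitespace with a lookahead instead of consuming it, which is a change from the posted regex.

```python
import re

# (['"])               group 1: the opening quote
# (?:''|""|(?!\1).)*?  a doubled quote, or any character other than the opening quote
# \1(?=\s|$)           the same quote again, followed by whitespace or end of line
FIELD = re.compile(r"""(['"])((?:''|""|(?!\1).)*?)\1(?=\s|$)""")


def quoted_fields(line):
    """Return the text between each pair of matching quotes on one line."""
    return [match.group(2) for match in FIELD.finditer(line)]


for sample in [
    "' data11 '' data12' 'data13'",
    "' data11 '' dat\"a12' 'data13'",
]:
    print(quoted_fields(sample))
```

With this version a stray double quote inside a single-quoted field is kept as ordinary text, which is the case the test line above was meant to cover.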

  11. #10

    Regular Expression Help

    Hi,
    here is the scenario:

    I have a variable containing the following:
    <table cellpadding="0" cellspacing="0">
    <tr>
    <td class="footer">Ottiliavej 9</td>
    <td class="footer">telefon 092323</td></tr>
    <tr><td class="footer">DK-2500 Valby</td>
    <td class="footer">fax 092323</td>
    </tr>
    </table>

    I need a regular expression that gives me the text (sometext) between the
    <td ....>sometext</td> tags.
    I have tried everything I know about regular expressions (which by
    the way is very little). Any help will be appreciated.

    Thx
    Saya
    Saya Guest

  12. #11

    Re: Regular Expression Help

    Saya (vahu@novonordisk.com) wrote on MMMDCXXXIX September MCMXCIII in
    <URL:news:9e9517bf.0308181124.19a9c3d5@posting.google.com>:

    #!/usr/bin/perl

    use strict;
    use warnings;

    use Regexp::Common;

    $_ = <<'--';
    <table cellpadding="0" cellspacing="0">
    <tr>
    <td class="footer">Ottiliavej 9</td>
    <td class="footer">telefon 092323</td></tr>
    <tr><td class="footer">DK-2500 Valby</td>
    <td class="footer">fax 092323</td>
    </tr>
    </table>
    --

    while (m!$RE{balanced}{-begin => '<td'}{-end => '</td>'}{-keep}!g) {
        print "$1\n";
    }
    __END__
    <td class="footer">Ottiliavej 9</td>
    <td class="footer">telefon 092323</td>
    <td class="footer">DK-2500 Valby</td>
    <td class="footer">fax 092323</td>


    This will take care of nested elements as well. However, it won't deal
    with cases like:

    <td><img src = 'foo.gif' alt = '</td>'></td>


    Abigail
    --
    perl -wle'print"Êõóô*áîïôèåò*Ðåòì*Èáãëåò"^"\x80"x24'
    Abigail Guest
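
For the awkward case above, where `</td>` appears inside a quoted attribute value, an HTML parser is usually the safer tool because it tokenizes attributes before looking for end tags. The sketch below uses Python's standard `html.parser` module rather than a Perl module; the `CellText` class and `td_texts` helper are hypothetical names, and unlike the Perl program above it collects only the text of each cell, not the surrounding tags.

```python
from html.parser import HTMLParser


class CellText(HTMLParser):
    """Collect the text content of every <td> element, nested ones included."""

    def __init__(self):
        super().__init__()
        self.cells = []   # finished cells, in the order their </td> was seen
        self._open = []   # one text buffer per currently open <td>

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._open.append([])

    def handle_endtag(self, tag):
        if tag == "td" and self._open:
            self.cells.append("".join(self._open.pop()).strip())

    def handle_data(self, data):
        # Text belongs to every cell that is currently open.
        for buffer in self._open:
            buffer.append(data)


def td_texts(html):
    """Return the stripped text of each <td> in the given markup."""
    parser = CellText()
    parser.feed(html)
    parser.close()
    return parser.cells
```

Feeding it the footer table from the question returns the four cell texts, and the `<img ... alt = '</td>'>` example yields a single empty cell instead of a truncated match.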

  13. #12

    Re: Regular Expression Help

    Saya <vahu@novonordisk.com> wrote:
    Here is one way (assuming the html is stored in $html):

    my @tds = $html =~ m#<td class="footer">(.*)</td>#g;

    ( Matching (=~) has higher precedence than assignment (=), so
    the list of captured items by matching $html against the regex
    is put into the array @tds. )

    To handle cells with newlines, for example

    <td class="footer">telefon
    092323</td></tr>

    you could make the period match newlines (add the modifier 's'), and
    then use minimal matching (a question mark after the quantifier makes it
    eat as few characters as possible, i.e. find the first '</td>' instead of
    the last) thus:

    my @tds = $html =~ m#<td class="footer">(.*?)</td>#sg;

    Some references:

    perldoc perlrequick    (gentle intro to regexes)
    perldoc perlre         (the full story)
    perldoc -q html        (search FAQs for html, could be of interest)

    --
    Vlad

    Vlad Tepes Guest
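
The effect of the two changes (dot matching newlines, and minimal matching) can be seen side by side. This sketch is in Python, where `re.DOTALL` plays the role of Perl's `s` modifier; the sample markup and variable names are assumptions for the example.

```python
import re

html = (
    '<tr><td class="footer">Ottiliavej 9</td>\n'
    '<td class="footer">telefon\n'
    '092323</td></tr>\n'
)

# Greedy, dot stops at newlines: the cell that spans two lines is missed.
greedy = re.findall(r'<td class="footer">(.*)</td>', html)

# Greedy with DOTALL: one match running from the first cell to the last </td>.
greedy_dotall = re.findall(r'<td class="footer">(.*)</td>', html, re.DOTALL)

# Lazy with DOTALL: each match stops at the first </td>.
lazy = re.findall(r'<td class="footer">(.*?)</td>', html, re.DOTALL)

print(greedy)
print(greedy_dotall)
print(lazy)
```

Only the last variant returns one entry per cell, including the cell whose text contains a newline.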

  14. #13

    RE: Regular expression help


    Hi Rob,


    You are totally right: your example (' data11 '' dat"a12' 'data13') should
    raise an error, and that is normal. I will nevertheless look carefully at
    the Text::CSV module.

    Thanks

    Michel




    Eurospace Szarindar Guest

  15. #14

    Regular Expression Help

    Hi, I've been trying to create a regex to do the following.
    I've got scripts which enter Japanese and Chinese into a database. A
    part of this involves entering furigana-style input using a
    simplified tag system.

    {R:characters, furigana}
    should be translated by a regex into

    <RUBY><RB>$1</RB><RT>$2</RT></RUBY>

    $temp = preg_replace("/({R:)(.*)(,)(.*)(})/",
    "<RUBY><RB>$2</RB><RT>$4</RT></RUBY>", $source);

    That is what I came up with, but it's too greedy. If I have more than
    1 encoded set, it just grabs the longest match it can find. I'm newish
    to this regex stuff.

    Thanks

    J


    Jamie
    jamien@interworx.com.au
    Jamie Guest

  16. #15

    Re: Regular Expression Help

    Jamie wrote:
    > That is what i came up with, but it's too greedy.
    You may invert the greediness of quantifiers using question marks:

    `(a.*?b)`

    Or, set the U pattern modifier, which inverts the greediness of
    all quantifiers:

    `(a.*b)`U

    Or, invert the greediness internally:

    `((?U)a.*b)`

    The regular expressions above are semantically identical. All of
    them capture a letter "a" followed by anything (except newlines,
    by default) until the first occurrence of a letter "b".

    --
    Jock
    John Dunlop Guest
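
A quick way to see the difference between greedy and lazy quantifiers is to run both on a string with two candidate endings. The sketch below uses Python; note that Python's `re` module has no counterpart to PCRE's `U` modifier (`re.U` there means Unicode matching), so only the `?` form is shown. The sample text is made up.

```python
import re

text = "a1b a2b"

# Greedy: .* runs to the last "b" that still lets the pattern match.
greedy = re.search(r"(a.*b)", text).group(1)

# Lazy: .*? stops at the first "b".
lazy = re.search(r"(a.*?b)", text).group(1)

# With a lazy quantifier, repeated searching finds each short match in turn.
all_lazy = re.findall(r"a.*?b", text)

print(greedy, "|", lazy, "|", all_lazy)
```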

  17. #16

    Re: Regular Expression Help

    On Mon, 6 Oct 2003 16:28:31 +0100, John Dunlop
    <john+usenet@johndunlop.info> wrote:
    Hi, thanks!
    I've almost got it now:

    preg_replace("/({R:)(.*?,)(.*?})/",
    "<RUBY><RB>$2</RB><RT>$3</RT></RUBY>", $temp);

    But I can't get it quite right (easily, that is). This captures
    everything up to and including the , and }. So how do I capture
    everything up to but not including them?
    Right now, I can kind of strip the commas and curlies from the text,
    but this won't work for the large block text bits.

    Thanks
    Jamie :-)

    Jamie
    jamien@interworx.com.au
    Jamie Guest

  18. #17

    Re: Regular Expression Help

    Jamie wrote:
    > preg_replace("/({R:)(.*?,)(.*?})/",
    > "<RUBY><RB>$2</RB><RT>$3</RT></RUBY>", $temp);
    >
    > But I can't get it quite right (easily, that is). This captures
    > everything up to and including the , and }. So how do I capture
    > everything up to but not including them?
    Place the characters you don't want captured outside the
    subpatterns' parentheses. Giving you:

    $temp = preg_replace(
        '`{R:(.*?),(.*?)}`',
        '<RUBY><RB>$1</RB><RT>$2</RT></RUBY>',
        $temp);
    > Right now, i can kind of strip the commas and curlies from the
    > text, but this won't work for the large block text bits.
    The dot metacharacter doesn't match newlines by default. Thus, if
    the text spans multiple lines, it won't match. If the s pattern
    modifier (PCRE_DOTALL) is set, the dot matches newlines too.

    http://www.php.net/manual/en/pcre.pattern.modifiers.php

    --
    Jock
    John Dunlop Guest
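
Putting the pieces together (delimiters outside the groups, lazy quantifiers, dot matching newlines), the replacement might look like the sketch below. It is written in Python instead of PHP; the `to_ruby` helper is a hypothetical name, and the `\s*` after the comma, which drops the space before the reading, is an addition that is not in the pattern above.

```python
import re

# \{R: and \} stay outside the groups, so they are matched but not captured.
# re.DOTALL lets the dot cross line breaks, like PCRE's s modifier.
RUBY = re.compile(r"\{R:(.*?),\s*(.*?)\}", re.DOTALL)


def to_ruby(source):
    """Replace every {R:base, reading} tag with RUBY markup."""
    return RUBY.sub(r"<RUBY><RB>\1</RB><RT>\2</RT></RUBY>", source)


print(to_ruby("x {R:a, b} y {R:c, d} z"))
```

Because both groups are lazy, two tags in the same string are converted separately instead of being swallowed as one long match.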

  19. #18

    Regular Expression Help

    Does anyone know of a simple way that I could convert the following using a
    regular expression???

    CONVERT THIS:

    <item>
    <title>Some link title</title>
    <description>description text here</description>
    <link>http://www.example.com</link>
    </item>

    INTO THIS:

    <li>
    <div class="t"><a href="http://www.example.com">Some link title</a></div>
    <span class="d">description text here</span>
    </li>

    OR VICE VERSA.

    wiseduck Guest

  20. #19

    Re: Regular Expression Help

    Looks like an XML document to me. I haven't used them myself, but do some
    research on XMLParse and all the other XML functions. That would probably be a
    better solution in the long run than using regular expressions.

    MW

    mattw Guest

  21. #20

    Re: Regular Expression Help

    Check out XmlTransform() and XSLT (I'm assuming this is an xml document).

    Mike
    mrampson Guest
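
As an illustration of the parser-based route suggested in the replies, here is a sketch that converts one `<item>` into the `<li>` markup from the question. It uses Python's standard `xml.etree.ElementTree` rather than the ColdFusion functions named above, and the `item_to_li` helper is a hypothetical name; it assumes each item is well-formed XML with `title`, `description` and `link` children.

```python
import xml.etree.ElementTree as ET
from html import escape


def item_to_li(item_xml):
    """Convert one <item> element (as a string) into the <li> markup."""
    item = ET.fromstring(item_xml)
    title = item.findtext("title", default="")
    description = item.findtext("description", default="")
    link = item.findtext("link", default="")
    # escape() keeps characters such as & and < valid in the generated HTML.
    return (
        "<li>\n"
        f'<div class="t"><a href="{escape(link, quote=True)}">{escape(title)}</a></div>\n'
        f'<span class="d">{escape(description)}</span>\n'
        "</li>"
    )


sample = (
    "<item>\n"
    "<title>Some link title</title>\n"
    "<description>description text here</description>\n"
    "<link>http://www.example.com</link>\n"
    "</item>"
)
print(item_to_li(sample))
```

Unlike a regex, this does not depend on the order of the child elements or on how the whitespace between them is laid out.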
