Regex for numbers and text - PERL Beginners

  1. #1

    Regex for numbers and text

    Hi!

    I am trying to set up a single regex to break down the following lines:

    Jerry 2.7 4 4.5 mon
    Mark -14 -10.75 -10 new

    With

    /(\w+)\s+(-?\d+.\d+)\s+(-?\d+.\d+)\s+(-?\d+.\d+)\s+(\w+)/;

    What am I doing wrong?

    Thanks,

    Jerry




    Jerry Guest

  2. #2

    Re: Regex for numbers and text

    Jerry Preston wrote: 

    At first glance, the regex appears to be fine, although I could just be missing
    something. What is your code and what happens instead of what you expect?

    --
    Andrew Gaffney
    Network Administrator
    Skyline Aeronautics, LLC.
    636-357-1548

    Andrew Guest

  3. #3

    Re: Regex for numbers and text

    Jerry Preston wrote: 

    You are not showing us a complete program that generates some other
    output than the output you were expecting.

    A couple of observations besides that:

    - Not all numbers in those lines include digits before and after a
    decimal point.

    - The '.' character has a special meaning when used in a regex outside
    a character class, and should therefore be escaped.

    This code:

    (-?\d+(?:\.\d+)?)

    would match any of those numbers.
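
    A minimal sketch of the difference, assuming the two sample lines from
    the original post. The variable names ($original, $num, $fixed) are
    illustrative only and do not come from the poster's program:

    ```perl
    use strict;
    use warnings;

    my @lines = ('Jerry 2.7 4 4.5 mon', 'Mark -14 -10.75 -10 new');

    # Original pattern: unescaped '.', and digits required on both sides of it.
    my $original = qr/(\w+)\s+(-?\d+.\d+)\s+(-?\d+.\d+)\s+(-?\d+.\d+)\s+(\w+)/;

    # A number with an escaped decimal point and an optional fractional part.
    my $num   = qr/-?\d+(?:\.\d+)?/;
    my $fixed = qr/(\w+)\s+($num)\s+($num)\s+($num)\s+(\w+)/;

    for my $line (@lines) {
        printf "%-26s original: %-3s fixed: %s\n",
            $line,
            ($line =~ $original ? 'yes' : 'no'),
            ($line =~ $fixed    ? 'yes' : 'no');
    }
    ```

    Neither sample line satisfies the original pattern, because each line
    contains at least one whole number; both satisfy the second one.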

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
    Gunnar Guest

  4. #4

    Re: Regex for numbers and text

    On 7/11/2004 1:07 AM, Gunnar Hjalmarsson wrote: 
    >
    >
    > You are not showing us a complete program that generates some other
    > output than the output you were expecting.
    >
    > A couple of observations besides that:
    >
    > - Not all numbers in those lines include digits before and after a
    > decimal point.
    >
    > - The '.' character has a special meaning when used in a regex outside a
    > character class, and should therefore be escaped.
    >
    > This code:
    >
    > (-?\d+(?:\.\d+)?)
    >
    > would match any of those numbers.

    Better yet, use Regexp::Common:

    use Regexp::Common qw(number);
    /(\w+)(?:\s+$RE{num}{real}){3,3}\s+(\w+)/;

    Randy Guest

  5. #5

    Re: Regex for numbers and text

    Randy W. Sims wrote: 
    >
    > Better yet, use Regexp::Common:
    >
    > use Regexp::Common qw(number);
    > /(\w+)(?:\s+$RE{num}{real}){3,3}\s+(\w+)/;

    That's an alternative, but would it necessarily be better? To me,
    using a module for such a trivial thing just creates another level of
    abstraction without actually making it easier.

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
    Gunnar Guest

  6. #6

    Re: Regex for numbers and text

    >>>>> "Gunnar" == Gunnar Hjalmarsson <cc> writes:

    Gunnar> That's an alternative, but would it necessarily be better? To me,
    Gunnar> using a module for such a trivial thing just creates another level of
    Gunnar> abstraction without actually making it easier.

    Can we agree to disagree on that?

    Once you understand Regexp::Common, it becomes a nice abstraction.
    Having said that, I don't yet. :)

    --
    Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
    <com> <URL:http://www.stonehenge.com/merlyn/>
    Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
    See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!
    Randal Guest

  7. #7

    Re: Regex for numbers and text

    Randal L. Schwartz wrote: 
    >>
    >> That's an alternative, but would it necessarily be better? To me,
    >> using a module for such a trivial thing just creates another
    >> level of abstraction without actually making it easier.
    >
    > Can we agree to disagree on that?

    Sure.
     

    Please feel free to like any module you don't understand. ;-)

    I for one concentrate on learning - and practicing - the basic regex
    syntax. That way I get better prepared when I need to write less
    common regexes.

    But that's me. I'm not claiming that Regexp::Common is not a useful
    tool for those who like it. I'm rather saying that it is better
    considered a matter of personal preference than a matter of
    preferred programming style.

    Can we possibly agree on that?

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
    Gunnar Guest

  8. #8

    Module reuse (was: Re: Regex for numbers and text)

    Gunnar Hjalmarsson wrote: 
    >>
    >>
    >> Can we agree to disagree on that?
    >
    >
    > Sure.

    >
    >
    > Please feel free to like any module you don't understand. ;-)
    >
    > I for one concentrate on learning - and practicing - the basic regex
    > syntax. That way I get better prepared when I need to write less
    > common regexes.
    >
    > But that's me. I'm not claiming that Regexp::Common is not a useful
    > tool for those who like it. I'm rather saying that it is better
    > considered a matter of personal preference than a matter of
    > preferred programming style.
    >
    > Can we possibly agree on that?

    I understand your argument. It is very important to understand what is
    going on, to understand regexes well enough to construct ones like we're
    discussing now. In fact, I am an old school programmer with a background
    in C & C++ and various other compiled & scripting languages. In
    general, I believe that it is important to know how things are done
    under the hood as well so that you can know and evaluate the costs of
    using certain features & algorithms. I very much understand and agree
    with you that being able to understand and construct regexes is essential
    and that the use of modules should not be an excuse to avoid learning.

    However, I do think taking advantage of modules such as Regexp::Common
    is also essential. The reasons are basically the same reasons we
    practice modular, object-oriented, and other programming methodologies:
    to isolate repeatedly used pieces of code so that their use is
    consistent, changes & fixes are isolated, etc.

    For example, let's say that over the course of a few years you
    write several related scripts to munge a certain data store. Each time
    you write a script you have to construct a regex like the one discussed,
    to match numbers. Because of experience, learning, or even a mood swing
    on the day you write one of the scripts, it's possible that you may use
    similar but subtly different regexes. This can lead to subtle problems
    and not-so-subtle hair pulling.

    I recommend modules like Regexp::Common because they consolidate and
    isolate common code in reusable chunks. It helps you write consistent
    code and code that is consistent with other authors. It isolates change
    so that if it doesn't behave the way you want, you can change that
    behaviour in a single place. If there is a bug, you know where to look.
    And it's great to know that if there is a bug, you can blame it on
    someone else. ;-)
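
    As a rough sketch of the "single place" idea using only core Perl, a
    pattern can be compiled once with qr// and interpolated wherever it is
    needed. The names $NUM, parse_record and all_numbers are hypothetical
    and are not part of Regexp::Common:

    ```perl
    use strict;
    use warnings;

    # Defined once; every regex below picks up a change made here.
    my $NUM = qr/-?(?:\d+(?:\.\d*)?|\.\d+)/;

    # Hypothetical helper: a name, three numbers, and a trailing word.
    # Returns the five captured fields, or an empty list if the line differs.
    sub parse_record {
        my ($line) = @_;
        return $line =~ /^(\w+)\s+($NUM)\s+($NUM)\s+($NUM)\s+(\w+)$/;
    }

    # A second use of the same definition: every whitespace-delimited number.
    sub all_numbers {
        my ($line) = @_;
        return $line =~ /(?<!\S)($NUM)(?!\S)/g;
    }

    print join('|', parse_record('Mark -14 -10.75 -10 new')), "\n";
    print join('|', all_numbers('Jerry 2.7 4 4.5 mon')), "\n";
    ```

    Whether the shared definition lives in a variable, a small local module,
    or a CPAN module is a separate choice; the sketch only shows the reuse.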

    I've used or played with many modules. I've got one machine that
    probably has about half of CPAN installed. I use it to test and evaluate
    modules, or just to take a peek at anything that looks like it might be
    interesting. There are very few modules that I highly recommend that are
    not either already part of core perl or specialized. Regexp::Common is
    one exception. IMO, it belongs in the core, and I don't say that
    lightly--I'm a minimalist. But, it solves a very popular subset of
    problems in a consistent, robust, and flexible way.

    If that's not enough, let me point out that parsing numbers is not as
    straightforward as it might seem. Some of the things you must consider
    are: do you allow a decimal point? Is the decimal separator a comma or a
    period? Do you allow numbers with no digit before the decimal point? Do
    you allow negative/positive signs? Do you allow scientific notation? What
    about grouping (e.g. 123,456.05)? What is the grouping symbol? How many
    digits in each grouping? etc. There is a lot of variation to consider.
    And it's already been considered by someone else in Regexp::Common.
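
    To make the variation concrete, here is a hand-written sketch in which
    each pattern answers those questions differently. The labels and
    patterns are illustrative assumptions only; they are not taken from
    Regexp::Common, and none of them is "the" correct definition:

    ```perl
    use strict;
    use warnings;

    my %pattern = (
        'plain decimal'       => qr/^-?\d+(?:\.\d+)?$/,
        'leading dot allowed' => qr/^-?(?:\d+(?:\.\d*)?|\.\d+)$/,
        'explicit sign'       => qr/^[-+]?\d+(?:\.\d+)?$/,
        'scientific notation' => qr/^[-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?$/,
        'comma grouping'      => qr/^-?\d{1,3}(?:,\d{3})*(?:\.\d+)?$/,
    );

    # Report which definitions accept each input.
    sub accepted_by {
        my ($input) = @_;
        return grep { $input =~ $pattern{$_} } sort keys %pattern;
    }

    for my $input ('-10.75', '.59', '+4', '1.5e-3', '123,456.05') {
        my @ok = accepted_by($input);
        print "$input => ", (@ok ? join(', ', @ok) : 'none'), "\n";
    }
    ```

    The same string can be a number under one definition and not under
    another, which is the point being made above.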

    Regards,
    Randy.
    Randy Guest

  9. #9

    Re: Module reuse

    Randy W. Sims wrote: 
    >
    > I understand your argument. It is very important to understand what
    > is going on, to understand regexs well enough to construct ones
    > like we're discussing now.

    Okay.
     

    Now it comes... ;-)
     

    The concept of code reuse. To avoid possible misunderstandings, I do
    agree that code reuse is also essential. I'm not arguing against code
    reuse in general; reusing code by using modules makes very much sense.
    But personally I'm applying a 'threshold' before considering the use
    of modules: I prefer to code trivial things myself.

    I noticed that you said "however"; in other words you agree that there
    is a contradiction between learning the basics and having modules do
    the basics for you.

    The OP in the thread that triggered this discussion has obviously not
    yet learned the basics about regexes. Considering that, what's the
    better advice? Is it to

    - Point him to a module, or

    - Point him in the right direction with the aim of helping him
    understand what mistakes he made, and helping him improve his skills
    with respect to regexes?

    (You can of course do both.)

    After all, this is a beginner level list, and I suppose you'd better
    learn how to walk before trying to run. :)
     

    Indeed there are a few variations to consider. And you can't skip
    those considerations even if you use Regexp::Common, but you must know
    what you are doing in order to pick the right method(s) and use the
    right RC syntax. Just like you must know what you are doing in order
    to write the regex correctly, if you choose to take that route.

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
    Gunnar Guest

  10. #10

    Re: Regex for numbers and text

    On Jul 11, Gunnar Hjalmarsson said:
     
    >
    >You are not showing us a complete program that generates some other
    >output than the output you were expecting.

    But we can see that his text does NOT match his regex.
     

    I tend to write that as /(-?\d+\.?\d*)/, but be aware that this doesn't
    match numbers like .52 or .9, because they don't have digits BEFORE the
    decimal point.
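
    A quick check of that claim, as a sketch. The ^ and $ anchors are an
    addition here so that the whole string has to be a number; the test
    strings are made up for illustration:

    ```perl
    use strict;
    use warnings;

    # The pattern above, anchored at both ends.
    my $re = qr/^(-?\d+\.?\d*)$/;

    for my $s ('4', '-10.75', '44.', '.52', '.9') {
        printf "%-7s %s\n", $s, ($s =~ $re ? 'matches' : 'no match');
    }
    ```

    The first three strings match; '.52' and '.9' do not, because \d+ needs
    at least one digit before the optional decimal point.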

    --
    Jeff "japhy" Pinyan % How can we ever be the sold short or
    RPI Acacia Brother #734 % the cheated, we who for every service
    http://japhy.perlmonk.org/ % have long ago been overpaid?
    http://www.perlmonks.org/ % -- Meister Eckhart

    Jeff Guest

  11. #11

    Re: Regex for numbers and text

    On Jul 10, Jerry Preston said:
     

    Have you considered just using split()?

    my @fields = split;
    # or
    my ($name, $x, $y, $z, $whatever) = split;
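
    Applied to the two sample lines, that could look like the sketch below.
    The sub name parse_line and the field names are placeholders, since the
    thread never says what the columns mean:

    ```perl
    use strict;
    use warnings;

    # split ' ' splits on runs of whitespace and skips leading whitespace,
    # which is the same thing a bare "split" does to $_.
    sub parse_line {
        my ($line) = @_;
        my ($name, $x, $y, $z, $tag) = split ' ', $line;
        return ($name, $x, $y, $z, $tag);
    }

    for my $line ('Jerry 2.7 4 4.5 mon', 'Mark -14 -10.75 -10 new') {
        my ($name, $x, $y, $z, $tag) = parse_line($line);
        print "${name}: x=$x y=$y z=$z tag=$tag\n";
    }
    ```

    Unlike the regex, split does not check that the middle fields are
    numbers, so any validation has to happen afterwards.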

    --
    Jeff "japhy" Pinyan % How can we ever be the sold short or
    RPI Acacia Brother #734 % the cheated, we who for every service
    http://japhy.perlmonk.org/ % have long ago been overpaid?
    http://www.perlmonks.org/ % -- Meister Eckhart

    Jeff Guest

  12. #12

    Re: Regex for numbers and text (ANS)

    Jerry,

    Jerry> What am I doing wrong?

    Such an open question. I assume you mean what is wrong with your code. To
    answer, it isn't doing what you want it to do :-)

    Jerry> I am trying to set up a single regex
    Jerry> to break down the following lines:
    Jerry>
    Jerry> Jerry 2.7 4 4.5 mon
    Jerry> Mark -14 -10.75 -10 new
    Jerry>
    Jerry>
    Jerry> /(\w+)\s+(-?\d+.\d+)\s+(-?\d+.\d+)\s+(-?\d+.\d+)\s+(\w+)/;

    At first blush, you're going to run into problems when the number you're
    trying to capture is a whole number without a decimal place holder (#.00).
    Take a look at your expression, `(-?\d+.\d+)`: in row 1, column 3 you have a
    value of `4`. That won't get captured because it isn't two runs of digits
    separated by a character. The pattern requires at least one digit, then
    any single character (the unescaped `.`), then at least one more digit. So
    `4`, `44`, and `44.` won't work, and a longer whole number such as
    `44324242` only gets through because the unescaped `.` happens to match one
    of its digits. I would suggest the following:

    # UN-TESTED
    /(\w+)\s+(-?\d+\.?\d*)\s+(-?\d+\.?\d*)\s+(-?\d+\.?\d*)\s+(\w+)/

    This tells Perl the decimal point is optional. It also says the digits after
    the decimal point are optional too (this is important if there is no decimal
    point). This syntax is fairly noisy. As an alternative, try using split().
    Something along the lines of:

    #! /usr/bin/perl
    use strict;
    use warnings;

    # UN-TESTED
    my ($value, @pd);
    $value = "adam 1.2 2.2 3.2 wed";
    @pd = split(/\s+/, $value);   # one element per whitespace-separated column
    print "$_\n" for (@pd);

    Unless the number of columns will vary by record this should work nicely.

    Regards,
    Adam


    Adam Guest

  13. #13

    RE: Regex for numbers and text

    Jeff,

    What needs to be changed in /(-?\d+\.?\d*)/ so that it also sees numbers like
    .59?

    Thanks,

    Jerry

    -----Original Message-----
    From: Jeff 'japhy' Pinyan [mailto:org]
    Sent: Sunday, July 11, 2004 9:20 PM
    To: Gunnar Hjalmarsson
    Cc: org
    Subject: Re: Regex for numbers and text


    On Jul 11, Gunnar Hjalmarsson said:
     
    >
    >You are not showing us a complete program that generates some other
    >output than the output you were expecting.

    But we can see that his text does NOT match his regex.
     

    I tend to write that as /(-?\d+\.?\d*)/, but be aware that this doesn't
    match numbers like .52 or .9, because they don't have digits BEFORE the
    decimal point.



    Jerry Guest

  14. #14

    Re: Regex for numbers and text

    Jerry Preston wrote: 

    Have you seen any of the Perl documentation sections that deal with
    regular expressions? This one, for instance, for a quick overview:

    http://www.perldoc.com/perl5.8.4/pod/perlrequick.html

    I suggest that you give it a try yourself to fix the regex with help
    of the docs.

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
    Gunnar Guest

  15. #15

    Re: Regex for numbers and text

    Jerry Preston wrote: 

    This is why I like to recommend Regexp::Common. But...

    Here is an example from Jeffrey Friedl's "Mastering Regular Expressions":

    /-?([0-9]+(\.[0-9]*)?|\.[0-9]+)/

    perlified:

    /-?(?:\d+(?:\.\d*)?|\.\d+)/
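
    A small sketch for trying the perlified version against a few strings.
    The ^ and $ anchors are added here so partial matches do not count, and
    the test strings are made up:

    ```perl
    use strict;
    use warnings;

    # Either digits with an optional ".digits" tail, or a "." followed by digits.
    my $number = qr/^-?(?:\d+(?:\.\d*)?|\.\d+)$/;

    for my $s ('4', '-14', '-10.75', '.59', '44.', '.', '-.', '') {
        printf "%-9s %s\n", "'$s'", ($s =~ $number ? 'number' : 'not a number');
    }
    ```

    The first five strings are accepted; a lone '.', '-.' and the empty
    string are rejected, since both alternatives require at least one digit.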
    Randy Guest

  16. #16

    RE: Regex for numbers and text

    Another good module for helping you understand what a complicated regex MEANS (which will help you know how to fix it) is YAPE::Regex::Explain. It also helps with some issues where a regex may be working, but not for the reason you think.

    -----Original Message-----
    From: Jerry Preston [mailto:com]
    Sent: Mon 7/12/2004 6:32 PM
    To: com; 'Gunnar Hjalmarsson'
    Cc: org
    Subject: RE: Regex for numbers and text



    Jeff,

    What needs to be changed in /(-?\d+\.?\d*)/ so that it also sees numbers like
    .59?

    Tim Guest

  17. #17

    Re: Regex for numbers and text

    Randy W. Sims wrote: 
    >
    > This is why I like to recommend Regexp::Common. But...

    use warnings;
    use Regexp::Common 'number';
    $_ = '.';
    /^$RE{num}{real}$/ and print "\"$_\" is a number.\n";
    my $x = 1 if $_ < 5;

    Outputs:
    "." is a number.
    "." isn't numeric in numeric lt (<) at ...

    Regexp::Common considers a lone decimal point to be a number, while
    Perl itself does not. Did you know that? ;-)
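
    Perl's own view can be checked without triggering the warning by using
    looks_like_number from the core Scalar::Util module. This is a sketch
    with made-up test strings, shown as a cross-check rather than as
    anything Regexp::Common does internally:

    ```perl
    use strict;
    use warnings;
    use Scalar::Util qw(looks_like_number);

    # Ask Perl whether it would treat each string as a number.
    for my $s ('.', '.5', '-10.75') {
        printf "%-9s %s\n", "'$s'",
            (looks_like_number($s) ? 'numeric to Perl' : 'not numeric to Perl');
    }
    ```

    A lone '.' is reported as not numeric, consistent with the warning
    shown above, while '.5' and '-10.75' are numeric.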

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
    Gunnar Guest

  18. #18

    Re: Regex for numbers and text

    Gunnar Hjalmarsson wrote: 
    >>
    >>
    >> This is why I like to recommend Regexp::Common. But...
    >
    >
    > use warnings;
    > use Regexp::Common 'number';
    > $_ = '.';
    > /^$RE{num}{real}$/ and print "\"$_\" is a number.\n";
    > my $x = 1 if $_ < 5;
    >
    > Outputs:
    > "." is a number.
    > "." isn't numeric in numeric lt (<) at ...
    >
    > Regexp::Common considers a lone decimal point to be a number, while
    > Perl itself does not. Did you know that? ;-)

    Hrm, that's unfortunate :-/

    Well, I'll still take this as an argument in favor of using modules like
    Regexp::Common. If your original regex were used: '(-?\d+(?:\.\d+)?)'
    and if it were used in more than one application, it would have to be
    changed everywhere. If, OTOH, Regexp::Common were used, we need only
    make the correction in one place and it is fixed everywhere.

    BTW, I've posted the following to RT:

    [cpan #6940] lone . (decimal) considered a match for $RE{num}{real}
    -------------------------------------------------------------------------
    perl -MRegexp::Common=number -e 'print "oops\n" if "." =~ /$RE{num}{real}/'

    This was pointed out to me by Gunnar Hjalmarsson in a thread on
    <org> ...
    Randy Guest

  19. #19

    Re: Regex for numbers and text

    On Jul 11, Jeff 'japhy' Pinyan said:
     

    I've seen the response of /-?(?:\d+\.?\d*|\.\d+)/, and while that does
    work, it seems too noisy to me. What we would really like to be able to
    say is /-?\d*\.?\d*/, but you should be able to see that could match "-."
    and "." and "", which we decided aren't legitimate numbers.

    So use a look-ahead.

    /(-?(?=\.?\d)\d*\.?\d*)/

    *That* regex requires some explanation. Who will give it?
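
    One possible reading, written with the /x modifier so that each piece
    carries a comment. This sketch assumes the look-ahead is meant to read
    \.?\d, that is, an optional literal decimal point followed by a digit;
    the ^ and $ anchors and the test strings are additions:

    ```perl
    use strict;
    use warnings;

    my $number = qr/
        ^
        (
            -?            # optional minus sign
            (?= \.? \d )  # look ahead: a digit comes next, possibly after one "."
            \d*           # integer part, may be empty (as in ".59")
            \.?           # optional decimal point
            \d*           # fractional part, may be empty (as in "44.")
        )
        $
    /x;

    for my $s ('4', '-10.75', '.59', '44.', '.', '-.', '') {
        printf "%-9s %s\n", "'$s'", ($s =~ $number ? 'number' : 'not a number');
    }
    ```

    The look-ahead consumes nothing. It only guarantees that at least one
    digit is present, which is what lets every later piece be optional
    without the pattern accepting '.', '-.' or the empty string.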

    --
    Jeff "japhy" Pinyan % How can we ever be the sold short or
    RPI Acacia Brother #734 % the cheated, we who for every service
    http://japhy.perlmonk.org/ % have long ago been overpaid?
    http://www.perlmonks.org/ % -- Meister Eckhart

    Jeff Guest

  20. #20

    Re: Regex for numbers and text

    Randy W. Sims wrote: 
    >
    > Hrm, that's unfortunate :-/
    >
    > Well, I'll still take this as an argument in favor of using modules
    > like Regexp::Common.

    Why am I not surprised...

    <argument previously stated in this thread snipped>
     

    Good. And thanks for giving me credit. :)

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
    Gunnar Guest
