
  1. #1

    quick re help

    Hi everyone

    I am pretty new to regexes, so I was happy when my text-wrapping
    expression worked, for the most part.

    It messes up when it needs to wrap lines containing \n that don't end in
    a space. If there is no space, it puts the last word on its own line
    before it should wrap. Otherwise it doubles the \n on the line.

    I can't figure out why.

    Can anyone help me out?



    sub quickWrap {
        my $data = @_[0];
        my $wrapAt = 75;
        if (scalar @_ > 1) {
            $wrapAt = @_[1];
        }
        my $wrappedData ="";
        while ($data =~ /[^|\n][^\n]{$wrapAt,}?[ |$|\n]/) {
            $data =~ /([^|\n])([^\n]{1,$wrapAt})( )([\s|\S]*$)/;
            $wrappedData .= "$`$1$2\n";
            $data = "$4";
        }
        return "$wrappedData$data";
    }

    Thanks!

    Webmaster@Oldwest.Org

  3. #2

    Re: quick re help

    On Aug 13, webmaster@oldwest.org said:
    >It messes up when I need to wrap lines with \n that don't end in a space.
    >If there is no space, it places last word on its own line before it
    >should wrap. Otherwise it double \ns the line.
    >sub quickWrap {
    > my $data = @_[0];
    You shouldn't use an array slice where you mean to use a single array
    element.

    my $data = $_[0];
    > my $wrapAt = 75;
    > if (scalar @_ > 1) {
    > $wrapAt = @_[1];
    > }
    Again, @_[1] should be $_[1]. And the use of scalar() here is redundant.

    if (@_ > 1) { $wrapAt = $_[1] }

    We could have written those first few lines in many different ways. Here
    are two ways I might have written it:

    my $data = shift;
    my $wrap_at = @_ ? shift : 75;

    or

    my $data = $_[0];
    my $wrap_at = @_ > 1 ? $_[1] : 75;
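    To check that the two forms above agree, here is a small sketch. The sub
    names args_by_shift and args_by_index are hypothetical, made up for this
    illustration; each one just returns the arguments it parsed.

```perl
use strict;
use warnings;

# Hypothetical helper: consume @_ with shift, defaulting the width to 75.
sub args_by_shift {
    my $data    = shift;
    my $wrap_at = @_ ? shift : 75;      # any arguments left in @_?
    return ($data, $wrap_at);
}

# Hypothetical helper: read @_ by index without modifying it.
sub args_by_index {
    my $data    = $_[0];                # one element ($_[0]), not a slice (@_[0])
    my $wrap_at = @_ > 1 ? $_[1] : 75;  # @_ in numeric context is its length
    return ($data, $wrap_at);
}

my @defaulted = args_by_shift("some text");
my @explicit  = args_by_index("some text", 40);
print "$defaulted[1] $explicit[1]\n";   # prints "75 40"
```

    The shift form empties @_ as it goes; the index form leaves @_ alone,
    which can matter if later code in the sub still wants to look at it.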
    > my $wrappedData ="";
    It's not technically required to give $wrappedData a value of "", since
    you're adding to it using the .= operator, which is nice enough not to
    complain if the variable started out undef. Just a little note.
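    A quick way to confirm that note: the sketch below makes the
    "uninitialized" warning category fatal, so the script would die if .= on
    an undef variable complained. It relies on Perl exempting .= (like ++
    and +=) from that warning.

```perl
use strict;
# Turn "uninitialized" warnings into fatal errors for this file.
use warnings FATAL => 'uninitialized';

my $wrapped;                    # starts out undef
$wrapped .= "first line\n";     # .= on undef does not warn
$wrapped .= "second line\n";
print $wrapped;
```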
    > while ($data =~ /[^|\n][^\n]{$wrapAt,}?[ |$|\n]/) {
    Something tells me you're not sure what a character class does. A
    character class is for CHARACTERS. Therefore, you don't use | in it. The
    class [a|b|c] is the same as [|abc] -- that is, it matches an 'a', a 'b',
    a 'c', or a '|'. Also, you can't match "beginning of line" or "end of
    line" inside a character class, like you think you're doing with [^|\n]
    and [ |$|\n]. First of all, ^ and $ don't mean the same thing inside a
    character class as they do in the rest of a regex. Second, ^ and $ don't
    match characters, they match locations. Third, the ^, as the first
    character of a class, means "match everything except ...".
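    A tiny demonstration of the character-class points above, as a sketch
    that only checks what [a|b|c] and [^|\n] actually match (the sample
    string is made up):

```perl
use strict;
use warnings;

# Inside [...], | is just another character: [a|b|c] means "a, b, c or |".
print "pipe matched\n" if "|" =~ /[a|b|c]/;

# A leading ^ negates the class: [^|\n] is "any char except | or newline",
# not "start of line or newline".
my @hits = ("x|y\nz" =~ /([^|\n])/g);
print "@hits\n";   # prints "x y z"
```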

    So. Let's give your regex a fixer-upper. Instead of [^|\n], I have a
    feeling you'll want either (?:^|\n), which matches at the start of the
    string or at a \n and doesn't capture to any $DIGIT variable, or maybe
    ^ with the /m modifier on the regex. Instead of [^\n], you can just use
    . -- that's what it was made for. And instead of [ |$|\n], you'll want
    (?:\s|$), I think.
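    The two options for the start-of-line part can be compared directly. In
    this sketch (the sample string is made up), both patterns collect the
    first word of each line:

```perl
use strict;
use warnings;

my $text = "one two\nthree four";

# (?:^|\n) groups the alternatives without capturing; only (\w+) is captured.
my @by_alt = ($text =~ /(?:^|\n)(\w+)/g);

# With /m, ^ also matches right after each embedded newline.
my @by_m = ($text =~ /^(\w+)/mg);

print "@by_alt | @by_m\n";   # prints "one three | one three"
```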

    But I think you're doing MUCH more work than needed. We'll see.
    > $data =~ /([^|\n])([^\n]{1,$wrapAt})( )([\s|\S]*$)/;
    This regex looks familiar. I'm going to suggest a big change in a bit.
    Oh, and [\s|\S], which could be [\s\S], is kind of awkward.
    > $wrappedData .= "$`$1$2\n";
    EWW. DON'T USE $`. It's terrible.
    > $data = "$4";
    You don't need those quotes.
    > }
    > return "$wrappedData$data";
    >}
    Ok, here's my idea: instead of matching text and putting it in a new
    string, why not CHANGE the string we're working on as we match it? We can
    do that using a substitution, the s/// operation.

    We want to match UP TO $wrap_at characters, as many as possible, and add a
    newline after them, SO LONG as it's in the place of a space. Here's a
    regex I think will do the job for you:

    sub quick_wrap { # I use the word_word_word style, not wordWordWord
        my $str = shift;
        my $wrap_at = @_ ? shift : 60;

        $str =~ s{(.{1,$wrap_at})\s}{$1\n}g;

        return $str;
    }

    The regex matches between 1 and $wrap_at characters (trying to match the
    most possible) that are followed by a whitespace character. It replaces
    this with the text it matched (and captured to $1) followed by a newline.
    Let me know if this does what you expected.
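    Running quick_wrap on a short sample shows one quirk worth knowing about:
    because the pattern always needs whitespace after the captured chunk, the
    last stretch of text is still broken at its final space, even when it
    would fit (a line shorter than $wrap_at gets split too). That looks like
    the "last word on its own line" symptom from the original question. The
    sketch below keeps quick_wrap as posted and adds quick_wrap_tail, a
    hypothetical variant (not from this thread) that also accepts end of
    string (\z) as a break point; treat it as one possible tweak rather than
    a finished wrapper.

```perl
use strict;
use warnings;

# quick_wrap as posted above.
sub quick_wrap {
    my $str = shift;
    my $wrap_at = @_ ? shift : 60;
    $str =~ s{(.{1,$wrap_at})\s}{$1\n}g;
    return $str;
}

# Hypothetical variant: a chunk may end at whitespace OR at the end of the
# string (\z). With /e the replacement is Perl code, so a newline is added
# only when whitespace was consumed ($2 is empty at the end of the string).
sub quick_wrap_tail {
    my $str = shift;
    my $wrap_at = @_ ? shift : 60;
    $str =~ s{(.{1,$wrap_at})(\s|\z)}{$1 . (length($2) ? "\n" : "")}ge;
    return $str;
}

my $line = "The quick brown fox";
print quick_wrap($line, 10), "\n---\n";    # "brown" and "fox" land on separate lines
print quick_wrap_tail($line, 10), "\n";    # "brown fox" stays on one line
```

    Words longer than $wrap_at are left intact by both versions, since
    neither pattern can break inside a run of non-whitespace.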

    --
    Jeff "japhy" Pinyan japhy@pobox.com http://www.pobox.com/~japhy/
    RPI Acacia brother #734 http://www.perlmonks.org/ http://www.cpan.org/
    <stu> what does y/// stand for? <tenderpuss> why, yansliterate of course.
    [ I'm looking for programming work. If you like my work, let me know. ]

    Jeff 'Japhy' Pinyan

  4. #3

    Re: quick re help

    Jeff 'Japhy' Pinyan wrote:
    > On Aug 13, webmaster@oldwest.org said:
    >
    >>sub quickWrap {
    >> my $data = @_[0];
    >
    >
    > You shouldn't use an array slice where you mean to use a single array
    > element.
    >
    Thanks for catching that; I really should have seen that one.

    $times_to_reread_my_code_before_posting_to_list++;

    > my $data = shift;
    > my $wrap_at = @_ ? shift : 75;
    >
    I like that.


    > Something tells me you're not sure what a character class does. A
    > character class is for CHARACTERS. Therefore, you don't use | in it. The
    Thanks for the correction on character classes. *runs to fix about half
    a dozen regexes*

    > This regex looks familiar. I'm going to suggest a big change in a bit.
    > Oh, and [\s|\S], which could be [\s\S], is kind of awkward.
    What is less awkward than [\s|\S] for 'match anything'?

    >
    > EWW. DON'T USE $`. It's terrible.
    >
    Okay, is that because it is slow and makes the rest of the regular
    expressions afterwards run slowly? (I saw something about that in the
    perlre document.)
    Is it as bad to put something like '(match anything)' before the main
    expression and use $1 in place of $` when it's useful?


    > Ok, here's my idea: instead of matching text and putting it in a new
    > string, why not CHANGE the string we're working on as we match it? We can
    > do that using a substitution, the s/// operation.
    >
    That is a better approach. I had given up on it when I couldn't
    understand why it was failing, and thought I could follow the logic
    better if I broke it down into a match in a loop.

    > We want to match UP TO $wrap_at characters, as many as possible, and add a
    > newline after them, SO LONG as it's in the place of a space. Here's a
    > regex I think will do the job for you:
    >
    > sub quick_wrap { # I use the word_word_word style, not wordWordWord
    > my $str = shift;
    > my $wrap_at = @_ ? shift : 60;
    >
    > $str =~ s{(.{1,$wrap_at})\s}{$1\n}g;
    >
    > return $str;
    > }
    >
    > The regex matches between 1 and $wrap_at characters (trying to match the
    > most possible) that are followed by a space. It replaces this with the
    > text it matched (and captured to $1) followed by a newline. Let me know
    > if this does what you expected.
    >
    That is exactly what I was trying to do, and that's a far, far more
    elegant way to do it.

    I think I'll review all my material again on regexes. Are there any
    good books you recommend on how to use and think in regexes?


    Thanks again for your help; it has really, really helped.





    Webmaster@Oldwest.Org

  5. #4

    Re: quick re help

    On Wednesday, August 13, 2003, at 04:22 PM, webmaster@oldwest.org wrote:
    > Jeff 'Japhy' Pinyan wrote:
    >> On Aug 13, webmaster@oldwest.org said:
    >> my $wrap_at = @_ ? shift : 75;
    >
    > I like that.
    Just to add my two cents, I like:

    my $wrap_at = shift || 75;

    James Gray

    James Edward Gray II

  6. #5

    RE: quick re help

    webmaster@oldwest.org inquired:
    >> This regex looks familiar. I'm going to suggest a big change in a
    >> bit.
    >> Oh, and [\s|\S], which could be [\s\S], is kind of awkward.
    > what is less awkward than [\s|\S] for 'match anything?'
    .

    Yes ->.<-

    Dot, period, point, et al., is the universal match "something" symbol. So,
    m'.*$' matches everything on a line. If you want to match a period, you
    can either escape it: \. or bracket it [.]. Escaping is better for simple
    matching.
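    A few checks make the dot and period points concrete. This sketch just
    records which patterns match (the sample strings are made up). Note that
    without the /s modifier, . does not match a newline, which comes up
    later in the thread.

```perl
use strict;
use warnings;

my %matched = (
    dot_any     => ("axb"  =~ /a.b/)   ? 1 : 0,  # . matches any one character...
    dot_newline => ("a\nb" =~ /a.b/)   ? 1 : 0,  # ...except "\n" (no /s modifier)
    escaped     => ("axb"  =~ /a\.b/)  ? 1 : 0,  # \. matches only a literal period
    bracketed   => ("a.b"  =~ /a[.]b/) ? 1 : 0,  # [.] is a literal period too
);
print "$_ => $matched{$_}\n" for sort keys %matched;
```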

    HTH,

    Robert Taylor

    Robert J Taylor

  7. #6

    RE: quick re help

    Robert J Taylor wrote:
    >
    >webmaster@oldwest.org inquired:
    >
    > >> This regex looks familiar. I'm going to suggest a big change in a
    > >> bit.
    > >> Oh, and [\s|\S], which could be [\s\S], is kind of awkward.
    >
    > > what is less awkward than [\s|\S] for 'match anything?'
    >
    >.
    >
    >Yes ->.<-
    >
    >Dot, period, point, et al, is the universal match "something" symbol. So,
    >m'.*$' matches everything on a line. If you want to match a period you can
    >either escape it: \. or bracket it [.]. Escaping is better for simple
    >matching.
    Almost... From "perldoc perlretut":

    \s is a whitespace character and represents [\ \t\r\n\f]
    \S is a negated \s; it represents any non-whitespace character [^\s]
    The period '.' matches any character but "\n"

    So, /[\s\S]/ would match a "\n", while /./ would not. The equivalent of
    /[\s\S]/, using period notation, would be /[.\n]/

    Alan
    Alan Perry

  8. #7

    Re: quick re help

    On Aug 13, webmaster@oldwest.org said:
    >Jeff 'Japhy' Pinyan wrote:
    >> On Aug 13, webmaster@oldwest.org said:
    >>
    >> my $data = shift;
    >> my $wrap_at = @_ ? shift : 75;
    >>
    >
    >I like that.
    The simpler-looking

    my $wrap_at = shift || 75;

    has also been proposed. The only reason I didn't use that is that if 0
    were ever a valid value for your variable, the || operator would treat
    it as false and quietly replace it with 75.
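    The difference is easy to see with 0 as the argument. This sketch uses
    three hypothetical one-line subs. The third uses the defined-or operator
    //, which Perl 5.10 and later provide; it falls back to 75 only when the
    argument is missing or undef.

```perl
use strict;
use warnings;

# Hypothetical helpers contrasting the default-value idioms from this thread.
sub with_or      { my $w = shift || 75;     return $w }  # 0 counts as false
sub with_ternary { my $w = @_ ? shift : 75; return $w }  # checks argument count
sub with_dor     { my $w = $_[0] // 75;     return $w }  # checks definedness (5.10+)

print join(" ", with_or(0), with_ternary(0), with_dor(0)), "\n";  # prints "75 0 0"
```

    One edge case separates the last two: with_ternary(undef) returns undef,
    while with_dor(undef) returns 75.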
    >what is less awkward than [\s|\S] for 'match anything?'
    Well, for certain values of "less awkward", /(?s:.)/, which is the .
    metacharacter with the /s switch turned on -- that way it matches any
    character including newlines. I think someone suggested [.\n], but that
    is not correct, because . just means "." inside a character class.
    Another way to do it, if you are certain your input will have no
    multi-byte characters, is to use \C, which matches a single byte.
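    Here is a sketch that counts how many characters of "a\nb" each
    candidate can match, which makes the corrections in this thread easy to
    check. \C is left out on purpose: it was later removed from the
    language (Perl 5.24).

```perl
use strict;
use warnings;

my $s = "a\nb";    # three characters: 'a', newline, 'b'

# () = ... forces list context; a list assignment in scalar context returns
# how many matches the /g pattern produced.
my %count = (
    dot          => scalar(() = $s =~ /./g),       # . skips the newline
    char_class   => scalar(() = $s =~ /[\s\S]/g),  # whitespace or non-whitespace
    dot_s        => scalar(() = $s =~ /(?s:.)/g),  # . with /s switched on locally
    alternation  => scalar(() = $s =~ /.|\n/g),    # any non-newline, or a newline
    dot_in_class => scalar(() = $s =~ /[.\n]/g),   # only a literal '.' or a newline
);
print "$_: $count{$_}\n" for sort keys %count;
```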
    >> EWW. DON'T USE $`. It's terrible.
    >
    >Okay, is that because it is slow and makes the rest of the regular
    >expressions afterwards run slowly? (I saw something about that in the
    >perlre document)
    Yes. When Perl compiles your program, it makes note of any use of $`, $&,
    or $'. If it sees you use one of them ONCE, ANYWHERE, it makes every
    regex in the program prepare their values for you. Not cool. What is
    cool is that $1, $2, etc., even though they are also set by matching,
    are provided only on a per-regex basis.
    >Is it as bad to use something like '(match anything)' before the main
    >expression, and using $1 in place of $` when it's useful?
    Well, if you're using a recent enough Perl (5.6+), you have access to the
    @- and @+ arrays, which hold offsets related to your last successful
    pattern match. $-[0] holds the offset in your string where the match
    started, so you could do

    my $pre = substr($str, 0, $-[0]);

    to get the equivalent of $`. See perlvar.
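    A short sketch of the @- / @+ approach (the sample string is made up):
    element 0 of each array covers the whole match, so substr can rebuild
    all three of $`, $& and $'. For what it's worth, perlvar notes that the
    copy-on-write strings in Perl 5.20 did away with the $` / $& / $'
    penalty, but this offset approach works on any Perl from 5.6 on.

```perl
use strict;
use warnings;

my $str = "say hello world";
my ($pre, $match, $post);

if ($str =~ /hello/) {
    # $-[0] is where the whole match starts, $+[0] is where it ends.
    $pre   = substr($str, 0, $-[0]);              # same text as $`
    $match = substr($str, $-[0], $+[0] - $-[0]);  # same text as $&
    $post  = substr($str, $+[0]);                 # same text as $'
}
print "[$pre][$match][$post]\n";   # prints "[say ][hello][ world]"
```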
    >I think I'll review all my material again on regexes. Are there any
    >good books you recommend on how to use and think in regexes?
    "Mastering Regular Expressions", published by O'Reilly.


    Jeff 'Japhy' Pinyan

  9. #8

    RE: quick re help

    On Aug 14, Perry, Alan said:
    >So, /[\s\S]/ would match a "\n", while /./ would not. The equivalent of
    >/[\s\S]/, using period notation, would be /[.\n]/
    Not so much; the . in a character class matches just itself.


    Jeff 'Japhy' Pinyan

  10. #9

    RE: quick re help

    Jeff 'japhy' Pinyan wrote:
    >
    >On Aug 14, Perry, Alan said:
    >
    >>So, /[\s\S]/ would match a "\n", while /./ would not. The equivalent of
    >>/[\s\S]/, using period notation, would be /[.\n]/
    >
    >Not so much; the . in a character class matches just itself.
    You are correct; I forgot about that. You could use /.|\n/ ... That
    should work.
    Alan Perry

  11. #10

    RE: quick re help

    > > > what is less awkward than [\s|\S] for 'match anything?'
    > >
    > >.
    > >
    > >Yes ->.<-
    > >
    > >Dot, period, point, et al, is the universal match "something" symbol. So,
    > >m'.*$' matches everything on a line. If you want to match a period you can
    > >either escape it: \. or bracket it [.]. Escaping is better for simple
    > >matching.
    >
    > Almost... From "perldoc perlretut":
    >
    > \s is a whitespace character and represents [\ \t\r\n\f]
    > \S is a negated \s; it represents any non-whitespace character [^\s]
    > The period '.' matches any character but "\n"
    >
    > So, /[\s\S]/ would match a "\n", while /./ would not. The equivalent of
    > /[\s\S]/, using period notation, would be /[.\n]/
    >
    > Alan
    Thanks! I appreciate the clarification.

    Robert Taylor

    Robert J Taylor
