
speed up string matching - PERL Beginners


  1. #1

    speed up string matching

    Hi!

    I need to match an expression and its reverse against a very long string.
    When a match occurs, all matching should stop and the position of the match should be returned.

    Question 1: Can I match the forward and the reverse expression against the string at the same time, and thereby halve the time it would normally take to find a match, or does the matching just get slower?

    Question 2: Is the "fork" function what I should use to match a string against multiple expressions simultaneously?



    Thanks to all helpers!

    C Guest

  2. #2

    RE: speed up string matching

    Hi,

    c r <dk> asked: 

    Could you please illustrate this with an example or two?

    Unless you specify the /g modifier, the RE engine stops at
    the first match. Use $-[0] to find the start position of the
    match; the pos function is only set by a /g match in scalar
    context, and then reports where that match ended.
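
    A minimal sketch of reading the match position, assuming the
    expression is a literal string (hence \Q...\E) and that a 0-based
    offset is wanted. It uses the built-in @- array, because pos() is
    only set by /g matches in scalar context and reports the end of the
    match rather than its start. The helper name first_match_pos and the
    sample data are made up for illustration:

    ```perl
    use strict;
    use warnings;

    # Return the 0-based offset of the first case-insensitive match of the
    # literal string $expres in $text, or -1 if there is no match.
    sub first_match_pos {
        my ($text, $expres) = @_;
        # $-[0] holds the start offset of the last successful match.
        return $text =~ /\Q$expres\E/i ? $-[0] : -1;
    }

    my $text = 'xxxxGATTACAxxxx';                    # hypothetical sample data
    print first_match_pos($text, 'gattaca'), "\n";   # prints 4
    print first_match_pos($text, 'nomatch'), "\n";   # prints -1
    ```

    Without /g the engine stops at the first match anyway, so no extra
    work is needed to "stop all matching".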
     

    Well, you'd have to merge your expressions somehow - the easiest
    way would be to try and match /expr-a|expr-b/ but then I suspect
    that for all but simple cases two separate matches would be faster.
     

    Well, there is a certain overhead involved in keeping your processes
    synchronized that would only be outweighed if you had a multi-CPU
    machine where both processes could run at once in the first place.
    Even then it's a hassle.

    If I were you I'd focus my energy on optimizing the expression.
    If you're going to match many long strings with the same RE, you
    could use the /o modifier to benefit. Also, you could try whether
    a study() of the input strings speeds things up.
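
    One way to sketch the "same RE, many strings" case, assuming the
    expression is a literal string: compile the pattern once with qr//
    and reuse the compiled object. In current Perl, qr// is the usual
    replacement for /o, and study() has been a no-op since Perl 5.16, so
    neither is used here. The name count_matching and the sample strings
    are hypothetical:

    ```perl
    use strict;
    use warnings;

    # Count how many of @strings contain the literal $expres,
    # ignoring case.
    sub count_matching {
        my ($expres, @strings) = @_;
        my $re = qr/\Q$expres\E/i;      # compiled once, reused for every string
        return scalar grep { $_ =~ $re } @strings;
    }

    print count_matching('abc', 'xxABCxx', 'nothing', 'abcabc'), "\n";   # prints 2
    ```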

    HTH,
    Thomas
    Thomas Guest

  3. #3

    Re: speed up string matching

    C R wrote: 

    You can make use of alternation (the '|' character).
     

    Intuitively I'd guess it's faster, but to know you need to do a
    benchmark.
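
    Such a benchmark could be sketched with the core Benchmark module,
    as below. The expression, the filler string and the sub names are
    assumptions chosen only to resemble the sizes mentioned in this
    thread; the timings depend on the Perl version and the machine, so
    no result is shown:

    ```perl
    use strict;
    use warnings;
    use Benchmark qw(cmpthese);

    my $expres      = 'QWERTYUIOP';                        # hypothetical expression
    my $rev_expres  = scalar reverse $expres;
    my $long_string = ('ARGBB' x 120_000) . $rev_expres;   # about 600,000 characters

    # Compile each pattern once so only the matching itself is timed.
    my $fwd  = qr/\Q$expres\E/i;
    my $rev  = qr/\Q$rev_expres\E/i;
    my $both = qr/\Q$expres\E|\Q$rev_expres\E/i;

    # Two separate matches, the second only if the first fails.
    sub match_separate    { return ($long_string =~ $fwd || $long_string =~ $rev) ? 1 : 0 }
    # One match with both expressions joined by alternation.
    sub match_alternation { return $long_string =~ $both ? 1 : 0 }

    # Run each variant for at least one CPU second and print a comparison table.
    cmpthese(-1, {
        separate    => \&match_separate,
        alternation => \&match_alternation,
    });
    ```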
     

    Aha, did you mean "at the same time" in that sense? Maybe, if the
    string is really, really long. You may e.g. want to check out the
    module Parallel::ForkManager.
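
    To give an idea of what running the two searches in parallel
    involves, here is a rough sketch that uses only the core fork and
    pipe functions instead of Parallel::ForkManager (a CPAN module). It
    assumes a Unix-like system, literal case-sensitive expressions, and
    it does not try to stop one search when the other succeeds. The name
    parallel_find is hypothetical:

    ```perl
    use strict;
    use warnings;
    use POSIX ();

    # Search $text for $needle in a child process and for its reverse in
    # the parent. Returns [forward_pos, reverse_pos]; -1 means not found.
    sub parallel_find {
        my ($text, $needle) = @_;

        pipe(my $reader, my $writer) or die "pipe failed: $!";
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;

        if ($pid == 0) {                        # child: forward search
            close $reader;
            print {$writer} index($text, $needle), "\n";
            close $writer;                      # flushes the result into the pipe
            POSIX::_exit(0);                    # leave without running END blocks
        }

        close $writer;                          # parent: reverse search
        my $rev_pos = index($text, scalar reverse($needle));
        chomp(my $fwd_pos = <$reader>);         # wait for the child's answer
        close $reader;
        waitpid($pid, 0);
        return [$fwd_pos, $rev_pos];
    }

    my ($fwd_pos, $rev_pos) = @{ parallel_find('xxJIHGFEDCBAxx', 'ABCDEFGHIJ') };
    print "forward: $fwd_pos, reverse: $rev_pos\n";   # prints "forward: -1, reverse: 2"
    ```

    The fork, the pipe and the wait all cost time, so for a single short
    search this is likely slower than two sequential matches; whether it
    pays off for very long strings would have to be measured.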

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
    Gunnar Guest

  4. #4

    Re: speed up string matching

    c r <dk> wrote:

    Thanks for replying!

    Are you certain that using the module makes simultaneous matching faster than sequential matching, and if so, roughly by how much?

    --
    To unsubscribe, e-mail: org
    For additional commands, e-mail: org





    C Guest

  5. #5

    RE: speed up string matching

    c r <dk> wrote:

    Thanks for replying!

    Thomas Bätzler <com> wrote:
    > Hi,
    >
    > c r asked:
    >
    > Could you please illustrate this with an example or two?

    Don't mind the lack of position return. I have:

        $expres      = '10_normal_characters';   # the expression to find
        $rev_expres  = reverse $expres;          # the same expression, reversed
        $long_string = 'ARGBB...........';       # the very long string

        if ($long_string =~ /$expres/i) { next; }
        if ($long_string =~ /$rev_expres/i) { next; }

    ("next" starts the next loop iteration, which takes a different $expres, reverses it, and runs the matching procedure again. This is repeated many thousands of times, and it takes days to finish.)
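
    If the expressions really are literal strings of ordinary characters,
    as '10_normal_characters' suggests, one thing that might be worth
    trying is index() on a lower-cased copy of the string instead of a
    case-insensitive regex. The sketch below assumes plain ASCII data, so
    that lc() behaves like /i; find_either and the sample values are
    hypothetical:

    ```perl
    use strict;
    use warnings;

    # Return the offset of the first occurrence of $expres or of its
    # reverse in the already lower-cased $haystack_lc, or -1 if neither
    # occurs.
    sub find_either {
        my ($haystack_lc, $expres) = @_;
        my $needle = lc $expres;
        my $fwd = index($haystack_lc, $needle);
        my $rev = index($haystack_lc, scalar reverse($needle));
        return $rev if $fwd < 0;
        return $fwd if $rev < 0;
        return $fwd < $rev ? $fwd : $rev;
    }

    my $long_string = 'ARGBBxxJIHGFEDCBAxx';   # hypothetical stand-in for the real data
    my $haystack_lc = lc $long_string;         # lower-case once, outside the loop

    for my $expres (qw(ABCDEFGHIJ ZZZZZZZZZZ)) {   # hypothetical expressions
        my $pos = find_either($haystack_lc, $expres);
        print "$expres: ", ($pos >= 0 ? "found at $pos" : "not found"), "\n";
    }
    # prints:
    #   ABCDEFGHIJ: found at 7
    #   ZZZZZZZZZZ: not found
    ```

    Lower-casing the long string once, rather than asking for /i on every
    match, keeps the per-expression work down to two plain substring
    searches.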


    > Unless you specify the /g modifier, the RE engine stops at
    > the first match. Use $-[0] to find the start position of the
    > match; the pos function is only set by a /g match in scalar
    > context, and then reports where that match ended.
    >
    > Well, you'd have to merge your expressions somehow - the easiest
    > way would be to try and match /expr-a|expr-b/ but then I suspect
    > that for all but simple cases two separate matches would be faster.
    >
    > Well, there is a certain overhead involved in keeping your processes
    > synchronized that would only be outweighed if you had a multi-CPU
    > machine where both processes could run at once in the first place.
    > Even then it's a hassle.

    So, in order to match a very long string against multiple expressions simultaneously, and faster than the matching procedure I described above, I need multiple computers?

    > If I were you I'd focus my energy on optimizing the expression.
    > If you're going to match many long strings with the same RE, you
    > could use the /o modifier to benefit. Also, you could try whether
    > a study() of the input strings speeds things up.

    I don't know the "study" function, but I doubt it can solve the problem satisfactorily. Please correct me if I am wrong!

    I acknowledge that this is a serious programming challenge. Do you have any other ideas for how to tackle this problem (what about other hardware)?

    > HTH,
    > Thomas




    C Guest

  6. #6

    Re: speed up string matching

    [ Please 'bottom-post', i.e. type your reply below the quoted part of
    the message you are replying to. Also, don't quote the whole message,
    but only the part(s) needed for context. ]

    C R wrote: 
    >>
    >> Maybe, if the string is really, really long. You may e.g. want to
    >> check out the module Parallel::ForkManager.
    >
    > Are you certain that using the module makes the simultaneous
    > matching faster than a sequential and to what degree (roughly)?

    Certain? Certainly not. :) It depends, among other things, on your
    system's ability to run parallel processes and on the size of the
    string you want to parse.

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
    Gunnar Guest

  7. #7

    Re: speed up string matching



    Gunnar Hjalmarsson <cc> wrote: 
    >>
    >> Maybe, if the string is really, really long. You may e.g. want to
    >> check out the module Parallel::ForkManager.
    >
    > Are you certain that using the module makes the simultaneous
    > matching faster than a sequential and to what degree (roughly)?

    Certain? Certainly not. :) It depends, among other things, on your
    system's ability to run parallel processes and on the size of the
    string you want to parse.

    I have an average personal computer. At the moment the size of the string can get up to about 600,000 characters, but in the future it will get much larger than 100 MB.







    C Guest

  8. #8

    Re: speed up string matching

    C R wrote: 
    >>
    >> Certain? Certainly not. :) It depends, among other things, on
    >> your system's ability to run parallel processes and on the size of
    >> the string you want to parse.
    >
    > I have an average personal computer. At the moment the size of the
    > string can get up to about 600,000 characters, but in the future it
    > will get much larger than 100 MB.

    I still can't tell. Maybe somebody else is able to give you guidance,
    but why don't you simply try it if you want to explore that option?

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
    Gunnar Guest

  9. #9

    Re: speed up string matching



    Gunnar Hjalmarsson <cc> wrote:
    C R wrote: 
    >>
    >> Certain? Certainly not. :) It depends, among other things, on
    >> your system's ability to run parallel processes and on the size of
    >> the string you want to parse.
    >
    > I have an average personal computer. At the moment the size of the
    > string can get up to about 600,000 characters, but in the future it
    > will get much larger than 100 MB.

    I still can't tell. Maybe somebody else is able to give you guidance,
    but why don't you simply try it if you want to explore that option?



    OK, thanks!







    C Guest
