
"walk over," and XPath-based substitutions? - PERL Modules


  1. #1

    "walk over," and XPath-based substitutions?

    [Cross-posting to news:comp.text.xml, yet omitting it from
    Followup-To:, for I'm primarily interested in Perl-based
    solutions.]

    Is there an easy way to invoke specific code for each of the XML
    nodes that satisfy one of the XPath expressions in a given list?

    A simple-minded approach (based on XML::LibXML) could be like:

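    use strict;
    use warnings;
    require XML::LibXML;

    ## $context: the document to search ("input.xml" is a placeholder)
    my $context = XML::LibXML->load_xml (location => "input.xml");

    ## a list of XPath => code pairs (not a hash reference)
    my %xpath_sub = (
        q {//node ()[foo = "bar"]} => \&foo_bar,
        q {//node ()[baz = "qux"]} => sub { baz ("qux", @_); }
    );

    foreach my $xpath (keys (%xpath_sub)) {
        my $sub
            = $xpath_sub{$xpath};
        foreach my $node ($context->findnodes ($xpath)) {
            $sub->($node);
        }
    }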

    However, AIUI, the code above implies that the XML tree is to be
    traversed multiple times (once per expression), which could
    probably be avoided by traversing the tree explicitly, as in:

    sub traverse {
        my ($node, $xsubs) = @_;
        foreach my $xpath (keys (%$xsubs)) {
            next
                unless ($node->find ($xpath));
            ## FIXME: check if the result is a boolean?
            $xsubs->{$xpath}->($node);
            ## FIXME: there, one may wish for a recursion; or not
        }
        ## recurse over the children
        foreach my $child ($node->childNodes ()) {
            traverse ($child, $xsubs);
        }
        ## .
    }

    Still, it may repeatedly traverse the children of $node while
    computing ->find () for each of the XPath expressions. (Unlike
    the way an "optimized," or "compiled," regular expression would
    be handled, IIUC.)
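A minimal sketch of the explicit traversal, under two assumptions: each pattern can be rewritten as a test on the visited node itself, and every expression is compiled once with XML::LibXML::XPathExpression. (An expression beginning with `//` is absolute, so calling ->find () with it on every node searches the whole document each time; the `self::*[...]` form below only looks at the node and its children.) The sample document and element names are made up.

```perl
use strict;
use warnings;
use XML::LibXML;

# Made-up sample document; element names are illustrative only.
my $doc = XML::LibXML->load_xml (string => <<'XML');
<root>
  <item><foo>bar</foo></item>
  <item><baz>qux</baz></item>
  <item><foo>other</foo></item>
</root>
XML

my @log;

# Each rule tests only the visited node (self::) and its children,
# instead of the whole document as an absolute //... path would.
my @rules = (
    [ 'self::*[foo = "bar"]', sub { push @log, "foo_bar: " . $_[0]->nodeName } ],
    [ 'self::*[baz = "qux"]', sub { push @log, "baz_qux: " . $_[0]->nodeName } ],
);

# Parse each expression once, not once per visited node.
$_->[0] = XML::LibXML::XPathExpression->new ($_->[0]) foreach @rules;

sub traverse {
    my ($node) = @_;
    foreach my $rule (@rules) {
        my ($xpath, $sub) = @$rule;
        # These expressions always yield a node list; empty means no match.
        $sub->($node) if $node->find ($xpath)->size;
    }
    traverse ($_) foreach $node->childNodes;
}

traverse ($doc->documentElement);
print "$_\n" foreach @log;
```

Every node is still checked against every rule, so the work grows with nodes times rules, but no rule rescans the whole tree.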

    The question is: does LibXML (or some other library) provide a
    way to make such a task both simpler to code and more efficient
    on execution?

    ... Or do I "optimize" all the XPath expressions themselves into
    a single one somehow?
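One way to fold the list into a single expression, as a sketch under stated assumptions: the node-relative tests are joined with `or` into one `//*[...]` predicate, XML::LibXML evaluates it once (findnodes returns nodes in document order), and only the selected nodes are re-tested to pick the action(s). The rules and input are hypothetical, and any speed-up would depend on the actual expressions.

```perl
use strict;
use warnings;
use XML::LibXML;

# Made-up input; each rule is a test on the context node plus an action.
my $doc = XML::LibXML->load_xml (string =>
    '<r><a k="1"/><b/><a k="2"/><c/></r>');

my @log;
my @rules = (
    [ 'self::a[@k = "1"]', sub { push @log, "a1" } ],
    [ 'self::b',           sub { push @log, "b"  } ],
);

# Fold all tests into one predicate, e.g.
#   //*[(self::a[@k = "1"]) or (self::b)]
# so the document is scanned once rather than once per rule.
my $combined = '//*[' . join (' or ', map { "($_->[0])" } @rules) . ']';

# Only the selected nodes are re-tested to see which rule(s) they satisfy.
foreach my $node ($doc->findnodes ($combined)) {
    foreach my $rule (@rules) {
        $rule->[1]->($node) if $node->find ($rule->[0])->size;
    }
}
print join (",", @log), "\n";
```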

    TIA.

    --
    FSF associate member #7257 http://hfday.org/
    Ivan Guest

  2. #2

    Re: "walk over," and XPath-based substitutions?

    On 4/6/2013 5:32 AM, Ivan Shmakov wrote: 

    XML::Twig http://search.cpan.org/~mirod/XML-Twig-3.42/Twig.pm
    may be useful.
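For reference, a minimal XML::Twig sketch: handlers are keyed by XML::Twig's own XPath-like trigger conditions (a subset of XPath, not full XPath) and are called during a single parse as each matching element is completed. The attribute tests below stand in for the child-element tests of the original question; the document and names are made up.

```perl
use strict;
use warnings;
use XML::Twig;

my @seen;

my $twig = XML::Twig->new (
    twig_handlers => {
        # Each key is a trigger condition; each value is called with
        # the twig and the matching element once it has been parsed.
        'item[@foo="bar"]' => sub {
            my ($t, $elt) = @_;
            push @seen, "foo_bar:" . $elt->att ('id');
            return 1;
        },
        'item[@baz="qux"]' => sub {
            my ($t, $elt) = @_;
            push @seen, "baz_qux:" . $elt->att ('id');
            return 1;
        },
    },
);

$twig->parse (
    '<root><item id="1" foo="bar"/><item id="2" baz="qux"/><item id="3"/></root>');
print "$_\n" foreach @seen;
```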

    Dennis

    Dennis Guest

  3. #3

    Re: "walk over," and XPath-based substitutions?

    On 4/6/2013 7:32 AM, Ivan Shmakov wrote:
    .... 

    First off, I'd suggest that you consider XSLT or XQuery, which are
    specifically designed for this kind of find-and-process operation.
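For comparison, the XSLT route from Perl, sketched with XML::LibXSLT (a separate CPAN module, assumed to be installed): each `xsl:template match="..."` corresponds to one XPath/code pair of the original hash, and the processor picks the matching template as it walks the document. The empty `text()` template suppresses the built-in copying of text nodes. Stylesheet and input are made up.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXSLT;

# One template per pattern; the built-in rules recurse into elements
# that no template matches, and text() output is suppressed.
my $xsl = XML::LibXML->load_xml (string => <<'XSL');
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="item[foo = 'bar']">foo_bar:<xsl:value-of select="@id"/>;</xsl:template>
  <xsl:template match="item[baz = 'qux']">baz_qux:<xsl:value-of select="@id"/>;</xsl:template>
  <xsl:template match="text()"/>
</xsl:stylesheet>
XSL

my $doc = XML::LibXML->load_xml (string =>
    '<root><item id="1"><foo>bar</foo></item>'
  . '<item id="2"><baz>qux</baz></item>'
  . '<item id="3"><foo>other</foo></item></root>');

my $style  = XML::LibXSLT->new->parse_stylesheet ($xsl);
my $output = $style->output_as_chars ($style->transform ($doc));
print "$output\n";
```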

    What you're looking for is a "streaming processor" -- one which rewrites
    the complete set of operations into a state machine which can produce
    its results in a single pass over the nodes. There are XPath/XSLT/XQuery
    systems which attempt to do this for a subset of the query language -- I
    think Xerces and the IBM XML pr have streaming-subset XPath
    evaluators, and I know the DataPower "xml appliance" machines have some
    limited XSLT streaming capability -- but even as subsets, those are
    fairly rare, and while they may be able to reduce storage by not keeping
    the entire document model in memory they may not reduce computational
    load. If you're looking for something off-the-shelf, that's where I'd start.

    A _good_ general solution for matching multiple paths in a single pass
    over the document is NOT easy to create. You need to create a state
    machine which tracks what has been seen so far and detects which nodes
    match which expression, and at the same time you want to constrain the
    tree walk so you don't waste time exploring trees which provably can't
    contribute nodes to those results. Getting all those details right even
    for the subset approach can be complicated. Reassembling the individual
    results in the correct order to produce the intended result document
    further complicates the process.
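To make the "state machine" idea concrete, a toy sketch for a very small path subset (only `/` and `//` steps with element names or `*`, no predicates) over a tree of nested Perl arrays. It is only an illustration, not the approach of any particular product: several paths are matched in one pre-order walk by carrying a set of partially matched (pattern, step) states from parent to child, and a subtree is skipped once no state survives. `compile_path`, `walk` and the tree layout are hypothetical.

```perl
use strict;
use warnings;

# Compile a path made only of "/name" and "//name" steps (name may be
# "*"; no predicates or other axes) into a list of [axis, name] pairs.
sub compile_path {
    my ($path) = @_;
    die "unsupported path: $path\n"
        unless $path =~ m{\A(?:/{1,2}[^/]+)+\z};
    my @steps;
    while ($path =~ m{(/{1,2})([^/]+)}g) {
        push @steps, [ length ($1) == 2 ? 'desc' : 'child', $2 ];
    }
    return \@steps;
}

# One pre-order walk. $states lists the [pattern, step] pairs still
# "alive" for this node; matches are reported via $on_match.
sub walk {
    my ($node, $states, $patterns, $on_match) = @_;
    my ($name, @children) = @$node;
    my (@next, %seen, %hit);
    my $keep = sub {
        my ($p, $i) = @_;
        push @next, [ $p, $i ] unless $seen{"$p,$i"}++;
    };
    foreach my $st (@$states) {
        my ($p, $i) = @$st;
        my ($axis, $want) = @{ $patterns->[$p][$i] };
        # a "//" step may skip this element and match deeper down
        $keep->($p, $i) if $axis eq 'desc';
        next unless $want eq '*' || $want eq $name;
        if ($i == $#{ $patterns->[$p] }) {
            $on_match->($p, $node) unless $hit{$p}++;
        } else {
            $keep->($p, $i + 1);
        }
    }
    return unless @next;    # no pattern can match below: prune subtree
    walk ($_, \@next, $patterns, $on_match) foreach @children;
}

# Toy tree: [ name, child, child, ... ]
my @paths    = ('/doc/sec', '//title', '/doc//sec/title');
my @patterns = map { compile_path ($_) } @paths;
my $tree     = [ 'doc',
                 [ 'sec', [ 'title' ], [ 'sec', [ 'title' ] ] ],
                 [ 'title' ] ];

my @log;
walk ($tree, [ map { [ $_, 0 ] } 0 .. $#patterns ], \@patterns,
      sub { push @log, $paths[$_[0]] . " -> " . $_[1][0] });
print "$_\n" foreach @log;
```

Result ordering here is simply the order of discovery; reassembling results into a particular output order, as described above, is not attempted.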

    (I'm one of the authors of a patent on that topic, actually -- US
    8,120,789 B2 -- but unfortunately our group didn't get the funding to
    finish a product-quality implementation of that logic so it isn't
    available for use. If someone wants to license the patent, I'm sure IBM
    would be delighted to talk to you...)





    --
    Joe Kesselman,
    http://www.love-song-productions.com/people/keshlam/index.html

    {} ASCII Ribbon Campaign | "may'ron DaroQbe'chugh vaj bIrIQbej" --
    /\ Stamp out HTML mail! | "Put down the squeezebox & nobody gets hurt."
    Joe Guest

  4. #4

    Re: "walk over," and XPath-based substitutions?

    >>>>> Joe Kesselman <net> writes:

    [Given that there was little Perl-specific matter in this
    subthread, cross-posting back to news:comp.text.xml, and setting
    Followup-To: there.]

    I see little advantage in using XSLT for my task (and I'm not
    familiar with XQuery), as XML is not the only data source I need
    to interface. (E. g., I'm also accessing an SQLite database.)
    The usual benefits of XSLT -- the existence of browser-based
    implementations and its "Lisp-like" nature (in that it uses the
    same syntax for both the code and data) -- do not seem to apply.
     

    Indeed, thanks for the clarification!
     

    Is it Apache Xerces [1]? It doesn't seem to include either XSLT
    or XQuery.

    [1] https://xerces.apache.org/
     

    Which is?
     

    ACK, thanks. My XMLs are rather small, so I'm more interested
    in reducing computational load than memory usage. But even that
    is not a priority right now. Rather, I'm looking for the ways
    to avoid total code rewrite at some later point.

    I guess I should check XML::Twig. Or, given that the conditions
    that I currently need to consider are rather simple, a
    straightforward ->childNodes ()-based, no-XPath implementation
    may be possible.
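A sketch of the ->childNodes ()-based, no-XPath variant, assuming the conditions stay as simple as "has a child element <foo> whose text is bar": each rule is a plain Perl predicate plus an action, and the tree is walked once. `has_child_text`, `walk` and the sample document are hypothetical.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: true if $elt has a child element $name whose
# text content is $value (approximately XPath's  name = "value").
sub has_child_text {
    my ($elt, $name, $value) = @_;
    foreach my $c ($elt->childNodes) {
        return 1 if $c->isa ('XML::LibXML::Element')
            && $c->nodeName eq $name
            && $c->textContent eq $value;
    }
    return 0;
}

my @log;

# Each rule: [ predicate, action ], both plain Perl code.
my @rules = (
    [ sub { has_child_text ($_[0], 'foo', 'bar') },
      sub { push @log, "foo_bar:" . $_[0]->getAttribute ('id') } ],
    [ sub { has_child_text ($_[0], 'baz', 'qux') },
      sub { push @log, "baz_qux:" . $_[0]->getAttribute ('id') } ],
);

# Single pre-order walk over element nodes only.
sub walk {
    my ($node) = @_;
    return unless $node->isa ('XML::LibXML::Element');
    foreach my $rule (@rules) {
        $rule->[1]->($node) if $rule->[0]->($node);
    }
    walk ($_) foreach $node->childNodes;
}

my $doc = XML::LibXML->load_xml (string => <<'XML');
<root>
  <item id="1"><foo>bar</foo></item>
  <item id="2"><baz>qux</baz></item>
  <item id="3"><foo>other</foo></item>
</root>
XML

walk ($doc->documentElement);
print "$_\n" foreach @log;
```

Keeping the rules as [ predicate, action ] pairs means an XPath-based predicate could later be swapped in for any single rule without touching the walk itself.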

    [...]
     

    I believe that I may be under a jurisdiction which has no notion
    of software patents. (Subject to the reading of TRIPS, though.)

    --
    FSF associate member #7257 http://hfday.org/
    Ivan Guest

Similar Threads

  1. Replies: 2
    Last Post: November 22nd, 08:53 PM
  2. Replies: 0
    Last Post: November 22nd, 05:58 PM
  3. Any thought on "Perl Database" based on "Tie:File"?
    By Public in forum PERL Miscellaneous
    Replies: 8
    Last Post: October 20th, 04:38 PM
  4. SelectSingeNode("xpath")...how to test if null?
    By Kathy Burke in forum ASP.NET General
    Replies: 2
    Last Post: July 23rd, 09:17 PM
