JAG #1
Splitting up an XML File
I have an XML file that looks like this:
<root>
<economist publications="true" >
<name>
<first>John</first>
<last>Doe</last>
</name>
<keywords>
<keyword>Foo</keyword>
<keyword>Bar</keyword>
</keywords>
<title>Indian Chief</title>
</economist>
<economist publications="true" >
<name>
<first>Jane</first>
<last>Smith</last>
</name>
<keywords>
<keyword>More Foo</keyword>
<keyword>More Bar</keyword>
</keywords>
<title>President</title>
</economist>
</root>
But the actual file has about 100 <economist> elements.
I need to write some Perl code to parse this XML file and
write out 100 smaller XML files, each file corresponding to one
<economist> element.
So in my example, I'd write 2 smaller files, one that
looks like this:
<economist publications="true" >
<name>
<first>John</first>
<last>Doe</last>
</name>
<keywords>
<keyword>Foo</keyword>
<keyword>Bar</keyword>
</keywords>
<title>Indian Chief</title>
</economist>
and one that looks like this:
<economist publications="true" >
<name>
<first>Jane</first>
<last>Smith</last>
</name>
<keywords>
<keyword>More Foo</keyword>
<keyword>More Bar</keyword>
</keywords>
<title>President</title>
</economist>
There are some nested elements in the real file, so I think
XML::Simple won't work for this.
Any ideas about how I can do this? I don't need to do any processing
(at least not now) - just reading and writing smaller chunks.
Thanks!
JAG Guest
-
Tad McClellan #2
Re: Splitting up an XML File
JAG <jeffg@programmer.net> wrote:
> But the actual file has about 100 <economist> elements.
> I need to write some Perl code to parse this XML file and
> write out 100 smaller XML files, each file corresponding to one
> <economist> element.
> There are some nested elements in the real file,
I will assume that <economist> is NOT nested, and that the
start/end tags are on lines by themselves.
> Any ideas about how I can do this?
# strip non-<economist> stuff at top of file
$/ = "<economist>\n";
while ( <> ) { # read one <economist> element per loop iteration
# open file, output $_ to file, close file.
}
--
Tad McClellan SGML consulting
tadmc@augustmail.com Perl programming
Fort Worth, Texas
Tad McClellan Guest
-
Tad McClellan #3
Re: Splitting up an XML File
Tad McClellan <tadmc@augustmail.com> wrote:
> $/ = "<economist>\n";
Oops! That should have been:
$/ = "</economist>\n";
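A minimal sketch of the record-separator approach with the corrected $/, under the same assumptions stated above: <economist> is not nested and each </economist> end tag is followed directly by a newline. The helper name split_records, the record.N file names, and the temporary output directory are illustrative choices, not part of the original post.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Read one <economist> element per record and write each to its own file.
# Assumes every </economist> end tag is followed directly by a newline
# and that <economist> elements are never nested.
sub split_records {
    my ($in, $dir) = @_;
    local $/ = "</economist>\n";    # record separator: the end tag
    my @files;
    while ( my $record = <$in> ) {
        # Drop anything before the start tag (e.g. <root> or blank lines);
        # skip chunks that contain no element at all (e.g. the final </root>).
        $record =~ s/\A.*?(?=<economist\b)//s
            or next;
        my $file = "$dir/record." . scalar @files;
        open my $out, '>', $file or die "Cannot write $file: $!";
        print {$out} $record;
        close $out or die "Cannot close $file: $!";
        push @files, $file;
    }
    return @files;
}

# Demo on a small in-memory sample; a real run would open the XML file instead.
my $xml = <<'XML';
<root>
<economist publications="true" >
<name><first>John</first><last>Doe</last></name>
</economist>
<economist publications="true" >
<name><first>Jane</first><last>Smith</last></name>
</economist>
</root>
XML

open my $in, '<', \$xml or die "Cannot read sample: $!";
my @written = split_records( $in, tempdir( CLEANUP => 1 ) );
print scalar(@written), " files written\n";
```

This never parses the XML, so it only holds up while the input keeps the line layout assumed above.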
--
Tad McClellan SGML consulting
tadmc@augustmail.com Perl programming
Fort Worth, Texas
Tad McClellan Guest
-
trwww #4
Re: Splitting up an XML File
jeffg@programmer.net (JAG) wrote in message news:<6b40b6b9.0309171009.20d66b6a@posting.google.com>...
> I have an XML file that looks like this:
<snip />
>
> But the actual file has about 100 <economist> elements.
> I need to write some Perl code to parse this XML file and
> write out 100 smaller XML files, each file corresponding to one
> <economist> element.
>
> So in my example, I'd write 2 smaller files, one that
> looks like this:
<snip />
>
> There are some nested elements in the real file, so I think
> XML::Simple won't work for this.
>
> Any ideas about how I can do this? I don't need to do any processing
> (at least not now) - just reading and writing smaller chunks.
>
This uses one of my favorite modules, XML::XPath:
[trwww@waveright trwww]$ perl
use warnings;
use strict;
use XML::XPath;
use IO::File;
my($xp) = XML::XPath->new( xml => join('', <DATA>) );
my($nodeset) = $xp->find( '/root/economist' );
my($ext) = 0;
foreach my $record ( $nodeset->get_nodelist() ) {
IO::File->new('> record.'.$ext++)->print($record->toString());
}
__DATA__
<root>
<economist publications="true" >
<name>
<first>John</first>
<last>Doe</last>
</name>
<keywords>
<keyword>Foo</keyword>
<keyword>Bar</keyword>
</keywords>
<title>Indian Chief</title>
</economist>
<economist publications="true" >
<name>
<first>Jane</first>
<last>Smith</last>
</name>
<keywords>
<keyword>More Foo</keyword>
<keyword>More Bar</keyword>
</keywords>
<title>President</title>
</economist>
</root>
Ctrl-D
[trwww@waveright trwww]$ ls -l
total 24
drwxr-xr-x 3 trwww trwww 4096 Aug 17 19:00 apps
drwx------ 3 trwww trwww 4096 Sep 16 20:49 Desktop
drwxr-xr-x 3 trwww trwww 4096 Aug 18 16:50 misc
drwxrwxr-x 3 trwww trwww 4096 Sep 6 19:00 public_html
-rw-rw-r-- 1 trwww trwww 297 Sep 17 22:56 record.0
-rw-rw-r-- 1 trwww trwww 306 Sep 17 22:56 record.1
[trwww@waveright trwww]$ cat record.0
<economist publications="true">
<name>
<first>John</first>
<last>Doe</last>
</name>
<keywords>
<keyword>Foo</keyword>
<keyword>Bar</keyword>
</keywords>
<title>Indian Chief</title>
</economist>[trwww@waveright trwww]$ cat record.1
<economist publications="true">
<name>
<first>Jane</first>
<last>Smith</last>
</name>
<keywords>
<keyword>More Foo</keyword>
<keyword>More Bar</keyword>
</keywords>
<title>President</title>
</economist>[trwww@waveright trwww]$
Todd W.
trwww Guest
-
JAG #5
Re: Splitting up an XML File
toddrw69@excite.com (trwww) wrote in message news:<d81ecffa.0309171902.596dfa99@posting.google.com>...
> jeffg@programmer.net (JAG) wrote in message news:<6b40b6b9.0309171009.20d66b6a@posting.google.com>...
> > I have an XML file that looks like this:
> <snip />
> >
> > But the actual file has about 100 <economist> elements.
> > I need to write some Perl code to parse this XML file and
> > write out 100 smaller XML files, each file corresponding to one
> > <economist> element.
> >
> > So in my example, I'd write 2 smaller files, one that
> > looks like this:
> <snip />
> >
> > There are some nested elements in the real file, so I think
> > XML::Simple won't work for this.
> >
> > Any ideas about how I can do this? I don't need to do any processing
> > (at least not now) - just reading and writing smaller chunks.
> >
> This uses one of my favorite modules, XML::XPath:
>
> [trwww@waveright trwww]$ perl
> use warnings;
> use strict;
> use XML::XPath;
> use IO::File;
>
> my($xp) = XML::XPath->new( xml => join('', <DATA>) );
> my($nodeset) = $xp->find( '/root/economist' );
>
> my($ext) = 0;
>
> foreach my $record ( $nodeset->get_nodelist() ) {
> IO::File->new('> record.'.$ext++)->print($record->toString());
> }
>
> __DATA__
> <root>
> <economist publications="true" >
> <name>
> <first>John</first>
> <last>Doe</last>
> </name>
> <keywords>
> <keyword>Foo</keyword>
> <keyword>Bar</keyword>
> </keywords>
> <title>Indian Chief</title>
> </economist>
>
> <economist publications="true" >
> <name>
> <first>Jane</first>
> <last>Smith</last>
> </name>
> <keywords>
> <keyword>More Foo</keyword>
> <keyword>More Bar</keyword>
> </keywords>
> <title>President</title>
> </economist>
> </root>
> Ctrl-D
> [trwww@waveright trwww]$ ls -l
> total 24
> drwxr-xr-x 3 trwww trwww 4096 Aug 17 19:00 apps
> drwx------ 3 trwww trwww 4096 Sep 16 20:49 Desktop
> drwxr-xr-x 3 trwww trwww 4096 Aug 18 16:50 misc
> drwxrwxr-x 3 trwww trwww 4096 Sep 6 19:00 public_html
> -rw-rw-r-- 1 trwww trwww 297 Sep 17 22:56 record.0
> -rw-rw-r-- 1 trwww trwww 306 Sep 17 22:56 record.1
> [trwww@waveright trwww]$ cat record.0
> <economist publications="true">
> <name>
> <first>John</first>
> <last>Doe</last>
> </name>
> <keywords>
> <keyword>Foo</keyword>
> <keyword>Bar</keyword>
> </keywords>
> <title>Indian Chief</title>
> </economist>[trwww@waveright trwww]$ cat record.1
> <economist publications="true">
> <name>
> <first>Jane</first>
> <last>Smith</last>
> </name>
> <keywords>
> <keyword>More Foo</keyword>
> <keyword>More Bar</keyword>
> </keywords>
> <title>President</title>
> </economist>[trwww@waveright trwww]$
>
> Todd W.
Thanks! This works beautifully.
Now, here are two more things.
Instead of naming the files record.[0..n], I want each
output file to have the name of the person.
So these two files would be named Jane.Smith and John.Doe
Also, within each <economist> element, there is now an element
called <work> that contains other elements. I need each of these
<work> elements to be written to its own file called lastname_work
and not in the first output file.
So for this XML file:
<root>
<economist publications="true" >
<name>
<first>John</first>
<last>Doe</last>
</name>
<keywords>
<keyword>Foo</keyword>
<keyword>Bar</keyword>
</keywords>
<title>Indian Chief</title>
<work>
<title>Title 1</title>
<content>Some Content</content>
</work>
</economist>
<economist publications="true" >
<name>
<first>Jane</first>
<last>Smith</last>
</name>
<keywords>
<keyword>More Foo</keyword>
<keyword>More Bar</keyword>
</keywords>
<title>President</title>
<work>
<title>Title 2</title>
<content>Some More Content</content>
</work>
</economist>
</root>
So this would produce the same two files your original code produced,
but named John.Doe and Jane.Smith and also without the <work> element.
Instead of printing the work element in this file, it should be printed
in its own file, in this case, called Smith_work and Doe_work.
Thanks again.
JAG Guest
-
trwww #6
Re: Splitting up an XML File
jeffg@programmer.net (JAG) wrote in message news:<6b40b6b9.0309180733.6cdcb2c8@posting.google.com>...
> toddrw69@excite.com (trwww) wrote in message news:<d81ecffa.0309171902.596dfa99@posting.google.com>...
> > jeffg@programmer.net (JAG) wrote in message news:<6b40b6b9.0309171009.20d66b6a@posting.google.com>...
> > > I have an XML file that looks like this:
<snip />
> > >
> > > There are some nested elements in the real file, so I think
> > > XML::Simple won't work for this.
> > >
> > > Any ideas about how I can do this? I don't need to do any processing
> > > (at least not now) - just reading and writing smaller chunks.
> > >
> > This uses one of my favorite modules, XML::XPath:
> >
> > [trwww@waveright trwww]$ perl
> > use warnings;
> > use strict;
> > use XML::XPath;
> > use IO::File;
> >
> > my($xp) = XML::XPath->new( xml => join('', <DATA>) );
> > my($nodeset) = $xp->find( '/root/economist' );
> >
> > my($ext) = 0;
> >
> > foreach my $record ( $nodeset->get_nodelist() ) {
> > IO::File->new('> record.'.$ext++)->print($record->toString());
> > }
> >
> > __DATA__
> > <root>
<snip />
> Thanks! This works beautifully.
of course =0)
> Now, here are two more things.
No thank you.
> Instead of naming the files record.[0..n], I want each
> output file to have the name of the person.
> So these two files would be named Jane.Smith and John.Doe
>
> Also, within each <economist> element, there is now an element
<snip />
> So this would produce the same two files your original code produced,
> but named John.Doe and Jane.Smith and also without the <work> element.
> Instead of printing the work element in this file, it should be printed
> in its own file, in this case, called Smith_work and Doe_work.
>
I replied to your post to show you and CLPM lurkers how easy
XML::XPath is to use.
If you need a consultant, email me off-list at sendwade@hotmail.com
Otherwise, read the XML::XPath documentation. What you propose above
is trivial to implement with XPath.
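For readers following along, here is one possible sketch of that follow-up request with XML::XPath. It is not code from the thread. The helper name split_economists and the trimmed sample data are illustrative, and it assumes each <economist> has exactly one name/first and one name/last. Instead of editing the parsed tree, it cuts the serialized <work> text out of the parent's serialized text, which relies on toString() producing the child's text verbatim inside the parent's.

```perl
use strict;
use warnings;
use XML::XPath;

# Hypothetical helper: returns (file name => content) pairs for one document.
sub split_economists {
    my ($xml) = @_;
    my $xp = XML::XPath->new( xml => $xml );
    my %files;

    foreach my $record ( $xp->find('/root/economist')->get_nodelist ) {
        # Paths are evaluated relative to $record; concatenating with ""
        # turns the findvalue() result into a plain string.
        my $first = "" . $xp->findvalue( 'name/first', $record );
        my $last  = "" . $xp->findvalue( 'name/last',  $record );

        my $economist = $record->toString;
        my $work      = '';
        foreach my $node ( $xp->find( 'work', $record )->get_nodelist ) {
            my $text = $node->toString;
            $work .= "$text\n";
            # Cut the serialized <work> element out of the parent's text.
            $economist =~ s/\Q$text\E\s*//;
        }

        $files{"$first.$last"} = "$economist\n";
        $files{"${last}_work"} = $work;
    }
    return %files;
}

my $xml = <<'XML';
<root>
<economist publications="true" >
<name>
<first>John</first>
<last>Doe</last>
</name>
<title>Indian Chief</title>
<work>
<title>Title 1</title>
<content>Some Content</content>
</work>
</economist>
<economist publications="true" >
<name>
<first>Jane</first>
<last>Smith</last>
</name>
<title>President</title>
<work>
<title>Title 2</title>
<content>Some More Content</content>
</work>
</economist>
</root>
XML

my %files = split_economists($xml);
foreach my $name ( sort keys %files ) {
    open my $out, '>', $name or die "Cannot write $name: $!";
    print {$out} $files{$name};
    close $out or die "Cannot close $name: $!";
}
```

The names are used as file names without any sanitizing, so a real script would need to handle duplicate names and characters such as "/" before opening the files.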
Todd W.
trwww Guest


