Professional Web Applications Themes

finding subdirectories without parsing every file - PERL Miscellaneous

  1. #1

    Default finding subdirectories without parsing every file

    Hi

    Is there any way to get the subdirectories of a directory without
    having to sort through all the files in a directory?

    I'm actually building a little perl script that looks at the
    directories and then prints out a directory tree (as a webpage).

    I've been using file::find to generate the directory tree but it's too
    slow. I think the problem is that it looks at each file in the
    directory. I'm not interested in what's in the directory, I just want
    to know what the subdirectories are.

    It takes about 30 seconds to build the directory tree on some of the
    larger sites and the directory searching seems to be where the
    bottleneck is. That's compared to around 5 seconds to just download
    the file.

    Thanks :)

    Helen
    Helen Guest

  2. #2

    Default Re: finding subdirectories without parsing every file

    On Thu, 14 Aug 2003 00:45:26 -0700, Helen wrote:
    > Is there any way to get the subdirectories of a directory without
    > having to sort through all the files in a directory?
    Why is it going slow? Maybe you could share some of your code with us, so
    that we're able to actually know what you're talking about?

    I tend to use the following code when "filtering out" directories:

    #!/usr/bin/perl
    #
    use strict;
    use warnings;
    use File::Find;

    my $root = '/home/tore';
    my @dirs;
    find(sub { push @dirs, $File::Find::name if -d }, $root);

    I guess there are faster ways to do it (as always), but this solution does
    it for me.
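    For what it's worth, if you only need the immediate subdirectories of
    one directory (not the whole tree), plain readdir does it without
    File::Find at all; a minimal sketch (the '.' path is just a stand-in):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the immediate subdirectories of $dir, skipping '.' and '..'.
# Note: -d still stats each entry once, but there is no recursion and
# no per-file callback overhead.
sub subdirs_of {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "can't opendir $dir: $!";
    my @subdirs = grep { $_ ne '.' && $_ ne '..' && -d "$dir/$_" } readdir($dh);
    closedir $dh;
    return sort @subdirs;
}

print "$_\n" for subdirs_of('.');
```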


    --
    Tore Aursand <toreaursand.no>
    Tore Aursand Guest

  3. #3

    Default Re: finding subdirectories without parsing every file

    >>> Helen<helenhelephant.com> 8/14/2003 3:45:26 AM >>>
    Hi

    Is there any way to get the subdirectories of a directory without
    having to sort through all the files in a directory?

    I'm actually building a little perl script that looks at the
    directories and then prints out a directory tree (as a webpage).

    I've been using file::find to generate the directory tree but it's too
    slow. I think the problem is that it looks at each file in the
    directory. I'm not interested in what's in the directory, I just want
    to know what the subdirectories are.

    This should do what you want - it's kind of like the 'tree' command
    on Windows, but *nix-like.
    #!/usr/bin/perl

    use strict;
    use warnings;

    #my $path = '/';
    my $path = '/cygdrive/h/';
    #my $path = '/home/dangle/';
    read_dir($path);

    # Recurse through $path, printing every subdirectory found
    # (dot-directories are skipped). $path must end in '/'.
    sub read_dir {
        my $path = shift;
        opendir(my $dh, $path) or die "can't opendir $path: $!";
        my @dirs = grep { !/^\./ && -d "$path$_" } readdir($dh);
        closedir $dh;
        foreach my $dir (@dirs) {
            print "$path$dir\n";
            read_dir("$path$dir/");
        }
    }
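    For the record, the same walk can also be written with an explicit
    stack instead of recursion, which avoids deep call chains on big
    trees; a sketch ('.' is a placeholder path):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Depth-first directory walk using an explicit stack; returns every
# subdirectory under $root (dot-directories skipped), without recursion.
sub list_dirs {
    my ($root) = @_;
    my @stack = ($root);
    my @found;
    while (defined(my $dir = pop @stack)) {
        opendir(my $dh, $dir) or next;   # skip unreadable directories
        for my $entry (grep { !/^\./ } readdir($dh)) {
            my $full = "$dir/$entry";
            if (-d $full) {
                push @found, $full;
                push @stack, $full;
            }
        }
        closedir $dh;
    }
    return @found;
}

print "$_\n" for list_dirs('.');
```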






    __danglesocket__

    danglesocket Guest

  4. #4

    Default Re: finding subdirectories without parsing every file

    > Purl Gurl
    > --
    >
    > #!perl
    >
    > print "Content-type: text/plain\n\n";
    >
    > $internal_path = "c:/apache/users/callgirl";
    works better if you do this:

    chdir($internal_path)
    or die "Can't chdir to $internal_path: $!\n";

    Just to show why:
    --unmodified--
    [jimoplinux jim]$ perl news.pl
    Content-type: text/plain

    [jimoplinux jim]$

    --with my 'die' added--
    [jimoplinux jim]$ perl news.pl
    Content-type: text/plain

    Can't chdir to c:/apache/users/callgirl: No such file or directory
    [jimoplinux jim]$

    But hey, you were busy and didn't have time to test
    it before posting ... right?

    Jim
    James Willmore Guest

  5. #5

    Default Re: finding subdirectories without parsing every file

    James Willmore wrote:
    > > Purl Gurl
    (snipped)
    > > $internal_path = "c:/apache/users/callgirl";
    > Can't chdir to c:/apache/users/callgirl: No such file or directory
    > But hey, you were busy and didn't have time to test
    > it before posting ... right?

    No, you are inventing a lame excuse to troll,
    as evidenced by this idiocy of expecting sample
    code to compensate for every possibility,
    which is an insulting slap across the face
    of a reader.

    This is an insult because you are saying,
    paraphrased most accurately,

    "You readers are too stupid to use your own path."

    My attitude is one of respect. I know a reader
    is smart enough to add a personal path or to
    modify my "example" as needed to meet whatever
    circumstances are present.

    Additionally, without any modification, my
    simple example will simply not print any
    expected results if the initial internal path
    cannot be read, which, in itself, is a clear
    error message which cannot be ignored along
    with a typical error message generated in
    lieu of your troll excuse error checking.

    You are another typical and classic CLPM
    "What If Village Idiot" looking for any
    possible way to break good code examples
    to afford yourself a chance to masturbate
    your fragile masculine ego to compensate
    for your personally known shortcomings,
    at the insulting expense of others, for
    whom you don't give a damn.

    But hey, you were too busy writing an
    idiotic troll article to stop and think
    how stupid you will appear to others and
    how insulting you will appear to others.

    Oh, you forgot to toss in typical CLPM
    trolling employing,

    "No strict, no warnings, you are an idiot."

    You should slow down and think before
    splattering ink to soothe your ego.

    Readers are not stupid nor do I treat
    them as such, unless a reader posts
    an article declaring his stupidity.


    Purl Gurl
    Purl Gurl Guest

  6. #6

    Default Re: finding subdirectories without parsing every file

    Purl Gurl wrote:
    > James Willmore wrote:
    > > > Purl Gurl wrote:
    (snipped)
    > Additionally, without any modification, my
    > simple example will simply not print any
    > expected results if the initial internal path
    > cannot be read, which, in itself, is a clear
    > error message which cannot be ignored along
    > with a typical error message generated in
    > lieu of your troll excuse error checking.

    Poor wording on my part.

    "...error message generated _by your_ troll excuse error checking."

    Clearly I was and am keyed in on there being no need for error
    checking, because producing no expected results is de facto error
    checking, along with being annoyed by this persistent lame trolling here.

    Nonetheless, I am most confident, you the reader, are quite
    capable of making modifications to my example per your needs,
    sans a personal need to deliberately annoy me.


    Purl Gurl
    Purl Gurl Guest

  7. #7

    Default Re: finding subdirectories without parsing every file

    > James Willmore wrote:
    >
    > > > Purl Gurl
    >
    > (snipped)
    >
    > > > $internal_path = "c:/apache/users/callgirl";
    >
    > > Can't chdir to c:/apache/users/callgirl: No such file or directory
    >
    > > But hey, you were busy and didn't have time to test
    > > it before posting ... right?
    >
    >
    > No, you are inventing a lame excuse to troll
    > evidenced by this idiocy of expecting sample
    > code to compensate for every possibility,
    > which is an insulting slap across the face
    > of a reader.
    <snip>

    No, I was pointing out an error you made and made light of it -
    because you always seem to point out the mistakes of others in a
    demeaning way. I figured that I would return the favor.
    (RE: "I am sure you boys can determine what is wrong.")

    More to the point - the results, if you had bothered to read them
    fully, pointed out that your method did NOT return an error when
    using the 'chdir' function; the fix I added did.

    Now, what if the end user of your version changed the path, made a
    typing mistake, and then the script didn't work? Would it not be
    more productive for the script to TELL them EXACTLY what happened?
    You ALWAYS complain about the code of others - why did you not just
    own the error (okay - poor coding) and move on, instead of going off
    on some ramblings about God knows what? (I just ignored that rather
    than snipping it in the reply.)

    If you want to continue this 'flame out', you have my email address.
    Don't take up bandwidth doing it here.

    James Willmore Guest

  8. #8

    Default Re: finding subdirectories without parsing every file

    jwillmorecyberia.com (James Willmore) wrote in message news:<e0160815.0308141712.67b4eac2posting.google.com>...
    > > Is there any way to get the subdirectories of a directory without
    > > having to sort through all the files in a directory?
    > > <snip>
    > > I've been using file::find to generate the directory tree but it's too
    > > slow. I think the problem is that it looks at each file in the
    > > directory. I'm not interested in what's in the directory, I just want
    > > to know what the subdirectories are.
    >
    Thanks for the help from everyone who's answered my post. :)
    > Ah.... but how far down the parent directory do you wish to search?
    > File::Find has a 'finddepth' method and a multitude of options.
    I really need it to list all of the directories, no matter how deep it
    goes. I've designed the system so that it's simple to make sure that
    the directory tree doesn't go too deep, but I didn't want to enforce a
    depth because it makes the script less flexible.
    > Post your code and maybe we can lend more assistance.
    I'm using the method below to build a "tree" structure which
    represents the directories on our web server. The main complication is
    that sites can have subsites, but in this part of the code I'm only
    looking for the subdirectories of one site. If it finds another
    subsite it stops recursing. This works because I load all the subsites
    into the tree before I load all the subdirectories.

    The directories and sites are stored in a tree object that uses the
    directory and site path to add new sites/dirs to the tree. It's then
    quite easy to recurse the bits I want when I'm printing the tree.

    On the page where I'm doing the recursing it prints out only the
    subdirectories of the site that don't belong to another subsite. So
    it's really only looking at a small part of the tree. The problem is
    that "small" is a relative term. I'm testing it with a subsite that
    has 800 subdirectories (and over 9000 files) as a worst case scenario
    (which isn't the biggest site on the server). I'm not sure I'll be
    able to get the load time to anywhere near 10 seconds, but I like
    working with such a large site because the effects of changing parts
    of the script are exaggerated.

    The subsites are stored in a database, but the first thing I did was
    make sure that all the database accesses happened at the same time. So
    there are only two calls to the database (no matter how big the tree
    gets) and they both use the same database handle. The database stuff
    happens before I go looking for the subdirectories.

    my $nodePath = "$basePath/" . $node->getDirectory();
    find(\&wanted, $nodePath);

    sub wanted {
        my $currentFile = $File::Find::name;
        if (-d $currentFile) {
            if ($currentFile ne $nodePath) {
                my $newDir = $currentFile;
                $newDir =~ s/$basePath\///;

                # if this directory is actually a site, we only want to
                # recurse into it if the recurseSubSites parameter says so
                if (!$siteTree->isNodeSite($newDir)) {
                    # not a site, so add the directory to the site tree
                    $siteTree->addDirectory($newDir);
                } elsif (!$recurseSubSites) {
                    # don't recurse into any of this directory's subdirs
                    $File::Find::prune = 1;
                }
            }
        }
    }
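    One thing that might cut the per-file overhead (a sketch, untested
    against your setup, with '.' standing in for your real path):
    File::Find's preprocess hook filters each directory listing before
    wanted() is called, so plain files never reach your callback at all:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

my @found;
my $top = '.';   # stand-in for "$basePath/" . $node->getDirectory()

find({
    # keep only subdirectory entries; files are dropped here, before
    # wanted() is ever invoked for them
    preprocess => sub { grep { -d } @_ },
    wanted     => sub { push @found, $File::Find::name if -d },
}, $top);

print "$_\n" for @found;
```

    (find() chdirs into each directory by default, so the bare -d in
    preprocess works on the relative names it is handed; note that
    wanted() still fires once for $top itself.)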

    Since I posted here, I've done more comparisons of how fast it runs.
    A lot of the problem is with adding the node to the site tree, and
    I'm going to try to reduce that by sorting within the nodes as I add
    them (and probably some other stuff too).

    However, it takes a good 10-15 seconds just to print the directories
    with the rest of the sub commented out. Perhaps I'm doing something in
    an inefficient way? Or is it that I'm going to have to live with this
    sort of speed if I'm using perl to recurse that many directories? I
    actually didn't realise that I had so many files in the directories, I
    thought it was only one or two thousand. I don't think I can rely on
    the sorting of the operating system because I'm on a unix system that
    seems to just return the files in alphabetical order.

    Anyway, any comments or suggestions about the code would be
    appreciated. I'm a bit of a newbie perl programmer so I'm just
    muddling along and don't really know if I'm doing things the best way.

    Thanks again for your help. It's given me a few more things to think
    about.

    Helen
    Helen Guest

  9. #9

    Default Re: finding subdirectories without parsing every file

    helenhelephant.com (Helen) wrote in message news:<33517f44.0308142312.52443236posting.google.com>...
    <snip>
    > On the page where I'm doing the recursing it prints out only the
    > subdirectories of the site that don't belong to another subsite. So
    > it's really only looking at a small part of the tree. The problem is
    > that "small" is a relative term. I'm testing it with a subsite that
    > has 800 subdirectories (and over 9000 files) as a worst case scenario
    > (which isn't the biggest site on the server). I'm not sure I'll be
    > able to get the load time to anywhere near 10 seconds, but I like
    > working with such a large site because the effects of changing parts
    > of the script are exaggerated.
    If you want to do benchmarking, you can use the Benchmark module.
    This should give you a snapshot of how the changes you make fare as
    far as time and CPU are concerned.
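    A minimal Benchmark sketch, for the record (the two subs here are
    throwaway placeholders - swap in the real alternatives you want to
    compare):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(timethese cmpthese);

# Time two competing implementations over a fixed number of runs,
# then print a relative-speed table.
my $results = timethese(10_000, {
    concat => sub { my $s = ''; $s .= $_ for 1 .. 100; return $s },
    join   => sub { my $s = join '', 1 .. 100; return $s },
});
cmpthese($results);
```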
    >
    > The subsites are stored in a database, but the first thing I did was
    > make sure that all the database accesses happened at the same time. So
    > there are only two calls to the database (no matter how big the tree
    > gets) and they both use the same database handle. The database stuff
    > happens before I go looking for the subdirectories.
    >
    > my $nodePath = "$basePath/".$node->getDirectory();
    > find(\&wanted, "$basePath/".$node->getDirectory());
    >
    > sub wanted {
    > my $currentFile = $File::Find::name;
    > if(-d $currentFile) {
    > if($currentFile ne $nodePath) {
    > my $newDir = $currentFile;
    > $newDir =~ s/$basePath\///;
    >
    > # if this directory is actually a site,
    > # we only want to recurse it
    > # if we're told to by the recurseSubSites parameter
    > if(!$siteTree->isNodeSite($newDir)) {
    > # if this directory isn't a site,
    > # add the directory to the site tree
    > $siteTree->addDirectory($newDir);
    > } elsif(!$recurseSubSites) {
    > # we don't want to recurse any of this directory's subdirs
    > $File::Find::prune = 1;
    > } # end if
    > } # end if
    > } # end if
    > } # end wanted
    At first glance, it appears that you have everything in place to do
    what you want. Just a suggestion - given the amount of files you are
    dealing with and what you want the end result to look like, have you
    considered writing the results out to a file, maybe as XML or CSV? This would free
    up memory and save the information you already processed in the event
    your script is killed for some reason. Then you could also just
    process the directories with one script and do something with the
    results with another. Again, it's just a suggestion and may lead to
    other issues.
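    As a sketch of that idea (the file layout and field names here are
    invented, and it's naive join-style CSV - paths containing commas or
    newlines would need a real CSV writer):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

# Walk the tree once and stream every directory to a CSV file, so a
# second script can render the tree later without re-scanning the disk.
sub dump_dirs_to_csv {
    my ($root, $csv_path) = @_;
    open(my $out, '>', $csv_path) or die "can't write $csv_path: $!";
    print {$out} "depth,path\n";
    find(sub {
        return unless -d;
        # crude depth measure: count the slashes in the full path
        my $depth = () = $File::Find::name =~ m{/}g;
        print {$out} "$depth,$File::Find::name\n";
    }, $root);
    close $out or die "close failed: $!";
}

dump_dirs_to_csv('.', 'dirs.csv');
```

    A second script can then split each line on the first comma and
    rebuild the tree from the depth column.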

    HTH

    Jim
    James Willmore Guest

  10. #10

    Default Re: finding subdirectories without parsing every file

    <snip>
    > If you want to do benchmarking, you can use the Benchmark module.
    > This should give you a snapshot of how the changes you make fare as far
    > as time and CPU are concerned.
    Just wanted to thank you for this suggestion. It's made my
    optimisation a *lot* easier. :)

    Helen
    Helen Guest
