Ask a Question related to UNIX Programming, Design and Development.
-
Sam Zoghaib #1
Acessing files without directory structure information
Hello,
I am looking for a way to read a file in a directory (and its subdirectories)
without knowing its contents.
Practically, I need to choose a file randomly in a directory which may
contains an awful lot, so I cannot build a representation of its contents,
unless the time and memory needed to build it does not depend on the number
or nature of the files.
Are there any standard Unix routines that provide such functionality?
Thank you,
Sam
--
"So if you meet me, have some courtesy, have some sympathy, and some taste
Use all your well-learned politesse, or I'll lay your soul to waste"
- The Rolling Stones, "Sympathy for the Devil"
Sam Zoghaib Guest
-
Directory Structure FOR CF (CFC's)
My cfm pages cannot see my CFC's. CFC is new to the client. What does the directory structure look like? Do I need to create a separate folder for... -
Directory structure workaround
Hi, For hit tracking purposes I've been asked if it is possible to turn our current URLs into something more friendly to analysis tools. The... -
Server directory structure
I am setting up a ASP.NET intranet site. I will be using VS 2003 as development tool. I am not sure about what should be the best directory... -
[PHP] Linking to files outside the directory structure...?
Hello, This is very much a possibility. first you 'fopen' the file and then you do an 'fpassthru' it will send the file to the client. You do... -
Backup directory structure
Hi .. I copied the directory structure to CD .. The directory Structure looks like this .. ( This directory structure I used to create the DB... -
Paul Pluzhnikov #2
Re: Acessing files without directory structure information
Sam Zoghaib <NOsam@zep.homedns.orgSPAM> writes:
If you already knew its contents, why would you want to read it (again)?> I am looking for a way to read a file in a directory (and its subdirectories)
> without knowing its contents.
Huh?> Practically, I need to choose a file randomly in a directory which may
> contains an awful lot, so I cannot build a representation of its contents,
> unless the time and memory needed to build it does not depend on the number
> or nature of the files.
Probably. But your description of what you actually want to do is> Are there any standard Unix routines that provide such functionality?
not quite clear.
If what you want to do is:
- given a (possibly large) directory, select a random file and read it
- without any knowledge of how many files there are
- with equal probability that any given file is chosen
- without scanning the directory twice
- assuming that the directory is not modified while you are in the
process of selection.
then:
open directory;
target = next file;
do {
nextfile = next file;
taget = select from target or nextfile with probabilty 1/2;
} while (more files exist);
close directory;
open target;
Note: any design that requires directories with large number of
files in them is probably not going to give you good performance.
Cheers,
--
In order to understand recursion you must first understand recursion.
Paul Pluzhnikov Guest
-
Lorinczy Zsigmond #3
Re: Acessing files without directory structure information
Sam Zoghaib wrote:
Yes, there are complete virus-creating libraries, with functions> Hello,
>
> I am looking for a way to read a file in a directory (and its subdirectories)
> without knowing its contents.
> Practically, I need to choose a file randomly in a directory which may
> contains an awful lot, so I cannot build a representation of its contents,
> unless the time and memory needed to build it does not depend on the number
> or nature of the files.
> Are there any standard Unix routines that provide such functionality?
find_an_executable, infect and so on
Lorinczy Zsigmond Guest
-
Sam Zoghaib #4
Re: Acessing files without directory structure information
Paul Pluzhnikov wrote in article <uvfu7qiwy.fsfYB8X@earthlink.net> on Sunday
13 July 2003 05:02 in comp.unix.programmer:
yes> If what you want to do is:
>
> - given a (possibly large) directory, select a random file and read it
> - without any knowledge of how many files there are
yes> - with equal probability that any given file is chosen
Actually, without scanning it even once. That's why I asked how I could make a> - without scanning the directory twice
routine which reads a directory whose contents you have no information about.
Yes, though I forgot to mention it.> - assuming that the directory is not modified while you are in the
> process of selection.
The problem here is that you are scanning the directory.> then:
>
> open directory;
> target = next file;
> do {
> nextfile = next file;
> taget = select from target or nextfile with probabilty 1/2;
> } while (more files exist);
> close directory;
> open target;
Which is why I am looking for a routine that does not scan the directory.> Note: any design that requires directories with large number of
> files in them is probably not going to give you good performance.
>
I was thinking of picking a small chunk of data in the directory (small enough
to be sure it belongs to only one file), then retrieving what file it belongs
to (without scanning the whole directory of course) to read it from start to
end. I have no idea of how to do this though.
Thanks,
Sam
--
"One cannot be betrayed if one has no people"
- Kobayashi in "The Usual Suspects", 1995
Sam Zoghaib Guest
-
Erwin S. Andreasen #5
Re: Acessing files without directory structure information
Paul Pluzhnikov <ppluzhnikov@earthlink.net> writes:
[...]
> If what you want to do is:
>
> - given a (possibly large) directory, select a random file and read it
> - without any knowledge of how many files there are
> - with equal probability that any given file is chosen
> - without scanning the directory twice
> - assuming that the directory is not modified while you are in the
> process of selection.That would have a flat 50% chance of selecting the very last file. I> open directory;
> target = next file;
> do {
> nextfile = next file;
> taget = select from target or nextfile with probabilty 1/2;
> } while (more files exist);
> close directory;
> open target;
think you meant: set target to the current file with 1/n probability,
where n is the number of files seen so far, including this one.
In that case, the last, n-th, file has a 1/n chance of being selected.
The next-to-last file has a 1/(n-1) chance of being selected but 1/n
of those 1/(n-1) times the last file will be selected later -- so the
actual chance is:
1/(n-1) - 1/n * 1/(n-1) = n/n(n-1) - 1/n(n-1) = n-1/n(n-1) = 1/n
--
================================================== =============
<erwin@andreasen.org> Herlev, Denmark
<URL:http://www.andreasen.org/> <*>
================================================== =============
Erwin S. Andreasen Guest
-
John W. Krahn #6
Re: Acessing files without directory structure information
Sam Zoghaib wrote:
You will have to scan it at least once as users don't have low-level>
> Paul Pluzhnikov wrote in article <uvfu7qiwy.fsfYB8X@earthlink.net> on Sunday
> 13 July 2003 05:02 in comp.unix.programmer:
>>> > If what you want to do is:
> >
> > - given a (possibly large) directory, select a random file and read it
> > - without any knowledge of how many files there are
> yes
>>> > - with equal probability that any given file is chosen
> yes
>>> > - without scanning the directory twice
> Actually, without scanning it even once. That's why I asked how I could make a
> routine which reads a directory whose contents you have no information about.
access to the directory "file" to access it randomly.
This will pick a random file from a directory, but you still have to>> > - assuming that the directory is not modified while you are in the
> > process of selection.
> Yes, though I forgot to mention it.
>>> > then:
> >
> > open directory;
> > target = next file;
> > do {
> > nextfile = next file;
> > taget = select from target or nextfile with probabilty 1/2;
> > } while (more files exist);
> > close directory;
> > open target;
> The problem here is that you are scanning the directory.
>> Which is why I am looking for a routine that does not scan the directory.> > Note: any design that requires directories with large number of
> > files in them is probably not going to give you good performance.
> >
>
> I was thinking of picking a small chunk of data in the directory (small enough
> to be sure it belongs to only one file), then retrieving what file it belongs
> to (without scanning the whole directory of course) to read it from start to
> end. I have no idea of how to do this though.
scan the entire directory.
#!/usr/bin/perl
my $dir = shift or die "usage: $0 directory-name\n";
opendir my $dh, $dir or die "Cannot open $dir: $!";
my ( $n, $entry );
rand( ++$n ) < 1 && ( $entry = $_ ) while $_ = readdir $dh;
print "$entry\n";
__END__
John
--
use Perl;
program
fulfillment
John W. Krahn Guest
-
Paul Pluzhnikov #7
Re: Acessing files without directory structure information
"Erwin S. Andreasen" <erwin@andreasen.com> writes:
> > taget = select from target or nextfile with probabilty 1/2;Oops. My mistake. You are correct.> That would have a flat 50% chance of selecting the very last file. I
> think you meant: set target to the current file with 1/n probability,
> where n is the number of files seen so far, including this one.
[Original algorithm from "The Practice of Programming" by Kernighan
and Pike, p. 70]
Cheers,
--
In order to understand recursion you must first understand recursion.
Paul Pluzhnikov Guest
-
Paul Pluzhnikov #8
Re: Acessing files without directory structure information
Sam Zoghaib <NOsam@zep.homedns.orgSPAM> writes:
I do not believe it is possible to guarantee this:> Actually, without scanning it even once. That's why I asked how I could make a
> routine which reads a directory whose contents you have no information about.
without scanning through the whole directory once.> > - with equal probability that any given file is chosen
Even if this were possible (which it isn't, since "directory data"> I was thinking of picking a small chunk of data in the directory (small enough
> to be sure it belongs to only one file), then retrieving what file it belongs
> to (without scanning the whole directory of course) to read it from start to
> end. I have no idea of how to do this though.
does not contain any "file data" at all), this would violate the
"equal probability of any given file", unless you also know that
all files in the directory are the same size.
Your requirements appear to be extremely strange. What is it that
you are *actually* trying to achieve?
Cheers,
--
In order to understand recursion you must first understand recursion.
Paul Pluzhnikov Guest
-
Sam Zoghaib #9
Re: Acessing files without directory structure information
Paul Pluzhnikov wrote in article <uptkeqb8x.fsfYB8X@earthlink.net> on Monday
14 July 2003 01:02 in comp.unix.programmer:
You're right. Sorry for the mistake.> Even if this were possible (which it isn't, since "directory data"
> does not contain any "file data" at all), this would violate the
> "equal probability of any given file", unless you also know that
> all files in the directory are the same size.
I'm trying to make a program which can randomly read data from directories> Your requirements appear to be extremely strange. What is it that
> you are *actually* trying to achieve?
which can contains lots of files, in reasonable time and memory usage.
Statistics may be an application.
Actually, it may be possible to drop the "no information about the structure
of the directory" condition, if I had a way to get the needed information
(basically the list of files) very fast even for a large amount of files and
can hold this information in little memory.
I'm thinking, if I can represent each file with 2 integers (one for the
directory it is in and one for the file itself) I could handle a very large
number of files with little memory. However, it does not sove the time
problem.
Sam
--
"People sometimes ask me if it is a sin in the Church of Emacs to use vi.
Using a free version of vi is not a sin; it's a penance".
- Richard Stallman
Sam Zoghaib Guest
-
Barry Margolin #10
Re: Acessing files without directory structure information
In article <bestf2$ht2$1@news-reader6.wanadoo.fr>,
Sam Zoghaib <NOsam@zep.homedns.orgSPAM> wrote:Maybe if you explained why you need to do this, we might be able to suggest>I'm trying to make a program which can randomly read data from directories
>which can contains lots of files, in reasonable time and memory usage.
>Statistics may be an application.
an alternative solution.
--
Barry Margolin, [email]barry.margolin@level3.com[/email]
Level(3), Woburn, MA
*** DON'T SEND TECHNICAL QUESTIONS DIRECTLY TO ME, post them to newsgroups.
Please DON'T copy followups to me -- I'll assume it wasn't posted to the group.
Barry Margolin Guest
-
Chuck Dillon #11
Re: Acessing files without directory structure information
Sam Zoghaib wrote:
A directory is just a file and you can open it for reading as raw data.>
> Actually, it may be possible to drop the "no information about the structure
> of the directory" condition, if I had a way to get the needed information
> (basically the list of files) very fast even for a large amount of files and
> can hold this information in little memory.
The filenames are what you are looking for. They are strings and can
be found heuristically. For example on a non-Linux systems, try doing
'strings -t d /'. It will print an index containing the byte offsets
of the filenames in the / directory.
You could generate such an index and randomly select from it. Or you
could randomly select an offset into the directory and write a function
to find the closest string. Once found use the string to open whatever
file it names.
Note that how you find the strings would be filesystem specific.
-- ced
--
Chuck Dillon
Senior Software Engineer
NimbleGen Systems Inc.
Chuck Dillon Guest
-
Eric Sosman #12
Re: Acessing files without directory structure information
Paul Pluzhnikov wrote:
This will select a file "at random," but not with equal>
> If what you want to do is:
>
> - given a (possibly large) directory, select a random file and read it
> - without any knowledge of how many files there are
> - with equal probability that any given file is chosen
> - without scanning the directory twice
> - assuming that the directory is not modified while you are in the
> process of selection.
>
> then:
>
> open directory;
> target = next file;
> do {
> nextfile = next file;
> taget = select from target or nextfile with probabilty 1/2;
> } while (more files exist);
> close directory;
> open target;
probability. It will choose the very last file half the time,
the penultimate one-quarter of the time, the antepenultimate
one-eighth of the time, and so on. The first and second are
chosen with equal probability 1/(2**(N-1)) if there are N files
altogether. If each execution of the algorithm takes one
nanosecond and the directory contains 100 files, you should
expect to wait about eight hundred times the current age of
the Universe before either the first or second file will be
chosen (don't hold your breath).
One way to choose each file with probability 1/N and
without (much) advance knowledge of N is to use a random
number generator that produces N or more distinct values
before repeating. It's not hard to implement a linear
congruential generator with a period of four billion or
so, so as long as the directory holds fewer than four
billion files you'll be fine (this upper bound is the small
amount of advance knowledge you need about the value of N).
open directory
target = next file;
key = next random value;
while (more files exist) {
nextfile = next file;
nextkey = next random value;
if (nextkey < key) {
target = nextfile;
key = nextkey;
}
}
close directory;
open target;
That is, you "attach" a different pseudo-random number to
each file, and you choose the file whose attached number is
smallest. If the generator is any good at all, the smallest
of N consecutive values is as likely to be in one position
as in another; the first, second, ..., Nth files are all
equally likely to be associated with the minimum value.
--
[email]Eric.Sosman@sun.com[/email]
Eric Sosman Guest
-
Sam Zoghaib #13
Re: Acessing files without directory structure information
Chuck Dillon wrote in article <3F12D037.2020601@nimblegen.com> on Monday 14
July 2003 17:45 in comp.unix.programmer:
That's what I'm gonna try to do.> A directory is just a file and you can open it for reading as raw data.
> The filenames are what you are looking for. They are strings and can
> be found heuristically. For example on a non-Linux systems, try doing
> 'strings -t d /'. It will print an index containing the byte offsets
> of the filenames in the / directory.
>
> You could generate such an index and randomly select from it.
That's not possible, because this way, the bigger the file, the higher> Or you
> could randomly select an offset into the directory and write a function
> to find the closest string. Once found use the string to open whatever
> file it names.
>
probability to be selected.
Sam
--
"Giving the Linus Torvalds Award to the Free Software Foundation is a bit like
giving the Han Solo Award to the Rebel Alliance"
- Richard Stallman, August 1999
Sam Zoghaib Guest
-
Larry Doolittle #14
Re: Acessing files without directory structure information
In article <3F12D037.2020601@nimblegen.com>, Chuck Dillon wrote:
Depending on your system and definition of "very fast", you could use> Sam Zoghaib wrote:>>>
>> Actually, it may be possible to drop the "no information about the structure
>> of the directory" condition, if I had a way to get the needed information
>> (basically the list of files) very fast even for a large amount of files and
>> can hold this information in little memory.
> A directory is just a file and you can open it for reading as raw data.
> The filenames are what you are looking for.
C's readdir(), which is POSIX standardized. Peeking at the raw data
may not be much of a speed improvement, at least for some systems,
and is horribly non-portable.
- Larry
Larry Doolittle Guest
-
Barry Margolin #15
Re: Acessing files without directory structure information
In article <slrnbh5q86.teg.ldoolitt@recycle.lbl.gov>,
Larry Doolittle <ldoolitt@recycle.lbl.gov> wrote:The OP specifically said he doesn't want to scan the directory sequentially>Depending on your system and definition of "very fast", you could use
>C's readdir(), which is POSIX standardized.
(because it's so big). How do you propose using readdir() to accomplish
his goal?
--
Barry Margolin, [email]barry.margolin@level3.com[/email]
Level(3), Woburn, MA
*** DON'T SEND TECHNICAL QUESTIONS DIRECTLY TO ME, post them to newsgroups.
Please DON'T copy followups to me -- I'll assume it wasn't posted to the group.
Barry Margolin Guest
-
Sam Zoghaib #16
Re: Acessing files without directory structure information
Thanks to everybody for the useful replies.
I have studied the different solutions proposed, and it seems that it's not
really possible to address all requirements without scanning the directory
once. Indeed, if I cannot scan the directory, I don't see how I'm gonna be
able to pick among all subdirectories in equal probability. (determining
after the files are picked which are directories and picking among them fails
the equal probability requirement).
So I guess I'll have to cope with a scan, and find a way to store references
to files in the least memory I can.
Thanks again for all your help.
Sam
--
"Don't be afraid, I'm gonna give you the choice I never had..."
- Lestat in "Interview with the Vampire" (Ann Rice, 1976)
Sam Zoghaib Guest
-
David Schwartz #17
Re: Acessing files without directory structure information
"Sam Zoghaib" <NOsam@zep.homedns.orgSPAM> wrote in message
news:bf0mdp$8e4$1@news-reader3.wanadoo.fr...
references> So I guess I'll have to cope with a scan, and find a way to storeWhy do you need to store references in memory at all? Do you care about> to files in the least memory I can.
files other than the one you picked?
DS
David Schwartz Guest



Reply With Quote

