Greer, Doug [NTK] #1
hundreds of millions row dBs
Hello all,
I am interested in using Postgresql for a dB of hundreds of
millions of rows in several tables. The COPY command seems to be way
too slow. Is there any bulk import program similar to Oracle's SQL
loader for Postgresql?
Sincerely,
Doug Greer
-
Guy Rouillier #2
Re: hundreds of millions row dBs
Greer, Doug wrote:
> Hello all,
> I am interested in using Postgresql for a dB of hundreds of
> millions of rows in several tables. The COPY command seems to be way
> too slow. Is there any bulk import program similar to Oracle's SQL
> loader for Postgresql?
> Sincerely,
> Doug Greer
We're getting about 64 million rows inserted in about 1.5 hrs into a
table with a multiple-column primary key - that's the only index.
That seems pretty good to me - SQL Loader takes about 4 hrs to do the
same job.
--
Guy Rouillier
---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly
-
Tom Lane #3
Re: hundreds of millions row dBs
"Guy Rouillier" <guyr@masergy.com> writes:
> Greer, Doug wrote:
>> I am interested in using Postgresql for a dB of hundreds of
>> millions of rows in several tables. The COPY command seems to be way
>> too slow. Is there any bulk import program similar to Oracle's SQL
>> loader for Postgresql?
>> Sincerely,
> We're getting about 64 million rows inserted in about 1.5 hrs into a
> table with a multiple-column primary key - that's the only index.
> That seems pretty good to me - SQL Loader takes about 4 hrs to do the
> same job.
If you're talking about loading into an initially empty database, it's
worth a try to load into bare tables and then create indexes and add
foreign key constraints. Index build and FK checking are both
significantly faster as "bulk" operations than "incremental". Don't
forget to pump up sort_mem as much as you can stand in the backend doing
such chores, too.
I have heard of people who would actually drop and recreate indexes
and/or FKs when adding a lot of data to an existing table.
regards, tom lane
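
The ordering described above (raise sort_mem, load the bare table, then build indexes and foreign keys) can be sketched as a small script generator. This is only an illustration under stated assumptions: the table name, file path, index names, and foreign key below are hypothetical and do not come from the thread, and the sort_mem value is just an example.

```python
def bulk_load_script(table, data_file, indexes, foreign_keys, sort_mem_kb):
    """Return SQL statements in load-first, index-later order."""
    statements = [
        # Raise sort memory for this session only (value is in kilobytes).
        f"SET sort_mem = {sort_mem_kb};",
        # Load into the bare table: no indexes or foreign keys exist yet.
        f"COPY {table} FROM '{data_file}';",
    ]
    # Build each index once, as a bulk operation, after the data is in place.
    for name, columns in indexes:
        statements.append(
            f"CREATE INDEX {name} ON {table} ({', '.join(columns)});"
        )
    # Add foreign keys last so they are checked in bulk, not row by row.
    for column, ref_table, ref_column in foreign_keys:
        statements.append(
            f"ALTER TABLE {table} ADD FOREIGN KEY ({column}) "
            f"REFERENCES {ref_table} ({ref_column});"
        )
    return statements


# Hypothetical names, for illustration only.
script = bulk_load_script(
    table="calls",
    data_file="/tmp/calls.dat",
    indexes=[("calls_customer_idx", ["customer_id", "call_time"])],
    foreign_keys=[("customer_id", "customers", "id")],
    sort_mem_kb=655350,
)
print("\n".join(script))
```

The same ordering applies when adding a lot of data to an existing table: drop the indexes and foreign keys first, run the load, then recreate them.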
-
Wes #4
Re: hundreds of millions row dBs
> We're getting about 64 million rows inserted in about 1.5 hrs into a
> table with a multiple-column primary key - that's the only index.
> That seems pretty good to me - SQL Loader takes about 4 hrs to do the
> same job.
As I recall, the last time we rebuilt our database, it took about 3 hours to
import 265 million rows of data. It then took another 16 hours to rebuild
all the indexes. I think the entire pg_dumpall/reload process took about 21
hours +/-. I wonder what it will be like with 1.5 billion rows...
Wes
-
Dann Corbit #5
Re: hundreds of millions row dBs
-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org] On Behalf Of Wes
Sent: Tuesday, January 04, 2005 8:59 AM
To: Guy Rouillier; pgsql-general@postgresql.org; Greer, Doug [NTK]
Subject: Re: [GENERAL] hundreds of millions row dBs
> We're getting about 64 million rows inserted in about 1.5 hrs into a
> table with a multiple-column primary key - that's the only index.
> That seems pretty good to me - SQL Loader takes about 4 hrs to do the
> same job.
As I recall, the last time we rebuilt our database, it took about 3
hours to
import 265 million rows of data.
>>
24537 rows per second.
<<
It then took another 16 hours to rebuild
all the indexes. I think the entire pg_dumpall/reload process took
about 21
hours +/-. I wonder what it will be like with 1.5 billion rows...
>>
Load will probably scale linearly, so I think you could just multiply by
5.66 to get 17 hours to load.
Building indexes is likely to be at least n*log(n) and maybe even n^2.
For sure, it would take a whole weekend.
Here is an instance where a really big ram disk might be handy.
You could create a database on a big ram disk and load it, then build
the indexes.
Then shut down the database and move it to hard disk.
It might save a few days of effort if you have billions of rows to load.
<<
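
The estimates above can be reproduced with a few lines of arithmetic. This is a rough sketch, not a prediction: it assumes the 16 hours of index time scale purely as n*log(n) or n^2 from the 265 million row baseline, and it ignores memory, disk, and sort_mem effects.

```python
import math

SECONDS_PER_HOUR = 3600

loaded_rows = 265_000_000      # rows in the previous reload
load_hours = 3                 # time to import them
index_hours = 16               # time to rebuild all indexes
target_rows = 1_500_000_000    # projected table size

# Load throughput from the previous reload.
rows_per_second = loaded_rows / (load_hours * SECONDS_PER_HOUR)

# Growth factor, and load time if loading scales linearly.
scale = target_rows / loaded_rows
linear_load_hours = load_hours * scale

# Index build time under the two growth models mentioned above.
nlogn_index_hours = (
    index_hours * scale * math.log(target_rows) / math.log(loaded_rows)
)
quadratic_index_hours = index_hours * scale ** 2

print(f"load rate:          {rows_per_second:.0f} rows/s")
print(f"growth factor:      {scale:.2f}")
print(f"load time:          {linear_load_hours:.0f} h")
print(f"index time n log n: {nlogn_index_hours:.0f} h")
print(f"index time n^2:     {quadratic_index_hours:.0f} h")
```

Under these assumptions the n*log(n) model puts the index rebuild at roughly four days, and the n^2 model at roughly three weeks, so the index build dominates the reload either way.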
-
Tom Lane #6
Re: hundreds of millions row dBs
Wes <wespvp@syntegra.com> writes:
> As I recall, the last time we rebuilt our database, it took about 3 hours to
> import 265 million rows of data. It then took another 16 hours to rebuild
> all the indexes.
Out of curiosity, what value of sort_mem were you using?
(In PG 8.0, the sort memory setting used by CREATE INDEX will be
maintenance_work_mem not work_mem, which should help in getting larger
values to be used. But in existing releases you usually need to think
about a manual tweak.)
regards, tom lane
-
Tom Lane #7
Re: hundreds of millions row dBs
"Dann Corbit" <DCorbit@connx.com> writes:
> Here is an instance where a really big ram disk might be handy.
> You could create a database on a big ram disk and load it, then build
> the indexes.
> Then shut down the database and move it to hard disk.
Actually, if you have a RAM disk, just change the $PGDATA/base/nnn/pgsql_tmp
subdirectory into a symlink to some temp directory on the RAM disk.
Should get you pretty much all the win with no need to move stuff around
afterwards.
You have to be sure the RAM disk is bigger than your biggest index though.
regards, tom lane
-
Pierre-Frédéric Caillaud #8
Re: hundreds of millions row dBs
To speed up the load:
- make fewer checkpoints (tweak the checkpoint interval and other
parameters in the config)
- disable fsync (not sure if it really helps)
- have source data, database tables, and log on three physically
different disks
- have the temporary files on a different disk too, or on a ramdisk
- gunzip while restoring to read less data from the disk
> "Dann Corbit" <DCorbit@connx.com> writes:
>> Here is an instance where a really big ram disk might be handy.
>> You could create a database on a big ram disk and load it, then build
>> the indexes.
>> Then shut down the database and move it to hard disk.
> Actually, if you have a RAM disk, just change the
> $PGDATA/base/nnn/pgsql_tmp
> subdirectory into a symlink to some temp directory on the RAM disk.
> Should get you pretty much all the win with no need to move stuff around
> afterwards.
>
> You have to be sure the RAM disk is bigger than your biggest index
> though.
>
> regards, tom lane
-
Wes #9
Re: hundreds of millions row dBs
> Out of curiosity, what value of sort_mem were you using?
>
> (In PG 8.0, the sort memory setting used by CREATE INDEX will be
> maintenance_work_mem not work_mem, which should help in getting larger
> values to be used. But in existing releases you usually need to think
> about a manual tweak.)
Normally it is set to 65535. However, during the load I bump it up to
655350. The system has 2GB ECC memory.
> Here is an instance where a really big ram disk might be handy.
> You could create a database on a big ram disk and load it, then build
> the indexes.
I'm afraid we don't have quite that much RAM... With just under 400 million
rows right now, it is 74 GB. That will probably grow to around 300 GB or so
before it stabilizes.
> Actually, if you have a RAM disk, just change the $PGDATA/base/nnn/pgsql_tmp
> subdirectory into a symlink to some temp directory on the RAM disk.
> Should get you pretty much all the win with no need to move stuff around
> afterwards.
>
> You have to be sure the RAM disk is bigger than your biggest index though.
Hmm. That's a thought. I expect our largest index will still be bigger
than available RAM though. How can I check index sizes?
We already have pg_xlog on a dedicated mirrored disk. Would it help
significantly to give pgsql_tmp its own mirrored disk? PGDATA is on an 8
disk hardware RAID 5.
Wes
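
On the index size question, one common approach is to read relpages from the pg_class system catalog and multiply by the block size. The sketch below assumes the default 8 kB block size; relpages is only an estimate that is refreshed by VACUUM, ANALYZE, and CREATE INDEX. The helper names are made up for illustration.

```python
BLOCK_SIZE = 8192  # assumed default PostgreSQL block size, in bytes

# Query to run in psql: indexes in the current database, largest first.
INDEX_SIZE_QUERY = """
SELECT relname, relpages
FROM pg_class
WHERE relkind = 'i'
ORDER BY relpages DESC;
"""


def pages_to_gb(relpages, block_size=BLOCK_SIZE):
    """Convert a relpages count into gigabytes."""
    return relpages * block_size / 1024 ** 3


def sort_mem_to_mb(sort_mem_kb):
    """sort_mem is expressed in kilobytes; convert it to megabytes."""
    return sort_mem_kb / 1024


print(INDEX_SIZE_QUERY)
# An index reported with 131072 pages occupies 1 GB at 8 kB per page.
print(pages_to_gb(131072))
# The two sort_mem settings quoted above, in megabytes.
print(round(sort_mem_to_mb(65535)), round(sort_mem_to_mb(655350)))
```

By this conversion the load-time sort_mem of 655350 is about 640 MB per sort, a large share of a 2 GB machine, which helps when comparing it against the size of the biggest index.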