
hundreds of millions row dBs - PostgreSQL / PGSQL

  1. #1

    hundreds of millions row dBs

    Hello all,
    I am interested in using Postgresql for a dB of hundreds of
    millions of rows in several tables. The COPY command seems to be way
    too slow. Is there any bulk import program similar to Oracle's SQL
    loader for Postgresql?

    Sincerely,
    Doug Greer


  2. #2

    Re: hundreds of millions row dBs

    Greer, Doug wrote:
    > Hello all,
    > I am interested in using Postgresql for a dB of hundreds of
    > millions of rows in several tables. The COPY command seems to be way
    > too slow. Is there any bulk import program similar to Oracle's SQL
    > loader for Postgresql? Sincerely,
    > Doug Greer
    We're getting about 64 million rows inserted in about 1.5 hrs into a
    table with a multiple-column primary key - that's the only index.
    That seems pretty good to me - SQL Loader takes about 4 hrs to do the
    same job.

    --
    Guy Rouillier


  3. #3

    Re: hundreds of millions row dBs

    "Guy Rouillier" <guyrmasergy.com> writes:
    > Greer, Doug wrote:
    >> I am interested in using Postgresql for a dB of hundreds of
    >> millions of rows in several tables. The COPY command seems to be way
    >> too slow. Is there any bulk import program similar to Oracle's SQL
    >> loader for Postgresql? Sincerely,
    > We're getting about 64 million rows inserted in about 1.5 hrs into a
    > table with a multiple-column primary key - that's the only index.
    > That seems pretty good to me - SQL Loader takes about 4 hrs to do the
    > same job.
    If you're talking about loading into an initially empty database, it's
    worth a try to load into bare tables and then create indexes and add
    foreign key constraints. Index build and FK checking are both
    significantly faster as "bulk" operations than "incremental". Don't
    forget to pump up sort_mem as much as you can stand in the backend doing
    such chores, too.

    I have heard of people who would actually drop and recreate indexes
    and/or FKs when adding a lot of data to an existing table.

    regards, tom lane
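
    For illustration, a minimal SQL sketch of the bulk-load approach described
    above, assuming PostgreSQL 7.x/8.0-era settings; the table, column, and
    file names are hypothetical and the sort_mem value is only an example:

        -- Load into a bare table first, then build the index and FK as bulk operations.
        SET sort_mem = 655360;                 -- kB; maintenance_work_mem in 8.0+

        COPY orders FROM '/data/orders.copy';  -- no indexes or constraints in place yet

        ALTER TABLE orders
            ADD PRIMARY KEY (customer_id, order_id);

        ALTER TABLE orders
            ADD CONSTRAINT orders_customer_fk
            FOREIGN KEY (customer_id) REFERENCES customers (id);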


  4. #4

    Re: hundreds of millions row dBs

    > We're getting about 64 million rows inserted in about 1.5 hrs into a
    > table with a multiple-column primary key - that's the only index.
    > That seems pretty good to me - SQL Loader takes about 4 hrs to do the
    > same job.
    As I recall, the last time we rebuilt our database, it took about 3 hours to
    import 265 million rows of data. It then took another 16 hours to rebuild
    all the indexes. I think the entire pg_dumpall/reload process took about 21
    hours +/-. I wonder what it will be like with 1.5 billion rows...

    Wes




  5. #5

    Re: hundreds of millions row dBs



    -----Original Message-----
    From: pgsql-general-owner@postgresql.org
    [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Wes
    Sent: Tuesday, January 04, 2005 8:59 AM
    To: Guy Rouillier; pgsql-general@postgresql.org; Greer, Doug [NTK]
    Subject: Re: [GENERAL] hundreds of millions row dBs

    > We're getting about 64 million rows inserted in about 1.5 hrs into a
    > table with a multiple-column primary key - that's the only index.
    > That seems pretty good to me - SQL Loader takes about 4 hrs to do the
    > same job.

    > As I recall, the last time we rebuilt our database, it took about 3
    > hours to import 265 million rows of data.

    That works out to about 24537 rows per second.

    > It then took another 16 hours to rebuild all the indexes. I think the
    > entire pg_dumpall/reload process took about 21 hours +/-. I wonder
    > what it will be like with 1.5 billion rows...

    Load will probably scale linearly, so I think you could just multiply by
    5.66 to get 17 hours to load.

    Building indexes is likely to be at least n*log(n) and maybe even n^2.
    For sure, it would take a whole weekend.

    Here is an instance where a really big ram disk might be handy.
    You could create a database on a big ram disk and load it, then build
    the indexes. Then shut down the database and move it to hard disk.
    It might save a few days of effort if you have billions of rows to load.

    Dann Corbit

  6. #6

    Re: hundreds of millions row dBs

    Wes <wespvp@syntegra.com> writes:
    > As I recall, the last time we rebuilt our database, it took about 3 hours to
    > import 265 million rows of data. It then took another 16 hours to rebuild
    > all the indexes.
    Out of curiosity, what value of sort_mem were you using?

    (In PG 8.0, the sort memory setting used by CREATE INDEX will be
    maintenance_work_mem not work_mem, which should help in getting larger
    values to be used. But in existing releases you usually need to think
    about a manual tweak.)

    regards, tom lane
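
    As a concrete illustration of the knob Tom mentions (the value, in kB, and
    the index/table names are arbitrary examples):

        -- PostgreSQL 8.0: CREATE INDEX is governed by maintenance_work_mem
        SET maintenance_work_mem = 655360;     -- 640 MB, value in kB
        -- Earlier releases: bump sort_mem instead
        -- SET sort_mem = 655360;
        CREATE INDEX big_table_idx ON big_table (col1, col2);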


  7. #7

    Re: hundreds of millions row dBs

    "Dann Corbit" <DCorbitconnx.com> writes:
    > Here is an instance where a really big ram disk might be handy.
    > You could create a database on a big ram disk and load it, then build
    > the indexes.
    > Then shut down the database and move it to hard disk.
    Actually, if you have a RAM disk, just change the $PGDATA/base/nnn/pgsql_tmp
    subdirectory into a symlink to some temp directory on the RAM disk.
    Should get you pretty much all the win with no need to move stuff around
    afterwards.

    You have to be sure the RAM disk is bigger than your biggest index though.

    regards, tom lane
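
    A rough shell sketch of that symlink trick; the RAM-disk mount point and
    the database OID ("nnn") are placeholders, and this assumes the postmaster
    is stopped while the directory is swapped:

        # Point the per-database sort/temp area at the RAM disk.
        mkdir -p /mnt/ramdisk/pgsql_tmp
        rm -rf $PGDATA/base/nnn/pgsql_tmp            # temp files are disposable
        ln -s /mnt/ramdisk/pgsql_tmp $PGDATA/base/nnn/pgsql_tmp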


  8. #8

    Re: hundreds of millions row dBs


    To speed up the load:
    - make fewer checkpoints (tweak the checkpoint interval and other
      parameters in the config; illustrative settings are sketched after this list)
    - disable fsync (not sure if it really helps)
    - have the source data, database tables, and WAL log on three physically
      different disks
    - have the temporary files on a different disk too, or on a ramdisk
    - gunzip while restoring to read less data from the disk
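
    A sketch of the kind of postgresql.conf settings being described; the
    values are only examples, and fsync = off risks data loss if the machine
    crashes mid-load, so re-enable it afterwards:

        # postgresql.conf excerpt - bulk-load tuning, illustrative values only
        checkpoint_segments = 30     # default 3; fewer, larger checkpoints
        checkpoint_timeout = 1800    # seconds between forced checkpoints
        fsync = off                  # unsafe on crash; turn back on after the load
        sort_mem = 655360            # kB; maintenance_work_mem in 8.0+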


    > "Dann Corbit" <DCorbitconnx.com> writes:
    >> Here is an instance where a really big ram disk might be handy.
    >> You could create a database on a big ram disk and load it, then build
    >> the indexes.
    >> Then shut down the database and move it to hard disk.
    >
    > Actually, if you have a RAM disk, just change the
    > $PGDATA/base/nnn/pgsql_tmp
    > subdirectory into a symlink to some temp directory on the RAM disk.
    > Should get you pretty much all the win with no need to move stuff around
    > afterwards.
    >
    > You have to be sure the RAM disk is bigger than your biggest index
    > though.
    >
    > regards, tom lane


    Pierre-Frédéric Caillaud

  9. #9

    Re: hundreds of millions row dBs

    > Out of curiosity, what value of sort_mem were you using?
    >
    > (In PG 8.0, the sort memory setting used by CREATE INDEX will be
    > maintenance_work_mem not work_mem, which should help in getting larger
    > values to be used. But in existing releases you usually need to think
    > about a manual tweak.)
    Normally it is set to 65535. However, during the load I bump it up to
    655350. The system has 2GB ECC memory.
    > Here is an instance where a really big ram disk might be handy.
    > You could create a database on a big ram disk and load it, then build
    > the indexes.
    I'm afraid we don't have quite that much RAM... With just under 400 million
    rows right now, it is 74 GB. That will probably grow to around 300 GB or so
    before it stabilizes.
    > Actually, if you have a RAM disk, just change the $PGDATA/base/nnn/pgsql_tmp
    > subdirectory into a symlink to some temp directory on the RAM disk.
    > Should get you pretty much all the win with no need to move stuff around
    > afterwards.
    >
    > You have to be sure the RAM disk is bigger than your biggest index though.
    Hmm. That's a thought. I expect our largest index will still be bigger
    than available RAM though. How can I check index sizes?

    We already have pg_xlog on a dedicated mirrored disk. Would it help
    significantly to give pgsql_tmp its own mirrored disk? PGDATA is on an
    8-disk hardware RAID 5.

    Wes
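
    For reference, Wes's question about index sizes can be answered with a
    catalog query along these lines (a sketch; it assumes the default 8 kB
    block size, and relpages is only refreshed by VACUUM/ANALYZE):

        -- List indexes by approximate on-disk size.
        SELECT relname, relpages::bigint * 8192 / (1024 * 1024) AS approx_mb
        FROM pg_class
        WHERE relkind = 'i'
        ORDER BY relpages DESC;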



