Professional Web Applications Themes

Bulk inserts with update - MySQL

  1. #1

    Default Bulk inserts with update

    Hi All,

    I'm working on a project that is using a MySQL database. Every hour some
    bulk data is processed and then inserted into a database.

    The database looks like this:

    CREATE TABLE mytable (
    data1 char(255),
    data2 char(255),
    int1 tinyint unsigned,
    int2 int unsigned,
    int3 int unsigned,
    PRIMARY KEY(data1, data2, int1)
    ) TYPE=MyISAM;


    Now, typically I'm inserting all this data into the database nicely. The
    problem happens when I already have an entry in the database (with the
    same primary key). In that case I have to assign int2 to the one from
    bulk data and add int3 with the one from bulk data.
    Thanks to ON DUPLICATE KEY, this is straightforward:

    INSERT INTO mytable (data1, data2, int1, int2, int3)
    VALUES (newdata1, newdata2, newint1, newint2, newint3)
    ON DUPLICATE KEY UPDATE int2 = newint2, int3 = int3 + newint3;


    The problem here is that when the table gets pretty big, the performance
    goes down - the table currently has about 9 million rows and my typical
    bulk imports have around 60,000 rows - this takes about 10 minutes to
    import.


    What would be the best way to optimize this? I tried creating a
    temporary table in memory and then using a SELECT from that table to
    update the real one, but I didn't see any improvement in speed.
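
    One thing that often helps here, independent of the index question, is
    batching many rows into a single extended INSERT instead of issuing
    60,000 separate statements. A minimal sketch using the schema above (the
    literal values and the batch size are illustrative assumptions):

    ```sql
    -- Extended (multi-row) insert: one statement carries many rows,
    -- and ON DUPLICATE KEY UPDATE still fires for each conflicting row.
    -- VALUES(col) refers to the value that row would have inserted.
    INSERT INTO mytable (data1, data2, int1, int2, int3) VALUES
      ('a', 'b', 1, 10, 5),
      ('c', 'd', 2, 20, 7),
      ('e', 'f', 3, 30, 9)   -- ...batch a few hundred rows per statement
    ON DUPLICATE KEY UPDATE
      int2 = VALUES(int2),
      int3 = int3 + VALUES(int3);
    ```

    This cuts per-statement parsing and network overhead; the right batch
    size is something you'd have to tune against max_allowed_packet.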


    What would experts here recommend?


    Thanks a lot.

    user Guest

  2. #2

    Default Re: Bulk inserts with update

    user wrote: 

    Sounds like a bad design to have a primary key on a bulk load-type table -
    especially if you *expect* more than one record for that primary key. Drop the
    primary key and just add a "duplicates allowed" index on (data1, data2, int1).
    Now you will be able to process this data and aggregate the "duplicates" during
    post-processing. If you need "order" add a timestamp field with default value
    of the current system time. You could now potentially drop columns int2 and int3.
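
    A sketch of that layout, keeping the same column names (the loaded_at
    column, the index name, and the aggregation query are illustrative
    assumptions, not something from the post):

    ```sql
    -- No PRIMARY KEY: duplicates are allowed, so the hourly bulk load
    -- becomes a plain fast append.
    CREATE TABLE mytable (
      data1 char(255),
      data2 char(255),
      int1  tinyint unsigned,
      int2  int unsigned,
      int3  int unsigned,
      loaded_at timestamp DEFAULT CURRENT_TIMESTAMP,  -- preserves arrival order
      INDEX idx_key (data1, data2, int1)              -- non-unique index
    ) TYPE=MyISAM;

    -- Post-processing: collapse duplicates per logical key.
    -- "Most recent int2 wins, int3 values sum" mirrors the original
    -- ON DUPLICATE KEY UPDATE semantics.
    SELECT data1, data2, int1,
           SUBSTRING_INDEX(
             GROUP_CONCAT(int2 ORDER BY loaded_at DESC), ',', 1) AS int2,
           SUM(int3) AS int3
    FROM mytable
    GROUP BY data1, data2, int1;
    ```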

    "been there, done that!"

    --
    Michael Austin.
    Database Consultant
    Michael Guest

  3. #3

    Default Re: Bulk inserts with update

    On Mon, 1 Jan 2007 00:01:18 +0000 (UTC), user wrote: 

    Compound primary keys may be a good idea somewhere, but I haven't been
    to that place yet. (This is a mild form of "This is never a good idea.")

    My usual primary key is a nice integer field of an appropriate width,
    that has absolutely no meaning other than being the primary key for that
    particular table. All the intertable relationships are maintained with
    keys to different fields than the primary. This means a little more
    manual work, but it's work that's done more or less only once. Data gets
    backed up with the primary key, but never exported with it. This means
    your loading process will quickly and happily assign new primary keys to
    records, and your secondary keys (or "relationship keys") point you at
    the potential duplicate records you may need to reconcile.
     

    I wouldn't classify that as horrible -- you are, after all, using one of
    the most complicated and least optimized ways of getting data in.
     

    Separate out the primary key from your actually-significant data. Use
    "LOAD DATA INFILE" instead. Your 10-minute load will drop to a couple of
    operations that take mere seconds.
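
    A sketch of that two-step load (the file path, delimiter, and staging
    table name are placeholder assumptions):

    ```sql
    -- Step 1: bulk-load the hourly file into an empty staging table with
    -- the same columns but no indexes. LOAD DATA INFILE bypasses per-row
    -- statement overhead entirely.
    LOAD DATA INFILE '/tmp/hourly_batch.txt'
    INTO TABLE mytable_staging
    FIELDS TERMINATED BY '\t'
    (data1, data2, int1, int2, int3);

    -- Step 2: merge staging into the real table in one statement;
    -- ON DUPLICATE KEY UPDATE handles the existing-row case as before.
    INSERT INTO mytable (data1, data2, int1, int2, int3)
    SELECT data1, data2, int1, int2, int3
    FROM mytable_staging
    ON DUPLICATE KEY UPDATE
      int2 = VALUES(int2),
      int3 = int3 + VALUES(int3);

    -- Reset the staging table for the next hourly batch.
    TRUNCATE TABLE mytable_staging;
    ```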

    --
    186,000 Miles per Second. It's not just a good idea. IT'S THE LAW.
    Peter Guest

