How to delete duplicate rows?

Ask a Question related to PostgreSQL / PGSQL, Design and Development.

  1. #1

    Default How to delete duplicate rows?

    This one must be obvious for most here.

    I have a 170 million rows table from which I want to eliminate
    duplicate "would be" keys and leave only uniques.

    I found a query in [url]http://www.jlcomp.demon.co.uk/faq/duplicates.html[/url]
    for the oracle database but can't figure out how to refer to the row
    id in postgresql:

    delete from test where rowid not in
    (select min(rowid) from test group by a,b);

    How to refer to the row id? Any better way to do it?

    Regards, Clodoaldo Pinto

    ---------------------------(end of broadcast)---------------------------
    TIP 1: subscribe and unsubscribe commands go to [email]majordomo@postgresql.org[/email]

    Clodoaldo Pinto Guest

  2. Similar Questions and Discussions

    1. Help with delete duplicate locked files
      I'm kinda new at this so forgive me if I'm a bonehead. I made a contribute site, then wanted to disable check in check out. All the files have...
    2. MySql UPDATE problem with duplicate rows
      Hi, I have a MySql problem I hope someone can help me with. I'm trying to run an update on a linking table, the update is running into a Primary...
    3. how to eliminate duplicate rows?
      Hi I have one table with two field in sql server 2000. Table data looks like this: Field1 Field2 Admin Active Admin Passive Sales ...
    4. need help figuring out how to delete rows?
      Hi Im making a chatroom using flash and php, the one problem Im having is deleting rows. What I'm looking for is when they click on the button to...
    5. Could anyone help me to remove duplicate rows?
      I have a table that has more than 1 milion rows so practically it is impossible to remove all duplicate rows by hand. Could you help me to remove...
  3. #2

    Default Re: How to delete duplicate rows?

    Can you not use your table's primary key value instead?

    If you table is created with OIDs you may be able to use those -
    although I don't know if that this advisable or not since I never use
    OIDs...

    John Sidney-Woollett

    Clodoaldo Pinto wrote:
    > This one must be obvious for most here.
    >
    > I have a 170 million rows table from which I want to eliminate
    > duplicate "would be" keys and leave only uniques.
    >
    > I found a query in [url]http://www.jlcomp.demon.co.uk/faq/duplicates.html[/url]
    > for the oracle database but can't figure out how to refer to the row
    > id in postgresql:
    >
    > delete from test where rowid not in
    > (select min(rowid) from test group by a,b);
    >
    > How to refer to the row id? Any better way to do it?
    >
    > Regards, Clodoaldo Pinto
    >
    > ---------------------------(end of broadcast)---------------------------
    > TIP 1: subscribe and unsubscribe commands go to [email]majordomo@postgresql.org[/email]
    ---------------------------(end of broadcast)---------------------------
    TIP 3: if posting/reading through Usenet, please send an appropriate
    subscribe-nomail command to [email]majordomo@postgresql.org[/email] so that your
    message can get through to the mailing list cleanly

    John Sidney-Woollett Guest

  4. #3

    Default Re: How to delete duplicate rows?

    Doh, sorry - you're completely correct! Silly me...

    Can you not add a serial or sequence column to the table for the
    purposes of the de-dupe?

    Then create an index on that column in one operation at the end and use
    that in the way that you would use Oracle's rowid from the examples?

    John Sidney-Woollett


    Clodoaldo Pinto wrote:
    > On Fri, 04 Feb 2005 07:38:26 +0000, John Sidney-Woollett
    > <johnsw@wardbrook.com> wrote:
    >
    >>Can you not use your table's primary key value instead?
    >>
    >
    > John, If a primary key existed there would be no duplicates.
    >
    > The row insertion is made in 650k rows batches 8 times a day and a
    > primary key make it very slow.
    >
    > Found also methods 2 and 3 in
    > [url]http://www.orafaq.com/faq/Server_Utilities/SQL/faq55.htm[/url]
    >
    > Method 3 also relies in the row id. If no one can help I will do this:
    >
    > Insert the distinct rows in a temporary table. Drop the index. Insert
    > into the original from the temporary.
    >
    > Clodoaldo
    ---------------------------(end of broadcast)---------------------------
    TIP 4: Don't 'kill -9' the postmaster

    John Sidney-Woollett Guest

  5. #4

    Default Re: How to delete duplicate rows?

    On Thu, Feb 03, 2005 at 23:04:57 -0200,
    Clodoaldo Pinto <clodoaldo.pinto@gmail.com> wrote:
    > This one must be obvious for most here.
    >
    > I have a 170 million rows table from which I want to eliminate
    > duplicate "would be" keys and leave only uniques.
    >
    > I found a query in [url]http://www.jlcomp.demon.co.uk/faq/duplicates.html[/url]
    > for the oracle database but can't figure out how to refer to the row
    > id in postgresql:
    >
    > delete from test where rowid not in
    > (select min(rowid) from test group by a,b);
    >
    > How to refer to the row id? Any better way to do it?
    Your best bet is to use oid if the table has them. If it doesn't you
    might want to use ctid. That will require creating some comparison
    functions that don't exist by default. By the time you write and test
    those, the select distinct method you actually used might be better.

    ---------------------------(end of broadcast)---------------------------
    TIP 1: subscribe and unsubscribe commands go to [email]majordomo@postgresql.org[/email]

    Bruno Wolff III Guest

Posting Permissions

  • You may not post new threads
  • You may post replies
  • You may not post attachments
  • You may not edit your posts

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139