Professional Web Applications Themes

Finding duplicates without primary key - MySQL

Hi, is there a way to find all duplicates in a table without a primary key? Regards, André...

Sponsored Links
  1. #1

    Default Finding duplicates without primary key

    Hi,

    is there a way to find all duplicates in a table without a primary key?

    Regards,
    André


    Sponsored Links
    André Hänsel Guest

  2. #2

    Default Re: Finding duplicates without primary key

    "André Hänsel" <andrewebkr.de> wrote in message
    news:droqad$s8c$1sagnix.uni-muenster.de...
    > is there a way to find all duplicates in a table without a primary key?
    SELECT COUNT(col), col
    FROM myTable
    GROUP BY col
    HAVING COUNT(col) > 1

    Regards,
    Bill K.


    Bill Karwin Guest

  3. #3

    Default Re: Finding duplicates without primary key

    Bill Karwin wrote:
    > "André Hänsel" <andrewebkr.de> wrote in message
    > news:droqad$s8c$1sagnix.uni-muenster.de...
    >> is there a way to find all duplicates in a table without a primary
    >> key?
    >
    > SELECT COUNT(col), col
    > FROM myTable
    > GROUP BY col
    > HAVING COUNT(col) > 1
    No, this will give me distinct values of col, not the actual rows that are
    double.


    André Hänsel Guest

  4. #4

    Default Re: Finding duplicates without primary key

    "André Hänsel" <andrewebkr.de> wrote in message
    news:drrgpp$eqs$1sagnix.uni-muenster.de...
    > Bill Karwin wrote:
    >> "André Hänsel" <andrewebkr.de> wrote in message
    >> news:droqad$s8c$1sagnix.uni-muenster.de...
    >>> is there a way to find all duplicates in a table without a primary
    >>> key?
    >>
    >> SELECT COUNT(col), col
    >> FROM myTable
    >> GROUP BY col
    >> HAVING COUNT(col) > 1
    >
    > No, this will give me distinct values of col, not the actual rows that are
    > double.
    I see what you mean. How does one distinguish between two identical rows,
    if the table has no primary key?

    Here's a different method, using self-joins. It works nicely for testing
    for duplicates across multiple columns:

    CREATE TABLE foo (col INTEGER, col2 INTEGER, col3 INTEGER);

    INSERT INTO foo VALUES (1,1,1), (2,2,2), (2,2,2),
    (3,3,3), (3,3,3), (3,3,3), (4,4,4), (4,4,4), (4,4,4), (4,4,4);

    SELECT COUNT(*), a.*
    FROM foo AS a INNER JOIN foo AS b
    ON (a.col = b.col AND a.col2 = b.col2 AND a.col3 = b.col3)
    GROUP BY a.col, a.col2, a.col3
    HAVING COUNT(*) > 1;

    The result shows count of 4 for (2,2,2), 9 for (3,3,3), and 16 for (4,4,4).
    This is the square of the actual number of copies, because there's no way to
    prevent rows from being joined to themselves when there's no primary key.

    Now say you want to delete the duplicate rows and leave only one copy of
    each?
    One solution is to use DELETE with a LIMIT clause. For each row returned by
    the above SELECT, run this statement once, and provide the values as
    parameters.

    DELETE FROM myTable WHERE col = ? AND col2 = ? AND col3 = ? LIMIT 1

    So you would execute this once with parameter values 2,2,2, once with values
    3,3,3, and once with values 4,4,4.

    Then run the SELECT again and see if the duplicates have been eliminated.
    In this case, the 2,2,2 result should vanish. The 3,3,3 should find a count
    of 4, and the 4,4,4 should find a count of 9. You still have some
    duplicates, but they are fewer in number.

    If all the duplicates occured in sets of two, then the job should be
    complete after one pass. If the duplicates rows have three or more copies,
    then you may have to loop through the SELECT & DELETE test several times.
    Keep repeating this until the SELECT returns no rows.

    Does this help?

    Regards,
    Bill K.


    Bill Karwin Guest

  5. #5

    Default Re: Finding duplicates without primary key

    Bill Karwin wrote:
    > "André Hänsel" <andrewebkr.de> wrote in message
    > news:drrgpp$eqs$1sagnix.uni-muenster.de...
    >> Bill Karwin wrote:
    >>> "André Hänsel" <andrewebkr.de> wrote in message
    >>> news:droqad$s8c$1sagnix.uni-muenster.de...
    >>>> is there a way to find all duplicates in a table without a primary
    >>>> key?
    >>>
    >>> SELECT COUNT(col), col
    >>> FROM myTable
    >>> GROUP BY col
    >>> HAVING COUNT(col) > 1
    >>
    >> No, this will give me distinct values of col, not the actual rows
    >> that are double.
    >
    > I see what you mean. How does one distinguish between two identical
    > rows, if the table has no primary key?
    >
    > Here's a different method, using self-joins. It works nicely for
    > testing for duplicates across multiple columns:
    >
    > CREATE TABLE foo (col INTEGER, col2 INTEGER, col3 INTEGER);
    >
    > INSERT INTO foo VALUES (1,1,1), (2,2,2), (2,2,2),
    > (3,3,3), (3,3,3), (3,3,3), (4,4,4), (4,4,4), (4,4,4), (4,4,4);
    >
    > SELECT COUNT(*), a.*
    > FROM foo AS a INNER JOIN foo AS b
    > ON (a.col = b.col AND a.col2 = b.col2 AND a.col3 = b.col3)
    > GROUP BY a.col, a.col2, a.col3
    > HAVING COUNT(*) > 1;
    >
    > The result shows count of 4 for (2,2,2), 9 for (3,3,3), and 16 for
    > (4,4,4). This is the square of the actual number of copies, because
    > there's no way to prevent rows from being joined to themselves when
    > there's no primary key.
    >
    > Now say you want to delete the duplicate rows and leave only one copy
    > of each?
    > One solution is to use DELETE with a LIMIT clause. For each row
    > returned by the above SELECT, run this statement once, and provide
    > the values as parameters.
    >
    > DELETE FROM myTable WHERE col = ? AND col2 = ? AND col3 = ? LIMIT 1
    >
    > So you would execute this once with parameter values 2,2,2, once with
    > values 3,3,3, and once with values 4,4,4.
    >
    > Then run the SELECT again and see if the duplicates have been
    > eliminated. In this case, the 2,2,2 result should vanish. The 3,3,3
    > should find a count of 4, and the 4,4,4 should find a count of 9.
    > You still have some duplicates, but they are fewer in number.
    >
    > If all the duplicates occured in sets of two, then the job should be
    > complete after one pass. If the duplicates rows have three or more
    > copies, then you may have to loop through the SELECT & DELETE test
    > several times. Keep repeating this until the SELECT returns no rows.
    >
    > Does this help?
    >
    Yes, this helps, thanks. :)

    Althought I probably have to put the grouped result in a temporary table
    since MySQL (or no DBE at all?) doesn't support DELETEing and sub-QUERYing
    from the same table.

    Regards,
    André


    André Hänsel Guest

  6. #6

    Default Re: Finding duplicates without primary key

    "André Hänsel" <andrewebkr.de> wrote in message
    news:dsjir5$pur$1sagnix.uni-muenster.de...
    > Althought I probably have to put the grouped result in a temporary table
    > since MySQL (or no DBE at all?) doesn't support DELETEing and sub-QUERYing
    > from the same table.
    I would just fetch the rows from the SELECT in a loop in some application
    code, and issue the DELETE operations based on these values.

    SQL was designed to be used in cooperation with an application language.

    Regards,
    Bill K.


    Bill Karwin Guest

Similar Threads

  1. Primary Scratch & Windows Primary Paging file?
    By Tommy Oberst in forum Adobe Photoshop 7, CS, CS2 & CS3
    Replies: 14
    Last Post: April 15th, 10:26 PM
  2. Finding duplicates
    By André Hänsel in forum MySQL
    Replies: 0
    Last Post: February 11th, 02:36 AM
  3. SQL Duplicates
    By Sander Martens in forum MySQL
    Replies: 4
    Last Post: January 24th, 07:50 AM
  4. Finding duplicates in an msaccess table
    By Rustywater in forum Coldfusion Database Access
    Replies: 5
    Last Post: April 8th, 11:11 AM
  5. No Duplicates
    By Bebop & Rocksteady in forum FileMaker
    Replies: 2
    Last Post: June 28th, 11:30 AM

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139