Professional Web Applications Themes

Find duplicates - MySQL

Hey all, I want to give the user a dialog where they can find pairs of duplicate entries in the database and resolve these down to a single record. I have this working fine so far, but I just noticed that it falls down when there are 3 or more duplicates of a given key. Which is not surprising, given my query: SELECT p1.id, p2.id, FROM participator as p1 LEFT JOIN participator AS p2 ON (p1.key=p2.key AND p1.id<p2.id) WHERE (p2.key IS NOT NULL) If duplicates only come in pairs, this'll work fine: The query will return pairs of distinct duplicate ...

  1. #1

    Default Find duplicates

    Hey all,

    I want to give the user a dialog where they can find pairs of duplicate
    entries in the database and resolve these down to a single record. I
    have this working fine so far, but I just noticed that it falls down
    when there are 3 or more duplicates of a given key. Which is not
    surprising, given my query:

    SELECT p1.id, p2.id,
    FROM participator as p1
    LEFT JOIN participator AS p2
    ON (p1.key=p2.key AND p1.id<p2.id)
    WHERE (p2.key IS NOT NULL)

    If duplicates only come in pairs, this'll work fine: The query will
    return pairs of distinct duplicate entries. Of course, the lame
    p1.id<p2.id trick falls down with 3 or more entries. I want only the ids
    from 2 rows of a set of duplicate items, so that the user can fix them
    up. What trick am I missing? :)

    Cheers,
    Nicholas Sherlock

    --
    [url]http://www.sherlocksoftware.org[/url]
    Nicholas Sherlock Guest

  2. #2

    Default Re: Find duplicates


    Nicholas Sherlock wrote:
    > Hey all,
    >
    > I want to give the user a dialog where they can find pairs of duplicate
    > entries in the database and resolve these down to a single record. I
    > have this working fine so far, but I just noticed that it falls down
    > when there are 3 or more duplicates of a given key. Which is not
    > surprising, given my query:
    >
    > SELECT p1.id, p2.id,
    > FROM participator as p1
    > LEFT JOIN participator AS p2
    > ON (p1.key=p2.key AND p1.id<p2.id)
    > WHERE (p2.key IS NOT NULL)
    >
    > If duplicates only come in pairs, this'll work fine: The query will
    > return pairs of distinct duplicate entries. Of course, the lame
    > p1.id<p2.id trick falls down with 3 or more entries. I want only the ids
    > from 2 rows of a set of duplicate items, so that the user can fix them
    > up. What trick am I missing? :)
    >
    > Cheers,
    > Nicholas Sherlock
    I'm not totally sure about how you wish to handle greater than 2 dupes
    but would:

    SELECT p1.id, p2.id,
    FROM participator as p1
    LEFT JOIN participator AS p2
    ON (p1.key=p2.key AND p1.id<p2.id)
    WHERE (p2.key IS NOT NULL)
    GROUP BY p1.id
    HAVING COUNT(*) = 2

    work for you?

    Captain Paralytic Guest

  3. #3

    Default Re: Find duplicates


    Captain Paralytic wrote:
    > Nicholas Sherlock wrote:
    >
    > > Hey all,
    > >
    > > I want to give the user a dialog where they can find pairs of duplicate
    > > entries in the database and resolve these down to a single record. I
    > > have this working fine so far, but I just noticed that it falls down
    > > when there are 3 or more duplicates of a given key. Which is not
    > > surprising, given my query:
    > >
    > > SELECT p1.id, p2.id,
    > > FROM participator as p1
    > > LEFT JOIN participator AS p2
    > > ON (p1.key=p2.key AND p1.id<p2.id)
    > > WHERE (p2.key IS NOT NULL)
    > >
    > > If duplicates only come in pairs, this'll work fine: The query will
    > > return pairs of distinct duplicate entries. Of course, the lame
    > > p1.id<p2.id trick falls down with 3 or more entries. I want only the ids
    > > from 2 rows of a set of duplicate items, so that the user can fix them
    > > up. What trick am I missing? :)
    > >
    > > Cheers,
    > > Nicholas Sherlock
    > I'm not totally sure about how you wish to handle greater than 2 dupes
    > but would:
    >
    > SELECT p1.id, p2.id,
    > FROM participator as p1
    > LEFT JOIN participator AS p2
    > ON (p1.key=p2.key AND p1.id<p2.id)
    > WHERE (p2.key IS NOT NULL)
    > GROUP BY p1.id
    > HAVING COUNT(*) = 2
    >
    > work for you?
    Oops, maybe make that:

    SELECT p1.key, p1.id, p2.id,
    FROM participator as p1
    LEFT JOIN participator AS p2
    ON (p1.key=p2.key AND p1.id<p2.id)
    WHERE (p2.key IS NOT NULL)
    GROUP BY p1.key
    HAVING COUNT(*) = 2

    Captain Paralytic Guest

  4. #4

    Default Re: Find duplicates


    Nicholas Sherlock wrote:
    > Hey all,
    >
    > I want to give the user a dialog where they can find pairs of duplicate
    > entries in the database and resolve these down to a single record. I
    > have this working fine so far, but I just noticed that it falls down
    > when there are 3 or more duplicates of a given key. Which is not
    > surprising, given my query:
    >
    > SELECT p1.id, p2.id,
    > FROM participator as p1
    > LEFT JOIN participator AS p2
    > ON (p1.key=p2.key AND p1.id<p2.id)
    > WHERE (p2.key IS NOT NULL)
    >
    > If duplicates only come in pairs, this'll work fine: The query will
    > return pairs of distinct duplicate entries. Of course, the lame
    > p1.id<p2.id trick falls down with 3 or more entries. I want only the ids
    > from 2 rows of a set of duplicate items, so that the user can fix them
    > up. What trick am I missing? :)
    >
    > Cheers,
    > Nicholas Sherlock
    Just noticed that the SQL that you have posted is invalid!

    Try this:

    SELECT p1.`key` , p1.id, p2.id
    FROM ndups AS p1
    LEFT JOIN ndups AS p2 ON ( p1.`key` = p2.`key`
    AND p1.id < p2.id )
    WHERE (
    p2.`key` IS NOT NULL
    )
    GROUP BY p1.`key` , p1.`id`
    HAVING COUNT( * ) =1
    Or maybe, ignore my

    Captain Paralytic Guest

  5. #5

    Default Re: Find duplicates

    Captain Paralytic wrote: 

    Hmm, not sure. In the end, I went with this version:

    SELECT MIN(p1.id) as id, MAX(p1.id) as id_1,
    FROM participator AS p1
    GROUP BY `key`
    HAVING COUNT(`key`)>1
    ORDER BY id ASC

    So with this data:

    id key
    1 1
    2 1
    3 1
    4 2
    5 2
    6 3
    7 4
    8 5

    I get:

    id id_1
    1 3
    4 5

    Perfect for resolving duplicate pairs!

    Cheers,
    Nicholas Sherlock

    --
    http://www.sherlocksoftware.org
    Nicholas Guest

Similar Threads

  1. Finding duplicates
    By André Hänsel in forum MySQL
    Replies: 0
    Last Post: February 11th, 02:36 AM
  2. SQL Duplicates
    By Sander Martens in forum MySQL
    Replies: 4
    Last Post: January 24th, 07:50 AM
  3. "find duplicates" sql query
    By David in forum ASP Database
    Replies: 4
    Last Post: May 18th, 02:27 PM
  4. >1 mio files but find duplicates
    By Robert in forum Mac Applications & Software
    Replies: 0
    Last Post: July 14th, 09:33 PM
  5. No Duplicates
    By Bebop & Rocksteady in forum FileMaker
    Replies: 2
    Last Post: June 28th, 11:30 AM

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139