Professional Web Applications Themes

Duplicate Records - MySQL

I am using this query to find duplicate records (first_name, last_name and i_active) . However, this query takes a lot of time to execute when executed for the first time. When I execute it for the second time, it is very fast.What could be the reason ? and can anyone recommend a way to optimise this? SELECT * FROM talent WHERE (((talent.first_name) In (SELECT first_name FROM talent As Tmp GROUP BY first_name,last_name,i_active HAVING Count(*)>1 And last_name = talent.last_name and i_active != '0'))) ORDER BY talent.first_name, talent.last_name TIA...

  1. #1

    Default Duplicate Records

    I am using this query to find duplicate records (first_name, last_name
    and i_active) . However, this query takes a lot of time to execute
    when executed for the first time. When I execute it for the second
    time, it is very fast.What could be the reason ? and can anyone
    recommend a way to optimise this?

    SELECT * FROM talent WHERE (((talent.first_name) In (SELECT first_name
    FROM talent As Tmp GROUP BY first_name,last_name,i_active HAVING
    Count(*)>1 And last_name = talent.last_name and i_active != '0')))
    ORDER BY talent.first_name, talent.last_name

    TIA

    Pankaj Guest

  2. #2

    Default Re: Duplicate Records

    On Feb 5, 8:39 am, "Pankaj" <com> wrote: 


    The query is being cached - which is why it's quick the second time.

    Use a join instead of a sub select. It will be much faster. There are
    many examples of using joins in this way within these NGs.

    strawberry Guest

  3. #3

    Default Re: Duplicate Records

    > The query is being cached - which is why it's quick the second time. 

    Thanks.... I use this query and the results are already faster

    SELECT *
    FROM talent as e1
    LEFT JOIN talent AS e2
    ON (e1.s_first_name=e2.s_first_name and e1.s_last_name=e2.s_last_name
    and e1.i_active=e2.i_active)
    WHERE (e2.i_talent_id IS NOT NULL)
    GROUP BY e1.s_first_name,e1.s_last_name,e1.i_active
    HAVING COUNT(*) > 1

    but this displays the results only once. I want to view the duplicates
    as well. Any suggestions ?

    Pankaj Guest

  4. #4

    Default Re: Duplicate Records

    On 5 Feb, 11:59, "Pankaj" <com> wrote: 

    >
    > Thanks.... I use this query and the results are already faster
    >
    > SELECT *
    > FROM talent as e1
    > LEFT JOIN talent AS e2
    > ON (e1.s_first_name=e2.s_first_name and e1.s_last_name=e2.s_last_name
    > and e1.i_active=e2.i_active)
    > WHERE (e2.i_talent_id IS NOT NULL)
    > GROUP BY e1.s_first_name,e1.s_last_name,e1.i_active
    > HAVING COUNT(*) > 1
    >
    > but this displays the results only once. I want to view the duplicates
    > as well. Any suggestions ?[/ref]

    What happens if you leave out the GROUP BY clause?

    strawberry Guest

  5. #5

    Default Re: Duplicate Records

    On 5 Feb, 11:59, "Pankaj" <com> wrote: 

    >
    > Thanks.... I use this query and the results are already faster
    >
    > SELECT *
    > FROM talent as e1
    > LEFT JOIN talent AS e2
    > ON (e1.s_first_name=e2.s_first_name and e1.s_last_name=e2.s_last_name
    > and e1.i_active=e2.i_active)
    > WHERE (e2.i_talent_id IS NOT NULL)
    > GROUP BY e1.s_first_name,e1.s_last_name,e1.i_active
    > HAVING COUNT(*) > 1
    >
    > but this displays the results only once. I want to view the duplicates
    > as well. Any suggestions ?[/ref]

    Could you supply an exort of the structure and some data for us to
    play with.

    Captain Guest

  6. #6

    Default Re: Duplicate Records

    Pankaj <com> wrote:
     

    Make sure to use a primary key. In this way duplicate records will never
    occur.
    --
    Per Erik Rønne
    http://www.RQNNE.dk
    Per Guest

  7. #7

    Default Re: Duplicate Records

    On Feb 6, 3:42 am, invalid (Per Rønne) wrote: 
    >
    > Make sure to use a primary key. In this way duplicate records will never
    > occur.
    > --
    > Per Erik Rønnehttp://www.RQNNE.dk[/ref]

    I am using primary keys. but some fields like first_name and last_name
    can be duplicate :-)

    Pankaj Guest

  8. #8

    Default Re: Duplicate Records

    Here's one record ........... exported this from phpmyadmin so that
    you can simply import in your table

    -- phpMyAdmin SQL Dump
    -- version 2.8.2.4
    -- http://www.phpmyadmin.net
    --
    -- Server version: 5.0.24
    -- PHP Version: 5.1.6
    --
    -- Database: `temp`
    --

    -- --------------------------------------------------------

    --
    -- Table structure for table `edx_talent`
    --

    CREATE TABLE `edx_talent` (
    `i_talent_id` mediumint(9) NOT NULL auto_increment,
    `s_legal_name` varchar(60) default NULL,
    `s_first_name` varchar(150) default NULL,
    `s_last_name` varchar(150) default NULL,
    `s_middle_name` varchar(150) default NULL,
    `s_long_description` text,
    `s_short_description` varchar(255) default NULL,
    `d_dob` date NOT NULL default '0000-00-00',
    `s_` varchar(6) NOT NULL default '',
    `s_address_1` varchar(60) default NULL,
    `s_address_2` varchar(60) default NULL,
    `s_city` varchar(30) default NULL,
    `s_state_province` varchar(30) default NULL,
    `s_zip_route_code` varchar(10) default NULL,
    `s_country` varchar(20) default NULL,
    `s_drivers_license` varchar(15) default NULL,
    `s_drivers_license_state` varchar(5) default NULL,
    `i_view_on_website` int(1) default '0',
    `s_notes` text,
    `s_company` varchar(255) default NULL,
    `i_active` int(1) NOT NULL default '-1',
    PRIMARY KEY (`i_talent_id`),
    KEY `s_first_name` (`s_first_name`),
    KEY `s_last_name` (`s_last_name`),
    KEY `d_dob` (`d_dob`)
    ) ENGINE=MyISAM AUTO_INCREMENT=10 DEFAULT CHT=latin1
    AUTO_INCREMENT=10 ;

    --
    -- Dumping data for table `edx_talent`
    --

    INSERT INTO `edx_talent` (`i_talent_id`, `s_legal_name`,
    `s_first_name`, `s_last_name`, `s_middle_name`, `s_long_description`,
    `s_short_description`, `d_dob`, `s_`, `s_address_1`, `s_address_2`,
    `s_city`, `s_state_province`, `s_zip_route_code`, `s_country`,
    `s_drivers_license`, `s_drivers_license_state`, `i_view_on_website`,
    `s_notes`, `s_company`, `i_active`) VALUES
    (1224, NULL, 'J, 'Antone', '', '', '', '1971-02-19', 'male', '', '',
    '', 'CA', '12345', 'US', 'A12345', 'CA', 0, '', NULL, 0),
    (1225, NULL, 'J, 'Antone', '', '', '', '1971-02-19', 'male', '', '',
    '', 'CA', '12345', 'US', 'A12345', 'CA', 0, '', '', -1),
    (1226, NULL, 'J, 'Antone', '', '', '', '1971-02-19', 'male', '', '',
    '', 'CA', '12345', 'US', 'A12345', 'CA', 0, '', NULL, -1);


    Pankaj Guest

  9. #9

    Default Re: Duplicate Records

    Pankaj <com> wrote:
     
    > >
    > > Make sure to use a primary key. In this way duplicate records will never
    > > occur.[/ref][/ref]
     

    Yes, but if you use primary keys, /per definition/ no duplicate records
    can exist.

    But of course, in a person database with let's say one milliard records,
    there may be more than one male holding the name John Smith :-).

    In your country you may not have a Central Person Register number
    uniquely defining every citizen and metic, and for other reasons it may
    not be wise to use such a field as the primary key. Then just define one
    column or a set of columns as being 'unique'.

    Perhaps I should add that I'm more used to Oracle than to MySQL.
    --
    Per Erik Rønne
    http://www.RQNNE.dk
    Per Guest

  10. #10

    Default Re: Duplicate Records

    On Feb 6, 6:40 am, "Pankaj" <com> wrote: 

    Interesting, the Js are all missing their closing quote.

    Captain Guest

  11. #11

    Default Re: Duplicate Records

    On Feb 6, 6:40 am, "Pankaj" <com> wrote: 

    OK, now I'm not sure about the significance of the i_active criteria
    and you'll need to add the join conditions for first name if you want
    them but this seems to take you in more like the right direction.

    I have written it based on the table you posted rather than the first
    query which seems to have a different table name and some different
    field names.

    Wierd about that missing quote???

    Anyway, here it is:

    SELECT DISTINCT t1. *
    FROM edx_talent t1
    JOIN edx_talent t2 ON t2.s_last_name = t1.s_last_name
    AND t2.i_talent_id <> t1.i_talent_id

    Captain Guest

  12. #12

    Default Re: Duplicate Records

    On 6 Feb, 10:26, "Captain Paralytic" <com> wrote: 






    >
    > OK, now I'm not sure about the significance of the i_active criteria
    > and you'll need to add the join conditions for first name if you want
    > them but this seems to take you in more like the right direction.
    >
    > I have written it based on the table you posted rather than the first
    > query which seems to have a different table name and some different
    > field names.
    >
    > Wierd about that missing quote???
    >
    > Anyway, here it is:
    >
    > SELECT DISTINCT t1. *
    > FROM edx_talent t1
    > JOIN edx_talent t2 ON t2.s_last_name = t1.s_last_name
    > AND t2.i_talent_id <> t1.i_talent_id[/ref]

    You could also add an ORDER BY (to group duplicates together) and a
    WHERE clause (to discount NULLs or blanks) - something like this:

    SELECT DISTINCT t1. *
    FROM edx_talent t1
    JOIN edx_talent t2 ON t2.s_first_name = t1.s_first_name
    AND t2.s_last_name = t1.s_last_name
    AND t2.i_talent_id <> t1.i_talent_id
    WHERE s_first_name <> ''
    AND s_last_name <> ''
    ORDER BY s_first_name, s_last_name,i_talent_id
    LIMIT 0 , 30

    strawberry Guest

  13. #13

    Default Re: Duplicate Records

    >Yes, but if you use primary keys, /per definition/ no duplicate records 

    In a person database with records of people living on *ONE BLOCK*
    of a particular street (which I also lived on, about three dozen
    people and two dozen Smiths), there were four John Smiths, three
    with the same middle name. All related. Two had the same address,
    middle name, and phone number. Expand that to several surrounding
    blocks, and there were a lot more John Smiths, along with more first
    names that were duplicated in the Smith family.
     

    Governments may assume that they have not handed out duplicate
    numbers. They are probably wrong. In the case of the USA, I believe
    the Social Security Administration is on record as saying they don't
    attempt to make it unique, and in any case there are a number of
    people who bought wallets with sample SSNs which people assumed
    were real, and then there's massive identity theft by illegal
    immigrants getting jobs, and regular identity theft. And there
    have been cases where it's apparent that the SSA *assigned* duplicate
    numbers to two different people, not that they were mistaken about
    their number.

    Oh, yes, nobody says that the identity numbers from country A won't
    overlap those of country B.

    And in any case, it's a bad idea to refuse a customer because they
    have the same identity number as an existing customer (the real one
    is likely to be the second one). Also, corporate entities, trusts,
    etc. may not have identity numbers of the same kind but they might
    want to be your customers also.
     

    When in doubt, issue your own account numbers. Possibly you do this
    with an auto_increment field. If the account numbers can be used to
    spend money, more security is needed (guessing a valid account number
    shouldn't be easy).
     

    The issue of what to make a unique key is one of database design that
    is essentially the same in Oracle and MySQL.

    Gordon Guest

  14. #14

    Default Re: Duplicate Records

    Gordon Burditt <org> wrote: 
     [/ref]
     
    >
    > Governments may assume that they have not handed out duplicate
    > numbers. They are probably wrong.[/ref]

    For sure not in Denmark. Everybody is given a unique CPR number when
    being registered at birth or when being allowed permanent residence in
    the country.

    <http://www.workindenmark.dk/CPR>
     

    Certainly not - we would have to define universal person numbers first -
    through ISO, I think. Like:

    yyyymmdd-cccssssg

    y: year.
    m: month.
    d: day.
    c: country.
    s: sequential number.
    g: gender.
     

    Use their cell phone number, then ...
     
    >
    > When in doubt, issue your own account numbers. Possibly you do this
    > with an auto_increment field. If the account numbers can be used to
    > spend money, more security is needed (guessing a valid account number
    > shouldn't be easy).[/ref]

    Yes.
     
    >
    > The issue of what to make a unique key is one of database design that
    > is essentially the same in Oracle and MySQL.[/ref]

    Yes, but how it is done in different databases is different.
    --
    Per Erik Rønne
    http://www.RQNNE.dk
    Per Guest

  15. #15

    Default Re: Duplicate Records

    On 8 Feb, 10:15, invalid (Per Rønne) wrote: 
    >
    > Use their cell phone number, then ...[/ref]

    Yes certainly you can buy that...
    Please give me your cell phone number.
    What! You don't have a cell phone!
    Sorry, we only do business with cell phone owners.

    Captain Guest

  16. #16

    Default Re: Duplicate Records

    Captain Paralytic <com> wrote:
     
    > >
    > > Use their cell phone number, then ...[/ref]
    >
    > Yes certainly you can buy that...
    > Please give me your cell phone number.
    > What! You don't have a cell phone!
    > Sorry, we only do business with cell phone owners.[/ref]

    Of course they should use a sequence to generate primary keys. But also,
    it should be possible to search on other fields, and have such fields
    indexed.
    --
    Per Erik Rønne
    http://www.RQNNE.dk
    Per Guest

Similar Threads

  1. Duplicate records
    By DuLaus in forum Coldfusion - Advanced Techniques
    Replies: 7
    Last Post: October 25th, 03:20 PM
  2. select all the duplicate records
    By atse in forum ASP
    Replies: 9
    Last Post: October 19th, 06:59 PM
  3. Deleting duplicate records
    By Dinesh.T.K in forum FileMaker
    Replies: 3
    Last Post: August 13th, 12:00 AM
  4. Need help serializing duplicate records
    By Scott in forum FileMaker
    Replies: 2
    Last Post: July 30th, 06:24 PM
  5. Duplicate records?
    By Bridget Eley in forum FileMaker
    Replies: 2
    Last Post: July 28th, 12:06 AM

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139