Professional Web Applications Themes

Merge name from many records - Microsoft SQL / MS SQL Server

Assuming that "most complete" means the longest string value from each column: CREATE TABLE Sometable (ssn INTEGER NOT NULL, firstname VARCHAR(20) NOT NULL, middlename VARCHAR(20) NOT NULL, lastname VARCHAR(20) NOT NULL, style VARCHAR(20) NOT NULL, PRIMARY KEY(ssn,firstname,middlename,lastname,style)) INSERT INTO Sometable VALUES (1,'W', 'Scott', 'Miller', 'Jr.') INSERT INTO Sometable VALUES (1,'William', 'S', 'Miller', 'Jr.') INSERT INTO Sometable VALUES (1,'W.', 'S.', 'Miller', 'Jr') INSERT INTO Sometable VALUES (2,'John', 'Q.', 'Doe', '') SELECT S.ssn, MIN(CASE LEN(firstname) WHEN lenfirstname THEN firstname END) AS firstname, MIN(CASE LEN(middlename) WHEN lenmiddlename THEN middlename END) AS middlename, MIN(CASE LEN(lastname) WHEN lenlastname THEN lastname END) AS lastname, MIN(CASE LEN(style) ...

  1. #1

    Default Re: Merge name from many records

    Assuming that "most complete" means the longest string value from each
    column:

    CREATE TABLE Sometable (ssn INTEGER NOT NULL, firstname VARCHAR(20) NOT
    NULL, middlename VARCHAR(20) NOT NULL, lastname VARCHAR(20) NOT NULL, style
    VARCHAR(20) NOT NULL, PRIMARY KEY(ssn,firstname,middlename,lastname,style))

    INSERT INTO Sometable VALUES (1,'W', 'Scott', 'Miller', 'Jr.')
    INSERT INTO Sometable VALUES (1,'William', 'S', 'Miller', 'Jr.')
    INSERT INTO Sometable VALUES (1,'W.', 'S.', 'Miller', 'Jr')
    INSERT INTO Sometable VALUES (2,'John', 'Q.', 'Doe', '')

    SELECT S.ssn,
    MIN(CASE LEN(firstname) WHEN lenfirstname THEN firstname END) AS firstname,
    MIN(CASE LEN(middlename) WHEN lenmiddlename THEN middlename END) AS
    middlename,
    MIN(CASE LEN(lastname) WHEN lenlastname THEN lastname END) AS lastname,
    MIN(CASE LEN(style) WHEN lenstyle THEN style END) AS style
    FROM Sometable AS S
    JOIN
    (SELECT ssn,
    MAX(LEN(firstname)) AS lenfirstname,
    MAX(LEN(middlename)) AS lenmiddlename,
    MAX(LEN(lastname)) AS lenlastname,
    MAX(LEN(style)) AS lenstyle
    FROM Sometable
    GROUP BY ssn) AS L
    ON S.ssn = L.ssn
    GROUP BY S.ssn

    --
    David Portas
    ------------
    Please reply only to the newsgroup
    --



    David Portas Guest

  2. #2

    Default Re: Merge name from many records

    David:

    That just gives me the longest name there. This doesn't validate that the
    name is actually correct and that there isn't a data entry error that needs
    to be addressed manually. For example, if I have the following:

    W. Scott Miller
    Q. Scott Miller

    The choice of which one would be used would be random with the supplied SQL.
    Yet this is a "needs attention" set of records for a given SSN because how
    can the first name of an individual be both Q and W? The routine should
    check initials against the longest name to make sure they match.

    Secondly, if we have:

    Mo Norm Smith
    M. Norm Smith

    Which one actually gets chosen? Both first names are two charaters long,
    yet one is a actual name and the other is an initial?

    Is there a way to strip out punctuation before checking length?

    Thanks for the reply,
    Scott
    --
    Remove the words spam and killer from the e-mail to reply.
    "David Portas" <REMOVE_BEFORE_REPLYING_dportasacm.org> wrote in message
    news:%23CjejA$PDHA.1612TK2MSFTNGP11.phx.gbl...
    > Assuming that "most complete" means the longest string value from each
    > column:
    >
    > CREATE TABLE Sometable (ssn INTEGER NOT NULL, firstname VARCHAR(20) NOT
    > NULL, middlename VARCHAR(20) NOT NULL, lastname VARCHAR(20) NOT NULL,
    style
    > VARCHAR(20) NOT NULL, PRIMARY
    KEY(ssn,firstname,middlename,lastname,style))
    >
    > INSERT INTO Sometable VALUES (1,'W', 'Scott', 'Miller', 'Jr.')
    > INSERT INTO Sometable VALUES (1,'William', 'S', 'Miller', 'Jr.')
    > INSERT INTO Sometable VALUES (1,'W.', 'S.', 'Miller', 'Jr')
    > INSERT INTO Sometable VALUES (2,'John', 'Q.', 'Doe', '')
    >
    > SELECT S.ssn,
    > MIN(CASE LEN(firstname) WHEN lenfirstname THEN firstname END) AS
    firstname,
    > MIN(CASE LEN(middlename) WHEN lenmiddlename THEN middlename END) AS
    > middlename,
    > MIN(CASE LEN(lastname) WHEN lenlastname THEN lastname END) AS lastname,
    > MIN(CASE LEN(style) WHEN lenstyle THEN style END) AS style
    > FROM Sometable AS S
    > JOIN
    > (SELECT ssn,
    > MAX(LEN(firstname)) AS lenfirstname,
    > MAX(LEN(middlename)) AS lenmiddlename,
    > MAX(LEN(lastname)) AS lenlastname,
    > MAX(LEN(style)) AS lenstyle
    > FROM Sometable
    > GROUP BY ssn) AS L
    > ON S.ssn = L.ssn
    > GROUP BY S.ssn
    >
    > --
    > David Portas
    > ------------
    > Please reply only to the newsgroup
    > --
    >
    >
    >

    Wm. Scott Miller Guest

  3. #3

    Default Re: Merge name from many records

    Here's another attempt:

    CREATE TABLE Sometable (ssn INTEGER NOT NULL, firstname VARCHAR(20) NOT
    NULL, middlename VARCHAR(20) NOT NULL, lastname VARCHAR(20) NOT NULL, style
    VARCHAR(20) NOT NULL, PRIMARY KEY(ssn,firstname,middlename,lastname,style))

    INSERT INTO Sometable VALUES (1,'W', 'Scott', 'Miller', 'Jr.')
    INSERT INTO Sometable VALUES (1,'William', 'S', 'Miller', 'Jr.')
    INSERT INTO Sometable VALUES (1,'W.', 'S.', 'Miller', 'Jr')
    INSERT INTO Sometable VALUES (1,'Q.', 'Scott', 'Miller', '')
    INSERT INTO Sometable VALUES (2,'John', 'Q.', 'Doe', '')
    INSERT INTO Sometable VALUES (2,'John', 'Quentin', 'Doe', '')
    INSERT INTO Sometable VALUES (3,'Mo', 'Norm', 'Smith', '')
    INSERT INTO Sometable VALUES (3,'M.', 'Norm', 'Smith', '')

    Create a view to give the longest version of each column:

    CREATE VIEW LongNames
    AS
    SELECT S.ssn,
    MAX(CASE LEN(firstname) WHEN lenfirstname THEN firstname END) AS
    firstname,
    MAX(CASE LEN(middlename) WHEN lenmiddlename THEN middlename END) AS
    middlename,
    MAX(CASE LEN(lastname) WHEN lenlastname THEN lastname END) AS lastname,
    MAX(CASE LEN(style) WHEN lenstyle THEN style END) AS style
    FROM Sometable AS S
    JOIN
    (SELECT ssn,
    MAX(LEN(firstname)) AS lenfirstname,
    MAX(LEN(middlename)) AS lenmiddlename,
    MAX(LEN(lastname)) AS lenlastname,
    MAX(LEN(style)) AS lenstyle
    FROM Sometable
    GROUP BY ssn) AS L
    ON S.ssn = L.ssn
    GROUP BY S.ssn

    This query then verifies that all rows match (ignoring punctuation) and
    populates a Match (Y/N) column accordingly:

    SELECT T.ssn, MIN(T.firstname) AS firstname, MIN(T.middlename) AS
    middlename, MIN(T.lastname) AS lastname, MIN(T.style) AS style,
    CASE WHEN
    SUM(CHARINDEX(REPLACE(REPLACE(S.firstname,',',''), '.',''),
    REPLACE(REPLACE(T.firstname,',',''),'.',''))) =
    COUNT(NULLIF(S.firstname,'')) AND
    SUM(CHARINDEX(REPLACE(REPLACE(S.middlename,',','') ,'.',''),
    REPLACE(REPLACE(T.middlename,',',''),'.',''))) =
    COUNT(NULLIF(S.middlename,'')) AND
    SUM(CHARINDEX(REPLACE(REPLACE(S.lastname,',',''),' .',''),
    REPLACE(REPLACE(T.lastname,',',''),'.',''))) =
    COUNT(NULLIF(S.lastname,'')) AND
    SUM(CHARINDEX(REPLACE(REPLACE(S.style,',',''),'.', ''),
    REPLACE(REPLACE(T.style,',',''),'.',''))) =
    COUNT(NULLIF(S.style,''))
    THEN 'Y' ELSE 'N' END AS match
    FROM Sometable AS S
    JOIN LongNames AS T
    ON S.ssn = T.ssn
    GROUP BY T.ssn

    --
    David Portas
    ------------
    Please reply only to the newsgroup
    --

    "Wm. Scott Miller" <millerscmailspamkiller.wvnet.edu> wrote in message
    news:udDUSABQDHA.2768tk2msftngp13.phx.gbl...
    > David:
    >
    > That just gives me the longest name there. This doesn't validate that the
    > name is actually correct and that there isn't a data entry error that
    needs
    > to be addressed manually. For example, if I have the following:
    >
    > W. Scott Miller
    > Q. Scott Miller
    >
    > The choice of which one would be used would be random with the supplied
    SQL.
    > Yet this is a "needs attention" set of records for a given SSN because how
    > can the first name of an individual be both Q and W? The routine should
    > check initials against the longest name to make sure they match.
    >
    > Secondly, if we have:
    >
    > Mo Norm Smith
    > M. Norm Smith
    >
    > Which one actually gets chosen? Both first names are two charaters long,
    > yet one is a actual name and the other is an initial?
    >
    > Is there a way to strip out punctuation before checking length?
    >
    > Thanks for the reply,
    > Scott
    > --
    > Remove the words spam and killer from the e-mail to reply.
    > "David Portas" <REMOVE_BEFORE_REPLYING_dportasacm.org> wrote in message
    > news:%23CjejA$PDHA.1612TK2MSFTNGP11.phx.gbl...
    > > Assuming that "most complete" means the longest string value from each
    > > column:
    > >
    > > CREATE TABLE Sometable (ssn INTEGER NOT NULL, firstname VARCHAR(20) NOT
    > > NULL, middlename VARCHAR(20) NOT NULL, lastname VARCHAR(20) NOT NULL,
    > style
    > > VARCHAR(20) NOT NULL, PRIMARY
    > KEY(ssn,firstname,middlename,lastname,style))
    > >
    > > INSERT INTO Sometable VALUES (1,'W', 'Scott', 'Miller', 'Jr.')
    > > INSERT INTO Sometable VALUES (1,'William', 'S', 'Miller', 'Jr.')
    > > INSERT INTO Sometable VALUES (1,'W.', 'S.', 'Miller', 'Jr')
    > > INSERT INTO Sometable VALUES (2,'John', 'Q.', 'Doe', '')
    > >
    > > SELECT S.ssn,
    > > MIN(CASE LEN(firstname) WHEN lenfirstname THEN firstname END) AS
    > firstname,
    > > MIN(CASE LEN(middlename) WHEN lenmiddlename THEN middlename END) AS
    > > middlename,
    > > MIN(CASE LEN(lastname) WHEN lenlastname THEN lastname END) AS lastname,
    > > MIN(CASE LEN(style) WHEN lenstyle THEN style END) AS style
    > > FROM Sometable AS S
    > > JOIN
    > > (SELECT ssn,
    > > MAX(LEN(firstname)) AS lenfirstname,
    > > MAX(LEN(middlename)) AS lenmiddlename,
    > > MAX(LEN(lastname)) AS lenlastname,
    > > MAX(LEN(style)) AS lenstyle
    > > FROM Sometable
    > > GROUP BY ssn) AS L
    > > ON S.ssn = L.ssn
    > > GROUP BY S.ssn
    > >
    > > --
    > > David Portas
    > > ------------
    > > Please reply only to the newsgroup
    > > --
    > >
    > >
    > >
    >
    >

    David Portas Guest

Similar Threads

  1. How do I merge Many PDF into One?
    By Samar_Roy@adobeforums.com in forum Adobe Acrobat SDK
    Replies: 1
    Last Post: September 3rd, 04:57 PM
  2. Data merge skipping first 3 records
    By Phil_Taz@adobeforums.com in forum Adobe Indesign Macintosh
    Replies: 2
    Last Post: September 12th, 05:47 AM
  3. Photoshop 7 Merge Linked vs. Merge Down
    By Seth_Thompson@adobeforums.com in forum Adobe Photoshop Mac CS, CS2 & CS3
    Replies: 0
    Last Post: February 25th, 08:14 PM
  4. Merge wont Merge Gradients?
    By Jason_Howard@adobeforums.com in forum Adobe Illustrator Macintosh
    Replies: 1
    Last Post: February 13th, 07:52 PM
  5. help-how to merge multiple records into 1
    By Jackie in forum FileMaker
    Replies: 0
    Last Post: July 26th, 02:53 PM

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139