Professional Web Applications Themes

Newbie Database table design question (guid) - Microsoft SQL / MS SQL Server

I had posted this to microsoft.public.sqlserver.server a day and a half ago, with no response. I'm guessing I posted it to the wrong forum, because this should be a simple question for anyone that has ever had to design a database. Apologies if this is the wrong forum. Please direct me otherwise. Imagine these two tables: Job Table ---------- PK - Job_Num Permit Table ------------- PK, FK - Job_Num PK - Permit_Num (in other words, there can be multiple permits for a Job). I was doing some reading, and it was suggested (for SQL Server) that the PK of the ...

  1. #1

    Default Newbie Database table design question (guid)

    I had posted this to microsoft.public.sqlserver.server a day and a half ago,
    with no response. I'm guessing I posted it to the wrong forum, because this
    should be a simple question for anyone that has ever had to design a
    database. Apologies if this is the wrong forum. Please direct me
    otherwise.

    Imagine these two tables:
    Job Table
    ----------
    PK - Job_Num

    Permit Table
    -------------
    PK, FK - Job_Num
    PK - Permit_Num

    (in other words, there can be multiple permits for a Job).

    I was doing some reading, and it was suggested (for SQL Server) that the PK
    of the table should be a GUID (uniqueidentifier). Then, you define an
    additional multi-column unique constraint independent of the PK.
    Apparently, using PK's that are a natural key can give headaches down the
    road (for example, if you used LAN ID for a PK for an employee and someone
    had the same LAN_ID on a different domain. Or that person quit. Example 2,
    Using SSN for an employee and the data was incorrectly entered (now have to
    do cascading updates). Or better yet, this was a Customer table and the
    Customer didn't want to give out their SSN).

    Based on that advice, it seems like I would want my two tables to look like
    this:
    Job Table
    ----------
    PK - Job_Guid
    UI1 - Job_Num (UI = Unique Index)

    Permit Table
    -------------
    PK - Permit_Guid
    UI1,FK1 - Job_Num (FK = Foreign Key)
    UI1 - Permit_Num
    FK2 - Job_Guid

    Because I'm using Guid's for my PK's, the original "natural keys" become the
    unique constraints that are independent of the PK. Which leads to two FK's
    on the Permit table. 1) We want to make sure that Job_Num is real. 2) We
    want to make sure the Job_Num we are referencing is correctly associated
    with the Job_Guid we gave in the Job Table. Although, it seems like the FK
    for Job_Num would bring about the same cascading update issues I mentioned
    above. But, you would want this FK, I would think.

    Does all this make sense? Thoughts?

    Thanks for reading,
    Bill


    William Guest

  2. #2

    Default Re: Newbie Database table design question (guid)

    Hi Willaim,
    The idea of using a created PK is one that lots of people have trouble
    with. Basically two thoughts are here

    Thought One: Data always(almost always) has a natural PK and it should be
    used in all cases. If you can't find it then you are not looking hard
    enough. If you still can't find it after looking harder then you obviously
    don't know what you are doing and you should get another job, perhaps
    cleaning my shoes,......

    Thought Two: Not using natural primary keys are find if and when you have a
    good reason for it. The people from "Thought One" should just chill out and
    get of everyone's back, we really need to concentrate on getting the job
    done. Rules are fine, but they are really just guidelines. The people from
    thought One would rebuff that "Thought Two" people need to have a good hard
    look at history and we have been around for more years than they have had
    hot dinners and so and so on ,...

    Personally I don't have a problem with surrogate PK's, I wouldn't use a GUID
    unless it had to unique world-wide. I think GUID are too big and too slow,
    INT or BIGINT are my preferred type.

    With your example you have it a bit wrong, you had
    Job Table
    ----------
    PK - Job_Guid
    UI1 - Job_Num (UI = Unique Index)

    Permit Table
    -------------
    PK - Permit_Guid
    UI1,FK1 - Job_Num (FK = Foreign Key)
    UI1 - Permit_Num
    FK2 - Job_Guid

    it should have been

    Job Table
    ----------
    PK - Job_Guid
    UI1 - Job_Num (UI = Unique Index)

    Permit Table
    -------------
    PK - Permit_Guid
    FK2 - Job_Guid
    UI1 - Permit_Num

    but in your case I wouldn't use a surrogate PK as the natural ones would
    work well.


    --
    I hope this helps
    regards
    Greg O MCSD
    http://www.ag-software.com/ags_scribe_index.asp. SQL Scribe Doentation
    Builder, the quickest way to doent your database
    http://www.ag-software.com/ags_SSEPE_index.asp. AGS SQL Server Extended
    Property Extended properties manager for SQL 2000
    http://www.ag-software.com/IconExtractionProgram.asp. Free icon extraction
    program
    http://www.ag-software.com. Free programming tools



    "William Campbell" <NO_SPAM_AT_WILLIAM_CAMPBELL> wrote in message
    news:phx.gbl... 
    ago, 
    this 
    PK 
    2, 
    to 
    like 
    the 
    FK's 
    FK 


    Greg Guest

  3. #3

    Default Re: Newbie Database table design question (guid)

    "Greg Obleshchuk" says:
     

    I used to think like this, but now I'm not so sure. Here is some of what
    this guy said, that is starting to change my mind (and this is why I started
    this thread here at the newsgroup, to discuss it further):

    Your job should be designing ENTITIES AND RELATIONSHIP, not PKs and FKs. If
    you find yourself pondering what to use for a PK (beyond the question of
    "should it be an identity or GUID?"), you are doing something terribly wrong
    and will be costing a whole team a great deal of money and headache.

    Furthermore, if a PK is not meaningless--e.g. it is any attribute remotely
    helping to add human meaning to the entity (SSN, network login, etc.), there
    WILL be cascading updates throughout the entire system, and many people are
    going to be very upset. Users make typos, people change their names. Any
    data that was entered by or derived from a human at any point is subject to
    change. To be perfectly compliant with the forms of normalization, the value
    in a PK should not even be used as an account number. But giving a
    system-defined value that later becomes meaningful to a human is not nearly
    as harmful as using an existing value of meaning for a PK

    This absurd notion that a PK solely defines a unique entity has got to stop.
    If you remember one thing, remember this: a PK is used to build
    relationships. Period. Nothing more, nothing less. Don't forget it. To meet
    that end, it must be unique, but DOES NOT IN ANY WAY define what makes a
    logical entity unique. If you just define a PK and move on, your system is
    BROKEN, human-readable PK or not. YOU, the developer, must take additional
    steps to define and enforce what combination of attributes define a unique
    entity, based on what your business rules and/or human logic require.

    ** Here's a couple of (many) reasons not to use "Natural Keys" **

    * Natrual Keys place relational integrity and the very fabric of your ER
    design at the mercy of business rules, Dilbert-esque management
    misdecisions, external regulation, re-orgs, name changes, and typos by
    customer service monkeys. Who would welcome that kind of headache?

    * Natural Keys make global localization more difficult. The "natural key" is
    stuck in English, for example

    * Some would like you to think you'll have access to great "natural" fields
    such as required SSN, DL, etc. But the business often has other plans, and
    helping you on your Natural Key quest is generally not part of their agenda.
    And unfortunately for you, they are the ones giving YOU the requirements,
    not the other way around.

    * Let's not even discuss multi-column Natural Keys in too much detail.
    Anyone advocating this has obviously never tried to maintain or enhance a
    system built with multi-column foreign keys. A multi-column key, by
    definition, would have to contain every attribute that makes the entity
    unique, which is god-awfully expensive on join performance, I/O, and
    application maintenance. It makes an application virtually impossible to
    extend, alter, or enhance. Fortunately we don't have to do this for the sake
    of uniqueness. Entities can and must have their uniqueness defined by
    constraints (if you are not using secondary unique constraints that don't
    involve the PK, and don't involve more than one column on child entities,
    your design is broken).

    * Last but not least, or all: do not forget that primary and foreign keys
    exist for one reason only: to model entity relationships. That's all, there
    is no other reason. As such, they are simply visible artifacts of the
    relational model we use--the glue driving ERs. They have no other relevance.

    ** For the sake of brevity, here's the summation: **

    * You should never, ever even think about using a "natural key". Wipe
    "natural key" from your memory. The downsides are severely bad, and the
    upshots are severely bad.

    * Transactional entities = GUID as the PK. No need to think, I've already
    done that for you. Just do. (Don't forget to define your additional,
    multi-column unique constraints independent of the PK, depending on your
    business requirements.)

    * Lookup data = enumerated integer as a PK. (What have I told you about
    trying to use that pathetic excuse for a brain? Just read my proof again and
    do it.)

    * OLAP data = no key.

    Bill


    William Guest

  4. #4

    Default Re: Newbie Database table design question (guid)

    William,
    These are arguments that have been voiced before. They have points good and
    bad. The overriding issue here is that it is your design. Things change as
    time moves on and globalisation is something that is newest. To say always
    use GUID is a throw away line. for example if I have a description table
    that will only every consist of 4 products. The table consists of a PK and
    description field of 20 characters
    Using the GUID approach I would have a PK larger than my description, I
    would also have a PK with lots of value. Where as if I use a int or small
    int I save space , lookup will be quicker and it makes sense.

    By the way who is this person that is saying this. Is he a known author?
    Is the article on the net or in a paper?


    --
    I hope this helps
    regards
    Greg O MCSD
    http://www.ag-software.com/ags_scribe_index.asp. SQL Scribe Doentation
    Builder, the quickest way to doent your database
    http://www.ag-software.com/ags_SSEPE_index.asp. AGS SQL Server Extended
    Property Extended properties manager for SQL 2000
    http://www.ag-software.com/IconExtractionProgram.asp. Free icon extraction
    program
    http://www.ag-software.com. Free programming tools

    "William Campbell" <bcampbell_!AT!_linkspoint_!DOT!_com> wrote in message
    news:phx.gbl... [/ref]
    be 
    >
    > I used to think like this, but now I'm not so sure. Here is some of what
    > this guy said, that is starting to change my mind (and this is why I[/ref]
    started 
    If 
    wrong 
    there 
    are 
    to 
    value 
    nearly 
    stop. 
    meet 
    is 
    fields 
    agenda. 
    sake 
    there 
    relevance. 
    and 


    Greg Guest

  5. #5

    Default Re: Newbie Database table design question (guid)

    Greg Obleshchuk says:
     
    many 

    Let me think about that for a second. I don't think I mean that. Just
    because each job has a permit_Num = 1, doesn't mean it is for both jobs (I
    don't think!). Think of it as a sequence number (maybe I better make sure
    of that, myself though! -- but for the sake of this discussion...sequence
    number).
     

    Well, wouldn't that be the advantage of using a GUID? If the product ID is
    changed to 10023, we change just the one record's Product ID column from 23
    to 10023. And if there is no relation to this column (PK or FK), we won't
    have to do any cascading updates. And the PK doesn't have to be changed or
    replaced.

    Bill


    William Guest

  6. #6

    Default Re: Newbie Database table design question (guid)

    Greg Obleshchuk says:
     

    More knowledge from this guy :) ...

    DO NOT LET THE DISPLAYED LENGTH OF A GUID FOOL YOU. It is much, much smaller
    than it looks.

    * A GUID consumes exactly as many bits as it does to store as the following
    string:
    "IBM INCORPORATED"

    * 128-bit GUIDs are only 4 times the size of the int data type. Yet
    real-world join performance is nowhere near 4 times as costly. In addition,
    the next generation of 64-bit processors will penalize 32-bit joins, and
    further cut the cost of GUIDs. The next step after that will all but require
    that PKs are done with 128-bit GUIDs. Even with 128-bit processors, there
    will still be extra storage and I/O overhead with 128-bit GUIDs, but
    compared to the size of the overall record including Unicode data, a measly
    96 bits is truly "chump change".
     

    I was doing some research on the subject and read an article over at
    sqlteam.com: http://www.sqlteam.com/item.asp?ItemID=2599

    It was the comments generated from this article (scroll to the bottom and
    under related articles you will see "Comments(104)"). This article was old,
    and no one seems to be discussing it anymore (because I posted there), so I
    decided to take up the discussion here at this newsgroup.

    The guy that gave all this information was someone named "quazibubble". He
    had a lot of good information, but he was so rude that he eventually got
    banned. If you can get past his rude-ness, he does make a lot of sense.
    The relevant discussion he puts forth starts on the 3rd page of comments
    (6th post from the bottom -- it starts out "I'm at home sick today") and
    continues to the 4th page.

    Yes, he was rude ... but I didn't really see anyone come up with an argument
    that disputes what he said. I'm just looking to pick up on this topic (I
    just wish I didn't read this article 6 months later!).

    Bill


    William Guest

  7. #7

    Default Re: Newbie Database table design question (guid)

    Re-reading what Greg Obleshchuk said:
     

    Wouldn't that example you give, then be this "Lookup data = enumerated
    integer as a PK." in his summation?

    Oh, and I didn't add the nasty comment at the end of his (the "pathetic
    excuse for a brain" comment). That's how he was talking, some of the stuff
    I posted I edited (to take out those silly comments), but I missed that one.
    So, please don't think I'm as rude as him! :)

    Bill

    "Greg Obleshchuk" <com> wrote in message
    news:phx.gbl... 
    and 
    as 
    always 
    and [/ref]
    > be 
    > >
    > > I used to think like this, but now I'm not so sure. Here is some of[/ref][/ref]
    what 
    > started 
    > If 
    > wrong [/ref]
    remotely 
    > there 
    > are 
    > to 
    > value 
    > nearly 
    > stop. 
    > meet [/ref]
    is [/ref]
    additional [/ref]
    unique [/ref]
    key" 
    > fields [/ref]
    and 
    > agenda. [/ref]
    requirements, [/ref]

    > sake [/ref]
    don't [/ref]
    entities, [/ref]
    keys 
    > there 
    > relevance. [/ref]
    already 
    > and 
    >
    >[/ref]


    William Guest

  8. #8

    Default Re: Newbie Database table design question (guid)

    Willaim,
    Don't get too caught up with this guys babble. While he does have some
    comments that can be useful, I believe he falls into the third thought
    group. Which is "I'm right and everyone else is wrong".
    I suggest you get some good books on the subject and do some reading .

    As to his size of GUID he is right , but would you have "IBM INCORPORATED"
    as a PK.

    Lastly think of this example
    You have a database with 10 tables, each table has about 100,000,000 rows
    of data
    If you have a
    PK Type Size
    GUID 1,600,000,000 bytes
    Int 400,000,000 bytes

    GUID consume 4 time the amount of an int.

    As I have said though, design is up to you

    --
    I hope this helps
    regards
    Greg O MCSD
    http://www.ag-software.com/ags_scribe_index.asp. SQL Scribe Doentation
    Builder, the quickest way to doent your database
    http://www.ag-software.com/ags_SSEPE_index.asp. AGS SQL Server Extended
    Property Extended properties manager for SQL 2000
    http://www.ag-software.com/IconExtractionProgram.asp. Free icon extraction
    program
    http://www.ag-software.com. Free programming tools



    "William Campbell" <bcampbell_!AT!_linkspoint_!DOT!_com> wrote in message
    news:phx.gbl... 
    >
    > More knowledge from this guy :) ...
    >
    > DO NOT LET THE DISPLAYED LENGTH OF A GUID FOOL YOU. It is much, much[/ref]
    smaller 
    following 
    addition, 
    require 
    measly [/ref]
    author? 
    >
    > I was doing some research on the subject and read an article over at
    > sqlteam.com: http://www.sqlteam.com/item.asp?ItemID=2599
    >
    > It was the comments generated from this article (scroll to the bottom and
    > under related articles you will see "Comments(104)"). This article was[/ref]
    old, 

    He 
    argument 


    Greg Guest

  9. #9

    Default Re: Newbie Database table design question (guid)

    Willaim
    In your example you are saying that a

    There is a "Work Order" to inspection a "building" which consists of one or
    more "Floors"

    I can see the problem you are having but it comes down to design principles.

    WorkOrder, Building, floor, result
    1,1,1,OK
    1,1,2,OK
    1,1,3,Bad
    1,2,1,OK
    1,2,2,Bad
    2,1,1,bad
    2,1,2,OK
    2,1,3,OK

    Normalise this and you get
    WorkdOrders
    ---------------
    WO_ID PK
    WO_Details

    Buildings
    -------------
    BD_ID PK
    BD_Details

    Floors
    __________
    FR_Number PK
    BD_ID PK FK
    Floor_Details

    Results
    -----------
    WO_ID PK FK
    BD_ID PK FK
    FR_Number PK FK
    Results_Details

    In your example you don't need the Work_Order_Num because you have the
    Work_Order_Guid column (for FK and UI these are the same)

    --
    I hope this helps
    regards
    Greg O MCSD
    http://www.ag-software.com/ags_scribe_index.asp. SQL Scribe Doentation
    Builder, the quickest way to doent your database
    http://www.ag-software.com/ags_SSEPE_index.asp. AGS SQL Server Extended
    Property Extended properties manager for SQL 2000
    http://www.ag-software.com/IconExtractionProgram.asp. Free icon extraction
    program
    http://www.ag-software.com. Free programming tools

    "William Campbell" <bcampbell_!AT!_linkspoint_!DOT!_com> wrote in message
    news:phx.gbl... 
    >
    > True. True.
    >
    > And in my instance, it's not going to be a table like you described (one
    > column for pk, one column for description -- only 4 or 5 records). So,[/ref]
    the 
    Plus, 
    column 
    mean 
    based. 
    some 
    referencing 
    to 
    on 



    Greg Guest

  10. #10

    Default Re: Newbie Database table design question (guid)

    Greg Obleshchuk says,
     

    Cool! I think I just needed to talk it out. I finally figured it out at
    the end of my babbling -- so thanks for confirming! Thus, to sum -- the
    tables are:

    Work_Order table
    -------------------
    Work_Order_Guid - PK
    Work_Order_Num - UI

    Inspection_Results
    -------------------
    Inspection_Results_Guid - PK
    Work_Order_Guid - FK1, UI
    Floor - UI

    I don't believe I need a building table, because I think it's 1 work order
    per building. Although, if you can do multiple buildings for work order
    that'd be a good idea. I think if it's only one building per work order,
    you could just squeeze the building (or the address of it in this instance)
    elsewhere (work order table?). These tables will never contain more than
    one day's worth of inspections -- so, the duplication of building data just
    won't happen.

    One of the bad things I can see is if you want to do a query like: "give me
    all the inspections that passed for this work_order". Well, you have to do
    a join on the two tables. Although, it's probably more likely that you'd
    add "for this date" ... and the schedule date of the work order would be in
    the work_order table..thus, you might have to do the join anyway.
     

    You've been very helpful! Thanks!

    Just wait until I start my next database design question thread! -- I'm
    going to bed! :) ... I don't want to hijack this thread and have it digress
    (some other people might want to add their thoughts) .. but I've always
    wondered what db designers do in the situation of:

    How do you set up a relation when it's based on a value!?

    For example .. Inspection_Type A uses certain fields. Inspection_Type B
    uses other fields. Do you do one table:
    Foo
    ----
    Inspection_A_Field_1 (will be null if Inspection_B)
    Inspection_A_Field_2 (will be null if Inspection_B)
    Insepction_B_Field_1 (will be null if Inspection_A)
    Inspection_B_Field_2 (will be null if Inspection_A)

    or two tables:

    Foo
    ----
    Inspection_A_Field_1
    Inspection_A_Field_2

    Bar
    ---
    Inspection_B_Field_1
    Inspection_B_Field_2

    It just seems like, when pulling out data for display (on a form, for
    example)...you have to do:

    if (Inspection_Type_A)
    do select with these fields
    else if (Inspection_Type_B)
    do select with these fields.

    Ok, I digressed!

    Thanks again,
    Bill


    William Guest

  11. #11

    Default Re: Newbie Database table design question (guid)

    "William Campbell" <NO_SPAM_AT_WILLIAM_CAMPBELL> wrote in message
    news:phx.gbl... 

    As I said, he makes valid points. So there's no need to argue them.
     
    was 
    give 

    Scattered throughout the rest of this post.
     

    I don't understand your question.
     
    downfall 

    Cascading updates is the downfall of using natural keys, not of using
    non-GUIDs. Cascading updates are handled the same way with GUIDs as they are
    with identities.
     

    Again, you're mixing subjects. This has to do with natural keys, not
    non-GUIDs.
     

    Storage is not the only concern. An index page can hold 400% more int
    identity values than it can GUIDs (I'm not positive of the exact math in
    index pages). Therefore for large masses of data, the index can be smaller.
    Therefore there is less index page I/O and increases the chances that index
    pages would already be cached. I have a table that on average would be 50 or
    60 bytes wide but contain an enormous number of rows. Changing the identity
    to GUID means a 20-25% increase in size of that table and a considerable
    increase in the size of the index with no real added value.
     
    unique 

    I don't understand your question. First, what is the "other" unique
    constraint? There is only one on the identity. There is no other unique
    constraint, nor can there be on this particular data. What select?
     

    The table designs were identical except for the autonumbering key. They only
    had one additional integer column just to make a legal table. The whole
    point was to keep out as much a possible that was not directly related to
    the autonumbering so as to accurately measure how it changes without being
    washed out by other variables. That's the point of doing the benchmarking.
    Each autonumber technique was tested with no index, with a clustered index,
    and with a nonclustered index. If I have missed some technique that will
    improve the GUID's performance, please let me know.

    The randomness of the GUID value is going to cause considerably more index
    fragmentation that an identity, which is sequential. That means using low
    fill factors (which means "fatter" index pages and therefore more raw index
    pages affecting overall I/O performance) or a high frequency of index page
    splitting which itself is no great speed advantage either. Which one you
    choose depends on whether you want to optimize data inserts or optimize data
    selects.
     
    bigger 

    Not every data access is a human being sitting in front of a screen working
    on one piece of data at a time. In fact those areas are usually not even
    areas of consideration when optimizing performance. There are high-volume
    systems that push overall data through at high rates, or have to pass large
    amounts of data for searching or reporting. Those are the areas where little
    differences in performance aculate. What I am about to describe is
    academic based on the math because we did not use GUIDs in order to have a
    real comparsion of performance. I've built data update systems that insert
    5,000,000 rows of data at a rate of about 1500 rows per second. This was a
    repeated process, not a one time job. My chart shows about 1 second per 1000
    rows of additional overhead for GUID over other techniques, and that just
    grows worse as the row count grows (that's what scares me the most). That's
    5000 seconds or 83 minutes of additional overhead for a process that
    originally took 55 minutes. That's 83 MORE minutes, totalling 138 minutes.
    Again, I am explaining theoretically because I did not have this chart when
    I did the 5,000,000 row thing and GUIDs were not even considered then, or
    even a possibility, in order to have actually measured that raw amount of
    data with and without GUIDs. But the question was basically "does 1 second
    really matter?". Yes, sometimes 1 second matters.

    Remember, I did not say these were everyday issues. I only said there were
    exceptions that defy a "do as your told" rule.

    --
    Steve Troxell
    Krell Software
    Professional Database Development Tools for MS SQL Server
    http://www.krell-software.com


    Steve Guest

Similar Threads

  1. creating a table (newbie) in existing database
    By carltown in forum Coldfusion - Getting Started
    Replies: 2
    Last Post: January 24th, 06:06 PM
  2. database design and asp question
    By Michael in forum Dreamweaver AppDev
    Replies: 2
    Last Post: June 27th, 09:23 AM
  3. newbie flash design question(s)
    By geeegeee in forum Macromedia Flash Sitedesign
    Replies: 7
    Last Post: May 19th, 07:22 AM
  4. Table Design Question
    By Paul in forum Informix
    Replies: 6
    Last Post: January 26th, 04:57 PM
  5. Ruby OO-newbie design-question
    By Per Wigren in forum Ruby
    Replies: 1
    Last Post: August 29th, 01:00 PM

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139