Professional Web Applications Themes

DB2 On Linux // Clustering // Shared Nothing - IBM DB2

Questions: After reviewing this doent: http://www-3.ibm.com/software/data/pubs/papers/linuxcluster/linuxcluster.pdf I am curious: Does this include the shared-nothing features in the version of DB2 that is noted in this PDF? Is there anyone out there on this list that has actually clustered a shared-nothing environment with DB2? How do you partition data across nodes and disks? How much does this system cost? Thanks in advance, Tim...

  1. #1

    Default DB2 On Linux // Clustering // Shared Nothing

    Questions:

    After reviewing this doent:

    http://www-3.ibm.com/software/data/pubs/papers/linuxcluster/linuxcluster.pdf

    I am curious:

    Does this include the shared-nothing features in the version of DB2 that is noted in this PDF?

    Is there anyone out there on this list that has actually clustered a shared-nothing environment with DB2?

    How do you partition data across nodes and disks?

    How much does this system cost?

    Thanks in advance,

    Tim



    Tim Guest

  2. #2

    Default Re: DB2 On Linux // Clustering // Shared Nothing

    "Tim Schaefer" <com> wrote in message
    news:3ykdb.19607$atl2.webusenet.com... 
    http://www-3.ibm.com/software/data/pubs/papers/linuxcluster/linuxcluster.pdf 
    is noted in this PDF? 
    shared-nothing environment with DB2? 
    Not sure what you asking. A shared nothing environment simply means that
    there is one partition per node (computer) so each partition has it's own
    memory, disks, etc. So if you had 24 partitions, you would need 24 nodes.
    These nodes are sometimes called "thin" nodes because each node usually
    contains only one or two processors and less memory than most
    multi-processor nodes. This works well with extremely large data warehouses
    with frequent tablespace scans.

    Often times, people will use clustering with multiple partitions per node
    (and multiple CPU's per node). For example one might have 4 nodes, each with
    8 processors, and each node with 6 partitions (24 partitions total). This
    works better in transaction environments and data warehouses that are not
    massive (and do not require frequent table space scans). In this
    configuration, special care must be taken to have plenty of disk subsystem
    throughput, and the placement of data on the disks is more critical than a
    shared nothing partitioning system.

    A shared nothing architecture scales in a more linear manner than a system
    with multiple partitions (and multiple processors) per node. But this linear
    scaling is more applicable to queries that do tablespace scans, rather than
    a large number of smaller queries.


    Mark Guest

  3. #3

    Default Re: DB2 On Linux // Clustering // Shared Nothing

    Yes.

    Tim Schaefer wrote:
     

    Blair Guest

  4. #4

    Default Re: DB2 On Linux // Clustering // Shared Nothing

    > Tim Schaefer wrote: [/ref]
    http://www-3.ibm.com/software/data/pubs/papers/linuxcluster/linuxcluster.pdf [/ref]
    is noted in this PDF? 
    >[/ref]
    "Blair Adamache" <> wrote in message
    news:bl4toa$jcf$torolab.ibm.com... 
    Actually the article is misleading. The share nothing feature is really a
    hardware configuration, rather than a DB2 feature (given the ability to
    partition on deferent nodes). The DB2 feature is the partitioning feature,
    which may be deployed on shared nothing nodes (one partition per node) or on
    multiple partition nodes (multiple processors and multiple partitions per
    node).


    Mark Guest

  5. #5

    Default Re: DB2 On Linux // Clustering // Shared Nothing

    Thanks Mark. I think you're understanding me the best, but of course anyone
    else, please feel free to respond. I already understand shared-nothing architecture,
    what I'm not sure of is if DB2 is really going to deliver a shared-nothing database,
    and is it available for Linux. I have yet to see any diagrams or docs that can simply
    show me the money.

    Let's say I buy 6 1u-boxes ( for rack-mounting ) and install DB2 on each one,
    and buy the recommended switch for the interconnect.

    I want to have a table with 100 million rows spread across disks on each node.
    I'd like to partition the data by some kind of grouping, such as order_no, or date,
    or whatever.

    In my scenario, I'd see some of the data in this table on each node right? In
    other words, can I create a table with data spread across each of these nodes?
    ( In Informix XPS we'd call it dbslices )

    Example:

    Server 1 2 3 4 5 6
    table part1 part2 part3 part4 part5 part6

    Is this possible with DB2?

    Thanks,

    Tim


    "Mark A" <net> wrote in message news:mcndb.45$uswest.net... [/ref]
    > http://www-3.ibm.com/software/data/pubs/papers/linuxcluster/linuxcluster.pdf [/ref]
    > is noted in this PDF? 
    > >[/ref]
    > "Blair Adamache" <> wrote in message
    > news:bl4toa$jcf$torolab.ibm.com... 
    > Actually the article is misleading. The share nothing feature is really a
    > hardware configuration, rather than a DB2 feature (given the ability to
    > partition on deferent nodes). The DB2 feature is the partitioning feature,
    > which may be deployed on shared nothing nodes (one partition per node) or on
    > multiple partition nodes (multiple processors and multiple partitions per
    > node).
    >
    >[/ref]



    Tim Guest

  6. #6

    Default Re: DB2 On Linux // Clustering // Shared Nothing

    "Tim Schaefer" <com> wrote in message
    news:beqdb.30564$atl2.webusenet.com... 
    anyone 
    architecture, 
    database, 
    that can simply 
    one, 
    node. 
    or date, 
    In 
    nodes? 
    Yes, DB2 takes one table and distributes the data across each partition
    defined for that table. There can be multiple partitions on each node (works
    well if you have about 25% more processors than partitions, such as 6
    partitions on a 8 processor SMP node). Or for true shared nothing, you
    simply put each partition on it's own node (with one or two processors per
    node).

    Note that with the SMP solution, you can have a hybrid sharing solution,
    such as 4 nodes with 8 processors on each node, and 6 DB2 partitions on each
    node (for a total of 24 partitions).

    DB2 partitions the table on the partitioning key you select using a hash
    algorithm, rather than range partitioning to distribute the data. This
    automatically creates a fast and even distribution of the data. The
    partitioning key can be made up of one or more columns of the table.

    In addition to the Intel/Linux solution in the article, IBM also has the SP2
    hardware solution (basically using some special purpose RS/6000's) and AIX.
    This solution has been around for quite a while and is widely used with DB2
    in some the largest MPP systems that you can find. The SP2 has an optional
    integrated high speed switch between nodes.

    I don't know the price of the IBM Intel/Linux hardware, but it looks like it
    might be very cost effective compared to other shared nothing systems. I
    believe that DB2 with partitioning license is partly based on the number of
    CPU's, so a shared nothing system with 1 processor per node might be cost
    effective if it fits your application.


    Mark Guest

  7. #7

    Default Re: DB2 On Linux // Clustering // Shared Nothing

    Comments below...

    "Mark A" <net> wrote in message news:ecrdb.640$uswest.net...
     

    In Informix this would have been "round robin" partitioning, with the option
    for expression-based partitioning ( fragmentation as Informix calls it ).

    I've used the AIX SP70? cluster with XPS back in'98, but would like today to
    go with DB2 for the simplicity, and knowing DB2 will be around--I don't know
    if XPS will be around much longer. DB2 is interesting, and it actually works
    well. It's a bit more basic than what I'm used to but c'est la vie.
     

    It would be nice to know if expression-based partitioning is available, but maybe
    the KISS principle applies. Maybe it's fast enough without it. I'm also concerned
    that expression-based partitioning will introduce data skew, so not having it may
    not be so bad.

    Thanks very much for your comments!

    Tim



    Tim Guest

  8. #8

    Default Re: DB2 On Linux // Clustering // Shared Nothing

    "Tim Schaefer" <com> wrote in message
    news:beqdb.30564$atl2.webusenet.com... 
    anyone 
    architecture, 
    database, 
    that can simply 
    one, 
    node. 
    or date, 
    In 
    nodes? 
    DB2 has set the TPC-H benchmark record for 100GB database size using a
    similar architecture as described in the brochure you mentioned. They used 8
    nodes, with 2 AMD processors per node. I believe that they used 2 partitions
    per node for a total of 16 partitions.

    Just for perspective, here are the number of rows included in the TPC-H
    100GB database:

    Table Name Rows
    Order 150,000,000
    Lineitem 600,037,902
    Customer 15,000,000
    Part 20,000,000
    Supplier 1,000,000
    Partsupp 80,000,000
    Nation 25
    Region 5

    I have run the TPC-H benchmark myself using DB2 on Linux (with the 10GB
    database version) and I would question whether the TPC-H database would be
    typical of all multi-dimensional databases. Many multi-dimensional fact
    tables that I have seen make extensive use of indexes for access of the fact
    table, while the Lineitem table in the TPC-H benchmark almost always
    requires a complete scan of the table for each of the 22 queries in the
    benchmark.

    Although DB2 partitioning is definitely the way to go for a large fact
    table, there would be considerable question as to whether to use a share
    nothing vs. a SMP implementation (or hybrid implementation with multiple
    CPUs per node and multiple nodes). There are a lot of factors that need to
    be considered in order to achieve the best price performance ratio.

    Here is a link to the IBM TPC-H results:
    http://www.tpc.org/tpch/results/tpch_perf_results.asp

    Mark Farnsworth
    InfoStar Solutions, LLC
    info_star at domain qw_est.net
    (remove "domain" and all underscores)


    Mark Guest

  9. #9

    Default Re: DB2 On Linux // Clustering // Shared Nothing

    > In Informix this would have been "round robin" partitioning, with the
    option 
    to 
    know 
    works 
    I don' think DB2 is as simple as looks. It's just easier to use than most
    other databases. But to really tune it and exploit it well will get you more
    into the details. 
    but maybe 
    concerned 
    it may 
    Sorry to confuse. "Expression based" or "range based" partitioning is NOT
    available on DB2 (except the mainframe DB2). DB2 may provide that in the
    future, probably just because some customers think it is important, even if
    IBM is not so sure. However, there is a way one could force all data from a
    particular range onto a particular partition even when using hashing (I
    charge for that one :-)

    There is the ability in DB2 to create pseudo-partitions (with ranges based
    on data for each table) with the UNION ALL views, which works in parallel
    mode without the complication and expense of partitioning. This can be a
    very effective strategy, and exploits parallelism quite well if implemented
    carefully.

    DB2 also has multi-dimensional clustering of data that might be useful for a
    fact table.

    Mark Farnsworth
    InfoStar Solutions, LLC
    info_star at domain qw_est.net
    (remove "domain" and all underscores)


    Mark Guest

  10. #10

    Default Re: DB2 On Linux // Clustering // Shared Nothing

    Thanks Mark...

    It's hard for me to visual shared-nothing outside of the XPS environment
    as that has been my only experience. We did have 4-CPUs per node,
    so I guess this qualifies as "hybrid". :-) My concern is speed at this
    point, making sure that the cluster will bring back queries quickly, as
    we are not using star-schema, we have more of an ODS, or data mart
    at this point. It's a typical relational database, not star-schema.

    We also don't have quite as much data as the TPC-H benchmark, but is
    similar. The load time is impressive in the benchmark. I think the only
    way to know for sure is to just try it out. Interesting too are the Opteron
    chips instead of Intel. That one is notable. Now all I have to do is find
    the docs on the syntax to load tables on such a beast. If you have
    any links for that let me know.

    Thanks,

    Tim

    "Mark A" <net> wrote in message news:jztdb.653$uswest.net... 



    Tim Guest

  11. #11

    Default Re: DB2 On Linux // Clustering // Shared Nothing

    Tim,

    In case you are a "learn by example" kind of guy take a look at the full
    disclosure report of a clustered
    DB2 TPC-H bechmark. It's available right from the details of the result.
    It should show the command for the auto-loater I imagine.

    W.r.t. starjoin I don't know XPS well enough to compare. I knwo that there a
    still a coupel of tricks to snicker from Redbrick but I'm not sure whether
    XPS has those either. It may well be a wash.
    Fragmented tables are (to my knowledge) not so much useful for performance,
    compared to hash-partitioning. Their benefit lies mostly in managability
    (roll-in, roll-out).
    It is no secret that fragmented tables are being worked ported to DB2, can't
    tell you an ETA at this point yet though.

    Cheers
    Serge


    Serge Guest

  12. #12

    Default Re: DB2 On Linux // Clustering // Shared Nothing

    Tim Schaefer wrote: 

    Tim,

    I'm not sure if this will be too basic for you since you know XPS, but
    why don't you read the Administration Guide: Planning - Chapter 2,
    "Parallel Database Systems." This chapter does a good job of talking
    about how DB2 leverages the shared nothing environment. You can view
    this book from:

    http://www-3.ibm.com/cgi-bin/db2www/data/db2/udb/winos2unix/support/v8pubs.d2w/en_main

    (Scroll down to see the PDF copies of the books).

    You can use partitioning in an OLTP environment, it's not restricted to
    data warehousing. It requires careful planning (in either environment)
    to make sure that you won't run into issues specific to a partitioned
    environment (i.e. partitioning key selection, insert/load performance,
    etc.)
     

    Load syntax is now very simple. With V8, IBM added partitioning support
    directly to the LOAD command, so loading a single- or multi-partition
    table can be done with the exact same command, e.g.:

    load from yourfile.del of del replace into your_table

    (Of course you still have control over the partitioning-specific options
    in the load command, if you need them).

    See the Data Movement Utilities Guide & Reference - Chapter 3, "Load"
    for more information.




    Good luck,




    -----= Posted via Newsfeeds.Com, Uncensored Usenet News =-----
    http://www.newsfeeds.com - The #1 Newsgroup Service in the World!
    -----== Over 100,000 Newsgroups - 19 Different Servers! =-----
    Ian Guest

  13. #13

    Default Re: DB2 On Linux // Clustering // Shared Nothing

    Ian,

    I am reading the Admin Guide Chapter 2 with great interest.

    It is not too basic either, as I now have some great materials
    to present to my team. I hope my initial exuberance with
    this product stays for the long-term. Surprising that MS-SQL
    Server hasn't tried to at least imitate this, and even more
    surprising that IBM hasn't marketed this more. ( well I think
    I already know why IBM hasn't marketed more, and no
    surprise :-) It does have the flavor of XPS on-the-cheap.

    Thanks very much for the link!

    By the way has anyone mixed NT DB2 with Linux DB2
    as a cluster? It's a twisted thought, but I thought I might
    throw it out there.

    :-)

    Tim

    "Ian" <com> wrote in message news:3f76f5db$newsgroups.com... 
    >
    > Load syntax is now very simple. With V8, IBM added partitioning support
    > directly to the LOAD command, so loading a single- or multi-partition
    > table can be done with the exact same command, e.g.:
    >
    > load from yourfile.del of del replace into your_table
    >
    > (Of course you still have control over the partitioning-specific options
    > in the load command, if you need them).
    >
    > See the Data Movement Utilities Guide & Reference - Chapter 3, "Load"
    > for more information.
    >
    >
    >
    >
    > Good luck,
    >
    >
    >
    >
    > -----= Posted via Newsfeeds.Com, Uncensored Usenet News =-----
    > http://www.newsfeeds.com - The #1 Newsgroup Service in the World!
    > -----== Over 100,000 Newsgroups - 19 Different Servers! =-----[/ref]



    Tim Guest

  14. #14

    Default Re: DB2 On Linux // Clustering // Shared Nothing

    "Tim Schaefer" <com> wrote in message
    news:7hDdb.44659$atl2.webusenet.com... 
    I think that IBM has marketed their parallel offering quite a bit, and has
    been used extensively by corporations for quite a number of years. Maybe
    they are having a hard time reaching non-IBM customers.


    Mark Guest

  15. #15

    Default Re: DB2 On Linux // Clustering // Shared Nothing

    "Tim Schaefer" <com> wrote in message
    news:4zzdb.34425$atl2.webusenet.com... 
    Opteron 
    find 
    Shared nothing parallel database was invented by Teradata many years ago
    with single processor Intel 386 nodes and a proprietary OS. Not too many
    customers (including Teradata customers) are running a true share nothing
    environment anymore (one partition per node) on new hardware, but this is
    simply a function of hardware and not any limitation of any of the parallel
    database offerings. The number of partitions per node is a DBA decision.

    Most everyone has gone to multi-processor nodes with multiple partitions per
    node. It was surprising to see IBM do so well (price performance wise) with
    a two-processor/2-partition per node configuration, but I suppose that those
    AMD node "slices" are reasonably priced with Linux.


    Mark Guest

  16. #16

    Default Re: DB2 On Linux // Clustering // Shared Nothing

    We don't support spreading a database across nodes on mixed operating
    systems. To do this, you need DB2's Federated support.

    Tim Schaefer wrote: 
    >>
    >>Load syntax is now very simple. With V8, IBM added partitioning support
    >>directly to the LOAD command, so loading a single- or multi-partition
    >>table can be done with the exact same command, e.g.:
    >>
    >> load from yourfile.del of del replace into your_table
    >>
    >>(Of course you still have control over the partitioning-specific options
    >>in the load command, if you need them).
    >>
    >>See the Data Movement Utilities Guide & Reference - Chapter 3, "Load"
    >>for more information.
    >>
    >>
    >>
    >>
    >>Good luck,
    >>
    >>
    >>
    >>
    >>-----= Posted via Newsfeeds.Com, Uncensored Usenet News =-----
    >>http://www.newsfeeds.com - The #1 Newsgroup Service in the World!
    >>-----== Over 100,000 Newsgroups - 19 Different Servers! =-----[/ref]
    >
    >
    >
    >[/ref]

    Blair Guest

  17. #17

    Default Re: DB2 On Linux // Clustering // Shared Nothing

    "Mark A" <net> writes:
     

    And 300GB record as well on exactly the same hardware. Thus you can see
    the scalability story for this combo of database / hardware. See the
    complete set of results at:

    http://www.tpc.org/tpch/results/tpch_perf_results.asp
     

    As you may have noticed TPC-H database schema is in the 3NF. In fact DB2
    used the MDC (multi-dimensional clustering) feature to provide an
    equivalent of range partitioning for these benchmarks (using the date
    column for lineitem and orders table as a single dimension). For a truly
    multi-dimensional fact / dims schema, you can use lay the fact tables
    with MDC using multiple dimensions and thereby satisfy the joins with
    dimensions very efficiently.

    Here are some articles on MDC that you may want to look at:

    http://www7b.software.ibm.com/dmdd/library/techarticle/0207huras/0207huras.html
    http://www-8.ibm.com/software/au/universe/download/db2/id112.pdf
    http://www.db2mag.com/db_area/archives/2003/q2/bhattacharjee.shtml
    http://db2mag.com/db_area/archives/2003/q2/welgan.shtml

    Regards,
    --
    Haider
    Haider Guest

  18. #18

    Default Re: DB2 On Linux // Clustering // Shared Nothing

    > "Mark A" <net> writes: [/ref]
    used 8 [/ref]
    partitions 
    > "Haider Rizvi" <ca.ibm.com> wrote in message
    > And 300GB record as well on exactly the same hardware. Thus you can see
    > the scalability story for this combo of database / hardware. See the
    > complete set of results at:
    >[/ref]
    1. Were the number of disk drives fewer with the 100GB than 300GB to reduce
    total system cost and Price/QphH?
    2. Care to comment on why SuSE Linux was chosen vs Red Hat? Technical
    reasons or marketing reasons?

    http://www.tpc.org/tpch/results/tpch_perf_results.asp [/ref]
    be [/ref]
    fact 
    >
    > As you may have noticed TPC-H database schema is in the 3NF. In fact DB2
    > used the MDC (multi-dimensional clustering) feature to provide an
    > equivalent of range partitioning for these benchmarks (using the date
    > column for lineitem and orders table as a single dimension). For a truly
    > multi-dimensional fact / dims schema, you can use lay the fact tables
    > with MDC using multiple dimensions and thereby satisfy the joins with
    > dimensions very efficiently.
    >
    > Here are some articles on MDC that you may want to look at:
    >
    >[/ref]
    http://www7b.software.ibm.com/dmdd/library/techarticle/0207huras/0207huras.html 

    I have to admit to some ignorance on one point. Can you explain how using
    MDC [ORGANIZE BY (L_SHIPDATE)] on the LINEITEM table is the equivalent of
    range partitioning? Are you saying that the PARTITIONING KEY(O_ORDERKEY)
    USING HASHING on the LINEITEM table does not determine the partition (by
    hashing), rather the data is range partitioned by ORDERDATE? If this is so,
    is there anyway to know what date ranges are on which partition?

    Mark Farnsworth
    InfoStar Solutions, LLC
    info_star at domain qw_est.net
    (remove "domain" and all underscores)


    Mark Guest

  19. #19

    Default Re: DB2 On Linux // Clustering // Shared Nothing

    Mark A wrote:

     

    MDC defines how the rows are physically laid out within the table -- it
    guarantees clustering across multiple dimensions. This is separate from
    how DB2 partitions rows across database partitions.

    MDC is similar to range partitioning because it allows DB2 to eliminate
    the portions ("branches") of the table do not meet the query criteria.

    e.g.,

    select sum(o_totalprice)
    from orders
    where o_orderdate = '1998-23-12'

    You will scan ONLY the rows that have o_orderdate = 1998-23-12 even
    though the TPCH schema does not include an index on o_orderdate. This
    applies whether you're running a partitioned environment or not.

    This can be combined very effectively with hash partitioning to make
    your queries very fast.


    Good luck,




    -----= Posted via Newsfeeds.Com, Uncensored Usenet News =-----
    http://www.newsfeeds.com - The #1 Newsgroup Service in the World!
    -----== Over 100,000 Newsgroups - 19 Different Servers! =-----
    Ian Guest

  20. #20

    Default Re: DB2 On Linux // Clustering // Shared Nothing


    "Ian" <com> wrote in message
    news:3f77a7c2$newsgroups.com... [/ref]
    using [/ref]
    of [/ref]
    so, 
    >
    > MDC defines how the rows are physically laid out within the table -- it
    > guarantees clustering across multiple dimensions. This is separate from
    > how DB2 partitions rows across database partitions.
    >
    > MDC is similar to range partitioning because it allows DB2 to eliminate
    > the portions ("branches") of the table do not meet the query criteria.
    >
    > e.g.,
    >
    > select sum(o_totalprice)
    > from orders
    > where o_orderdate = '1998-23-12'
    >
    > You will scan ONLY the rows that have o_orderdate = 1998-23-12 even
    > though the TPCH schema does not include an index on o_orderdate. This
    > applies whether you're running a partitioned environment or not.
    >
    > This can be combined very effectively with hash partitioning to make
    > your queries very fast.
    >
    >
    > Good luck,
    >[/ref]
    OK, this is very interesting. But does that mean that '1998-24-12' is also
    in the same partition? Or does it only work on "=" predicates. What is
    difference between that (MDC on orderdate) and creating a clustered index on
    orderdate (since there is only one column in the MDC in this example).

    Apologies for my ignorance on this. Is there some doent that has detailed
    description of how this works on partitioned databases?

    Mark Farnsworth
    InfoStar Solutions, LLC
    info_star at domain qw_est.net
    (remove "domain" and all underscores)


    Mark Guest

Page 1 of 3 123 LastLast

Similar Threads

  1. Clustering across subnets.
    By pldoolittle in forum Coldfusion Server Administration
    Replies: 0
    Last Post: February 23rd, 08:25 PM
  2. Clustering MX7
    By Trygve in forum Coldfusion Server Administration
    Replies: 2
    Last Post: April 30th, 09:35 AM
  3. Clustering Question
    By altimage in forum Coldfusion - Advanced Techniques
    Replies: 2
    Last Post: April 7th, 12:02 AM
  4. Replies: 3
    Last Post: February 2nd, 06:23 PM
  5. Porting a shared library from Linux to MacOS
    By Timo Benk in forum Mac Programming
    Replies: 0
    Last Post: August 29th, 09:16 AM

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139