Professional Web Applications Themes

Constant char length is a good practice? - MySQL

Hi, People said that using constant char length is a good practice as the speed of accessing of a row is faster. However, as the size of the table might get bigger and bigger, is this really a good trade off?...

  1. #1

    Default Constant char length is a good practice?

    Hi,

    People say that using a constant char length is good practice, as the
    speed of accessing a row is faster.

    However, as the table might get bigger and bigger, is this really a
    good trade-off?

    howachen@gmail.com Guest

  2. #2

    Default Re: Constant char length is a good practice?

    [email]howachen[/email] wrote:
    > People said that using constant char length is a good practice as the
    > speed of accessing of a row is faster.
    Here's the page that describes the advantages of using fixed-length CHAR
    columns over dynamic-length VARCHAR columns.

    [url]http://dev.mysql.com/doc/refman/5.0/en/static-format.html[/url]

    This applies only to MyISAM tables. Note also that _all_ the columns in
    a table must be fixed-length datatypes, or else you don't get the
    advantage. That is, if the table contains at least one VARCHAR, BLOB,
    or TEXT column, then that whole table is stored as a dynamic table.
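    For instance (a hypothetical, untested sketch - table and column names
    are made up):

    ```sql
    -- All columns fixed-length: MyISAM can use the static row format.
    CREATE TABLE users_fixed (
      id   INT NOT NULL,
      name CHAR(40) NOT NULL,
      zip  CHAR(10) NOT NULL
    ) ENGINE=MyISAM;

    -- One VARCHAR is enough to make the whole table dynamic.
    CREATE TABLE users_dynamic (
      id   INT NOT NULL,
      name CHAR(40) NOT NULL,
      bio  VARCHAR(255) NULL
    ) ENGINE=MyISAM;

    -- The Row_format column should show 'Fixed' vs. 'Dynamic'.
    SHOW TABLE STATUS LIKE 'users%';
    ```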

    That doc page does say that fixed-length rows have some performance
    advantage, but it doesn't quantify it. So it could be a trivial advantage.

    I usually use the datatype that most fits the application, then worry
    about optimizations if I determine that it's causing a bottleneck.
    Other improvements like indexes and caches are almost always more
    important than the performance difference between CHAR and VARCHAR.

    Regards,
    Bill K.
    Bill Karwin Guest

  3. #3

    Default Re: Constant char length is a good practice?


    Bill Karwin wrote:
    > [email]howachen[/email] wrote:
    > > People said that using constant char length is a good practice as the
    > > speed of accessing of a row is faster.
    >
    > Here's the page that describes the advantages of using fixed-length CHAR
    > columns over dynamic-length VARCHAR columns.
    >
    > [url]http://dev.mysql.com/doc/refman/5.0/en/static-format.html[/url]
    >
    > This applies only to MyISAM tables. Note also that _all_ the columns in
    > a table must be fixed-length datatypes, or else you don't get the
    > advantage. That is, if the table contains at least one VARCHAR, BLOB,
    > or TEXT column, then that whole table is stored as a dynamic table.
    >
    > That doc page does say that fixed-length rows have some performance
    > advantage, but it doesn't quantify it. So it could be a trivial advantage.
    >
    > I usually use the datatype that most fits the application, then worry
    > about optimizations if I determine that it's causing a bottleneck.
    > Other improvements like indexes and caches are almost always more
    > important than the performance difference between CHAR and VARCHAR.
    >
    > Regards,
    > Bill K.
    thanks.

    howachen@gmail.com Guest

  4. #4

    Default Re: Constant char length is a good practice?


    Bill Karwin wrote:
    > [email]howachen[/email] wrote:
    > > People said that using constant char length is a good practice as the
    > > speed of accessing of a row is faster.
    >
    > Here's the page that describes the advantages of using fixed-length CHAR
    > columns over dynamic-length VARCHAR columns.
    >
    > [url]http://dev.mysql.com/doc/refman/5.0/en/static-format.html[/url]
    >
    > This applies only to MyISAM tables. Note also that _all_ the columns in
    > a table must be fixed-length datatypes, or else you don't get the
    > advantage. That is, if the table contains at least one VARCHAR, BLOB,
    > or TEXT column, then that whole table is stored as a dynamic table.
    >
    > That doc page does say that fixed-length rows have some performance
    > advantage, but it doesn't quantify it. So it could be a trivial advantage.
    >
    > I usually use the datatype that most fits the application, then worry
    > about optimizations if I determine that it's causing a bottleneck.
    > Other improvements like indexes and caches are almost always more
    > important than the performance difference between CHAR and VARCHAR.
    >
    > Regards,
    > Bill K.
    Does that mean this problem does not exist if I use InnoDB rather than
    MyISAM?

    thanks...

    howachen@gmail.com Guest

  5. #5

    Default Re: Constant char length is a good practice?

    [email]howachen[/email] wrote:
    > Bill Karwin wrote
    >>
    >> Here's the page that describes the advantages of using fixed-length CHAR
    >> columns over dynamic-length VARCHAR columns.
    >>
    >> [url]http://dev.mysql.com/doc/refman/5.0/en/static-format.html[/url]
    >>
    >> That doc page does say that fixed-length rows have some performance
    >> advantage, but it doesn't quantify it. So it could be a trivial advantage
    Advantages from fixed length records depend heavily on the table size
    and usage pattern. In most cases the extra lookup of the record
    position (compared to the simple calculation record# * recordlen) is
    *not* a problem. A more serious problem is table space fragmentation.
    It happens if you modify and/or delete in a table with variable length
    records. MyISAM doesn't try very hard to find a place for new/modified
    records, so a busy table may end up with lots of holes in the data file.
    Fixed length records completely eliminate the fragmentation problem.
    Records can be modified in place and a "hole" has always space for one
    or more complete records.
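    (You can get a rough idea of how many such holes a MyISAM table has
    accumulated from SHOW TABLE STATUS - the table name here is just a
    placeholder:)

    ```sql
    -- Data_free reports the bytes tied up in holes in the .MYD data file.
    SHOW TABLE STATUS LIKE 'busy_table';
    ```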

    InnoDB does not have the (extra) fragmentation problem, because it
    stores records of variable length in a single table space anyway.
    Therefore InnoDB always has to cope with fragmentation.
    >> I usually use the datatype that most fits the application, then worry
    >> about optimizations if I determine that it's causing a bottleneck.
    >> Other improvements like indexes and caches are almost always more
    >> important than the performance difference between CHAR and VARCHAR.
    Exactly.


    XL
    --
    Axel Schwenke, Senior Software Developer, MySQL AB

    Online User Manual: [url]http://dev.mysql.com/doc/refman/5.0/en/[/url]
    MySQL User Forums: [url]http://forums.mysql.com/[/url]
    Axel Schwenke Guest

  6. #6

    Default Re: Constant char length is a good practice?

    Axel Schwenke wrote:
    > [email]howachen[/email] wrote:
    >
    >>Bill Karwin wrote
    >>
    >>>Here's the page that describes the advantages of using fixed-length CHAR
    >>>columns over dynamic-length VARCHAR columns.
    >>>
    >>>[url]http://dev.mysql.com/doc/refman/5.0/en/static-format.html[/url]
    >>>
    >>>That doc page does say that fixed-length rows have some performance
    >>>advantage, but it doesn't quantify it. So it could be a trivial advantage
    >
    >
    > Advantages from fixed length records depend heavily on the table size
    > and usage pattern. In most cases the extra lookup of the record
    > position (compared to the simple calculation record# * recordlen) is
    > *not* a problem. A more serious problem is table space fragmentation.
    > It happens if you modify and/or delete in a table with variable length
    > records. MyISAM doesn't try very hard to find a place for new/modified
    > records. So a busy table may end with lots of holes in the data file.
    > Fixed length records completely eliminate the fragmentation problem.
    > Records can be modified in place and a "hole" has always space for one
    > or more complete records.
    >
    > InnoDB does not have the (extra) fragmentation problem, because it
    > stores records of variable length in a single table space anyway.
    > Therefore InnoDB always has to cope with fragmentation.
    >
    >
    >>>I usually use the datatype that most fits the application, then worry
    >>>about optimizations if I determine that it's causing a bottleneck.
    >>>Other improvements like indexes and caches are almost always more
    >>>important than the performance difference between CHAR and VARCHAR.
    >
    >
    > Exactly.
    >
    >
    > XL
    > --
    > Axel Schwenke, Senior Software Developer, MySQL AB
    >
    > Online User Manual: [url]http://dev.mysql.com/doc/refman/5.0/en/[/url]
    > MySQL User Forums: [url]http://forums.mysql.com/[/url]
    It can, however, have serious effects when MySQL needs to do a table scan.

    If all the fields are fixed length, the table scan is very easy - just add the
    size of a record to the current pointer to get the next record. However, if
    there are variable length fields in the record, MySQL needs to calculate the
    size of each column in the current row to get to the next row.

    But InnoDB does have fragmentation problems, also. Yes, it stores variable
    length records in a separate table space. But the fragmentation still occurs,
    and eventually can cause performance problems.

    More advanced databases have REORG commands. MySQL doesn't have one, but you
    can do a backup/delete/restore to get the same effect.


    --
    ==================
    Remove the "x" from my email address
    Jerry Stuckle
    JDS Computer Training Corp.
    [email]jstucklexattglobal.net[/email]
    ==================
    Jerry Stuckle Guest

  7. #7

    Default Re: Constant char length is a good practice?

    Jerry Stuckle <jstucklexattglobal.net> wrote:
    > Axel Schwenke wrote:
    [MyISAM fragmentation]
    >> InnoDB does not have the (extra) fragmentation problem, because it
    >> stores records of variable length in a single table space anyway.
    >> Therefore InnoDB always has to cope with fragmentation.
    > It can, however, have serious effects when MySQL needs to do a table scan.
    Not *so* serious. The MyISAM engine reads chunks of read_buffer_size anyway.
    So again the difference between fixed and variable-length records is
    just another lookup for the record length. Details can be found here:

    [url]http://dev.mysql.com/doc/internals/en/myisam-introduction.html[/url]
    > But InnoDB does have fragmentation problems, also. Yes, it stores variable
    > length records in a separate table space. But the fragmentation still occurs,
    > and eventually can cause performance problems.
    In fact fragmentation is inevitable in InnoDB - because there are
    always records of different length, and InnoDB stores data on pages.
    However, InnoDB can avoid fragmentation in some cases where MyISAM
    can't, because InnoDB was designed with fragmentation in mind.
    > More advanced databases have REORG commands. MySQL doesn't have one, but you
    > can do a backup/delete/restore to get the same effect.
    For InnoDB you can use ALTER TABLE ... ENGINE=InnoDB. Of course this
    is documented in the manual:

    [url]http://dev.mysql.com/doc/refman/5.0/en/innodb-file-defragmenting.html[/url]
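    That is, a no-op rebuild along these lines (table name is a placeholder):

    ```sql
    -- Rebuilding an InnoDB table "to InnoDB" rewrites its data and
    -- indexes, which defragments the table.
    ALTER TABLE mytable ENGINE=InnoDB;
    ```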


    XL
    --
    Axel Schwenke, Senior Software Developer, MySQL AB

    Online User Manual: [url]http://dev.mysql.com/doc/refman/5.0/en/[/url]
    MySQL User Forums: [url]http://forums.mysql.com/[/url]
    Axel Schwenke Guest

  8. #8

    Default Re: Constant char length is a good practice?

    Axel Schwenke wrote:
    > Jerry Stuckle <jstucklexattglobal.net> wrote:
    >
    >>Axel Schwenke wrote:
    >
    >
    > [MyISAM fragmentation]
    >
    >
    >>>InnoDB does not have the (extra) fragmentation problem, because it
    >>>stores records of variable length in a single table space anyway.
    >>>Therefore InnoDB always has to cope with fragmentation.
    >
    >
    >>It can, however, have serious effects when MySQL needs to do a table scan.
    >
    >
    > Not *so* serious. MyISAM engine reads hunks of read_buffer_size anyway.
    > So again the difference between fixed and variable length records is
    > just another lookup for the record length. Details can be found here:
    >
    > [url]http://dev.mysql.com/doc/internals/en/myisam-introduction.html[/url]
    >
    Yes, it does. But where the read starts can matter. A large row with several
    variable-length fields may require several additional buffer reads.
    Additionally, just reading the data and calculating the beginning of the
    next column in the current row takes time.
    >
    >>But InnoDB does have fragmentation problems, also. Yes, it stores variable
    >>length records in a separate table space. But the fragmentation still occurs,
    >>and eventually can cause performance problems.
    >
    >
    > In fact fragmentation is inevitable in InnoDB - because there always
    > are records of different lenght. And InnoDB stores data on pages.
    > However, InnoDB can avoid fragmentation in some cases where MyISAM
    > can't. Because InnoDB was designed with fragmentation in mind.
    >
    Yes, InnoDB was designed with fragmentation in mind, and it does a decent job
    of limiting fragmentation. But that in itself adds additional overhead. And
    fragmentation still occurs.
    >
    >>More advanced databases have REORG commands. MySQL doesn't have one, but you
    >>can do a backup/delete/restore to get the same effect.
    >
    >
    > For InnoDB you can use ALTER TABLE ... ENGINE=InnoDB. Of course this
    > is doented in the manual:
    >
    > [url]http://dev.mysql.com/doc/refman/5.0/en/innodb-file-defragmenting.html[/url]
    >
    Yes, you can change the table to be managed by InnoDB. But that's not at all
    the same as reorganizing the table.

    All of the more advanced engines - Oracle, SQL Server, DB2, etc. - also have
    fragmentation-minimizing algorithms. Most are more advanced than InnoDB's -
    because they are more tightly integrated into the database instead of being a
    separate engine. And all of them still need a REORG command to defragment
    tables.

    And all of them agree that working with fixed length columns is more efficient
    than variable length columns for the above reasons.
    >
    > XL
    > --
    > Axel Schwenke, Senior Software Developer, MySQL AB
    >
    > Online User Manual: [url]http://dev.mysql.com/doc/refman/5.0/en/[/url]
    > MySQL User Forums: [url]http://forums.mysql.com/[/url]
    Axel,


    --
    ==================
    Remove the "x" from my email address
    Jerry Stuckle
    JDS Computer Training Corp.
    [email]jstucklexattglobal.net[/email]
    ==================
    Jerry Stuckle Guest

  9. #9

    Default Re: Constant char length is a good practice?


    "Jerry Stuckle" <jstucklexattglobal.net> wrote in message
    news:RZadnUJrOqkPgDTZnZ2dnUVZ_omdnZ2dcomcast.com. ..
    > Axel Schwenke wrote:
    >> Jerry Stuckle <jstucklexattglobal.net> wrote:
    >>
    >>>Axel Schwenke wrote:
    >>
    >>
    >> [MyISAM fragmentation]
    >>
    <snip>
    >>
    >>>More advanced databases have REORG commands. MySQL doesn't have one, but
    >>>you
    >>>can do a backup/delete/restore to get the same effect.
    MySQL has the OPTIMIZE TABLE command for MyISAM tables, which will reclaim
    unused space and defragment the data file. Refer to
    [url]http://dev.mysql.com/doc/refman/5.0/en/optimize-table.html[/url]
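    For a MyISAM table that's simply (table name is a placeholder):

    ```sql
    -- Reclaims unused space and defragments the .MYD data file.
    OPTIMIZE TABLE mytable;
    ```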

    --
    Tony Marston

    [url]http://www.tonymarston.net[/url]
    [url]http://www.radicore.org[/url]



    Tony Marston Guest

  10. #10

    Default Re: Constant char length is a good practice?

    Tony Marston wrote:
    > "Jerry Stuckle" <jstucklexattglobal.net> wrote in message
    > news:RZadnUJrOqkPgDTZnZ2dnUVZ_omdnZ2dcomcast.com. ..
    >
    >>Axel Schwenke wrote:
    >>
    >>>Jerry Stuckle <jstucklexattglobal.net> wrote:
    >>>
    >>>
    >>>>Axel Schwenke wrote:
    >>>
    >>>
    >>>[MyISAM fragmentation]
    >>>
    >
    > <snip>
    >
    >>>>More advanced databases have REORG commands. MySQL doesn't have one, but
    >>>>you
    >>>>can do a backup/delete/restore to get the same effect.
    >
    >
    > MySQL has the OPTIMIZE TABLE command for MyISAM tables which will reclaim
    > the unused space and to defragment the data file. Refer to
    > [url]http://dev.mysql.com/doc/refman/5.0/en/optimize-table.html[/url]
    >
    Tony,

    Yes, I'm familiar with the OPTIMIZE TABLE command. But it only does part of
    what a REORG command does.

    --
    ==================
    Remove the "x" from my email address
    Jerry Stuckle
    JDS Computer Training Corp.
    [email]jstucklexattglobal.net[/email]
    ==================
    Jerry Stuckle Guest

  11. #11

    Default Re: Constant char length is a good practice?


    Jerry Stuckle wrote:
    > Tony Marston wrote:
    > > "Jerry Stuckle" <jstucklexattglobal.net> wrote in message
    > > news:RZadnUJrOqkPgDTZnZ2dnUVZ_omdnZ2dcomcast.com. ..
    > >
    > >>Axel Schwenke wrote:
    > >>
    > >>>Jerry Stuckle <jstucklexattglobal.net> wrote:
    > >>>
    > >>>
    > >>>>Axel Schwenke wrote:
    > >>>
    > >>>
    > >>>[MyISAM fragmentation]
    > >>>
    > >
    > > <snip>
    > >
    > >>>>More advanced databases have REORG commands. MySQL doesn't have one, but
    > >>>>you
    > >>>>can do a backup/delete/restore to get the same effect.
    > >
    > >
    > > MySQL has the OPTIMIZE TABLE command for MyISAM tables which will reclaim
    > > the unused space and to defragment the data file. Refer to
    > > [url]http://dev.mysql.com/doc/refman/5.0/en/optimize-table.html[/url]
    > >
    >
    > Tony,
    >
    > Yes, I'm familiar with the OPTIMIZE TABLE command. But it only does part of
    > what a REORG command does.
    >
    > --
    > ==================
    > Remove the "x" from my email address
    > Jerry Stuckle
    > JDS Computer Training Corp.
    > [email]jstucklexattglobal.net[/email]
    > ==================
    So what does "REORG" normally do?

    howachen@gmail.com Guest

  12. #12

    Default Re: Constant char length is a good practice?

    Jerry Stuckle <jstucklexattglobal.net> wrote:
    > Axel Schwenke wrote:
    >>>More advanced databases have REORG commands. MySQL doesn't have one, but you
    >>>can do a backup/delete/restore to get the same effect.
    >>
    >>
    >> For InnoDB you can use ALTER TABLE ... ENGINE=InnoDB. Of course this
    >> is documented in the manual:
    >>
    >> [url]http://dev.mysql.com/doc/refman/5.0/en/innodb-file-defragmenting.html[/url]
    >
    > Yes, you can change the table to be managed by InnoDB. But that's not at all
    > the same as reorganizing the table.
    Jerry,

    maybe you should follow my pointers to the manual - at least sometimes.
    If you ALTER TABLE an InnoDB table to ENGINE=InnoDB again, it will get
    reorganized.
    > All of the more advanced engines - Oracle, SQL Server, DB2, etc. also have
    > fragmentation-minimizing algorithms. Most are more advanced that InnoDB's -
    > because they are more tightly integrated into the database instead of being a
    > separate engine. And all of them have a need for a REORG command to defragment
    > tables.
    This is not a valid proof. A defrag tool for the Linux ext2 file system
    exists as well. That does not prove that ext2 is susceptible to
    fragmentation. In fact it isn't.
    > And all of them agree that working with fixed length columns is more efficient
    > than variable length columns for the above reasons.
    Agreed. But again: the advantages of using fixed-length records
    cannot easily be quantified. In most cases the positive effects are
    overestimated. My own measurements in the past showed effects next to
    nothing - except quite heavy fragmentation for a "spool" table, which
    was easily fixed by a weekly OPTIMIZE TABLE. If you need total certainty,
    you should measure yourself. Every single case.


    XL
    --
    Axel Schwenke, Senior Software Developer, MySQL AB

    Online User Manual: [url]http://dev.mysql.com/doc/refman/5.0/en/[/url]
    MySQL User Forums: [url]http://forums.mysql.com/[/url]
    Axel Schwenke Guest

  13. #13

    Default Re: Constant char length is a good practice?

    [email]howachen[/email] wrote:
    > Jerry Stuckle wrote:
    >
    >
    >>Tony Marston wrote:
    >>
    >>>"Jerry Stuckle" <jstucklexattglobal.net> wrote in message
    >>>news:RZadnUJrOqkPgDTZnZ2dnUVZ_omdnZ2dcomcast.c om...
    >>>
    >>>
    >>>>Axel Schwenke wrote:
    >>>>
    >>>>
    >>>>>Jerry Stuckle <jstucklexattglobal.net> wrote:
    >>>>>
    >>>>>
    >>>>>
    >>>>>>Axel Schwenke wrote:
    >>>>>
    >>>>>
    >>>>>[MyISAM fragmentation]
    >>>>>
    >>>
    >>><snip>
    >>>
    >>>>>>More advanced databases have REORG commands. MySQL doesn't have one, but
    >>>>>>you
    >>>>>>can do a backup/delete/restore to get the same effect.
    >>>
    >>>
    >>>MySQL has the OPTIMIZE TABLE command for MyISAM tables which will reclaim
    >>>the unused space and to defragment the data file. Refer to
    >>>[url]http://dev.mysql.com/doc/refman/5.0/en/optimize-table.html[/url]
    >>>
    >>
    >>Tony,
    >>
    >>Yes, I'm familiar with the OPTIMIZE TABLE command. But it only does part of
    >>what a REORG command does.
    >>
    >>--
    >>==================
    >>Remove the "x" from my email address
    >>Jerry Stuckle
    >>JDS Computer Training Corp.
    >>jstucklexattglobal.net
    >>==================
    >
    >
    > so what normally "REORG" do ?
    >

    It also reorganizes the data in a table according to a specific index to make
    access faster, compacts the table to squeeze out deleted data, and optionally
    frees the extra space (it can also keep extra space so that it doesn't need
    to be reallocated immediately on the next INSERT, or on an UPDATE with longer
    variable-length data).

    Also, depending on the underlying file system, it can defragment the table so
    that all clusters are together.

    Additionally, it will provide input to the internal statistics monitor to allow
    the database manager to more intelligently select the most efficient method of
    accessing the data.

    It really does a lot more than OPTIMIZE TABLE - which basically only
    compresses the empty space out of the table.
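    In DB2, for example, a reorg-plus-statistics pass looks roughly like this
    (schema, table and index names are invented; check your platform's exact
    syntax):

    ```sql
    -- Rewrite the table clustered on a chosen index and reclaim space.
    REORG TABLE myschema.orders INDEX myschema.orders_ix1;

    -- Refresh the optimizer's statistics afterwards.
    RUNSTATS ON TABLE myschema.orders WITH DISTRIBUTION AND INDEXES ALL;
    ```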




    --
    ==================
    Remove the "x" from my email address
    Jerry Stuckle
    JDS Computer Training Corp.
    [email]jstucklexattglobal.net[/email]
    ==================
    Jerry Stuckle Guest

  14. #14

    Default Re: Constant char length is a good practice?

    Axel Schwenke wrote:
    > Jerry Stuckle <jstucklexattglobal.net> wrote:
    >
    >>Axel Schwenke wrote:
    >
    >
    >>>>More advanced databases have REORG commands. MySQL doesn't have one, but you
    >>>>can do a backup/delete/restore to get the same effect.
    >>>
    >>>
    >>>For InnoDB you can use ALTER TABLE ... ENGINE=InnoDB. Of course this
    >>>is documented in the manual:
    >>>
    >>>[url]http://dev.mysql.com/doc/refman/5.0/en/innodb-file-defragmenting.html[/url]
    >>
    >>Yes, you can change the table to be managed by InnoDB. But that's not at all
    >>the same as reorganizing the table.
    >
    >
    > Jerry,
    >
    > maybe you should follow my pointers to the manual - at least sometimes.
    > If you ALTER TABLE an InnoDB table to ENGINE=InnoDB again, it will get
    > reorganized.
    >
    >
    >>All of the more advanced engines - Oracle, SQL Server, DB2, etc. also have
    >>fragmentation-minimizing algorithms. Most are more advanced that InnoDB's -
    >>because they are more tightly integrated into the database instead of being a
    >>separate engine. And all of them have a need for a REORG command to defragment
    >>tables.
    >
    >
    > This is not a valid proof. A defrag tool for Linux' ext2 file system
    > exists as well. This does not prove that ext2 is susceptible to
    > fragmentation. In fact it isn't.
    >
    If there were no need for a REORG command, why would all of the major
    databases put so much time and money into developing one? It's not a trivial
    job.

    REORG is much more than just compression. And, unfortunately, ext2 is
    susceptible to fragmentation. However, it's a lot more efficient about how
    it handles fragmentation, with the result that it really doesn't need
    defragging.

    But this isn't about the underlying file system's fragmentation. It's also
    about fragmentation WITHIN the file, since MySQL keeps a table's data in a
    single file. So even if the file itself isn't fragmented in the file system,
    the data inside it can be (and after a few operations very likely is).
    >
    >>And all of them agree that working with fixed length columns is more efficient
    >>than variable length columns for the above reasons.
    >
    >
    > Agreed. But again: the advantages from using fixed length records
    > cannot easily be quantified. In most cases the positive effects are
    > overestimated. My own measurements in the past showed effects next to
    > nothing. Except quite heavy fragmentation for a "spool" table. This was
    > easily fixed by a weekly OPTIMIZE TABLE. If you need total certainty,
    > you should measure yourself. Every single case.
    >
    It also depends on the database manager, the actual data and a whole host of
    other variables. But let's take a simple example. We have a table with five
    columns, all 40 chars long, and we are doing a table scan, checking for a
    specific value in the 3rd column.

    If these are CHAR(40) columns, the size of a row will be 200 bytes. If the
    manager is doing a table scan, it can pull 200 bytes and compare starting 80
    bytes in. If there is a match, it can transfer 200 bytes to its output buffer
    for further processing. When done, it can go 200 bytes past the start of the
    current row to get the next one.

    Compare this to VARCHAR(40) columns. A row can be between 5 and 205 bytes
    (one length byte per column). The manager needs to fetch the first row and
    get the length of the first field (x1). It needs to go x1+1 bytes to get the
    length of the next field (x2). It needs to go x2 bytes past the second
    field's length byte to get to the third. It then needs to check the length
    of the third field (x3) to limit the length of the needed comparison. It has
    to remember whether there is a match, then get x4 and x5, the lengths of the
    last two fields. If there is a match, it then transfers the row to its
    output buffer. It then needs to see if there is any leftover space at the
    end of this row, and if so add that to the length of the data. Only then
    does it know where the next record starts, and it can fetch that row (which
    may be wholly, partly or not at all in the buffer from the previous read).

    This is a significant amount of additional processing for VARCHAR fields.

    Now - if most of the data is generally much smaller than the max (say an
    average of 5 bytes per column), the space savings will be significant - 30
    bytes vs. 200 - requiring fewer disk operations (and even fewer reads,
    because up to six rows fit in a single read). In a case like this the
    overhead of processing the VARCHAR may be more than compensated for by the
    fewer disk operations required, and VARCHAR access can end up faster.
    However, if all the fields are very close to the maximum (say 38 chars),
    processing VARCHAR fields may be slower.

    But I do agree with you on one thing: you need to test each individual case
    to determine which is best for *this* instance. It is far too data- and
    access-dependent.
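    One rough way to test a specific case (untested sketch; table names and the
    search value are placeholders):

    ```sql
    -- Same columns, two layouts; load identical sample data into both.
    CREATE TABLE t_char    (a CHAR(40),    b CHAR(40),    c CHAR(40))    ENGINE=MyISAM;
    CREATE TABLE t_varchar (a VARCHAR(40), b VARCHAR(40), c VARCHAR(40)) ENGINE=MyISAM;

    -- Avg_row_length and Data_length show the space difference ...
    SHOW TABLE STATUS LIKE 't_%';

    -- ... and timing identical scans shows the processing difference.
    SELECT COUNT(*) FROM t_char    WHERE c = 'some value';
    SELECT COUNT(*) FROM t_varchar WHERE c = 'some value';
    ```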


    >
    > XL
    > --
    > Axel Schwenke, Senior Software Developer, MySQL AB
    >
    > Online User Manual: [url]http://dev.mysql.com/doc/refman/5.0/en/[/url]
    > MySQL User Forums: [url]http://forums.mysql.com/[/url]

    --
    ==================
    Remove the "x" from my email address
    Jerry Stuckle
    JDS Computer Training Corp.
    [email]jstucklexattglobal.net[/email]
    ==================
    Jerry Stuckle Guest

  15. #15

    Default Re: Constant char length is a good practice?

    Jerry Stuckle <jstucklexattglobal.net> wrote:
    > Axel Schwenke wrote:
    >> again: the advantages from using fixed length records
    >> cannot easily be quantified. In most cases the positive effects are
    >> overestimated.
    >
    > lets take an simple example. We have a table with five columns,
    > all 40 chars long. We are doing a table scan, checking for a specific value in
    > the 3rd column.
    [lengthy description snipped]
    > This is a significant amount of additional processing for VARCHAR fields.
    Jerry, you are right. Processing variable length records is more
    expensive than processing fixed length records. But it's only CPU
    cycles. Real world databases are always I/O bound, never CPU bound.

    Let's do another calculation. Assume we have a 1GHz CPU and a hard disk
    with 5ms avg. seek time. Then for each seek of the hard disk we have
    5,000,000 CPU cycles to spare. You can do a lot of processing with 5
    million CPU cycles.

    Another calculation: let's stay with 200-byte records and assume we get
    20MB/s from the hard disks. That's 100,000 rows per second, so we can
    spend 10,000 CPU cycles per row without losing speed. You don't need
    10,000 cycles to read a few length fields from the record and add them up.
    (Remember: the record is already in memory.)

    Also, nobody reads single rows for table scans. MyISAM reads chunks of
    8MB (the default read_buffer_size, which can be increased), and there
    should be many records in this 8MB buffer. The only inconvenience with
    variable-length records is copying the last, probably incomplete, record
    to the buffer start so it can be completed by the next read. This effect
    will be negligible if 100 or more rows fit in the read buffer.


    XL
    --
    Axel Schwenke, Senior Software Developer, MySQL AB

    Online User Manual: [url]http://dev.mysql.com/doc/refman/5.0/en/[/url]
    MySQL User Forums: [url]http://forums.mysql.com/[/url]
    Axel Schwenke Guest

  16. #16

    Default Re: Constant char length is a good practice?


    Axel Schwenke wrote:
    > Jerry Stuckle <jstucklexattglobal.net> wrote:
    > > Axel Schwenke wrote:
    >
    > >> again: the advantages from using fixed length records
    > >> cannot easily be quantified. In most cases the positive effects are
    > >> overestimated.
    > >
    > > lets take an simple example. We have a table with five columns,
    > > all 40 chars long. We are doing a table scan, checking for a specific value in
    > > the 3rd column.
    >
    > [lengthy description snipped]
    >
    > > This is a significant amount of additional processing for VARCHAR fields.
    >
    > Jerry, you are right. Processing variable length records is more
    > expensive than processing fixed length records. But it's only CPU
    > cycles. Real world databases are always I/O bound, never CPU bound.
    >
    > Lets do another calculation. Assume we have a 1GHz CPU and a hard disk
    > with 5ms avg. seek time. Then for each seek of the hard disk we have
    > 5.000.000 CPU cycles to spare. You can do a lot of processing with 5
    > million CPU cycles.
    >
    > Another calculation. Lets stay with 200 byte records. Assume we get
    > 20MB/s from the hard disks. Thats 100.000 rows per second. Then we can
    > spend 10.000 CPU cycles per row without losing speed. You don't need
    > 10.000 cycles to read a few length fields from the record and add them.
    > (remember: the record is already in memory)
    >
    > Also, nobody reads single rows for table scans. MyISAM reads chunks of
    > 8MB (default read_buffer_size, can be increased). There should be many
    > records in this 8MB buffer. The only inconvenience from variable
    > length records is the copying of the last, probably incomplete record
    > to the buffer start to complete it with the next read. This effect
    > will be negligible if 100 or more rows fit in the read buffer.
    >
    >
    > XL
    > --
    > Axel Schwenke, Senior Software Developer, MySQL AB
    >
    > Online User Manual: [url]http://dev.mysql.com/doc/refman/5.0/en/[/url]
    > MySQL User Forums: [url]http://forums.mysql.com/[/url]
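Axel's back-of-envelope figures are easy to verify with a quick sketch. Note that the 1 GHz clock, 5 ms seek time, 20 MB/s throughput, and 200-byte rows are the assumptions from his post, not measurements:

```python
# Back-of-envelope check of the cycle budgets quoted above.
# All input figures are Axel's assumptions, not measurements.

CPU_HZ = 1_000_000_000      # 1 GHz CPU
SEEK_S = 0.005              # 5 ms average seek time
DISK_BPS = 20_000_000       # 20 MB/s sequential disk throughput
ROW_BYTES = 200             # assumed record size

# Cycles the CPU has to spare while one disk seek completes.
cycles_per_seek = int(CPU_HZ * SEEK_S)

# Rows streamed off disk per second, and the resulting cycle budget per row.
rows_per_s = DISK_BPS // ROW_BYTES
cycles_per_row = CPU_HZ // rows_per_s

print(cycles_per_seek)   # 5000000 cycles per seek
print(rows_per_s)        # 100000 rows per second
print(cycles_per_row)    # 10000 cycles per row
```

The numbers match the post: 5 million cycles per seek, and a 10,000-cycle budget per 200-byte row before the CPU rather than the disk becomes the bottleneck.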
    Thanks....

    In fact, our current solution is always to divide a table into two
    sub-tables: a static part (fixed-length columns) and a dynamic part
    (VARCHAR/TEXT columns).

    Table joins with other primary tables will frequently use the static
    part, so as to speed up queries across the overall system.

    Of course there is additional overhead when you need the dynamic part
    for detail information, as you then have to join the static and dynamic
    parts instead of issuing just a single query.
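A minimal sketch of the split described above, with hypothetical table and column names (the real schema isn't shown in the thread). The DDL is held in Python strings so the join can be illustrated alongside it:

```python
# Hypothetical sketch of the static/dynamic table split described above.
# Table and column names are invented for illustration only.

# Static part: fixed-length columns only, so MyISAM keeps it in the
# fast fixed-row (static) format.
STATIC_DDL = """
CREATE TABLE user_static (
    user_id INT NOT NULL PRIMARY KEY,
    age     INT,
    country CHAR(2)
) ENGINE=MyISAM;
"""

# Dynamic part: the variable-length columns, keyed 1:1 to the static part.
DYNAMIC_DDL = """
CREATE TABLE user_dynamic (
    user_id INT NOT NULL PRIMARY KEY,
    bio     TEXT,
    website VARCHAR(255)
) ENGINE=MyISAM;
"""

# Detail lookups pay the extra join the poster mentions:
DETAIL_QUERY = """
SELECT s.user_id, s.country, d.bio
FROM user_static AS s
JOIN user_dynamic AS d ON d.user_id = s.user_id
WHERE s.user_id = 42;
"""

# Only the static table keeps fixed-length rows.
print("VARCHAR" in STATIC_DDL)   # False
print("VARCHAR" in DYNAMIC_DDL)  # True
```

Whether the fixed-format scan speed on `user_static` outweighs the extra join is exactly the trade-off debated in this thread, and can only be settled by benchmarking real data.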

    howachen@gmail.com Guest

  17. #17

    Default Re: Constant char length is a good practice?

    Axel Schwenke wrote:
    > Jerry Stuckle <jstucklexattglobal.net> wrote:
    >
    >>Axel Schwenke wrote:
    >
    >
    >>>again: the advantages from using fixed length records
    >>>cannot easily be quantified. In most cases the positive effects are
    >>>overestimated.
    >>
    >>Let's take a simple example. We have a table with five columns,
    >>all 40 chars long. We are doing a table scan, checking for a specific value in
    >>the 3rd column.
    >
    >
    > [lengthy description snipped]
    >
    >
    >>This is a significant amount of additional processing for VARCHAR fields.
    >
    >
    > Jerry, you are right. Processing variable length records is more
    > expensive than processing fixed length records. But it's only CPU
    > cycles. Real world databases are always I/O bound, never CPU bound.
    >
    Not necessarily. I've seen many cases where the system is running at virtually 100% CPU for the
    database. Everything is in memory, so there is no I/O work being done. Also, remember that both the
    OS and hardware can buffer data, shortening data access time. And there is no such thing as "only CPU
    cycles": those same CPU cycles are used by the web server, the MTA, and all kinds of other things. The
    DBM isn't the only thing running on the machine. Even if it were, you'd still have the networking
    overhead of getting data to and from the DBM.
    > Let's do another calculation. Assume we have a 1GHz CPU and a hard disk
    > with 5ms avg. seek time. Then for each seek of the hard disk we have
    > 5,000,000 CPU cycles to spare. You can do a lot of processing with 5
    > million CPU cycles.
    >
    Yep. But that's also not 5,000,000 instructions. Virtually every instruction takes multiple
    cycles - depending on the instruction, as many as 7 or 8 CPU cycles. And don't forget,
    the OS also needs instructions to perform I/O. And finally, the CPU must perform the transfer from
    the disk buffer to RAM, or, if DMA (Direct Memory Access) is available, the CPU must pause while the
    data bus is being used for the transfer.
    > Another calculation. Let's stay with 200 byte records. Assume we get
    > 20MB/s from the hard disks. That's 100,000 rows per second. Then we can
    > spend 10,000 CPU cycles per row without losing speed. You don't need
    > 10,000 cycles to read a few length fields from the record and add them.
    > (remember: the record is already in memory)
    >
    Which is a lot less than 10K instructions - probably around 2-4K machine instructions, maybe fewer.
    And don't forget - many of those will be in the OS setting up and performing the I/O. In fact,
    such a request will most likely be CPU bound - it will take much more than 10K CPU cycles to do all
    the necessary work.
    > Also, nobody reads single rows for table scans. MyISAM reads chunks of
    > 8MB (default read_buffer_size, can be increased). There should be many
    > records in this 8MB buffer. The only inconvenience from variable
    > length records is the copying of the last, probably incomplete record
    > to the buffer start to complete it with the next read. This effect
    > will be negligible if 100 or more rows fit in the read buffer.
    >
    Yes, 8M is the read buffer size. But that doesn't mean it reads 8M in at one time. That would be
    terribly inefficient. First of all, there is no reason to read 8M if all you want is 200 bytes.
    That's a lot of time spent transferring data unnecessarily from the disk.

    Secondly, now you've just wiped any previous operations in your buffer. Even more inefficient.
    Let's say you have 8K worth of data which is heavily used - in fact every other operation gets data
    from this 8K. That means you read 8M to get your 8K. Then another operation comes in and reads 8M
    to get 200 bytes, wiping out your original 8K in the buffer. Now you need to access the original 8K
    again. Whoops, it's been wiped out by the second read. Go get 8M so you can access your 8K again.

    I can't think of anything LESS optimal. Individual reads would be much more efficient!
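Taking Jerry's hypothetical figures at face value (an 8 MB read to service an 8 KB hot set, then another 8 MB read for a 200-byte row), the waste is easy to quantify:

```python
# Quantifying the scenario above, using the figures from the post
# at face value: an 8 MB read to service an 8 KB hot set, then
# another 8 MB read to fetch a single 200-byte row.

READ_BUFFER = 8 * 1024 * 1024   # 8 MB read per operation
HOT_SET     = 8 * 1024          # 8 KB of heavily used data
ROW         = 200               # bytes actually wanted by the second read

# Bytes transferred per useful byte in each case.
waste_hot = READ_BUFFER // HOT_SET
waste_row = READ_BUFFER / ROW

print(waste_hot)         # 1024 bytes moved per useful byte of the hot set
print(round(waste_row))  # 41943 bytes moved per useful byte of the row
```

Of course, this arithmetic only indicts the behavior if every access really triggers a fresh 8 MB read, which is the point Axel disputes later in the thread (the read buffer is per-thread and used for table scans, not point lookups).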
    >
    > XL
    > --
    > Axel Schwenke, Senior Software Developer, MySQL AB
    >
    > Online User Manual: [url]http://dev.mysql.com/doc/refman/5.0/en/[/url]
    > MySQL User Forums: [url]http://forums.mysql.com/[/url]

    --
    ==================
    Remove the "x" from my email address
    Jerry Stuckle
    JDS Computer Training Corp.
    [email]jstucklexattglobal.net[/email]
    ==================
    Jerry Stuckle Guest

  18. #18

    Default Re: Constant char length is a good practice?

    [email]howachen[/email] wrote:
    >
    >
    > Thanks....
    >
    > In fact, our current solution is always to divide a table into two
    > sub-tables: a static part (fixed-length columns) and a dynamic part
    > (VARCHAR/TEXT columns).
    >
    > Table joins with other primary tables will frequently use the static
    > part, so as to speed up queries across the overall system.
    >
    > Of course there is additional overhead when you need the dynamic part
    > for detail information, as you then have to join the static and
    > dynamic parts instead of issuing just a single query.
    >
    But joining two tables has its own overhead, also. It takes (at least) two seeks to retrieve the
    data from the disk, for instance. And comparing the extra primary key takes additional time.

    Your "solution" may in fact be significantly slower than having everything in one table.

    Again, you need to test using live data.


    --
    ==================
    Remove the "x" from my email address
    Jerry Stuckle
    JDS Computer Training Corp.
    [email]jstucklexattglobal.net[/email]
    ==================
    Jerry Stuckle Guest

  19. #19

    Default Re: Constant char length is a good practice?

    Jerry Stuckle <jstucklexattglobal.net> wrote:
    > Axel Schwenke wrote:
    >>
    >> Jerry, you are right. Processing variable length records is more
    >> expensive than processing fixed length records. But it's only CPU
    >> cycles. Real world databases are always I/O bound, never CPU bound.
    >
    > Not necessarily. I've seen many cases where the system is running virtually 100% cpu for the
    > database. Everything is in memory, so there is no I/O work being done.
    I wouldn't call this a /database/ then. It's more like a cache.

    Also, this discussion started with "there may be a severe impact on
    doing table scans" - your point in
    <d-2dndCiTaQXHzXZnZ2dnUVZ_vOdnZ2dcomcast.com>.
    Most of my arguments are specific to that. You're no longer talking
    about table scans.
    > the CPU must perform the transfer from the disk buffer to RAM, or,
    > if DMA (Direct Memory Access) is available, the CPU must pause while
    > the data bus is being used for the transfer.
    Are we talking about servers? There always is DMA. Also memory
    bandwidth is far above the assumed 20MB/s. IIRC it is somewhere in the
    1GB/s range. So a running DMA does not hog the bus completely. BTW, my
    home "server" - in fact a desktop machine with some more RAM and disks
    - shows about 10% cpu time spent in kernel/interrupts when the I/O
    system is maxed out (rebuild of soft RAID).

    For the curious: it's an AMD Athlon64/3000+ with 2GB RAM and 4 SATA
    disks 400GB each. Running Debian/Sarge x64 edition.
    >> Another calculation. Let's stay with 200 byte records. Assume we get
    >> 20MB/s from the hard disks. That's 100,000 rows per second. Then we can
    >> spend 10,000 CPU cycles per row without losing speed. You don't need
    >> 10,000 cycles to read a few length fields from the record and add them.
    >> (remember: the record is already in memory)
    >>
    >
    > Which is a lot less than 10K instructions - probably around 2-4K
    > machine instructions, maybe fewer.
    So how many instructions will it take to read two integers (length of
    record, length of empty space behind record) from the record buffer
    and add them to the record pointer? 20, 50 instructions? That's less
    than 1% of the available budget. Come on!
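The step Axel is pricing here - read a couple of length fields, then bump the record pointer - looks roughly like this. This is a toy length-prefixed record format for illustration, not MyISAM's actual on-disk layout:

```python
import struct

# Toy walk over length-prefixed records in a scan buffer. Each record is a
# 4-byte little-endian payload length followed by the payload. Real MyISAM
# dynamic rows are more involved, but the per-record work is the same
# flavor: read the length field(s), then advance the record pointer.

def scan_records(buf: bytes) -> list[bytes]:
    pos = 0
    out = []
    while pos + 4 <= len(buf):
        (length,) = struct.unpack_from("<I", buf, pos)  # read length field
        pos += 4
        out.append(buf[pos:pos + length])
        pos += length   # the "add it to the record pointer" step
    return out

# Pack three variable-length payloads and scan them back.
payloads = [b"alpha", b"bo", b"charlie!"]
buf = b"".join(struct.pack("<I", len(p)) + p for p in payloads)
print(scan_records(buf) == payloads)   # True
```

The inner loop is a handful of machine operations per record, which is the basis of Axel's claim that it is negligible against a 10,000-cycle-per-row I/O budget.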
    >> Also, nobody reads single rows for table scans. MyISAM reads chunks of
    >> 8MB (default read_buffer_size, can be increased). There should be many
    >> records in this 8MB buffer. The only inconvenience from variable
    >> length records is the copying of the last, probably incomplete record
    >> to the buffer start to complete it with the next read. This effect
    >> will be negligible if 100 or more rows fit in the read buffer.
    >
    > Yes, 8M is the read buffer size. But that doesn't mean it reads 8M
    > in at one time.
    Sure it does. For TABLE SCANS.
    > there is no reason to read 8M if all you want is 200 bytes.
    We're doing a TABLE SCAN. We want *all* the records.
    > Secondly, now you've just wiped any previous operations in your buffer.
    The read buffer is per thread. A MySQL thread will only do one thing at
    a time. This time it's doing a TABLE SCAN.

    Jerry, sorry to say so, but I don't see any progress in this
    discussion. I feel I'm wasting my time. Unless you can come up
    with some hard facts substantiating your claims - please stop it!

    Thanks.

    XL
    --
    Axel Schwenke, Senior Software Developer, MySQL AB

    Online User Manual: [url]http://dev.mysql.com/doc/refman/5.0/en/[/url]
    MySQL User Forums: [url]http://forums.mysql.com/[/url]
    Axel Schwenke Guest

  20. #20

    Default Re: Constant char length is a good practice?

    Axel Schwenke wrote:
    > Jerry Stuckle <jstucklexattglobal.net> wrote:
    >
    >>Axel Schwenke wrote:
    >>
    >>>Jerry, you are right. Processing variable length records is more
    >>>expensive than processing fixed length records. But it's only CPU
    >>>cycles. Real world databases are always I/O bound, never CPU bound.
    >>
    >>Not necessarily. I've seen many cases where the system is running virtually 100% cpu for the
    >>database. Everything is in memory, so there is no I/O work being done.
    >
    >
    > I wouldn't call this a /database/ then. It's more like a cache.
    >
    Sure it's a database. But one of the things the DBM, OS and hardware
    can do to improve access is cache the data. And if you have the memory,
    large caches are great. But that doesn't mean it isn't a RDB.
    > Also this discussion started with "there may be a severe impact on
    > doing table scans" - your point in
    > <d-2dndCiTaQXHzXZnZ2dnUVZ_vOdnZ2dcomcast.com>.
    > Most of my arguments are specific to that. You're no longer talking
    > about table scans.
    >
    And many of those operations *may* be table scans. Or they can be more
    complex operations.
    >
    >>the CPU must perform the transfer from the disk buffer to RAM, or,
    >>if DMA (Direct Memory Access) is available, the CPU must pause while
    >>the data bus is being used for the transfer.
    >
    >
    > Are we talking about servers? There always is DMA. Also memory
    > bandwidth is far above the assumed 20MB/s. IIRC it is somewhere in the
    > 1GB/s range. So a running DMA does not hog the bus completely. BTW, my
    > home "server" - in fact a desktop machine with some more RAM and disks
    > - shows about 10% cpu time spent in kernel/interrupts when the I/O
    > system is maxed out (rebuild of soft RAID).
    >
    No, there is not always DMA. And no, DMA doesn't hog the bus completely.
    Please read what I said - "the CPU must pause while the data bus is
    being used for the transfer". That is still accurate.
    > For the curious: it's an AMD Athlon64/3000+ with 2GB RAM and 4 SATA
    > disks 400GB each. Running Debian/Sarge x64 edition.
    >
    That's one type of server. As an aside, mainframes have much better I/O
    and DMA - in fact, they can DMA without pausing the processing. But
    they still can be either I/O or CPU bound (or both at different times).

    But again, not everyone's running servers like this for their databases.
    >
    >>>Another calculation. Let's stay with 200 byte records. Assume we get
    >>>20MB/s from the hard disks. That's 100,000 rows per second. Then we can
    >>>spend 10,000 CPU cycles per row without losing speed. You don't need
    >>>10,000 cycles to read a few length fields from the record and add them.
    >>>(remember: the record is already in memory)
    >>>
    >>
    >>Which is a lot less than 10K instructions - probably around 2-4K
    >>machine instructions, maybe fewer.
    >
    >
    > So how many instructions will it take to read two integers (length of
    > record, length of empty space behind record) from the record buffer
    > and add them to the record pointer? 20, 50 instructions? That's less
    > than 1% of the available. Come on!
    >
    First you have to fetch the buffer, which will take a few hundred
    instructions. Then to fetch the buffer contents, you must make a call
    to the OS. The necessary context switch to ring 0 and back will take at
    least 50 instructions. You're probably talking at least 500
    instructions, just to fetch the two integers. That would probably
    average about 1.5K to 2K machine cycles - or 15-20% of the available CPU.
    >
    >>>Also, nobody reads single rows for table scans. MyISAM reads chunks of
    >>>8MB (default read_buffer_size, can be increased). There should be many
    >>>records in this 8MB buffer. The only inconvenience from variable
    >>>length records is the copying of the last, probably incomplete record
    >>>to the buffer start to complete it with the next read. This effect
    >>>will be negligible if 100 or more rows fit in the read buffer.
    >>
    >>Yes, 8M is the read buffer size. But that doesn't mean it reads 8M
    >>in at one time.
    >
    >
    > Sure it does. For TABLE SCANS.
    >
    I would be very surprised if MySQL did something so inefficient. Their
    developers are much smarter than that.
    >
    >>there is no reason to read 8M if all you want is 200 bytes.
    >
    >
    > We're doing a TABLE SCAN. We want *all* the records.
    >
    Eventually, yes. That doesn't mean you need to read all the records
    immediately. That's fine if you take this operation alone. But it also
    only makes sense if you completely ignore the fact that other operations
    may be taking place concurrently. No other DBM I'm familiar with is this
    inefficient, and I really doubt the MySQL developers would be that stupid.
    >
    >>Secondly, now you've just wiped any previous operations in your buffer.
    >
    >
    > The read buffer is per thread. A MySQL thread will only do one thing at
    > a time. This time it's doing a TABLE SCAN.
    >
    > Jerry, sorry to say so, but I don't see any progress in this
    > discussion. I feel I'm wasting my time. Unless you can come up
    > with some hard facts substantiating your claims - please stop it!
    >
    So you're saying that if thread 1 gets data in its buffer, then the next
    operation happens to get thread 2, that the data has to be read again?
    How inefficient!

    Yes, I agree there is no progress in this discussion. You have shown
    complete ignorance of the most basic operations - even of the fact
    that 1 machine cycle does not equal 1 machine instruction. You also
    have no idea what's involved in a context switch - much less in fetching
    data from the OS. And the operations as you describe them are so
    inefficient that no DBM developer with any brains would employ them.

    You really should get more experience in the low level operations and
    what's really required to do some of this work. I have. In fact I used
    to do a significant amount of assembler programming on the PC. This was
    following my assembler work on mainframes while working for IBM - which
    included both OS and DBM programming over the years.

    I agree. I'm wasting my time.
    > Thanks.
    >
    > XL
    > --
    > Axel Schwenke, Senior Software Developer, MySQL AB
    >
    > Online User Manual: [url]http://dev.mysql.com/doc/refman/5.0/en/[/url]
    > MySQL User Forums: [url]http://forums.mysql.com/[/url]

    --
    ==================
    Remove the "x" from my email address
    Jerry Stuckle
    JDS Computer Training Corp.
    [email]jstucklexattglobal.net[/email]
    ==================
    Jerry Stuckle Guest
