Professional Web Applications Themes

Should I specify the CHARACTER SET & COLLATE for UTF8 contents? - MySQL

Since even I use, e.g. CREATE DATABASE `test_ascii` DEFAULT CHARACTER SET latin1 COLLATE latin1_swedish_ci; CREATE TABLE `table` ( `str` TEXT CHARACTER SET latin1 COLLATE latin1_swedish_ci NOT NULL ) ENGINE = innodb; I can store UTF8 in the DB, even better, I don't need to set names "utf8" in each connection, so it is recommended? Any drawbacks? Thanks....

  1. #1

    Default Should I specify the CHARACTER SET & COLLATE for UTF8 contents?

    Since even I use, e.g.

    CREATE DATABASE `test_ascii` DEFAULT CHARACTER SET latin1 COLLATE
    latin1_swedish_ci;

    CREATE TABLE `table` (
    `str` TEXT CHARACTER SET latin1 COLLATE latin1_swedish_ci NOT NULL
    ) ENGINE = innodb;


    I can store UTF8 in the DB, even better, I don't need to set names
    "utf8" in each
    connection, so it is recommended? Any drawbacks?

    Thanks.

    howa Guest

  2. #2

    Default Re: Should I specify the CHARACTER SET & COLLATE for UTF8 contents?


    howa 寫道:
     

    any updates?

    howa Guest

  3. #3

    Default Re: Should I specify the CHARACTER SET & COLLATE for UTF8 contents?

    "howa" <com> wrote: 


    Howa,

    I'm beginning to get tired of answering you the same question over
    and over again. SET NAMES foo is just shorthand for

    SET character_set_client = foo;
    SET character_set_results = foo;
    SET character_set_connection = foo;

    So if if you SET NAMES utf8 you tell the database "everything I send
    is utf8 encoded" and "please encode everything you give back to me
    in utf8".

    OTOH, if you declare a column to be latin1 encoded, MySQL does not
    care if you store latin1, latin5 or utf8 in there - for most
    operations. Of course sorting and comparing strings is affected by
    the collation, but INSERT and SELECT are not.

    By default, character_set_client is latin1. If you INSERT something
    in a latin1 column, no conversation takes place. If you send utf8,
    you will get back utf8 later. BUT - if you SET NAMES utf8 and then
    insert into a latin1 column, MySQL will convert your data to latin1.
    Of course this will only work for a certain subset of your input -
    those characters that are available in latin1.


    Lesson to be learned: always be true to your database about the
    encoding you use. If you don't, bad things may happen.


    XL
    --
    Axel Schwenke, Senior Software Developer, MySQL AB

    Online User Manual: http://dev.mysql.com/doc/refman/5.0/en/
    MySQL User Forums: http://forums.mysql.com/
    Axel Guest

  4. #4

    Default Re: Should I specify the CHARACTER SET & COLLATE for UTF8 contents?


    Axel Schwenke 寫道:
     
    >
    >
    > Howa,
    >
    > I'm beginning to get tired of answering you the same question over
    > and over again. SET NAMES foo is just shorthand for
    >
    > SET character_set_client = foo;
    > SET character_set_results = foo;
    > SET character_set_connection = foo;
    >
    > So if if you SET NAMES utf8 you tell the database "everything I send
    > is utf8 encoded" and "please encode everything you give back to me
    > in utf8".
    >
    > OTOH, if you declare a column to be latin1 encoded, MySQL does not
    > care if you store latin1, latin5 or utf8 in there - for most
    > operations. Of course sorting and comparing strings is affected by
    > the collation, but INSERT and SELECT are not.
    >
    > By default, character_set_client is latin1. If you INSERT something
    > in a latin1 column, no conversation takes place. If you send utf8,
    > you will get back utf8 later. BUT - if you SET NAMES utf8 and then
    > insert into a latin1 column, MySQL will convert your data to latin1.
    > Of course this will only work for a certain subset of your input -
    > those characters that are available in latin1.
    >
    >
    > Lesson to be learned: always be true to your database about the
    > encoding you use. If you don't, bad things may happen.
    >
    >
    > XL
    > --
    > Axel Schwenke, Senior Software Developer, MySQL AB
    >
    > Online User Manual: http://dev.mysql.com/doc/refman/5.0/en/
    > MySQL User Forums: http://forums.mysql.com/[/ref]

    ok, thanks for your support...

    my last 2 questions:

    1. if we designed some columns might need to store UTF8 character,
    should we define the whole DB as UTF8, or just define UTF8 in the
    column definition? Which one is recommeded? for performance, for
    reliablity?

    2. Suppose my DB is defined as latin, but a table is UTF8, how to
    mysqlimport from a UTF8 text file to the UTF8 table? Currently, I found
    NO WAY to do so unless I alter the DB character set to UTF8. Since
    mysqlimport will simply ignore the table definition of UTF8, it will
    just use the DB default character set, and it is confusing.

    Thanks again!

    howa Guest

  5. #5

    Default Re: Should I specify the CHARACTER SET & COLLATE for UTF8 contents?

    "howa" <com> wrote: 

    The encoding/collation is an attribute of the column. Each string-type
    column has this attribute set when it is created. This attribute can
    only be changed with an explicite ALTER TABLE statement. The encoding
    setting of a column - and only this - determines how to store data in
    this column.

    The encoding/collation setting for table, database and server is just
    a fallback that is used if you create a column without specifying an
    encoding. You can change the encoding/collation setting of a table,
    a database or the whole server - and no data will actually change.

    The recommendation is to use an 8-bit encoding (and any collation)
    for server, database and table. So if you forget to specify the
    encoding for a new column, it will get a space-saving 8-bit encoding.
    However, if your application demands utf8 for *all* columns anyway,
    there is no problem to set the defaults to utf8.

    Just keep in mind that utf8 is expensive. Use it - for columns -
    only where neccessary. Setting just the table or database encoding
    to utf8 has no performance implication at all.
     

    If you use LOAD DATA INFILE regularly to load utf8 encoded data,
    this would be a good reason to set the database encoding to utf8.


    XL
    --
    Axel Schwenke, Senior Software Developer, MySQL AB

    Online User Manual: http://dev.mysql.com/doc/refman/5.0/en/
    MySQL User Forums: http://forums.mysql.com/
    Axel Guest

  6. #6

    Default Re: Should I specify the CHARACTER SET & COLLATE for UTF8 contents?


    Axel Schwenke 寫道:
     
    >
    > The encoding/collation is an attribute of the column. Each string-type
    > column has this attribute set when it is created. This attribute can
    > only be changed with an explicite ALTER TABLE statement. The encoding
    > setting of a column - and only this - determines how to store data in
    > this column.
    >
    > The encoding/collation setting for table, database and server is just
    > a fallback that is used if you create a column without specifying an
    > encoding. You can change the encoding/collation setting of a table,
    > a database or the whole server - and no data will actually change.
    >
    > The recommendation is to use an 8-bit encoding (and any collation)
    > for server, database and table. So if you forget to specify the
    > encoding for a new column, it will get a space-saving 8-bit encoding.
    > However, if your application demands utf8 for *all* columns anyway,
    > there is no problem to set the defaults to utf8.
    >
    > Just keep in mind that utf8 is expensive. Use it - for columns -
    > only where neccessary. Setting just the table or database encoding
    > to utf8 has no performance implication at all.

    >
    > If you use LOAD DATA INFILE regularly to load utf8 encoded data,
    > this would be a good reason to set the database encoding to utf8.
    >
    >
    > XL
    > --
    > Axel Schwenke, Senior Software Developer, MySQL AB
    >
    > Online User Manual: http://dev.mysql.com/doc/refman/5.0/en/
    > MySQL User Forums: http://forums.mysql.com/[/ref]

    Thanks so much!

    howa Guest

Similar Threads

  1. Set collate property in Adobe Reader
    By Douglas_Willis@adobeforums.com in forum Adobe Acrobat SDK
    Replies: 18
    Last Post: November 5th, 07:54 PM
  2. #40072 [NEW]: UTF8 file include output return unknow character.
    By plasmapermanent at msn dot com in forum PHP Bugs
    Replies: 2
    Last Post: January 9th, 11:51 AM
  3. Listing contents of a directory folder to contents ofdirectory
    By Rubex_Cube in forum Coldfusion - Advanced Techniques
    Replies: 2
    Last Post: April 11th, 10:23 PM
  4. Collate in InDesign CS
    By Mark_Gale@adobeforums.com in forum Adobe Indesign Windows
    Replies: 0
    Last Post: June 4th, 05:53 PM
  5. Collate
    By Jacco Schalkwijk in forum Microsoft SQL / MS SQL Server
    Replies: 0
    Last Post: July 11th, 11:23 AM

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139