Globalization - MariaDB - Databases - Software - Computers


Globalization
Prev Next

Globalization

Table of Contents

Character Set Support
Character Sets and Collations in General
Character Sets and Collations in MySQL
Specifying Character Sets and Collations
Connection Character Sets and Collations
Configuring the Character Set and Collation for Applications
Character Set for Error Messages
Collation Issues
String Repertoire
Operations Affected by Character Set Support
Unicode Support
Upgrading from Previous to Current Unicode Support
UTF-8 for Metadata
Column Character Set Conversion
Character Sets and Collations That MariaDB Supports
Setting the Error Message Language
Adding a Character Set
Character Definition Arrays
String Collating Support for Complex Character Sets
Multi-Byte Character Support for Complex Character Sets
Adding a Collation to a Character Set
Collation Implementation Types
Choosing a Collation ID
Adding a Simple Collation to an 8-Bit Character Set
Adding a UCA Collation to a Unicode Character Set
Character Set Configuration
MySQL Server Time Zone Support
Staying Current with Time Zone Changes
Time Zone Leap Second Support
MySQL Server Locale Support

This chapter covers issues of globalization, which includes internationalization (MySQL's capabilities for adapting to local use) and localization (selecting particular local conventions):

Character Set Support

Character Sets and Collations in General
Character Sets and Collations in MySQL
Specifying Character Sets and Collations
Connection Character Sets and Collations
Configuring the Character Set and Collation for Applications
Character Set for Error Messages
Collation Issues
String Repertoire
Operations Affected by Character Set Support
Unicode Support
Upgrading from Previous to Current Unicode Support
UTF-8 for Metadata
Column Character Set Conversion
Character Sets and Collations That MariaDB Supports

MySQL includes character set support that enables you to store data using a variety of character sets and perform comparisons according to a variety of collations. You can specify character sets at the server, database, table, and column level. MariaDB supports the use of character sets for the MyISAM, MEMORY, and InnoDB storage engines.

This chapter discusses the following topics:

Character set issues affect not only data storage, but also communication between client programs and the MariaDB server. If you want the client program to communicate with the server using a character set different from the default, you'll need to indicate which one. For example, to use the utf8 Unicode character set, issue this statement after connecting to the server:

SET NAMES 'utf8';

For more information about configuring character sets for application use and character set-related issues in client/server communication, see , "Configuring the Character Set and Collation for Applications", and , "Connection Character Sets and Collations".

Character Sets and Collations in General

A character set is a set of symbols and encodings. A collation is a set of rules for comparing characters in a character set. Let's make the distinction clear with an example of an imaginary character set.

Suppose that we have an alphabet with four letters: "A", "B", "a", "b". We give each letter a number: "A" = 0, "B" = 1, "a" = 2, "b" = 3. The letter "A" is a symbol, the number 0 is the encoding for "A", and the combination of all four letters and their encodings is a character set.

Suppose that we want to compare two string values, "A" and "B". The simplest way to do this is to look at the encodings: 0 for "A" and 1 for "B". Because 0 is less than 1, we say "A" is less than "B". What we've just done is apply a collation to our character set. The collation is a set of rules (only one rule in this case): "compare the encodings." We call this simplest of all possible collations a binary collation.

But what if we want to say that the lowercase and uppercase letters are equivalent? Then we would have at least two rules: (1) treat the lowercase letters "a" and "b" as equivalent to "A" and "B"; (2) then compare the encodings. We call this a case-insensitive collation. It is a little more complex than a binary collation.

In real life, most character sets have many characters: not just "A" and "B" but whole alphabets, sometimes multiple alphabets or eastern writing systems with thousands of characters, along with many special symbols and punctuation marks. Also in real life, most collations have many rules, not just for whether to distinguish lettercase, but also for whether to distinguish accents (an "accent" is a mark attached to a character as in German "Ö"), and for multiple-character mappings (such as the rule that "Ö" = "OE" in one of the two German collations).

MySQL can do these things for you:

In these respects, MariaDB is far ahead of most other database management systems. However, to use these features effectively, you need to know what character sets and collations are available, how to change the defaults, and how they affect the behavior of string operators and functions.

Character Sets and Collations in MariaDB

The MariaDB server can support multiple character sets. To list the available character sets, use the SHOW CHARACTER SET statement. A partial listing follows. For more complete information, see , "Character Sets and Collations That MariaDB Supports".

mysql> SHOW CHARACTER SET;
+----------+-----------------------------+---------------------+--------+
| Charset | Description | Default collation | Maxlen |
+----------+-----------------------------+---------------------+--------+
| big5 | Big5 Traditional Chinese | big5_chinese_ci | 2 |
| dec8 | DEC West European | dec8_swedish_ci | 1 |
| cp850 | DOS West European | cp850_general_ci | 1 |
| hp8 | HP West European | hp8_english_ci | 1 |
| koi8r | KOI8-R Relcom Russian | koi8r_general_ci | 1 |
| latin1 | cp1252 West European | latin1_swedish_ci | 1 |
| latin2 | ISO 8859-2 Central European | latin2_general_ci | 1 |
| swe7 | 7bit Swedish | swe7_swedish_ci | 1 |
| ascii | US ASCII | ascii_general_ci | 1 |
| ujis | EUC-JP Japanese | ujis_japanese_ci | 3 |
| sjis | Shift-JIS Japanese | sjis_japanese_ci | 2 |
| hebrew | ISO 8859-8 Hebrew | hebrew_general_ci | 1 |
| tis620 | TIS620 Thai | tis620_thai_ci | 1 |
| euckr | EUC-KR Korean | euckr_korean_ci | 2 |
| koi8u | KOI8-U Ukrainian | koi8u_general_ci | 1 |
| gb2312 | GB2312 Simplified Chinese | gb2312_chinese_ci | 2 |
| greek | ISO 8859-7 Greek | greek_general_ci | 1 |
| cp1250 | Windows Central European | cp1250_general_ci | 1 |
| gbk | GBK Simplified Chinese | gbk_chinese_ci | 2 |
| latin5 | ISO 8859-9 Turkish | latin5_turkish_ci | 1 |
...

Any given character set always has at least one collation. It may have several collations. To list the collations for a character set, use the SHOW COLLATION statement. For example, to see the collations for the latin1 (cp1252 West European) character set, use this statement to find those collation names that begin with latin1:

mysql> SHOW COLLATION LIKE 'latin1%';
+---------------------+---------+----+---------+----------+---------+
| Collation | Charset | Id | Default | Compiled | Sortlen |
+---------------------+---------+----+---------+----------+---------+
| latin1_german1_ci | latin1 | 5 | | | 0 |
| latin1_swedish_ci | latin1 | 8 | Yes | Yes | 1 |
| latin1_danish_ci | latin1 | 15 | | | 0 |
| latin1_german2_ci | latin1 | 31 | | Yes | 2 |
| latin1_bin | latin1 | 47 | | Yes | 1 |
| latin1_general_ci | latin1 | 48 | | | 0 |
| latin1_general_cs | latin1 | 49 | | | 0 |
| latin1_spanish_ci | latin1 | 94 | | | 0 |
+---------------------+---------+----+---------+----------+---------+

The latin1 collations have the following meanings.

Collation Meaning
latin1_german1_ci German DIN-1
latin1_swedish_ci Swedish/Finnish
latin1_danish_ci Danish/Norwegian
latin1_german2_ci German DIN-2
latin1_bin Binary according to latin1 encoding
latin1_general_ci Multilingual (Western European)
latin1_general_cs Multilingual (ISO Western European), case sensitive
latin1_spanish_ci Modern Spanish

Collations have these general characteristics:

In cases where a character set has multiple collations, it might not be clear which collation is most suitable for a given application. To avoid choosing the wrong collation, it can be helpful to perform some comparisons with representative data values to make sure that a given collation sorts values the way you expect.

Collation-Charts.Org is a useful site for information that shows how one collation compares to another.

Specifying Character Sets and Collations

Server Character Set and Collation
Database Character Set and Collation
Table Character Set and Collation
Column Character Set and Collation
Character String Literal Character Set and Collation
National Character Set
Examples of Character Set and Collation Assignment
Compatibility with Other DBMSs

There are default settings for character sets and collations at four levels: server, database, table, and column. The description in the following sections may appear complex, but it has been found in practice that multiple-level defaulting leads to natural and obvious results.

CHARACTER SET is used in clauses that specify a character set. CHARSET can be used as a synonym for CHARACTER SET.

Character set issues affect not only data storage, but also communication between client programs and the MariaDB server. If you want the client program to communicate with the server using a character set different from the default, you'll need to indicate which one. For example, to use the utf8 Unicode character set, issue this statement after connecting to the server:

SET NAMES 'utf8';

For more information about character set-related issues in client/server communication, see , "Connection Character Sets and Collations".

Server Character Set and Collation

MySQL Server has a server character set and a server collation. These can be set at server startup on the command line or in an option file and changed at runtime.

Initially, the server character set and collation depend on the options that you use when you start mysqld. You can use --character-set-server for the character set. Along with it, you can add --collation-server for the collation. If you don't specify a character set, that is the same as saying --character-set-server=latin1. If you specify only a character set (for example, latin1) but not a collation, that is the same as saying --character-set-server=latin1 --collation-server=latin1_swedish_ci because latin1_swedish_ci is the default collation for latin1. Therefore, the following three commands all have the same effect:

shell> mysqld
shell> mysqld --character-set-server=latin1
shell> mysqld --character-set-server=latin1 \
 --collation-server=latin1_swedish_ci

One way to change the settings is by recompiling. To change the default server character set and collation when building from sources, use the DEFAULT_CHARSET and DEFAULT_COLLATION options for CMake. For example:

shell> cmake . -DDEFAULT_CHARSET=latin1

Or:

shell> cmake . -DDEFAULT_CHARSET=latin1 \
 -DDEFAULT_COLLATION=latin1_german1_ci

Both mysqld and CMake verify that the character set/collation combination is valid. If not, each program displays an error message and terminates.

The server character set and collation are used as default values if the database character set and collation are not specified in CREATE DATABASE statements. They have no other purpose.

The current server character set and collation can be determined from the values of the character_set_server and collation_server system variables. These variables can be changed at runtime.

Database Character Set and Collation

Every database has a database character set and a database collation. The CREATE DATABASE and ALTER DATABASE statements have optional clauses for specifying the database character set and collation:

CREATE DATABASE db_name
 [[DEFAULT] CHARACTER SET charset_name]
 [[DEFAULT] COLLATE collation_name]
ALTER DATABASE db_name
 [[DEFAULT] CHARACTER SET charset_name]
 [[DEFAULT] COLLATE collation_name]

The keyword SCHEMA can be used instead of DATABASE.

All database options are stored in a text file named db.opt that can be found in the database directory.

The CHARACTER SET and COLLATE clauses make it possible to create databases with different character sets and collations on the same MariaDB server.

Example:

CREATE DATABASE db_name CHARACTER SET latin1 COLLATE latin1_swedish_ci;

MySQL chooses the database character set and database collation in the following manner:

The database character set and collation are used as default values for table definitions if the table character set and collation are not specified in CREATE TABLE statements. The database character set also is used by LOAD DATA INFILE. The character set and collation have no other purposes.

The character set and collation for the default database can be determined from the values of the character_set_database and collation_database system variables. The server sets these variables whenever the default database changes. If there is no default database, the variables have the same value as the corresponding server-level system variables, character_set_server and collation_server.

Table Character Set and Collation

Every table has a table character set and a table collation. The CREATE TABLE and ALTER TABLE statements have optional clauses for specifying the table character set and collation:

CREATE TABLE tbl_name (column_list)
 [[DEFAULT] CHARACTER SET charset_name]
 [COLLATE collation_name]]
ALTER TABLE tbl_name
 [[DEFAULT] CHARACTER SET charset_name]
 [COLLATE collation_name]

Example:

CREATE TABLE t1 ( ... )
CHARACTER SET latin1 COLLATE latin1_danish_ci;

MySQL chooses the table character set and collation in the following manner:

The table character set and collation are used as default values for column definitions if the column character set and collation are not specified in individual column definitions. The table character set and collation are MariaDB extensions; there are no such things in standard SQL.

Column Character Set and Collation

Every "character" column (that is, a column of type CHAR, VARCHAR, or TEXT) has a column character set and a column collation. Column definition syntax for CREATE TABLE and ALTER TABLE has optional clauses for specifying the column character set and collation:

col_name {CHAR | VARCHAR | TEXT} (col_length)
 [CHARACTER SET charset_name]
 [COLLATE collation_name]

These clauses can also be used for ENUM and SET columns:

col_name {ENUM | SET} (val_list)
 [CHARACTER SET charset_name]
 [COLLATE collation_name]

Examples:

CREATE TABLE t1
(
 col1 VARCHAR(5)
 CHARACTER SET latin1
 COLLATE latin1_german1_ci
);
ALTER TABLE t1 MODIFY
 col1 VARCHAR(5)
 CHARACTER SET latin1
 COLLATE latin1_swedish_ci;

MySQL chooses the column character set and collation in the following manner:

The CHARACTER SET and COLLATE clauses are standard SQL.

If you use ALTER TABLE to convert a column from one character set to another, MariaDB attempts to map the data values, but if the character sets are incompatible, there may be data loss.

Character String Literal Character Set and Collation

Every character string literal has a character set and a collation.

A character string literal may have an optional character set introducer and COLLATE clause:

[_charset_name]'string' [COLLATE collation_name]

Examples:

SELECT 'string';
SELECT _latin1'string';
SELECT _latin1'string' COLLATE latin1_danish_ci;

For the simple statement SELECT 'string', the string has the character set and collation defined by the character_set_connection and collation_connection system variables.

The _charset_name expression is formally called an introducer. It tells the parser, "the string that is about to follow uses character set X." Because this has confused people in the past, we emphasize that an introducer does not change the string to the introducer character set like CONVERT() would do. It does not change the string's value, although padding may occur. The introducer is just a signal. An introducer is also legal before standard hex literal and numeric hex literal notation (x'literal' and 0xnnnn), or before bit-field literal notation (b'literal' and 0bnnnn).

Examples:

SELECT _latin1 x'AABBCC';
SELECT _latin1 0xAABBCC;
SELECT _latin1 b'1100011';
SELECT _latin1 0b1100011;

MySQL determines a literal's character set and collation in the following manner:

Examples:

Character set introducers and the COLLATE clause are implemented according to standard SQL specifications.

An introducer indicates the character set for the following string, but does not change now how the parser performs escape processing within the string. Escapes are always interpreted by the parser according to the character set given by character_set_connection.

The following examples show that escape processing occurs using character_set_connection even in the presence of an introducer. The examples use SET NAMES (which changes character-set-connection, as discussed in , "Connection Character Sets and Collations"), and display the resulting strings using the HEX() function so that the exact string contents can be seen.

Example 1:

mysql> SET NAMES latin1;
Query OK, 0 rows affected (0.01 sec)
mysql> SELECT HEX('à\n'), HEX(_sjis'à\n');
+------------+-----------------+
| HEX('à\n') | HEX(_sjis'à\n') |
+------------+-----------------+
| E00A | E00A |
+------------+-----------------+
1 row in set (0.00 sec)

Here, "à" (hex value E0) is followed by "\n", the escape sequence for newline. The escape sequence is interpreted using the character_set_connection value of latin1 to produce a literal newline (hex value 0A). This happens even for the second string. That is, the introducer of _sjis does not affect the parser's escape processing.

Example 2:

mysql> SET NAMES sjis;
Query OK, 0 rows affected (0.00 sec)
mysql> SELECT HEX('à\n'), HEX(_latin1'à\n');
+------------+-------------------+
| HEX('à\n') | HEX(_latin1'à\n') |
+------------+-------------------+
| E05C6E | E05C6E |
+------------+-------------------+
1 row in set (0.04 sec)

Here, character_set_connection is sjis, a character set in which the sequence of "à" followed by "\" (hex values 05 and 5C) is a valid multi-byte character. Hence, the first two bytes of the string are interpreted as a single sjis character, and the "\" is not interpreted as an escape character. The following "n" (hex value 6E) is not interpreted as part of an escape sequence. This is true even for the second string; the introducer of _latin1 does not affect escape processing.

National Character Set

Standard SQL defines NCHAR or NATIONAL CHAR as a way to indicate that a CHAR column should use some predefined character set. MariaDB 5.6 uses utf8 as this predefined character set. For example, these data type declarations are equivalent:

CHAR(10) CHARACTER SET utf8
NATIONAL CHARACTER(10)
NCHAR(10)

As are these:

VARCHAR(10) CHARACTER SET utf8
NATIONAL VARCHAR(10)
NCHAR VARCHAR(10)
NATIONAL CHARACTER VARYING(10)
NATIONAL CHAR VARYING(10)

You can use N'literal' (or n'literal') to create a string in the national character set. These statements are equivalent:

SELECT N'some text';
SELECT n'some text';
SELECT _utf8'some text';

For information on upgrading character sets to MariaDB 5.6 from versions prior to 4.1, see the MySQL 3.23, 4.0, 4.1 Reference Manual.

Examples of Character Set and Collation Assignment

The following examples show how MariaDB determines default character set and collation values.

Example 1: Table and Column Definition

CREATE TABLE t1
(
 c1 CHAR(10) CHARACTER SET latin1 COLLATE latin1_german1_ci
) DEFAULT CHARACTER SET latin2 COLLATE latin2_bin;

Here we have a column with a latin1 character set and a latin1_german1_ci collation. The definition is explicit, so that is straightforward. Notice that there is no problem with storing a latin1 column in a latin2 table.

Example 2: Table and Column Definition

CREATE TABLE t1
(
 c1 CHAR(10) CHARACTER SET latin1
) DEFAULT CHARACTER SET latin1 COLLATE latin1_danish_ci;

This time we have a column with a latin1 character set and a default collation. Although it might seem natural, the default collation is not taken from the table level. Instead, because the default collation for latin1 is always latin1_swedish_ci, column c1 has a collation of latin1_swedish_ci (not latin1_danish_ci).

Example 3: Table and Column Definition

CREATE TABLE t1
(
 c1 CHAR(10)
) DEFAULT CHARACTER SET latin1 COLLATE latin1_danish_ci;

We have a column with a default character set and a default collation. In this circumstance, MariaDB checks the table level to determine the column character set and collation. Consequently, the character set for column c1 is latin1 and its collation is latin1_danish_ci.

Example 4: Database, Table, and Column Definition

CREATE DATABASE d1
 DEFAULT CHARACTER SET latin2 COLLATE latin2_czech_ci;
USE d1;
CREATE TABLE t1
(
 c1 CHAR(10)
);

We create a column without specifying its character set and collation. We're also not specifying a character set and a collation at the table level. In this circumstance, MariaDB checks the database level to determine the table settings, which thereafter become the column settings.) Consequently, the character set for column c1 is latin2 and its collation is latin2_czech_ci.

Compatibility with Other DBMSs

For MaxDB compatibility these two statements are the same:

CREATE TABLE t1 (f1 CHAR(N) UNICODE);
CREATE TABLE t1 (f1 CHAR(N) CHARACTER SET ucs2);

Connection Character Sets and Collations

Several character set and collation system variables relate to a client's interaction with the server. Some of these have been mentioned in earlier sections:

Additional character set and collation system variables are involved in handling traffic for the connection between a client and the server. Every client has connection-related character set and collation system variables.

A "connection" is what you make when you connect to the server. The client sends SQL statements, such as queries, over the connection to the server. The server sends responses, such as result sets or error messages, over the connection back to the client. This leads to several questions about character set and collation handling for client connections, each of which can be answered in terms of system variables:

Clients can fine-tune the settings for these variables, or depend on the defaults (in which case, you can skip the rest of this section). If you do not use the defaults, you must change the character settings for each connection to the server.

Two statements affect the connection-related character set variables as a group:

Note

ucs2, utf16, utf16le, and utf32 cannot be used as a client character set, which means that they do not work for SET NAMES or SET CHARACTER SET.

The MariaDB client programs MariaDB, mysqladmin, mysqlcheck, mysqlimport, and mysqlshow determine the default character set to use as follows:

When a client connects to the server, it sends the name of the character set that it wants to use. The server uses the name to set the character_set_client, character-set-results, and character_set_connection system variables. In effect, the server performs a SET NAMES operation using the character set name.

With the mysql client, if you want to use a character set different from the default, you could explicitly execute SET NAMES every time you start up. However, to accomplish the same result more easily, you can add the --default-character-set option setting to your mysql command line or in your option file. For example, the following option file setting changes the three connection-related character set variables set to koi8r each time you invoke mysql:

[mysql]
default-character-set=koi8r

If you are using the mysql client with auto-reconnect enabled (which is not recommended), it is preferable to use the charset command rather than SET NAMES. For example:

mysql> charset utf8
Charset changed

The charset command issues a SET NAMES statement, and also changes the default character set that mysql uses when it reconnects after the connection has dropped.

Example: Suppose that column1 is defined as CHAR(5) CHARACTER SET latin2. If you do not say SET NAMES or SET CHARACTER SET, then for SELECT column1 FROM t, the server sends back all the values for column1 using the character set that the client specified when it connected. On the other hand, if you say SET NAMES 'latin1' or SET CHARACTER SET latin1 before issuing the SELECT statement, the server converts the latin2 values to latin1 just before sending results back. Conversion may be lossy if there are characters that are not in both character sets.

If you do not want the server to perform any conversion of result sets or error messages, set character_set_results to NULL or binary:

SET character_set_results = NULL;

To see the values of the character set and collation system variables that apply to your connection, use these statements:

SHOW VARIABLES LIKE 'character_set%';
SHOW VARIABLES LIKE 'collation%';

You must also consider the environment within which your MariaDB applications execute. See , "Configuring the Character Set and Collation for Applications".

For more information about character sets and error messages, see , "Character Set for Error Messages".

Configuring the Character Set and Collation for Applications

For applications that store data using the default MariaDB character set and collation (latin1, latin1_swedish_ci), no special configuration should be needed. If applications require data storage using a different character set or collation, you can configure character set information several ways:

When different applications require different character settings, the per-database technique provides a good deal of flexibility. If most or all applications use the same character set, specifying character settings at server startup or configuration time may be most convenient.

For the per-database or server-startup techniques, the settings control the character set for data storage. Applications must also tell the server which character set to use for client/server communications, as described in the following instructions.

The examples shown here assume use of the utf8 character set and utf8_general_ci collation.

Specify character settings per database. To create a database such that its tables will use a given default character set and collation for data storage, use a CREATE DATABASE statement like this:

CREATE DATABASE mydb
 DEFAULT CHARACTER SET utf8
 DEFAULT COLLATE utf8_general_ci;

Tables created in the database will use utf8 and utf8_general_ci by default for any character columns.

Applications that use the database should also configure their connection to the server each time they connect. This can be done by executing a SET NAMES 'utf8' statement after connecting. The statement can be used regardless of connection method: The mysql client, PHP scripts, and so forth.

In some cases, it may be possible to configure the connection to use the desired character set some other way. For example, for connections made using mysql, you can specify the --default-character-set=utf8 command-line option to achieve the same effect as SET NAMES 'utf8'.

For more information about configuring client connections, see , "Connection Character Sets and Collations".

If you change the default character set or collation for a database, stored routines that use the database defaults must be dropped and recreated so that they use the new defaults. (In a stored routine, variables with character data types use the database defaults if the character set or collation are not specified explicitly. See , "CREATE PROCEDURE and CREATE FUNCTION Syntax".)

Specify character settings at server startup. To select a character set and collation at server startup, use the --character-set-server and --collation-server options. For example, to specify the options in an option file, include these lines:

[mysqld]
character-set-server=utf8
collation-server=utf8_general_ci

These settings apply server-wide and apply as the defaults for databases created by any application, and for tables created in those databases.

It is still necessary for applications to configure their connection using SET NAMES or equivalent after they connect, as described previously. You might be tempted to start the server with the --init_connect='SET NAMES 'utf8'' option to cause SET NAMES to be executed automatically for each client that connects. However, this will yield inconsistent results because the init-connect value is not executed for users who have the SUPER privilege.

Specify character settings at MariaDB configuration time. To select a character set and collation when you configure and build MariaDB from source, use the DEFAULT_CHARSET and DEFAULT_COLLATION options for CMake:

shell> cmake . -DDEFAULT_CHARSET=utf8 \
 -DDEFAULT_COLLATION=utf8_general_ci

The resulting server uses utf8 and utf8_general_ci as the default for databases and tables and for client connections. It is unnecessary to use --character-set-server and --collation-server to specify those defaults at server startup. It is also unnecessary for applications to configure their connection using SET NAMES or equivalent after they connect to the server.

Regardless of how you configure the MariaDB character set for application use, you must also consider the environment within which those applications execute. If you will send statements using UTF-8 text taken from a file that you create in an editor, you should edit the file with the locale of your environment set to UTF-8 so that the file encoding is correct and so that the operating system handles it correctly. If you use the mysql client from within a terminal window, the window must be configured to use UTF-8 or characters may not display properly. For a script that executes in a Web environment, the script must handle character encoding properly for its interaction with the MariaDB server, and it must generate pages that correctly indicate the encoding so that browsers know how to display the content of the pages. For example, you can include this <meta> tag within your <head> element:

<meta http-equiv='Content-Type' content='text/html; charset=utf-8' />

Character Set for Error Messages

This section describes how the server uses character sets for constructing error messages and returning them to clients. For information about the language of error messages (rather than the character set), see , "Setting the Error Message Language".

In MariaDB 5.6, the server constructs error messages using UTF-8 and returns them to clients in the character set specified by the character_set_results system variable.

The server constructs error messages as follows:

To return a message to the client after it has been constructed, the server converts it from UTF-8 to the character set specified by the character_set_results system variable. If character_set_results has a value of NULL or binary, no conversion occurs. No conversion occurs if the variable value is utf8, either, because that matches the original error message character set.

For characters that cannot be represented in character_set_results, some encoding may occur during the conversion. The encoding uses Unicode code point values:

Clients can set character_set_results to control the character set in which they receive error messages. The variable can be set directly, or indirectly by means such as SET NAMES. For more information about character_set_results, see , "Connection Character Sets and Collations".

The encoding that occurs during the conversion to character_set_results before returning error messages to clients can result in different message content compared to earlier versions (before MariaDB 5.5). For example, if an error occurs for an attempt to drop a table named (KATAKANA LETTER PE) and character_set_results is a character set such as latin1 that does not contain that character, the resulting message sent to the client has an encoded table name:

ERROR 1051 (42S02): Unknown table '\30DA'

Before MariaDB 5.5, the name is not encoded:

ERROR 1051 (42S02): Unknown table 'ペ'

Collation Issues

Collation Names
Using COLLATE in SQL Statements
COLLATE Clause Precedence
Collations Must Be for the Right Character Set
Collation of Expressions
The _bin and binary Collations
The BINARY Operator
Examples of the Effect of Collation
Collation and INFORMATION_SCHEMA Searches

The following sections discuss various aspects of character set collations.

Collation Names

MySQL collation names follow these rules:

Using COLLATE in SQL Statements

With the COLLATE clause, you can override whatever the default collation is for a comparison. COLLATE may be used in various parts of SQL statements. Here are some examples:

COLLATE Clause Precedence

The COLLATE clause has high precedence (higher than ||), so the following two expressions are equivalent:

x || y COLLATE z x || (y COLLATE z)

Collations Must Be for the Right Character Set

Each character set has one or more collations, but each collation is associated with one and only one character set. Therefore, the following statement causes an error message because the latin2_bin collation is not legal with the latin1 character set:

mysql> SELECT _latin1 'x' COLLATE latin2_bin;
ERROR 1253 (42000): COLLATION 'latin2_bin' is not valid for CHARACTER SET 'latin1'

Collation of Expressions

In the great majority of statements, it is obvious what collation MariaDB uses to resolve a comparison operation. For example, in the following cases, it should be clear that the collation is the collation of column x:

SELECT x FROM T ORDER BY x;
SELECT x FROM T WHERE x = x;
SELECT DISTINCT x FROM T;

However, with multiple operands, there can be ambiguity. For example:

SELECT x FROM T WHERE x = 'Y';

Should the comparison use the collation of the column x, or of the string literal 'Y'? Both x and 'Y' have collations, so which collation takes precedence?

Standard SQL resolves such questions using what used to be called "coercibility" rules. MariaDB assigns coercibility values as follows:

MySQL uses coercibility values with the following rules to resolve ambiguities:

Although automatic conversion is not in the SQL standard, the SQL standard document does say that every character set is (in terms of supported characters) a "subset" of Unicode. Because it is a well-known principle that "what applies to a superset can apply to a subset," we believe that a collation for Unicode can apply for comparisons with non-Unicode strings.

Examples:

Comparison Collation Used
column1 = 'A' Use collation of column1
column1 = 'A' COLLATE x Use collation of 'A' COLLATE x
column1 COLLATE x = 'A' COLLATE y Error

The COERCIBILITY() function can be used to determine the coercibility of a string expression:

mysql> SELECT COERCIBILITY('A' COLLATE latin1_swedish_ci);
 -> 0
mysql> SELECT COERCIBILITY(VERSION());
 -> 3
mysql> SELECT COERCIBILITY('A');
 -> 4

See , "Information Functions".

For implicit conversion of a numeric or temporal value to a string, such as occurs for the argument 1 in the expression CONCAT(1, 'abc'), the result is a character (nonbinary) string that has a character set and collation determined by the character_set_connection and collation_connection system variables. See , "Type Conversion in Expression Evaluation".

The _bin and binary Collations

This section describes how _bin collations for nonbinary strings differ from the binary "collation" for binary strings.

Nonbinary strings (as stored in the CHAR, VARCHAR, and TEXT data types) have a character set and collation. A given character set can have several collations, each of which defines a particular sorting and comparison order for the characters in the set. One of these is the binary collation for the character set, indicated by a _bin suffix in the collation name. For example, latin1 and utf8 have binary collations named latin1_bin and utf8_bin.

Binary strings (as stored in the BINARY, VARBINARY, and BLOB data types) have no character set or collation in the sense that nonbinary strings do. (Applied to a binary string, the CHARSET() and COLLATION() functions both return a value of binary.) Binary strings are sequences of bytes and the numeric values of those bytes determine sort order.

The _bin collations differ from the binary collation in several respects.

The unit for sorting and comparison. Binary strings are sequences of bytes. Sorting and comparison is always based on numeric byte values. Nonbinary strings are sequences of characters, which might be multi-byte. Collations for nonbinary strings define an ordering of the character values for sorting and comparison. For the _bin collation, this ordering is based solely on binary code values of the characters (which is similar to ordering for binary strings except that a _bin collation must take into account that a character might contain multiple bytes). For other collations, character ordering might take additional factors such as lettercase into account.

Character set conversion. A nonbinary string has a character set and is converted to another character set in many cases, even when the string has a _bin collation:

For binary string columns, no conversion occurs. For the preceding cases, the string value is copied byte-wise.

Lettercase conversion. Collations provide information about lettercase of characters, so characters in a nonbinary string can be converted from one lettercase to another, even for _bin collations that ignore lettercase for ordering:

mysql> SET NAMES latin1 COLLATE latin1_bin;
Query OK, 0 rows affected (0.02 sec)
mysql> SELECT LOWER('aA'), UPPER('zZ');
+-------------+-------------+
| LOWER('aA') | UPPER('zZ') |
+-------------+-------------+
| aa | ZZ |
+-------------+-------------+
1 row in set (0.13 sec)

The concept of lettercase does not apply to bytes in a binary string. To perform lettercase conversion, the string must be converted to a nonbinary string:

mysql> SET NAMES binary;
Query OK, 0 rows affected (0.00 sec)
mysql> SELECT LOWER('aA'), LOWER(CONVERT('aA' USING latin1));
+-------------+-----------------------------------+
| LOWER('aA') | LOWER(CONVERT('aA' USING latin1)) |
+-------------+-----------------------------------+
| aA | aa |
+-------------+-----------------------------------+
1 row in set (0.00 sec)

Trailing space handling in comparisons. Nonbinary strings have PADSPACE behavior for all collations, including _bin collations. Trailing spaces are insignificant in comparisons:

mysql> SET NAMES utf8 COLLATE utf8_bin;
Query OK, 0 rows affected (0.00 sec)
mysql> SELECT 'a ' = 'a';
+------------+
| 'a ' = 'a' |
+------------+
| 1 |
+------------+
1 row in set (0.00 sec)

For binary strings, all characters are significant in comparisons, including trailing spaces:

mysql> SET NAMES binary;
Query OK, 0 rows affected (0.00 sec)
mysql> SELECT 'a ' = 'a';
+------------+
| 'a ' = 'a' |
+------------+
| 0 |
+------------+
1 row in set (0.00 sec)

Trailing space handling for inserts and retrievals. CHAR(N) columns store nonbinary strings. Values shorter than N characters are extended with spaces on insertion. For retrieval, trailing spaces are removed.

BINARY(N) columns store binary strings. Values shorter than N bytes are extended with 0x00 bytes on insertion. For retrieval, nothing is removed; a value of the declared length is always returned.

mysql> CREATE TABLE t1 (
 ->  a CHAR(10) CHARACTER SET utf8 COLLATE utf8_bin,
 ->  b BINARY(10)
 -> );
Query OK, 0 rows affected (0.09 sec)
mysql> INSERT INTO t1 VALUES ('a','a');
Query OK, 1 row affected (0.01 sec)
mysql> SELECT HEX(a), HEX(b) FROM t1;
+--------+----------------------+
| HEX(a) | HEX(b) |
+--------+----------------------+
| 61 | 61000000000000000000 |
+--------+----------------------+
1 row in set (0.04 sec)

The BINARY Operator

The BINARY operator casts the string following it to a binary string. This is an easy way to force a comparison to be done byte by byte rather than character by character. BINARY also causes trailing spaces to be significant.

mysql> SELECT 'a' = 'A';
 -> 1
mysql> SELECT BINARY 'a' = 'A';
 -> 0
mysql> SELECT 'a' = 'a ';
 -> 1
mysql> SELECT BINARY 'a' = 'a ';
 -> 0

BINARY str is shorthand for CAST(str AS BINARY).

The BINARY attribute in character column definitions has a different effect. A character column defined with the BINARY attribute is assigned the binary collation of the column character set. Every character set has a binary collation. For example, the binary collation for the latin1 character set is latin1_bin, so if the table default character set is latin1, these two column definitions are equivalent:

CHAR(10) BINARY CHAR(10) CHARACTER SET latin1 COLLATE latin1_bin

The effect of BINARY as a column attribute differs from its effect prior to MariaDB 4.1. Formerly, BINARY resulted in a column that was treated as a binary string. A binary string is a string of bytes that has no character set or collation, which differs from a nonbinary character string that has a binary collation. For both types of strings, comparisons are based on the numeric values of the string unit, but for nonbinary strings the unit is the character and some character sets support multi-byte characters. , "The BINARY and VARBINARY Types".

The use of CHARACTER SET binary in the definition of a CHAR, VARCHAR, or TEXT column causes the column to be treated as a binary data type. For example, the following pairs of definitions are equivalent:

CHAR(10) CHARACTER SET binary BINARY(10)
VARCHAR(10) CHARACTER SET binary VARBINARY(10)
TEXT CHARACTER SET binary BLOB

Examples of the Effect of Collation

Example 1: Sorting German Umlauts

Suppose that column X in table T has these latin1 column values:

Muffler Müller MX Systems MySQL

Suppose also that the column values are retrieved using the following statement:

SELECT X FROM T ORDER BY X COLLATE collation_name;

The following table shows the resulting order of the values if we use ORDER BY with different collations.

latin1_swedish_ci latin1_german1_ci latin1_german2_ci
Muffler Muffler Müller
MX Systems Müller Muffler
Müller MX Systems MX Systems
MySQL MySQL MySQL

The character that causes the different sort orders in this example is the U with two dots over it (ü), which the Germans call "U-umlaut."

Example 2: Searching for German Umlauts

Suppose that you have three tables that differ only by the character set and collation used:

mysql> SET NAMES utf8;
mysql> CREATE TABLE german1 (
 ->  c CHAR(10)
 -> ) CHARACTER SET latin1 COLLATE latin1_german1_ci;
mysql> CREATE TABLE german2 (
 ->  c CHAR(10)
 -> ) CHARACTER SET latin1 COLLATE latin1_german2_ci;
mysql> CREATE TABLE germanutf8 (
 ->  c CHAR(10)
 -> ) CHARACTER SET utf8 COLLATE utf8_unicode_ci;

Each table contains two records:

mysql> INSERT INTO german1 VALUES ('Bar'), ('Bär');
mysql> INSERT INTO german2 VALUES ('Bar'), ('Bär');
mysql> INSERT INTO germanutf8 VALUES ('Bar'), ('Bär');

Two of the above collations have an A = Ä equality, and one has no such equality (latin1_german2_ci). For that reason, you'll get these results in comparisons:

mysql> SELECT * FROM german1 WHERE c = 'Bär';
+------+
| c |
+------+
| Bar |
| Bär |
+------+
mysql> SELECT * FROM german2 WHERE c = 'Bär';
+------+
| c |
+------+
| Bär |
+------+
mysql> SELECT * FROM germanutf8 WHERE c = 'Bär';
+------+
| c |
+------+
| Bar |
| Bär |
+------+

This is not a bug but rather a consequence of the sorting properties of latin1_german1_ci and utf8_unicode_ci (the sorting shown is done according to the German DIN 5007 standard).

Collation and INFORMATION_SCHEMA Searches

String columns in INFORMATION_SCHEMA tables have a collation of utf8_general_ci, which is case insensitive. However, searches in INFORMATION_SCHEMA string columns are also affected by file system case sensitivity. For values that correspond to objects that are represented in the file system, such as names of databases and tables, searches may be case sensitive if the file system is case sensitive. This section describes how to work around this issue if necessary; see also Bug #34921.

Suppose that a query searches the SCHEMATA.SCHEMA_NAME column for the test database. On Linux, file systems are case sensitive, so comparisons of SCHEMATA.SCHEMA_NAME with 'test' match, but comparisons with 'TEST' do not:

mysql> SELECT SCHEMA_NAME FROM INFORMATION_SCHEMA.SCHEMATA
 -> WHERE SCHEMA_NAME = 'test';
+-------------+
| SCHEMA_NAME |
+-------------+
| test |
+-------------+
1 row in set (0.01 sec)
mysql> SELECT SCHEMA_NAME FROM INFORMATION_SCHEMA.SCHEMATA
 -> WHERE SCHEMA_NAME = 'TEST';
Empty set (0.00 sec)

On Windows or Mac OS X where file systems are not case sensitive, comparisons match both 'test' and 'TEST':

mysql> SELECT SCHEMA_NAME FROM INFORMATION_SCHEMA.SCHEMATA
 -> WHERE SCHEMA_NAME = 'test';
+-------------+
| SCHEMA_NAME |
+-------------+
| test |
+-------------+
1 row in set (0.00 sec)
mysql> SELECT SCHEMA_NAME FROM INFORMATION_SCHEMA.SCHEMATA
 -> WHERE SCHEMA_NAME = 'TEST';
+-------------+
| SCHEMA_NAME |
+-------------+
| TEST |
+-------------+
1 row in set (0.00 sec)

The value of the lower_case_table_names system variable makes no difference in this context.

This behavior occurs because the utf8_general_ci collation is not used for INFORMATION_SCHEMA queries when searching the file system for database objects. It is a result of optimizations implemented for INFORMATION_SCHEMA searches in MySQL. For information about these optimizations, see , "Optimizing INFORMATION_SCHEMA Queries".

Searches in INFORMATION_SCHEMA string columns for values that refer to INFORMATION_SCHEMA itself do use the utf8_general_ci collation because INFORMATION_SCHEMA is a "virtual" database and is not represented in the file system. For example, comparisons with SCHEMATA.SCHEMA_NAME match 'information_schema' or 'INFORMATION_SCHEMA' regardless of platform:

mysql> SELECT SCHEMA_NAME FROM INFORMATION_SCHEMA.SCHEMATA
 -> WHERE SCHEMA_NAME = 'information_schema';
+--------------------+
| SCHEMA_NAME |
+--------------------+
| information_schema |
+--------------------+
1 row in set (0.00 sec)
mysql> SELECT SCHEMA_NAME FROM INFORMATION_SCHEMA.SCHEMATA
 -> WHERE SCHEMA_NAME = 'INFORMATION_SCHEMA';
+--------------------+
| SCHEMA_NAME |
+--------------------+
| information_schema |
+--------------------+
1 row in set (0.00 sec)

If the result of a string operation on an INFORMATION_SCHEMA column differs from expectations, a workaround is to use an explicit COLLATE clause to force a suitable collation (, "Using COLLATE in SQL Statements"). For example, to perform a case-insensitive search, use COLLATE with the INFORMATION_SCHEMA column name:

mysql> SELECT SCHEMA_NAME FROM INFORMATION_SCHEMA.SCHEMATA
 -> WHERE SCHEMA_NAME COLLATE utf8_general_ci = 'test';
+-------------+
| SCHEMA_NAME |
+-------------+
| test |
+-------------+
1 row in set (0.00 sec)
mysql> SELECT SCHEMA_NAME FROM INFORMATION_SCHEMA.SCHEMATA
 -> WHERE SCHEMA_NAME COLLATE utf8_general_ci = 'TEST';
| SCHEMA_NAME |
+-------------+
| test |
+-------------+
1 row in set (0.00 sec)

You can also use the UPPER() or LOWER() function:

WHERE UPPER(SCHEMA_NAME) = 'TEST'
WHERE LOWER(SCHEMA_NAME) = 'test'

Although a case-insensitive comparison can be performed even on platforms with case-sensitive file systems, as just shown, it is not necessarily always the right thing to do. On such platforms, it is possible to have multiple objects with names that differ only in lettercase. For example, tables named city, CITY, and City can all exist simultaneously. Consider whether a search should match all such names or just one and write queries accordingly:

WHERE TABLE_NAME COLLATE utf8_bin = 'City'
WHERE TABLE_NAME COLLATE utf8_general_ci = 'city'
WHERE UPPER(TABLE_NAME) = 'CITY'
WHERE LOWER(TABLE_NAME) = 'city'

The first of those comparisons (with utf8_bin) is case sensitive; the others are not.

String Repertoire

The repertoire of a character set is the collection of characters in the set.

String expressions have a repertoire attribute, which can have two values:

The ASCII range is a subset of UNICODE range, so a string with ASCII repertoire can be converted safely without loss of information to the character set of any string with UNICODE repertoire or to a character set that is a superset of ASCII. (All MariaDB character sets are supersets of ASCII with the exception of swe7, which reuses some punctuation characters for Swedish accented characters.) The use of repertoire enables character set conversion in expressions for many cases where MariaDB would otherwise return an "illegal mix of collations" error.

The following discussion provides examples of expressions and their repertoires, and describes how the use of repertoire changes string expression evaluation:

Operations Affected by Character Set Support

Result Strings
CONVERT() and CAST()
SHOW Statements and INFORMATION_SCHEMA

This section describes operations that take character set information into account.

Result Strings

MySQL has many operators and functions that return a string. This section answers the question: What is the character set and collation of such a string?

For simple functions that take string input and return a string result as output, the output's character set and collation are the same as those of the principal input value. For example, UPPER(X) returns a string whose character string and collation are the same as that of X. The same applies for INSTR(), LCASE(), LOWER(), LTRIM(), MID(), REPEAT(), REPLACE(), REVERSE(), RIGHT(), RPAD(), RTRIM(), SOUNDEX(), SUBSTRING(), TRIM(), UCASE(), and UPPER().

Note: The REPLACE() function, unlike all other functions, always ignores the collation of the string input and performs a case-sensitive comparison.

If a string input or function result is a binary string, the string has no character set or collation. This can be checked by using the CHARSET() and COLLATION() functions, both of which return binary to indicate that their argument is a binary string:

mysql> SELECT CHARSET(BINARY 'a'), COLLATION(BINARY 'a');
+---------------------+-----------------------+
| CHARSET(BINARY 'a') | COLLATION(BINARY 'a') |
+---------------------+-----------------------+
| binary | binary |
+---------------------+-----------------------+

For operations that combine multiple string inputs and return a single string output, the "aggregation rules" of standard SQL apply for determining the collation of the result:

For example, with CASE ... WHEN a THEN b WHEN b THEN c COLLATE X END, the resulting collation is X. The same applies for UNION, ||, CONCAT(), ELT(), GREATEST(), IF(), and LEAST().

For operations that convert to character data, the character set and collation of the strings that result from the operations are defined by the character_set_connection and collation_connection system variables. This applies only to CAST(), CONV(), FORMAT(), HEX(), and SPACE().

If you are uncertain about the character set or collation of the result returned by a string function, you can use the CHARSET() or COLLATION() function to find out:

mysql> SELECT USER(), CHARSET(USER()), COLLATION(USER());
+----------------+-----------------+-------------------+
| USER() | CHARSET(USER()) | COLLATION(USER()) |
+----------------+-----------------+-------------------+
| test@localhost | utf8 | utf8_general_ci |
+----------------+-----------------+-------------------+

CONVERT() and CAST()

CONVERT() provides a way to convert data between different character sets. The syntax is:

CONVERT(expr USING transcoding_name)

In MySQL, transcoding names are the same as the corresponding character set names.

Examples:

SELECT CONVERT(_latin1'Müller' USING utf8);
INSERT INTO utf8table (utf8column)
 SELECT CONVERT(latin1field USING utf8) FROM latin1table;

CONVERT(... USING ...) is implemented according to the standard SQL specification.

You may also use CAST() to convert a string to a different character set. The syntax is:

CAST(character_string AS character_data_type CHARACTER SET charset_name)

Example:

SELECT CAST(_latin1'test' AS CHAR CHARACTER SET utf8);

If you use CAST() without specifying CHARACTER SET, the resulting character set and collation are defined by the character_set_connection and collation_connection system variables. If you use CAST() with CHARACTER SET X, the resulting character set and collation are X and the default collation of X.

You may not use a COLLATE clause inside a CONVERT() or CAST() call, but you may use it outside. For example, CAST(... COLLATE ...) is illegal, but CAST(...) COLLATE ... is legal:

SELECT CAST(_latin1'test' AS CHAR CHARACTER SET utf8) COLLATE utf8_bin;

SHOW Statements and INFORMATION_SCHEMA

Several SHOW statements provide additional character set information. These include SHOW CHARACTER SET, SHOW COLLATION, SHOW CREATE DATABASE, SHOW CREATE TABLE and SHOW COLUMNS. These statements are described here briefly. For more information, see , "SHOW Syntax".

INFORMATION_SCHEMA has several tables that contain information similar to that displayed by the SHOW statements. For example, the CHARACTER-SETS and COLLATIONS tables contain the information displayed by SHOW CHARACTER SET and SHOW COLLATION. See , INFORMATION_SCHEMA Tables.

The SHOW CHARACTER SET statement shows all available character sets. It takes an optional LIKE clause that indicates which character set names to match. For example:

mysql> SHOW CHARACTER SET LIKE 'latin%';
+---------+-----------------------------+-------------------+--------+
| Charset | Description | Default collation | Maxlen |
+---------+-----------------------------+-------------------+--------+
| latin1 | cp1252 West European | latin1_swedish_ci | 1 |
| latin2 | ISO 8859-2 Central European | latin2_general_ci | 1 |
| latin5 | ISO 8859-9 Turkish | latin5_turkish_ci | 1 |
| latin7 | ISO 8859-13 Baltic | latin7_general_ci | 1 |
+---------+-----------------------------+-------------------+--------+

The output from SHOW COLLATION includes all available character sets. It takes an optional LIKE clause that indicates which collation names to match. For example:

mysql> SHOW COLLATION LIKE 'latin1%';
+-------------------+---------+----+---------+----------+---------+
| Collation | Charset | Id | Default | Compiled | Sortlen |
+-------------------+---------+----+---------+----------+---------+
| latin1_german1_ci | latin1 | 5 | | | 0 |
| latin1_swedish_ci | latin1 | 8 | Yes | Yes | 0 |
| latin1_danish_ci | latin1 | 15 | | | 0 |
| latin1_german2_ci | latin1 | 31 | | Yes | 2 |
| latin1_bin | latin1 | 47 | | Yes | 0 |
| latin1_general_ci | latin1 | 48 | | | 0 |
| latin1_general_cs | latin1 | 49 | | | 0 |
| latin1_spanish_ci | latin1 | 94 | | | 0 |
+-------------------+---------+----+---------+----------+---------+

SHOW CREATE DATABASE displays the CREATE DATABASE statement that creates a given database:

mysql> SHOW CREATE DATABASE test;
+----------+-----------------------------------------------------------------+
| Database | Create Database |
+----------+-----------------------------------------------------------------+
| test | CREATE DATABASE `test` /*!40100 DEFAULT CHARACTER SET latin1 */ |
+----------+-----------------------------------------------------------------+

If no COLLATE clause is shown, the default collation for the character set applies.

SHOW CREATE TABLE is similar, but displays the CREATE TABLE statement to create a given table. The column definitions indicate any character set specifications, and the table options include character set information.

The SHOW COLUMNS statement displays the collations of a table's columns when invoked as SHOW FULL COLUMNS. Columns with CHAR, VARCHAR, or TEXT data types have collations. Numeric and other noncharacter types have no collation (indicated by NULL as the Collation value). For example:

mysql> SHOW FULL COLUMNS FROM person\G
*************************** 1. row ***************************
 Field: id
 Type: smallint(5) unsigned
 Collation: NULL
 Null: NO
 Key: PRI
 Default: NULL
 Extra: auto_increment Privileges: select,insert,update,references
 Comment:
*************************** 2. row ***************************
 Field: name
 Type: char(60)
 Collation: latin1_swedish_ci
 Null: NO
 Key:
 Default:
 Extra:
Privileges: select,insert,update,references
 Comment:

The character set is not part of the display but is implied by the collation name.

Unicode Support

The ucs2 Character Set (UCS-2 Unicode Encoding)
The utf16 Character Set (UTF-16 Unicode Encoding)
The utf16le Character Set (UTF-16LE Unicode Encoding)
The utf32 Character Set (UTF-32 Unicode Encoding)
The utf8 Character Set (Three-Byte UTF-8 Unicode Encoding)
The utf8mb3 "Character Set" (Alias for utf8)
The utf8mb4 Character Set (Four-Byte UTF-8 Unicode Encoding)

The initial implementation of Unicode support (in MariaDB 4.1) included two character sets for storing Unicode data:

These two character sets support the characters from the Basic Multilingual Plane (BMP) of Unicode Version 3.0. BMP characters have these characteristics:

Characters not supported by the aforementioned character sets include supplementary characters that lie outside the BMP. Characters outside the BMP compare as REPLACEMENT CHARACTER and convert to '?' when converted to a Unicode character set.

As of MariaDB 5.5.3, Unicode support includes supplementary characters, which requires new character sets that have a broader range and therefore take more space. The following table shows a brief feature comparison of previous and current Unicode support.

Before MariaDB 5.5 MySQL 5.5 and up
All Unicode 3.0 characters All Unicode 5.0 characters
No supplementary characters With supplementary characters
ucs2 character set, BMP only No change
utf8 character set for up to three bytes, BMP only No change
New utf8mb4 character set for up to four bytes, BMP or supplemental
New utf16 character set, BMP or supplemental
New utf16le character set, BMP or supplemental (5.6.1 and up)
New utf32 character set, BMP or supplemental

These changes are upward compatible. If you want to use the new character sets, there are potential incompatibility issues for your applications; see , "Upgrading from Previous to Current Unicode Support". That section also describes how to convert tables from utf8 to the (four-byte) utf8mb4 character set, and what constraints may apply in doing so.

MySQL 5.6 supports these Unicode character sets:

ucs2 and utf8 support BMP characters. utf8mb4, utf16, utf16le, and utf32 support BMP and supplementary characters.

A similar set of collations is available for most Unicode character sets. For example, each has a Danish collation, the names of which are ucs2_danish_ci, utf16_danish_ci, utf32_danish_ci, utf8_danish_ci, and utf8mb4_danish_ci. The exception is utf16le, which has only two collations. All Unicode collations are listed at , "Unicode Character Sets", which also describes collation properties for supplementary characters.

Note that although many of the supplementary characters come from East Asian languages, what MariaDB 5.6 adds is support for more Japanese and Chinese characters in Unicode character sets, not support for new Japanese and Chinese character sets.

The MariaDB implementation of UCS-2, UTF-16, and UTF-32 stores characters in big-endian byte order and does not use a byte order mark (BOM) at the beginning of values. Other database systems might use little-endian byte order or a BOM. In such cases, conversion of values will need to be performed when transferring data between those systems and MySQL. The implementation of UTF-16LE is little-endian.

MySQL uses no BOM for UTF-8 values.

Client applications that need to communicate with the server using Unicode should set the client character set accordingly; for example, by issuing a SET NAMES 'utf8' statement. ucs2, utf16, utf16le, and utf32 cannot be used as a client character set, which means that they do not work for SET NAMES or SET CHARACTER SET. (See , "Connection Character Sets and Collations".)

The following sections provide additional detail on the Unicode character sets in MySQL.

The ucs2 Character Set (UCS-2 Unicode Encoding)

In UCS-2, every character is represented by a two-byte Unicode code with the most significant byte first. For example: LATIN CAPITAL LETTER A has the code 0x0041 and it is stored as a two-byte sequence: 0x00 0x41. CYRILLIC SMALL LETTER YERU (Unicode 0x044B) is stored as a two-byte sequence: 0x04 0x4B. For Unicode characters and their codes, please refer to the Unicode Home Page.

In MySQL, the ucs2 character set is a fixed-length 16-bit encoding for Unicode BMP characters.

The utf16 Character Set (UTF-16 Unicode Encoding)

The utf16 character set is the ucs2 character set with an extension that enables encoding of supplementary characters:

Because utf16 supports surrogates and ucs2 does not, there is a validity check that applies only in utf16: You cannot insert a top surrogate without a bottom surrogate, or vice versa. For example:

INSERT INTO t (ucs2_column) VALUES (0xd800); /* legal */
INSERT INTO t (utf16_column)VALUES (0xd800); /* illegal */

There is no validity check for characters that are technically valid but are not true Unicode (that is, characters that Unicode considers to be "unassigned code points" or "private use" characters or even "illegals" like 0xffff). For example, since U+F8FF is the Apple Logo, this is legal:

INSERT INTO t (utf16_column)VALUES (0xf8ff); /* legal */

Such characters cannot be expected to mean the same thing to everyone.

Because MariaDB must allow for the worst case (that one character requires four bytes) the maximum length of a utf16 column or index is only half of the maximum length for a ucs2 column or index. For example, in MariaDB 5.6, the maximum length of a MEMORY table index key is 3072 bytes, so these statements create tables with the longest permitted indexes for ucs2 and utf16 columns:

CREATE TABLE tf (s1 VARCHAR(1536) CHARACTER SET ucs2) ENGINE=MEMORY;
CREATE INDEX i ON tf (s1);
CREATE TABLE tg (s1 VARCHAR(768) CHARACTER SET utf16) ENGINE=MEMORY;
CREATE INDEX i ON tg (s1);

The utf16le Character Set (UTF-16LE Unicode Encoding)

This is the same as utf16 but is little-endian rather than big-endian.

The utf32 Character Set (UTF-32 Unicode Encoding)

The utf32 character set is fixed length (like ucs2 and unlike utf16). utf32 uses 32 bits for every character, unlike ucs2 (which uses 16 bits for every character), and unlike utf16 (which uses 16 bits for some characters and 32 bits for others).

utf32 takes twice as much space as ucs2 and more space than utf16, but utf32 has the same advantage as ucs2 that it is predictable for storage: The required number of bytes for utf32 equals the number of characters times 4. Also, unlike utf16, there are no tricks for encoding in utf32, so the stored value equals the code value.

To demonstrate how the latter advantage is useful, here is an example that shows how to determine a utf8mb4 value given the utf32 code value:

/* Assume code value = 100cc LINEAR B WHEELED CHARIOT */
CREATE TABLE tmp (utf32_col CHAR(1) CHARACTER SET utf32,
 utf8mb4_col CHAR(1) CHARACTER SET utf8mb4);
INSERT INTO tmp VALUES (0x000100cc,NULL);
UPDATE tmp SET utf8mb4_col = utf32_col;
SELECT HEX(utf32_col),HEX(utf8mb4_col) FROM tmp;

MySQL is very forgiving about additions of unassigned Unicode characters or private-use-area characters. There is in fact only one validity check for utf32: No code value may be greater than 0x10ffff. For example, this is illegal:

INSERT INTO t (utf32_column) VALUES (0x110000); /* illegal */

The utf8 Character Set (Three-Byte UTF-8 Unicode Encoding)

UTF-8 (Unicode Transformation Format with 8-bit units) is an alternative way to store Unicode data. It is implemented according to RFC 3629, which describes encoding sequences that take from one to four bytes. (An older standard for UTF-8 encoding, RFC 2279, describes UTF-8 sequences that take from one to six bytes. RFC 3629 renders RFC 2279 obsolete; for this reason, sequences with five and six bytes are no longer used.)

The idea of UTF-8 is that various Unicode characters are encoded using byte sequences of different lengths:

The utf8 character set is the same in MariaDB 5.6 as before 5.6 and has exactly the same characteristics:

Exactly the same set of characters is available in utf8 as in ucs2. That is, they have the same repertoire.

The utf8mb3 "Character Set" (Alias for utf8)

In a future version of MySQL, it is possible that utf8 will become the 4-byte utf8, and that users who want to indicate 3-byte utf8 will have to say utf8mb3. To avoid some future problems which might occur with replication when master and slave servers have different MariaDB versions, it is possible for users to specify utf8mb3 in CHARACTER SET clauses, and utf8mb3_collation_substring in COLLATE clauses, where collation_substring is bin, czech_ci, danish_ci, esperanto_ci, estonian_ci, and so forth. For example:

CREATE TABLE t (s1 CHAR(1) CHARACTER SET utf8mb3;
SELECT * FROM t WHERE s1 COLLATE utf8mb3_general_ci = 'x';
DECLARE x VARCHAR(5) CHARACTER SET utf8mb3 COLLATE utf8mb3_danish_ci;
SELECT CAST('a' AS CHAR CHARACTER SET utf8) COLLATE utf8_czech_ci;

MySQL immediately converts instances of utf8mb3 in an alias to utf8, so in statements such as SHOW CREATE TABLE or SELECT CHARACTER_SET_NAME FROM INFORMATION_SCHEMA.COLUMNS or SELECT COLLATION_NAME FROM INFORMATION_SCHEMA.COLUMNS, users will see the true name, utf8 or utf8_collation_substring.

The utf8mb3 alias is valid only in CHARACTER SET clauses, and in certain other places. For example, these are legal:

mysqld --character-set-server=utf8mb3
SET NAMES 'utf8mb3'; /* and other SET statements that have similar effect */
SELECT _utf8mb3 'a';

There is no utf8mb3 alias to the corresponding utf8 collation for collation names that include a version number (for example, utf8_unicode_520_ci) to indicate the Unicode Collation Algorithm version on which the collation is based.

The utf8mb4 Character Set (Four-Byte UTF-8 Unicode Encoding)

The character set named utf8 uses a maximum of three bytes per character and contains only BMP characters. The utf8mb4 character set uses a maximum of four bytes per character supports supplemental characters:

utf8mb4 is a superset of utf8, so for an operation such as the following concatenation, the result has character set utf8mb4 and the collation of utf8mb4_col:

SELECT CONCAT(utf8_col, utf8mb4_col);

Similarly, the following comparison in the WHERE clause works according to the collation of utf8mb_col:

SELECT * FROM utf8_tbl, utf8mb4_tbl WHERE utf8_tbl.utf8_col = utf8mb4_tbl.utf8mb4_col;
Tip

To save space with UTF-8, use VARCHAR instead of CHAR. Otherwise, MariaDB must reserve three (or four) bytes for each character in a CHAR CHARACTER SET utf8 (or utf8mb4) column because that is the maximum possible length. For example, MariaDB must reserve 40 bytes for a CHAR(10) CHARACTER SET utf8mb4 column.

Upgrading from Previous to Current Unicode Support

This section describes issues pertaining to Unicode support that you may face when upgrading to MariaDB 5.6 from an older MariaDB release. It also provides guidelines for downgrading from MariaDB 5.6 to an older release.

In most respects, upgrading to MariaDB 5.6 should present few problems with regard to Unicode usage, although there are some potential areas of incompatibility. These are the primary areas of concern:

Consequently, if you want to upgrade tables from utf8 to utf8mb4 to take advantage of supplementary-character support, it may be necessary to change some column or index definitions.

Tables can be converted from utf8 to utf8mb4 by using ALTER TABLE. Suppose that a table was originally defined as follows:

CREATE TABLE t1 (
 col1 CHAR(10) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
 col2 CHAR(10) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL
) CHARACTER SET utf8;

The following statement converts t1 to use utf8mb4:

ALTER TABLE t1
 DEFAULT CHARACTER SET utf8mb4,
 MODIFY col1 CHAR(10)
 CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL,
 MODIFY col2 CHAR(10)
 CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NOT NULL;

In terms of table content, conversion from utf8 to utf8mb4 presents no problems:

In terms of table structure, the catch when converting from utf8 to utf8mb4 is that the maximum length of a column or index key is unchanged in terms of bytes. Therefore, it is smaller in terms of characters because the maximum length of a character is four bytes instead of three. For the CHAR, VARCHAR, and TEXT data types, watch for these things when converting your MariaDB tables:

If the preceding conditions apply, you must either reduce the defined length of columns or indexes, or continue to use utf8 rather than utf8mb4.

Here are some examples where structural changes may be needed:

The preceding types of changes are most likely to be required only if you have very long columns or indexes. Otherwise, you should be able to convert your tables from utf8 to utf8mb4 without problems. You can do this by using ALTER TABLE as described earlier in this section after upgrading in place to 5.6.

The following items summarize other potential areas of incompatibility:

If you have upgraded to MariaDB 5.6, and then decide to downgrade back to an older release, these considerations apply:

UTF-8 for Metadata

Metadata is "the data about the data." Anything that describes the database-as opposed to being the contents of the database-is metadata. Thus column names, database names, user names, version names, and most of the string results from SHOW are metadata. This is also true of the contents of tables in INFORMATION_SCHEMA because those tables by definition contain information about database objects.

Representation of metadata must satisfy these requirements:

To satisfy both requirements, MariaDB stores metadata in a Unicode character set, namely UTF-8. This does not cause any disruption if you never use accented or non-Latin characters. But if you do, you should be aware that metadata is in UTF-8.

The metadata requirements mean that the return values of the USER(), CURRENT_USER(), SESSION_USER(), SYSTEM-USER(), DATABASE(), and VERSION() functions have the UTF-8 character set by default.

The server sets the character_set_system system variable to the name of the metadata character set:

mysql> SHOW VARIABLES LIKE 'character_set_system';
+----------------------+-------+
| Variable_name | Value |
+----------------------+-------+
| character_set_system | utf8 |
+----------------------+-------+

Storage of metadata using Unicode does not mean that the server returns headers of columns and the results of DESCRIBE functions in the character_set_system character set by default. When you use SELECT column1 FROM t, the name column1 itself is returned from the server to the client in the character set determined by the value of the character_set_results system variable, which has a default value of latin1. If you want the server to pass metadata results back in a different character set, use the SET NAMES statement to force the server to perform character set conversion. SET NAMES sets the character_set_results and other related system variables. (See , "Connection Character Sets and Collations".) Alternatively, a client program can perform the conversion after receiving the result from the server. It is more efficient for the client to perform the conversion, but this option is not always available for all clients.

If character_set_results is set to NULL, no conversion is performed and the server returns metadata using its original character set (the set indicated by character_set_system).

Error messages returned from the server to the client are converted to the client character set automatically, as with metadata.

If you are using (for example) the USER() function for comparison or assignment within a single statement, don't worry. MariaDB performs some automatic conversion for you.

SELECT * FROM t1 WHERE USER() = latin1_column;

This works because the contents of latin1_column are automatically converted to UTF-8 before the comparison.

INSERT INTO t1 (latin1_column) SELECT USER();

This works because the contents of USER() are automatically converted to latin1 before the assignment.

Although automatic conversion is not in the SQL standard, the SQL standard document does say that every character set is (in terms of supported characters) a "subset" of Unicode. Because it is a well-known principle that "what applies to a superset can apply to a subset," we believe that a collation for Unicode can apply for comparisons with non-Unicode strings. For more information about coercion of strings, see , "Collation of Expressions".

Column Character Set Conversion

To convert a binary or nonbinary string column to use a particular character set, use ALTER TABLE. For successful conversion to occur, one of the following conditions must apply:

Suppose that a table t has a binary column named col1 defined as VARBINARY(50). Assuming that the information in the column is encoded using a single character set, you can convert it to a nonbinary column that has that character set. For example, if col1 contains binary data representing characters in the greek character set, you can convert it as follows:

ALTER TABLE t MODIFY col1 VARCHAR(50) CHARACTER SET greek;

If your original column has a type of BINARY(50), you could convert it to CHAR(50), but the resulting values will be padded with 0x00 bytes at the end, which may be undesirable. To remove these bytes, use the TRIM() function:

UPDATE t SET col1 = TRIM(TRAILING 0x00 FROM col1);

Suppose that table t has a nonbinary column named col1 defined as CHAR(50) CHARACTER SET latin1 but you want to convert it to use utf8 so that you can store values from many languages. The following statement accomplishes this:

ALTER TABLE t MODIFY col1 CHAR(50) CHARACTER SET utf8;

Conversion may be lossy if the column contains characters that are not in both character sets.

A special case occurs if you have old tables from before MariaDB where a nonbinary column contains values that actually are encoded in a character set different from the server's default character set. For example, an application might have stored sjis values in a column, even though MySQL's default character set was latin1. It is possible to convert the column to use the proper character set but an additional step is required. Suppose that the server's default character set was latin1 and col1 is defined as CHAR(50) but its contents are sjis values. The first step is to convert the column to a binary data type, which removes the existing character set information without performing any character conversion:

ALTER TABLE t MODIFY col1 BLOB;

The next step is to convert the column to a nonbinary data type with the proper character set:

ALTER TABLE t MODIFY col1 CHAR(50) CHARACTER SET sjis;

This procedure requires that the table not have been modified already with statements such as INSERT or UPDATE after an upgrade to MariaDB or later. In that case, MariaDB would store new values in the column using latin1, and the column will contain a mix of sjis and latin1 values and cannot be converted properly.

If you specified attributes when creating a column initially, you should also specify them when altering the table with ALTER TABLE. For example, if you specified NOT NULL and an explicit DEFAULT value, you should also provide them in the ALTER TABLE statement. Otherwise, the resulting column definition will not include those attributes.

Character Sets and Collations That MariaDB Supports

Unicode Character Sets
West European Character Sets
Central European Character Sets
South European and Middle East Character Sets
Baltic Character Sets
Cyrillic Character Sets
Asian Character Sets

MySQL supports 70+ collations for 30+ character sets. This section indicates which character sets MariaDB supports. There is one subsection for each group of related character sets. For each character set, the permissible collations are listed.

You can always list the available character sets and their default collations with the SHOW CHARACTER SET statement:

mysql> SHOW CHARACTER SET;
+----------+-----------------------------+---------------------+
| Charset | Description | Default collation |
+----------+-----------------------------+---------------------+
| big5 | Big5 Traditional Chinese | big5_chinese_ci |
| dec8 | DEC West European | dec8_swedish_ci |
| cp850 | DOS West European | cp850_general_ci |
| hp8 | HP West European | hp8_english_ci |
| koi8r | KOI8-R Relcom Russian | koi8r_general_ci |
| latin1 | cp1252 West European | latin1_swedish_ci |
| latin2 | ISO 8859-2 Central European | latin2_general_ci |
| swe7 | 7bit Swedish | swe7_swedish_ci |
| ascii | US ASCII | ascii_general_ci |
| ujis | EUC-JP Japanese | ujis_japanese_ci |
| sjis | Shift-JIS Japanese | sjis_japanese_ci |
| hebrew | ISO 8859-8 Hebrew | hebrew_general_ci |
| tis620 | TIS620 Thai | tis620_thai_ci |
| euckr | EUC-KR Korean | euckr_korean_ci |
| koi8u | KOI8-U Ukrainian | koi8u_general_ci |
| gb2312 | GB2312 Simplified Chinese | gb2312_chinese_ci |
| greek | ISO 8859-7 Greek | greek_general_ci |
| cp1250 | Windows Central European | cp1250_general_ci |
| gbk | GBK Simplified Chinese | gbk_chinese_ci |
| latin5 | ISO 8859-9 Turkish | latin5_turkish_ci |
| armscii8 | ARMSCII-8 Armenian | armscii8_general_ci |
| utf8 | UTF-8 Unicode | utf8_general_ci |
| ucs2 | UCS-2 Unicode | ucs2_general_ci |
| cp866 | DOS Russian | cp866_general_ci |
| keybcs2 | DOS Kamenicky Czech-Slovak | keybcs2_general_ci |
| macce | Mac Central European | macce_general_ci |
| macroman | Mac West European | macroman_general_ci |
| cp852 | DOS Central European | cp852_general_ci |
| latin7 | ISO 8859-13 Baltic | latin7_general_ci |
| utf8mb4 | UTF-8 Unicode | utf8mb4_general_ci |
| cp1251 | Windows Cyrillic | cp1251_general_ci |
| utf16 | UTF-16 Unicode | utf16_general_ci |
| utf16le | UTF-16LE Unicode | utf16le_general_ci |
| cp1256 | Windows Arabic | cp1256_general_ci |
| cp1257 | Windows Baltic | cp1257_general_ci |
| utf32 | UTF-32 Unicode | utf32_general_ci |
| binary | Binary pseudo charset | binary |
| geostd8 | GEOSTD8 Georgian | geostd8_general_ci |
| cp932 | SJIS for Windows Japanese | cp932_japanese_ci |
| eucjpms | UJIS for Windows Japanese | eucjpms_japanese_ci |
+----------+-----------------------------+---------------------+

In cases where a character set has multiple collations, it might not be clear which collation is most suitable for a given application. To avoid choosing the wrong collation, it can be helpful to perform some comparisons with representative data values to make sure that a given collation sorts values the way you expect.

Collation-Charts.Org is a useful site for information that shows how one collation compares to another.

Unicode Character Sets

MySQL 5.6 supports these Unicode character sets:

ucs2 and utf8 support Basic Multilingual Plane (BMP) characters. utf8mb4, utf16, utf16le, and utf32 support BMP and supplementary characters. utf16le was added in MariaDB 5.6.1.

You can store text in about 650 languages using these character sets. This section lists the collations available for each Unicode character set and describes their differentiating properties. For general information about the character sets, see , "Unicode Support".

A similar set of collations is available for most Unicode character sets. These are shown in the following list, where xxx represents the character set name. For example, xxx_danish_ci represents the Danish collations, the specific names of which are ucs2_danish_ci, utf16_danish_ci, utf32_danish_ci, utf8_danish_ci, and utf8mb4_danish_ci.

Collation support for utf16le is more limited. The only collations available are utf16le_general_ci and utf16le_bin. These are similar to utf16_general_ci and utf16_bin.

Unicode collation names may also include a version number (for example, xxx_unicode_520_ci) to indicate the Unicode Collation Algorithm version on which the collation is based, as described later in this section. For such collations, there is no utf8mb3 alias to the corresponding utf8 collation. See , "The utf8mb3 "Character Set" (Alias for utf8)".

MySQL implements the xxx_unicode_ci collations according to the Unicode Collation Algorithm (UCA) described at http://www.unicode.org/reports/tr10/. The collation uses the version-4.0.0 UCA weight keys: http://www.unicode.org/Public/UCA/4.0.0/allkeys-4.0.0.txt. Currently, the xxx_unicode_ci collations have only partial support for the Unicode Collation Algorithm. Some characters are not supported yet. Also, combining marks are not fully supported. This affects primarily Vietnamese, Yoruba, and some smaller languages such as Navajo.

MySQL implements language-specific Unicode collations only if the ordering with xxx_unicode_ci does not work well for a language. Language-specific collations are UCA-based. They are derived from xxx_unicode_ci with additional language tailoring rules.

Collations based on UCA versions later than 4.0.0 include the version in the collation name. Thus, xxx_unicode_520_ci collations are based on UCA 5.2.0 weight keys: http://www.unicode.org/Public/UCA/5.2.0/allkeys.txt.

LOWER() and UPPER() perform case folding according to the collation of their argument. A character that has uppercase and lowercase versions only in a Unicode version more recent than 4.0.0 will be converted by these functions only if the argument has a collation that uses a recent enough UCA version.

For any Unicode character set, operations performed using the xxx_general_ci collation are faster than those for the xxx_unicode_ci collation. For example, comparisons for the utf8_general_ci collation are faster, but slightly less correct, than comparisons for utf8_unicode_ci. The reason for this is that utf8_unicode_ci supports mappings such as expansions; that is, when one character compares as equal to combinations of other characters. For example, in German and some other languages "ß" is equal to "ss". utf8_unicode_ci also supports contractions and ignorable characters. utf8_general_ci is a legacy collation that does not support expansions, contractions, or ignorable characters. It can make only one-to-one comparisons between characters.

To further illustrate, the following equalities hold in both utf8_general_ci and utf8_unicode_ci (for the effect this has in comparisons or when doing searches, see , "Examples of the Effect of Collation"):

Ä = A
Ö = O
Ü = U

A difference between the collations is that this is true for utf8_general_ci:

ß = s

Whereas this is true for utf8_unicode_ci, which supports the German DIN-1 ordering (also known as dictionary order):

ß = ss

MySQL implements language-specific collations for the utf8 character set only if the ordering with utf8_unicode_ci does not work well for a language. For example, utf8_unicode_ci works fine for German dictionary order and French, so there is no need to create special utf8 collations.

utf8_general_ci also is satisfactory for both German and French, except that "ß" is equal to "s", and not to "ss". If this is acceptable for your application, you should use utf8_general_ci because it is faster. If this is not acceptable (for example, if you require German dictionary order), use utf8_unicode_ci because it is more accurate.

If you require German DIN-2 (phone book) ordering, use the utf8_german2_ci collation, which compares the following sets of characters equal:

Ä = Æ = AE
Ö = Œ = OE
Ü = UE
ß = ss

utf8_german2_ci is similar to latin1_german2_ci, but the latter does not compare "Æ" equal to "AE" or "Œ" equal to "OE". There is no utf8_german_ci corresponding to latin1_german_ci for German dictionary order because utf8_general_ci suffices.

xxx_swedish_ci includes Swedish rules. For example, in Swedish, the following relationship holds, which is not something expected by a German or French speaker:

Ü = Y < Ö

The xxx_spanish_ci and xxx_spanish2_ci collations correspond to modern Spanish and traditional Spanish, respectively. In both collations, "ñ" (n-tilde) is a separate letter between "n" and "o". In addition, for traditional Spanish, "ch" is a separate letter between "c" and "d", and "ll" is a separate letter between "l" and "m"

In the xxx_roman_ci collations, I and J compare as equal, and U and V compare as equal.

The xxx_croatian_ci collations are tailored for these Croatian letters: Č, Ć, , Đ, Lj, Nj, Š, Ž.

For all Unicode collations except the "binary" (xxx_bin) collations, MariaDB performs a table lookup to find a character's collating weight. This weight can be displayed using the WEIGHT-STRING() function. (See , "String Functions".) If a character is not in the table (for example, because it is a "new" character), collating weight determination becomes more complex:

The utf16_bin Collation

There is a difference between "ordering by the character's code value" and "ordering by the character's binary representation," a difference that appears only with utf16_bin, because of surrogates.

Suppose that utf16_bin (the binary collation for utf16) was a binary comparison "byte by byte" rather than "character by character." If that were so, the order of characters in utf16_bin would differ from the order in utf8_bin. For example, the following chart shows two rare characters. The first character is in the range E000-FFFF, so it is greater than a surrogate but less than a supplementary. The second character is a supplementary.

Code point Character utf8 utf16
---------- --------- ---- -----
0FF9D HALFWIDTH KATAKANA LETTER N EF BE 9D FF 9D
10384 UGARITIC LETTER DELTA F0 90 8E 84 D8 00 DF 84

The two characters in the chart are in order by code point value because 0xff9d < 0x10384. And they are in order by utf8 value because 0xef < 0xf0. But they are not in order by utf16 value, if we use byte-by-byte comparison, because 0xff > 0xd8.

So MySQL's utf16_bin collation is not "byte by byte." It is "by code point." When MariaDB sees a supplementary-character encoding in utf16, it converts to the character's code-point value, and then compares. Therefore, utf8_bin and utf16_bin are the same ordering. This is consistent with the SQL:2008 standard requirement for a UCS_BASIC collation: "UCS_BASIC is a collation in which the ordering is determined entirely by the Unicode scalar values of the characters in the strings being sorted. It is applicable to the UCS character repertoire. Since every character repertoire is a subset of the UCS repertoire, the UCS_BASIC collation is potentially applicable to every character set. NOTE 11: The Unicode scalar value of a character is its code point treated as an unsigned integer."

If the character set is ucs2, comparison is byte-by-byte, but ucs2 strings should not contain surrogates, anyway.

For additional information about Unicode collations in MySQL, see Collation-Charts.Org (utf8).

West European Character Sets

Western European character sets cover most West European languages, such as French, Spanish, Catalan, Basque, Portuguese, Italian, Albanian, Dutch, German, Danish, Swedish, Norwegian, Finnish, Faroese, Icelandic, Irish, Scottish, and English.

For additional information about Western European collations in MySQL, see Collation-Charts.Org (ascii, cp850, dec8, hp8, latin1, macroman, swe7).

Central European Character Sets

MySQL provides some support for character sets used in the Czech Republic, Slovakia, Hungary, Romania, Slovenia, Croatia, Poland, and Serbia (Latin).

For additional information about Central European collations in MySQL, see Collation-Charts.Org (cp1250, cp852, keybcs2, latin2, macce).

South European and Middle East Character Sets

South European and Middle Eastern character sets supported by MariaDB include Armenian, Arabic, Georgian, Greek, Hebrew, and Turkish.

For additional information about South European and Middle Eastern collations in MySQL, see Collation-Charts.Org (armscii8, cp1256, geostd8, greek, hebrew, latin5).

Baltic Character Sets

The Baltic character sets cover Estonian, Latvian, and Lithuanian languages.

For additional information about Baltic collations in MySQL, see Collation-Charts.Org (cp1257, latin7).

Cyrillic Character Sets

The Cyrillic character sets and collations are for use with Belarusian, Bulgarian, Russian, Ukrainian, and Serbian (Cyrillic) languages.

For additional information about Cyrillic collations in MySQL, see Collation-Charts.Org (cp1251, cp866, koi8r, koi8u). ).

Asian Character Sets

The cp932 Character Set

The Asian character sets that we support include Chinese, Japanese, Korean, and Thai. These can be complicated. For example, the Chinese sets must allow for thousands of different characters. See , "The cp932 Character Set", for additional information about the cp932 and sjis character sets.

For answers to some common questions and problems relating support for Asian character sets in MySQL, see "MySQL 5.6 FAQ: MariaDB Chinese, Japanese, and Korean Character Sets".

The big5_chinese_ci collation sorts on number of strokes.

For additional information about Asian collations in MySQL, see Collation-Charts.Org (big5, cp932, eucjpms, euckr, gb2312, gbk, sjis, tis620, ujis).

The cp932 Character Set

Why is cp932 needed?

In MySQL, the sjis character set corresponds to the Shift_JIS character set defined by IANA, which supports JIS X0201 and JIS X0208 characters. (See http://www.iana.org/assignments/character-sets.)

However, the meaning of "SHIFT JIS" as a descriptive term has become very vague and it often includes the extensions to Shift_JIS that are defined by various vendors.

For example, "SHIFT JIS" used in Japanese Windows environments is a Microsoft extension of Shift_JIS and its exact name is Microsoft Windows Codepage : 932 or cp932. In addition to the characters supported by Shift_JIS, cp932 supports extension characters such as NEC special characters, NEC selected-IBM extended characters, and IBM selected characters.

Many Japanese users have experienced problems using these extension characters. These problems stem from the following factors:

The MariaDB cp932 character set is designed to solve these problems.

Because MariaDB supports character set conversion, it is important to separate IANA Shift_JIS and cp932 into two different character sets because they provide different conversion rules.

How does cp932 differ from sjis?

The cp932 character set differs from sjis in the following ways:

For some characters, conversion to and from ucs2 is different for sjis and cp932. The following tables illustrate these differences.

Conversion to ucs2:

sjis/cp932 Value sjis -> ucs2 Conversion cp932 -> ucs2 Conversion
5C 005C 005C
7E 007E 007E
815C 2015 2015
815F 005C FF3C
8160 301C FF5E
8161 2016 2225
817C 2212 FF0D
8191 00A2 FFE0
8192 00A3 FFE1
81CA 00AC FFE2

Conversion from ucs2:

ucs2 value ucs2 -> sjis Conversion ucs2 -> cp932 Conversion
005C 815F 5C
007E 7E 7E
00A2 8191 3F
00A3 8192 3F
00AC 81CA 3F
2015 815C 815C
2016 8161 3F
2212 817C 3F
2225 3F 8161
301C 8160 3F
FF0D 3F 817C
FF3C 3F 815F
FF5E 3F 8160
FFE0 3F 8191
FFE1 3F 8192
FFE2 3F 81CA

Users of any Japanese character sets should be aware that using --character-set-client-handshake (or --skip-character-set-client-handshake) has an important effect. See , "Server Command Options".

Setting the Error Message Language

By default, mysqld produces error messages in English, but they can also be displayed in any of several other languages: Czech, Danish, Dutch, Estonian, French, German, Greek, Hungarian, Italian, Japanese, Korean, Norwegian, Norwegian-ny, Polish, Portuguese, Romanian, Russian, Slovak, Spanish, or Swedish.

You can select which language the server uses for error messages using the instructions in this section.

In MariaDB 5.6, the server searches for the error message file in two locations:

The lc-messages-dir system variable has only a global value and is read only. lc_messages has global and session values and can be modified at runtime, so the error message language can be changed while the server is running, and individual clients each can have a different error message language by changing their session lc_messages value to a different locale name. For example, if the server is using the fr_FR locale for error messages, a client can execute this statement to receive error messages in English:

mysql> SET lc_messages = 'en_US';

By default, the language files are located in the share/mysql/LANGUAGE directory under the MariaDB base directory.

For information about changing the character set for error messages (rather than the language), see , "Character Set for Error Messages".

You can change the content of the error messages produced by the server using the instructions in the MariaDB Internals manual, available at http://forge.mysql.com/wiki/MySQL_Internals_Error_Messages. If you do change the content of error messages, remember to repeat your changes after each upgrade to a newer version of MySQL.

Adding a Character Set

Character Definition Arrays
String Collating Support for Complex Character Sets
Multi-Byte Character Support for Complex Character Sets

This section discusses the procedure for adding a character set to MariaDB. The proper procedure depends on whether the character set is simple or complex:

For example, greek and swe7 are simple character sets, whereas big5 and czech are complex character sets.

To use the following instructions, you must have a MariaDB source distribution. In the instructions, MYSET represents the name of the character set that you want to add.

  1. Add a <charset> element for MYSET to the sql/share/charsets/Index.xml file. Use the existing contents in the file as a guide to adding new contents. A partial listing for the latin1 <charset> element follows:

    <charset name='latin1'>
     <family>Western</family>
     <description>cp1252 West European</description>
     ...
     <collation name='latin1_swedish_ci' id='8' order='Finnish, Swedish'>
     <flag>primary</flag>
     <flag>compiled</flag>
     </collation>
     <collation name='latin1_danish_ci' id='15' order='Danish'/>
     ...
     <collation name='latin1_bin' id='47' order='Binary'>
     <flag>binary</flag>
     <flag>compiled</flag>
     </collation>
     ...
    </charset>
    

    The <charset> element must list all the collations for the character set. These must include at least a binary collation and a default (primary) collation. The default collation is often named using a suffix of general_ci (general, case insensitive). It is possible for the binary collation to be the default collation, but usually they are different. The default collation should have a primary flag. The binary collation should have a binary flag.

    You must assign a unique ID number to each collation. The range of IDs from 1024 to 2047 is reserved for user-defined collations. To find the maximum of the currently used collation IDs, use this query:

    SELECT MAX(ID) FROM INFORMATION_SCHEMA.COLLATIONS;
    
  2. This step depends on whether you are adding a simple or complex character set. A simple character set requires only a configuration file, whereas a complex character set requires C source file that defines collation functions, multi-byte functions, or both.

    For a simple character set, create a configuration file, MYSET.xml, that describes the character set properties. Create this file in the sql/share/charsets directory. You can use a copy of latin1.xml as the basis for this file. The syntax for the file is very simple:

    • Comments are written as ordinary XML comments (<!-- text -->).
    • Words within <map> array elements are separated by arbitrary amounts of whitespace.
    • Each word within <map> array elements must be a number in hexadecimal format.
    • The <map> array element for the <ctype> element has 257 words. The other <map> array elements after that have 256 words. See , "Character Definition Arrays".
    • For each collation listed in the <charset> element for the character set in Index.xml, MYSET.xml must contain a <collation> element that defines the character ordering.

    For a complex character set, create a C source file that describes the character set properties and defines the support routines necessary to properly perform operations on the character set:

    • Create the file ctype-MYSET.c in the strings directory. Look at one of the existing ctype-*.c files (such as ctype-big5.c) to see what needs to be defined. The arrays in your file must have names like ctype_MYSET, to_lower_MYSET, and so on. These correspond to the arrays for a simple character set. See , "Character Definition Arrays".
    • For each <collation> element listed in the <charset> element for the character set in Index.xml, the ctype-MYSET.c file must provide an implementation of the collation.
    • If the character set requires string collating functions, see , "String Collating Support for Complex Character Sets".
    • If the character set requires multi-byte character support, see , "Multi-Byte Character Support for Complex Character Sets".
  3. Modify the configuration information. Use the existing configuration information as a guide to adding information for MYSYS. The example here assumes that the character set has default and binary collations, but more lines are needed if MYSET has additional collations.

    1. Edit mysys/charset-def.c, and "register" the collations for the new character set.

      Add these lines to the "declaration" section:

      #ifdef HAVE_CHARSET_MYSET
      extern CHARSET_INFO my_charset_MYSET_general_ci;
      extern CHARSET_INFO my_charset_MYSET_bin;
      #endif
      

      Add these lines to the "registration" section:

      #ifdef HAVE_CHARSET_MYSET
       add_compiled_collation(&my_charset_MYSET_general_ci);
       add_compiled_collation(&my_charset_MYSET_bin);
      #endif
      
    2. If the character set uses ctype-MYSET.c, edit strings/CMakeLists.txt and add ctype-MYSET.c to the definition of the STRINGS_SOURCES variable.
    3. Edit cmake/character_sets.cmake:

      1. Add MYSET to the value of with CHARSETS_AVAILABLE in alphabetic order.
      2. Add MYSET to the value of CHARSETS_COMPLEX in alphabetic order. This is needed even for simple character sets, or CMake will not recognize -DDEFAULT_CHARSET=MYSET.
  4. Reconfigure, recompile, and test.

Character Definition Arrays

Each simple character set has a configuration file located in the sql/share/charsets directory. For a character set named MYSYS, the file is named MYSET.xml. It uses <map> array elements to list character set properties. <map> elements appear within these elements:

For a complex character set as implemented in a ctype-MYSET.c file in the strings directory, there are corresponding arrays: ctype_MYSET[], to_lower_MYSET[], and so forth. Not every complex character set has all of the arrays. See also the existing ctype-*.c files for examples. See the CHARSET_INFO.txt file in the strings directory for additional information.

Most of the arrays are indexed by character value and have 256 elements. The <ctype> array is indexed by character value + 1 and has 257 elements. This is a legacy convention for handling EOF.

<ctype> array elements are bit values. Each element describes the attributes of a single character in the character set. Each attribute is associated with a bitmask, as defined in include/m_ctype.h:

#define _MY_U 01 /* Upper case */
#define _MY_L 02 /* Lower case */
#define _MY_NMR 04 /* Numeral (digit) */
#define _MY_SPC 010 /* Spacing character */
#define _MY_PNT 020 /* Punctuation */
#define _MY_CTR 040 /* Control character */
#define _MY_B 0100 /* Blank */
#define _MY_X 0200 /* heXadecimal digit */

The <ctype> value for a given character should be the union of the applicable bitmask values that describe the character. For example, 'A' is an uppercase character (_MY_U) as well as a hexadecimal digit (_MY_X), so its ctype value should be defined like this:

ctype['A'+1] = _MY_U | _MY_X = 01 | 0200 = 0201

The bitmask values in m_ctype.h are octal values, but the elements of the <ctype> array in MYSET.xml should be written as hexadecimal values.

The <lower> and <upper> arrays hold the lowercase and uppercase characters corresponding to each member of the character set. For example:

lower['A'] should contain 'a'
upper['a'] should contain 'A'

Each <collation> array indicates how characters should be ordered for comparison and sorting purposes. MariaDB sorts characters based on the values of this information. In some cases, this is the same as the <upper> array, which means that sorting is case-insensitive. For more complicated sorting rules (for complex character sets), see the discussion of string collating in , "String Collating Support for Complex Character Sets".

String Collating Support for Complex Character Sets

For a simple character set named MYSET, sorting rules are specified in the MYSET.xml configuration file using <map> array elements within <collation> elements. If the sorting rules for your language are too complex to be handled with simple arrays, you must define string collating functions in the ctype-MYSET.c source file in the strings directory.

The existing character sets provide the best documentation and examples to show how these functions are implemented. Look at the ctype-*.c files in the strings directory, such as the files for the big5, czech, gbk, sjis, and tis160 character sets. Take a look at the MY_COLLATION_HANDLER structures to see how they are used. See also the CHARSET_INFO.txt file in the strings directory for additional information.

Multi-Byte Character Support for Complex Character Sets

If you want to add support for a new character set named MYSET that includes multi-byte characters, you must use multi-byte character functions in the ctype-MYSET.c source file in the strings directory.

The existing character sets provide the best documentation and examples to show how these functions are implemented. Look at the ctype-*.c files in the strings directory, such as the files for the euc_kr, gb2312, gbk, sjis, and ujis character sets. Take a look at the MY_CHARSET_HANDLER structures to see how they are used. See also the CHARSET_INFO.txt file in the strings directory for additional information.

Adding a Collation to a Character Set

Collation Implementation Types
Choosing a Collation ID
Adding a Simple Collation to an 8-Bit Character Set
Adding a UCA Collation to a Unicode Character Set

A collation is a set of rules that defines how to compare and sort character strings. Each collation in MariaDB belongs to a single character set. Every character set has at least one collation, and most have two or more collations.

A collation orders characters based on weights. Each character in a character set maps to a weight. Characters with equal weights compare as equal, and characters with unequal weights compare according to the relative magnitude of their weights.

The WEIGHT_STRING() function can be used to see the weights for the characters in a string. The value that it returns to indicate weights is a binary string, so it is convenient to use HEX(WEIGHT_STRING(str)) to display the weights in printable form. The following example shows that weights do not differ for lettercase for the letters in 'AaBb' it if is a nonbinary case-insensitive string, but do differ if it is a binary string:

mysql> SELECT HEX(WEIGHT_STRING('AaBb' COLLATE latin1_swedish_ci));
+------------------------------------------------------+
| HEX(WEIGHT_STRING('AaBb' COLLATE latin1_swedish_ci)) |
+------------------------------------------------------+
| 41414242 |
+------------------------------------------------------+
mysql> SELECT HEX(WEIGHT_STRING(BINARY 'AaBb'));
+-----------------------------------+
| HEX(WEIGHT_STRING(BINARY 'AaBb')) |
+-----------------------------------+
| 41614262 |
+-----------------------------------+

MySQL supports several collation implementations, as discussed in , "Collation Implementation Types". Some of these can be added to MariaDB without recompiling:

The following sections describe how to add collations of the first two types to existing character sets. All existing character sets already have a binary collation, so there is no need here to describe how to add one.

Summary of the procedure for adding a new collation:

  1. Choose a collation ID.
  2. Add configuration information that names the collation and describes the character-ordering rules.
  3. Restart the server.
  4. Verify that the collation is present.

The instructions here cover only collations that can be added without recompiling MySQL. To add a collation that does require recompiling (as implemented by means of functions in a C source file), use the instructions in , "Adding a Character Set". However, instead of adding all the information required for a complete character set, just modify the appropriate files for an existing character set. That is, based on what is already present for the character set's current collations, add data structures, functions, and configuration information for the new collation.Note

If you modify an existing collation, that may affect the ordering of rows for indexes on columns that use the collation. In this case, rebuild any such indexes to avoid problems such as incorrect query results. For further information, see , "Checking Whether Tables or Indexes Must Be Rebuilt".

Additional Resources

Collation Implementation Types

MySQL implements several types of collations:

Simple collations for 8-bit character sets

This kind of collation is implemented using an array of 256 weights that defines a one-to-one mapping from character codes to weights. latin1_swedish_ci is an example. It is a case-insensitive collation, so the uppercase and lowercase versions of a character have the same weights and they compare as equal.

mysql> SET NAMES 'latin1' COLLATE 'latin1_swedish_ci';
Query OK, 0 rows affected (0.01 sec)
mysql> SELECT HEX(WEIGHT_STRING('a')), HEX(WEIGHT_STRING('A'));
+-------------------------+-------------------------+
| HEX(WEIGHT_STRING('a')) | HEX(WEIGHT_STRING('A')) |
+-------------------------+-------------------------+
| 41 | 41 |
+-------------------------+-------------------------+
1 row in set (0.01 sec)
mysql> SELECT 'a' = 'A';
+-----------+
| 'a' = 'A' |
+-----------+
| 1 |
+-----------+
1 row in set (0.12 sec)

For implementation instructions, see , "Adding a Simple Collation to an 8-Bit Character Set".

Complex collations for 8-bit character sets

This kind of collation is implemented using functions in a C source file that define how to order characters, as described in , "Adding a Character Set".

Collations for non-Unicode multi-byte character sets

For this type of collation, 8-bit (single-byte) and multi-byte characters are handled differently. For 8-bit characters, character codes map to weights in case-insensitive fashion. (For example, the single-byte characters 'a' and 'A' both have a weight of 0x41.) For multi-byte characters, there are two types of relationship between character codes and weights:

For implementation instructions, see , "Adding a Character Set".

Collations for Unicode multi-byte character sets

Some of these collations are based on the Unicode Collation Algorithm (UCA), others are not.

Non-UCA collations have a one-to-one mapping from character code to weight. In MySQL, such collations are case insensitive and accent insensitive. utf8_general_ci is an example: 'a', 'A', 'À', and 'á' each have different character codes but all have a weight of 0x0041 and compare as equal.

mysql> SET NAMES 'utf8' COLLATE 'utf8_general_ci';
Query OK, 0 rows affected (0.00 sec)
mysql> CREATE TABLE t1
 -> (c1 CHAR(1) CHARACTER SET UTF8 COLLATE utf8_general_ci);
Query OK, 0 rows affected (0.01 sec)
mysql> INSERT INTO t1 VALUES ('a'),('A'),('À'),('á');
Query OK, 4 rows affected (0.00 sec)
Records: 4 Duplicates: 0 Warnings: 0
mysql> SELECT c1, HEX(c1), HEX(WEIGHT_STRING(c1)) FROM t1;
+------+---------+------------------------+
| c1 | HEX(c1) | HEX(WEIGHT_STRING(c1)) |
+------+---------+------------------------+
| a | 61 | 0041 |
| A | 41 | 0041 |
| À | C380 | 0041 |
| á | C3A1 | 0041 |
+------+---------+------------------------+
4 rows in set (0.00 sec)

UCA-based collations in MariaDB have these properties:

A many-characters-to-many-weights mapping is also possible (this is contraction with expansion), but is not supported by MySQL.

For implementation instructions, for a non-UCA collation, see , "Adding a Character Set". For a UCA collation, see , "Adding a UCA Collation to a Unicode Character Set".

Miscellaneous collations

There are also a few collations that do not fall into any of the previous categories.

Choosing a Collation ID

Each collation must have a unique ID. To add a collation, you must choose an ID value that is not currently used. The range of IDs from 1024 to 2047 is reserved for user-defined collations. The collation ID that you choose will appear in these contexts:

To determine the largest currently used ID, issue the following statement:

mysql> SELECT MAX(ID) FROM INFORMATION_SCHEMA.COLLATIONS;
+---------+
| MAX(ID) |
+---------+
| 210 |
+---------+

To display a list of all currently used IDs, issue this statement:

mysql> SELECT ID FROM INFORMATION_SCHEMA.COLLATIONS ORDER BY ID;
+-----+
| ID |
+-----+
| 1 |
| 2 |
| ... |
| 52 |
| 53 |
| 57 |
| 58 |
| ... |
| 98 |
| 99 |
| 128 |
| 129 |
| ... |
| 210 |
+-----+
Warning

Before MariaDB 5.5, which provides for a range of user-defined collation IDs, you must choose an ID in the range from 1 to 254. In this case, if you upgrade MySQL, you may find that the collation ID you choose has been assigned to a collation included in the new MariaDB distribution. In this case, you will need to choose a new value for your own collation.

In addition, before upgrading, you should save the configuration files that you change. If you upgrade in place, the process will replace the your modified files.

Adding a Simple Collation to an 8-Bit Character Set

This section describes how to add a simple collation for an 8-bit character set by writing the <collation> elements associated with a <charset> character set description in the MariaDB Index.xml file. The procedure described here does not require recompiling MySQL. The example adds a collation named latin1_test_ci to the latin1 character set.

  1. Choose a collation ID, as shown in , "Choosing a Collation ID". The following steps use an ID of 1024.
  2. Modify the Index.xml and latin1.xml configuration files. These files will be located in the directory named by the character_sets_dir system variable. You can check the variable value as follows, although the path name might be different on your system:

    mysql> SHOW VARIABLES LIKE 'character_sets_dir';
    +--------------------+-----------------------------------------+
    | Variable_name | Value |
    +--------------------+-----------------------------------------+
    | character_sets_dir | /user/local/mysql/share/mysql/charsets/ |
    +--------------------+-----------------------------------------+
    
  3. Choose a name for the collation and list it in the Index.xml file. Find the <charset> element for the character set to which the collation is being added, and add a <collation> element that indicates the collation name and ID, to associate the name with the ID. For example:

    <charset name='latin1'>
     ...
     <collation name='latin1_test_ci' id='1024'/>
     ...
    </charset>
    
  4. In the latin1.xml configuration file, add a <collation> element that names the collation and that contains a <map> element that defines a character code-to-weight mapping table for character codes 0 to 255. Each value within the <map> element must be a number in hexadecimal format.

    <collation name='latin1_test_ci'>
    <map>
     00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
     10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
     20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
     30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
     40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
     50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
     60 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
     50 51 52 53 54 55 56 57 58 59 5A 7B 7C 7D 7E 7F
     80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
     90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
     A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
     B0 B1 B2 B3 B4 B5 B6 B7 B8 B9 BA BB BC BD BE BF
     41 41 41 41 5B 5D 5B 43 45 45 45 45 49 49 49 49
     44 4E 4F 4F 4F 4F 5C D7 5C 55 55 55 59 59 DE DF
     41 41 41 41 5B 5D 5B 43 45 45 45 45 49 49 49 49
     44 4E 4F 4F 4F 4F 5C F7 5C 55 55 55 59 59 DE FF
    </map>
    </collation>
    
  5. Restart the server and use this statement to verify that the collation is present:

    mysql> SHOW COLLATION LIKE 'latin1_test_ci';
    +----------------+---------+------+---------+----------+---------+
    | Collation | Charset | Id | Default | Compiled | Sortlen |
    +----------------+---------+------+---------+----------+---------+
    | latin1_test_ci | latin1 | 1024 | | | 1 |
    +----------------+---------+------+---------+----------+---------+
    

Adding a UCA Collation to a Unicode Character Set

Defining a UCA Collation using LDML Syntax
LDML Syntax Supported in MySQL
Diagnostics During Index.xml Parsing

This section describes how to add a UCA collation for a Unicode character set by writing the <collation> element within a <charset> character set description in the MariaDB Index.xml file. The procedure described here does not require recompiling MySQL. It uses a subset of the Locale Data Markup Language (LDML) specification, which is available at http://www.unicode.org/reports/tr35/. With this method, you need not define the entire collation. Instead, you begin with an existing "base" collation and describe the new collation in terms of how it differs from the base collation. The following table lists the base collations of the Unicode character sets for which UCA collations can be defined. It is not possible to create user-defined UCA collations for utf16le; there is no utf16le_unicode_ci collation that would serve as the basis for such collations.

Table 9.1. MariaDB Character Sets Available for User-Defined UCA Collations

Character Set Base Collation
utf8 utf8_unicode_ci
ucs2 ucs2_unicode_ci
utf16 utf16_unicode_ci
utf32 utf32_unicode_ci

The following sections show how to add a collation that is defined using LDML syntax, and provide a summary of LDML rules supported in MySQL.

Defining a UCA Collation using LDML Syntax

To add a UCA collation for a Unicode character set without recompiling MySQL, use the following procedure. If you are unfamiliar with the LDML rules used to describe the collation's sort characteristics, see , "LDML Syntax Supported in MySQL".

The example adds a collation named utf8_phone_ci to the utf8 character set. The collation is designed for a scenario involving a Web application for which users post their names and phone numbers. Phone numbers can be given in very different formats:

+7-12345-67
+7-12-345-67
+7 12 345 67
+7 (12) 345 67
+71234567

The problem raised by dealing with these kinds of values is that the varying permissible formats make searching for a specific phone number very difficult. The solution is to define a new collation that reorders punctuation characters, making them ignorable.

  1. Choose a collation ID, as shown in , "Choosing a Collation ID". The following steps use an ID of 1029.
  2. To modify the Index.xml configuration file. This file will be located in the directory named by the character_sets_dir system variable. You can check the variable value as follows, although the path name might be different on your system:

    mysql> SHOW VARIABLES LIKE 'character_sets_dir';
    +--------------------+-----------------------------------------+
    | Variable_name | Value |
    +--------------------+-----------------------------------------+
    | character_sets_dir | /user/local/mysql/share/mysql/charsets/ |
    +--------------------+-----------------------------------------+
    
  3. Choose a name for the collation and list it in the Index.xml file. In addition, you'll need to provide the collation ordering rules. Find the <charset> element for the character set to which the collation is being added, and add a <collation> element that indicates the collation name and ID, to associate the name with the ID. Within the <collation> element, provide a <rules> element containing the ordering rules:

    <charset name='utf8'>
     ...
     <collation name='utf8_phone_ci' id='1029'>
     <rules>
     <reset>\u0000</reset>
     <i>\u0020</i> <!-- space -->
     <i>\u0028</i> <!-- left parenthesis -->
     <i>\u0029</i> <!-- right parenthesis -->
     <i>\u002B</i> <!-- plus -->
     <i>\u002D</i> <!-- hyphen -->
     </rules>
     </collation>
     ...
    </charset>
    
  4. If you want a similar collation for other Unicode character sets, add other <collation> elements. For example, to define ucs2_phone_ci, add a <collation> element to the <charset name='ucs2'> element. Remember that each collation must have its own unique ID.
  5. Restart the server and use this statement to verify that the collation is present:

    mysql> SHOW COLLATION LIKE 'utf8_phone_ci';
    +---------------+---------+------+---------+----------+---------+
    | Collation | Charset | Id | Default | Compiled | Sortlen |
    +---------------+---------+------+---------+----------+---------+
    | utf8_phone_ci | utf8 | 1029 | | | 8 |
    +---------------+---------+------+---------+----------+---------+
    

Now test the collation to make sure that it has the desired properties.

Create a table containing some sample phone numbers using the new collation:

mysql> CREATE TABLE phonebook (
 ->  name VARCHAR(64),
 ->  phone VARCHAR(64) CHARACTER SET utf8 COLLATE utf8_phone_ci
 -> );
Query OK, 0 rows affected (0.09 sec)
mysql> INSERT INTO phonebook VALUES ('Svoj','+7 912 800 80 02');
Query OK, 1 row affected (0.00 sec)
mysql> INSERT INTO phonebook VALUES ('Hf','+7 (912) 800 80 04');
Query OK, 1 row affected (0.00 sec)
mysql> INSERT INTO phonebook VALUES ('Bar','+7-912-800-80-01');
Query OK, 1 row affected (0.00 sec)
mysql> INSERT INTO phonebook VALUES ('Ramil','(7912) 800 80 03');
Query OK, 1 row affected (0.00 sec)
mysql> INSERT INTO phonebook VALUES ('Sanja','+380 (912) 8008005');
Query OK, 1 row affected (0.00 sec)

Run some queries to see whether the ignored punctuation characters are in fact ignored for sorting and comparisons:

mysql> SELECT * FROM phonebook ORDER BY phone;
+-------+--------------------+
| name | phone |
+-------+--------------------+
| Sanja | +380 (912) 8008005 |
| Bar | +7-912-800-80-01 |
| Svoj | +7 912 800 80 02 |
| Ramil | (7912) 800 80 03 |
| Hf | +7 (912) 800 80 04 |
+-------+--------------------+
5 rows in set (0.00 sec)
mysql> SELECT * FROM phonebook WHERE phone='+7(912)800-80-01';
+------+------------------+
| name | phone |
+------+------------------+
| Bar | +7-912-800-80-01 |
+------+------------------+
1 row in set (0.00 sec)
mysql> SELECT * FROM phonebook WHERE phone='79128008001';
+------+------------------+
| name | phone |
+------+------------------+
| Bar | +7-912-800-80-01 |
+------+------------------+
1 row in set (0.00 sec)
mysql> SELECT * FROM phonebook WHERE phone='7 9 1 2 8 0 0 8 0 0 1';
+------+------------------+
| name | phone |
+------+------------------+
| Bar | +7-912-800-80-01 |
+------+------------------+
1 row in set (0.00 sec)

LDML Syntax Supported in MariaDB

This section describes the LDML syntax that MariaDB recognizes. This is a subset of the syntax described in the LDML specification available at http://www.unicode.org/reports/tr35/, which should be consulted for further information. MariaDB recognizes a large enough subset of the syntax that, in many cases, it is possible to download a collation definition from the Unicode Common Locale Data Repository and paste the relevant part (that is, the part between the <rules> and </rules> tags) into the MariaDB Index.xml file. The rules described here are all supported except that character sorting occurs only at the primary level. Rules that specify differences at secondary or higher sort levels are recognized (and thus can be included in collation definitions) but are treated as equality at the primary level.

The MariaDB server generates diagnostics when it finds problems while parsing the Index.xml file. See , "Diagnostics During Index.xml Parsing".

Character Representation

Characters named in LDML rules can be written literally or in \unnnn format, where nnnn is the hexadecimal Unicode code point value. For example, A and á can be written literally or as \u0041 and \u00E1. Within hexadecimal values, the digits A through F are not case sensitive; \u00E1 and \u00e1 are equivalent. For UCA 4.0.0 collations, hexadecimal notation can be used only for characters in the Basic Multilingual Plane, not for characters outside the BMP range of 0000 to FFFF. For UCA 5.2.0 collations, hexadecimal notation can be used for any character.

The Index.xml file itself should be written using UTF-8 encoding.

Syntax Rules

LDML has reset rules and shift rules to specify character ordering. Orderings are given as a set of rules that begin with a reset rule that establishes an anchor point, followed by shift rules that indicate how characters sort relative to the anchor point.

MySQL-Specific LDML Extensions

In MariaDB 5.6, an extension to LDML rules permits the <collation> element to include an optional version attribute in <collation> tags to indicate the UCA version on which the collation is based. If the version attribute is omitted, its default value is 4.0.0. For example, this specification indicates a collation that is based on UCA 5.2.0:

<collation id='nnn' name='utf8_xxx_ci' version='5.2.0'>
...
</collation>

Diagnostics During Index.xml Parsing

The MariaDB server generates diagnostics when it finds problems while parsing the Index.xml file:

Character Set Configuration

You can change the default server character set and collation with the --character-set-server and --collation-server options when you start the server. The collation must be a legal collation for the default character set. (Use the SHOW COLLATION statement to determine which collations are available for each character set.) See , "Server Command Options".

If you try to use a character set that is not compiled into your binary, you might run into the following problems:

You can force client programs to use specific character set as follows:

[client]
default-character-set=charset_name

This is normally unnecessary. However, when character_set_system differs from character_set_server or character_set_client, and you input characters manually (as database object identifiers, column values, or both), these may be displayed incorrectly in output from the client or the output itself may be formatted incorrectly. In such cases, starting the mysql client with --default-character-set=system_character_set-that is, setting the client character set to match the system character set-should fix the problem.

For MyISAM tables, you can check the character set name and number for a table with myisamchk -dvv tbl_name.

MySQL Server Time Zone Support

Staying Current with Time Zone Changes
Time Zone Leap Second Support

The MariaDB server maintains several time zone settings:

The current session time zone setting affects display and storage of time values that are zone-sensitive. This includes the values displayed by functions such as NOW() or CURTIME(), and values stored in and retrieved from TIMESTAMP columns. Values for TIMESTAMP columns are converted from the current time zone to UTC for storage, and from UTC to the current time zone for retrieval.

The current time zone setting does not affect values displayed by functions such as UTC-TIMESTAMP() or values in DATE, TIME, or DATETIME columns. Nor are values in those data types stored in UTC; the time zone applies for them only when converting from TIMESTAMP values. If you want locale-specific arithmetic for DATE, TIME, or DATETIME values, convert them to UTC, perform the arithmetic, and then convert back.

The current values of the global and client-specific time zones can be retrieved like this:

mysql> SELECT @@global.time_zone, @@session.time_zone;

timezone values can be given in several formats, none of which are case sensitive:

The MariaDB installation procedure creates the time zone tables in the MariaDB database, but does not load them. You must do so manually using the following instructions. (If you are upgrading to MariaDB 4.1.3 or later from an earlier version, you can create the tables by upgrading your MariaDB database. Use the instructions in , "mysql_upgrade - Check Tables for MariaDB Upgrade". After creating the tables, you can load them.)Note

Loading the time zone information is not necessarily a one-time operation because the information changes occasionally. For example, the rules for Daylight Saving Time in the United States, Mexico, and parts of Canada changed in 2007. When such changes occur, applications that use the old rules become out of date and you may find it necessary to reload the time zone tables to keep the information used by your MariaDB server current. See the notes at the end of this section.

If your system has its own zoneinfo database (the set of files describing time zones), you should use the mysql_tzinfo_to_sql program for filling the time zone tables. Examples of such systems are Linux, FreeBSD, Solaris, and Mac OS X. One likely location for these files is the /usr/share/zoneinfo directory. If your system does not have a zoneinfo database, you can use the downloadable package described later in this section.

The mysql_tzinfo_to_sql program is used to load the time zone tables. On the command line, pass the zoneinfo directory path name to mysql_tzinfo_to_sql and send the output into the mysql program. For example:

shell> mysql_tzinfo_to_sql /usr/share/zoneinfo | mysql -u root mysql

mysql_tzinfo_to_sql reads your system's time zone files and generates SQL statements from them. mysql processes those statements to load the time zone tables.

mysql_tzinfo_to_sql also can be used to load a single time zone file or to generate leap second information:

If your system is one that has no zoneinfo database (for example, Windows or HP-UX), you can use the package of pre-built time zone tables that is available for download at the MariaDB Developer Zone:

http://dev.mysql.com/downloads/timezones.html

This time zone package contains .frm, .MYD, and .MYI files for the MyISAM time zone tables. These tables should be part of the MariaDB database, so you should place the files in the MariaDB subdirectory of your MariaDB server's data directory. The server should be stopped while you do this and restarted afterward.Warning

Do not use the downloadable package if your system has a zoneinfo database. Use the mysql_tzinfo_to_sql utility instead. Otherwise, you may cause a difference in datetime handling between MariaDB and other applications on your system.

For information about time zone settings in replication setup, please see , "Replication Features and Issues".

Staying Current with Time Zone Changes

As mentioned earlier, when the time zone rules change, applications that use the old rules become out of date. To stay current, it is necessary to make sure that your system uses current time zone information is used. For MySQL, there are two factors to consider in staying current:

If you are uncertain whether named time zones are available, for use either as the server's time zone setting or by clients that set their own time zone, check whether your time zone tables are empty. The following query determines whether the table that contains time zone names has any rows:

mysql> SELECT COUNT(*) FROM mysql.time_zone_name;
+----------+
| COUNT(*) |
+----------+
| 0 |
+----------+

A count of zero indicates that the table is empty. In this case, no one can be using named time zones, and you don't need to update the tables. A count greater than zero indicates that the table is not empty and that its contents are available to be used for named time zone support. In this case, you should be sure to reload your time zone tables so that anyone who uses named time zones will get correct query results.

To check whether your MariaDB installation is updated properly for a change in Daylight Saving Time rules, use a test like the one following. The example uses values that are appropriate for the 2007 DST 1-hour change that occurs in the United States on March 11 at 2 a.m.

The test uses these two queries:

SELECT CONVERT_TZ('2007-03-11 2:00:00','US/Eastern','US/Central');
SELECT CONVERT_TZ('2007-03-11 3:00:00','US/Eastern','US/Central');

The two time values indicate the times at which the DST change occurs, and the use of named time zones requires that the time zone tables be used. The desired result is that both queries return the same result (the input time, converted to the equivalent value in the 'US/Central' time zone).

Before updating the time zone tables, you would see an incorrect result like this:

mysql> SELECT CONVERT_TZ('2007-03-11 2:00:00','US/Eastern','US/Central');
+------------------------------------------------------------+
| CONVERT_TZ('2007-03-11 2:00:00','US/Eastern','US/Central') |
+------------------------------------------------------------+
| 2007-03-11 01:00:00 |
+------------------------------------------------------------+
mysql> SELECT CONVERT_TZ('2007-03-11 3:00:00','US/Eastern','US/Central');
+------------------------------------------------------------+
| CONVERT_TZ('2007-03-11 3:00:00','US/Eastern','US/Central') |
+------------------------------------------------------------+
| 2007-03-11 02:00:00 |
+------------------------------------------------------------+

After updating the tables, you should see the correct result:

mysql> SELECT CONVERT_TZ('2007-03-11 2:00:00','US/Eastern','US/Central');
+------------------------------------------------------------+
| CONVERT_TZ('2007-03-11 2:00:00','US/Eastern','US/Central') |
+------------------------------------------------------------+
| 2007-03-11 01:00:00 |
+------------------------------------------------------------+
mysql> SELECT CONVERT_TZ('2007-03-11 3:00:00','US/Eastern','US/Central');
+------------------------------------------------------------+
| CONVERT_TZ('2007-03-11 3:00:00','US/Eastern','US/Central') |
+------------------------------------------------------------+
| 2007-03-11 01:00:00 |
+------------------------------------------------------------+

Time Zone Leap Second Support

In MariaDB 5.6, leap second values are returned with a time part that ends with :59:59. This means that a function such as NOW() can return the same value for two or three consecutive seconds during the leap second. It remains true that literal temporal values having a time part that ends with :59:60 or :59:61 are considered invalid.

If it is necessary to search for TIMESTAMP values one second before the leap second, anomalous results may be obtained if you use a comparison with 'YYYY-MM-DD hh:mm:ss' values:

mysql> CREATE TABLE t1 (a INT, ts TIMESTAMP DEFAULT NOW(), PRIMARY KEY (ts));
Query OK, 0 rows affected (0.11 sec)
mysql> # Simulate NOW() = '2009-01-01 02:59:59'
mysql> SET timestamp = 1230768022;
Query OK, 0 rows affected (0.00 sec)
mysql> INSERT INTO t1 (a) VALUES (1);
Query OK, 1 row affected (0.07 sec)
mysql> # Simulate NOW() = '2009-01-01 02:59:60'
mysql> SET timestamp = 1230768023;
Query OK, 0 rows affected (0.00 sec)
mysql> INSERT INTO t1 (a) VALUES (2);
Query OK, 1 row affected (0.02 sec)
mysql> SELECT * FROM t1;
+------+---------------------+
| a | ts |
+------+---------------------+
| 1 | 2008-12-31 18:00:22 |
| 2 | 2008-12-31 18:00:23 |
+------+---------------------+
2 rows in set (0.02 sec)
mysql> SELECT * FROM t1 WHERE ts = '2009-01-01 02:59:59';
Empty set (0.03 sec)

To work around this, you can use a comparison based on the UTC value actually stored in column, which has the leap second correction applied:

mysql> SELECT * FROM t1 WHERE UNIX_TIMESTAMP(ts) = 1230768023;
+------+---------------------+
| a | ts |
+------+---------------------+
| 2 | 2008-12-31 18:00:23 |
+------+---------------------+
1 row in set (0.02 sec)

MySQL Server Locale Support

The locale indicated by the lc_time_names system variable controls the language used to display day and month names and abbreviations. This variable affects the output from the DATE-FORMAT(), DAYNAME(), and MONTHNAME() functions.

The lc-time-names value does not affect the result from FORMAT(), but this function takes an optional third parameter that enables a locale to be specified to be used for the result number's decimal point, thousands separator, and grouping between separators. Permissible locale values are the same as the legal values for the lc_time_names system variable.

Locale names have language and region subtags listed by IANA (http://www.iana.org/assignments/language-subtag-registry) such as 'ja_JP' or 'pt_BR'. The default value is 'en_US' regardless of your system's locale setting, but you can set the value at server startup or set the GLOBAL value if you have the SUPER privilege. Any client can examine the value of lc_time_names or set its SESSION value to affect the locale for its own connection.

mysql> SET NAMES 'utf8';
Query OK, 0 rows affected (0.09 sec)
mysql> SELECT @@lc_time_names;
+-----------------+
| @@lc_time_names |
+-----------------+
| en_US |
+-----------------+
1 row in set (0.00 sec)
mysql> SELECT DAYNAME('2010-01-01'), MONTHNAME('2010-01-01');
+-----------------------+-------------------------+
| DAYNAME('2010-01-01') | MONTHNAME('2010-01-01') |
+-----------------------+-------------------------+
| Friday | January |
+-----------------------+-------------------------+
1 row in set (0.00 sec)
mysql> SELECT DATE_FORMAT('2010-01-01','%W %a %M %b');
+-----------------------------------------+
| DATE_FORMAT('2010-01-01','%W %a %M %b') |
+-----------------------------------------+
| Friday Fri January Jan |
+-----------------------------------------+
1 row in set (0.00 sec)
mysql> SET lc_time_names = 'es_MX';
Query OK, 0 rows affected (0.00 sec)
mysql> SELECT @@lc_time_names;
+-----------------+
| @@lc_time_names |
+-----------------+
| es_MX |
+-----------------+
1 row in set (0.00 sec)
mysql> SELECT DAYNAME('2010-01-01'), MONTHNAME('2010-01-01');
+-----------------------+-------------------------+
| DAYNAME('2010-01-01') | MONTHNAME('2010-01-01') |
+-----------------------+-------------------------+
| viernes | enero |
+-----------------------+-------------------------+
1 row in set (0.00 sec)
mysql> SELECT DATE_FORMAT('2010-01-01','%W %a %M %b');
+-----------------------------------------+
| DATE_FORMAT('2010-01-01','%W %a %M %b') |
+-----------------------------------------+
| viernes vie enero ene |
+-----------------------------------------+
1 row in set (0.00 sec)

The day or month name for each of the affected functions is converted from utf8 to the character set indicated by the character_set_connection system variable.

lc_time_names may be set to any of the following locale values. The set of locales supported by MariaDB may differ from those supported by your operating system.

ar_AE: Arabic - United Arab Emirates ar_BH: Arabic - Bahrain
ar_DZ: Arabic - Algeria ar_EG: Arabic - Egypt
ar_IN: Arabic - India ar_IQ: Arabic - Iraq
ar_JO: Arabic - Jordan ar_KW: Arabic - Kuwait
ar_LB: Arabic - Lebanon ar_LY: Arabic - Libya
ar_MA: Arabic - Morocco ar_OM: Arabic - Oman
ar_QA: Arabic - Qatar ar_SA: Arabic - Saudi Arabia
ar_SD: Arabic - Sudan ar_SY: Arabic - Syria
ar_TN: Arabic - Tunisia ar_YE: Arabic - Yemen
be_BY: Belarusian - Belarus bg_BG: Bulgarian - Bulgaria
ca_ES: Catalan - Spain cs_CZ: Czech - Czech Republic
da_DK: Danish - Denmark de_AT: German - Austria
de_BE: German - Belgium de_CH: German - Switzerland
de_DE: German - Germany de_LU: German - Luxembourg
el_GR: Greek - Greece en_AU: English - Australia
en_CA: English - Canada en_GB: English - United Kingdom
en_IN: English - India en_NZ: English - New Zealand
en_PH: English - Philippines en_US: English - United States
en_ZA: English - South Africa en_ZW: English - Zimbabwe
es_AR: Spanish - Argentina es_BO: Spanish - Bolivia
es_CL: Spanish - Chile es_CO: Spanish - Columbia
es_CR: Spanish - Costa Rica es_DO: Spanish - Dominican Republic
es_EC: Spanish - Ecuador es_ES: Spanish - Spain
es_GT: Spanish - Guatemala es_HN: Spanish - Honduras
es_MX: Spanish - Mexico es_NI: Spanish - Nicaragua
es_PA: Spanish - Panama es_PE: Spanish - Peru
es_PR: Spanish - Puerto Rico es_PY: Spanish - Paraguay
es_SV: Spanish - El Salvador es_US: Spanish - United States
es_UY: Spanish - Uruguay es_VE: Spanish - Venezuela
et_EE: Estonian - Estonia eu_ES: Basque - Basque
fi_FI: Finnish - Finland fo_FO: Faroese - Faroe Islands
fr_BE: French - Belgium fr_CA: French - Canada
fr_CH: French - Switzerland fr_FR: French - France
fr_LU: French - Luxembourg gl_ES: Galician - Spain
gu_IN: Gujarati - India he_IL: Hebrew - Israel
hi_IN: Hindi - India hr_HR: Croatian - Croatia
hu_HU: Hungarian - Hungary id_ID: Indonesian - Indonesia
is_IS: Icelandic - Iceland it_CH: Italian - Switzerland
it_IT: Italian - Italy ja_JP: Japanese - Japan
ko_KR: Korean - Republic of Korea lt_LT: Lithuanian - Lithuania
lv_LV: Latvian - Latvia mk_MK: Macedonian - FYROM
mn_MN: Mongolia - Mongolian ms_MY: Malay - Malaysia
nb_NO: Norwegian(Bokmål) - Norway nl_BE: Dutch - Belgium
nl_NL: Dutch - The Netherlands no_NO: Norwegian - Norway
pl_PL: Polish - Poland pt_BR: Portugese - Brazil
pt_PT: Portugese - Portugal rm_CH: Romansh - Switzerland
ro_RO: Romanian - Romania ru_RU: Russian - Russia
ru_UA: Russian - Ukraine sk_SK: Slovak - Slovakia
sl_SI: Slovenian - Slovenia sq_AL: Albanian - Albania
sr_RS: Serbian - Yugoslavia sv_FI: Swedish - Finland
sv_SE: Swedish - Sweden ta_IN: Tamil - India
te_IN: Telugu - India th_TH: Thai - Thailand
tr_TR: Turkish - Turkey uk_UA: Ukrainian - Ukraine
ur_PK: Urdu - Pakistan vi_VN: Vietnamese - Viet Nam
zh_CN: Chinese - China zh_HK: Chinese - Hong Kong
zh_TW: Chinese - Taiwan Province of China

lc-time-names currently does not affect the STR_TO_DATE() or GET_FORMAT() function.Copyright 1997, 2012, Oracle and/or its affiliates. All rights reserved. Legal Notices


Prev Next
Language Structure Home Chapter 10. Data Types