The utf8mb4 Character Set (Four-Byte UTF-8 Unicode Encoding)
The character set named utf8 uses a maximum of three bytes per character and contains only BMP (Basic Multilingual Plane) characters. The utf8mb4 character set uses a maximum of four bytes per character and supports supplementary characters:
- For a BMP character, utf8 and utf8mb4 have identical storage characteristics: same code values, same encoding, same length.
- For a supplementary character, utf8 cannot store the character at all, while utf8mb4 requires four bytes to store it. Since utf8 cannot store the character at all, you do not have any supplementary characters in utf8 columns and you need not worry about converting characters or losing data when upgrading utf8 data from older versions of MySQL.
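The byte counts in the list above can be checked directly. The following is a minimal sketch, assuming a server that accepts the _utf8mb4 introducer with hexadecimal literals. The two sample characters (the euro sign U+20AC from the BMP and U+1F600 from the supplementary range) are chosen here for illustration only.

```sql
-- LENGTH() counts bytes; CHAR_LENGTH() counts characters.
SELECT
  CHAR_LENGTH(_utf8mb4 x'E282AC')   AS bmp_chars,   -- U+20AC is one character
  LENGTH(_utf8mb4 x'E282AC')        AS bmp_bytes,   -- encoded in three bytes
  CHAR_LENGTH(_utf8mb4 x'F09F9880') AS supp_chars,  -- U+1F600 is one character
  LENGTH(_utf8mb4 x'F09F9880')      AS supp_bytes;  -- encoded in four bytes
```

Hexadecimal literals are used instead of typed characters so that the result does not depend on the client's connection character set.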
utf8mb4 is a superset of utf8, so for an operation such as the following concatenation, the result has character set utf8mb4 and the collation of utf8mb4_col:
SELECT CONCAT(utf8_col, utf8mb4_col);
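To see which character set and collation the server chooses for such a result, the CHARSET() and COLLATION() functions can be applied to the expression. This is a sketch only: the table name charset_demo and its sample row are hypothetical, and the collation name reported depends on the server's default collation for utf8mb4.

```sql
-- Hypothetical table used only for this illustration.
CREATE TABLE charset_demo (
  utf8_col    VARCHAR(20) CHARACTER SET utf8,
  utf8mb4_col VARCHAR(20) CHARACTER SET utf8mb4
);
INSERT INTO charset_demo VALUES ('abc', 'def');

-- Report the character set and collation of the concatenated result.
SELECT
  CHARSET(CONCAT(utf8_col, utf8mb4_col))   AS result_charset,
  COLLATION(CONCAT(utf8_col, utf8mb4_col)) AS result_collation
FROM charset_demo;

DROP TABLE charset_demo;
```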
Similarly, the following comparison in the WHERE clause works according to the collation of utf8mb4_col:
SELECT * FROM utf8_tbl, utf8mb4_tbl WHERE utf8_tbl.utf8_col = utf8mb4_tbl.utf8mb4_col;
Tip
To save space with UTF-8, use VARCHAR instead of CHAR. Otherwise, MariaDB must reserve three (or four) bytes for each character in a CHAR CHARACTER SET utf8 (or utf8mb4) column because that is the maximum possible length. For example, MariaDB must reserve 40 bytes for a CHAR(10) CHARACTER SET utf8mb4 column.
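The 40-byte figure is 10 characters multiplied by 4 bytes per character. As a rough check, the column metadata in information_schema reports this maximum byte length. The table name char_demo and its column names are hypothetical, and the query assumes a default database is selected.

```sql
-- Hypothetical table used only for this illustration.
CREATE TABLE char_demo (
  fixed_code    CHAR(10)    CHARACTER SET utf8mb4,
  variable_code VARCHAR(10) CHARACTER SET utf8mb4
);

-- CHARACTER_OCTET_LENGTH is the maximum length in bytes:
-- 10 characters x 4 bytes per character for utf8mb4.
SELECT COLUMN_NAME, CHARACTER_MAXIMUM_LENGTH, CHARACTER_OCTET_LENGTH
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = DATABASE() AND TABLE_NAME = 'char_demo';

DROP TABLE char_demo;
```

Both columns share the same upper bound in bytes. The difference is that a VARCHAR value occupies only the bytes of the stored string plus a length prefix, while a CHAR column is sized for the maximum.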