SlideShare une entreprise Scribd logo
1  sur  39
Télécharger pour lire hors ligne
🍣=🍺 Powered by Rabbit 2.1.6
🍣=🍺
とみたまさひろ
MyNA会
2015/04/22
🍣=🍺 Powered by Rabbit 2.1.6
自己紹介
とみた まさひろ
http://tmtms.hatenablog.com
http://twitter.com/tmtms
https://github.com/tmtm
MySQL 3.21 に日本語charsetを追加
MySQLのRubyバインディング作成
🍣=🍺 Powered by Rabbit 2.1.6
自己紹介
もっともRTされたツイート
🍣=🍺 Powered by Rabbit 2.1.6
自己紹介
もっともブクマされたブログ
🍣=🍺 Powered by Rabbit 2.1.6
自己紹介
長野県北部在住
日本MySQLユーザ会代表
名ばかり代表
「たまには何かしゃべれや (#゚Д゚)ゴルァ!!」
と言われたのでしゃべります
🍣=🍺 Powered by Rabbit 2.1.6
🍣=🍺 問題
🍣=🍺 Powered by Rabbit 2.1.6
MySQL的には🍣と🍺は同じ
🍣=🍺 Powered by Rabbit 2.1.6
ちなみに🍛と💩も(ry
🍣=🍺 Powered by Rabbit 2.1.6
PostgreSQLなら問題ないらしい
http://soudai1025.blogspot.jp/2015/03/postgresqlunicode-6.html
🍣=🍺 Powered by Rabbit 2.1.6
何故?
🍣=🍺 Powered by Rabbit 2.1.6
kamipo++
utf8_unicode_ci に対する日本の開発者の見解
http://blog.kamipo.net/entry/2015/03/08/145045
MySQL と Unicode Collation Algorithm (UCA)
http://blog.kamipo.net/entry/2015/03/17/103457
MySQL と寿司ビール問題
http://blog.kamipo.net/entry/2015/03/23/093052
🍣=🍺 Powered by Rabbit 2.1.6
MySQLの文字は Charset と
Collation がある
🍣=🍺 Powered by Rabbit 2.1.6
Charset
🍣=🍺 Powered by Rabbit 2.1.6
いわゆる文字コード
🍣=🍺 Powered by Rabbit 2.1.6
文字のバイト表現
🍣=🍺 Powered by Rabbit 2.1.6
Charset: utf8mb4
「A」 = 41
「あ」= E3 81 82
「🍣」= F0 9F 8D A3
「🍺」= F0 9F 8D BA
🍣=🍺 Powered by Rabbit 2.1.6
Collation
🍣=🍺 Powered by Rabbit 2.1.6
文字の照合規則・照合順序
🍣=🍺 Powered by Rabbit 2.1.6
Collation 一覧
mysql> show collation;
+--------------------------+----------+-----+---------+----------+---------+
| Collation | Charset | Id | Default | Compiled | Sortlen |
+--------------------------+----------+-----+---------+----------+---------+
| big5_chinese_ci | big5 | 1 | Yes | Yes | 1 |
| big5_bin | big5 | 84 | | Yes | 1 |
| dec8_swedish_ci | dec8 | 3 | Yes | Yes | 1 |
| dec8_bin | dec8 | 69 | | Yes | 1 |
| cp850_general_ci | cp850 | 4 | Yes | Yes | 1 |
| cp850_bin | cp850 | 80 | | Yes | 1 |
| hp8_english_ci | hp8 | 6 | Yes | Yes | 1 |
| hp8_bin | hp8 | 72 | | Yes | 1 |
| koi8r_general_ci | koi8r | 7 | Yes | Yes | 1 |
| koi8r_bin | koi8r | 74 | | Yes | 1 |
| latin1_german1_ci | latin1 | 5 | | Yes | 1 |
| latin1_swedish_ci | latin1 | 8 | Yes | Yes | 1 |
| latin1_danish_ci | latin1 | 15 | | Yes | 1 |
| latin1_german2_ci | latin1 | 31 | | Yes | 2 |
| latin1_bin | latin1 | 47 | | Yes | 1 |
| latin1_general_ci | latin1 | 48 | | Yes | 1 |
| latin1_general_cs | latin1 | 49 | | Yes | 1 |
🍣=🍺 Powered by Rabbit 2.1.6
Charset 毎に Collation がある
🍣=🍺 Powered by Rabbit 2.1.6
utf8mb4 の Collation
全部で16個
mysql> show collation like 'utf8mb4%';
+------------------------+---------+-----+---------+----------+---------+
| Collation | Charset | Id | Default | Compiled | Sortlen |
+------------------------+---------+-----+---------+----------+---------+
| utf8mb4_general_ci | utf8mb4 | 45 | Yes | Yes | 1 |
| utf8mb4_bin | utf8mb4 | 46 | | Yes | 1 |
| utf8mb4_unicode_ci | utf8mb4 | 224 | | Yes | 8 |
| utf8mb4_icelandic_ci | utf8mb4 | 225 | | Yes | 8 |
| utf8mb4_latvian_ci | utf8mb4 | 226 | | Yes | 8 |
| utf8mb4_romanian_ci | utf8mb4 | 227 | | Yes | 8 |
| utf8mb4_slovenian_ci | utf8mb4 | 228 | | Yes | 8 |
| utf8mb4_polish_ci | utf8mb4 | 229 | | Yes | 8 |
| utf8mb4_estonian_ci | utf8mb4 | 230 | | Yes | 8 |
| utf8mb4_spanish_ci | utf8mb4 | 231 | | Yes | 8 |
| utf8mb4_swedish_ci | utf8mb4 | 232 | | Yes | 8 |
🍣=🍺 Powered by Rabbit 2.1.6
utf8mb4 の Collation
| utf8mb4_turkish_ci | utf8mb4 | 233 | | Yes | 8 |
| utf8mb4_czech_ci | utf8mb4 | 234 | | Yes | 8 |
| utf8mb4_danish_ci | utf8mb4 | 235 | | Yes | 8 |
| utf8mb4_lithuanian_ci | utf8mb4 | 236 | | Yes | 8 |
| utf8mb4_slovak_ci | utf8mb4 | 237 | | Yes | 8 |
| utf8mb4_spanish2_ci | utf8mb4 | 238 | | Yes | 8 |
| utf8mb4_roman_ci | utf8mb4 | 239 | | Yes | 8 |
| utf8mb4_persian_ci | utf8mb4 | 240 | | Yes | 8 |
| utf8mb4_esperanto_ci | utf8mb4 | 241 | | Yes | 8 |
| utf8mb4_hungarian_ci | utf8mb4 | 242 | | Yes | 8 |
| utf8mb4_sinhala_ci | utf8mb4 | 243 | | Yes | 8 |
| utf8mb4_german2_ci | utf8mb4 | 244 | | Yes | 8 |
| utf8mb4_croatian_ci | utf8mb4 | 245 | | Yes | 8 |
| utf8mb4_unicode_520_ci | utf8mb4 | 246 | | Yes | 8 |
| utf8mb4_vietnamese_ci | utf8mb4 | 247 | | Yes | 8 |
+------------------------+---------+-----+---------+----------+---------+
🍣=🍺 Powered by Rabbit 2.1.6
utf8mb4 の Collation
utf8mb4_general_ci
utf8mb4_bin
utf8mb4_unicode_ci
utf8mb4_unicode_520_ci
utf8mb4_言語_ci
(utf8m4_japanese_ci は無い)
🍣=🍺 Powered by Rabbit 2.1.6
utf8mb4_general_ci
utf8mb4 charset のデフォルト collation
ASCII大文字小文字を区別しない(A=a)
絵文字を区別しない(🍣=🍺)
🍣=🍺 Powered by Rabbit 2.1.6
utf8mb4_bin
varchar(99) binary
全文字を区別する(A≠a, 🍣≠🍺)
PostgreSQLと同じならこれでいい
🍣=🍺 Powered by Rabbit 2.1.6
utf8mb4_unicode_ci
Unicode Collation Algorithm 4.0.0
http://www.unicode.org/reports/tr10/
http://dev.mysql.com/doc/refman/5.6/en/charset-unicode-sets.html
ASCII大文字小文字を区別しない(A=a)
絵文字を区別しない(🍣=🍺)
ひらがな、カタカナ、濁点有無、全角、半
角を区別しない(は=ば=ぱ=ハ=バ=パ=ハ)
🍣=🍺 Powered by Rabbit 2.1.6
utf8mb4_unicode_520_ci
Unicode Collation Algorithm 5.2.0
ASCII大文字小文字を区別しない(A=a)
絵文字を区別する(🍣≠🍺)
ひらがな、カタカナ、濁点有無、全角、半
角を区別しない(は=ば=ぱ=ハ=バ=パ=ハ)
🍣=🍺 Powered by Rabbit 2.1.6
ハハ=パパ=ババ 問題
誰得
🍣=🍺 Powered by Rabbit 2.1.6
utf8mb4_*_ci
Collation A : a 🍣 : 🍺 は : ぱ
general = = ≠
bin ≠ ≠ ≠
unicode = = =
unicode_
520
= ≠ =
🍣=🍺 Powered by Rabbit 2.1.6
ぼくらが本当に欲しかったもの
Collation A : a 🍣 : 🍺 は : ぱ
general = = ≠
bin ≠ ≠ ≠
unicode = = =
unicode_
520
= ≠ =
japanese = ≠ ≠
🍣=🍺 Powered by Rabbit 2.1.6
だ、だれか
utf8mb4_japanese_ci を作って
(;´Д`)
🍣=🍺 Powered by Rabbit 2.1.6
おまけ
🍣=🍺 Powered by Rabbit 2.1.6
同じ文字とみなされるかどうかは
weight_string() で確かめられる
🍣=🍺 Powered by Rabbit 2.1.6
utf8mb4_general_ci
mysql> select hex(weight_string('🍣' collate utf8mb4_general_ci));
+----------------------------------------------------+
| hex(weight_string('?' collate utf8mb4_general_ci)) |
+----------------------------------------------------+
| FFFD |
+----------------------------------------------------+
mysql> select hex(weight_string('🍺' collate utf8mb4_general_ci));
+----------------------------------------------------+
| hex(weight_string('?' collate utf8mb4_general_ci)) |
+----------------------------------------------------+
| FFFD |
+----------------------------------------------------+
🍣=🍺 Powered by Rabbit 2.1.6
utf8mb4_unicode_520_ci
mysql> select hex(weight_string('🍣' collate utf8mb4_unicode_520_ci));
+--------------------------------------------------------+
| hex(weight_string('?' collate utf8mb4_unicode_520_ci)) |
+--------------------------------------------------------+
| FBC3F363 |
+--------------------------------------------------------+
mysql> select hex(weight_string('🍺' collate utf8mb4_unicode_520_ci));
+--------------------------------------------------------+
| hex(weight_string('?' collate utf8mb4_unicode_520_ci)) |
+--------------------------------------------------------+
| FBC3F37A |
+--------------------------------------------------------+
🍣=🍺 Powered by Rabbit 2.1.6
おまけ2
🍣=🍺 Powered by Rabbit 2.1.6
パ と パ
utf8_unicode_ci では「パ」=「ハ」=「ハ」
「パ」は一文字、「パ」は二文字
'パ' LIKE 'パ' => 偽
'パ' = 'パ' => 真
🍣=🍺 Powered by Rabbit 2.1.6
= と LIKE は違うらしい
Per the SQL standard, LIKE
performs matching on a per-
character basis, thus it can
produce results different
from the = comparison
operator
http://dev.mysql.com/doc/refman/5.6/en/string-comparison-functions.html#operator_like
🍣=🍺 Powered by Rabbit 2.1.6
おわり

Contenu connexe

Plus de Masahiro Tomita

「理論から学ぶデータベース実践入門」読書会スペシャル
「理論から学ぶデータベース実践入門」読書会スペシャル「理論から学ぶデータベース実践入門」読書会スペシャル
「理論から学ぶデータベース実践入門」読書会スペシャル
Masahiro Tomita
 
アジャイルジャパン長野サテライト
アジャイルジャパン長野サテライトアジャイルジャパン長野サテライト
アジャイルジャパン長野サテライト
Masahiro Tomita
 

Plus de Masahiro Tomita (20)

Ruby 2.5
Ruby 2.5Ruby 2.5
Ruby 2.5
 
本当はこわいMySQLプロトコル
本当はこわいMySQLプロトコル本当はこわいMySQLプロトコル
本当はこわいMySQLプロトコル
 
ネットワークこわい
ネットワークこわいネットワークこわい
ネットワークこわい
 
CSV
CSVCSV
CSV
 
MySQLの文字コード事情 2017春版
MySQLの文字コード事情 2017春版MySQLの文字コード事情 2017春版
MySQLの文字コード事情 2017春版
 
MySQLの文字コード事情 2017版
MySQLの文字コード事情 2017版MySQLの文字コード事情 2017版
MySQLの文字コード事情 2017版
 
Ruby24
Ruby24Ruby24
Ruby24
 
MySQLの文字コード事情
MySQLの文字コード事情MySQLの文字コード事情
MySQLの文字コード事情
 
進捗と品質
進捗と品質進捗と品質
進捗と品質
 
MySQLを拡張する
MySQLを拡張するMySQLを拡張する
MySQLを拡張する
 
「理論から学ぶデータベース実践入門」読書会スペシャル
「理論から学ぶデータベース実践入門」読書会スペシャル「理論から学ぶデータベース実践入門」読書会スペシャル
「理論から学ぶデータベース実践入門」読書会スペシャル
 
MyNAができるまで
MyNAができるまでMyNAができるまで
MyNAができるまで
 
文字化け
文字化け文字化け
文字化け
 
Crystal
CrystalCrystal
Crystal
 
メールの暗号化
メールの暗号化メールの暗号化
メールの暗号化
 
文字化け
文字化け文字化け
文字化け
 
進捗と品質
進捗と品質進捗と品質
進捗と品質
 
アジャイルジャパン長野サテライト
アジャイルジャパン長野サテライトアジャイルジャパン長野サテライト
アジャイルジャパン長野サテライト
 
本当はこわいエンコーディングの話
本当はこわいエンコーディングの話本当はこわいエンコーディングの話
本当はこわいエンコーディングの話
 
Sequelのすすめ
SequelのすすめSequelのすすめ
Sequelのすすめ
 

Dernier

Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 

Dernier (20)

Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 

🍣=🍺

  • 1. 🍣=🍺 Powered by Rabbit 2.1.6 🍣=🍺 とみたまさひろ MyNA会 2015/04/22
  • 2. 🍣=🍺 Powered by Rabbit 2.1.6 自己紹介 とみた まさひろ http://tmtms.hatenablog.com http://twitter.com/tmtms https://github.com/tmtm MySQL 3.21 に日本語charsetを追加 MySQLのRubyバインディング作成
  • 3. 🍣=🍺 Powered by Rabbit 2.1.6 自己紹介 もっともRTされたツイート
  • 4. 🍣=🍺 Powered by Rabbit 2.1.6 自己紹介 もっともブクマされたブログ
  • 5. 🍣=🍺 Powered by Rabbit 2.1.6 自己紹介 長野県北部在住 日本MySQLユーザ会代表 名ばかり代表 「たまには何かしゃべれや (#゚Д゚)ゴルァ!!」 と言われたのでしゃべります
  • 6. 🍣=🍺 Powered by Rabbit 2.1.6 🍣=🍺 問題
  • 7. 🍣=🍺 Powered by Rabbit 2.1.6 MySQL的には🍣と🍺は同じ
  • 8. 🍣=🍺 Powered by Rabbit 2.1.6 ちなみに🍛と💩も(ry
  • 9. 🍣=🍺 Powered by Rabbit 2.1.6 PostgreSQLなら問題ないらしい http://soudai1025.blogspot.jp/2015/03/postgresqlunicode-6.html
  • 10. 🍣=🍺 Powered by Rabbit 2.1.6 何故?
  • 11. 🍣=🍺 Powered by Rabbit 2.1.6 kamipo++ utf8_unicode_ci に対する日本の開発者の見解 http://blog.kamipo.net/entry/2015/03/08/145045 MySQL と Unicode Collation Algorithm (UCA) http://blog.kamipo.net/entry/2015/03/17/103457 MySQL と寿司ビール問題 http://blog.kamipo.net/entry/2015/03/23/093052
  • 12. 🍣=🍺 Powered by Rabbit 2.1.6 MySQLの文字は Charset と Collation がある
  • 13. 🍣=🍺 Powered by Rabbit 2.1.6 Charset
  • 14. 🍣=🍺 Powered by Rabbit 2.1.6 いわゆる文字コード
  • 15. 🍣=🍺 Powered by Rabbit 2.1.6 文字のバイト表現
  • 16. 🍣=🍺 Powered by Rabbit 2.1.6 Charset: utf8mb4 「A」 = 41 「あ」= E3 81 82 「🍣」= F0 9F 8D A3 「🍺」= F0 9F 8D BA
  • 17. 🍣=🍺 Powered by Rabbit 2.1.6 Collation
  • 18. 🍣=🍺 Powered by Rabbit 2.1.6 文字の照合規則・照合順序
  • 19. 🍣=🍺 Powered by Rabbit 2.1.6 Collation 一覧 mysql> show collation; +--------------------------+----------+-----+---------+----------+---------+ | Collation | Charset | Id | Default | Compiled | Sortlen | +--------------------------+----------+-----+---------+----------+---------+ | big5_chinese_ci | big5 | 1 | Yes | Yes | 1 | | big5_bin | big5 | 84 | | Yes | 1 | | dec8_swedish_ci | dec8 | 3 | Yes | Yes | 1 | | dec8_bin | dec8 | 69 | | Yes | 1 | | cp850_general_ci | cp850 | 4 | Yes | Yes | 1 | | cp850_bin | cp850 | 80 | | Yes | 1 | | hp8_english_ci | hp8 | 6 | Yes | Yes | 1 | | hp8_bin | hp8 | 72 | | Yes | 1 | | koi8r_general_ci | koi8r | 7 | Yes | Yes | 1 | | koi8r_bin | koi8r | 74 | | Yes | 1 | | latin1_german1_ci | latin1 | 5 | | Yes | 1 | | latin1_swedish_ci | latin1 | 8 | Yes | Yes | 1 | | latin1_danish_ci | latin1 | 15 | | Yes | 1 | | latin1_german2_ci | latin1 | 31 | | Yes | 2 | | latin1_bin | latin1 | 47 | | Yes | 1 | | latin1_general_ci | latin1 | 48 | | Yes | 1 | | latin1_general_cs | latin1 | 49 | | Yes | 1 |
  • 20. 🍣=🍺 Powered by Rabbit 2.1.6 Charset 毎に Collation がある
  • 21. 🍣=🍺 Powered by Rabbit 2.1.6 utf8mb4 の Collation 全部で16個 mysql> show collation like 'utf8mb4%'; +------------------------+---------+-----+---------+----------+---------+ | Collation | Charset | Id | Default | Compiled | Sortlen | +------------------------+---------+-----+---------+----------+---------+ | utf8mb4_general_ci | utf8mb4 | 45 | Yes | Yes | 1 | | utf8mb4_bin | utf8mb4 | 46 | | Yes | 1 | | utf8mb4_unicode_ci | utf8mb4 | 224 | | Yes | 8 | | utf8mb4_icelandic_ci | utf8mb4 | 225 | | Yes | 8 | | utf8mb4_latvian_ci | utf8mb4 | 226 | | Yes | 8 | | utf8mb4_romanian_ci | utf8mb4 | 227 | | Yes | 8 | | utf8mb4_slovenian_ci | utf8mb4 | 228 | | Yes | 8 | | utf8mb4_polish_ci | utf8mb4 | 229 | | Yes | 8 | | utf8mb4_estonian_ci | utf8mb4 | 230 | | Yes | 8 | | utf8mb4_spanish_ci | utf8mb4 | 231 | | Yes | 8 | | utf8mb4_swedish_ci | utf8mb4 | 232 | | Yes | 8 |
  • 22. 🍣=🍺 Powered by Rabbit 2.1.6 utf8mb4 の Collation | utf8mb4_turkish_ci | utf8mb4 | 233 | | Yes | 8 | | utf8mb4_czech_ci | utf8mb4 | 234 | | Yes | 8 | | utf8mb4_danish_ci | utf8mb4 | 235 | | Yes | 8 | | utf8mb4_lithuanian_ci | utf8mb4 | 236 | | Yes | 8 | | utf8mb4_slovak_ci | utf8mb4 | 237 | | Yes | 8 | | utf8mb4_spanish2_ci | utf8mb4 | 238 | | Yes | 8 | | utf8mb4_roman_ci | utf8mb4 | 239 | | Yes | 8 | | utf8mb4_persian_ci | utf8mb4 | 240 | | Yes | 8 | | utf8mb4_esperanto_ci | utf8mb4 | 241 | | Yes | 8 | | utf8mb4_hungarian_ci | utf8mb4 | 242 | | Yes | 8 | | utf8mb4_sinhala_ci | utf8mb4 | 243 | | Yes | 8 | | utf8mb4_german2_ci | utf8mb4 | 244 | | Yes | 8 | | utf8mb4_croatian_ci | utf8mb4 | 245 | | Yes | 8 | | utf8mb4_unicode_520_ci | utf8mb4 | 246 | | Yes | 8 | | utf8mb4_vietnamese_ci | utf8mb4 | 247 | | Yes | 8 | +------------------------+---------+-----+---------+----------+---------+
  • 23. 🍣=🍺 Powered by Rabbit 2.1.6 utf8mb4 の Collation utf8mb4_general_ci utf8mb4_bin utf8mb4_unicode_ci utf8mb4_unicode_520_ci utf8mb4_言語_ci (utf8m4_japanese_ci は無い)
  • 24. 🍣=🍺 Powered by Rabbit 2.1.6 utf8mb4_general_ci utf8mb4 charset のデフォルト collation ASCII大文字小文字を区別しない(A=a) 絵文字を区別しない(🍣=🍺)
  • 25. 🍣=🍺 Powered by Rabbit 2.1.6 utf8mb4_bin varchar(99) binary 全文字を区別する(A≠a, 🍣≠🍺) PostgreSQLと同じならこれでいい
  • 26. 🍣=🍺 Powered by Rabbit 2.1.6 utf8mb4_unicode_ci Unicode Collation Algorithm 4.0.0 http://www.unicode.org/reports/tr10/ http://dev.mysql.com/doc/refman/5.6/en/charset-unicode-sets.html ASCII大文字小文字を区別しない(A=a) 絵文字を区別しない(🍣=🍺) ひらがな、カタカナ、濁点有無、全角、半 角を区別しない(は=ば=ぱ=ハ=バ=パ=ハ)
  • 27. 🍣=🍺 Powered by Rabbit 2.1.6 utf8mb4_unicode_520_ci Unicode Collation Algorithm 5.2.0 ASCII大文字小文字を区別しない(A=a) 絵文字を区別する(🍣≠🍺) ひらがな、カタカナ、濁点有無、全角、半 角を区別しない(は=ば=ぱ=ハ=バ=パ=ハ)
  • 28. 🍣=🍺 Powered by Rabbit 2.1.6 ハハ=パパ=ババ 問題 誰得
  • 29. 🍣=🍺 Powered by Rabbit 2.1.6 utf8mb4_*_ci Collation A : a 🍣 : 🍺 は : ぱ general = = ≠ bin ≠ ≠ ≠ unicode = = = unicode_ 520 = ≠ =
  • 30. 🍣=🍺 Powered by Rabbit 2.1.6 ぼくらが本当に欲しかったもの Collation A : a 🍣 : 🍺 は : ぱ general = = ≠ bin ≠ ≠ ≠ unicode = = = unicode_ 520 = ≠ = japanese = ≠ ≠
  • 31. 🍣=🍺 Powered by Rabbit 2.1.6 だ、だれか utf8mb4_japanese_ci を作って (;´Д`)
  • 32. 🍣=🍺 Powered by Rabbit 2.1.6 おまけ
  • 33. 🍣=🍺 Powered by Rabbit 2.1.6 同じ文字とみなされるかどうかは weight_string() で確かめられる
  • 34. 🍣=🍺 Powered by Rabbit 2.1.6 utf8mb4_general_ci mysql> select hex(weight_string('🍣' collate utf8mb4_general_ci)); +----------------------------------------------------+ | hex(weight_string('?' collate utf8mb4_general_ci)) | +----------------------------------------------------+ | FFFD | +----------------------------------------------------+ mysql> select hex(weight_string('🍺' collate utf8mb4_general_ci)); +----------------------------------------------------+ | hex(weight_string('?' collate utf8mb4_general_ci)) | +----------------------------------------------------+ | FFFD | +----------------------------------------------------+
  • 35. 🍣=🍺 Powered by Rabbit 2.1.6 utf8mb4_unicode_520_ci mysql> select hex(weight_string('🍣' collate utf8mb4_unicode_520_ci)); +--------------------------------------------------------+ | hex(weight_string('?' collate utf8mb4_unicode_520_ci)) | +--------------------------------------------------------+ | FBC3F363 | +--------------------------------------------------------+ mysql> select hex(weight_string('🍺' collate utf8mb4_unicode_520_ci)); +--------------------------------------------------------+ | hex(weight_string('?' collate utf8mb4_unicode_520_ci)) | +--------------------------------------------------------+ | FBC3F37A | +--------------------------------------------------------+
  • 36. 🍣=🍺 Powered by Rabbit 2.1.6 おまけ2
  • 37. 🍣=🍺 Powered by Rabbit 2.1.6 パ と パ utf8_unicode_ci では「パ」=「ハ」=「ハ」 「パ」は一文字、「パ」は二文字 'パ' LIKE 'パ' => 偽 'パ' = 'パ' => 真
  • 38. 🍣=🍺 Powered by Rabbit 2.1.6 = と LIKE は違うらしい Per the SQL standard, LIKE performs matching on a per- character basis, thus it can produce results different from the = comparison operator http://dev.mysql.com/doc/refman/5.6/en/string-comparison-functions.html#operator_like
  • 39. 🍣=🍺 Powered by Rabbit 2.1.6 おわり