Multibyte string handling in PHP


Multibyte string handling in PHP with the mbstring extension
By Daniel Rhodes of Warp Asylum (www.warpasylum.co.uk). As seen on Zend.com!
What is mbstring for?
  • Multibyte string handling
  • Supports many character encodings, including Unicode
  • Supports several different national languages
  • Character encoding conversion
  • Some Japanese-specific functions / settings
Mbstring is NOT...
  • A magic way to get the internals of the PHP interpreter itself to suddenly operate natively with Unicode (you'll have to wait and follow the development of PHP itself for that!)
How to get mbstring
  • Regular (but not “built-in”) extension for PHP
  • On most PHP servers it's already there, so...
  • ...just switch it on!
  • Present and switched on out-of-the-box in Zend Server (CE and upwards)
  • If it is not present, download it; you shouldn't need to compile anything
Some key directives for mbstring
  • mbstring.internal_encoding (deprecated since PHP 5.6 in favour of default_charset)
  • mbstring.language
  • See http://php.net/manual/en/mbstring.configuration.php
Easy peasy in Zend Server
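
The transcript names the directives but does not show them being set. As a rough sketch, assuming an application that works in UTF-8 throughout, the same values can be set from a script with the runtime functions mb_internal_encoding() and mb_language(). The choice of 'UTF-8' and 'uni' is an assumption for illustration, not something the slides specify.

```php
<?php
// Assumption: the whole application works in UTF-8.
mb_internal_encoding('UTF-8'); // runtime counterpart of mbstring.internal_encoding
mb_language('uni');            // 'uni' is mbstring's Unicode (UTF-8) language setting

// Calling either function with no argument reports the current value.
$encoding = mb_internal_encoding();
$language = mb_language();

echo $encoding, "\n"; // UTF-8
echo $language, "\n";
```

Setting these once in a bootstrap file means later mb_*() calls can omit the encoding argument, although passing it explicitly is harder to get wrong.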
Enough now, let's rock and roll!
  • Mbstring gives us multibyte-safe versions of the “core” string handling functions
  • For example, we all know strlen() …
  • … so let's have a look at mb_strlen()
mb_strlen() examples (three slides)
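
The code shown on the mb_strlen() slides is not part of the transcript, so the following is only a minimal sketch of the point they make: strlen() counts bytes, while mb_strlen() counts characters. It assumes the script file is saved as UTF-8; the sample word is an illustration, not taken from the slides.

```php
<?php
// Assumption: this file is saved as UTF-8.
$word = '日本語'; // three characters, each three bytes long in UTF-8

$bytes = strlen($word);             // strlen() counts bytes
$chars = mb_strlen($word, 'UTF-8'); // mb_strlen() counts characters

echo $bytes, "\n"; // 9
echo $chars, "\n"; // 3
```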
Still rocking and rolling...
  • Mbstring gives us multibyte-safe versions of the “core” string handling functions
  • For example, we all know strpos() …
  • … so let's have a look at mb_strpos()
mb_strpos() examples (two slides)
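
As with mb_strlen(), the slide code for mb_strpos() is missing from the transcript. A small sketch of the difference, assuming a UTF-8 script file and a made-up sample string: strpos() reports a byte offset, mb_strpos() a character offset.

```php
<?php
// Assumption: this file is saved as UTF-8.
$haystack = '日本語のテキスト';
$needle   = 'テキスト';

// Four three-byte characters come before the needle.
$byteOffset = strpos($haystack, $needle);                 // offset in bytes
$charOffset = mb_strpos($haystack, $needle, 0, 'UTF-8');  // offset in characters

echo $byteOffset, "\n"; // 12
echo $charOffset, "\n"; // 4
```

The character offset is the one to pass on to other mb_*() functions such as mb_substr(); mixing byte offsets with mb_*() calls is a common source of bugs.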
Wrapping up and moving on
  • Mbstring gives us multibyte-safe versions of the “core” string handling functions
  • There are LOTS of these multibyte-safe versions of “core” string handling functions, so please have a look
  • BE CAREFUL: you can make calls to strlen() (and so on) automatically call mb_strlen() with the mbstring.func_overload directive, but it changes the behaviour of every script on the server (it was deprecated in PHP 7.2 and removed in PHP 8.0)
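
Rather than relying on function overloading, the multibyte-safe versions can simply be called by name. A short sketch with two of them, mb_substr() and mb_strtoupper(), assuming a UTF-8 script file; the sample strings are illustrative only.

```php
<?php
// Assumption: this file is saved as UTF-8.
$word = '日本語';

$firstByte = substr($word, 0, 1);             // cuts a three-byte character apart
$firstTwo  = mb_substr($word, 0, 2, 'UTF-8'); // first two whole characters
$upper     = mb_strtoupper('café', 'UTF-8');  // case conversion that knows about é

var_dump(mb_check_encoding($firstByte, 'UTF-8')); // bool(false)
echo $firstTwo, "\n"; // 日本
echo $upper, "\n";    // CAFÉ
```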
Mbstring specific functions
  • Let's look at character encodings first
  • mb_detect_encoding()
  • mb_convert_encoding()
  • LOTS of supported encodings (http://php.net/manual/en/mbstring.supported-encodings.php)
  • The mbstring.detect_order directive comes into play here
mb_detect_encoding() and mb_detect_order() examples (three slides)
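
The detection slides are screenshots that did not survive in the transcript. The sketch below shows one plausible way to combine mb_detect_order() and mb_detect_encoding(); the candidate list and the sample text are assumptions. Detection is a best guess, so strict mode is used to rule out encodings in which the bytes are not valid.

```php
<?php
// Assumption: this file is saved as UTF-8.
$utf8 = '日本語';
$sjis = mb_convert_encoding($utf8, 'SJIS', 'UTF-8'); // same text as Shift-JIS bytes

// Set the detection order at runtime (counterpart of mbstring.detect_order).
mb_detect_order(['ASCII', 'UTF-8', 'SJIS', 'EUC-JP']);

// Strict mode (third argument) only accepts encodings in which the bytes are valid.
$guessUtf8 = mb_detect_encoding($utf8, mb_detect_order(), true);
$guessSjis = mb_detect_encoding($sjis, mb_detect_order(), true);

echo $guessUtf8, "\n"; // UTF-8
echo $guessSjis, "\n";
```

Short strings can be valid in several encodings at once, so where the encoding is known from context it is safer to state it than to detect it.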
Mbstring specific functions
  • Still looking at character encodings ...
  • mb_detect_encoding()
  • mb_convert_encoding()
  • LOTS of supported encodings (http://php.net/manual/en/mbstring.supported-encodings.php)
  • The mbstring.detect_order directive comes into play here
mb_convert_encoding() examples (two slides)
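
The conversion slides are also missing their code. A minimal round-trip sketch, assuming a UTF-8 script file and using Shift-JIS as the other encoding purely for illustration:

```php
<?php
// Assumption: this file is saved as UTF-8.
$utf8 = '日本語';

// Argument order is: string, target encoding, source encoding.
$sjis = mb_convert_encoding($utf8, 'SJIS', 'UTF-8');
$back = mb_convert_encoding($sjis, 'UTF-8', 'SJIS');

echo strlen($utf8), "\n"; // 9 (three bytes per character in UTF-8)
echo strlen($sjis), "\n"; // 6 (two bytes per character in Shift-JIS)
var_dump($back === $utf8); // bool(true)
```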
Regular expressions on multibyte strings
  • mb_regex_encoding(), but note that the encodings supported for regex purposes are actually a SUBSET of the encodings supported by mbstring itself!
  • mb_ereg()
  • mb_ereg_match()
  • mb_ereg_replace()
  • … and many more!
  • Note: PHP's regular preg_*() functions can also “do” UTF-8 with the /u pattern modifier!
mb_ereg() examples (two slides)
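
The mb_ereg() slide code is not in the transcript either. This sketch shows a basic mb_ereg() capture next to preg_match() with and without the /u modifier; the sample string and patterns are assumptions chosen for illustration.

```php
<?php
// Assumption: this file and the subject string are UTF-8.
mb_regex_encoding('UTF-8');

$text = '東京タワー 2012';

// mb_ereg() fills $regs with the whole match and any captured groups.
$year = null;
if (mb_ereg('([0-9]+)', $text, $regs)) {
    $year = $regs[1];
}

// Without /u, "." matches a single byte; with /u it matches a whole character.
preg_match('/^./', $text, $byteMatch);
preg_match('/^./u', $text, $charMatch);

echo $year, "\n";                 // 2012
echo strlen($byteMatch[0]), "\n"; // 1
echo $charMatch[0], "\n";         // 東
```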
Summary of mbstring functions
  • Directive setting functions
  • Multibyte versions of regular string functions
  • Regex functions
  • Encoding detection / conversion
  • Japanese-specific functions / settings
  • Other misc stuff
Putting it all together
  • Mbstring gets PHP working with multibyte
  • BUT... don't forget your:
    • PHP script files (best to have the encoding of the file the same as mbstring.internal_encoding)
    • Database
    • Output (i.e. probably HTML)
    • Input (i.e. form submissions etc.)
Multibyting your database
  • Oracle: I'm no expert, but look at NCHAR as opposed to CHAR ('N' for 'national language')
  • PostgreSQL: I'm no expert, but IIRC Postgres automagically understands and converts input / output character encodings
  • MySQL: you can choose a “collation” for the server, each schema, each table and each column!
  • MySQL: collation means “charset + sort order” (for example, a “cs” suffix means a case-sensitive sort order)
More multibyting your database
  • MySQL: easiest to put everything on 'utf8_unicode_ci' or 'utf8_general_ci' (but note that these two collations differ when sorting, doing LIKE etc.! See http://forums.mysql.com/read.php?103,187048,188748#msg-188748)
  • After connecting and before reading / writing, you'll need to run an SQL query of SET NAMES utf8 and / or SET CHARACTER SET utf8 (otherwise characters will become garbled)
Multibyting your output HTML
  • For example, for UTF-8, we need to output this kind of HTTP header:
      Content-Type: text/html; charset=UTF-8
  • i.e. header("Content-Type: text/html; charset=UTF-8");
  • Possible, but less desirable, to output it as a meta tag in the HTML <head>:
      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
    (or simply <meta charset="UTF-8"> for HTML5)
  • Don't forget lang="xy" or xml:lang="xy" where needed
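
A rough sketch of how the header, the meta tag and the lang attribute might fit together in one script. The render_page() helper and the sample title are hypothetical, and the script file is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical helper: builds a minimal UTF-8 HTML page around a title.
function render_page(string $title): string
{
    // Escape with an explicit charset so multibyte characters pass through intact.
    $safeTitle = htmlspecialchars($title, ENT_QUOTES, 'UTF-8');

    return "<!DOCTYPE html>\n"
        . "<html lang=\"ja\">\n"
        . "<head><meta charset=\"UTF-8\"><title>{$safeTitle}</title></head>\n"
        . "<body><h1>{$safeTitle}</h1></body>\n"
        . "</html>\n";
}

// Send the HTTP header first; it must go out before any body output.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=UTF-8');
}

$html = render_page('日本語 & PHP');
echo $html;
```

The HTTP header takes precedence over the meta tag in browsers, which is one reason the slides call the meta tag the less desirable option.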
Multibyting your input
  • Theoretically possible, but unusual, to have a <form> with a different encoding to its host page
  • Out-of-the-box, form data on a SJIS host page comes in as SJIS, form data on an EUC-JP host page comes in as EUC-JP, and so on
  • Or have I just been very lucky?
  • Look at the mbstring.http_input directive if struggling
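
If the host page is not UTF-8 but the application works in UTF-8 internally, submitted values can be validated and converted by hand. The input_to_utf8() helper below is hypothetical and only a sketch; it assumes the page encoding is known rather than guessed.

```php
<?php
// Hypothetical helper: normalises a submitted form value to UTF-8.
// $pageEncoding is the encoding of the page that hosted the form.
function input_to_utf8(string $raw, string $pageEncoding): ?string
{
    // Reject byte sequences that are not valid in the declared encoding.
    if (!mb_check_encoding($raw, $pageEncoding)) {
        return null;
    }
    return mb_convert_encoding($raw, 'UTF-8', $pageEncoding);
}

// Simulate a value submitted from a Shift-JIS page (this file is assumed to be UTF-8).
$submitted = mb_convert_encoding('東京', 'SJIS', 'UTF-8');

$clean = input_to_utf8($submitted, 'SJIS'); // the same text as UTF-8
$bad   = input_to_utf8("\xFF\xFE", 'UTF-8'); // null: these bytes are not valid UTF-8
```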
That's all folks!
I'll leave you with some things to think about:
  • Iconv (a built-in extension) might be better if all you need is to detect / change encodings
  • Previous examples of preg_match() failing will probably work with the /u pattern modifier (to enable UTF-8)
  • No mb version of trim() (mb_trim() only arrived in PHP 8.4) or of preg_match_all()
  • Mbstring in action: http://twitter.com/japxlate http://mapanese.info
  • Questions welcome at daniel.rhodes@warpasylum.co.uk
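
Where no multibyte trim() is available, one possible workaround is preg_replace() with the /u modifier. The trim_multibyte() function below is a hypothetical sketch, not part of mbstring, and it assumes UTF-8 input.

```php
<?php
// Hypothetical helper: trims ASCII and Unicode whitespace from both ends
// of a UTF-8 string, including the ideographic space U+3000.
function trim_multibyte(string $text): string
{
    // \p{Z} covers Unicode separator characters; \s covers tabs and newlines.
    $trimmed = preg_replace('/^[\s\p{Z}]+|[\s\p{Z}]+$/u', '', $text);

    // preg_replace() returns null on invalid UTF-8; fall back to the input.
    return $trimmed ?? $text;
}

$padded  = "\u{3000} 日本語 \u{3000}\n";
$trimmed = trim_multibyte($padded);

echo $trimmed, "\n"; // 日本語
```

Plain trim() would leave the U+3000 characters in place because it only strips a fixed set of single-byte characters.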
