Make your website completely UTF-8 friendly

Posted on March 23, 2008. Filed under: mysql+, PHP, UTF-8

LAST UPDATED JUNE 15, 2009

Running an internationalization / localization [i18n / L10n] friendly website can be tricky, and sometimes downright maddening, for those who haven’t yet delved into the world of Unicode. Letting your users post in whichever language and characters they choose is crucial for any modern website.

Here are a few things I have very painfully learned over the last 5 or so years on this topic … specifically with PHP and MySQL.

There are hundreds of character sets covering most of the languages on Earth, usually one per script or region [Latin, Cyrillic, Greek, Arabic, Korean, Chinese etc…]. Unicode covers all of them in a single character set, and UTF-8 is its most widely used encoding. So how can you put UTF-8 to practical use? Easy … here’s how I’ve done it:

 

Headers! Get your headers!
The most important place to declare UTF-8 is the charset in your outgoing HTTP headers. This tells the browser that your HTML contains multi-byte characters and that you’d like it to display them as such [and not as the default ISO-8859-1].
To do this, put this at the very top of your PHP scripts [with the other headers and before any HTML is echoed]:


<?php
    header("Content-Type: text/html; charset=utf-8");
?>

And this in your HTML <head> section:


<?php
    echo "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\" />\n";
?>
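Putting the two snippets together, a minimal page might look like the sketch below. The function name `render_utf8_page` and the sample text are made up for illustration; only the header and the meta tag come from the snippets above. It also assumes the script file itself is saved as UTF-8 without a BOM, since a BOM counts as output and would go out before the header.

```php
<?php
// Hypothetical helper: builds a complete UTF-8 page around some text.
function render_utf8_page($bodyText)
{
    // Escape as UTF-8 so multi-byte characters pass through untouched.
    $safe = htmlspecialchars($bodyText, ENT_QUOTES, 'UTF-8');

    return "<!DOCTYPE html>\n"
        . "<html>\n<head>\n"
        . "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\" />\n"
        . "<title>UTF-8 test</title>\n"
        . "</head>\n<body>\n<p>" . $safe . "</p>\n</body>\n</html>\n";
}

// The header has to go out before any output, so check first.
if (!headers_sent()) {
    header("Content-Type: text/html; charset=utf-8");
}

echo render_utf8_page("안녕하세요 & welcome");
```

If the page shows question marks instead of the Korean text, the encoding the editor used to save the file is a good first thing to check.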

 

MySQL / UTF-8 love
The second most important thing is to make sure your database is also UTF-8 friendly. Be sure to set all your table / column collations [char / text] to utf8_unicode_ci. This tells MySQL to treat this data as UTF-8.
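For a table that already exists, one way to do this is MySQL's `ALTER TABLE ... CONVERT TO CHARACTER SET` statement. The sketch below only builds the SQL string; the table name `posts` and the helper name `build_utf8_conversion_sql` are assumptions. Take a backup first, because the statement rewrites the stored column data.

```php
<?php
// Hypothetical helper: builds the statement that converts one table
// (including its char / varchar / text columns) to utf8 with utf8_unicode_ci.
function build_utf8_conversion_sql($table)
{
    // Allow only simple identifiers, since table names cannot be bound as parameters.
    if (!preg_match('/^[A-Za-z0-9_]+$/', $table)) {
        throw new InvalidArgumentException("Unexpected table name: $table");
    }
    return "ALTER TABLE `$table` CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci";
}

// 'posts' is a made-up table name.
echo build_utf8_conversion_sql('posts'), "\n";
```

On MySQL 5.5.3 and later, `utf8mb4` may be worth considering instead, since MySQL's `utf8` stores at most three bytes per character.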

Once you’ve done that, you’ll need to tell PHP to connect to the MySQL daemon under a UTF-8 connection [otherwise the default is latin1 … and your data will be stored in MySQL as such — no good!]. Run this right after you connect to MySQL:


<?php
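    // Sets the client, connection and result character sets to utf8.
    mysql_query("SET NAMES 'utf8'");
    // Sets the client and result character sets again, but sets the connection
    // character set to the database default, so the database itself should be utf8.
    mysql_query("SET CHARACTER SET utf8");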
?>
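The `mysql_*` functions used here were removed in PHP 7. With PDO, the same effect can be had by naming the charset in the DSN, which is supported since PHP 5.3.6. A sketch, where the host, database name and credentials are placeholders and `build_utf8_dsn` is a name invented for this example:

```php
<?php
// Hypothetical helper: builds a PDO MySQL DSN that requests a utf8 connection.
function build_utf8_dsn($host, $dbname)
{
    return "mysql:host=$host;dbname=$dbname;charset=utf8";
}

$dsn = build_utf8_dsn('localhost', 'mydb'); // placeholder values
echo $dsn, "\n";

// With a running server and real credentials (placeholders here), connect like so:
// $pdo = new PDO($dsn, 'db_user', 'db_password');
```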

 

Multibyte fun
Last, take advantage of PHP’s Multibyte String Functions! Oftentimes this is as easy as prefixing your string functions with mb_ [strlen() becomes mb_strlen(), for example]. But before you start using these functions you’ll need to tell PHP which character set to use [once again!] because the default is ISO-8859-1:


<?php
    mb_internal_encoding("UTF-8");
?>
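To see why the mb_ versions matter, compare byte-oriented and character-oriented results on the same string. This sketch assumes the mbstring extension is loaded and that the file is saved as UTF-8; the sample strings are arbitrary.

```php
<?php
mb_internal_encoding("UTF-8");

$greeting = "안녕하세요"; // five Korean characters, three bytes each in UTF-8

echo strlen($greeting), "\n";           // counts bytes: 15
echo mb_strlen($greeting), "\n";        // counts characters: 5

echo mb_substr($greeting, 0, 2), "\n";  // first two characters: 안녕
echo mb_strtoupper("héllo"), "\n";      // HÉLLO
```

The plain `substr()` would cut after two bytes here, in the middle of the first character, which is the kind of breakage the mb_ functions avoid.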

 

Forms
One often neglected step is making sure that the data the server receives is UTF-8 encoded. One way to try to do this with HTML forms is to include the accept-charset attribute in your form tag. I say “try” because it’s only a suggestion to the client submitting the form. Be aware that some clients, especially older browsers, may not pay much attention to the attribute. [Thanks to Alejandro for the heads up :-)]


<form action="/action" method="post" accept-charset="utf-8">

Also see here: www.w3schools.com/TAGS/att_form_accept_charset.asp.
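Since the attribute is only a hint, it can help to check what actually arrived. One possible approach, sketched below, is to validate each submitted value with `mb_check_encoding()` and fall back to converting it from ISO-8859-1. The fallback encoding and the helper name `ensure_utf8` are assumptions; pick whatever legacy encoding your visitors are most likely to send.

```php
<?php
// Hypothetical helper: returns $value unchanged if it is valid UTF-8,
// otherwise treats it as ISO-8859-1 (an assumption) and converts it.
function ensure_utf8($value)
{
    if (mb_check_encoding($value, 'UTF-8')) {
        return $value;
    }
    return mb_convert_encoding($value, 'UTF-8', 'ISO-8859-1');
}

$valid  = "caf\xC3\xA9"; // "café" encoded as UTF-8
$legacy = "caf\xE9";     // "café" encoded as ISO-8859-1

echo ensure_utf8($valid), "\n";
echo ensure_utf8($legacy), "\n";
```

In a form handler this might be applied to each field, for example `ensure_utf8($_POST['name'])`, where `name` is a made-up field.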

If you’ve gotten this far you should see some dramatic improvements to your web site’s accessibility and usability, drawing in users from around the world.

NOTE: This is a work in progress and I fully welcome any new ideas to this cocktail of methods. If you have anything to add, PLEASE DO SO!

14 Responses to “Make your website completely UTF-8 friendly”

This post is copied on another website without a link back to your original post or any credit to you. I found your original here by Googling the title. Check out:
http://www.virtisys.com/2008/03/23/make-your-website-completely-utf8-friendly/

I have contacted the hosting company for virtisys regarding copyright infringement of my own work (they ripped off one of my posts as well). You might consider doing the same. Or if you don’t care, then please pardon the interruption.

Cheers,
Jeremy

Thanks a lot.
million times thanks.

Great little tut! Thanks!

Thank you

Thanks for the article. Adding accept-charset="utf-8" to all forms may also help to be sure that the browser only sends utf-8 strings.

[…] This post was mentioned on Twitter by opendir. opendir said: Tips for a site built with UTF-8 encoding / utf-8, tip: http://tinyurl.com/yekqgdf – A few small […]

Gosh, I’ve spent hours trying to figure out how to display UTF-8 chars (Chinese, Roman, Greek, English chars) which are retrieved from MySQL. And the title of this article is an exact match for what I’m trying to accomplish. But unfortunately, it doesn’t work. Do standard web fonts display these UTF-8 characters or does it have to be a Unicode font or something else? What else could it be?

It could potentially be the font. Arial seems to be fairly safe and most OSes have the Unicode version of it, I believe. But if you’re using standard fonts then it should work…

Thank you

THANKYOU, THANKYOU, THANKYOU!!

Been struggling for months and slowly going more and more insane as Firefox consistently failed to pick up on the UTF-8 character definition and I was at a loss to explain why. The PHP header was the solution and now at last I don’t have to run all my strings through content filters!! Thank you VERY much =D

Glad I could help! :-)

Not sure what I’m doing wrong here ?

<?php

header("Content-Type: text/html; charset=utf-8");

echo '

';

echo '안녕하세요';

?>

All I get is :

?????

