Make your website completely UTF-8 friendly
LAST UPDATED JUNE 15, 2009
Running an Internationalization / Localization [or i18n / L10n] friendly website can be tricky, and sometimes downright maddening for those who haven’t yet delved into the world of Unicode. Letting your users post to your site in whichever language and characters they choose is crucial for any modern website.
Here are a few things I have very painfully learned over the last 5 or so years on this topic … specifically with PHP and MySQL.
There are hundreds of character sets representing most of the languages on Earth, usually one per script or region [Latin, Cyrillic, Greek, Arabic, Korean, Chinese etc…]. One encoding that covers all of these is UTF-8, which can represent every Unicode character. So how can you put UTF-8 to practical use? Easy … here’s how I’ve done it:
Headers! Get your headers!
The most important place to implement UTF-8 is the charset parameter of the Content-Type header in your outgoing HTTP headers. This tells the browser that you have multi-byte characters in your HTML and you’d like it to display them as such [and not as the default ISO-8859-1].
To do this, put this at the very top of your PHP scripts [with the headers and before any HTML is echoed]:
<?php
header("Content-Type: text/html; charset=utf-8");
?>
And this in your HTML <head> section:
<?php
echo "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\" />\n";
?>
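As a minimal sketch, the header and the meta tag can be produced by a pair of helpers so that the two never disagree. The function names utf8_content_type(), send_utf8_header() and utf8_meta_tag() are made up for this example and are not part of PHP. In HTML5 documents the shorter <meta charset="utf-8"> does the same job as the http-equiv form.

```php
<?php
// Hypothetical helpers (not built into PHP) that keep the HTTP header and
// the <meta> tag in agreement about the page encoding.

function utf8_content_type(): string
{
    return 'text/html; charset=utf-8';
}

function send_utf8_header(): bool
{
    // header() only works before any output has been sent.
    if (headers_sent()) {
        return false;
    }
    header('Content-Type: ' . utf8_content_type());
    return true;
}

function utf8_meta_tag(): string
{
    return '<meta http-equiv="Content-Type" content="' . utf8_content_type() . '" />' . "\n";
}
```

Call send_utf8_header() at the very top of the script, then echo utf8_meta_tag() inside the <head> section.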
MySQL / UTF-8 love
The second most important thing is to make sure your database is also UTF-8 friendly. Be sure to set all your table / column collations [char / text] to utf8_unicode_ci. This tells MySQL to treat this data as UTF-8.
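One way to apply that collation to an existing table is MySQL's ALTER TABLE ... CONVERT TO CHARACTER SET statement. The sketch below only builds the statement and does not run it; the function name utf8_convert_sql() and the table name posts are assumptions for illustration. Keep in mind that MySQL's utf8 stores at most three bytes per character, so if you need four-byte characters such as emoji, utf8mb4 with utf8mb4_unicode_ci [MySQL 5.5.3 and later] is the variant to pass in instead.

```php
<?php
// Hypothetical helper: builds (but does not run) the MySQL statement that
// converts a table and its char / text columns to a UTF-8 collation.
// $charset and $collation are inserted as-is, so pass only trusted constants.
function utf8_convert_sql(string $table, string $charset = 'utf8', string $collation = 'utf8_unicode_ci'): string
{
    // Quote the identifier with backticks, doubling any backtick inside it.
    $quoted = '`' . str_replace('`', '``', $table) . '`';
    return "ALTER TABLE $quoted CONVERT TO CHARACTER SET $charset COLLATE $collation";
}

// "posts" is an example table name, not one from this article.
$sql = utf8_convert_sql('posts');
```

Back up the table before running the resulting statement, since the conversion rewrites the stored data.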
Once you’ve done that, you’ll need to tell PHP to connect to the MySQL daemon under a UTF-8 connection [otherwise the default is latin1 … and your data will be stored in MySQL as such, which is no good!]. Run this right after you connect to MySQL:
<?php
mysql_query("SET NAMES 'utf8'");
mysql_query("SET CHARACTER SET utf8");
?>
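The mysql_* functions used above were removed in PHP 7. A rough equivalent with PDO, assuming the pdo_mysql driver is installed, is to put the charset in the DSN so the connection is UTF-8 from the start. The helper names, host, database name and credentials below are placeholders. With mysqli, the counterpart is calling $mysqli->set_charset('utf8') right after connecting.

```php
<?php
// Hypothetical helper: builds a PDO DSN that asks for a UTF-8 connection.
// The charset option in the DSN is honoured by pdo_mysql since PHP 5.3.6.
function utf8_mysql_dsn(string $host, string $dbname, string $charset = 'utf8'): string
{
    return "mysql:host=$host;dbname=$dbname;charset=$charset";
}

// Hypothetical helper: opens the connection. Needs a reachable MySQL server,
// so it is defined here but not called.
function connect_utf8(string $host, string $dbname, string $user, string $pass): PDO
{
    return new PDO(utf8_mysql_dsn($host, $dbname), $user, $pass, [
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
    ]);
}

// "localhost" and "example_db" are placeholder values.
$dsn = utf8_mysql_dsn('localhost', 'example_db');
```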
Multibyte fun
Last, take advantage of PHP’s Multibyte String Functions! Oftentimes this is as easy as prefixing your string functions with mb_. But before you start using these functions you’ll need to tell PHP which character set to use [once again!], because older PHP versions default to ISO-8859-1 [since PHP 5.6 the default is UTF-8, but setting it explicitly does no harm]:
<?php
mb_internal_encoding("UTF-8");
?>
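To see why the mb_ prefix matters, here is a small comparison of the byte-oriented functions and their multibyte counterparts on a Korean greeting. It assumes the mbstring extension is enabled and that the script file itself is saved as UTF-8.

```php
<?php
mb_internal_encoding('UTF-8');

$greeting = '안녕하세요';               // five Hangul syllables, three bytes each in UTF-8

$bytes = strlen($greeting);            // counts bytes: 15
$chars = mb_strlen($greeting);         // counts characters: 5

$broken = substr($greeting, 0, 4);     // cuts the second character in half
$clean  = mb_substr($greeting, 0, 2);  // first two characters: 안녕

$upper = mb_strtoupper('héllo');       // HÉLLO
```

The $broken value is the kind of string that later shows up as question marks or garbage in the browser, which is why the byte-oriented functions are best avoided on user-supplied text.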
Forms
One often neglected step is ensuring that the data the server gets is UTF-8 encoded. One way to try and do this with HTML forms is to include the accept-charset attribute in your form tag. I say “try” because it’s just a suggestion to the client which submits the form. Be aware that some clients may not pay much attention to the attribute, especially older browsers. [Thanks to Alejandro for the heads up :-)]
<form action="/action" method="post" accept-charset="utf-8">
Also see here: www.w3schools.com/TAGS/att_form_accept_charset.asp.
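Since accept-charset is only a hint, it can help to check on the server that what arrived really is UTF-8. Here is a minimal sketch using mb_check_encoding(); the function name utf8_or_null() and the field name comment are made up for this example.

```php
<?php
// Hypothetical helper: returns the value unchanged if it is valid UTF-8,
// or null if the client sent bytes in some other encoding.
function utf8_or_null(string $value): ?string
{
    return mb_check_encoding($value, 'UTF-8') ? $value : null;
}

// "comment" is an example form field name.
$comment = utf8_or_null($_POST['comment'] ?? '');
```

What to do with a null result [reject the submission, or try to convert it with mb_convert_encoding() from a guessed source encoding] is a judgment call that depends on your site.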
If you’ve gotten this far you should see some dramatic improvements to your web site’s accessibility and usability, drawing in users from around the world.
NOTE: This is a work in progress and I fully welcome any new ideas to this cocktail of methods. If you have anything to add, PLEASE DO SO!
This post is copied on another website without a link back to your original post or any credit to you. I found your original here by Googling the title. Check out:
http://www.virtisys.com/2008/03/23/make-your-website-completely-utf8-friendly/
I have contacted the hosting company for virtisys regarding copyright infringement of my own work (they ripped off one of my posts as well). You might consider doing the same. Or if you don’t care, then please pardon the interruption.
Cheers,
Jeremy
Jeremy
March 24, 2008
Thanks a lot.
million times thanks.
Farshad
January 12, 2009
Great little tut! Thanks!
Carl
February 7, 2009
Thank you
sutenmgmail
May 12, 2009
Thanks for the article. Adding accept-charset="utf-8" to all forms may also help to be sure that the browser only sends utf-8 strings.
Alejandro
June 15, 2009
Gosh, I’ve spent hours trying to figure out how to display UTF-8 chars (Chinese, Roman, Greek, English chars) which are retrieved from MySQL. And the title of this article is an exact match for what I’m wanting to accomplish. But unfortunately, it doesn’t work. Do standard web fonts display these UTF-8 characters or does it have to be a Unicode font or something else? What else could it be?
ACL surgery recoverer
August 8, 2011
It could potentially be the font. Arial seems to be fairly safe and most OSes have the unicode version of it, I believe. But if you’re using standard fonts then it should work…
PureForm
August 8, 2011
Thank you
Review
October 15, 2011
THANKYOU, THANKYOU, THANKYOU!!
Been struggling for months and going slowly more and more insane as firefox consistently failed to pick up on the utf-8 character definition and I was at a loss to explain why – the PHP header was the solution and now at last I don’t have to run all my strings through content filters!! Thank-you VERY much =D
rayzorj3ladeuk
April 3, 2012
Glad I could help! :-)
PureForm
April 3, 2012
Not sure what I’m doing wrong here ?
<?php
header("Content-Type: text/html; charset=utf-8");
echo '
';
echo '안녕하세요';
?>
All I get is :
?????
Eddie
April 29, 2015