Everything you ever wanted to know about MySQL, PHP and encodings, but were afraid to ask!
Why is Cyrillic text displayed as question marks on the website? How do I properly configure the MySQL server to work with Cyrillic? How do I change the encoding in MySQL? How do I change the encoding in PHP scripts? Which encoding should I choose? How do I convert a database from one encoding to another? These and many similar questions have come up with enviable persistence on various forums for many years now. In this post I explain what to do so that such problems never arise, and give the most effective advice for when they do. If you do not find the answer to your question here, write to me and I will definitely expand this text to cover it.
MySQL, PHP and encodings. The source of the problems.
Problems with encodings in MySQL stem from the program's history. MySQL was developed by Europeans, so it was natural for them to choose latin1, the encoding most convenient for them, as the default. Oddly enough, many MySQL installations still work with this encoding by default, which creates problems for users who add strings in Russian or Ukrainian to the database: latin1 simply has no Cyrillic characters.
Therefore, if you have encoding problems in MySQL, the first thing to do is check which encoding is the default for your MySQL installation. There are several ways to check this.
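To make the point about latin1 concrete, here is a minimal sketch. It is written in Python only because it runs without a database server, and it assumes that Python's cp1252 codec is a fair stand-in for MySQL's latin1 (the two are close, though not identical). The variable names are illustrative.

```python
# Sketch: Cyrillic letters have no representation in latin1.
# Assumption: Python's "cp1252" codec stands in for MySQL's latin1.
text = "Привет"

try:
    text.encode("cp1252")
    fits_in_latin1 = True
except UnicodeEncodeError:
    fits_in_latin1 = False

# With lossy conversion every unsupported letter becomes "?",
# which is the row of question marks people see on their pages.
lossy = text.encode("cp1252", errors="replace")

# utf8 and cp1251 can both store the same text without loss.
as_utf8 = text.encode("utf-8")      # 2 bytes per Cyrillic letter
as_cp1251 = text.encode("cp1251")   # 1 byte per Cyrillic letter

print(fits_in_latin1, lossy, len(as_utf8), len(as_cp1251))
```

Once a string has been reduced to question marks like this, the original letters cannot be recovered from the stored value.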
Configuring the MySQL server for the required encoding.
If you are the server administrator, or you are configuring your own MySQL on a work machine, open the my.ini configuration file (/etc/my.cnf or /etc/mysql/my.cnf on Linux) and find the following lines.
[mysqld]
default-character-set=encoding_name
character-set-server=encoding_name
init-connect="SET NAMES encoding_name"
skip-character-set-client-handshake
Replace "encoding_name" with the name of the encoding you are going to use. For texts in Russian and Ukrainian you can use utf8 or cp1251 (note that MySQL writes encoding names without the usual hyphen). I would advise using only utf8: it will save you a lot of nerves in the future.
If there are no such lines in the configuration file, the server uses the default encoding that was specified when it was compiled. Add the encoding settings you need to the config (examples below) and restart MySQL. Note that MySQL 5.5 and later no longer accept default-character-set in the [mysqld] section and will refuse to start if it is present; on those versions keep only character-set-server there.
If you have encoding problems on a hosting account where you do not have administrator rights, you can check the MySQL encoding settings another way: connect to MySQL (with the mysql console client or phpMyAdmin, whichever is more convenient) and run this SQL query: show variables like 'char%'. It shows the values of the MySQL variables related to encodings. Most likely you will see something like this:
character_set_client latin1
character_set_connection latin1
character_set_database cp1251
character_set_filesystem binary
character_set_results latin1
character_set_server latin1
character_set_system utf8
character_sets_dir /usr/share/mysql/charsets/
The example above deliberately shows an INCORRECTLY CONFIGURED SERVER! Note that it uses three (!) different encodings in different places. In such a situation a novice web programmer will find it hard to get a script to work correctly. Try to make sure that all the variables are set to the same encoding (character_set_system, which is always utf8, and character_set_filesystem, which is normally binary, can be ignored). Then 99% of the problems discussed on the forums simply will not arise. Which encoding you choose matters less than using the same one everywhere. Still, try to specify in the settings the encoding you will actually use to store data.
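The effect of such a mismatch is easy to reproduce without a server. The sketch below is in Python purely so that it can be run anywhere; it assumes a script that sends utf8 bytes to a side that interprets them as cp1251, and the variable names are illustrative.

```python
# Sketch: what a client/server encoding mismatch does to a string.
text = "Привет"

# The script sends utf8 bytes, but the other side believes they are cp1251.
raw = text.encode("utf-8")
garbled = raw.decode("cp1251")

# Nothing is lost yet: reversing the wrong step restores the text.
restored = garbled.encode("cp1251").decode("utf-8")

print(garbled)            # twelve unreadable characters instead of six letters
print(restored == text)
```

This kind of damage is reversible only while every byte survives the wrong conversion; once characters have been replaced with question marks, the round trip no longer works.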
So a good result is when show variables like 'char%' lists the same encoding for each of the variables, and an even better one is when that encoding is the one you actually use.
If the MySQL encoding differs from yours, do not rush to get upset. You can change any of these variables either globally, for everyone, by editing the config files (if you are the server administrator), or only for yourself, with an SQL query such as set character_set_database=utf8 (if you are an ordinary user). Such a query has to be run from your PHP script immediately after it connects to the MySQL server. Below is an example that sets the utf8 encoding from a PHP script. It uses the legacy mysql_* functions; on PHP 7 and later, send the same queries through mysqli or PDO.
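The comparison can also be done mechanically. The following is a rough sketch in Python (used here only because it needs no database connection) that takes the text printed by show variables like 'char%' and reports which encodings appear. The function name encodings_in_use and the choice of variables to ignore are assumptions made for illustration.

```python
# Sketch: find the encodings reported by SHOW VARIABLES LIKE 'char%'.
# The function name and the ignore list are illustrative assumptions.

# Variables that are not expected to match the data encoding.
IGNORED = {"character_set_system", "character_set_filesystem"}

def encodings_in_use(show_variables_output):
    """Map each relevant character_set_* variable to its encoding name."""
    result = {}
    for line in show_variables_output.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        name, value = parts
        if name.startswith("character_set_") and name not in IGNORED:
            result[name] = value
    return result

# The misconfigured server from the text above.
sample = """\
character_set_client latin1
character_set_connection latin1
character_set_database cp1251
character_set_filesystem binary
character_set_results latin1
character_set_server latin1
character_set_system utf8
character_sets_dir /usr/share/mysql/charsets/
"""

found = encodings_in_use(sample)
distinct = sorted(set(found.values()))
print(distinct)
print("consistent" if len(distinct) == 1 else "mismatch")
```

For the sample output the script reports a mismatch between cp1251 and latin1, which is exactly the situation to avoid.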
mysql_query('SET character_set_database = utf8');
mysql_query('SET NAMES utf8');
The settings for cp1251 are set similarly.
mysql_query('SET character_set_database = cp1251');
mysql_query('SET NAMES cp1251');
As for character_set_database, try to create the database in the required encoding from the start (or ask your hosting provider's technical support to do it). That saves at least one extra query to MySQL each time the script runs: once the database has the right encoding, the line with 'character_set_database' can be removed from the code above.
Example MySQL server settings for working correctly with encodings.
With a properly configured server, you no longer need to send queries from the script to set the correct encoding.
Settings for utf8
[mysqld]
default-character-set=utf8
character-set-server=utf8
collation-server=utf8_general_ci
init-connect="SET NAMES utf8"
skip-character-set-client-handshake
[mysqldump]
default-character-set=utf8
[client]
default-character-set = utf8
Settings for cp1251
[mysqld]
default-character-set=cp1251
character-set-server=cp1251
collation-server=cp1251_general_ci
init-connect="SET NAMES cp1251"
skip-character-set-client-handshake
[mysqldump]
default-character-set=cp1251
[client]
default-character-set = cp1251
Checking the actual encoding in which MySQL data is stored.
If you have configured everything (both the server and the PHP script) correctly, following the instructions above, but Russian letters are still not displayed, check whether your strings are actually stored in the encoding you specified in the settings!
A simple way to check is to take a database dump in SQL format and open it in a text editor. An SQL dump is plain text. If your MySQL database is in cp1251, open it in Notepad; if it is in utf8, use any editor that supports Unicode. Scroll through the file and make sure that all the Cyrillic text is legible and that the create table and create database commands in the dump contain the correct MySQL encoding name (the encoding you specified in the server settings or in the queries from your PHP scripts).
If the encoding is wrong, make a backup of the database just in case, re-encode the SQL dump with any text transcoder, replace the encoding names in the file with the correct ones, and load the resulting file into the MySQL server. Now everything should be in order with the encodings.
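The same inspection can be sketched in code. The Python snippet below (Python is used only so the example runs without a server) checks whether the raw bytes of a dump are valid utf8 and collects the charset names declared in it. The function name inspect_dump, the regular expression and the sample dump are all illustrative assumptions, not part of any MySQL tool.

```python
import re

# Matches "CHARSET=utf8", "DEFAULT CHARSET = cp1251", "CHARACTER SET utf8", etc.
CHARSET_RE = re.compile(rb"(?:CHARSET|CHARACTER\s+SET)\s*=?\s*(\w+)", re.IGNORECASE)

def inspect_dump(data):
    """Return (decodes_as_utf8, declared_charsets) for raw dump bytes."""
    try:
        data.decode("utf-8")
        decodes_as_utf8 = True   # note: pure ASCII also counts as valid utf8
    except UnicodeDecodeError:
        decodes_as_utf8 = False
    declared = {m.group(1).decode("ascii").lower()
                for m in CHARSET_RE.finditer(data)}
    return decodes_as_utf8, declared

# Hypothetical dump: the table claims utf8, but the bytes are cp1251.
dump = (
    "CREATE TABLE t (name VARCHAR(20)) DEFAULT CHARSET=utf8;\n"
    "INSERT INTO t VALUES ('Привет');\n"
).encode("cp1251")

is_utf8, declared = inspect_dump(dump)
print(is_utf8, declared)
```

For a real dump you would read the file in binary mode and pass its contents to the function. A declared charset of utf8 together with bytes that do not decode as utf8 is the mismatch this section is about.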
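If no transcoder is at hand, the re-encoding step can be sketched in a few lines of Python (chosen only because it runs without a server). The function name recode_dump and its parameters are hypothetical. The sketch replaces the charset name only where it stands as a whole word, so collation names such as cp1251_general_ci are left alone and should be reviewed by hand, and any occurrence of the name inside your data would be replaced too.

```python
import re

def recode_dump(data, src_codec, dst_codec, src_mysql, dst_mysql):
    """Re-encode raw dump bytes and rename the MySQL charset inside them."""
    text = data.decode(src_codec)
    # Whole-word match: "cp1251" changes, "cp1251_general_ci" does not.
    pattern = re.compile(r"\b" + re.escape(src_mysql) + r"\b", re.IGNORECASE)
    text = pattern.sub(dst_mysql, text)
    return text.encode(dst_codec)

# Hypothetical cp1251 dump converted to utf8.
old = (
    "CREATE TABLE t (name VARCHAR(20)) DEFAULT CHARSET=cp1251;\n"
    "INSERT INTO t VALUES ('Привет');\n"
).encode("cp1251")

new = recode_dump(old, "cp1251", "utf-8", "cp1251", "utf8")
print(new.decode("utf-8"))
```

With real files, read the source dump in binary mode, pass the bytes through the function, and write the result to a new file, keeping the original as the backup mentioned above.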