Latin1 encoded tables or databases with UTF8 data stored: convert them to native mysql UTF8 Tables/Database

Submitted by fabio on Thu, 2007-08-02 01:42.

In the years before MySQL 4.1, the first MySQL version which provided a complete and good support to different character encodings, developers who needed their databases to store UTF8 encoded data used different encodings (usually latin1) for table and databases while still storing real UTF8 data inside.

While the DB and tables were, as an example latin1 encoded, inside there were actually UTF8 encoded data.

So.. I had to play with a latin1 encoded database which actually contained UTF8 encoded data inside. As I was converting the data stored in the database to Drupal, I had to convert it to a fully UTF8 based database.

The simple way of simply changing the table/db encoding using an ALTER TABLE t MODIFY col1 CHAR(50) CHARACTER SET utf8; will not work. MySQL, expecting latin1 data, will convert the data to UTF8. But, as the data is already UTF8, this will generate lot of strange characters (especially on accented letters).

Fixing this one wasn't easy .. and I had to play a bit with Google, the MySQL docs and my mysql client.

Finally I've been able to do it.. The trick is not to convert directly to UTF8. The first step is to convert to a BLOB (or BINARY), then convert the BLOB field to UTF8.
Using this approach mysql will not automatically convert between character encodings and your stored data will not be changed.

Following some php/mysql code which could help you if you need to convert a DB/Tables using the above approach.

/**
 * Converts fields contained into $fields array on the table $table on the db $dbname to blob than to utf8.
 * This is the procedure to follow when convert the field encoding when it's different than utf8 (eg latin1) while it already contains utf8 data inside
*/
function convert_to_utf8($dbname, $tables_fields) {

  foreach($tables_fields as $table => $fields) {
    foreach($fields as $field) {
      // convert each field to BLOB
      myq('ALTER TABLE ' . $dbname . '.' . $table . ' MODIFY ' . $field . ' BLOB');
    }
  }
  myq("ALTER DATABASE $dbname charset=utf8");
  
  foreach($tables_fields as $table => $fields) {
    myq('ALTER TABLE ' . $dbname . '.' . $table . ' charset utf8');
  }
  
  foreach($tables_fields as $table => $fields) {
    foreach($fields as $field) {
      // convert each field to utf8
      myq('ALTER TABLE ' . $dbname . '.' . $table . ' MODIFY ' . $field . ' TEXT CHARACTER SET utf8');
    }
  }
  
  print "converted to utf8 db" . $dbname ."\n";
}

function myq($query) {
  $result = mysql_query($query) or die("error: " . mysql_error() . "\nQuery: " . $query);
  return $result;
}

The above function can be invoked using a syntax similar to the following:

$table_fields = array(
  'blog_entry' => array('body_html', 'title'),
  'blog_entry_comment' => array('comment', 'name'),
  'media_file' => array('comment', 'name'),
  'media_file_comment' => array('comment', 'name'),
  'media_folder' => array('comment', 'name'),
);

convert_to_utf8('database_name', $table_fields);

Based on:
MySQL Documentation: Column Character Set Conversion
Worpress Documentation: Converting Database Character Sets

Thakns a TON for this post!

Submitted by Rahul Dewan (not verified) on Wed, 2011-03-02 12:46.

Thanks a ton for your post. We had been struggling with this issue for a Danish site we've been building.

Our goals was to migrate content from a TYPO3 website with a Latin1 encoded database into a Drupal website with a UTF8 encoded database.

Your clear and articulate description of converting to BLOB and then to utf8_general_ci solved the problem for us.

Thanks very much!

Regards,
Rahul Dewan

THX, you saved my day !!

Submitted by Clorr (not verified) on Sun, 2011-02-20 20:40.

THX, you saved my day !!

Cool! Glad for it!

Submitted by fabio on Sun, 2011-02-20 21:21.

Cool! Glad for it!

help needed...

Submitted by Farzin (not verified) on Thu, 2009-08-20 13:30.

hi , i have saved ur script into a php and ran in on my local server but nothing happend .

how should i open a connection to my db ,
is there a way that i define them some how in your script and the script it self connects to the db and do the rest ??

thanks in advanced

The file should be composed

Submitted by fabio on Thu, 2009-08-20 16:12.

The file should be composed something like:

$hostname = 'localhost';
$username = 'your';
$password = 'yourpass';

$db = mysql_connect($hostname, $username, $password)
  or die("Unable to connect to MySQL");

FUNCTIONS CODE see above

INVOCATIONS

Please note that you have to add the correct tables names to the invocations.

Let me know if this helped.

thanks

Submitted by Farzin (not verified) on Fri, 2009-08-21 13:33.

hi , thank u for replying so fast :d

after continuing my searchs i have found some mysql codes that i used them in phpmyadmin-->sql scripting .
then it converted my db correcttly.

my problem was that i had a database that it had tables that were utf-8 but the data inside was latin1 !!!

and this is how i made it correct.
i have backed up the mydb.sql , replaced utf8_bin with latin1_swedish_ci , then replaced utf8 with latin1 for charset . ( i have replaced all with notepad ++ and not from php or mysql ) then reuploaded the db.

then i used the sql command "table alter" command for every field and table . first changed them to binary (BLOB or relevand fields like varbinary(XXX) .
then changed the tables to utf8 and then changed the fields charset again to utf8 and whole the data became correctly readable :D

but without your script i should continue my works for about 50 tables :( it is nuke evolution.

i am gonna use your script and will notice u if i could use it at last :/? and if i could implement the script correctly.

thanks again.

it works like a charm :D gr8 script

Submitted by Farzin (not verified) on Fri, 2009-08-21 23:09.

hi i have done some changes on the script and it works like a charm , but there is only one small problem on it

as i have read on the word press i think :/?

The string field types need to be converted to their binary field types counterparts.
The list is as follows:

CHAR -> BINARY
VARCHAR -> VARBINARY
TINYTEXT -> TINYBLOB
TEXT -> BLOB
MEDIUMTEXT -> MEDIUMBLOB
LONGTEXT -> LONGBLOB

but your script will change all the defined fields in the second part to text filed at last. i know it is not so much hard to change whloe script for different types of them with different defined fileds.

but i dont have the knowledge to convert your script to an automated script that detects relevant filed types and converts them accordingly :D

its just an idea and i dont know how to do it because i'm a mechanical engineer not a computer programmer.
but as a fan of scripting this idea is not a bad one :D

i have tried so many free or commercial database and mysql converter programs but none could do this job correctly.

thanks for your great help.& best wishes.

and here is my little more completed script

<--start php here 
/*
Please Dont delete any of these comments . 
*/
//define your user name and password here

$hostname = 'localhost';
$username = 'root'; //your user name
$password = 'farzin'; // your password 
$pre= "nuke_"; //your nuke database prefix  , example "nuke_" attention : dont forget to type the "_" if it is used in the prefix.
$dbname="nukeevo21"; //your database name

$db = mysql_connect($hostname, $username, $password)
  or die("Unable to connect to MySQL");
  
//Define Tables and  fields here
$table_fields = array(
  ($pre.'authors') => array('name'),
  ($pre.'banner') => array('alttext'),
 );

convert_to_utf8($dbname, $table_fields);


//and here is the script , do not modify anything after here...
/**
 * Converts fields contained into $fields array on the table $table on the db $dbname to blob and then to utf8.
 * This is the procedure to follow when convert the field encoding when it's different than utf8 (eg latin1) while it already contains utf8 data inside
*/
function convert_to_utf8($dbname, $tables_fields) {

  foreach($tables_fields as $table => $fields) {
    foreach($fields as $field) {
      // convert each field to BLOB
      myq('ALTER TABLE ' . $dbname . '.' . $table . ' MODIFY ' . $field . ' BLOB');
    }
  }
  myq("ALTER DATABASE $dbname charset=utf8");
  
  foreach($tables_fields as $table => $fields) {
    myq('ALTER TABLE ' . $dbname . '.' . $table . ' charset utf8');
  }
  
  foreach($tables_fields as $table => $fields) {
    foreach($fields as $field) {
      // convert each field to utf8
      myq('ALTER TABLE ' . $dbname . '.' . $table . ' MODIFY ' . $field . ' TEXT CHARACTER SET utf8');
    }
  }
  
  print "< div align='center'>< strong>DataBase " . $dbname ." :< /strong>< font color='green'> successfully< /font> converted to utf8 .
  < br>< br>written by < a href='mailto: fvaresano [at] yahoo [dot] it'>Fabio Varesano< /a> < br>
and converted & completed by < a href='mailto: farzin [dot] sb [at] gmail [dot] com'>Farzin sadeghi < /a>< /div>\n";
}

function myq($query) {
  $result = mysql_query($query) or die("error: " . mysql_error() . "\nQuery: " . $query);
  return $result;
}


  
  --end php here>

in the preview i have seen some coding mistakes .

i think this one solves the problem .
sorry admin, please delete my last comment.
i didnt know what does

 could do :D i've learned now b-( :P

what is.....

Submitted by aloha (not verified) on Fri, 2008-08-22 01:26.

hello ,
I am new at this and wonder what to do with above php/mwsql code.
I mean do I have to name for it and aploade it to my server?
actualy how do I have to use it?
pls alittle more explaining would be helpful for me.
Thanks in advance.
Regards

You should put that code

Submitted by fabio on Fri, 2008-08-22 01:35.

You should put that code into a .php file then open that file through your webserver. The code assumes that you already opened a connection to the database.

If you describe what you are trying to do I can help you more.

UTF convertion does not happen

Submitted by Ratheesh (not verified) on Sat, 2008-05-31 13:27.

Hello,
Require your help pls.

I have a table with a column of charset latin1.
I purposely stored utf data into the column.

As expected the data is visible as ????????? due to the latin1 encoding

Now as per your article I converted the column from varchar to varbinary and then back to varchar with utf8 encoding.

But still the data in the column is displayed as ?????????
I expected it to be displayed properly as the charset has been changed now.

Whats the solution for this?

Thanks

well.. If you followed the

Submitted by fabio on Sun, 2008-06-01 19:01.

well.. If you followed the instruction correctly the data should be displayed correctly now. Check that the configuration you have is actually the same I had and that you made each step correctly.

Also check your db connection encoding.

Thanks!

Submitted by Andreas (not verified) on Thu, 2007-08-02 14:09.

Thanks for this informations! Really helpful!

Post new comment

The content of this field is kept private and will not be shown publicly.
If you have a personal or company website insert its address in the form http://www.example.com/ .
  • Web page addresses and e-mail addresses turn into links automatically.
  • Allowed HTML tags: <a> <em> <strong> <cite> <code> <ul> <ol> <li> <dl> <dt> <dd> <pre> <del> <img> <h2> <h3> <h4> <b> <video> <sub> <sup>
  • Lines and paragraphs break automatically.
  • Images can be added to this post.
  • You may use [inline:xx] tags to display uploaded files or images inline.
  • You may insert videos with [video:URL]
  • Each email address will be obfuscated in a human readable fashion or (if JavaScript is enabled) replaced with a spamproof clickable link.

More information about formatting options