Skip to content

Ronald Bradford - MySQL Expert
Syndicate content
Expert times and information on MySQL
Updated: 8 hours 55 min ago

Don’t Assume – Per Session Buffers

Mon, 03/08/2010 - 19:24

MySQL has a number of global buffers, i.e. your SGA. There are also a number of per session/thread buffers that combined with other memory usage constitutes an unbounded PGA. One of the most common errors in mis-configured MySQL environments is the setting of the 4 primary per session buffers thinking they are global buffers.

Global buffers include:

    The four important per session buffers are:

    I have seen people see these values > 5M. The defaults range from 128K to 256K. My advice for any values above 256K is simple. What proof do you have this works better? When nothing is forthcoming, the first move is to revert to defaults or a maximum of 256K for some benchmarkable results. The primary reason for this is MySQL internally as quoted by Monty Taylor – for values > 256K, it uses mmap() instead of malloc() for memory allocation.

    These are not all the per session buffers you need to be aware of. Others include thread_stack, max_allowed_packet,binlog_cache_size and most importantly max_connections.

    MySQL also uses memory in other areas most noticeably in internal temporary tables and MEMORY based tables.

    As I mentioned, there is no bound for the total process memory allocation for MySQL, so some incorrectly configured variables can easily blow your memory usage.

    References About “Don’t Assume”

    “Don’t Assume” is a series of posts to help the Oracle DBA understand, use and appreciate the subtle differences and unique characteristics of the MySQL RDBMS in comparison to Oracle. These points as essential to operate MySQL effectively in a production environment and avoid any loss of data or availability.

    For more posts in this series be sure to follow the mysql4oracledba tag and also watch out for MySQL for Oracle DBA presentations.

    The MySQLCamp for the Oracle DBA is a series of educational talks all Oracle DBA resources should attend. Two presentations from this series IGNITION and LIFTOFF will be presented at the MySQL Users Conference 2010 in Santa Clara, April 2010 This series also includes JUMPSTART and VELOCITY. If you would like to here these presentations in your area, please contact me.

Categories: Blogs, MySQL

Free advice on your my.cnf

Mon, 03/08/2010 - 18:36

Today, while on IRC in #pentaho I came across a discussion and a published my.cnf. In this configuration I found some grossly incorrect values for per session buffers (see below).

It doesn’t take a MySQL expert to spot the issues, however there is plenty of bad information available on the Internet and developers not knowing MySQL well can easily be mislead. This has spurred me to create a program to rid the world of bad MySQL configuration. While my task is potential infinite, it will enable me to give back and hopefully do a small amount of good. You never know, saving those CPU cycles may save energy and help the planet.

Stay tuned for more details of my program.

[mysqld]
...
sort_buffer_size = 6144K
myisam_sort_buffer_size = 1G
join_buffer_size = 1G
bulk_insert_buffer_size = 1G
read_buffer_size     = 6144K
read_rnd_buffer_size = 6144K
key_buffer_size		= 1024M
max_allowed_packet	= 32M
thread_stack		= 192K
thread_cache_size       = 256

query_cache_limit	= 512M
query_cache_size        = 512M
...
Categories: Blogs, MySQL

MySQL is crashing, what do I do?

Mon, 03/08/2010 - 17:34

Let me start by saying the majority of environments never experience problems of MySQL crashing. I have seen production environments up for years. On my own server I have seen 575 days of MySQL uptime and the problem was hardware, not MySQL.

However it does occur, and the reasons may be obscure.

Confirming mysqld has crashed

To the unsuspecting, MySQL may indeed be crashing and you never know about it. The reason is because most MySQL installations have two running processes, these are mysqld and mysqld_safe.

ps -ef | grep mysqld
root     28822     1  0 Feb22 ?        00:00:00 /bin/sh bin/mysqld_safe
mysql    28910 28822  0 Feb22 ?        00:30:08 /opt/mysql51/bin/mysqld --basedir=/opt/mysql51 --datadir=/opt/mysql51/data --user=mysql --log-error=/opt/mysql51/log/error.log --pid-file=/opt/mysql51/data/dc1.onegreendog.com.pid

One of the functions of mysqld_safe is to restart mysqld if it fails. Unless you review your mysql error log and for low volume systems you will never know. Hint Have you checked your MySQL error log today?

You can determine quickly via SQL your instance uptime.

mysql> SHOW GLOBAL STATUS LIKE '%uptime%';
+---------------+---------+
| Variable_name | Value   |
+---------------+---------+
| Uptime        | 1033722 |
+---------------+---------+
1 row in set (0.00 sec)

This is the number of seconds since start time. While not easily readable for humans, this is more user friendly display. (NOTE: Works for 5.1+ only)

mysql> SELECT FROM_UNIXTIME(UNIX_TIMESTAMP() - variable_value) AS server_start
FROM INFORMATION_SCHEMA.GLOBAL_STATUS
WHERE variable_name='Uptime';
+---------------------+
| server_start        |
+---------------------+
| 2010-02-22 15:22:13 |
+---------------------+
1 row in set (0.07 sec)
Debugging a mysqld core file

When correctly configured, mysqld will generate a core file (See How to crash mysqld intentionally for background information on required settings).

Your first check is to determine if the mysqld binary used has debugging information and symbols stripped. You need this information not stripped for identifying symbol names.

$ file bin/mysqld
bin/mysqld: ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV), for GNU/Linux 2.4.0,
dynamically linked (uses shared libs), for GNU/Linux 2.4.0, not stripped

You can use gdb and with a backtrace command (bt) you can see a stack trace of calls. This won’t help the average DBA without C or MySQL internal knowledge greatly, however it’s essential information to get to the bottom of the problem.

In the following example I’m going to use Bug #38508 to intentionally crash my test instance.

mysql> drop table if exists t1,t2;
mysql> create table t1(a bigint);
mysql> create table t2(b tinyint);
mysql> insert into t2 values (null);
mysql> prepare stmt from "select 1 from t1 join  t2 on a xor b where b > 1  and a =1";
mysql> execute stmt;
mysql> execute stmt;
ERROR 2013 (HY000): Lost connection to MySQL server during query

Lost connection is the first sign of a problem. We check the error log to confirm.

$ tail data/`hostname`.err
100306 14:51:49 - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

key_buffer_size=8384512
read_buffer_size=131072
max_used_connections=1
max_threads=151
threads_connected=1
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 338301 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

thd: 0x521f160
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x401b6100 thread_stack 0x40000
/home/rbradfor/mysql/mysql-5.1.38-linux-x86_64-glibc23/bin/mysqld(my_print_stacktrace+0x2e)[0x8abfbe]
/home/rbradfor/mysql/mysql-5.1.38-linux-x86_64-glibc23/bin/mysqld(handle_segfault+0x322)[0x5df252]
/lib64/libpthread.so.0[0x35fb00de80]
/home/rbradfor/mysql/mysql-5.1.38-linux-x86_64-glibc23/bin/mysqld(_ZN9Item_cond10fix_fieldsEP3THDPP4Item+0x7f)[0x5654ff]
/home/rbradfor/mysql/mysql-5.1.38-linux-x86_64-glibc23/bin/mysqld(_ZN9Item_cond10fix_fieldsEP3THDPP4Item+0xb8)[0x565538]
/home/rbradfor/mysql/mysql-5.1.38-linux-x86_64-glibc23/bin/mysqld(_Z11setup_condsP3THDP10TABLE_LISTS2_PP4Item+0xf6)[0x621f96]
/home/rbradfor/mysql/mysql-5.1.38-linux-x86_64-glibc23/bin/mysqld(_ZN4JOIN7prepareEPPP4ItemP10TABLE_LISTjS1_jP8st_orderS7_S1_S7_P13st_select_lexP18st_select_lex_unit+0x2db)[0x645f3b]
/home/rbradfor/mysql/mysql-5.1.38-linux-x86_64-glibc23/bin/mysqld(_Z12mysql_selectP3THDPPP4ItemP10TABLE_LISTjR4ListIS1_ES2_jP8st_orderSB_S2_SB_yP13select_resultP18st_select_lex_unitP13st_select_lex+0x7a4)[0x654d24]
/home/rbradfor/mysql/mysql-5.1.38-linux-x86_64-glibc23/bin/mysqld(_Z13handle_selectP3THDP6st_lexP13select_resultm+0x16c)[0x659f9c]
/home/rbradfor/mysql/mysql-5.1.38-linux-x86_64-glibc23/bin/mysqld[0x5ec92a]
/home/rbradfor/mysql/mysql-5.1.38-linux-x86_64-glibc23/bin/mysqld(_Z21mysql_execute_commandP3THD+0x602)[0x5efb22]
/home/rbradfor/mysql/mysql-5.1.38-linux-x86_64-glibc23/bin/mysqld(_ZN18Prepared_statement7executeEP6Stringb+0x3bd)[0x66587d]
/home/rbradfor/mysql/mysql-5.1.38-linux-x86_64-glibc23/bin/mysqld(_ZN18Prepared_statement12execute_loopEP6StringbPhS2_+0x7c)[0x66874c]
/home/rbradfor/mysql/mysql-5.1.38-linux-x86_64-glibc23/bin/mysqld(_Z22mysql_sql_stmt_executeP3THD+0xa7)[0x668c27]
/home/rbradfor/mysql/mysql-5.1.38-linux-x86_64-glibc23/bin/mysqld(_Z21mysql_execute_commandP3THD+0x1123)[0x5f0643]
/home/rbradfor/mysql/mysql-5.1.38-linux-x86_64-glibc23/bin/mysqld(_Z11mysql_parseP3THDPKcjPS2_+0x357)[0x5f5047]
/home/rbradfor/mysql/mysql-5.1.38-linux-x86_64-glibc23/bin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcj+0xe93)[0x5f5ee3]
/home/rbradfor/mysql/mysql-5.1.38-linux-x86_64-glibc23/bin/mysqld(_Z10do_commandP3THD+0xe6)[0x5f67a6]
/home/rbradfor/mysql/mysql-5.1.38-linux-x86_64-glibc23/bin/mysqld(handle_one_connection+0x246)[0x5e9146]
/lib64/libpthread.so.0[0x35fb006307]
/lib64/libc.so.6(clone+0x6d)[0x35fa4d1ded]
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort...
thd->query at 0x5249320 = select 1 from t1 join  t2 on a xor b where b > 1  and a =1
thd->thread_id=1
thd->killed=NOT_KILLED
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
Writing a core file
100306 14:51:49 mysqld_safe Number of processes running now: 0
100306 14:51:49 mysqld_safe mysqld restarted
100306 14:51:49 [Note] Plugin 'FEDERATED' is disabled.
100306 14:51:50  InnoDB: Started; log sequence number 0 44233
100306 14:51:50 [Note] Event Scheduler: Loaded 0 events
100306 14:51:50 [Note] /home/rbradfor/mysql/mysql-5.1.38-linux-x86_64-glibc23/bin/mysqld: ready for connections.
Version: '5.1.38'  socket: '/tmp/mysql.sock.3999'  port: 3999  MySQL Community Server (GPL)

Confirming we got the “Writing a core file” line, we can find and use this.

$ find . -name "core*"
./data/core.23290
$ gdb bin/mysqld data/core.23290
GNU gdb Red Hat Linux (6.5-37.el5_2.2rh)
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu"...Using host libthread_db library "/lib64/libthread_db.so.1".

Reading symbols from /lib64/libpthread.so.0...done.
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /lib64/libdl.so.2...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/libcrypt.so.1...done.
Loaded symbols for /lib64/libcrypt.so.1
Reading symbols from /lib64/libnsl.so.1...done.
Loaded symbols for /lib64/libnsl.so.1
Reading symbols from /lib64/libm.so.6...done.
Loaded symbols for /lib64/libm.so.6
Reading symbols from /lib64/libc.so.6...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /lib64/libgcc_s.so.1...done.
Loaded symbols for /lib64/libgcc_s.so.1
Core was generated by `/home/rbradfor/mysql/mysql-5.1.38-linux-x86_64-glibc23/bin/mysqld --defaults-fi'.
Program terminated with signal 11, Segmentation fault.
#0  0x00000035fb00b142 in pthread_kill () from /lib64/libpthread.so.0
(gdb) bt
#0  0x00000035fb00b142 in pthread_kill () from /lib64/libpthread.so.0
#1  0x00000000005df285 in handle_segfault (sig=11) at mysqld.cc:2552
#2  
#3  0x00000000005654ff in Item_cond::fix_fields (this=0x5249dd0, thd=0x521f160, ref=) at item_cmpfunc.cc:3900
#4  0x0000000000565538 in Item_cond::fix_fields (this=0x52435b8, thd=0x521f160, ref=) at item_cmpfunc.cc:3912
#5  0x0000000000621f96 in setup_conds (thd=0x521f160, tables=, leaves=0x52494d0, conds=0x5244e38) at sql_base.cc:7988
#6  0x0000000000645f3b in JOIN::prepare (this=0x5243770, rref_pointer_array=0x5248a90, tables_init=, wild_num=, conds_init=,
    og_num=, order_init=0x0, group_init=0x0, having_init=0x0, proc_param_init=0x0, select_lex_arg=0x52488c0, unit_arg=0x5248498) at sql_select.cc:412
#7  0x0000000000654d24 in mysql_select (thd=0x521f160, rref_pointer_array=0x5c3fd0, tables=0x4, wild_num=0, fields=@0x52489c8, conds=0x52435b8, og_num=0, order=0x0, group=0x0, having=0x0,
    proc_param=0x0, select_options=0, result=0x52688a0, unit=0x5248498, select_lex=0x52488c0) at sql_select.cc:2377
#8  0x0000000000659f9c in handle_select (thd=0x521f160, lex=0x52483f8, result=0x52688a0, setup_tables_done_option=0) at sql_select.cc:268
#9  0x00000000005ec92a in execute_sqlcom_select (thd=0x521f160, all_tables=0x52494d0) at sql_parse.cc:5011
#10 0x00000000005efb22 in mysql_execute_command (thd=0x521f160) at sql_parse.cc:2206
#11 0x000000000066587d in Prepared_statement::execute (this=0x5245d60, expanded_query=, open_cursor=false) at sql_prepare.cc:3579
#12 0x000000000066874c in Prepared_statement::execute_loop (this=0x5245d60, expanded_query=0x401b43c0, open_cursor=false, packet=, packet_end=)
    at sql_prepare.cc:3253
#13 0x0000000000668c27 in mysql_sql_stmt_execute (thd=) at sql_prepare.cc:2524
#14 0x00000000005f0643 in mysql_execute_command (thd=0x521f160) at sql_parse.cc:2215
#15 0x00000000005f5047 in mysql_parse (thd=0x521f160, inBuf=0x5243520 "execute stmt", length=12, found_semicolon=0x401b6060) at sql_parse.cc:5931
#16 0x00000000005f5ee3 in dispatch_command (command=COM_QUERY, thd=0x521f160, packet=0x525fde1 "execute stmt", packet_length=) at sql_parse.cc:1213
#17 0x00000000005f67a6 in do_command (thd=0x521f160) at sql_parse.cc:854
#18 0x00000000005e9146 in handle_one_connection (arg=dwarf2_read_address: Corrupted DWARF expression.
) at sql_connect.cc:1127
#19 0x00000035fb006307 in start_thread () from /lib64/libpthread.so.0
#20 0x00000035fa4d1ded in clone () from /lib64/libc.so.6
(gdb) quit

You can use gdb to obtain additional information based on the type of information available.

Now what?

Is the problem a bug? Is it data corruption? Is it hardware related?

Gathering the information is the first step in informing you of more detail that will enable you to search, discuss and seek professional advice to address your problem.

References
Categories: Blogs, MySQL

Advanced reporting options for MySQL

Mon, 03/08/2010 - 05:31

I’m seeking help from the MySQL community for what tools are used today to generate complex reports for enterprise applications that use MySQL. In an Oracle world, you have Oracle Report Writer, in Microsoft Crystal Reports.

In the open source world there is Jasper Reports, Pentaho Reports and BIRT however I don’t know the power of complex reporting with these.

If anybody has experience using or evaluating these tools please let me know. This may lead to possible work.

Categories: Blogs, MySQL

Migrating MySQL latin1 to utf8 – The process

Sat, 03/06/2010 - 20:24

Having covered the preparation and character set options of performing a latin1 to utf8 MySQL migration, just how do you perform the migration correctly.

Example Case

Just to recap, we have the following example table and data.

mysql> select c,length(c),char_length(c),charset(c), hex(c) from conv.test_latin1;
+---------------+-----------+----------------+------------+----------------------------+
| c             | length(c) | char_length(c) | charset(c) | hex(c)                     |
+---------------+-----------+----------------+------------+----------------------------+
| a             |         1 |              1 | latin1     | 61                         |
| abc           |         3 |              3 | latin1     | 616263                     |
| ☺             |         3 |              3 | latin1     | E298BA                     |
| abc ☺☹☻       |        13 |             13 | latin1     | 61626320E298BAE298B9E298BB |
+---------------+-----------+----------------+------------+----------------------------+
Migration approach 1

The easiest way is to use mysqldump to dump the schema and data, to change the schema definitions and then a client reload process, all with the correct client character sets. Working on our sample table test_latin1 which I have in a conv schema you can do.

$ mysqldump -uroot -p --default-character-set=latin1 --skip-set-charset conv > conv.sql
$ sed -i -e "s/latin1/utf8/g" conv.sql
$ mysql -uroot -p --default-character-set=utf8 conv < conv.sql

If you attempt this which technically works (as I've seen a similar example on the Internet) your bound to screw up something. While the mysqldump and mysql load use the correct client command line options, performing a blind conversion of all references of latin1 to utf8 will not only change your table definition, in my example it will change the name of table, and if would change any values of data that contained the word 'latin1'. For example.

mysql> select c,length(c),char_length(c),charset(c), hex(c) from conv.test_utf8;
+---------+-----------+----------------+------------+----------------------------+
| c       | length(c) | char_length(c) | charset(c) | hex(c)                     |
+---------+-----------+----------------+------------+----------------------------+
| a       |         1 |              1 | utf8       | 61                         |
| abc     |         3 |              3 | utf8       | 616263                     |
| ?       |         3 |              1 | utf8       | E298BA                     |
| abc ??? |        13 |              7 | utf8       | 61626320E298BAE298B9E298BB |
+---------+-----------+----------------+------------+----------------------------+
4 rows in set (0.00 sec)

mysql> set names utf8;
Query OK, 0 rows affected (0.00 sec)

mysql> select c,length(c),char_length(c),charset(c), hex(c) from conv.test_utf8;
+---------------+-----------+----------------+------------+----------------------------+
| c             | length(c) | char_length(c) | charset(c) | hex(c)                     |
+---------------+-----------+----------------+------------+----------------------------+
| a             |         1 |              1 | utf8       | 61                         |
| abc           |         3 |              3 | utf8       | 616263                     |
| ☺             |         3 |              1 | utf8       | E298BA                     |
| abc ☺☹☻       |        13 |              7 | utf8       | 61626320E298BAE298B9E298BB |
+---------------+-----------+----------------+------------+----------------------------+
4 rows in set (0.01 sec)

Be sure to realize now you need to connect with --default-character-set=utf8 or in our example, we use set names as a stop gap measure.

Migration Approach 2

A good practice with using mysqldump regardless of migration is to always dump the schema with the --no-data option, and then dump the data separately with the --no-create-info option. In this case, you can then manually edit the schema file, carefully changing latin1 where appropriate. This is your production data, so some manual hand verification is a good thing. The additional benefit is you can create your schema and verify the syntax is correct before loading any data. While mysqldump creates a single file, due to this dump and reload, you can split your data file and perform some level of parallelism for data loading depending on your hardware capabilities.

$ mysqldump -uroot -p --default-character-set=latin1 --skip-set-charset --no-data conv > conv.schema.sql
$ mysqldump -uroot -p --default-character-set=latin1 --skip-set-charset --no-create-info conv > conv.data.sql
# For simplicity in this example I'm using a more strict global replacement. NOTE: this syntax depends on the version of mysql
$ sed -i -e "s/DEFAULT CHARSET=latin1/DEFAULT CHARSET=utf8/g" conv.schema.sql
$ mysql -uroot -p --default-character-set=utf8 conv < conv.schema.sql
$ mysql -uroot -p --default-character-set=utf8 conv < conv.data.sql
$ mysql -uroot -p --default-character-set=utf8 conv
mysql> select c,length(c),char_length(c),charset(c), hex(c) from conv.test_latin1;
+---------------+-----------+----------------+------------+----------------------------+
| c             | length(c) | char_length(c) | charset(c) | hex(c)                     |
+---------------+-----------+----------------+------------+----------------------------+
| a             |         1 |              1 | utf8       | 61                         |
| abc           |         3 |              3 | utf8       | 616263                     |
| ☺             |         3 |              1 | utf8       | E298BA                     |
| abc ☺☹☻       |        13 |              7 | utf8       | 61626320E298BAE298B9E298BB |
+---------------+-----------+----------------+------------+----------------------------+
4 rows in set (0.00 sec)
Other software interactions

While I only discuss the MySQL data and mysql client usage here, you should ensure that your programming language connecting to MySQL uses the correct character set management, e.g. PHP uses the default_charset settings in php.ini and even your webserver may benefit from the correct encoding, e.g. Apache httpd uses AddDefaultCharset httpd.conf.

Migration Approach Problems

In our example our migration is successfully, however like life, your production data is unlikely to be perfect. In my next post I will talk about identifying and handling exceptions, including data that was wrongly encoded originally into latin1, or translation such as html entities from rich editor input fields for example.

Categories: Blogs, MySQL

Don’t Assume – Data Integrity

Sat, 03/06/2010 - 20:11

MySQL has the same level of data integrity for numbers and strings as Oracle; when MySQL is correctly configured. By default (a reason I wish I knew why it is still the default), MySQL performs silent conversions on boundary conditions of data that will result in your data not always being what is specified. Let’s look at the following examples to demonstrate default behavior.

For numbers
mysql> DROP TABLE IF EXISTS example;
mysql> CREATE TABLE example(i1  TINYINT, i2 TINYINT UNSIGNED, c1 VARCHAR(5));
mysql> INSERT INTO example (i1) VALUES (1), (-1), (100), (500);
Query OK, 4 rows affected, 1 warning (0.08 sec)
mysql> SELECT * FROM example;
+------+------+------+
| i1   | i2   | c1   |
+------+------+------+
|    1 | NULL | NULL |
|   -1 | NULL | NULL |
|  100 | NULL | NULL |
|  127 | NULL | NULL |
+------+------+------+
4 rows in set (0.00 sec)

As you can see for one value we inserted 500, yet the value of 127 is stored? For this example I have used the TINYINT numeric data type to demonstrate truncation. TINYINT is a 1 byte integer that stores values from -128 to +127. Unlike Oracle, MySQL has 9 different data types for numeric columns, and using these wisely can improve your database disk footprint, for example BIGINT v INT. Is there a big deal?.

MySQL also has a nice feature for numeric data types, the UNSIGNED attribute that ensures only a positive integer or 0 value. Let’s see what happens with this column.

Unsigned
mysql> TRUNCATE TABLE example;
mysql> INSERT INTO example (i2) VALUES (1), (-1), (100), (500);
Query OK, 4 rows affected, 2 warnings (0.00 sec)
mysql> SELECT * FROM example;
+------+------+------+
| i1   | i2   | c1   |
+------+------+------+
| NULL |    1 | NULL |
| NULL |    0 | NULL |
| NULL |  100 | NULL |
| NULL |  255 | NULL |
+------+------+------+
4 rows in set (0.00 sec)

Now you see that -1 and 500 are now not the expected values, and before while 500 was silently truncated to 127, now it’s truncated to 255.

For Strings

As you can now assume, the following also occurs for strings.

mysql> TRUNCATE TABLE example;
mysql> INSERT INTO example (c1) VALUES (NULL),('a'),('abcde'),('xyz12345');
Query OK, 4 rows affected, 1 warning (0.00 sec)
mysql> SELECT * FROM example;
+------+------+-------+
| i1   | i2   | c1    |
+------+------+-------+
| NULL | NULL | NULL  |
| NULL | NULL | a     |
| NULL | NULL | abcde |
| NULL | NULL | xyz12 |
+------+------+-------+
4 rows in set (0.00 sec)
Show warnings

As you can see here, the mysql client shows that warnings occurred, but if you don’t review the warning you would never know, a situation that is rarely reviewed with development in richer programming languages. Let us look at these actual warnings more closely.

mysql> INSERT INTO example (i1) VALUES (1), (-1), (100), (500);
Query OK, 4 rows affected, 1 warning (0.08 sec)
mysql> SHOW WARNINGS;
+---------+------+---------------------------------------------+
| Level   | Code | Message                                     |
+---------+------+---------------------------------------------+
| Warning | 1264 | Out of range value for column 'i1' at row 4 |
+---------+------+---------------------------------------------+

mysql> INSERT INTO example (i2) VALUES (1), (-1), (100), (500);
Query OK, 4 rows affected, 2 warnings (0.00 sec)
mysql> SHOW WARNINGS;
+---------+------+---------------------------------------------+
| Level   | Code | Message                                     |
+---------+------+---------------------------------------------+
| Warning | 1264 | Out of range value for column 'i2' at row 2 |
| Warning | 1264 | Out of range value for column 'i2' at row 4 |
+---------+------+---------------------------------------------+
2 rows in set (0.00 sec)
mysql> INSERT INTO example (c1) VALUES (NULL),('a'),('abcde'),('xyz12345');
Query OK, 4 rows affected, 1 warning (0.00 sec)
mysql> SHOW WARNINGS;
+---------+------+-----------------------------------------+
| Level   | Code | Message                                 |
+---------+------+-----------------------------------------+
| Warning | 1265 | Data truncated for column 'c1' at row 4 |
+---------+------+-----------------------------------------+
1 row in set (0.00 sec)
Using sql_mode

The solution is the sql_mode configuration variable and at minimum the value of STRICT_ALL_TABLES defined. We can demonstrate the expected behavior with the following syntax.

mysql> set SESSION sql_mode=STRICT_ALL_TABLES;
Query OK, 0 rows affected (0.00 sec)

mysql> TRUNCATE TABLE example;
Query OK, 0 rows affected (0.00 sec)

mysql> INSERT INTO example (i1) VALUES (1), (-1), (100), (500);
ERROR 1264 (22003): Out of range value for column 'i1' at row 4
mysql> SELECT * FROM example;
+------+------+------+
| i1   | i2   | c1   |
+------+------+------+
|    1 | NULL | NULL |
|   -1 | NULL | NULL |
|  100 | NULL | NULL |
+------+------+------+
3 rows in set (0.00 sec)

As you can see, even with an error for a single INSERT statement, some data was actually stored. You should read Don’t Assume – Transactions for some insights here.

When it comes to dates, there is greater complexity and this is grounds for another entry of this series.

References About “Don’t Assume”

“Don’t Assume” is a series of posts to help the Oracle DBA understand, use and appreciate the subtle differences and unique characteristics of the MySQL RDBMS in comparison to Oracle. These points as essential to operate MySQL effectively in a production environment and avoid any loss of data or availability.

For more posts in this series be sure to follow the mysql4oracledba tag and also watch out for MySQL for Oracle DBA presentations.

The MySQLCamp for the Oracle DBA is a series of educational talks all Oracle DBA resources should attend. Two presentations from this series IGNITION and LIFTOFF will be presented at the MySQL Users Conference 2010 in Santa Clara, April 2010 This series also includes JUMPSTART and VELOCITY. If you would like to here these presentations in your area, please contact me.

Categories: Blogs, MySQL

How to crash mysqld intentionally

Fri, 03/05/2010 - 15:27

While some may think I’m daft, I have a legitimate reason for wanting to crash mysqld. However first we need to find a way to crash it.

Great thanks to Alan K, Mark L, Harrison and Hartmut on #mysql-dev for several suggestions and a config option I was unaware of. My investigation even lead to a documentation bug logged as #51739.

My first thought was to find a known bug and if necessary install the correct version to test that. A good one was suggested, Bug #48508 which fails on several versions that I will use to demonstrate with, however the simplest way is to issue kill -11

By default, no core file will be produced which is what I’m seeking but with the right options this is possible. First, the user running mysqld probably has a core file limit size of 0.

$ ulimit -c
0

You can fix this with with ulimit or you can specify this in the [mysqld_safe] section with http://dev.mysql.com/doc/refman/5.1/en/mysqld-safe.html#option_mysqld_safe_core-file-size">core-file-size=unlimited

$ ulimit -c unlimited
$ ulimit -c
unlimited

The option I was not aware of is you also have to also specify core-file in your my.cnf

[mysqld]
core-file

I also for my CentOS 5.4 installation ran the following kernel commands, but this may be unnecessary.

sudo /sbin/sysctl -w kernel.core_pattern="core"
sudo /sbin/sysctl -w fs.suid_dumpable= 1

It is now easy to produce a core file.

$ bin/mysqld_safe &
$ killall -11 mysqld
$ bin/mysqld_safe: line 137:  2656 Segmentation fault      (core dumped) ...
100304 16:46:43 mysqld_safe Number of processes running now: 0
100304 16:46:43 mysqld_safe mysqld restarted
$ find . -name "core*"
./data/core.99999

NOTE: Do no run killall on a multi-instance server. I use this syntax here only for simplicity in presentation. It is best to run ps and kill the appropriate pid.

On a side note, I also tried to produce a core on Mac OS X without success. I’d still like to document that way, so if anybody can assist please ping me.

Categories: Blogs, MySQL

Don’t Assume – Transactions

Thu, 03/04/2010 - 18:01

MySQL by default is a NON transactional database. For the hobbyist (See The Hobbyist and the Professional), startup entrepreneur and website developer this may not appear foreign, however to the seasoned Oracle DBA who has only used Oracle the concept is very foreign.

In MySQL you have to be concerned with two situations that will catch the unprepared out. The first is the default autocommit mode. This is TRUE, i.e. all statements are automatically committed on completion.

mysql> SELECT @@autocommit,TRUE;
+--------------+------+
| @@autocommit | TRUE |
+--------------+------+
|            1 |    1 |
+--------------+------+
1 row in set (0.00 sec)

The second is the storage engine used. Again a foreign term for Oracle DBA’s, a storage engine is a technology that stores and retrieves the underlying data from the MySQL database. MySQL has many different storage engines, each with relative strengths and weaknesses and different features. For the purpose of this discussion it is important to know that engines are either non-transactional or transactional. The default storage engine MyISAM is NON transactional. MySQL provides by default the InnoDB storage engine which is transactional. There are distinct advantages of a non transactional environment which I will not go into at this time.

Having recently written about this in my upcoiming book Expert PHP and MySQL I will demonstrate what happens with both MyISAM and InnoDB.

Non-transactional Tables

To show the difference, Listing 6-7 demonstrates that atomicity is not possible with non-transactional tables. The following tables are used in this example.

DROP TABLE IF EXISTS non_trans_parent;
CREATE TABLE non_trans_parent (
  id   INT UNSIGNED NOT NULL AUTO_INCREMENT,
  val  VARCHAR(10) NOT NULL,
PRIMARY KEY (id),
UNIQUE KEY (val)
) ENGINE=MyISAM DEFAULT CHARSET latin1;
DROP TABLE IF EXISTS non_trans_child;
CREATE TABLE non_trans_child (
  id        INT UNSIGNED NOT NULL AUTO_INCREMENT,
  parent_id INT UNSIGNED NOT NULL,
  created   TIMESTAMP NOT NULL,
PRIMARY KEY (id),
INDEX (parent_id)
) ENGINE=MyISAM DEFAULT CHARSET latin1;

To test things out, perform a sample transaction that inserts records into these two tables:

START TRANSACTION;
INSERT INTO non_trans_parent(val) VALUES(‘a’);
INSERT INTO non_trans_child(parent_id,created) VALUES(LAST_INSERT_ID(),NOW());
INSERT INTO non_trans_parent (val) VALUES(‘a’);
ERROR 1062 (23000): Duplicate entry ‘a’ for key ‘val’
ROLLBACK;
Query OK, 0 rows affected, 1 warning (0.00 sec)
mysql> SHOW WARNINGS;
+--------+------+--------------------------------------------------------------
| Level  | Code | Message
+--------+------+--------------------------------------------------------------
| Warning| 1196 | Some non-transactional changed tables couldn’t be rolled back
+--------+------+--------------------------------------------------------------
SELECT * FROM non_trans_parent;
+----+-----+
| id | val |
+----+-----+
|  1 | a   |
+----+-----+
SELECT * FROM non_trans_child;
+----+-----------+---------------------+
| id | parent_id | created             |
+----+-----------+---------------------+
|  1 |         1 | 2009–09–21 23:44:25 |
+----+-----------+---------------------+

As you can see, data that you would have expected to not exist from the transaction is present.

Transactional Tables

Repeat these SQL statements using the transactional storage engine InnoDB; you will observe the difference between transactional and non transactional processing. The following tables, shown in Listing 6-8, are used in this example.

DROP TABLE IF EXISTS trans_parent;
CREATE TABLE trans_parent (
  id   INT UNSIGNED NOT NULL AUTO_INCREMENT,
  val  VARCHAR(10) NOT NULL,
PRIMARY KEY (id),
UNIQUE KEY (val)
) ENGINE=InnoDB DEFAULT CHARSET latin1;
DROP TABLE IF EXISTS trans_child;
CREATE TABLE trans_child (
  id        INT UNSIGNED NOT NULL AUTO_INCREMENT,
  parent_id INT UNSIGNED NOT NULL,
  created   TIMESTAMP NOT NULL,
PRIMARY KEY (id),
INDEX (parent_id)
) ENGINE=InnoDB DEFAULT CHARSET latin1;

Perform a sample transaction that inserts records into these two tables:

START TRANSACTION;
INSERT INTO trans_parent (val) VALUES(‘a’);
INSERT INTO trans_child (parent_id,created) VALUES(LAST_INSERT_ID(),NOW());
INSERT INTO trans_parent (val) VALUES(‘a’);
ERROR 1062 (23000): Duplicate entry ‘a’ for key ‘val’
ROLLBACK;
Query OK, 0 rows affected (0.01 sec)
SELECT * FROM trans_parent;
Empty set (0.00 sec)
SELECT * FROM trans_child;
Empty set (0.00 sec)

As you can see, no data has been recorded as part of the failing transaction.

About “Don’t Assume”

“Don’t Assume” is a series of posts to help the Oracle DBA understand, use and appreciate the subtle differences and unique characteristics of the MySQL RDBMS in comparison to Oracle. These points as essential to operate MySQL effectively in a production environment and avoid any loss of data or availability.

For more posts in this series be sure to follow the mysql4oracledba tag and also watch out for MySQL for Oracle DBA presentations.

The MySQLCamp for the Oracle DBA is a series of educational talks all Oracle DBA resources should attend. Two presentations from this series IGNITION and LIFTOFF will be presented at the MySQL Users Conference 2010 in Santa Clara, April 2010 This series also includes JUMPSTART and VELOCITY. If you would like to here these presentations in your area, please contact me.

Categories: Blogs, MySQL

Upcoming book – Expert PHP and MySQL

Thu, 03/04/2010 - 00:09

This month will see the release of the book Expert PHP and MySQL which I was a co-author of. Initially this will be available for purchase in PDF format from the Wrox website and I am hopeful this will be available in print format for the MySQL Users Conference.

More then just your standard PHP and MySQL there is detailed content on technologies including Memcached, Sphinx, Gearman, MySQL UDFs and PHP extensions. We will be posting more information at www.ExpertPhpandMySQL.com. You can download a PDF version of Chapter 1 Techniques Every Expert Programmer Needs to Know.

The book includes the following content:

  1. Techniques Every Expert Programmer Needs to Know
  2. Advanced PHP Constructs
  3. MySQL Drivers and Storage Engines
  4. Improved Performance through Caching
  5. Memcached MySQL
  6. Advanced MySQL
  7. Extending MySQL with User-defined Functions
  8. Writing PHP Extensions
  9. Full Text Search using SPHINX
  10. Multi-tasking in PHP and MySQL
  11. Rewrite Rules
  12. User Authentication with PHP and MySQL
  13. Understanding the INFORMATION_SCHEMA
  14. Security
  15. Service and Command Lines
  16. Optimization and Debugging
Categories: Blogs, MySQL

Don’t Assume – Common Terminology

Wed, 03/03/2010 - 16:36

In Oracle the default transaction isolation is READ_COMMITTED. In MySQL the default is REPEATABLE_READ. Because MySQL also has READ_COMMITTED I have seen in more then one production MySQL environment a transaction isolation of READ_COMMITTED. The explanation and ultimately incorrect assumption is the default in Oracle is READ_COMMITTED so we made that the default in MySQL.

I’m not going to discuss the specific differences of these isolation levels (see reference lines below) except to say it that READ_COMMITTED in Oracle more closely relates to the MySQL default of REPEATABLE_READ and not READ_COMMITTED. Just because the same term for a common feature exists, don’t assume the underlying functionality is the same or that either or both actually conform to the SQL ANSI standard.

While switching your MySQL environment to READ_COMMITTED is possible, there is still conjucture if this actually provides any performance improvement. There are different cases of improving locking contention, in one case Heikki Tuuri the creator of InnoDB suggests READ_COMMITTED may overcome an adjacent range gap locking contention problem while in a tpcc-like benchmark a far greater number of deadlocks were detected.

I will close by stating two facts. When changing the MySQL transaction isolation from the default of REPEATABLE_READ you are using a code path that is less tested and not used as frequently to the millions of default MySQL installations, and you are also required to change the default replication format, again a code path less tested and potential a significant increase in I/O load.

References About “Don’t Assume”

“Don’t Assume” is a series of posts to help the Oracle DBA understand, use and appreciate the subtle differences and unique characteristics of the MySQL RDBMS in comparison to Oracle. These points as essential to operate MySQL effectively in a production environment and avoid any loss of data or availability.

For more posts in this series be sure to follow the mysql4oracledba tag and also watch out for MySQL for Oracle DBA presentations.

The MySQLCamp for the Oracle DBA is a series of educational talks all Oracle DBA resources should attend. Two presentations from this series IGNITION and LIFTOFF will be presented at the MySQL Users Conference 2010 in Santa Clara, April 2010 This series also includes JUMPSTART and VELOCITY. If you would like to here these presentations in your area, please contact me.

Categories: Blogs, MySQL

Don’t Assume – Session Scope

Tue, 03/02/2010 - 21:37

MySQL system variables and status variables have two scopes. These are GLOBAL and SESSION which are self explanatory.
This is important to realize when altering system variables dynamically. The following example does not produce the expected results.

mysql> USE test;
Database changed
mysql> CREATE TABLE example1(
    -> id INT UNSIGNED NOT NULL AUTO_INCREMENT,
    -> col1 VARCHAR(10) NOT NULL,
    -> PRIMARY KEY(id)
    -> );
Query OK, 0 rows affected (0.29 sec)

mysql> SHOW CREATE TABLE example1\G
*************************** 1. row ***************************
       Table: example1
Create Table: CREATE TABLE `example1` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `col1` varchar(10) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

We see that the table has a default CHARACTER SET of latin1. If you wanted to ensure all tables are created as utf8 you change the appropriate system variable. For example, we change the GLOBAL system variable and re-create the table.

mysql> SHOW GLOBAL VARIABLES like 'char%';
+--------------------------+----------------------------------------------------------------+
| Variable_name            | Value                                                          |
+--------------------------+----------------------------------------------------------------+
| character_set_client     | latin1                                                         |
| character_set_connection | latin1                                                         |
| character_set_database   | latin1                                                         |
| character_set_filesystem | binary                                                         |
| character_set_results    | latin1                                                         |
| character_set_server     | latin1                                                         |
| character_set_system     | utf8                                                           |
| character_sets_dir       | /Users/rbradfor/mysql/mysql-5.1.39-osx10.5-x86/share/charsets/ |
+--------------------------+----------------------------------------------------------------+
8 rows in set (0.00 sec)

mysql> SET GLOBAL character_set_server=utf8;
Query OK, 0 rows affected (0.10 sec)
mysql> DROP TABLE example1;
Query OK, 0 rows affected (0.00 sec)
mysql> CREATE TABLE example1( id INT UNSIGNED NOT NULL AUTO_INCREMENT, col1 VARCHAR(10) NOT NULL, PRIMARY KEY(id) );
Query OK, 0 rows affected (0.12 sec)

mysql> SHOW CREATE TABLE example1\G
*************************** 1. row ***************************
       Table: example1
Create Table: CREATE TABLE `example1` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `col1` varchar(10) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
1 row in set (0.02 sec)

The table still is latin1. This is because now we have a SESSION scope that differs from the GLOBAL scope as seen in this output.

mysql> SELECT @@GLOBAL.character_set_server,@@SESSION.character_set_server;
+-------------------------------+--------------------------------+
| @@GLOBAL.character_set_server | @@SESSION.character_set_server |
+-------------------------------+--------------------------------+
| utf8                          | latin1                         |
+-------------------------------+--------------------------------+
1 row in set (0.00 sec)

The solution is easy however the trap can be easily overlooked and especially when changing other MySQL system variables.

mysql> SET SESSION character_set_server=utf8;
Query OK, 0 rows affected (0.00 sec)

mysql> DROP TABLE example1;Query OK, 0 rows affected (0.00 sec)

mysql> CREATE TABLE example1( id INT UNSIGNED NOT NULL AUTO_INCREMENT, col1 VARCHAR(10) NOT NULL, PRIMARY KEY(id) );
Query OK, 0 rows affected (0.09 sec)

mysql> SHOW CREATE TABLE example1\G
*************************** 1. row ***************************
       Table: example1
Create Table: CREATE TABLE `example1` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `col1` varchar(10) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
1 row in set (0.00 sec)

By default, and when not specified in SHOW and SET commands the default scope is GLOBAL, however prior to MySQL 5.0.2 the default was SESSION. A note you will find in the 5.0 Reference Manual but not the current GA version 5.1 Reference Manual. See also SHOW STATUS Gotcha written in August 2006.

There are also other gotchas with scope that we will discuss at some other time.

References About “Don’t Assume”

“Don’t Assume” is a series of posts to help the Oracle DBA understand, use and appreciate the subtle differences and unique characteristics of the MySQL RDBMS in comparison to Oracle. These points as essential to operate MySQL effectively in a production environment and avoid any loss of data or availability.

For more posts in this series be sure to follow the mysql4oracledba tag and also watch out for other MySQL for Oracle DBA presentations.

The MySQLCamp for the Oracle DBA is a series of educational talks all Oracle DBA resources should attend. Two presentations from this series IGNITION and LIFTOFF will be presented at the MySQL Users Conference 2010 in Santa Clara, April 2010 This series also includes JUMPSTART and VELOCITY. If you would like to here these presentations in your area, please contact me.

Categories: Blogs, MySQL

Don’t Assume Series – MySQL for the Oracle DBA

Tue, 03/02/2010 - 19:10

As part of my MySQLCamp for the Oracle DBA series of talks to help the Oracle DBA understand, use and appreciate MySQL I have also developed a series of short interesting posts I have termed “Don’t Assume”. Many of these are re-occurring points during my consulting experiences as I observe Oracle DBA’s using MySQL. I am putting the finishing touches to my MySQL for the Oracle DBA series of talks and I’m excited to highlight some of the subtle differences and unique characteristics of MySQL RDBMS in comparison to Oracle and some extent other products including SQL Server.

Stay tuned for more soon.

I will be presenting at the MySQL Users Conference 2010 in Santa Clara, April 2010 two presentations from this series, IGNITION and LIFTOFF. This series also includes JUMPSTART and VELOCITY.

Categories: Blogs, MySQL

A Cassandra twitter clone

Thu, 02/25/2010 - 05:59

Following my successful Cassandra Cluster setup and having a potential client example to work with running Ruby On Rails (RoR), I came across the following examples in Ruby.

Not being a ruby developer, I thought it was time to investigate further. Starting first on Mac OS X 10.5, I found the first line example of installing cassandra via gem unsuccessful.

$ gem install cassandra
Updating metadata for 1 gems from http://gems.rubyforge.org
.
complete
ERROR:  could not find cassandra locally or in a repository

Some more reading highlights Otherwise, you need to install Java 1.6, Git 1.6, Ruby, and Rubygems in some reasonable way.

In case you didn’t read my earlier posts, Java 6 is installed, but not the default.

export JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home
export PATH=$JAVA_HOME/bin:$PATH

I achieved installing RubyGems via Installing Ruby on Rails on Mac OS X.

$ sudo gem install rubygems-update

Updating metadata for 1 gems from http://gems.rubyforge.org
.
complete
Successfully installed rubygems-update-1.3.6
1 gem installed
Installing ri documentation for rubygems-update-1.3.6...
Installing RDoc documentation for rubygems-update-1.3.6...
Could not find main page README
Could not find main page README
Could not find main page README
Could not find main page README

$ sudo update_rubygems
RubyGems 1.3.6 installed

=== 1.3.6 / 2010-02-17

NOTE:

http://rubygems.org is now the default source for downloading gems.

You may have sources set via ~/.gemrc, so you should replace
http://gems.rubyforge.org with http://rubygems.org

http://gems.rubyforge.org will continue to work for the forseeable future.

New features:

* `gem` commands
  * Added `gem push` and `gem owner` for interacting with modern/Gemcutter
    sources
  * `gem dep` now supports --prerelease.
  * `gem fetch` now supports --prerelease.
  * `gem server` now supports --bind.  Patch #27357 by Bruno Michel.
  * `gem rdoc` no longer overwrites built documentation.  Use --overwrite
    force rebuilding.  Patch #25982 by Akinori MUSHA.
* Captial letters are now allowed in prerelease versions.

Bug fixes:

* Development deps are no longer added to rubygems-update gem so older
  versions can update sucessfully.
* Installer bugs:
  * Prerelease gems can now depend on non-prerelease gems.
  * Development dependencies are ignored unless explicitly needed.  Bug #27608
    by Roger Pack.
* `gem` commands
  * `gem which` now fails if no paths were found.  Adapted patch #27681 by
    Caio Chassot.
  * `gem server` no longer has invalid markup.  Bug #27045 by Eric Young.
  * `gem list` and friends show both prerelease and regular gems when
    --prerelease --all is given
* Gem::Format no longer crashes on empty files.  Bug #27292 by Ian Ragsdale.
* Gem::GemPathSearcher handles nil require_paths. Patch #27334 by Roger Pack.
* Gem::RemoteFetcher no longer copies the file if it is where we want it.
  Patch #27409 by Jakub Šťastný.

Deprecation Notices:

* lib/rubygems/timer.rb has been removed.
* Gem::Dependency#version_requirements is deprecated and will be removed on or
  after August 2010.
* Bulk index update is no longer supported.
* Gem::manage_gems was removed in 1.3.3.
* Time::today was removed in 1.3.3.

------------------------------------------------------------------------------

RubyGems installed the following executables:
	/System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/bin/gem

NOTE: This second command took over 60 seconds with no user feedback.

I was then able to successfully install cassandra via ruby’s gem package manager.

$ sudo gem install cassandra
Building native extensions.  This could take a while...
Successfully installed thrift-0.2.0
Successfully installed thrift_client-0.4.0
Successfully installed simple_uuid-0.1.0
Successfully installed cassandra-0.7.5
4 gems installed
Installing ri documentation for thrift-0.2.0...

Enclosing class/module 'thrift_module' for class BinaryProtocolAccelerated not known

Enclosing class/module 'thrift_module' for class BinaryProtocolAccelerated not known
Installing ri documentation for thrift_client-0.4.0...
Installing ri documentation for simple_uuid-0.1.0...
Installing ri documentation for cassandra-0.7.5...
Installing RDoc documentation for thrift-0.2.0...

Enclosing class/module 'thrift_module' for class BinaryProtocolAccelerated not known

Enclosing class/module 'thrift_module' for class BinaryProtocolAccelerated not known
Installing RDoc documentation for thrift_client-0.4.0...
Installing RDoc documentation for simple_uuid-0.1.0...
Installing RDoc documentation for cassandra-0.7.5...

My use of cassandra_helper provided the following expected dependency error.

$ cassandra_helper cassandra
Set the CASSANDRA_INCLUDE environment variable to use a non-default cassandra.in.sh and friends.
(in /Library/Ruby/Gems/1.8/gems/cassandra-0.7.5)
You need to install git 1.6 or 1.7

I found instructions to install git at Installing git (OSX) and installed via GUI installer.

I had to include to my current session path to get my Ruby Cassandra installation.

$ export PATH=/usr/local/git/bin:$PATH

$ cassandra_helper cassandra
Set the CASSANDRA_INCLUDE environment variable to use a non-default cassandra.in.sh and friends.
(in /Library/Ruby/Gems/1.8/gems/cassandra-0.7.5)
Checking Cassandra out from git
Initialized empty Git repository in /Users/rbradfor/cassandra/server/.git/
remote: Counting objects: 16715, done.
remote: Compressing objects: 100% (2707/2707), done.
remote: Total 16715 (delta 9946), reused 16011 (delta 9364)
Receiving objects: 100% (16715/16715), 19.22 MiB | 1.15 MiB/s, done.
Resolving deltas: 100% (9946/9946), done.
Updating Cassandra.
Buildfile: build.xml

clean:

BUILD SUCCESSFUL
Total time: 2 seconds
HEAD is now at 298a0e6 check-in debian packaging
Building Cassandra
Buildfile: build.xml

build-subprojects:

init:
    [mkdir] Created dir: /Users/rbradfor/cassandra/server/build/classes
    [mkdir] Created dir: /Users/rbradfor/cassandra/server/build/test/classes
    [mkdir] Created dir: /Users/rbradfor/cassandra/server/src/gen-java

check-gen-cli-grammar:

gen-cli-grammar:
     [echo] Building Grammar /Users/rbradfor/cassandra/server/src/java/org/apache/cassandra/cli/Cli.g  ....

build-project:
     [echo] apache-cassandra-incubating: /Users/rbradfor/cassandra/server/build.xml
    [javac] Compiling 247 source files to /Users/rbradfor/cassandra/server/build/classes
    [javac] Note: Some input files use or override a deprecated API.
    [javac] Note: Recompile with -Xlint:deprecation for details.
    [javac] Note: Some input files use unchecked or unsafe operations.
    [javac] Note: Recompile with -Xlint:unchecked for details.

build:

BUILD SUCCESSFUL
Total time: 42 seconds
CASSANDRA_HOME: /Users/rbradfor/cassandra/server
CASSANDRA_CONF: /Library/Ruby/Gems/1.8/gems/cassandra-0.7.5/conf
Listening for transport dt_socket at address: 8888
DEBUG - Loading settings from /Library/Ruby/Gems/1.8/gems/cassandra-0.7.5/conf/storage-conf.xml
....

I was then able to complete the example at up and running with cassandra running via the ruby interactive console.

I was also able to fire up the cassandra-cli and see the data added in ruby.

$ bin/cassandra-cli -host localhost
Connected to localhost/9160
cassandra> get Twitter.Statuses['1']
=> (column=user_id, value=5, timestamp=1267072406503471)
=> (column=text, value=Nom nom nom nom nom., timestamp=1267072406503471)
Returned 2 results.
cassandra> get Twitter.UserRelationships['5'];
=> (super_column=user_timeline,
     (column=???!??zvZ+?!, value=1, timestamp=1267072426991872)
     (column=??-?!???C?th?, value=2, timestamp=1267072427019091))
Returned 1 results.

No sure about the data in the second example.

Categories: Blogs, MySQL

Configuring a Cassandra Cluster

Thu, 02/25/2010 - 04:22

Continuing on from Getting started with Cassandra I’m now trying to configure two servers as a cluster. The Getting Started Step 3 was not clear the first time I read it (after writing this is makes sense), so a Google search yielded the second link as Building a Small Cassandra Cluster for Testing and Development. I love finding reference material from people I know, Padraig being a significant contributor to Drizzle.

Here is what I did to create a running Cassandra Cluster.

  • Stop individual Cassandra instances
  • Re-created data and log directories (I did this just to ensure a clean slate)
  • I added to my local hosts file two aliases for my servers (cass01 and cass02). This helped in the following step.
  • Three changes are needed to the default conf/storage-conf.xml file on my first server.
    • Change <ListenAddress> from localhost to cass01
    • Change <ThristAddress> from localhost to cass01
    • Change <Seed> from 127.0.0.1 to cass01
  • On my second server I changed the <ListenAddress> and <ThriftAddress> accordingly to cass02 and made <Seed> cass01
  • Started Cassandra servers and tested successfully using the set …/get Keyspace1.Standard1['jsmith'] example. I was able to connect to both hosts via cassandra-cli and see the results created on just one node. I was able to create data on the second node and view on the first node.

A new command is available to describe your cluster.

$ bin/nodeprobe -host cass01 ring
Address       Status     Load          Range                                      Ring
                                       148029780173059661585165369000220362256
192.168.100.4 Up         0 bytes       59303445267720348277007645348152900920     |<--|
192.168.100.5 Up         0 bytes       148029780173059661585165369000220362256    |-->|

Now with my first introduction successful, time to start using and seeing the true power of using Cassandra.

Categories: Blogs, MySQL

Edward Screven of Oracle to Answer Questions for future of MySQL

Wed, 02/24/2010 - 20:17

For those of you on the O’Reilly MySQL conference list you will no doubt see this email, but for readers here is the important bits.


Oracle Executive Will Speak at O’Reilly MySQL Conference & Expo
Edward Screven to Answer Questions re: Future of MySQL

Sebastopol, CA, February 24, 2010—Wonder about the future of MySQL? Curious about what Oracle plans for the open source database software? Expect answers when Edward Screven, Oracle’s chief corporate architect and leader of the MySQL business, speaks at the O’Reilly MySQL Conference & Expo, scheduled for April 12-15, at the Santa Clara Convention Center and the Hyatt Regency Santa Clara.

Edward Screven reports to CEO Larry Ellison, and he drives technology and architecture decisions across all Oracle products to ensure that product directions are consistent with Oracle’s overall strategy. He’ll discuss the current and future state of MySQL, now part of the Oracle family of products. His presentation will also cover Oracle’s investment in MySQL technology and community, as well as the role that open source in general is playing within heterogeneous customer environments around the world.

I have not found a link yet to provide reference to this.

Categories: Blogs, MySQL

Ineffective concatenated indexes

Wed, 02/24/2010 - 17:20

In MySQL significant performance improvements can be achieved by the correct use of indexes. It is important to understand different MySQL index implementations and one key improvement on indexes defined on single columns is to use multiple column or more commonly known concatenated indexes.

However it’s also possible to define ineffective indexes. This example shows you how to identify a concatenated index that is ineffective.

CREATE TABLE example (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  a  INT UNSIGNED NOT NULL,
  b  INT UNSIGNED NOT NULL,
  c  INT UNSIGNED NOT NULL,
  d  INT UNSIGNED NOT NULL,
  x  VARCHAR(10),
  y  VARCHAR(10),
  z  VARCHAR(10),
PRIMARY KEY (id),
UNIQUE INDEX (a,b,c,d)
) ENGINE=InnoDB;

INSERT INTO example(a,b,c,d) VALUES
(1,0,1,1),(1,0,1,2), (1,0,2,3), (1,0,4,5),
(2,0,2,1),(2,0,2,2), (2,0,2,3), (2,0,2,5),
(3,0,2,1),(3,0,2,3), (3,0,3,3), (3,0,3,5);

And our sample query is

SELECT id,x,y,z
FROM   example
WHERE  a = 2
AND    c = 2
AND    d = 3;

The EXPLAIN plan is

+----+-------------+---------+------+---------------+------+---------+-------+------+-------------+
| id | select_type | table   | type | possible_keys | key  | key_len | ref   | rows | Extra       |
+----+-------------+---------+------+---------------+------+---------+-------+------+-------------+
|  1 | SIMPLE      | example | ref  | a             | a    | 4       | const |    4 | Using where |
+----+-------------+---------+------+---------------+------+---------+-------+------+-------------+

While we are using the index (see the key column), the full benefit of the index is not utilized (see the key_len). 4 indicates the number of bytes used, that is only 1 INT column.

Let’s look at a second example.

SELECT id,x,y,z
FROM   example
WHERE  a = 2
AND    b = 0
AND    c = 2
AND    d = 3;

+----+-------------+---------+-------+---------------+------+---------+-------------------------+------+-------+
| id | select_type | table   | type  | possible_keys | key  | key_len | ref                     | rows | Extra |
+----+-------------+---------+-------+---------------+------+---------+-------------------------+------+-------+
|  1 | SIMPLE      | example | const | a             | a    | 16      | const,const,const,const |    1 |       |
+----+-------------+---------+-------+---------------+------+---------+-------------------------+------+-------+

In this example the index is used however the key_len is 16, that is 4 x 4 byte INT columns. This is ideal for this index.

In the above example, the client was using a common data structure however was not using one column, it’s values were all effectively 0. Certain queries were written with this knowledge and instead of specifying the column, they elected to remove it, however the impact of this developer code change was increased load on the database and more inefficient performance.

While this was easily addressed by a code change, an alternative could have been to change the index definition. It was not possible to remove the column due to legacy requirements.

ALTER TABLE example
DROP INDEX a,  ADD UNIQUE INDEX (a,c,d);
mysql> explain SELECT id,x,y,z FROM   example  WHERE  a = 2 AND    c = 2 AND    d = 3;
+----+-------------+---------+-------+---------------+------+---------+-------------------+------+-------+
| id | select_type | table   | type  | possible_keys | key  | key_len | ref               | rows | Extra |
+----+-------------+---------+-------+---------------+------+---------+-------------------+------+-------+
|  1 | SIMPLE      | example | const | a             | a    | 12      | const,const,const |    1 |       |
+----+-------------+---------+-------+---------------+------+---------+-------------------+------+-------+

A better solution may be to enforce better integrity on this business rule, that b only contains 0. MySQL does not support check constraint. The closest option would be to use an ENUM data type and a STRICT sql_mode however ENUM is only a string object so this would break other code.

ALTER TABLE example MODIFY b ENUM('0') NOT NULL DEFAULT '0';

Not only is performance impacted in this situation, in other examples of ineffective indexes it’s simply a waste of diskspace to sort additional columns in an index that are never used, and also a waste of memory as the index pages stored in memory contain information that is not used.

Categories: Blogs, MySQL

Getting started with Cassandra

Wed, 02/24/2010 - 00:54

With the motivation from today’s public news on Twitter’s move from MySQL to Cassandra, my own skills desire following in-depth discussions at last November’s Open SQL Camp to consider Cassandra and yesterday’s discussion with a new client on persistent key-value store products, today I download installed and configured for the first time. Not that today’s news was unexpected, if you follow the Twitter Engineering Open Source projects you would have seen Cassandra as well as other products being used or evaluated by Twitter.

So I went from nothing to a working Cassandra node in under 5 minutes. This is what I did.

  1. While I knew this was an Apache project, a Google Search yields for me the 3rd link for the The Apache Cassandra Project at http://incubator.apache.org/cassandra/. Congrats for Cassandra now a top level Apache Project. This url will update soon.
  2. Download Cassandra. Hard to miss with a big green button on home page. Current version is 0.5
  3. I read Getting Started, which is the 3rd top level link on menu after Home and Download. Step 1 is picking a version which I’ve already done, Step 2 is Running a single node.
  4. The Getting Started indicated a problem on Mac OS X for the required minimum Java version. I was installing on Mac OS X 10.5 and CentOS 5.4. I’ve experienced this Java 6 default path issue before. Set my JAVA_HOME and PATH accordingly (after I updated the wiki with correct value)
  5. I extracted the tar file, changed to the directory and took at look at the README.txt file. Yes, I always check this first with any software and relevant because it includes valuable instructions on creating the default data and log directories.
  6. Start with bin/cassandra -f. No problems!
  7. I then followed the instructions from the link in Step 2 with the CassandraCli. This tests and confirms the installation is operational.

Ok, a working environment. I’ve now installed on a second machine and tested however I now need to configure the cluster, and the documentation is not as straightforward. Time to try out Google again.

On a side note, this is one reason why I love Open Source. I followed the instructions online and found a mistake in the Mac OS X path, I simply registered and corrected providing the benefit of my experience for the next reader(s).

You may also like to view future posts including.

Categories: Blogs, MySQL

The correct approach to rolling MySQL logs

Tue, 02/23/2010 - 02:55

I say correct because there are several incorrect approaches to managing MySQL logs. In MySQL you have two important log files, the MySQL error log (configured with –log-error) and the MySQL slow query log (configured with –log-slow-queries or –slow-query-log and –slow-query-log-file which is available from 5.1.29).

The ideal management of these log files is different for each type of file.

The MySQL Error Log

With the error log you want to have one file showing a contiguous history of the server instance. This log provides valuable information over time and you should not discard this. You do NOT want to roll your error log. If for feel the content in the error log is too much, then these are errors you need to be addressing, and archiving after correcting. There are circumstances where error logs are rolled outside of your control.

The first is the FLUSH LOGS command. When this command is run, the error log is renamed to -old and a new log is created. For example:

$ ls -l log/error*
-rw-rw---- 1 mysql root  1733 Feb 22 16:11 error.log

$ mysql -uroot -p -e "FLUSH LOGS"

$ ls -l log/error*
-rw-rw---- 1 mysql mysql    0 Feb 22 18:08 error.log
-rw-rw---- 1 mysql root  1733 Feb 22 16:11 error.log-old

What happens when you run this command again?

$ mysql -uroot -p -e "FLUSH LOGS"

$ ls -l log/error*
-rw-rw---- 1 mysql mysql 0 Feb 22 18:10 log/error.log
-rw-rw---- 1 mysql mysql 0 Feb 22 18:08 log/error.log-old

You have now lost all valuable information in the error log, both the current log and the -old log are ZERO bytes in size.

The second example is the Ubuntu specific MySQL distribution on Ubuntu OS that logs MySQL information to the system error log (i.e. /var/log/syslog). You are then receiving a daily log rotate via the default OS log-rotate settings. You effectively lose information after 7 days. Here is what you will find on a stock Ubuntu server.

$ ls -l /var/log
...
-rw-r----- 1 mysql       adm       0 2008-05-28 20:33 mysql.err
-rw-r----- 1 mysql       adm       0 2008-05-28 20:33 mysql.log
...
-rw-r----- 1 syslog      adm  278480 2010-02-22 20:22 syslog
-rw-r----- 1 syslog      adm  366934 2010-02-22 06:25 syslog.0
-rw-r----- 1 syslog      adm   21025 2010-02-21 06:27 syslog.1.gz
-rw-r----- 1 syslog      adm   18551 2010-02-20 06:47 syslog.2.gz
-rw-r----- 1 syslog      adm   20086 2010-02-19 06:25 syslog.3.gz
-rw-r----- 1 syslog      adm   17135 2010-02-18 06:40 syslog.4.gz
-rw-r----- 1 syslog      adm   19238 2010-02-17 06:32 syslog.5.gz
-rw-r----- 1 syslog      adm   16101 2010-02-16 06:34 syslog.6.gz

You have to troll the syslog files to find any mysql specific information, even that is not possible with one command.

$ grep syslog syslog.0 | grep mysql

$ zcat syslog*gz | grep mysql
Feb 16 22:12:20 db1 mysqld[21769]: 100216 22:12:20 [ERROR] /usr/sbin/mysqld: Incorrect key file for table '/tmp/#sql_5508_40.MYI'; try to repair it
Feb 16 22:12:20 db1 mysqld[21769]: 100216 22:12:20 [ERROR] /usr/sbin/mysqld: Incorrect key file for table '/tmp/#sql_5508_33.MYI'; try to repair it
Feb 16 22:30:15 db1 mysqld[21769]: 100216 22:19:13 [ERROR] /usr/sbin/mysqld: Incorrect key file for table '/tmp/#sql_5508_29.MYI'; try to repair it
Feb 16 22:30:17 db1 mysqld[21769]: 100216 22:19:13 [ERROR] /usr/sbin/mysqld: Incorrect key file for table '/tmp/#sql_5508_44.MYI'; try to repair it
...

I’m lucky I looked, because tomorrow would have been too late to see these errors, as this is the oldest log file.

I place a disclaimer that if you have proper backups in place, these files are retrievable, but that takes time and effort which I consider unnecessary for a database server. The mysql log is one of the most important log files for this type of server, you want this information on the server.

The solution to this problem is easy in Ubuntu, always define the -log-error variable for the [mysqld] and [mysqld_safe] sections to a specific file, e.g. /var/log/mysql/mysql.log

The MySQL slow query log

With the slow query log, you DO want to rotate the log file produced. The best practice is to rotate this file daily. In addition you should both analyze the log file producing a top 5 or top 10 list of slow SQL queries each day. As load may not always predictable, it’s ideal to also analyze the combined logs of the past 7 days for cross reference.

The daily granularity allows you to track your load of slow performing queries more consistently and it also enables you verify more easily the impact of improvements made when they have been deployed.

Care needs to be taken to roll your log filed. Simply moving the log file will not work. For example.

A good tip to help in slow query SQL analysis is to
Comment your SQL (url)

$ mysql -uroot -p -e "select sleep(5)"
$ tail -1 log/slow.log
select sleep(5);

$ ls -l log/slow*
-rw-rw---- 1 mysql mysql 352 Feb 22 15:26 log/slow.log

$ mv log/slow.log log/slow.log.1; touch log/slow.log
$ ls -l log/slow*
-rw-r--r-- 1 root  root    0 Feb 22 15:26 log/slow.log
-rw-rw---- 1 mysql mysql 352 Feb 22 15:26 log/slow.log.1

$ mysql -uroot -p -e "SELECT SLEEP(4)"

$ ls -l log/slow*
-rw-r--r-- 1 root  root    0 Feb 22 15:26 log/slow.log
-rw-rw---- 1 mysql mysql 533 Feb 22 16:01 log/slow.log.1
$ tail -1 log/slow.log.1
SELECT SLEEP(4);

As you can see, the slow log was not written to, but the previous file which has the same inode.

$ rm -f log/slow.log; mv log/slow.log.1 log/slow.log
$ ls -l log/slow.log*
-rw-rw---- 1 mysql mysql 533 Feb 22 16:01 log/slow.log
$ cp log/slow.log log/slow.log.`date +%M`; > log/slow.log
$ mysql -uroot -p -e "SELECT SLEEP(3)"
$ ls -l log/slow.log*
-rw-rw---- 1 mysql mysql 181 Feb 22 16:03 log/slow.log
-rw-r----- 1 root  root  533 Feb 22 16:03 log/slow.log.03
$  tail -1 log/slow.log
SELECT SLEEP(3);

$ cp log/slow.log log/slow.log.`date +%M`; > log/slow.log
$ mysql -uroot -p -e "SELECT SLEEP(6)"
$ ls -l log/slow.log*
-rw-rw---- 1 mysql mysql 181 Feb 22 16:04 log/slow.log
-rw-r----- 1 root  root  533 Feb 22 16:03 log/slow.log.03
-rw-r----- 1 root  root  181 Feb 22 16:04 log/slow.log.04
$ tail -1 log/slow.log
SELECT SLEEP(6);

As you can see by copying and truncating you can perform an effective log rotate manually. Ideally you should config logrotate to manage this log.

More Information
Categories: Blogs, MySQL

What’s your MySQL version?

Mon, 02/22/2010 - 18:10

I’ve heard that the mechanic’s wife always has a car that needs repair or tuneup, the painter’s wife always had walls of peeling paint, you get the picture. What about MySQL DBA’s and their own databases? While I have many versions of MySQL for testing including for example the latest 5.1.44 which I was using for my previous post, what is running on my production server? Let’s see:

mysql> select version();
+-----------+
| version() |
+-----------+
| 5.1.25-rc |
+-----------+

That’s really old. And yes, to prove my point that we can be our own worst enemy, the previous version before 5.1.25 was 5.1.6. Yes, .6 which worked just fine, and never crashed once for my 20+ websites. While I have downloaded onto my production server several versions ready for upgrade including versions 5.1.30, 5.1,38, and 5.4.1 I’ve never actually gone through the upgrade process.

Categories: Blogs, MySQL

Migrating MySQL latin1 to utf8 – Character Set Options

Mon, 02/22/2010 - 17:18

Continuing on from preparation in our MySQL latin1 to utf8 migration let us first understand where MySQL uses character sets. MySQL defines the character set at 4 different levels for the structure of data.

  • Instance
  • Schema
  • Table
  • Column

In MySQL 5.1, the default character set is latin1. If not specified, this is what you will get. For example.

mysql> create table test1(c1 varchar(10) not null);
mysql> show create table test1\G
Create Table: CREATE TABLE `test1` (
  `c1` varchar(10) NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1

If you want all tables in your instance to always be a default of utf8, you can changed the server variable character_set_server. This can be set dynamically.

mysql> set global character_set_server=utf8;
mysql> set session character_set_server=utf8;
mysql> create table test2(c1 varchar(10) not null);
mysql> show create table test2\G
Create Table: CREATE TABLE `test2` (
  `c1` varchar(10) NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8

If you change this dynamically be sure to include the option in your my.cnf to ensure this option is persisted for a mysqld restart.

You can define the default character set for all new tables in a given schema. You specify this when you create the schema.

mysql> set global character_set_server=latin1;
mysql> set session character_set_server=latin1;
mysql> create schema test_ucs2 default character set ucs2;
mysql> use test_ucs2;
mysql> create table test3(c1 varchar(10) not null);
mysql> show create table test3\G
Create Table: CREATE TABLE `test3` (
  `c1` varchar(10) NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=ucs2

Even though we have a schema default, you can always specify the default character set for a given table which overrides any defaults.

mysql> use test_ucs2;
mysql> create table test4_utf8 (c varchar(10) not null) default charset utf8;
mysql> show create table test4_utf8\G
Create Table: CREATE TABLE `test4_utf8` (
  `c` varchar(10) NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8

And finally, if you really wanted to be specific you can define the character set on a per column level.

mysql> create table test4_utf8_latin1 (c varchar(10) not null, c2 varchar(20) charset latin1) default charset utf8;
mysql> show create table test4_utf8_latin1\G
*************************** 1. row ***************************
       Table: test4_utf8_latin1
Create Table: CREATE TABLE `test4_utf8_latin1` (
  `c` varchar(10) NOT NULL,
  `c2` varchar(20) CHARACTER SET latin1 DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8

With great flexibility comes great responsibility. You should have a defined standard for your application that is simple and easy to understand. I am not a proponent of using utf8 for everything, the primary reason why is memory. As part of my consulting I spend a lot of time with clients that have limited resources, e.g. database servers with 2GB or 4GB of RAM. MySQL stores utf8 efficiently on disk, but when this data is stored in memory for internal usage, it automatically uses 3 bytes, when on disk it may only be 1 byte. You can test this by creating a MEMORY table with latin1 and utf8 examples and comparing the difference in size. Is this a serious problem? Well that depends on many factors such as the number of database connections, persistent or not persistent connections, the size of the results etc. While it’s difficult in MySQL to instrument the memory precisely on a per connection basis, prudence should be a consideration for any physical resources, especially RAM.

Now that we understand what’s possible, how can we change our existing latin1 tables in our preparation example?

We could try a simple ALTER TABLE command.

mysql> alter table test_latin1 default charset utf8;
mysql> show create table test_latin1\G
Create Table: CREATE TABLE `test_latin1` (
  `c` varchar(100) CHARACTER SET latin1 NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8

This does not work, because we are only changing the default storage engine of the table. The underlying columns remain the same. If we were to add a new column, it would default to utf8. We can however achieve what we expected with the CONVERT option.

mysql> alter table test_latin1 convert to character set utf8;
mysql> show create table test_latin1\G
Create Table: CREATE TABLE `test_latin1` (
  `c` varchar(100) NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8

We look at our data, and it looks great? Are we done with our conversion?

mysql> select * from test_latin1;
+---------------+
| c             |
+---------------+
| a             |
| abc           |
| ☺           |
| abc ☺☹☻ |
+---------------+

The answer is no. While it may look like the data is correct, MySQL also manages character sets for the communication channel. In this case, we are still communicating in latin1. To ensure moving forward in the future we must always communicate in utf8 to ensure we correctly pass utf8 to the database. We can test this with the mysql client, and as you will see our data is still corrupt.

mysql> set names utf8;
mysql> select * from test_latin1;
+------------------------+
| c                      |
+------------------------+
| a                      |
| abc                    |
| ☺                 |
| abc ☺☹☻ |
+------------------------+

mysql> show session variables like 'character%';
+--------------------------+----------------------------------------------------------------+
| Variable_name            | Value                                                          |
+--------------------------+----------------------------------------------------------------+
| character_set_client     | utf8                                                           |
| character_set_connection | utf8                                                           |
| character_set_database   | latin1                                                         |
| character_set_filesystem | binary                                                         |
| character_set_results    | utf8                                                           |
| character_set_server     | latin1                                                         |
| character_set_system     | utf8                                                           |
+--------------------------+----------------------------------------------------------------+

While you can see how we could migrate the schema definition, this does not complete our migration. In my next post, I will discuss the various different ways to correctly perform a data migration between latin1 and utf8.

Categories: Blogs, MySQL