Skip to content

Blogs

Hibernate Profiler New Feature: Parameters Values

Ayende @ Rahien - 3 hours 46 min ago

One of the annoying things about the Hibernate port of the profiler was that JDBC didn’t provide us with the parameters values.

Eric has just fixed and that is now live:

hibernate-parameters

Enjoy


Categories: Blogs

An Introduction to Data Compression in SQL Server 2008

Aloha DBA - Brad M. McGehee - 5 hours 28 sec ago

This is an excerpt from my free eBook, Brad’s Sure Guide to SQL Server 2008.

image There is one thing every DBA knows with certainty, and that is that databases grow with time. MDFs grow, backups grow, and it never stops. The more data we have, the more work SQL Server has to perform in order to deal with it all; whether it’s executing a query on a table with 10 million rows, or backing up a 5 TB database. Whether we like it or not, we are fighting a losing battle, and DBA’s can’t reverse the information explosion. Or can we?

While we can’t stop growth, SQL Server 2008 (Enterprise Edition only), gives us some new tools to help us better deal with all this data, and that is the promise of compression. Given the right circumstances, DBAs can use data compression to reduce the size of our MDFS, and backup compression can help us reduce the amount of space our backups take. Not only does compression reduce physical file sizes, it reduces disk I/O, which can greatly enhance the performance of many database applications, along with database backups.

When we discuss SQL Server compression, we need to think of it two different ways. First, there is data compression, which includes row-level and page-level compression that occurs within the MDF files of our databases. Second, there is backup compression, which occurs only when data is backed up. While both of these are forms of compression, they are architected differently. Because of this, it is important to treat them separately.

Data Compression comes in two different forms:

  • Row-level Data Compression: Row-level data compression is essentially turning fixed length data types into variable length data types, freeing up empty space. It also has the ability to ignore zero and null values, saving additional space. In turn, more rows can fit into a single data page.
  • Page-level Data Compression: Page-level data compression starts with row-level data compression, then adds two additional compression features: prefix and dictionary compression. We will take a look at what this means a little later in this chapter. As you can imagine, page-level compression offers increased data compression over row-level compression alone.

Backup Compression comes in a single form:

  • Backup Compression: Backup compression does not use row-level or page-level data compression. Instead, backup compression occurs only at the time of a backup, and it uses its own proprietary compression technique. Backup compression can be used when using, or not using, data compression, although using backup compression on a database that is already compressed using data compression may not offer additional benefits.

In the next section, we will take a high-level overview of data compression, and then we will drill down into the detail of the different types of compression available with SQL Server 2008.

Data Compression Overview

Data compression has been around for years. For example, who hasn’t zipped a file at some point in their career? While compression isn’t a new technology, it’s new to SQL Server. Unlike zip compression, SQL Server’s data compression does not automatically compress an entire database; instead, data compression can only be used for these database objects:

  • A table stored as a heap
  • A table stored as a clustered index
  • A non-clustered index
  • An indexed view
  • Partitioned tables and indexes

In other words, as the DBA, you must evaluate each of the above objects in your database, decide if you want to compress it, then decide whether you want to compress it using either row-level or page level compression. Once you have completed this evaluation, then you must turn on compression for that object. There is no single switch you can flip that will turn compression on or off for all the objects listed above, although you could write a Transact-SQL script to accomplish this task.

Fortunately, other than turning compression on or off for the above objects, you don’t have to do anything else to enable compression. You don’t have to re-architect your database or your application, as data compression is entirely handled under the covers by the SQL Server Storage Engine. When data is passed to the Storage Engine, it is compressed and stored in the designated compressed format (on disk and in the Buffer Cache). When the Storage Engine passes the information to another component of SQL Server, then the Storage Engine has to uncompress it. In
other words, every time data has to be passed to or from the Storage Engine, it has to be compressed or uncompressed. While this does take extra CPU overhead to accomplish, in many cases, the amount of disk I/O saved by compression more than makes up for the CPU costs, boosting the overall performance of SQL Server.

Here’s a simplified example. Let’s say that we want to update a row in a table, and that the row we want to update is currently stored on disk in a table that is using row-level data compression. When we execute the UPDATE statement, the Relational Engine (Query Processor) parses, compiles, and optimizes the UPDATE statement, ready to execute it. Before the statement can be executed, the Relational Engine needs the row of data that is currently stored on disk in the compressed format, so the Relational Engine requests the data by asking the Storage Engine to go get it. The Storage Engine (with the help of the SQLOS) goes and gets the compressed data from disk and brings it into the Buffer Cache, where the data continues to remain in its compressed format.

Once the data is in the Buffer Cache, the row is handed off to the Relational Engine from the Storage Engine. During this pass off, the compressed row is uncompressed and given to the Relational Engine to UPDATE. Once the row has been updated, it is then passed back to the Storage Engine, where is it again compressed and stored in the Buffer Cache. At some point, the row will be flushed to disk, where it is stored on disk in its compressed format.

Data compression offers many benefits. Besides the obvious one of reducing the amount of physical disk space required to store data—and the disk I/O needed to write and read it—it also reduces the amount of Buffer Cache memory needed to store data in the Buffer Cache. This in turn allows more data to be stored in the Buffer Cache, reducing the need for SQL Server to access the disk to get data, as the data is now more likely to be in memory than disk, further reducing disk I/O.

Just as data compression offers benefits, so it has some disadvantages. Using compression uses up additional CPU cycles. If your server has plenty to spare, then you have no problem. But if your server is already experiencing a CPU bottleneck, then perhaps compression is better left turned off.

Row‐Level Data Compression

The simplest method of data compression, row-level compression, works by:

  • Reducing the amount of metadata used to store a row.
  • Storing fixed length numeric data types as if they were variable-length data types. For example, if you store the value 1 in a bigint data type, storage will only take 1 byte, not 8 bytes, which the bigint data types normally takes.
  • Storing CHAR data types as variable-length data types. For example, if you have a CHAR (100) data type, and only store 10 characters in it, blank characters are not stored, thus reducing the space needed to the store data.
  • Not storing NULL or 0 values

Row-level data compression offers less compression than page-level data compression, but it also incurs less overhead, reducing the amount of CPU resources required to implement it.

Page Level Data Compression

Page-level data compression offers greater compression, but at the expense of greater CPU utilization. It works using these techniques:

  • It starts out by using row-level data compression to get as many rows as it can on a single page.
  • Next, prefix compression is run. Essentially, repeating patterns of data at the beginning of the values of a given column are removed and substituted with an abbreviated reference that is stored in the compression information (CI) structure that immediately follows the page header of a data page.
  • And last, dictionary compression is used. Dictionary compression searches for repeated values anywhere on a page and stores them in the CI. One of the major differences between prefix and dictionary compression is that prefix compression is restricted to one column, while dictionary compression works anywhere on a data page.

The amount of compression provided by page-level data compression is highly dependent on the data stored in a table or index. If a lot of the data repeats itself, then compression is more efficient. If the data is more random, then little benefits can be gained using page-level compression.

Data Compression Demo

Data compression can be performed using either SQL Server Management Studio (SSMS) or by using Transact-SQL. For this demo, we will take a look at how you can compress a table that uses a clustered index, using SSMS.

Let’s say that we want to compress the Sales.Customer table (which has a clustered index) in the AdventureWorks database. The first step is to right-click on the table in SSMS, select “Storage,” and then select “Manage Compression.”

image 
Figure 1: SSMS can be used to manage compression.

This brings up the Data Compression Wizard, displayed below.

image

Figure 2: The Data Compression Wizard, or Transact-SQL commands, can be used to manage data compression.

After clicking “Next,” the wizard displays the following screen, which allows you not only to select the compression type, but it also allows you to calculate how much space you will save once compression has been turned on.

image

Figure 3: Use this screen to select the compression type, and to calculate how much space will be saved.

First, let’s see how much space we will save if we use row-level compression on this table. To find out, click on the drop-down box below “Compression Type,” select “Row,” and then click “Calculate.”

image

Figure 4: For this table, row-level compression doesn’t offer much compression.

After clicking “Calculate,” the wizard runs and calculates how much space is currently being used, and how much space would be used after row-level compression. As we can see, very little space will be saved, about 1.6%.

Now, let’s see how much compression savings page-level compression offers us for this particular table. Again, I go to the drop-down menu under “Compression Type,” select “Page,” then press “Calculate.”

image 
Figure 5: Page-level compression is higher than row-level compression.

After pressing “Calculate,” we see that compression has improved greatly, now saving about 20% space. At this point, if you should decide to turn on page-level compression for this table, click on the “Next” button.

image

Figure 6: The wizard allows you several options in which to turn on compression.

At the above screen, you can choose to perform the compression now (not a good idea during busy production times because the initial compression process can be very CPU and disk I/O intensive), schedule it to run later, or just to script the Transact-SQL code so you can run it at your convenience.

Once you have compressed this table (a clustered index), keep in mind that any non-clustered indexes that this table may have are not automatically compressed for you. Remember, compression is based on a per object basis. If you want to compress the non-clustered indexes for this table, you will have to compress each one, one at a time.

While this wizard helps you to see how much compression either method offers, it does not suggest which compression method should be used, nor does it recommend whether compression should be used at all for this object. As the DBA, it will be your job to evaluate each compressible object to determine if it should have compression enabled or not. In other words, you must decide if the benefits of compression outweigh the negatives.

Backup Compression

For years, there have been third-party programs that allow you to compress and speed up SQL Server backups. In most regards, the backup compression included with the Enterprise Edition of SQL Server is very plain vanilla. In fact, if you already are using a third-party backup program, I would suggest you continue using it, because SQL Server 2008 backup compression offers fewer features. In fact, the only option SQL Server 2008 backup compression offers you is to turn it off or on. That’s it.

SQL Server 2008 backup compression, like the third-party add-ons, compresses backups, which not only saves backup space, but it can substantially reduce backup times. Unlike data compression, there is very little downside to using backup compression, other than the additional CPU resources required to perform the compression (or decompression during a restore). Assuming that you perform backups during slower times of the day, the additional CPU resources used will not be noticeable.

The time and space savings offered by backup compression depends on the data in your database. If you are heavily using data compression in your databases, or are using Transparent Data Encryption, then using backup compression probably won’t offer you many benefits, as already compressed data, or encrypted data, is not very compressible.

Let’s take a brief look at how you turn on SQL Server 2008 backup compression. While our example will use SSMS, you can use Transact-SQL to perform the same task. To backup AdventureWorks, right-click on the database in SSMS, select “Tasks,” and then select “Back Up,” and the backup dialog box appears.

image

Figure 7: As with any backup, you can use the backup dialog box to make your selections.

Once you have selected your backup options, next click on “Options,” and the following screen appears.

image 

Figure 8: Backup compression options are limited.

At the top of figure 8 are the standard backup options, while at the bottom of the screen you see the options for backup compression. Notice that you only have three choices.

The first option, “Use the default server settings” tells the backup to use the server’s default backup compression setting. In SQL Server 2008, there is a new sp_configure option called “backup compression default.” By default, it is set to have backup compression off. If you want, you can set this option so that backup compression is turned on by default. So if you choose the “Use the default server settings” option above, then whatever option is set for the “backup compression default” will be used for the backup.

The “Compress Backup” option turns backup compression on, and the “Do not compress backup” option turns it off. Both of these options override the “backup compress default” server setting, whatever it happens to be.

Once you have chosen your backup compression method, you proceed with the backup just like any other SQL Server backup.

If you need to restore a compressed backup, you don’t have to do anything special, it will uncompress itself automatically. Although you can only compress backups using the Enterprise Edition of SQL Server 2008, you can restore a compressed backup to any edition of SQL Server 2008. On the other hand, you cannot restore a compressed SQL Server 2008 backup to any previous version of SQL Server.

Summary

In this article, we have learned about the two forms of data compression, and about backup compression. While data compression might seem like a seductive new feature of SQL Server, I highly recommend that it is only used by experienced DBAs. While it offers lots of promise for increased performance, it can just as easily cause performance problems if misused. Backup compression, on the other hand, can be used by DBAs of all skill levels.

Categories: Blogs, SQL Server

Taking Advantage of SQL Server Tools

Aloha DBA - Brad M. McGehee - 5 hours 39 min ago

Reprinted from my editorial in Database Weekly.

An important question I think you should be asking yourself, when it comes to your professional development, is "Are You Taking Full Advantage of the SQL Server Tools Available to You?" I think it’s important enough that, when I make presentations at conferences or user groups, I often add this quote to one of my slides:

"One of the differences between an average DBA and an exceptional DBA is that the exceptional DBA thoroughly understands how to use the available tools to their fullest potential."

I started adding this quote after I discovered just how many DBAs don’t seem to be fully familiar with how SQL Server tools can help them perform their jobs better. How did I find this out? I asked. During many of my presentations, I ask for a show of hands of who feels they are experts at using these tools, and I am always surprised to see how few hands are raised. While clearly not perfect, SQL Server does come with a wide variety of tools which I think all DBAs need to take the time to master, even if the learning curve for mastering some of them is high. Specifically, I recommend DBAs become skilled with the following tools:

  • SQL Server Management Studio (Sure it seems obvious, but there’s a lot of hidden power included in SSMS that many DBAs aren’t familiar with)
  • SSMS Query Editor (especially for analyzing execution plans)
  • Profiler
  • Performance Monitor
  • SQL Server Configuration Manager
  • Database Engine Tuning Advisor
  • DMVs/DMFs
  • SQLCMD
  • Dedicated Administrator Connection (DAC)
  • SQL Server Business Intelligence Development Studio

Some of the above tools are easy to learn, while others are tough to master; equally, some of these tools will be used on a daily basis, while others might only be used occasionally. In any case, I challenge all DBAs to master these tools if they want to become as successful as they possibly can be.

Do you agree that DBAs should master all of the tools I’ve mentioned, or am I asking too much? And which tools have I left out?

Categories: Blogs, SQL Server

Cut the abstractions by putting test hooks

Ayende @ Rahien - 5 hours 58 min ago

I have been hearing about testable code for a long time now. It looks somewhat like this, although I had to cut on the number of interfaces along the way.

We go through a lot of contortions to be able to do something fairly simple, avoid hitting a data store in our tests.

This is actually inaccurate, we are putting in a lot of effort into being able to do that without changing production code. There are even a lot of explanations how testable code is decoupled, and easy to change, etc.

In my experience, one common problem is that we put in too much abstraction in our code. Sometimes it actually serve a purpose, but in many cases, it is just something that we do to enable testing. But we still pay the hit in the design complexity anyway.

We can throw all of that away, and keep only what we need to run production code, but that would mean that we would have harder time with the tests. But we can resolve the issue very easily by making my infrastructure aware of testing, such as this example:

image

But now your production code was changed by tests?!

Yes, it was, so? I never really got the problem people had with this, but at this day and age, putting in the hooks to make testing easier just make sense. Yes, you can go with the “let us add abstractions until we can do it”, but it is much cheaper and faster to go with this approach.

Moreover, notice that this is part of the infrastructure code, which I don’t consider as production code (you don’t touch it very often, although of course it has to be production ready), so I don’t have any issue with this.

Nitpicker corner: Let us skip the whole TypeMock discussion, shall we?

TDD fanatic corner: I don’t really care about testable code, I care about tested code. If I have a regression suite, that is just what i need.

Categories: Blogs

Notes from SQLSaturday #33

It Depends - Andy Warren - Mon, 03/08/2010 - 21:13

I did the relatively quick flight to Charlotte on Friday afternoon, arriving just before 2 pm, grabbed lunch to go from a bbq place in the food court and waited for the hotel shuttle. About a 20 minute wait, not much more than the time it would have taken to go through the rental car process. Off to the Hampton Inn, which was right off the interstate, and in an area with more hotels in one area than almost any place I’ve seen
except Orlando!

Took time to check email, then headed off to the nearby Hyatt where I was to meet the other speakers for the trip to the speaker party. I was really early and had time to chat with Denny Cherry about his recent SAN training (good stuff), and then finally we departed for the party about 5:15 along with his wife – turns out all the over speakers went direct or had not arrived yet, so there was plenty of room.

Next stop was the SQLSentry office and most all the speakers were there, so we spent at least an hour just chatting before going to the actual party. Great dinner, I was between Allen White and Geoff Hiten, and we had a lot of fun. Finally finished up about 10 pm and I rode back with Allen, we usually only see each other once a year so we enjoyed the chance to talk in person more than usual!

Saturday morning I was up early and took the hotel shuttle over to the Microsoft office (about 1/2 mile) and only a few people there when I arrived about 7:10 or so. They finally let us in and it was just a bit disorganized, sponsors not sure where to set up, but that worked itself out and by 7:40 all was flowing smoothly. Blythe Morrow was there so I sat with her for a while to talk about the logistics phase of the event.

The keynote was scheduled for 8:15, but with a steady stream of attendees still arriving we delayed the start, probably to about 8:40. As I mentioned last week the keynote was largely unrehearsed, starting off with an intro from Peter Shire (SQL Sentry/Charlotte Chapter Leader/SQLSaturday leader), then Steve Jones with a quick story about the formation of the SQLSaturday concept (he didn’t think it would work!), and then I ran through six or seven slides, talking about how and why we started and our early goals, some lessons learned, and where we hope it goes under the guidance of PASS. Rushabh Mehta finished up with just a few quick words, focusing on the message that the main goal for PASS right now was to keep things running and not to break anything. We finished up with a ceremonial turning over of the key from Steve & I to Rushabh, a styrofoam key about 3 feet long. Good theater I guess!

My presentation wasn’t until 10:15, so I wandered some, shared some more thoughts with Blythe, and had some more coffee. Finally found my room, signs were mildly confusing, track 5 was in room 4. One interesting part of the day was that because this is a large MS office, they had a lot of people on hand to open what would be otherwise locked doors and make sure attendees stayed in designated areas. Took a lot of people – I’d guess 10-15 between MS and their on site security – kudos to them for providing that level of support.

I did my presentation without any problems to a crowd of about 25, good questions, and from the brief survey I did all were having a good day. For most it was their first SQLSaturday and it seemed to be meeting or exceeding their expectations – always good!

Then back over to the cafeteria area, chatted with David Waugh from Confio, then spent some time with Kevin Kline, that morphed into lunch (sandwiches and chips), and then a couple hours with Blythe where others came and went (Patrick Leblanc, Jessica Moss, Steve Jones, etc), mostly talking about SQLSaturday.

I left a little earlier than planned, around 4 pm, riding to the airport with Steve and lucking out to get an earlier flight than the original 8 pm, leaving us just enough time for a quick dinner before the flights left.

Attendance was somewhere around 220 and it looked to me like all went well. Congrats to the SQLSaturday #33 team for a great event!

Categories: Blogs, SQL Server

Don’t Assume – Per Session Buffers

Ronald Bradford - MySQL Expert - Mon, 03/08/2010 - 19:24

MySQL has a number of global buffers, i.e. your SGA. There are also a number of per session/thread buffers that combined with other memory usage constitutes an unbounded PGA. One of the most common errors in mis-configured MySQL environments is the setting of the 4 primary per session buffers thinking they are global buffers.

Global buffers include:

    The four important per session buffers are:

    I have seen people see these values > 5M. The defaults range from 128K to 256K. My advice for any values above 256K is simple. What proof do you have this works better? When nothing is forthcoming, the first move is to revert to defaults or a maximum of 256K for some benchmarkable results. The primary reason for this is MySQL internally as quoted by Monty Taylor – for values > 256K, it uses mmap() instead of malloc() for memory allocation.

    These are not all the per session buffers you need to be aware of. Others include thread_stack, max_allowed_packet,binlog_cache_size and most importantly max_connections.

    MySQL also uses memory in other areas most noticeably in internal temporary tables and MEMORY based tables.

    As I mentioned, there is no bound for the total process memory allocation for MySQL, so some incorrectly configured variables can easily blow your memory usage.

    References About “Don’t Assume”

    “Don’t Assume” is a series of posts to help the Oracle DBA understand, use and appreciate the subtle differences and unique characteristics of the MySQL RDBMS in comparison to Oracle. These points as essential to operate MySQL effectively in a production environment and avoid any loss of data or availability.

    For more posts in this series be sure to follow the mysql4oracledba tag and also watch out for MySQL for Oracle DBA presentations.

    The MySQLCamp for the Oracle DBA is a series of educational talks all Oracle DBA resources should attend. Two presentations from this series IGNITION and LIFTOFF will be presented at the MySQL Users Conference 2010 in Santa Clara, April 2010 This series also includes JUMPSTART and VELOCITY. If you would like to here these presentations in your area, please contact me.

Categories: Blogs, MySQL

Free advice on your my.cnf

Ronald Bradford - MySQL Expert - Mon, 03/08/2010 - 18:36

Today, while on IRC in #pentaho I came across a discussion and a published my.cnf. In this configuration I found some grossly incorrect values for per session buffers (see below).

It doesn’t take a MySQL expert to spot the issues, however there is plenty of bad information available on the Internet and developers not knowing MySQL well can easily be mislead. This has spurred me to create a program to rid the world of bad MySQL configuration. While my task is potential infinite, it will enable me to give back and hopefully do a small amount of good. You never know, saving those CPU cycles may save energy and help the planet.

Stay tuned for more details of my program.

[mysqld]
...
sort_buffer_size = 6144K
myisam_sort_buffer_size = 1G
join_buffer_size = 1G
bulk_insert_buffer_size = 1G
read_buffer_size     = 6144K
read_rnd_buffer_size = 6144K
key_buffer_size		= 1024M
max_allowed_packet	= 32M
thread_stack		= 192K
thread_cache_size       = 256

query_cache_limit	= 512M
query_cache_size        = 512M
...
Categories: Blogs, MySQL

Drizzling from the Rackspace Cloud

Eric Day - Mon, 03/08/2010 - 17:57

Since I left Sun back in January, folks have been asking what was next. I’m happy to say that I’m going to continue hacking on open source projects like Drizzle and Gearman, but now at the Rackspace Cloud. Not only will I be there, but I get to continue working closely with a few of the amazing Drizzle hackers who have also joined, including Monty Taylor, Jay Pipes, Stewart Smith, and Lee Bieber.

Why Rackspace Cloud? Late last year I was considering what I wanted to do next with the Oracle acquisition looming near, and this was one of the options that presented itself. Rackspace had been a supporter of Drizzle from early on by offering virtual machines to develop and test on, and when talking to some folks more closely, something really hit home. Rackspace provides first-class service and “fanatical” support – they are not a software company. One might ask why an open source software developer would be interested in a company that doesn’t create software or vice-versa, and the answer is that Rackspace wants to find ways to offer the best possible service now and into the future. What better way than to help develop the next generation of service software and get a jump start into integrating this into their architecture? Both the open source community and Rackspace win.

Another thing I learned while talking with Rackspace is that one of their core principles is transparency. This applies to both customer and employees, and anyone within an open source community can appreciate this. The more I learned about the company and the folks within it, the more impressed I was at the lack of internal barriers or “need-to-know” information. One of Drizzle’s core goals is also transparency, from discussing design decisions on public mailing lists and IRC, to having the entire project management infrastructure hosted out in the open at Launchpad.

What does this mean for the Drizzle project? It means continued support for a number of core developers, more infrastructure for development, and most importantly in my eyes, more context. One of the Drizzle tag-lines is “A Lightweight SQL Database for Cloud and Web,” so what better place to develop a database designed for the cloud than on one of the fastest growing cloud platforms. We’ll get a detailed look at the demands, get feedback from cloud customers, and have the perfect test bed for offering new services. We’ll also be able to work closely with a top-notch group of DBAs, developers, and sysadmins in one of the most demanding service architectures out there. This invaluable context will help the Drizzle developers make more informed decisions moving forward, which also means better software for the community.

Personally, this also means getting back to my hosting roots. Before Sun, I worked at Concentric for almost 10 years in a clustered hosting environment. I’m very familiar with many of the multi-tenant scalability concerns Rackspace has, and I’m excited to be working in this type of environment again. We’ve already been working closely with the MySQL DBAs at Rackspace to learn what the biggest pain points are for a multi-tenant architecture, and we’ll be taking steps to address these as it will help anyone wanting to run Drizzle in a cloud-like environment. Drizzle’s modular architecture has already proved useful, as some of these concerns are easily answered with “oh, we have a plugin point for that.”

I’m excited, this is going to be a fun ride.

Categories: Blogs, MySQL

MySQL is crashing, what do I do?

Ronald Bradford - MySQL Expert - Mon, 03/08/2010 - 17:34

Let me start by saying the majority of environments never experience problems of MySQL crashing. I have seen production environments up for years. On my own server I have seen 575 days of MySQL uptime and the problem was hardware, not MySQL.

However it does occur, and the reasons may be obscure.

Confirming mysqld has crashed

To the unsuspecting, MySQL may indeed be crashing and you never know about it. The reason is because most MySQL installations have two running processes, these are mysqld and mysqld_safe.

ps -ef | grep mysqld
root     28822     1  0 Feb22 ?        00:00:00 /bin/sh bin/mysqld_safe
mysql    28910 28822  0 Feb22 ?        00:30:08 /opt/mysql51/bin/mysqld --basedir=/opt/mysql51 --datadir=/opt/mysql51/data --user=mysql --log-error=/opt/mysql51/log/error.log --pid-file=/opt/mysql51/data/dc1.onegreendog.com.pid

One of the functions of mysqld_safe is to restart mysqld if it fails. Unless you review your mysql error log and for low volume systems you will never know. Hint Have you checked your MySQL error log today?

You can determine quickly via SQL your instance uptime.

mysql> SHOW GLOBAL STATUS LIKE '%uptime%';
+---------------+---------+
| Variable_name | Value   |
+---------------+---------+
| Uptime        | 1033722 |
+---------------+---------+
1 row in set (0.00 sec)

This is the number of seconds since start time. While not easily readable for humans, this is more user friendly display. (NOTE: Works for 5.1+ only)

mysql> SELECT FROM_UNIXTIME(UNIX_TIMESTAMP() - variable_value) AS server_start
FROM INFORMATION_SCHEMA.GLOBAL_STATUS
WHERE variable_name='Uptime';
+---------------------+
| server_start        |
+---------------------+
| 2010-02-22 15:22:13 |
+---------------------+
1 row in set (0.07 sec)
Debugging a mysqld core file

When correctly configured, mysqld will generate a core file (See How to crash mysqld intentionally for background information on required settings).

Your first check is to determine if the mysqld binary used has debugging information and symbols stripped. You need this information not stripped for identifying symbol names.

$ file bin/mysqld
bin/mysqld: ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV), for GNU/Linux 2.4.0,
dynamically linked (uses shared libs), for GNU/Linux 2.4.0, not stripped

You can use gdb and with a backtrace command (bt) you can see a stack trace of calls. This won’t help the average DBA without C or MySQL internal knowledge greatly, however it’s essential information to get to the bottom of the problem.

In the following example I’m going to use Bug #38508 to intentionally crash my test instance.

mysql> drop table if exists t1,t2;
mysql> create table t1(a bigint);
mysql> create table t2(b tinyint);
mysql> insert into t2 values (null);
mysql> prepare stmt from "select 1 from t1 join  t2 on a xor b where b > 1  and a =1";
mysql> execute stmt;
mysql> execute stmt;
ERROR 2013 (HY000): Lost connection to MySQL server during query

Lost connection is the first sign of a problem. We check the error log to confirm.

$ tail data/`hostname`.err
100306 14:51:49 - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

key_buffer_size=8384512
read_buffer_size=131072
max_used_connections=1
max_threads=151
threads_connected=1
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 338301 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

thd: 0x521f160
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x401b6100 thread_stack 0x40000
/home/rbradfor/mysql/mysql-5.1.38-linux-x86_64-glibc23/bin/mysqld(my_print_stacktrace+0x2e)[0x8abfbe]
/home/rbradfor/mysql/mysql-5.1.38-linux-x86_64-glibc23/bin/mysqld(handle_segfault+0x322)[0x5df252]
/lib64/libpthread.so.0[0x35fb00de80]
/home/rbradfor/mysql/mysql-5.1.38-linux-x86_64-glibc23/bin/mysqld(_ZN9Item_cond10fix_fieldsEP3THDPP4Item+0x7f)[0x5654ff]
/home/rbradfor/mysql/mysql-5.1.38-linux-x86_64-glibc23/bin/mysqld(_ZN9Item_cond10fix_fieldsEP3THDPP4Item+0xb8)[0x565538]
/home/rbradfor/mysql/mysql-5.1.38-linux-x86_64-glibc23/bin/mysqld(_Z11setup_condsP3THDP10TABLE_LISTS2_PP4Item+0xf6)[0x621f96]
/home/rbradfor/mysql/mysql-5.1.38-linux-x86_64-glibc23/bin/mysqld(_ZN4JOIN7prepareEPPP4ItemP10TABLE_LISTjS1_jP8st_orderS7_S1_S7_P13st_select_lexP18st_select_lex_unit+0x2db)[0x645f3b]
/home/rbradfor/mysql/mysql-5.1.38-linux-x86_64-glibc23/bin/mysqld(_Z12mysql_selectP3THDPPP4ItemP10TABLE_LISTjR4ListIS1_ES2_jP8st_orderSB_S2_SB_yP13select_resultP18st_select_lex_unitP13st_select_lex+0x7a4)[0x654d24]
/home/rbradfor/mysql/mysql-5.1.38-linux-x86_64-glibc23/bin/mysqld(_Z13handle_selectP3THDP6st_lexP13select_resultm+0x16c)[0x659f9c]
/home/rbradfor/mysql/mysql-5.1.38-linux-x86_64-glibc23/bin/mysqld[0x5ec92a]
/home/rbradfor/mysql/mysql-5.1.38-linux-x86_64-glibc23/bin/mysqld(_Z21mysql_execute_commandP3THD+0x602)[0x5efb22]
/home/rbradfor/mysql/mysql-5.1.38-linux-x86_64-glibc23/bin/mysqld(_ZN18Prepared_statement7executeEP6Stringb+0x3bd)[0x66587d]
/home/rbradfor/mysql/mysql-5.1.38-linux-x86_64-glibc23/bin/mysqld(_ZN18Prepared_statement12execute_loopEP6StringbPhS2_+0x7c)[0x66874c]
/home/rbradfor/mysql/mysql-5.1.38-linux-x86_64-glibc23/bin/mysqld(_Z22mysql_sql_stmt_executeP3THD+0xa7)[0x668c27]
/home/rbradfor/mysql/mysql-5.1.38-linux-x86_64-glibc23/bin/mysqld(_Z21mysql_execute_commandP3THD+0x1123)[0x5f0643]
/home/rbradfor/mysql/mysql-5.1.38-linux-x86_64-glibc23/bin/mysqld(_Z11mysql_parseP3THDPKcjPS2_+0x357)[0x5f5047]
/home/rbradfor/mysql/mysql-5.1.38-linux-x86_64-glibc23/bin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcj+0xe93)[0x5f5ee3]
/home/rbradfor/mysql/mysql-5.1.38-linux-x86_64-glibc23/bin/mysqld(_Z10do_commandP3THD+0xe6)[0x5f67a6]
/home/rbradfor/mysql/mysql-5.1.38-linux-x86_64-glibc23/bin/mysqld(handle_one_connection+0x246)[0x5e9146]
/lib64/libpthread.so.0[0x35fb006307]
/lib64/libc.so.6(clone+0x6d)[0x35fa4d1ded]
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort...
thd->query at 0x5249320 = select 1 from t1 join  t2 on a xor b where b > 1  and a =1
thd->thread_id=1
thd->killed=NOT_KILLED
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
Writing a core file
100306 14:51:49 mysqld_safe Number of processes running now: 0
100306 14:51:49 mysqld_safe mysqld restarted
100306 14:51:49 [Note] Plugin 'FEDERATED' is disabled.
100306 14:51:50  InnoDB: Started; log sequence number 0 44233
100306 14:51:50 [Note] Event Scheduler: Loaded 0 events
100306 14:51:50 [Note] /home/rbradfor/mysql/mysql-5.1.38-linux-x86_64-glibc23/bin/mysqld: ready for connections.
Version: '5.1.38'  socket: '/tmp/mysql.sock.3999'  port: 3999  MySQL Community Server (GPL)

Confirming we got the “Writing a core file” line, we can find and use this.

$ find . -name "core*"
./data/core.23290
$ gdb bin/mysqld data/core.23290
GNU gdb Red Hat Linux (6.5-37.el5_2.2rh)
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu"...Using host libthread_db library "/lib64/libthread_db.so.1".

Reading symbols from /lib64/libpthread.so.0...done.
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /lib64/libdl.so.2...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/libcrypt.so.1...done.
Loaded symbols for /lib64/libcrypt.so.1
Reading symbols from /lib64/libnsl.so.1...done.
Loaded symbols for /lib64/libnsl.so.1
Reading symbols from /lib64/libm.so.6...done.
Loaded symbols for /lib64/libm.so.6
Reading symbols from /lib64/libc.so.6...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /lib64/libgcc_s.so.1...done.
Loaded symbols for /lib64/libgcc_s.so.1
Core was generated by `/home/rbradfor/mysql/mysql-5.1.38-linux-x86_64-glibc23/bin/mysqld --defaults-fi'.
Program terminated with signal 11, Segmentation fault.
#0  0x00000035fb00b142 in pthread_kill () from /lib64/libpthread.so.0
(gdb) bt
#0  0x00000035fb00b142 in pthread_kill () from /lib64/libpthread.so.0
#1  0x00000000005df285 in handle_segfault (sig=11) at mysqld.cc:2552
#2  
#3  0x00000000005654ff in Item_cond::fix_fields (this=0x5249dd0, thd=0x521f160, ref=) at item_cmpfunc.cc:3900
#4  0x0000000000565538 in Item_cond::fix_fields (this=0x52435b8, thd=0x521f160, ref=) at item_cmpfunc.cc:3912
#5  0x0000000000621f96 in setup_conds (thd=0x521f160, tables=, leaves=0x52494d0, conds=0x5244e38) at sql_base.cc:7988
#6  0x0000000000645f3b in JOIN::prepare (this=0x5243770, rref_pointer_array=0x5248a90, tables_init=, wild_num=, conds_init=,
    og_num=, order_init=0x0, group_init=0x0, having_init=0x0, proc_param_init=0x0, select_lex_arg=0x52488c0, unit_arg=0x5248498) at sql_select.cc:412
#7  0x0000000000654d24 in mysql_select (thd=0x521f160, rref_pointer_array=0x5c3fd0, tables=0x4, wild_num=0, fields=@0x52489c8, conds=0x52435b8, og_num=0, order=0x0, group=0x0, having=0x0,
    proc_param=0x0, select_options=0, result=0x52688a0, unit=0x5248498, select_lex=0x52488c0) at sql_select.cc:2377
#8  0x0000000000659f9c in handle_select (thd=0x521f160, lex=0x52483f8, result=0x52688a0, setup_tables_done_option=0) at sql_select.cc:268
#9  0x00000000005ec92a in execute_sqlcom_select (thd=0x521f160, all_tables=0x52494d0) at sql_parse.cc:5011
#10 0x00000000005efb22 in mysql_execute_command (thd=0x521f160) at sql_parse.cc:2206
#11 0x000000000066587d in Prepared_statement::execute (this=0x5245d60, expanded_query=, open_cursor=false) at sql_prepare.cc:3579
#12 0x000000000066874c in Prepared_statement::execute_loop (this=0x5245d60, expanded_query=0x401b43c0, open_cursor=false, packet=, packet_end=)
    at sql_prepare.cc:3253
#13 0x0000000000668c27 in mysql_sql_stmt_execute (thd=) at sql_prepare.cc:2524
#14 0x00000000005f0643 in mysql_execute_command (thd=0x521f160) at sql_parse.cc:2215
#15 0x00000000005f5047 in mysql_parse (thd=0x521f160, inBuf=0x5243520 "execute stmt", length=12, found_semicolon=0x401b6060) at sql_parse.cc:5931
#16 0x00000000005f5ee3 in dispatch_command (command=COM_QUERY, thd=0x521f160, packet=0x525fde1 "execute stmt", packet_length=) at sql_parse.cc:1213
#17 0x00000000005f67a6 in do_command (thd=0x521f160) at sql_parse.cc:854
#18 0x00000000005e9146 in handle_one_connection (arg=dwarf2_read_address: Corrupted DWARF expression.
) at sql_connect.cc:1127
#19 0x00000035fb006307 in start_thread () from /lib64/libpthread.so.0
#20 0x00000035fa4d1ded in clone () from /lib64/libc.so.6
(gdb) quit

You can use gdb to obtain additional information based on the type of information available.

Now what?

Is the problem a bug? Is it data corruption? Is it hardware related?

Gathering the information is the first step in informing you of more detail that will enable you to search, discuss and seek professional advice to address your problem.

References
Categories: Blogs, MySQL

March 1-7 is National Procrastination Week

From An Expert's Guide to Oracle Technology March 1-7 is National Procrastination Week (NPW). Happy NPW! I meant to write about it last week. But I was celebrating it instead. ;-) LewisC
Categories: Blogs, Oracle

It’s Like the MVP Summit

SQL Musings - Steve Jones - Mon, 03/08/2010 - 14:35
I missed the 2010 MVP Summit, which is disappointing. There were a lot of SQL 11 talks and I was looking forward to those. I was also hoping to get together with a lot of MVP friends that I rarely see.
However this weekend in Charlotte, for SQL Saturday #33, it appears that it’s like a mini-summit, and thanks to Peter Shire and SQL Sentry for assembling an amazing lineup. I am sure I am forgetting someone, but look at this lineup:
  • Andy Warren
  • Andy Leonard
  • Jessica Moss
  • Kevin Kline
  • Rushabh Mehta
  • Wayne Snyder
  • Allen White (the roommate I missed at the Summit)
  • Andrew Kelley
  • Aaron Bertrand
  • Tim Ford
  • Denny Cherry
  • Geoff Hiten
  • Ashton Hobbs
  • Alejandro Mesa
  • Mike Walsh
  • Patrick LeBlanc
  • Kendal van Dyke
  • and more
Not all MVPs, but a few probably deserve it.
I think I’m the weakest link of this bunch. This was one of the best lineups I’ve seen at an event, and incredible for a free event. SQL Pros in Charlotte, you should consider yourselves lucky to have the chance to attend this amazing day of training for free.
Categories: Blogs, SQL Server

Lessons learned from building the NHibernate Profiler

Ayende @ Rahien - Mon, 03/08/2010 - 11:00

Last week I gave a talk about the things I learned from building NH Prof.

Skills Matter had recorded the session and made it available.

Looking forward for your comments, but I should disclaimer that this was after a full day of teaching and on 50 min of sleep in the last 48 hours

Categories: Blogs

Slaying relational hydras (or dating them)

Ayende @ Rahien - Mon, 03/08/2010 - 09:46

Sometimes client confidentiality can be really annoying, because the problem sets & suggested solutions that come up are really interesting. That said, since I am interesting in having future clients, it is pretty much a must have. As such, the current post represent a real world customer problem, but probably in a totally different content. In fact, I would be surprised if the customer was able to recognize the problem as his.

That said, the problem is actually quite simple. Consider a dating site, where you can input your details and what you seek, and the site will match you with the appropriate person. I am going to ignore a lot of things here, so if you actually have built a dating site, try not to cringe.

At the most basic level, we have two screens, the My Details screen, where the users can specifies their stats and their preferences:

image

And the results screen, which shows the user the candidate matching their preferences.

There is just one interesting tidbit, the list of qualities is pretty big (hundreds or thousands of potential qualities).

Can you design a relational model that would be a good fit for this? And allow efficient searching?

I gave it some thought, and I can’t think of one, but maybe you can.

I’ll follow up on this post in a day or two, showing how to implement the problem using Raven.

Categories: Blogs

Advanced reporting options for MySQL

Ronald Bradford - MySQL Expert - Mon, 03/08/2010 - 05:31

I’m seeking help from the MySQL community for what tools are used today to generate complex reports for enterprise applications that use MySQL. In an Oracle world, you have Oracle Report Writer, in Microsoft Crystal Reports.

In the open source world there is Jasper Reports, Pentaho Reports and BIRT however I don’t know the power of complex reporting with these.

If anybody has experience using or evaluating these tools please let me know. This may lead to possible work.

Categories: Blogs, MySQL

Geek City: More About Nonclustered Index Keys

Kalen Delaney - Mon, 03/08/2010 - 01:04
I thought I had said almost all that could be said about nonclustered index keys in a post made almost exactly two years ago , on March 16, 2008. But there's more. To get all the benefit from today's post, you'll really have to read that one, but I'll...(read more)
Categories: Blogs, SQL Server

How's your Batch workload doing?

Yup, batch. The stuff that has been around on the mainframe a lot longer than I have. It's the process many of us love to hate and it seems to fit the old adage "can't live with it, can't live without it". So, here's my question.
Categories: Blogs, DB2

What's new this week from Optim 03/05/2010 (a guest blog post)

Well, it's Friday and continuing on with my latest tradition, it's once again time to publish Kathy's guest blog post covering the latest and greatest Optim news... Enjoy!!!!
Categories: Blogs, DB2

Rhino Divan DB – Performance

Ayende @ Rahien - Sun, 03/07/2010 - 11:00

The usual caveat applies, I am running this in process, using small data size and small number of documents.

This isn’t so much as real benchmarks, but they are good enough to give me a rough indication about where i am heading, and whatever or not i am going in completely the wrong direction.

Those two should be obvious:

image image

This one is more interesting, RDB doesn’t do immediate indexing, I chose to accept higher CUD throughput and make indexing a background process. That means that the index may be inconsistent for a short while, but it greatly reduce the amount of work required to insert/update a document.

But, what is that short while in which the document and the DB may be inconsistent. The average time seems to be around 25ms in my tests, with some spikes toward 100 ms in some of the starting results. In general, it looks like things are improving the longer the database run. Trying it out over a 5,000 document size give me an average update duration of 27ms, but note that I am testing the absolute worst usage pattern, lot of small documents inserted one at a time with index requests coming in as well.

image

Be that as it may, having inconsistency period measured in a few milliseconds seems acceptable to me. Especially since RDB is nice enough to actually tell me if there are any inconsistencies in the results, so I can chose whatever to accept them or retry the request.

Categories: Blogs

Migrating MySQL latin1 to utf8 – The process

Ronald Bradford - MySQL Expert - Sat, 03/06/2010 - 20:24

Having covered the preparation and character set options of performing a latin1 to utf8 MySQL migration, just how do you perform the migration correctly.

Example Case

Just to recap, we have the following example table and data.

mysql> select c,length(c),char_length(c),charset(c), hex(c) from conv.test_latin1;
+---------------+-----------+----------------+------------+----------------------------+
| c             | length(c) | char_length(c) | charset(c) | hex(c)                     |
+---------------+-----------+----------------+------------+----------------------------+
| a             |         1 |              1 | latin1     | 61                         |
| abc           |         3 |              3 | latin1     | 616263                     |
| â˜ș             |         3 |              3 | latin1     | E298BA                     |
| abc â˜șâ˜č☻       |        13 |             13 | latin1     | 61626320E298BAE298B9E298BB |
+---------------+-----------+----------------+------------+----------------------------+
Migration approach 1

The easiest way is to use mysqldump to dump the schema and data, to change the schema definitions and then a client reload process, all with the correct client character sets. Working on our sample table test_latin1 which I have in a conv schema you can do.

$ mysqldump -uroot -p --default-character-set=latin1 --skip-set-charset conv > conv.sql
$ sed -i -e "s/latin1/utf8/g" conv.sql
$ mysql -uroot -p --default-character-set=utf8 conv < conv.sql

If you attempt this which technically works (as I've seen a similar example on the Internet) your bound to screw up something. While the mysqldump and mysql load use the correct client command line options, performing a blind conversion of all references of latin1 to utf8 will not only change your table definition, in my example it will change the name of table, and if would change any values of data that contained the word 'latin1'. For example.

mysql> select c,length(c),char_length(c),charset(c), hex(c) from conv.test_utf8;
+---------+-----------+----------------+------------+----------------------------+
| c       | length(c) | char_length(c) | charset(c) | hex(c)                     |
+---------+-----------+----------------+------------+----------------------------+
| a       |         1 |              1 | utf8       | 61                         |
| abc     |         3 |              3 | utf8       | 616263                     |
| ?       |         3 |              1 | utf8       | E298BA                     |
| abc ??? |        13 |              7 | utf8       | 61626320E298BAE298B9E298BB |
+---------+-----------+----------------+------------+----------------------------+
4 rows in set (0.00 sec)

mysql> set names utf8;
Query OK, 0 rows affected (0.00 sec)

mysql> select c,length(c),char_length(c),charset(c), hex(c) from conv.test_utf8;
+---------------+-----------+----------------+------------+----------------------------+
| c             | length(c) | char_length(c) | charset(c) | hex(c)                     |
+---------------+-----------+----------------+------------+----------------------------+
| a             |         1 |              1 | utf8       | 61                         |
| abc           |         3 |              3 | utf8       | 616263                     |
| â˜ș             |         3 |              1 | utf8       | E298BA                     |
| abc â˜șâ˜č☻       |        13 |              7 | utf8       | 61626320E298BAE298B9E298BB |
+---------------+-----------+----------------+------------+----------------------------+
4 rows in set (0.01 sec)

Be sure to realize now you need to connect with --default-character-set=utf8 or in our example, we use set names as a stop gap measure.

Migration Approach 2

A good practice with using mysqldump regardless of migration is to always dump the schema with the --no-data option, and then dump the data separately with the --no-create-info option. In this case, you can then manually edit the schema file, carefully changing latin1 where appropriate. This is your production data, so some manual hand verification is a good thing. The additional benefit is you can create your schema and verify the syntax is correct before loading any data. While mysqldump creates a single file, due to this dump and reload, you can split your data file and perform some level of parallelism for data loading depending on your hardware capabilities.

$ mysqldump -uroot -p --default-character-set=latin1 --skip-set-charset --no-data conv > conv.schema.sql
$ mysqldump -uroot -p --default-character-set=latin1 --skip-set-charset --no-create-info conv > conv.data.sql
# For simplicity in this example I'm using a more strict global replacement. NOTE: this syntax depends on the version of mysql
$ sed -i -e "s/DEFAULT CHARSET=latin1/DEFAULT CHARSET=utf8/g" conv.schema.sql
$ mysql -uroot -p --default-character-set=utf8 conv < conv.schema.sql
$ mysql -uroot -p --default-character-set=utf8 conv < conv.data.sql
$ mysql -uroot -p --default-character-set=utf8 conv
mysql> select c,length(c),char_length(c),charset(c), hex(c) from conv.test_latin1;
+---------------+-----------+----------------+------------+----------------------------+
| c             | length(c) | char_length(c) | charset(c) | hex(c)                     |
+---------------+-----------+----------------+------------+----------------------------+
| a             |         1 |              1 | utf8       | 61                         |
| abc           |         3 |              3 | utf8       | 616263                     |
| â˜ș             |         3 |              1 | utf8       | E298BA                     |
| abc â˜șâ˜č☻       |        13 |              7 | utf8       | 61626320E298BAE298B9E298BB |
+---------------+-----------+----------------+------------+----------------------------+
4 rows in set (0.00 sec)
Other software interactions

While I only discuss the MySQL data and mysql client usage here, you should ensure that your programming language connecting to MySQL uses the correct character set management, e.g. PHP uses the default_charset settings in php.ini and even your webserver may benefit from the correct encoding, e.g. Apache httpd uses AddDefaultCharset httpd.conf.

Migration Approach Problems

In our example our migration is successfully, however like life, your production data is unlikely to be perfect. In my next post I will talk about identifying and handling exceptions, including data that was wrongly encoded originally into latin1, or translation such as html entities from rich editor input fields for example.

Categories: Blogs, MySQL

Don’t Assume – Data Integrity

Ronald Bradford - MySQL Expert - Sat, 03/06/2010 - 20:11

MySQL has the same level of data integrity for numbers and strings as Oracle; when MySQL is correctly configured. By default (a reason I wish I knew why it is still the default), MySQL performs silent conversions on boundary conditions of data that will result in your data not always being what is specified. Let’s look at the following examples to demonstrate default behavior.

For numbers
mysql> DROP TABLE IF EXISTS example;
mysql> CREATE TABLE example(i1  TINYINT, i2 TINYINT UNSIGNED, c1 VARCHAR(5));
mysql> INSERT INTO example (i1) VALUES (1), (-1), (100), (500);
Query OK, 4 rows affected, 1 warning (0.08 sec)
mysql> SELECT * FROM example;
+------+------+------+
| i1   | i2   | c1   |
+------+------+------+
|    1 | NULL | NULL |
|   -1 | NULL | NULL |
|  100 | NULL | NULL |
|  127 | NULL | NULL |
+------+------+------+
4 rows in set (0.00 sec)

As you can see for one value we inserted 500, yet the value of 127 is stored? For this example I have used the TINYINT numeric data type to demonstrate truncation. TINYINT is a 1 byte integer that stores values from -128 to +127. Unlike Oracle, MySQL has 9 different data types for numeric columns, and using these wisely can improve your database disk footprint, for example BIGINT v INT. Is there a big deal?.

MySQL also has a nice feature for numeric data types, the UNSIGNED attribute that ensures only a positive integer or 0 value. Let’s see what happens with this column.

Unsigned
mysql> TRUNCATE TABLE example;
mysql> INSERT INTO example (i2) VALUES (1), (-1), (100), (500);
Query OK, 4 rows affected, 2 warnings (0.00 sec)
mysql> SELECT * FROM example;
+------+------+------+
| i1   | i2   | c1   |
+------+------+------+
| NULL |    1 | NULL |
| NULL |    0 | NULL |
| NULL |  100 | NULL |
| NULL |  255 | NULL |
+------+------+------+
4 rows in set (0.00 sec)

Now you see that -1 and 500 are now not the expected values, and before while 500 was silently truncated to 127, now it’s truncated to 255.

For Strings

As you can now assume, the following also occurs for strings.

mysql> TRUNCATE TABLE example;
mysql> INSERT INTO example (c1) VALUES (NULL),('a'),('abcde'),('xyz12345');
Query OK, 4 rows affected, 1 warning (0.00 sec)
mysql> SELECT * FROM example;
+------+------+-------+
| i1   | i2   | c1    |
+------+------+-------+
| NULL | NULL | NULL  |
| NULL | NULL | a     |
| NULL | NULL | abcde |
| NULL | NULL | xyz12 |
+------+------+-------+
4 rows in set (0.00 sec)
Show warnings

As you can see here, the mysql client shows that warnings occurred, but if you don’t review the warning you would never know, a situation that is rarely reviewed with development in richer programming languages. Let us look at these actual warnings more closely.

mysql> INSERT INTO example (i1) VALUES (1), (-1), (100), (500);
Query OK, 4 rows affected, 1 warning (0.08 sec)
mysql> SHOW WARNINGS;
+---------+------+---------------------------------------------+
| Level   | Code | Message                                     |
+---------+------+---------------------------------------------+
| Warning | 1264 | Out of range value for column 'i1' at row 4 |
+---------+------+---------------------------------------------+

mysql> INSERT INTO example (i2) VALUES (1), (-1), (100), (500);
Query OK, 4 rows affected, 2 warnings (0.00 sec)
mysql> SHOW WARNINGS;
+---------+------+---------------------------------------------+
| Level   | Code | Message                                     |
+---------+------+---------------------------------------------+
| Warning | 1264 | Out of range value for column 'i2' at row 2 |
| Warning | 1264 | Out of range value for column 'i2' at row 4 |
+---------+------+---------------------------------------------+
2 rows in set (0.00 sec)
mysql> INSERT INTO example (c1) VALUES (NULL),('a'),('abcde'),('xyz12345');
Query OK, 4 rows affected, 1 warning (0.00 sec)
mysql> SHOW WARNINGS;
+---------+------+-----------------------------------------+
| Level   | Code | Message                                 |
+---------+------+-----------------------------------------+
| Warning | 1265 | Data truncated for column 'c1' at row 4 |
+---------+------+-----------------------------------------+
1 row in set (0.00 sec)
Using sql_mode

The solution is the sql_mode configuration variable and at minimum the value of STRICT_ALL_TABLES defined. We can demonstrate the expected behavior with the following syntax.

mysql> set SESSION sql_mode=STRICT_ALL_TABLES;
Query OK, 0 rows affected (0.00 sec)

mysql> TRUNCATE TABLE example;
Query OK, 0 rows affected (0.00 sec)

mysql> INSERT INTO example (i1) VALUES (1), (-1), (100), (500);
ERROR 1264 (22003): Out of range value for column 'i1' at row 4
mysql> SELECT * FROM example;
+------+------+------+
| i1   | i2   | c1   |
+------+------+------+
|    1 | NULL | NULL |
|   -1 | NULL | NULL |
|  100 | NULL | NULL |
+------+------+------+
3 rows in set (0.00 sec)

As you can see, even with an error for a single INSERT statement, some data was actually stored. You should read Don’t Assume – Transactions for some insights here.

When it comes to dates, there is greater complexity and this is grounds for another entry of this series.

References About “Don’t Assume”

“Don’t Assume” is a series of posts to help the Oracle DBA understand, use and appreciate the subtle differences and unique characteristics of the MySQL RDBMS in comparison to Oracle. These points as essential to operate MySQL effectively in a production environment and avoid any loss of data or availability.

For more posts in this series be sure to follow the mysql4oracledba tag and also watch out for MySQL for Oracle DBA presentations.

The MySQLCamp for the Oracle DBA is a series of educational talks all Oracle DBA resources should attend. Two presentations from this series IGNITION and LIFTOFF will be presented at the MySQL Users Conference 2010 in Santa Clara, April 2010 This series also includes JUMPSTART and VELOCITY. If you would like to here these presentations in your area, please contact me.

Categories: Blogs, MySQL