Open Source Bridge Database Sessions
Open Source Bridge, the “conference for open source citizens,” is right around the corner! The sessions were just announced and it’s going to be packed with quite a variety of really interesting talks. From open cloud computing topics to hardware hacking to language hacks (like HipHop from Facebook), I’m really looking forward to being there (I’m helping organize the event, but hopefully I’ll have time to attend sessions as well).
I wanted to point out a few of the great database talks:
- State of MariaDB – Monty Widenius
- Cassandra: Strategies for Distributed Data Storage – Kelvin Kakugawa
- Developing Replication Plugins for Drizzle – Padraig O’Sullivan
- Drizzle, Scaling MySQL for the Future – Brian Aker
- SELECT * FROM Internet Using YQL – Jonathan LeBlanc
- Introduction to MongoDB – Michael Dirolf
- Introduction to PostgreSQL – Christophe Pettus, Josh Berkus
- Relational vs. Non-Relational – Josh Berkus
- Stacks of Cache – Duncan Beevers
- CouchApp Evently Guided Hack w/ CouchDB – J Chris Anderson
Beyond the DB talks, I’m also excited for a few other talks around high performance and high availability, from Facebook operations to Rasmus Lerdorf’s talk on making your PHP applications faster. I’ll also take the opportunity to shamelessly plug my own talk on writing high performance multi-core applications. There are also rumors of donut trucks, tesla coils, and scavenger hunts.
You should register to attend today; it’s going to be awesome.
Threads with Events
Last week I was surprised to see this paper bubble back up on Planet MySQL. It describes the pros and cons of thread-based and event-based programming for high-concurrency applications (like a web server), arguing that thread-based programming is superior if you use an appropriate lightweight threading implementation. I don’t entirely disagree with this, but the problem is that no such library exists that is standard, portable, and useful for all types of applications. In the portable Linux/Unix/BSD world we have POSIX threads, so we need to work with those. Other experimental libraries based on lightweight threads or “fibers” are really interesting because they can maintain your stack without all the normal overhead, but it is hard to get the scheduling right for all application types. I would even argue that thread-based and event-based programming are not all that different; it’s just a matter of how state is maintained (on the stack vs. in state variables) and how scheduling is performed.
The comparisons in that paper also pit a C-based web server using a co-routine threading library against a Java-based server that depends on the poll() system call. I’m sorry, but this is comparing apples to oranges. First, you’re in the Java VM with a number of runtime components (like garbage collection) that may be getting in the way. Also, the standard poll() system call is not an efficient event-handling mechanism: the whole descriptor list is passed to and scanned by the kernel on every call, so its cost grows with the number of connections. It’s much better to use epoll or some other kernel-based notification mechanism.
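To make the poll()/epoll distinction concrete: with epoll the interest list lives in the kernel, so a descriptor is registered once with epoll_ctl() and epoll_wait() returns only the descriptors that are ready. Below is a minimal, Linux-only sketch (epoll_pipe_demo() is a hypothetical name used just for illustration) that registers the read end of a pipe, makes it readable, and reports how many descriptors epoll_wait() says are ready.

```cpp
#include <cassert>
#include <sys/epoll.h>
#include <unistd.h>

// Linux-only illustration: register one descriptor with epoll, make it
// readable, and return how many ready descriptors epoll_wait() reports.
int epoll_pipe_demo()
{
  int fds[2];
  if (pipe(fds) != 0)
    return -1;

  // Created once; the kernel keeps the interest list between waits,
  // unlike poll(), which receives the full descriptor array every call.
  int epfd = epoll_create1(0);
  if (epfd < 0)
    return -1;

  epoll_event ev{};
  ev.events = EPOLLIN;   // wake up when the read end has data
  ev.data.fd = fds[0];
  if (epoll_ctl(epfd, EPOLL_CTL_ADD, fds[0], &ev) != 0)
    return -1;

  // Write one byte so the read end becomes readable.
  char byte = 'x';
  ssize_t written = write(fds[1], &byte, 1);
  assert(written == 1);

  epoll_event ready[8];
  int n = epoll_wait(epfd, ready, 8, 1000 /* timeout in ms */);

  close(epfd);
  close(fds[0]);
  close(fds[1]);
  return n;
}
```

A real server would register thousands of sockets this way and loop on epoll_wait(), handling only the returned events.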
One high-concurrency userland threading implementation I do like is in Erlang. Erlang processes are extremely lightweight, and I’ve written apps that depend heavily on them. One interesting application I saw cached objects by giving each object its own Erlang process. This put a whole new spin on cache management, and it looked like it could actually scale reasonably well. The “problem” with Erlang, which may or may not be a problem depending on your requirements, is that there is still some overhead from running byte-code in a VM, and it is a functional language. I love functional programming, but I’ve found it still ties most developers’ heads in knots if they don’t have a reason to use it regularly. For open source projects trying to build a contributor community, it can act as one more hurdle.
So, what is the “best” paradigm?
Back in 2000 some colleagues and I wrote a hybrid thread-event library that would create one event-handler instance per thread and spread connections across the pool of event-handling threads. I believe this gave the best of both worlds, and I saw high throughput with fairly minimal overhead. I wrote a number of servers based on this architecture, including HTTP, IMAP, POP3, and DNS, and with each server type this model proved to be efficient and scalable. Ultimately the best architecture depends on your application. If you never intend to have many connections and your application has long-running computations, one-thread-per-connection would probably be best. If you need to handle large numbers of connections and have short, non-blocking request processing, event-based scales extremely well. You can of course create a hybrid of these two, with all connections managed by event threads and asynchronous queues handing heavy requests off to dedicated processing threads (this is roughly what I did in the C Gearman Job Server).
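A rough sketch of the hybrid model described above: a fixed pool of event threads, each owning its own connections, with new connections handed out round-robin. This is not the code of the 2000 library or Scale Stack; EventThread and EventThreadPool are hypothetical names, and each thread's event loop is reduced to a condition-variable queue of integer connection ids so the example stays portable.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

// Stand-in for one event-handling thread. A real one would run an event loop
// (libevent, epoll, ...) over the sockets it owns.
class EventThread
{
public:
  EventThread() : worker(&EventThread::run, this) {}
  ~EventThread() { stop(); }

  // Called by the accepting thread to hand a new connection to this loop.
  void addConnection(int connection_id)
  {
    {
      std::lock_guard<std::mutex> lock(queue_mutex);
      pending.push_back(connection_id);
    }
    wakeup.notify_one();
  }

  // Finish queued work, exit the loop, and wait for the thread.
  void stop()
  {
    {
      std::lock_guard<std::mutex> lock(queue_mutex);
      stopping = true;
    }
    wakeup.notify_one();
    if (worker.joinable())
      worker.join();
  }

  std::size_t handled() const { return handled_count.load(); }

private:
  void run()
  {
    std::unique_lock<std::mutex> lock(queue_mutex);
    for (;;)
    {
      wakeup.wait(lock, [this] { return stopping || !pending.empty(); });
      while (!pending.empty())
      {
        pending.pop_front();  // a real loop would register the socket here
        ++handled_count;
      }
      if (stopping)
        return;
    }
  }

  std::mutex queue_mutex;
  std::condition_variable wakeup;
  std::deque<int> pending;
  bool stopping = false;
  std::atomic<std::size_t> handled_count{0};
  std::thread worker;  // declared last so the members above exist before run()
};

// Spreads new connections round-robin across a fixed pool of event threads.
class EventThreadPool
{
public:
  explicit EventThreadPool(std::size_t thread_count)
  {
    assert(thread_count > 0);
    for (std::size_t i = 0; i < thread_count; ++i)
      threads.push_back(std::unique_ptr<EventThread>(new EventThread()));
  }

  void dispatch(int connection_id)
  {
    threads[next]->addConnection(connection_id);
    next = (next + 1) % threads.size();
  }

  void stop()
  {
    for (auto &thread : threads)
      thread->stop();
  }

  std::size_t handledBy(std::size_t index) const { return threads[index]->handled(); }

private:
  std::vector<std::unique_ptr<EventThread>> threads;
  std::size_t next = 0;
};
```

Round-robin is the simplest placement policy; something like least-loaded placement may balance better when connection lifetimes vary widely.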
There is no single correct answer, so take a look at your options before deciding how to approach your own applications. Don’t be afraid to create hybrids as well. Regardless of which paradigm you choose, concurrent programming can be hard, especially at the lower levels. There have been a number of higher level abstractions to help developers, from new libraries to new languages, but most of these come with a cost in performance or flexibility. When you need to squeeze every bit of performance out of your application, you will most likely end up in C or C++ dealing with these issues directly.
This is actually one of the problems I’m attempting to address with the Scale Stack Event modules. I’m trying to create a healthy level of abstraction on hybrid thread/event based applications so you don’t have any overhead or limitations while a lot of the common headaches are taken care of for you. If you have a need for such a system, get in touch, I’d be interested to talk. Since it is BSD licensed you can use it in any application, including commercial.
Drizzle Developer Day Recap
Last Friday we held the Drizzle Developer Day at the Santa Clara convention center, taking advantage of the fact that many developers and interested contributors were already there for the MySQL Conference & Expo. Minus a few small glitches like wifi and pizza consumption location, I would say it was an overall success. There were a lot of new folks interested in learning about Drizzle and getting the server up and running. The day was organized by splitting folks up into small groups with matching interests, and then switching up groups every hour or so. We had groups focused on replication, documentation, writing plugins, the optimizer, Boots (the new client tool), and a “getting started” group.
The first group I participated in was about Boots, the new command line tool developed by a group of students I sponsored at Portland State University. One of the students who created it was there (Chromakode), so he gave a demo of all the features and ways you could extend it for custom use. Baron from Percona was there and had a lot of good feedback on what is needed by DBAs, as well as for monitoring/troubleshooting problems. Some of the new features in Boots will help quite a bit with this since you are able to write simple Python scripts that work inside the program rather than having to write a bunch of shell processing code around the existing tool. This extended into a discussion about testing tools for production systems, and how to capture and replay production traffic with the same timing and load (or increased load).
The next group I sat in on was around creating plugins. There were topics like getting started with writing your own plugin, a script to generate a skeleton for your own, and more advanced topics like dependency tracking. Since I used the same pandora-plugin system for another project and added dependency tracking there, I am interested in getting dependency tracking into Drizzle. We didn’t get to any code, but this will require some changes in how plugins are loaded in the Drizzle kernel.
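Dependency tracking largely comes down to loading plugins in an order where each one comes after everything it depends on. Here is a minimal sketch using Kahn's topological sort, assuming dependencies are available as a map from plugin name to the names it requires. DependencyMap, loadOrder(), and the module names in the test are hypothetical; this is not the actual pandora-plugin or Drizzle interface.

```cpp
#include <cassert>
#include <map>
#include <queue>
#include <string>
#include <vector>

// Hypothetical plugin description: name -> names of the plugins it needs.
typedef std::map<std::string, std::vector<std::string>> DependencyMap;

// Fill `order` so every plugin appears after all of its dependencies
// (Kahn's topological sort). Returns false if a plugin depends on an
// unknown name or the dependencies form a cycle.
bool loadOrder(const DependencyMap &deps, std::vector<std::string> &order)
{
  order.clear();
  std::map<std::string, int> unmet;                            // deps not yet loaded
  std::map<std::string, std::vector<std::string>> dependents;  // reverse edges

  for (const auto &plugin : deps)
  {
    unmet.emplace(plugin.first, 0);
    for (const std::string &dep : plugin.second)
    {
      if (deps.find(dep) == deps.end())
        return false;
      ++unmet[plugin.first];
      dependents[dep].push_back(plugin.first);
    }
  }

  // Start with plugins that need nothing else.
  std::queue<std::string> ready;
  for (const auto &entry : unmet)
    if (entry.second == 0)
      ready.push(entry.first);

  while (!ready.empty())
  {
    std::string name = ready.front();
    ready.pop();
    order.push_back(name);
    for (const std::string &dependent : dependents[name])
      if (--unmet[dependent] == 0)
        ready.push(dependent);
  }

  // Plugins that never became ready are part of a cycle.
  return order.size() == deps.size();
}
```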
I had to leave a little early to catch my flight home, but for the second half of the day I bounced between helping a group get started from scratch (everything from installing dependencies to getting Drizzle built and running) and the other group topics. Thanks to everyone who showed up and participated; we had some great conversations and got valuable feedback on which directions to take moving forward.
Boots: A Modular CLI for Databases
Back in October I wrote about a student group I was sponsoring to create a new command line tool for Drizzle. The group wrapped up their part of the project (the term ended), and we now have a new tool called Boots! A few of the developers are still active in the project, and I’m planning to get involved more as well. We also have a couple students interested in hacking on it for Drizzle’s Google Summer of Code.
Boots is written in Python and aims to replace the previous ‘drizzle’ tool (which was modified from the ‘mysql’ command line tool). It doesn’t support everything that the old tool has yet (like tab completion), but it adds some new features. For example, there are multiple ‘lingos’, or modular languages, that can be used to communicate with the shell. This allows you to use plain SQL, Python, or even LISP to interact with the shell. One of the lingos, piped-sql, lets you do interesting things such as:
shell$ boots -u root -h 127.0.0.1 -l pipedsql
Boots (v0.2.0)
127.0.0.1:3306 (server v5.1.40)
> SELECT * FROM mysql.user; | csv_out("users.csv")
5 rows in set (0.06s server | +0.00s working)
> Boots quit.
shell$ cat users.csv
localhost,root,,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,,,,,0,0,0,0
...
It’s ready to use, so download and install it now! If you have any features you would like to see, please get in touch through the Boots blueprints, mailing list, or #boots IRC channel on irc.freenode.net. One of the original developers from the project, Chromakode (the same from the awesome xkcd.com shell), will also be attending the MySQL Conference & Expo next week and helping out with the Drizzle booth. Come find one of us to talk more about the project there!
Scale Stack and Database Proxy Prototype
Back in January when I was between jobs I had a free weekend to do some fun hacking. I decided to start a new open source project that had been brewing in the back of my head and since then have been poking at it on the weekends and an occasional late night. I decided to call it Scale Stack because it aims to provide a scalable network service stack. This may sound a bit generic and boring, but let me show a graph of a database proxy module I slapped together in the past couple days:
I set up MySQL 5.5.2-m2 and ran the sysbench read-only tests against it with 1-8192 threads. I then started the database proxy module built on Scale Stack and routed sysbench through it, and you can see that concurrency improved quite a bit at higher thread counts. The database module doesn’t do much; it simply does connection concentration, mapping M incoming connections to N backend connections, where N is a fixed parameter given at startup. In this case I always mapped all incoming sysbench connections down to 128 connections between Scale Stack and MySQL. It also uses a fixed number of threads and is entirely non-blocking. As you can see, the max throughput around 64 threads is a bit lower, but I’ve not done much to optimize this yet (there should be some easy improvements where I simply stuck in a mutex instead of using a lockless queue). It’s only a simple proof-of-concept module to see how well this would work, but it’s a start toward a potentially useful module built on the other Scale Stack components. One other thing to mention is that these tests were run on a single 16-core Intel machine; I’d really like to test this with multiple machines at some point.
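The M-to-N idea can be illustrated with a small pool: any number of client sessions share a fixed set of N backend connections, so the database never sees more than N concurrent sessions. This is a hedged sketch, not the Scale Stack module: BackendPool and simulate() are hypothetical names, and unlike the non-blocking module described above, this version simply blocks a client thread until a backend slot is free.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Fixed pool of N backend "connections", identified by slot number.
class BackendPool
{
public:
  explicit BackendPool(std::size_t size)
  {
    for (std::size_t i = 0; i < size; ++i)
      free_slots.push_back(i);
  }

  // Block until a backend connection is free, then take it.
  std::size_t acquire()
  {
    std::unique_lock<std::mutex> lock(pool_mutex);
    available.wait(lock, [this] { return !free_slots.empty(); });
    std::size_t slot = free_slots.back();
    free_slots.pop_back();
    return slot;
  }

  void release(std::size_t slot)
  {
    {
      std::lock_guard<std::mutex> lock(pool_mutex);
      free_slots.push_back(slot);
    }
    available.notify_one();
  }

private:
  std::mutex pool_mutex;
  std::condition_variable available;
  std::vector<std::size_t> free_slots;
};

// Run `clients` threads issuing `queries` requests each through `backends`
// shared connections; return the peak number of backends in use at once.
std::size_t simulate(std::size_t backends, std::size_t clients, std::size_t queries)
{
  BackendPool pool(backends);
  std::atomic<std::size_t> in_use{0};
  std::atomic<std::size_t> peak{0};
  std::vector<std::thread> threads;
  for (std::size_t c = 0; c < clients; ++c)
    threads.emplace_back([&] {
      for (std::size_t q = 0; q < queries; ++q)
      {
        std::size_t slot = pool.acquire();
        std::size_t now = ++in_use;
        std::size_t seen = peak.load();
        while (now > seen && !peak.compare_exchange_weak(seen, now)) {}
        // ... forward the query over backend connection `slot` here ...
        --in_use;
        pool.release(slot);
      }
    });
  for (auto &t : threads)
    t.join();
  return peak.load();
}
```

With, say, 64 clients and 4 backends, the peak never exceeds 4; the extra clients wait in the pool instead of each holding a database connection.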
So, what is Scale Stack?
Check out the website for a simple overview of what it is. The goal is to pick up where the operating system kernel leaves off with the network stack. It is written in C++ and is extremely modular, with only the module loader, option parsing, and basic logging in the kernel library. It uses Monty Taylor’s pandora-build autoconf files to provide a sane modular build system, along with some modifications I made so dependency tracking is done between modules. You can actually use it to write modules that would do anything; I’m just most interested in network service based modules. The kernel/module loader is also just a library, so you can embed it into existing applications as well. Some of the modules I’ve written for it are a threaded event handling module based on libevent/pthreads and a TCP socket module. There is also an echo server and simple proxy module I created while testing the event and socket modules. The database proxy module builds on top of the event and socket modules. The code is under the BSD license and is up on Launchpad, so feel free to check it out and contribute. If you need a base to build high-performance network services on, you should definitely take a look and talk with me.
What’s up next?
I have a long list of things I would like to do with this, but first up are still some basics. These include other socket type modules like TLS/SSL, UDP, and Unix sockets. Then come some more protocol modules such as Drizzle, a real MySQL protocol module, and others like HTTP, Gearman, and memcached. It’s fairly trivial to write these since the socket modules handle all buffering and provide a simple API. As for the DatabaseProxy module, I’d like to rework it so it’s not MySQL protocol specific, integrate other protocol modules, improve performance, add multi-tenancy support for quality-of-service queuing based on account rules, and a laundry list of other features I won’t bore you with right now.
I also have plans for other services besides a database proxy, especially one that could combine a number of protocols into a generic URI server with pluggable handlers so you can do some interesting translations between modules (like Apache httpd but not http-centric). For example, think of the crazy things you can do with Twisted for Python, but now with a fast, threaded C++ kernel. I also still need to experiment with live reloading of modules, but I’m not sure if this will be worthwhile yet.
If any of this sounds interesting, get in touch, I’d love to have some help! I’ll have some blog posts later on how to get started writing modules, but for now just take a look at the existing modules. The EchoServer is a good place to start since it is pretty simple. Also, if you’ll be at the MySQL Conference and Expo next week, I’d be happy to talk more about it then.
Gearman Releases and Talks at the MySQL Conference
I spent some time this weekend fixing up the Gearman MySQL UDFs (user defined functions) and fixed a few bugs in the Gearman Server. You can find links to the new releases on the Gearman website. The UDFs now use Monty Taylor’s pandora-build autoconf files instead of the old fragile autoconf setup that relied on pkgconfig.
If you are attending the MySQL Conference & Expo next week and want to learn more about Gearman, be sure to check out one of the three sessions Giuseppe Maxia and I are giving:
- Getting started with Gearman for MySQL
- Boosting Database Performance with Gearman
- Gearman MySQL hacks, or Everything you wanted to do with a database server and you never dared to hope
Hope to see you there!
Writing Authentication Plugins for Drizzle
In this post I’m going to describe how to write an authentication plugin for Drizzle. The plugin I’ll be demonstrating is a simple file-based plugin that takes a file containing a list of ‘username:password’ entries (one per line, like a .htpasswd file for Apache). The first step is to set up a proper build environment and create a branch; see the Drizzle wiki page to get going. From here I’ll assume you have Drizzle checked out from bzr and are able to compile it.
Set up a development branch and plugin directory
Change to your shared-repository directory for Drizzle and run (assuming you branched ‘lp:drizzle’ to ‘drizzle’):
shell$ bzr branch drizzle auth-file
Branched 1432 revision(s).
shell$ cd auth-file
Next, we’ll want to create the plugin directory and create plugin.ini and auth_file.cc.
shell$ mkdir plugin/auth_file
plugin/auth_file/plugin.ini:
[plugin]
title=File-based Authentication
description=A simple plugin to authenticate against a list of username:password entries in a plain text file.
version=0.1
author=Eric Day <eday@oddments.org>
license=PLUGIN_LICENSE_GPL
plugin/auth_file/auth_file.cc:
/* -*- mode: c++; c-basic-offset: 2; indent-tabs-mode: nil; -*-
* vim:expandtab:shiftwidth=2:tabstop=2:smarttab:
*
* Copyright (C) 2010 Eric Day
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; version 2 of the License.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
*/
#include "config.h"
#include <string>
#include "drizzled/plugin/authentication.h"
#include "drizzled/security_context.h"
using namespace std;
using namespace drizzled;
namespace auth_file
{
class AuthFile: public plugin::Authentication
{
public:
AuthFile(string name_arg):
plugin::Authentication(name_arg)
{ }
bool authenticate(const SecurityContext &sctx, const string &password)
{
/* Let "root" user always succeed for now because of test suite. */
if (sctx.getUser() == "root" && password.empty())
return true;
/* Only allow hard coded username for now. */
if (sctx.getUser() == "auth_file")
return true;
return false;
}
};
static int init(plugin::Context &context)
{
context.add(new AuthFile("auth_file"));
return 0;
}
} /* namespace auth_file */
DRIZZLE_PLUGIN(auth_file::init, NULL);
All authentication plugins need to inherit from the ‘plugin::Authentication’ class and implement an ‘authenticate’ method. This takes the user context and a password as its arguments and simply returns true if the user is allowed or false otherwise. As you can see, this plugin will accept all sessions for the ‘auth_file’ user with any password and deny everything else. It also allows ‘root’ with no password for the test suite; we’ll fix this later so it’s not required. The init method is called when the plugin is loaded, and here we want to register an instance of the plugin class with the kernel. The DRIZZLE_PLUGIN definition is required so the Drizzle kernel can load the module and grab some basic information about it (like the name of the init method).
Create tests cases to verify our plugin works
We’ll want to add some test cases so we can check our plugin as we make progress. This is done by creating test case and result files inside the plugin directory. You’ll want to create the following directories and files:
shell$ mkdir plugin/auth_file/tests
shell$ mkdir plugin/auth_file/tests/t
shell$ mkdir plugin/auth_file/tests/r
plugin/auth_file/tests/t/basic-master.opt
--plugin-add=auth_file
plugin/auth_file/tests/t/basic.test
--replace_result $MASTER_MYSOCK MASTER_SOCKET $MASTER_MYPORT MASTER_PORT
--replace_regex /@'.*?'/@'LOCALHOST'/
--error ER_ACCESS_DENIED_ERROR
connect (bad_user,localhost,bad_user,,,);
--replace_result $MASTER_MYSOCK MASTER_SOCKET $MASTER_MYPORT MASTER_PORT
connect (auth_file,localhost,auth_file,,,);
connection auth_file;
SELECT 1;
plugin/auth_file/tests/r/basic.result
connect(localhost,bad_user,,test,MASTER_PORT,);
ERROR 28000: Access denied for user 'bad_user'@'LOCALHOST' (using password: NO)
SELECT 1;
1
1
The files in the ‘tests/t’ directory drive the test system, and the files in ‘tests/r’ contain the expected results the output must match. This test tries two connections: one as ‘bad_user’, which should fail, and another as the ‘auth_file’ user, which should pass. Before writing the code to check against a list of users in a file, we’ll compile what we have so far and run the test cases to make sure things are working properly.
shell$ ./config/autorun.sh
...
shell$ ./configure --with-debug
...
shell$ make -j 3
...
shell$ make check
...
auth_file.basic [ pass ] 6
...
It works! To save some time while developing, you can also test just the auth_file plugin without everything else by running:
( cd tests && ./dtr --suite=auth_file )
Add options
We’re going to want users to be able to specify a location for the file to load, so we’ll need to tell the kernel about the option through the plugin interface. This is done by adding:
#include "drizzled/configmake.h"
...
static char* users_file= NULL;
static const char DEFAULT_USERS_FILE[]= SYSCONFDIR "/drizzle.users";
...
static DRIZZLE_SYSVAR_STR(users,
users_file,
PLUGIN_VAR_READONLY,
N_("File to load for usernames and passwords"),
NULL, /* check func */
NULL, /* update func*/
DEFAULT_USERS_FILE /* default */);
static drizzle_sys_var* sys_variables[]=
{
DRIZZLE_SYSVAR(users),
NULL
};
...
DRIZZLE_PLUGIN(auth_file::init, auth_file::sys_variables);
The first include gives us access to the SYSCONFDIR macro, which maps to the ‘etc’ directory of our install path. That path plus the file ‘drizzle.users’ is our default. We also define a variable to hold either this default path or a custom path the user specifies. Next, we define a system variable with the DRIZZLE_SYSVAR_STR macro and provide some information like where to store it, the help string, and the default value. We also need to define a system variables list. The new variable is the only entry in the list right now, but you could define more system variables and add them to this list (just make sure it is NULL terminated). Last, we modify our DRIZZLE_PLUGIN call to pass this list as the second argument instead of NULL. This tells the kernel to look for variables in the provided list when loading. With this option, we’ll now be able to specify: --auth-file-users=/some/path/to/drizzles.users
Write the plugin
With the plugin compiling, tests set up, and options specified, we can start to write some real code. First up is adding a couple of class methods; the new AuthFile class looks like:
#include <fstream>
#include <map>
...
class AuthFile: public plugin::Authentication
{
public:
AuthFile(string name_arg);
/**
* Retrieve the last error encountered in the class.
*/
string& getError(void);
/**
* Load the users file into a local map.
*
* @return True on success, false on error. If false is returned an error
* is set and can be retrieved with getError().
*/
bool loadFile(void);
private:
bool authenticate(const SecurityContext &sctx, const string &password);
string error;
map<string, string> users;
};
We’ve moved the method definitions out of the class (for Drizzle coding standards) and now have two new declarations: loadFile() to load the specified users file into a std::map, and getError() to return errors, if any. The getError() method simply returns the ‘error’ data member, but loadFile() is a bit more interesting:
bool AuthFile::loadFile(void)
{
ifstream file(users_file);
if (!file.is_open())
{
error = "Could not open users file: ";
error += users_file;
return false;
}
string line;
while (getline(file, line))
{
/* Skip blank lines, whitespace-only lines, and comments. */
size_t first = line.find_first_not_of(" \t");
if (first == string::npos || line[first] == '#')
continue;
string username;
string password;
size_t password_offset = line.find(":");
if (password_offset == string::npos)
username = line;
else
{
username = string(line, 0, password_offset);
password = string(line, password_offset + 1);
}
pair<map<string, string>::iterator, bool> result;
result = users.insert(pair<string, string>(username, password));
if (result.second == false)
{
error = "Duplicate entry found in users file: ";
error += username;
file.close();
return false;
}
}
file.close();
return true;
}
This method opens the users file and, for each line, either skips it (blank lines and comments starting with ‘#’) or parses out the username:password pair. Note that the password is optional: a line with no ‘:’ adds that user with an empty password.
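Because loadFile() reads straight from the users_file path, its parsing rules are awkward to exercise on their own. Here is a hedged, standalone variant (parseUsers() is a hypothetical helper, not part of the plugin) that reads from any std::istream and applies the same rules: skip blank and comment lines, split on the first ‘:’, and reject duplicate usernames.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <utility>

// Same parsing rules as AuthFile::loadFile(), but reading from a stream so
// they can be tested without a file. Returns false and sets `error` when a
// username appears twice.
bool parseUsers(std::istream &in, std::map<std::string, std::string> &users,
                std::string &error)
{
  std::string line;
  while (std::getline(in, line))
  {
    std::string::size_type first = line.find_first_not_of(" \t");
    // Skip blank lines, whitespace-only lines, and comments.
    if (first == std::string::npos || line[first] == '#')
      continue;

    std::string username = line;
    std::string password;  // stays empty when the line has no ':'
    std::string::size_type colon = line.find(':');
    if (colon != std::string::npos)
    {
      username = line.substr(0, colon);
      password = line.substr(colon + 1);
    }

    if (!users.insert(std::make_pair(username, password)).second)
    {
      error = "Duplicate entry found in users file: " + username;
      return false;
    }
  }
  return true;
}
```

Feeding it a std::istringstream with the contents of a users file makes it easy to check edge cases (comments, whitespace-only lines, duplicates) before wiring the same logic into the plugin.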
Next up, we change the authenticate() method to use the map instead of the hard coded values:
bool AuthFile::authenticate(const SecurityContext &sctx, const string &password)
{
map<string, string>::const_iterator user = users.find(sctx.getUser());
if (user == users.end())
return false;
if (password == user->second)
return true;
return false;
}
This method now looks up the user in the map and, if found with a matching password, lets the user in. Now let’s update our test case to use this. First we need to create a users file that allows the ‘root’ and ‘auth_file’ users we put in our test cases:
plugin/auth_file/tests/t/basic.users
# Always allow root user with no password for drizzletest program
root
auth_file
plugin/auth_file/tests/t/basic-master.opt
--plugin-add=auth_file --auth-file-users=$DRIZZLE_TEST_DIR/../plugin/auth_file/tests/t/basic.users
Now it’s time to recompile and check our new code:
shell$ make -j 3
...
shell$ ( cd tests && ./dtr --suite=auth_file )
...
auth_file.basic [ pass ] 6
...
It still works! I’d like to say we’re done here, but notice we’ve not actually tested any passwords. Before trying that, a little explanation about how password authentication works is required.
Verifying passwords in Drizzle
Because Drizzle has a pluggable protocol, the usernames and passwords can be coming from any source. They could be coming from the embedded console plugin which passes the password through as plain text, or from the MySQL protocol plugin that uses the custom MySQL hashing algorithm. This means a simple string equality does not suffice for all password sources. The code above handles the plain text case, but since the default connection method is the MySQL protocol, including for the test suite, we need to also handle the case when the user supplied password is hashed.
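To make the hashed case concrete, here is a standalone sketch of the MySQL 4.1-style challenge/response, assuming the usual scheme: the client sends SHA1(password) XOR SHA1(scramble + SHA1(SHA1(password))), and the server recovers SHA1(password) by XORing again, then checks that hashing it once more reproduces SHA1(SHA1(password)). The compact SHA-1 below is written for illustration only (the plugin itself uses Drizzle's drizzled/algorithm/sha1.h), and sha1(), clientScramble(), and serverVerify() are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

static const std::size_t kDigestLength = 20;

// Rotate a 32-bit word left by n bits (0 < n < 32).
static uint32_t rotl(uint32_t x, int n) { return (x << n) | (x >> (32 - n)); }

// Compact SHA-1 (FIPS 180-4) returning the raw 20-byte digest.
std::string sha1(const std::string &input)
{
  uint32_t h[5] = {0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0};
  std::string msg = input;
  uint64_t bit_len = static_cast<uint64_t>(input.size()) * 8;
  msg += static_cast<char>(0x80);      // padding: a single 1 bit...
  while (msg.size() % 64 != 56)
    msg += '\0';                       // ...then zeros...
  for (int i = 7; i >= 0; --i)         // ...then the big-endian bit length
    msg += static_cast<char>((bit_len >> (i * 8)) & 0xff);

  for (std::size_t chunk = 0; chunk < msg.size(); chunk += 64)
  {
    uint32_t w[80];
    for (int i = 0; i < 16; ++i)
    {
      w[i] = 0;
      for (int j = 0; j < 4; ++j)
        w[i] = (w[i] << 8) | static_cast<uint8_t>(msg[chunk + 4 * i + j]);
    }
    for (int i = 16; i < 80; ++i)
      w[i] = rotl(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1);

    uint32_t a = h[0], b = h[1], c = h[2], d = h[3], e = h[4];
    for (int i = 0; i < 80; ++i)
    {
      uint32_t f, k;
      if (i < 20)      { f = (b & c) | (~b & d);          k = 0x5A827999; }
      else if (i < 40) { f = b ^ c ^ d;                   k = 0x6ED9EBA1; }
      else if (i < 60) { f = (b & c) | (b & d) | (c & d); k = 0x8F1BBCDC; }
      else             { f = b ^ c ^ d;                   k = 0xCA62C1D6; }
      uint32_t temp = rotl(a, 5) + f + e + k + w[i];
      e = d; d = c; c = rotl(b, 30); b = a; a = temp;
    }
    h[0] += a; h[1] += b; h[2] += c; h[3] += d; h[4] += e;
  }

  std::string digest;
  for (int i = 0; i < 5; ++i)
    for (int shift = 24; shift >= 0; shift -= 8)
      digest += static_cast<char>((h[i] >> shift) & 0xff);
  return digest;
}

// XOR two equal-length byte strings.
std::string xorBytes(const std::string &a, const std::string &b)
{
  std::string out(a.size(), '\0');
  for (std::size_t i = 0; i < a.size(); ++i)
    out[i] = static_cast<char>(a[i] ^ b[i]);
  return out;
}

// Client side: SHA1(password) XOR SHA1(scramble + SHA1(SHA1(password))).
std::string clientScramble(const std::string &password, const std::string &scramble)
{
  std::string stage1 = sha1(password);
  return xorBytes(stage1, sha1(scramble + sha1(stage1)));
}

// Server side: recover SHA1(password) from the token, hash it once more,
// and compare with the double hash of the locally stored password.
bool serverVerify(const std::string &password, const std::string &scramble,
                  const std::string &token)
{
  if (scramble.size() != kDigestLength || token.size() != kDigestLength)
    return false;
  std::string local_double = sha1(sha1(password));
  std::string recovered_stage1 = xorBytes(sha1(scramble + local_double), token);
  return sha1(recovered_stage1) == local_double;
}
```

The plain-text password never crosses the wire, and a server that stored only SHA1(SHA1(password)) could run the same check; the file-based plugin simply computes that double hash from its plain-text entry.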
Verify MySQL Hashed Passwords
To accomplish this we add in an extra check in the authenticate() method. This now looks like:
bool AuthFile::authenticate(const SecurityContext &sctx, const string &password)
{
map<string, string>::const_iterator user = users.find(sctx.getUser());
if (user == users.end())
return false;
if (sctx.getPasswordType() == SecurityContext::MYSQL_HASH)
return verifyMySQLHash(user->second, sctx.getPasswordContext(), password);
if (password == user->second)
return true;
return false;
}
This extra check calls the verifyMySQLHash() method to verify the local password with the client-scrambled password, using the random bytes the server sent during the handshake (password context). This method is:
#include <cstring>
#include "drizzled/util/convert.h"
#include "drizzled/algorithm/sha1.h"
...
/**
* Verify the local and remote scrambled password match using the MySQL
* hashing algorithm.
*
* @param[in] password Plain text password that is stored locally.
* @param[in] scramble_bytes The random bytes the server sent to client
* to use for scrambling the password.
* @param[in] scrambled_password The result of the client scrambling the
* password remotely.
* @return True if the password matched, false if not.
*/
bool verifyMySQLHash(const string &password,
const string &scramble_bytes,
const string &scrambled_password);
...
bool AuthFile::verifyMySQLHash(const string &password,
const string &scramble_bytes,
const string &scrambled_password)
{
if (scramble_bytes.size() != SHA1_DIGEST_LENGTH ||
scrambled_password.size() != SHA1_DIGEST_LENGTH)
{
return false;
}
SHA1_CTX ctx;
uint8_t local_scrambled_password[SHA1_DIGEST_LENGTH];
uint8_t temp_hash[SHA1_DIGEST_LENGTH];
uint8_t scrambled_password_check[SHA1_DIGEST_LENGTH];
/* Generate the double SHA1 hash for the password stored locally first. */
SHA1Init(&ctx);
SHA1Update(&ctx, reinterpret_cast<const uint8_t *>(password.c_str()),
password.size());
SHA1Final(temp_hash, &ctx);
SHA1Init(&ctx);
SHA1Update(&ctx, temp_hash, SHA1_DIGEST_LENGTH);
SHA1Final(local_scrambled_password, &ctx);
/* Hash the scramble that was sent to client with the local password. */
SHA1Init(&ctx);
SHA1Update(&ctx, reinterpret_cast<const uint8_t*>(scramble_bytes.c_str()),
SHA1_DIGEST_LENGTH);
SHA1Update(&ctx, local_scrambled_password, SHA1_DIGEST_LENGTH);
SHA1Final(temp_hash, &ctx);
/* Next, XOR the result with what the client sent to get the original
single-hashed password. */
for (int x= 0; x < SHA1_DIGEST_LENGTH; x++)
temp_hash[x]= temp_hash[x] ^ scrambled_password[x];
/* Hash this result once more to get the double-hashed password again. */
SHA1Init(&ctx);
SHA1Update(&ctx, temp_hash, SHA1_DIGEST_LENGTH);
SHA1Final(scrambled_password_check, &ctx);
/* These should match for a successful auth. */
return memcmp(local_scrambled_password, scrambled_password_check, SHA1_DIGEST_LENGTH) == 0;
}
I won't get into the details of what this method does; that is left as an exercise for the reader. :) The one thing we do care about is whether it works, so back to adding to our test cases:
plugin/auth_file/tests/t/basic.users
# Always allow root user with no password for drizzletest program
root
auth_file
auth_file_password:test_password
plugin/auth_file/tests/t/basic.test
--replace_result $MASTER_MYSOCK MASTER_SOCKET $MASTER_MYPORT MASTER_PORT
--replace_regex /@'.*?'/@'LOCALHOST'/
--error ER_ACCESS_DENIED_ERROR
connect (bad_user,localhost,bad_user,,,);
--replace_result $MASTER_MYSOCK MASTER_SOCKET $MASTER_MYPORT MASTER_PORT
connect (auth_file,localhost,auth_file,,,);
connection auth_file;
SELECT 1;
--replace_result $MASTER_MYSOCK MASTER_SOCKET $MASTER_MYPORT MASTER_PORT
connect (auth_file_password,localhost,auth_file_password,test_password,,);
connection auth_file_password;
SELECT 1;
--replace_result $MASTER_MYSOCK MASTER_SOCKET $MASTER_MYPORT MASTER_PORT
--replace_regex /@'.*?'/@'LOCALHOST'/
--error ER_ACCESS_DENIED_ERROR
connect (bad_user_password,localhost,auth_file_password,bad_password,,);
plugin/auth_file/tests/r/basic.result
connect(localhost,bad_user,,test,MASTER_PORT,);
ERROR 28000: Access denied for user 'bad_user'@'LOCALHOST' (using password: NO)
SELECT 1;
1
1
SELECT 1;
1
1
connect(localhost,auth_file_password,bad_password,test,MASTER_PORT,);
ERROR 28000: Access denied for user 'auth_file_password'@'LOCALHOST' (using password: YES)
With the test files updated, let's compile and run the tests:
shell$ make -j 3
...
shell$ ( cd tests && ./dtr --suite=auth_file )
...
auth_file.basic [ pass ] 11
...
It works! At this point we have a fully functional plugin. To finish up the plugin, we'll want to commit the changes, push the branch to Launchpad, and propose the plugin for review so it can be merged into the trunk. You can see the full source code in lp:~eday/drizzle/auth-file (it should also appear in the Drizzle trunk in the next couple of days). There are improvements that can be made such as checking if the file changed to reload while running (being conscious of the possibility of multiple concurrent readers) or being able to store passwords in a format other than plain text. Patches are welcome!
I hope this gives you enough information to get started writing your own authentication plugins. I'm going to be working on a direct LDAP authentication plugin next, supporting both plain text and MySQL hashed passwords. If you need any help getting started with your own, come ask your questions on IRC or on the Drizzle mailing list. We'll also be hosting a Drizzle Developer Day after the MySQL Conference where you can get started in person.
Thoughts on “NoSQL”
I’ve decided to jump on the bandwagon and spill my thoughts on “NoSQL” since it’s been such a hot topic lately ([1], [2], [3], [4]). Since I work on the Drizzle project some folks would probably think I take the SQL side of the “debate,” but actually I’m pretty objective about the topic and find value in projects on both sides. Let me explain.
Last November at OpenSQL Camp I assembled a panel to debate “SQL vs NoSQL.” We had folks representing a variety of projects, including Cassandra, CouchDB, Drizzle, MariaDB, MongoDB, MySQL, and PostgreSQL. Even though I realized this was a poor name for such a panel, I went with it anyway because this “debate” was really starting to heat up. The conclusion I was hoping for is that the two are not at odds: both categories of projects can peacefully co-exist in the same data management toolbox. Beyond the panel name, even the term “NoSQL” is a bit misleading. I talked with Eric Evans (one of my new co-workers over on the Cassandra team), who reintroduced the term, and even he admits it is vague and doesn’t do the projects categorized by it any favors. What happens when Cassandra has a SQL interface stacked on top of it? Yeah.
One reason for all this confusion is that for some people, the term “database” equates to “relational database.” This makes the non-relational projects look foreign because they don’t fit the database model that became “traditional” due to its popularity. Anyone who has read up on other database models would quickly realize relational is just one of many models, and many of the “NoSQL” projects fit quite nicely into one of these categories. The real value these new projects provide is in their implementation details, especially dynamic scale-out (adding new nodes to live systems) and synchronization mechanisms (eventual consistency or tunable quorum). There are a lot of great ideas in these projects, and people on the “SQL” side should really take the time to study them; there are some tricks to learn.
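As a small illustration of the tunable quorum idea, here is the common rule of thumb: with N replicas of a key, a write acknowledged by W of them and a read that hears from R of them must share at least one replica whenever R + W > N. The Quorum struct and helper names are hypothetical, not any project's API, and real systems layer failure handling and conflict resolution on top of this.

```cpp
#include <cassert>

// Hypothetical illustration of quorum overlap, not a specific project's API.
struct Quorum {
  int n;  // replicas holding each key
  int r;  // replicas a read must hear from
  int w;  // replicas that must acknowledge a write
};

// Smallest number of replicas that any read set and any write set must share.
int min_overlap(const Quorum& q) {
  assert(q.n > 0 && q.r >= 1 && q.w >= 1 && q.r <= q.n && q.w <= q.n);
  int overlap = q.r + q.w - q.n;
  return overlap > 0 ? overlap : 0;
}

// When the sets must overlap, every read contacts at least one replica that
// has the latest acknowledged write (and can pick it out by version).
// Otherwise a read may miss it, and consistency is only eventual.
bool reads_see_latest_write(const Quorum& q) {
  return min_overlap(q) > 0;
}
```

Tuning is then a trade-off: R = W = 2 with N = 3 keeps reads consistent while tolerating one slow node, and R = W = 1 is fastest but only eventually consistent.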

One of the main criticisms of the “NoSQL” projects is that they are taking a step back, simply reinventing a component that already exists in a relational model. While this may have some truth at the level of the high-level logical data representation, it is just wrong once you look beneath it. Sure, it may look like a simple key-value store from the outside, but there is a lot more under the hood. For many of these projects it was a design decision to focus on the implementation details where it matters, and not bother with things like parsing SQL and optimizing joins. I think there is still some value in supporting some form of SQL interface because this gets you instant adoption by pretty much any developer out there. Love it or hate it, people know SQL. As for joins, scaling them across distributed relational nodes has been a research topic for years, and it’s a hard problem. People have worked around this by accepting new data models and consistency levels. It all depends on what your problem requires.
I fully embrace the “NoSQL” projects out there; there is something we can all learn from them even if we don’t put them into production. We should be thrilled we have more open source tools in our database toolbox, especially non-relational ones. We are no longer required to smash every dataset “peg” into the relational “hole.” Use the best tool for the job; this may still be a relational database. Explore your options, try to learn a few things, model your data in a number of ways, and find out what is really required. When it comes time to make a decision, just remember:
Drizzle Protocol Changes
On a note entirely unrelated to yesterday’s MySQL protocol discussions, the MySQL protocol is now the default protocol in Drizzle as of Monday’s tarball (3/15). Drizzle supports a limited version of the MySQL protocol, implementing only the subset of commands Drizzle cares about (no server-side prepared statements, no replication, and none of the commands deprecated in favor of SQL query equivalents). Not all MySQL clients have been fully tested with it, but our entire test suite now uses it through the libdrizzle MySQL implementation. The latest release of libdrizzle also defaults to the MySQL protocol and port for Drizzle connections.
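To make “the MySQL protocol” a bit more concrete, here is a small sketch of its packet framing: every packet starts with a 3-byte little-endian payload length and a 1-byte sequence number, and a text query is sent as the COM_QUERY (0x03) command byte followed by the SQL string. The function names are hypothetical and this is not libdrizzle’s API; it also only handles single packets (payloads under 16 MiB).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Every MySQL client/server protocol packet begins with a 4-byte header:
// a 3-byte little-endian payload length, then a 1-byte sequence id.
struct PacketHeader {
  std::uint32_t payload_length;
  std::uint8_t sequence_id;
};

const unsigned char COM_QUERY = 0x03;  // command byte for a text SQL query

void write_packet_header(std::uint32_t payload_length,
                         std::uint8_t sequence_id, unsigned char* buf) {
  // A payload of 0xffffff bytes or more is split across several packets;
  // this sketch does not handle that case.
  assert(payload_length < 0xffffffu);
  buf[0] = static_cast<unsigned char>(payload_length & 0xff);
  buf[1] = static_cast<unsigned char>((payload_length >> 8) & 0xff);
  buf[2] = static_cast<unsigned char>((payload_length >> 16) & 0xff);
  buf[3] = sequence_id;
}

// Returns false if fewer than 4 bytes are available.
bool parse_packet_header(const unsigned char* buf, std::size_t len,
                         PacketHeader* out) {
  if (len < 4)
    return false;
  out->payload_length = static_cast<std::uint32_t>(buf[0]) |
                        (static_cast<std::uint32_t>(buf[1]) << 8) |
                        (static_cast<std::uint32_t>(buf[2]) << 16);
  out->sequence_id = buf[3];
  return true;
}

// Builds a COM_QUERY packet: header, command byte, then the SQL text
// (not NUL-terminated; the length in the header delimits it).
std::vector<unsigned char> make_query_packet(const std::string& sql,
                                             std::uint8_t sequence_id) {
  std::vector<unsigned char> packet(4);
  write_packet_header(static_cast<std::uint32_t>(sql.size() + 1),
                      sequence_id, &packet[0]);
  packet.push_back(COM_QUERY);
  packet.insert(packet.end(), sql.begin(), sql.end());
  return packet;
}
```

A server that supports only a subset of the protocol can dispatch on the command byte after the header and reject anything it doesn’t implement.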
There has been some debate about this change, even amongst some of the core developers. The current Drizzle protocol is a slight modification of the MySQL protocol, and it has been running on the IANA-assigned Drizzle port (4427) by default since the beginning. We are developing a new Drizzle protocol which will have a number of new features and be more extensible than the old protocol. It may be some time before this protocol is stable and tested well enough to make it the default. Looking ahead, there are a couple of upgrade paths to consider when it is ready:
The first option is to declare a certain release the “new protocol release,” and clients talking to it would need to be upgraded at the same time. This would switch the line-level protocol on port 4427 from the old Drizzle protocol to the new one. Clients would be required to upgrade because those using the old protocol would break when trying to connect to the new server-side protocol module (there are no clever, efficient hacks here because a different side sends the first byte in each protocol). This option doesn’t involve any dependencies on the MySQL protocol module or APIs at all, but does force an upgrade in the future for client tools and APIs.
The second option is to make the MySQL protocol the default now, and when the new Drizzle protocol module is ready and available, tell folks they can start using it. Those using libdrizzle or its derived APIs will have an easy transition because a libdrizzle release can change the default as well. This allows us to put the Drizzle protocol module (port 4427) in an experimental mode that changes between releases as we develop it. It does introduce a required dependency on the MySQL protocol module for the time being, and possible confusion when we make the switch back.
Both approaches have pros and cons, and for better or for worse, it has been decided to take the latter approach. If you run the Drizzle tarball being released today, the client tools will speak MySQL to port 3306 by default, and the latest libdrizzle release defaults to this as well. If you are running Drizzle on the same machine as MySQL/MariaDB and you get a port conflict on 3306 when starting, be sure to start drizzled with --mysql-protocol-port=X to bind to a different port (you’ll of course need to use the same port in the client utilities/APIs when connecting).
Open Source Bridge 2010
A couple of months ago Selena Deckelmann asked if I wanted to co-chair the Open Source Bridge Conference this year, and I was thrilled to say yes! The conference is run entirely by volunteers, some of the most dedicated I have ever seen, and I’m excited to be working with such a fantastic group of people. It is also backed by the 501(c)(3) non-profit Technocation, which is primarily run by Sheeri Cabral, who is well known in the MySQL community.
The conference is June 1-4 in Portland, OR, and will be held at the Portland Art Museum. The call for proposals is open until the end-of-day on March 25th, so please submit your ideas now! Early registration is also open until April 1st, so now would also be a great time to sign up. We are still looking for sponsors, so if your organization or company is interested, please get in touch!
As our about page describes, this conference differs from many others in its focus on open source citizenship. We want folks to openly share ideas in a variety of ways and to get things done when they attend. We are also trying to keep the cost as low as possible so it is accessible to a larger audience.
We’re also partnering with O’Reilly’s OSCON, which is in Portland in late July, because both sides feel the two conferences complement each other; we are in no way trying to compete. Portland is bursting with so much open source that one conference could not contain it! You should come check it out. :)
Drizzling from the Rackspace Cloud
Since I left Sun back in January, folks have been asking what was next. I’m happy to say that I’m going to continue hacking on open source projects like Drizzle and Gearman, but now at the Rackspace Cloud. Not only will I be there, but I get to continue working closely with a few of the amazing Drizzle hackers who have also joined, including Monty Taylor, Jay Pipes, Stewart Smith, and Lee Bieber.
Why Rackspace Cloud? Late last year I was considering what I wanted to do next with the Oracle acquisition looming near, and this was one of the options that presented itself. Rackspace had been a supporter of Drizzle from early on by offering virtual machines to develop and test on, and when talking to some folks more closely, something really hit home. Rackspace provides first-class service and “fanatical” support – they are not a software company. One might ask why an open source software developer would be interested in a company that doesn’t create software or vice-versa, and the answer is that Rackspace wants to find ways to offer the best possible service now and into the future. What better way than to help develop the next generation of service software and get a jump start into integrating this into their architecture? Both the open source community and Rackspace win.
Another thing I learned while talking with Rackspace is that one of their core principles is transparency. This applies to both customers and employees, and anyone within an open source community can appreciate this. The more I learned about the company and the folks within it, the more impressed I was at the lack of internal barriers or “need-to-know” information. One of Drizzle’s core goals is also transparency, from discussing design decisions on public mailing lists and IRC, to having the entire project management infrastructure hosted out in the open at Launchpad.
What does this mean for the Drizzle project? It means continued support for a number of core developers, more infrastructure for development, and most importantly in my eyes, more context. One of the Drizzle tag-lines is “A Lightweight SQL Database for Cloud and Web,” so what better place to develop a database designed for the cloud than on one of the fastest-growing cloud platforms? We’ll get a detailed look at the demands, get feedback from cloud customers, and have the perfect test bed for offering new services. We’ll also be able to work closely with a top-notch group of DBAs, developers, and sysadmins in one of the most demanding service architectures out there. This invaluable context will help the Drizzle developers make more informed decisions moving forward, which also means better software for the community.
Personally, this also means getting back to my hosting roots. Before Sun, I worked at Concentric for almost 10 years in a clustered hosting environment. I’m very familiar with many of the multi-tenant scalability concerns Rackspace has, and I’m excited to be working in this type of environment again. We’ve already been working closely with the MySQL DBAs at Rackspace to learn what the biggest pain points are for a multi-tenant architecture, and we’ll be taking steps to address these as it will help anyone wanting to run Drizzle in a cloud-like environment. Drizzle’s modular architecture has already proved useful, as some of these concerns are easily answered with “oh, we have a plugin point for that.”
I’m excited, this is going to be a fun ride.
C++, or Something Like It
I’ve developed primarily in C for most of my career, and recently decided to give C++ a shot as my “primary language” due to hacking on Drizzle and MySQL. Over the past few months I’ve read about and experimented with most of the features C++ provides over C, including reading Scott Meyers’ excellent “Effective” series of books (highly recommended). Along the way I’ve been developing a project I’ve wanted to write for a while, and I’m finding some features to be problematic. I thought I’d share these issues so others can be aware of them and perhaps I can learn better workarounds.
The project I’ve been working on uses dynamic shared object loading at runtime (using dlopen() and friends), is threaded, and has just about every strict compiler warning you can find turned on and treated as errors (thanks to Monty Taylor’s pandora-build project). I’m also testing on various platforms and compilers, including Linux, OpenSolaris, and OS X. I have also been trying my best to avoid any dependencies on large C++ libraries like Boost and just stick to the standard language and the STL. With these requirements in mind, here are the issues I’ve run into:
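For readers who haven’t built this kind of setup, here is a rough sketch of the dlopen() side of it. It assumes each module exports a C-linkage factory named create_module; that name, the Module interface, and load_module are all hypothetical. Errors are written into a caller-supplied buffer rather than a std::string so the loader itself cannot throw. On older glibc versions this needs to be linked with -ldl.

```cpp
#include <dlfcn.h>

#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical interface shared by the host program and every module.
class Module {
public:
  virtual ~Module() {}
  virtual const char* name() const = 0;
};

// Each module is assumed to export `extern "C" Module* create_module();`.
// C linkage keeps the symbol name independent of C++ name mangling.
typedef Module* (*CreateModuleFn)();

// Loads a shared object and calls its factory. On failure returns NULL,
// sets *handle to NULL, and copies the reason into error. On success the
// caller deletes the module before calling dlclose(*handle).
Module* load_module(const char* path, void** handle, char* error,
                    std::size_t error_size) {
  assert(path != NULL && handle != NULL && error != NULL);
  *handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
  if (*handle == NULL) {
    const char* msg = dlerror();
    std::snprintf(error, error_size, "%s", msg != NULL ? msg : "dlopen failed");
    return NULL;
  }

  dlerror();  // clear any stale error so the check below is reliable
  void* sym = dlsym(*handle, "create_module");
  const char* msg = dlerror();
  if (msg != NULL || sym == NULL) {
    // Copy the message before dlclose(), which may invalidate it.
    std::snprintf(error, error_size, "%s",
                  msg != NULL ? msg : "create_module is NULL");
    dlclose(*handle);
    *handle = NULL;
    return NULL;
  }

  // POSIX requires a dlsym() result to be convertible to a function pointer.
  CreateModuleFn create = reinterpret_cast<CreateModuleFn>(sym);
  return create();  // the factory must not let exceptions escape
}
```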
Can’t Reliably Use Exceptions
My first pass relied on exceptions, but this proved problematic on some architectures as soon as custom exceptions were thrown across module boundaries. This comes down to ABI issues in the shared object formats generated by some compiler versions. While you can make it work in some environments, it’s not going to be portable. This means I’ve had to catch exceptions closer to where they are thrown, requiring a lot more try/catch blocks and not being able to take full advantage of automatic stack cleanup. It also means resorting to the C way of handling errors: returning and checking return codes while generating error strings. To be completely exception safe, this means not using std::string for error returns, since building a useful error message in one can throw. Not using exceptions has had a viral effect throughout the rest of the design, making the code look more like C. I was a bit disappointed by this, as not having to check every function’s return code was keeping the code very clean. :)
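A minimal sketch of that boundary pattern, under stated assumptions: the exported function module_greet, its return codes, and the build_greeting helper are all hypothetical. Code inside the module still uses exceptions and std::string; the single extern "C" entry point catches everything and reports failures through a return code and caller-supplied char buffers, so no exception or std::string crosses the shared object boundary.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <exception>
#include <new>
#include <stdexcept>
#include <string>

// Hypothetical return codes for the module's C-style interface.
enum ReturnCode { RC_SUCCESS = 0, RC_MEMORY = 1, RC_ERROR = 2 };

// Internal code is free to throw and to use std::string.
static std::string build_greeting(const char* name) {
  if (name == NULL || name[0] == '\0')
    throw std::invalid_argument("name must not be empty");
  return std::string("hello, ") + name;
}

// Exported entry point: nothing may escape, so every exception is caught
// here and turned into a return code plus a message in a fixed-size buffer.
extern "C" int module_greet(const char* name, char* out, std::size_t out_size,
                            char* error, std::size_t error_size) {
  assert(out != NULL && error != NULL);
  try {
    std::string result = build_greeting(name);
    std::snprintf(out, out_size, "%s", result.c_str());
    return RC_SUCCESS;
  } catch (const std::bad_alloc&) {
    std::snprintf(error, error_size, "out of memory");
    return RC_MEMORY;
  } catch (const std::exception& e) {
    std::snprintf(error, error_size, "%s", e.what());
    return RC_ERROR;
  } catch (...) {
    std::snprintf(error, error_size, "unknown error");
    return RC_ERROR;
  }
}
```

The cost is exactly what the paragraph describes: every caller on the host side checks the return code instead of letting an exception unwind the stack.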
Limited Use of the STL and std::string
I was excited to take advantage of the STL, as writing things like doubly-linked lists and hash tables for every C struct was getting a bit old (I did have a set of macros I used, but they were not the most popular in some circles because of certain C-preprocessor features). When I learned more about the internals of the STL, and how it relies heavily on copying objects, my heart sank a little. It completely makes sense in the design, it’s just not as efficient as it could be (especially coming from a place where I would optimize to reduce pointer copies in C). No worries, I just created private copy constructors/assignment operators and only used pointers to objects. This came with its own set of issues around pointer management and avoiding leaks if the ‘new’ operator were to fail. Once I worked out the memory management issues, there were still exceptions to watch out for, including figuring out all the methods that may throw (usually due to an internal allocation). This is especially annoying when doing simple std::string operations like assignment or concatenation, and having to always catch around those. With other annoyances like the reference-to-reference issues and std::unary_function having a non-virtual destructor, I’ve ended up using a watered-down set of STL algorithms and resorted to a mix of non-STL containers and custom algorithms for some things. The lack of thread safety guarantees in STL containers and the differences between implementations have also led me to not use STL containers for thread communication (using a mutex for every access is not efficient).
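Here is a sketch of the workaround described above, with hypothetical Connection and ConnectionList classes: objects are made non-copyable with the C++03 idiom (a private, unimplemented copy constructor and assignment operator), held by pointer in a std::vector, and add() takes care that a failed allocation neither throws out of it nor leaks the new object when the vector cannot grow. In C++11 and later, `= delete` and std::unique_ptr express the same thing more directly.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical object that must never be copied.
class Connection {
public:
  explicit Connection(int id) : id_(id) {}
  int id() const { return id_; }

private:
  Connection(const Connection&);             // not implemented
  Connection& operator=(const Connection&);  // not implemented
  int id_;
};

// Owns the Connection objects it points to.
class ConnectionList {
public:
  ConnectionList() {}
  ~ConnectionList() {
    for (std::size_t i = 0; i < list_.size(); ++i)
      delete list_[i];
  }

  // Returns false instead of throwing if either allocation fails.
  bool add(int id) {
    Connection* conn = new (std::nothrow) Connection(id);
    if (conn == NULL)
      return false;
    try {
      list_.push_back(conn);  // may throw std::bad_alloc while growing
    } catch (const std::bad_alloc&) {
      delete conn;  // otherwise the new object would leak
      return false;
    }
    return true;
  }

  std::size_t size() const { return list_.size(); }

  const Connection* at(std::size_t i) const {
    assert(i < list_.size());
    return list_[i];
  }

private:
  ConnectionList(const ConnectionList&);             // not implemented
  ConnectionList& operator=(const ConnectionList&);  // not implemented
  std::vector<Connection*> list_;
};
```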
Conclusion
For the sake of consistency, I’ve wondered whether it’s worth incorporating STL components at all. Is it better to have a mix or none at all? That would leave inheritance, polymorphism, member protections, namespaces, and automatic object destruction as the only C++ features being used. These are still very good reasons to use C++, but I’ve found the transition to be less productive than I had hoped. I am very curious to hear other folks’ thoughts on their experience with any of the issues above.
MySQL Conf & Drizzle Dev Day
I’m glad to announce that we’ll be having a Drizzle developer day again this year on the Friday after the MySQL Conference! Be sure to sign up and add any topic ideas you may have so we know what folks are interested in. Space is limited!
While at the MySQL Conference, I’ll be speaking with Monty Taylor on “Using Drizzle.” This will take a non-developer approach to the project, so everyday DBAs and web developers should find this interesting. I’ll also be teaming up with Giuseppe Maxia to talk about Gearman in three sessions. These include:
- Getting started with Gearman for MySQL
- Boosting Database Performance with Gearman
- Gearman MySQL hacks, or Everything you wanted to do with a database server and you never dared to hope
We’re also going to have a combo Drizzle/Gearman booth in the expo hall, so be sure to stop by and chat. See you there!
Linux Conf AU 2010
I was really excited when I had my Gearman talk accepted to Linux Conf AU 2010 because I had never been out that far in the Pacific (only Hawaii). Of course it wasn’t in Australia this year, but in Wellington, New Zealand. My wife came too, and we also made a vacation out of the down times we had around the conference. It turned out Brian couldn’t make it this year, so Monty, Stewart, and I gave the Drizzle talk. It was great to see some familiar faces, including Mark Atwood, Giuseppe Maxia, Josh Berkus, and Selena Deckelmann. Josh actually ended up being on the same flight out, so we got to catch up while going through New Zealand customs at 5am after a 13-hour flight. :)
New Zealand is an amazing place. We flew in and out of Auckland and took the train to Wellington. Once out of the metro areas, the train ride mostly consisted of grazing sheep; did you know there are more sheep than people in NZ? Beyond the sheep, there were great views along the way, especially in the middle near the larger mountains and volcanoes. We stopped for a day to hike the Tongariro Alpine Crossing. It was sunny when we started, but it was raining with 40mph winds at the top, so we didn’t get to see as much as we hoped. There were still beautiful views on each side though.
The conference was very well run, thanks to anyone who had a hand in it! The speakers dinner was at this great museum nearby on the waterfront and included live Maori singing and dancing. The vegan options were tasty, and I got to meet a few interesting folks there (like the folks from Dreamwidth, a LiveJournal-like blogging service). Some notable sessions during the conference were “The World’s Worst Inventions” by Paul Fenwick, “Anti-features” Keynote by Benjamin Mako Hill, “The Hydras GCC Static Analysis Plugins” by Taras Glek, and “Simplicity Through Optimization” by Paul McKenney. There were many other great sessions, and some I wish I could have attended.
I’m certainly going to try to go again next year, which if you didn’t hear will be in Brisbane!


