(Note: This article is adapted from Jay’s new book, MySQL Weekend Crash Course look for it at a store near you. A book store! -Ed.)
MySQL has become the database of choice for many Web developers over the last few years and for good reason. It’s fast, free, easy to use, and has great community support.
But many experienced developers refused to touch MySQL because, they complained, the product didn’t implement features that were absolutely critical in an SQL server. MySQL’s most egregious omission, according to some, was its lack of transaction support. But thanks to recent developments in MySQL land, that’s no longer the case.
When it first hit the cyber-street, MySQL offered only one table type for data storage, the ISAM table now upgraded to the MyISAM type for all recent versions of MySQL. But MyISAM tables were limited. Very limited.
Then the folks from Sleepycat Software came into the picture. Sleepycat creates and sells a database storage engine which is used mostly with embedded devices. The storage engine comes with an API that allows developers to integrate Sleepycat’s data storage software into their products. And that’s just what the folks at MySQL did, they integrated the Berkeley DB (or BDB) table from Sleepycat. This was the first transactional table type included available to MySQL users.
Berkeley DB tables were followed shortly by two other transactional table types: InnoDB and Gemini. Gemini tables are adopted from another embedded storage mechanism; this one from NuSphere, a Progress Software property. InnoDB tables were designed specifically for MySQL.
But we’re getting ahead of ourselves here. Before we take a closer look at each of these different options, we need to start at square one: Why you want transaction support in the first place.
Getting in on the Transactions
If you’re relatively new to databases, or if MySQL has been the database on which you’ve cut your teeth, you may not even know what a transaction is. Put simply, a transaction is a series of SQL statements that are executed as a single unit; either all the statements are executed completely or none are executed at all.
Why are transactions so important? Consider a fairly typical Web application: a shopping cart. In a simple shopping cart application, you’re likely to have tables that look a little something like this (only less simplified, and with real info instead of the dummy data):
products +------------+-------------------+---------------------------+---------------+ | product_id | product_name | product_description | product_price | +------------+-------------------+---------------------------+---------------+ | 1 | Old Pair of Socks | Plenty of holes | 4.25 | | 2 | Old T-shirt | Ketchup stains everywhere | 6.75 | +------------+-------------------+---------------------------+---------------+ inventory +------------+----------+ | product_id | in_stock | +------------+----------+ | 1 | 25 | | 2 | 12 | +------------+----------+ buyers +----------+-------+---------+------------+------------+ | buyer_id | fname | lname | phone | fax | +----------+-------+---------+------------+------------+ | 1 | John | Doe | 7185551414 | NULL | | 2 | Jane | Johnson | 7185551414 | 2126667777 | +----------+-------+---------+------------+------------+ orders +----------+----------+-------------+ | order_id | buyer_id | order_price | +----------+----------+-------------+ | 1 | 1 | 11.00 | +----------+----------+-------------+ order_items +----------+------------+---------------+ | order_id | product_id | product_price | +----------+------------+---------------+ | 1 | 1 | 4.25 | | 1 | 2 | 6.75 | +----------+------------+---------------+
You can see how these tables fit together in this diagram:
When it comes time to process an order, you have to run several SQL statements within a script (written in PHP, Perl, Java, or whatever language you prefer). In the script, you want to take a look at what items the buyer wants, see if there’s adequate inventory to complete the order, and if there is adequate inventory, you want to complete the order. In pseudo-code, the script used to complete the order would look something like this:
get buyer data and shopping cart data from web forms insert buyer data into buyer table start order by creating row in orders table get current order_id for each item desired check available inventory if inventory is available write item to order_items table decrement inventory endif end for loop get total for items for the order update orders table
I simplified this listing so that it’s easier to spot a potential problem. Consider what would happen if the power failed on the machine hosting the database just as it was in the middle of checking the inventory of the first item in the order. You’d restart your machine to find a row in the orders table without any child rows in the order_items table. It’s quite possible you’d be left with data that was, to a large extent, incomprehensible. You wouldn’t know what orders had been placed unless your customers sent you email, wondering just when you planned to send them the old pairs of socks they had requested not the best way to run a business.
If you had transaction support in place, however, you could treat this group of statements as a single unit. If for any reason the database failed to complete all of the statements in their entirety, the data would revert (or roll back) to the condition it was in prior to the first statement’s execution.
Of course there’s a bit more to transactions. A transaction-capable database must implement four specific properties. Wanna know what they are? Turn the page.
Are You on ACID?
Transaction-capable databases must implement four properties, collectively known by the mnemonic acronym ACID.
- Atomicity: Transactions must be singular operations. Either all statements are executed as a single (atomic) unit or none are executed at all.
- Consistency: In a transactional database, data will move from one consistent state to another. There will never be a point during a transaction when data is partly processed.
- Isolation: The dealings of one transaction will not be visible to other clients until the transaction is completed successfully or rolled back. You can be sure that the data available to one transaction is accurate because it is isolated from changes other clients might make.
- Durability: When a transaction completes successfully, the changes are permanent. Nothing, not even a disk crash or power failure, will erase the changes made by a successfully completed transaction.
SQL servers that allow transactions make use of several keywords: BEGIN WORK, COMMIT, and ROLLBACK. The phrase BEGIN WORK lets the SQL server know that the SQL statements that follow are part of a transaction. The transaction is not completed until either a COMMIT or ROLLBACK statement is executed. COMMIT writes the changes to the database. Once a transaction has been successfully COMMITed, only another successfully committed SQL statement can alter the data. No crashes or concurrently run SQL statements will effect the data. The ROLLBACK command tells the database that all of the statements within the transaction should be ignored and the database should revert to the point it was in prior to the start of the transactions. In the case of a crash, all transactions that were not expressly committed are automatically rolled back.
You can now revisit the pseudo-code first presented on the previous page. I’ve improved the listing by incorporating a transaction.
get buyer data and shopping cart data from web forms insert buyer data into buyer table BEGIN WORK start order by creating row in orders table get current order_id for each item desired check available inventory if inventory is available write item to order_items table decrement inventory else set error variable to true endif end for loop if error variable is true ROLLBACK else get total for items for the order update orders table COMMIT
Notice the ROLLBACK that I’ve added. This ensures that if inventory isn’t available to complete even a portion of a user’s request, the entire order is ignored. No row will be written to the orders or order_items table. Are you beginning to see how good transactions can be?
In a Web environment, you can expect multiple users (database clients or threads) to be accessing the script simultaneously. Therefore, the inventory of a given item will be changing continually. But when using transactions, you don’t have to worry that a user will complete an order for an item that is actually out of stock. That’s because of the I (for Isolation) in ACID. In a transactional environment, each transaction is isolated from the other, so one transaction cannot see the working of another transaction until the first one is complete.
The last sentence was a bit of a simplification, but it’s pretty close, and it’s good enough for now. In order to isolate one transaction from another, database systems must implement some sort of locking scheme. That is, the database needs a way for one client (or thread) to lock out (or isolate itself) from all other clients.
Locking is a key element of transactions. I’ll be talking about this subject extensively throughout this tutorial, starting on the next page.
Locking makes some portion of data the property of a single client. That client says, in effect, “this data here is mine, and the rest of you can only do what I expressly permit.” Locks can have one of two effects: A lock may prevent other clients from altering data (with UPDATE or DELETE statements) or a lock may prevent all access to some data preventing UPDATEs, DELETEs, and even SELECTs.
To understand locking mechanisms in MySQL, you first need to recognize that MySQL is an unusual product. It isn’t really a single, unified piece of software. Rather, it uses technology from several different sources, and the way you implement your transactions in MySQL largely depends on the table type you use. Each table-type uses a different method of locking, and the differences in those locking mechanisms will effect how you write your code.
MyISAM tables are very fast for SELECTs, but they have some serious drawbacks when it comes to locking. These shortcomings are what prevented MySQL from implementing some key database features, including transactions.
Looking at the way MyISAM struggles with locking, you really begin to appreciate the power and value of actual transactions. So before we get into BDB, Gemini, and InnoDB, let’s first take a look at the limitations of MyISAM’s table-level locking.
Using MyISAM Tables
I already mentioned that MyISAM tables (usually MySQL’s default table type) don’t support transactions. This is largely because MyISAM tables offer only table-level locking, which means that locks can only be placed on entire tables. So if you want to prevent a single row in a table from being changed, you need to prevent all rows in the table from being changed.
Take our inventory table as an example. If one client is buying an item, you’ll want to check the inventory, and if the item is there in sufficient quantity, you’ll decrement the number available after the sale. To make sure the quantity doesn’t change between the time you check the availability and the time your change the inventory, you’ll want to put a lock on that row. But because MyISAM offers only table-level locking, you’ll have to cut off access to all other rows in the table.
MyISAM offers two types of table-level locks, a read lock and a write lock. When a read lock is initiated by a client, all other clients are prevented both from making changes to the table via INSERTs, DELETEs, or UPDATES.
To see how a read-level lock works on a MyISAM table, open up two copies of the MySQL command-line client. Then create a table and insert some data with the statements below
create table inventory ( product_id int not null primary key, in_stock int not null, index index_on_in_stock(in_stock) )type=myisam; INSERT INTO inventory (product_id, in_stock) VALUES(1,25), (2,12);
Now in one client, place a read lock on the inventory table with the following command:
LOCK TABLES inventory READ;
Now in the second copy of the client run a SELECT * FROM inventory. You’ll see that the command executes just fine. However, if you try to run an UPDATE, it’s a different story. Try the following command with the lock in place.
UPDATE inventory set in_stock=24 where product_id=1;
You’ll see that this copy of the client does not respond. It’s locked out and can’t execute the command. Only the client that placed the lock can change the table. Now go back to the first copy of the command-line client and release the lock with:
Once the lock is released the second client will be free to run the UPATE command and change the row.
A write lock prevents other clients from even running SELECTs on the locked table. You can place a write lock with the following command:
LOCK TABLES inventory WRITE;
Go back and run the previous experiment but this time issue a WRITE lock. You’ll see that the other client is prevented from doing anything, even reading from the locked table with a SELECT.
So now you know all the locks that are possible with MyISAM tables. As I mentioned earlier, the folks at MySQL AB (who run MySQL development) often argue that transactions really aren’t necessary. They say that by properly applying locks and writing clever SQL, you should be able to avoid the need for transactions. Below I’ve written some pseudo-code that adds transaction-like abilities to the shopping cart pseudo-code using MyISAM tables.
INSERT query into buyers table. run last_insert_id() to get buyer_id run INSERT into orders table run last_insert_id() to get order_id get write lock on inventory table for each of the items in the order get quantity from the inventory table if quantity is > 0 insert into order_items table update inventory table subtracting the ordered item elseif quantity = 0 delete all items from order_items with current order_id delete item from orders table with current order_id update inventory table to replenish items previously subtracted set error variable to true break from for loop if error variable is not true update orders table with the current order_id, adding the order_total else output error
I might be able to clean this up a bit and remove a few of the SQL statements, but I think you see the way you’d need to go writing transaction-like code with MyISAM tables. While this code might add a degree of isolation (remember the I in ACID) by way of locks, there are other ACID properties missing here. Most notable is the lack of consistency: The data moves through several inconsistent states before the script is done. If the power goes out during one these inconsistent phases, you’ve got problems.
While ISAM locks are better than nothing, they are really no substitute for true ACID transactions. In Lesson 2 of this tutorial, I’ll discuss locking and transactions in BDB, InnoDB, and Gemini tables.
Come on back, ya’here.