In Lesson 1 of this tutorial, you leaned how to approximate transaction-like behavior with MyISAM tables. In this lesson, you’ll see how you can achieve real transactions with each of MySQL’s three transactional table types: BDB, InnoDB, and Gemini. But before I get to specifics surrounding each table, I need to revisit the notion of locking.
Remember that MyISAM supports only table-level locks of two types: read locks and write locks. The transactional tables types offer more granular locking mechanisms: They can place locks on specific subsets of data. You’ll read more about each table’s locking mechanisms as I get to them in the following pages.
Working with MySQL’s Transactional Table Types
Transactional table types can apply two types of locks. These are known as shared locks and exclusive locks. A shared lock is similar to the MyISAM read lock. When a client places a shared lock on data, other clients cannot alter that data with UPDATE or DELETE statements, but they can read the locked data via SELECTs. When a client gains an exclusive lock, other clients can neither alter the locked data via UPDATEs and DELETEs, nor can they read the data with SELECTs.
Before we get to the actual code, there’s a MySQL setting I need to mention. MySQL contains an environment variable called
autocommit. By default, autocommit is set to 1. When autocommit is set to 1, you cannot run transactions with any of the MySQL table types. In this mode, MySQL treats every SQL statement as its own transaction. To use groups of statements as transactions you need to turn autocommit mode off. So in all the code snippets that follow, you’ll notice that I start each with the line:
OK, now on to the rest of the code, starting with BDB tables.
Using BDB Tables
To check if you have BDB tables available, run the following query:
mysql> show variables like '%have%'; +---------------+-------+ | Variable_name | Value | +---------------+-------+ | have_bdb | YES | | have_innodb | YES | | have_isam | YES | | have_raid | NO | | have_symlink | YES | | have_openssl | NO | +---------------+-------+ 6 rows in set (0.00 sec)
If the first row in the listing says “no” you don’t have BDB installed. If you’ve got either InnoDB or Gemini, that’s fine — you can just use one of those, and I’ll talk about these tables soon (but please stick around, what you learn here will be useful there). If you decide to work with BDB tables, you can either install a MySQL-Max binary or recompile MySQL from source using the
--with-bdb flag. BDB tables are available for most operating systems, including Windows, Linux, and Mac OS X.
To create BDB tables, you can include
type=bdb at the end of the CREATE statement. For example:
create table inventory( product_id int not null primary key, in_stock int not null )type=bdb;
Or, if you wish to move data from a MyISAM to a BDB table, you can run an ALTER statement, like so:
ALTER table table_name type=bdb;
Note that before you run an ALTER statement like this, you should backup the table. This type of ALTER statement actually dumps the table data, creates a new BDB table, and then loads the dumped data into the table. It’s an involved process that can fail, and if it does fail you want to be protected. Also be prepared to wait a while. If you’ve got large table, the dump-and-reload process can take a bit of time.
BDB tables offer page-level locking. What’s a page? It’s a group of rows. The exact number of rows in a page varies depending on the number of rows in your tables, the operating system in use, and other factors. During a transaction, MySQL will apply a lock to a page worth of data. All other rows are unaffected by locks and ready to process whatever queries you throw at them.
Page-level locking is a vast improvement over the table-level locking used by MyISAM tables. If only groups of rows — small subsets of whole tables — are locked by a single client, that means many clients can access different portions of the table simultaneously. In fact, it’s acceptable, even commonplace, to have many clients applying various locks to different portions of a table at the same time. The simultaneous access of data by different clients is usually referred to as concurrency. In database land (where the men have hairless backs and the women shapely legs) the greater the concurrency the better.
One of the great things about transactions is that for the most part you don’t need to worry about manually applying the locks. Just start a transaction, run your SQL statements as you normally would, and the clever MySQL engine will mostly take care of the locking for you. Neat-o!
Take a look at a simple transaction in a BDB table working with the following table:
mysql> select * from inventory; +------------+----------+ | product_id | in_stock | +------------+----------+ | 1 | 24 | | 2 | 47 | +------------+----------+ 2 rows in set (0.05 sec)
To see the effect of a transaction, open up two copies of the command-line client. In one, start a transaction and run a SELECT on the first row:
mysql> set autocommit=0; mysql> begin work; mysql> select * from inventory where product_id=1;
Now in the second client, start a transaction and try to update the row 1.
mysql> set autocommit=0; mysql> begin work; mysql> update inventory set in_stock=23 where product_id=1;</code>
You’ll see that the second client, the one trying to do the UPDATE, doesn’t respond. That’s because the first client has a share lock on that row. To release the lock, commit the transaction in the first client:
You’ll notice that the second client is immediately freed to run the update operation.
Take a look at the following improvement to the pseudo-code I used in Lesson 1 of the tutorial. Note the use of the BEGIN WORK-COMMIT/ROLLBACK syntax.
set autocommit=0 INSERT query into buyers table. run last_insert_id() to get user_id BEGIN WORK run INSERT into orders table run last_insert_id() to get order_id for each of the items in the order get quantity from the inventory table if quantity is > 0 insert into order_items table update inventory table subtracting the ordered item elseif quantity = 0 set error variable if error variable is not set update orders table with the current order_id, adding the order_total COMMIT else ROLLBACK output error
Pretty nice, eh? Much cleaner than the MyISAM transaction-equivalent, I think. Keep in mind that you get all the ACID properties with this code if you’re using BDB tables. There is only one line of this code that doesn’t really work well and needs to be improved.
Notice that for each item in the order, I first check the quantity from the
inventory table. That quantity will come from a SELECT statement, something like
SELECT in_stock FROM inventory WHERE product_id=1. But in most cases, the inventory is going to change just after this SELECT is run. If there’s sufficient quantity in stock, this person is going to buy the item and therefore the quantity listed in the table will decrement by one when the transaction is complete. You’d hate to see other clients get data that will be inaccurate in a matter of milliseconds.
In a case like this, you’d be much better off putting an exclusive lock on a page containing the row in question until the transaction is complete. Remember, with an exclusive lock, all other clients are forbidden from even reading the locked rows. But unfortunately, BDB tables have no way to execute exclusive locks on pages of data. Only by instituting a table-wide write lock can you safeguard the data in the inventory table.
When revised with the table-wide write lock the pseudo-code looks like this:
set autocommit=0 INSERT query into buyers table. run last_insert_id() to get user_id BEGIN WORK run INSERT into orders table run last_insert_id() to get order_id LOCK TABLES inventory WRITE for each of the items in the order get quantity from the inventory table if quantity is > 0 insert into order_items table update inventory table subtracting the ordered item elseif quantity = 0 set error variable UNLOCK TABLES; if error variable is not set update orders table with the current order_id, adding the order_total COMMIT else ROLLBACK
The LOCK TABLES command here is a bit messy. But with the other two MySQL tables you won’t have to worry about this kind of lock.
Working with Gemini Tables
It’s difficult to talk about Gemini tables without touching on the controversy surrounding the maker of the Gemini table, NuSphere. The NuSphere corporation is currently in a legal dispute with MySQL AB, the folks who code the MySQL core engine. The issues include copyright infringement and abuse of the GNU GPL. You may want to read further about these issues before choosing the Gemini table type.
Gemini offers a more granular level of locking than BDB. With Gemini (and InnoDB, which I’ll discuss momentarily), you get the advantage of row-level locking. With Gemini, you can apply both share and exclusive locks to one row at a time. A row-level locking scheme will generally achieve higher concurrency than page-level locking, and much greater concurrency than table-level locking.
You can check if you have Gemini installed by running
show variables like '%have%'. And if you don’t have Gemini, you can get an installer that includes MySQL, PHP, mod_perl and other goodies from the NuSphere website. Note that the most current release of the Gemini table seems to have some bugs running on Windows 98, so if any of the following experiments crash your NuSphere MySQL installation on Windows 98, don’t blame me.
To create Gemini tables include the
type=gemini syntax at the end of the CREATE statement, and to transfer a current table to Gemini, run an ALTER TABLE query. The same caveats I mentioned for BDB apply here. For example:
create table inventory( product_id int not null primary key, in_stock int not null )type=gemini; ALTER table table_name type=gemini;
Unlike BDB, Gemini tables implement both shared and exclusive locks. If a thread runs an UPDATE or DELETE statement on a row, Gemini will place an exclusive lock on that row, preventing other threads from either reading or altering the locked row until the transaction is complete. If you think about it, this makes perfect sense. If a row is about to change, other threads should be protected from seeing potentially inaccurate data.
If a thread runs a SELECT statement on a row, a shared lock will be placed on that row so other threads can read the row with SELECTs but they won’t be able to alter the row with UPDATE or DELETE statements.
You can do some quick experiments with Gemini tables to see how this locking works. Assume you have the following Gemini table.
mysql> select * from inventory; +------------+----------+ | product_id | in_stock | +------------+----------+ | 1 | 24 | | 2 | 47 | +------------+----------+ 2 rows in set (0.05 sec)
Open two copies of the command-line client. In the first start a transaction, then run a SELECT on row 1.
mysql> set autocommit=0; mysql> begin work; mysql> SELECT * FROM inventory WHERE product_id=1;
Now in the second client, start a transaction and try two statements, a SELECT and an UPDATE.
mysql> set autocommit=0; mysql> begin work; mysql> SELECT * FROM inventory WHERE product_id=1; mysql> UPDATE inventory SET in_stock=23 WHERE product_id=1;
You’ll see that the SELECT executes immediately, but the UPDATE doesn’t execute. The UPDATE won’t execute until you COMMIT or ROLLBACK the transaction started in the first client, which releases the shared lock.
You can re-run this experiment trying an UPDATE in the first client. You’ll notice that the second client is prevented from reading the locked row with a SELECT. When a client runs an UPDATE statement, it gets an exclusive lock on that row.
When I discussed BDB tables, I pointed out that at times when inventory numbers are changing fast, you need a SELECT to obtain an exclusive lock so your users access the most up-to-date information. When you run a SELECT on the
inventory table to find out the number of items currently in stock, you’re better off locking out all other threads from reading that row because that number is likely to change very quickly — an UPDATE will almost always follow that SELECT.
With Gemini tables, you can get a exclusive lock while running a SELECT by using the FOR UPDATE keywords. For example.
SELECT * FROM inventory WHERE product_id=1 FOR UPDATE
Using this added power, the shopping cart pseudo-code can be re-written like so:
set autocommit=0 INSERT query into buyers table. run last_insert_id() to get user_id BEGIN WORK run INSERT into orders table run last_insert_id() to get order_id for each of the items in the order get quantity from the inventory table with SELECT ... FOR UPDATE if quantity is > 0 insert into order_items table update inventory table subtracting the ordered item elseif quantity = 0 set error variable if error variable is not set update orders table with the current order_id, adding the order_total COMMIT else ROLLBACK
This is much cleaner, and much safer, than non-transactional code. And you can do something similar with InnoDB tables.
Using InnoDB Tables
InnoDB offers row-level locking and works on a wide variety of operating systems, including most UNIX variants, MacOS X, and Windows. This table type is now included by default in most MySQL installations, so if you get current binary from mysql.com, you’ll have InnoDB tables. You can check if you have InnoDB installed by running
SHOW VARIABLES LIKE '%have%';.
You can create InnoDB tables by using
type=innodb in the CREATE statement or by running an ALTER TABLE command. For example:
create table inventory( product_id int not null primary key, in_stock int not null )type=innodb; ALTER table table_name type=innodb;
The InnoDB transaction model is not quite as straightforward as the Gemini model. If you don’t understand how InnoDB implements transactions, you could end up with some serious problems with your data.
InnoDB uses what’s called a multi-versioning concurrency model. Essentially, this means that each transaction is locked to a view of the data at a specific point in time. When a thread initiates a transaction, InnoDB gets a snapshot of the data based on the exact moment the transaction started. To understand how this works, take a look at the following example. You have a simple InnoDB table:
mysql> select * from inventory; +------------+----------+ | product_id | in_stock | +------------+----------+ | 1 | 24 | | 2 | 47 | +------------+----------+
In one copy of the command line client, you start a transaction:
mysql> set AUTOCOMMIT=0; mysql> begin work;
This thread now has a snapshot of the table seen above. If you were to then open another copy of the client and start another transaction, that second client would have the same snapshot of the same table. At this point, the first client runs an UPDATE statement and commits the transaction.
mysql> UPDATE inventory set in_stock=23 WHERE product_id=1 mysql> commit;
Even after this COMMIT, the second client will not see the results of the UPDATE. The second client receives a snapshot of the data at the moment its transaction began. Until the second client finishes its transaction (with a COMMIT or ROLLBACK), it will not be able to view the changes implemented by other threads that took place while the transaction was active.
Things get really sticky if the second client was also running an UPDATE on the same row. If the first and second clients start their transactions at the same time, they would get the same snapshots of data. The first client would run an update,
WHERE order_id=3, but the second client would be unaware of the change. The second client could then run its own update,
UPDATE WHERE order_id=3. After both of these clients commit their transactions, only one of the values can survive. As it turns out, the last one to COMMIT wins.
As you can see, this is potentially a dangerous situation. If you’re not careful, some of your threads could see out-of-date information.
But there are steps you can take to make sure the rows obtained by your transactions are up to date. It may not surprise you that staying current involves locking. With InnoDB, threads can apply the same types of locks that were available in Gemini.
In Gemini tables, when you run a SELECT within a transaction, a share lock is placed on the rows read by the SELECT. However, by default, InnoDB does not place any locks on rows during a SELECT. But you can let InnoDB know that you want to obtain a share lock by the adding the clause LOCK IN SHARE MODE at the end of a SELECT.
When a thread issues a SELECT with a LOCK IN SHARE MODE clause, InnoDB goes beyond the snapshot of data taken at the start of the transaction and looks for the latest committed data. If another thread is operating on that data (via an UPDATE, DELETE, or another LOCK IN SHARE MODE), the first thread won’t be able to obtain the share lock and will wait until the other thread finishes its transaction and releases its locks. By attempting to obtain the share lock and receiving the latest committed data, the LOCK IN SHARE MODE clause ensures that the data read by a SELECT will be accurate and up to date.
If you’d like to see how this works, try the following, using the same table shown above. In one client, start a transaction and then run an UPDATE statement:
mysql> set autocommit=0; mysql> begin work; mysql> update inventory set in_stock=22 where product_id=1;
Now in the second client, run two different SELECTs. The first you will run without the LOCK IN SHARE MODE clause. Try
SELECT * FROM inventory. This will execute immediately, but the data is potentially inaccurate because there’s no telling if the first client will COMMIT or ROLLBACK the transaction. If you alter that SELECT statement to include the LOCK IN SHARE MODE clause, the client will not respond until the first client issues a COMMIT or ROLLBACK.
Threads can obtain exclusive locks in SELECT statements by using the same FOR UPDATE clause that is used with Gemini.
The pseudo-code shown for Gemini will also work for InnoDB tables. But be very careful when writing your SELECT statements for InnoDB. You want to make sure you’re getting the latest, most accurate data.
The Right Choice for You
At this point, InnoDB is the most logical choice for most. It offers row-level locking, is available on most every operating system, and is getting a lot of support from MySQL AB. InnoDB is also the first table type to enforce foreign key constraints. But if you’re working with an installation that has support of one of the other transactional tables, both of those will work just fine.
When you decide on the table type you wish to work with, you’ll have more technical matters to consider. You should read everything the MySQL manual has to say on the table type you end up using.
Pay especially careful attention to your logs. The transactional tables use logs to help recover from crashes. If a hard-disk failure causes a crash on your system, you could be in trouble if your logs are on the same hard disk as your MySQL data.
If you’re unhappy with the types of locks initiated by specific statements, you can adjust the locking behavior of Gemini and BDB table with the SET TRANSACTION command. I won’t cover it in detail here because the defaults discussed in this tutorial are usually preferable. Note that InnoDB tables are unaffected by SET TRANSACTION command.
Transactions are a great improvement to the MySQL database. If you’re not using them now, consider using them in the future. Because when that backhoe works it way through your power cord, you’ll be awfully glad you did.