how MySQL Query Optimization works to run faster?

By: Guest
Date: 0000-00-00-00:00:00-
Response
7
Database management systems implement abstract concepts but do so on real hardware bound by real physical constraints. As a result, queries take time—sometimes an annoyingly long time. Find out how to minimize your wait in this sample chapter.

The world of relational database theory is a world dominated by tables and sets, and operations on tables and sets. A database is a set of tables, and a table is a set of rows and columns. When you issue a SELECT statement to retrieve rows from a table, you get back another set of rows and columns—that is, another table. These are abstract notions that make no reference to the underlying representation a database system uses to operate on the data in your tables. Another abstraction is that operations on tables happen all at once; queries are conceptualized as set operations and there is no concept of time in set theory.

The real world, of course, is quite different. Database management systems implement abstract concepts but do so on real hardware bound by real physical constraints. As a result, queries take time—sometimes an annoyingly long time. And we, being impatient creatures, don't like to wait, so we leave the abstract world of instantaneous mathematical operations on sets and look around for ways to speed up our queries. Fortunately, there are several techniques for doing so:

*

We index tables to allow the database server to look up rows more quickly.

*

We consider how to write queries to take advantage of those indexes to the fullest extent, and use the EXPLAIN statement to check whether the MySQL server really is doing so.

*

We write queries to affect the server's scheduling mechanism so that queries arriving from multiple clients cooperate better.

*

We modify the server's operating parameters to get it to perform more efficiently. We think about what's going on with the underlying hardware and how we can work around its physical constraints to improve performance.

Those are the kinds of issues that this chapter focuses on, with the goal of assisting you in optimizing the performance of your database system so that it processes your queries as quickly as possible. MySQL is already quite fast, but even the fastest database can run queries more quickly if you help it do so.

Using Indexing

Indexing is the most important tool you have for speeding up queries. Other techniques are available to you, too, but generally the one thing that makes the most difference is the proper use of indexes. On the MySQL mailing list, people often ask for help in making a query run faster. In a surprisingly large number of cases, there are no indexes on the tables in question, and adding indexes often solves the problem immediately. It doesn't always work like that, because optimization isn't always simple. Nevertheless, if you don't use indexes, in many cases you're just wasting your time trying to improve performance by other means. Use indexing first to get the biggest performance boost and then see what other techniques might be helpful.

This section describes what an index is and how indexing improves query performance. It also discusses the circumstances under which indexes might degrade performance and provides guidelines for choosing indexes for your table wisely. In the next section, we'll discuss MySQL's query optimizer that attempts to find the most efficient way to execute queries. It's good to have some understanding of the optimizer in addition to knowing how to create indexes because then you'll be better able to take advantage of the indexes you create. Certain ways of writing queries actually prevent your indexes from being useful, and generally you'll want to avoid having that happen.

Benefits of Indexing

Let's consider how an index works by beginning with a table that has no indexes. An unindexed table is simply an unordered collection of rows. For example, Figure 4.1 shows the ad table that was discussed in Chapter 1, "Getting Started with MySQL and SQL." There are no indexes on this table, so to find the rows for a particular company, it's necessary to examine each row in the table to see if it matches the desired value. This involves a full table scan, which is slow, as well as tremendously inefficient if the table is large but contains only a few records that match the search criteria.

Figure 4.2 shows the same table, but with the addition of an index on the company_num column in the ad table. The index contains an entry for each row in the ad table, but the index entries are sorted by company_num value. Now, instead of searching through the table row by row looking for items that match, we can use the index. Suppose that we're looking for all rows for company 13. We begin scanning the index and find three values for that company. Then we reach the index value for company 14, which is higher than the one we're looking for. Index values are sorted, so when we read the index record containing 14, we know we won't find any more matches and can quit looking. Thus, one efficiency gained by using the index is that we can tell where the matching rows end and can skip the rest. Another efficiency comes about through the use of positioning algorithms for finding the first matching entry without doing a linear scan from the start of the index (for example, a binary search is much quicker than a scan). That way, we can quickly position to the first matching value and save a lot of time in the search. Databases use various techniques for positioning to index values quickly, but it's not so important here what those techniques are. What's important is that they work and that indexing is a good thing.

Figure 4.1

Figure 4.1 Unindexed ad table.

Figure 4.2

Figure 4.2 Indexed ad table.

You might be asking why we don't just sort the data rows and dispense with the index. Wouldn't that produce the same type of improvement in search speed? Yes, it would—if the table had a single index. But you might want to add a second index, and you can't sort the data rows two different ways at once. (For example, you might want one index on customer names and another on customer ID numbers or phone numbers.) Using indexes as entities separate from the data rows solves the problem and allows multiple indexes to be created. In addition, rows in the index are generally shorter than data rows. When you insert or delete new values, it's easier to move around shorter index values to maintain the sort order than to move around the longer data rows.

The particular details of index implementations vary for different MySQL storage engines. For example, for a MyISAM table, the table's data rows are kept in a data file, and index values are kept in an index file. You can have more than one index on a table, but they're all stored in the same index file. Each index in the index file consists of a sorted array of key records that are used for fast access into the data file.

By contrast, the BDB and InnoDB storage engines do not separate data rows and index values in the same way, although both maintain indexes as sets of sorted values. By default, the BDB engine uses a single file per table to store both data and index values. The InnoDB engine uses a single tablespace within which it manages data and index storage for all InnoDB tables. InnoDB can be configured to create each table with its own tablespace, but even so, a table's data and indexes are stored in the same tablespace file.

The preceding discussion describes the benefit of an index in the context of single-table queries, where the use of an index speeds searches significantly by eliminating the need for full table scans. Indexes are even more valuable when you're running queries involving joins on multiple tables. In a single-table query, the number of values you need to examine per column is the number of rows in the table. In a multiple-table query, the number of possible combinations skyrockets because it's the product of the number of rows in the tables.

Suppose that you have three unindexed tables, t1, t2, and t3, each containing a column i1, i2, and i3, respectively, and each consisting of 1,000 rows that contain the numbers 1 through 1000. A query to find all combinations of table rows in which the values are equal looks like this:

SELECT t1.i1, t2.i2, t3.i3

FROM t1, t2, t3

WHERE t1.i1 = t2.i2 AND t2.i1 = t3.i3;

The result of this query should be 1,000 rows, each containing three equal values. If we process the query in the absence of indexes, we have no idea which rows contain which values without scanning them all. Consequently, we must try all combinations to find the ones that match the WHERE clause. The number of possible combinations is 1,000 x 1,000 x 1,000 (one billion!), which is a million times more than the number of matches. That's a lot of wasted effort. The example illustrates that as tables grow, the time to process joins on those tables grows even more if no indexes are used, leading to very poor performance. We can speed things up considerably by indexing the tables, because the indexes allow the query to be processed like this:

1.

Select the first row from table t1 and see what value the row contains.

2.

Using the index on table t2, go directly to the row that matches the value from t1. Similarly, using the index on table t3, go directly to the row that matches the value from t2.

3.

Proceed to the next row of table t1 and repeat the preceding procedure. Do this until all rows in t1 have been examined.

In this case, we still perform a full scan of table t1, but we can do indexed lookups on t2 and t3 to pull out rows from those tables directly. The query runs about a million times faster this way—literally. This example is contrived for the purpose of making a point, of course. Nevertheless, the problems it illustrates are real, and adding indexes to tables that have none often results in dramatic performance gains.

MySQL uses indexes in several ways:

*

As just described, indexes are used to speed up searches for rows matching terms of a WHERE clause or rows that match rows in other tables when performing joins.

*

For queries that use the MIN() or MAX() functions, the smallest or largest value in an indexed column can be found quickly without examining every row.

*

MySQL can often use indexes to perform sorting and grouping operations quickly for ORDER BY and GROUP BY clauses.

*

Sometimes MySQL can use an index to reading all the information required for a query. Suppose that you're selecting values from an indexed numeric column in a MyISAM table, and you're not selecting other columns from the table. In this case, when MySQL reads an index value from the index file, it obtains the same value that it would get by reading the data file. There's no reason to read values twice, so the data file need not even be consulted.

Costs of Indexing

In general, if MySQL can figure out how to use an index to process a query more quickly, it will. This means that, for the most part, if you don't index your tables, you're hurting yourself. You can see that I'm painting a rosy picture of the benefits of indexing. Are there disadvantages? Yes, there are. There are costs both in time and in space. In practice, these drawbacks tend to be outweighed by the advantages, but you should know what they are.

First, indexes speed up retrievals but slow down inserts and deletes, as well as updates of values in indexed columns. That is, indexes slow down most operations that involve writing. This occurs because writing a record requires writing not only the data row, it requires changes to any indexes as well. The more indexes a table has, the more changes need to be made, and the greater the average performance degradation. In the section "Loading Data Efficiently," we'll go into more detail about this phenomenon and what you can do about it.

Second, an index takes up disk space, and multiple indexes take up correspondingly more space. This might cause you to reach a table size limit more quickly than if there are no indexes:

*

For a MyISAM table, indexing it heavily may cause the index file to reach its maximum size more quickly than the data file.

*

For BDB tables, which store data and index values together in the same file, adding indexes causes the table to reach the maximum file size more quickly.

*

All InnoDB tables that are located within the InnoDB shared tablespace compete for the same common pool of space, and adding indexes depletes storage within this tablespace more quickly. However, unlike the files used for MyISAM and BDB tables, the InnoDB shared tablespace is not bound by your operating system's file-size limit, because it can be configured to use multiple files. As long as you have additional disk space, you can expand the tablespace by adding new components to it.

InnoDB tables that use individual tablespaces are constrained the same way as BDB tables because data and index values are stored together in a single file.

The practical implication of both these factors is that if you don't need a particular index to help queries perform better, don't create it.

Choosing Indexes

The syntax for creating indexes is covered in the section "Creating Indexes," of Chapter 2, "MySQL SQL Syntax and Use." I assume here that you've read that section. But knowing syntax doesn't in itself help you determine how your tables should be indexed. That requires some thought about the way you use your tables. This section gives some guidelines on how to identify candidate columns for indexing and how best to set up indexes:

Index columns that you use for searching, sorting, or grouping, not columns you only display as output. In other words, the best candidate columns for indexing are the columns that appear in your WHERE clause, columns named in join clauses, or columns that appear in ORDER BY or GROUP BY clauses. Columns that appear only in the output column list following the SELECT keyword are not good candidates:

SELECT

col_a <- not a candidate

FROM

tbl1 LEFT JOIN tbl2

ON tbl1.col_b = tbl2.col_c <- candidates

WHERE

col_d = expr; <- a candidate

The columns that you display and the columns you use in the WHERE clause might be the same, of course. The point is that appearance of a column in the output column list is not in itself a good indicator that it should be indexed.

Columns that appear in join clauses or in expressions of the form col1 = col2 in WHERE clauses are especially good candidates for indexing. col_b and col_c in the query just shown are examples of this. If MySQL can optimize a query using joined columns, it cuts down the potential table-row combinations quite a bit by eliminating full table scans.

Consider column cardinality. The cardinality of a column is the number of distinct values that it contains. For example, a column that contains the values 1, 3, 7, 4, 7, and 3 has a cardinality of four. Indexes work best for columns that have a high cardinality relative to the number of rows in the table (that is, columns that have many unique values and few duplicates). If a column contains many different age values, an index will differentiate rows readily. An index will not help for a column that is used to record sex and contains only the two values 'M' and 'F'. If the values occur about equally, you'll get about half of the rows whichever value you search for. Under these circumstances, the index might never be used at all, because the query optimizer generally skips an index in favor of a full table scan if it determines that a value occurs in a large percentage of a table's rows. The conventional wisdom for this percentage used to be "30%." Nowadays the optimizer is more complex and takes other factors into account, so the percentage is not the sole determinant of when MySQL prefers a scan over using an index.

Index short values. Use smaller data types when possible. For example, don't use a BIGINT column if a MEDIUMINT is large enough to hold the values you need to store. Don't use CHAR(100) if none of your values are longer than 25 characters. Smaller values improve index processing in several ways:

*

Shorter values can be compared more quickly, so index lookups are faster.

*

Smaller values result in smaller indexes that require less disk I/O.

*

With shorter key values, index blocks in the key cache hold more key values. MySQL can hold more keys in memory at once, which improves the likelihood of locating key values without reading additional index blocks from disk.

For the InnoDB and BDB storage engines that use clustered indexes, it's especially beneficial to keep the primary key short. A clustered index is one where the data rows are stored together with (that is, clustered with) the primary key values. Other indexes are secondary indexes; these store the primary key value with the secondary index values. A lookup in a secondary index yields a primary key value, which then is used to locate the data row. The implication is that primary key values are duplicated into each secondary index, so if primary key values are longer, the extra storage is required for each secondary index as well.

Index prefixes of string values. If you're indexing a string column, specify a prefix length whenever it's reasonable to do so. For example, if you have a CHAR(200) column, don't index the entire column if most values are unique within the first 10 or 20 characters. Indexing the first 10 or 20 characters will save a lot of space in the index, and probably will make your queries faster as well. By indexing shorter values, you gain the advantages described in the previous item relating to comparison speed and disk I/O reduction. You want to use some common sense, of course. Indexing just the first character from a column isn't likely to be that helpful because then there won't be very many distinct values in the index.

You can index prefixes of CHAR, VARCHAR, BINARY, VARBINARY, BLOB, and TEXT columns. The syntax is described in "Creating Indexes," in Chapter 2.

Take advantage of leftmost prefixes. When you create an n-column composite index, you actually create n indexes that MySQL can use. A composite index serves as several indexes because any leftmost set of columns in the index can be used to match rows. Such a set is called a "leftmost prefix." (This is different from indexing a prefix of a column, which is using the first n characters of the column for index values.)

Suppose that you have a table with a composite index on columns named state, city, and zip. Rows in the index are sorted in state/city/zip order, so they're automatically sorted in state/city order and in state order as well. This means that MySQL can take advantage of the index even if you specify only state values in a query, or only state and city values. Thus, the index can be used to search the following combinations of columns:

state, city, zip

state, city

state

MySQL cannot use the index for searches that don't involve a leftmost prefix. For example, if you search by city or by zip, the index isn't used. If you're searching for a given state and a particular ZIP code (columns 1 and 3 of the index), the index can't be used for the combination of values, although MySQL can narrow the search using the index to find rows that match the state.

Don't over-index. Don't index everything in sight based on the assumption "the more, the better." That's a mistake. Every additional index takes extra disk space and hurts performance of write operations, as has already been mentioned. Indexes must be updated and possibly reorganized when you modify the contents of your tables, and the more indexes you have, the longer this takes. If you have an index that is rarely or never used, you'll slow down table modifications unnecessarily. In addition, MySQL considers indexes when generating an execution plan for retrievals. Creating extra indexes creates more work for the query optimizer. It's also possible (if unlikely) that MySQL will fail to choose the best index to use when you have too many indexes. Maintaining only the indexes you need helps the query optimizer avoid making such mistakes.

If you're thinking about adding an index to a table that is already indexed, consider whether the index you're thinking about adding is a leftmost prefix of an existing multiple-column index. If so, don't bother adding the index because, in effect, you already have it. (For example, if you already have an index on state, city, and zip, there is no point in adding an index on state.)

Match index types to the type of comparisons you perform. When you create an index, most storage engines choose the index implementation they Match index types to the type of comparisons you perform. When you create an index, most storage engines choose the index implementation they will use. For example, InnoDB always uses B-tree indexes. MySQL also uses B-tree indexes, except that it uses R-tree indexes for spatial data types. However, the MEMORY storage engine supports hash indexes and B-tree indexes, and allows you to select which one you want. To choose an index type, consider what kind of comparison operations you plan to perform on the indexed column:

*

For a hash index, a hash function is applied to each column value. The resulting hash values are stored in the index and used to perform lookups. (A hash function implements an algorithm that is likely to produce distinct hash values for distinct input values. The advantage of using hash values is that they can be compared more efficiently than the original values.) Hash indexes are very fast for exact-match comparisons performed with the = or <=> operators. But they are poor for comparisons that look for a range of values, as in these expressions:

id < 30

weight BETWEEN 100 AND 150

*

B-tree indexes can be used effectively for comparisons involving exact or range-based comparisons that use the <, <=, =, >=, >, <>, !=, and BETWEEN operators. B-tree indexes can also be used for LIKE pattern matches if the pattern begins with a literal string rather than a wildcard character.

If you use a MEMORY table only for exact-value lookups, a hash index is a good choice. This is the default index type for MEMORY tables, so you need do nothing special. If you need to perform range-based comparisons with a MEMORY table, you should use a B-tree index instead. To specify this type of index, add USING BTREE to your index definition. For example:

CREATE TABLE lookup

(

id INT NOT NULL,

name CHAR(20),

PRIMARY KEY USING BTREE (id)

) ENGINE = MEMORY;

If the types of statements that you expect to execute warrant it, a single MEMORY table can have both hash indexes and B-tree indexes, even on the same column.

Some types of comparisons cannot use indexes. If you perform comparisons only by passing column values to a function such as STRCMP(), there is no value in indexing it. The server must evaluate the function value for each row, which precludes use of an index on the column.

Use the slow-query log to identify queries that may be performing badly. This log can help you find queries that might benefit from indexing. You can view this log directly (it is written as a text file), or use the mysqldumpslow utility to summarize its contents. (See Chapter 11, "General MySQL Administration," for a discussion of MySQL's log files.) If a given query shows up over and over in the slow-query log, that's a clue you've found a query that might not be written optimally. You may be able to rewrite it to make it run more quickly. Keep in mind when assessing your slow-query log that "slow" is measured in real time, so more queries will show up in the slow-query log on a heavily loaded server than on a lightly loaded one.

[d] By: Guest
Date: 0000-00-00-00:00:00--
Response
2
Improving Database Performance with Partitioning

Improving Database Performance with Partitioning

A few years ago, I wrote an article entitled "The Foundation of Excellent Performance" (still available at http://www.tdan.com/i016fe03.htm) where I argued against the notion that SQL code was the number one contributor to performance in a database-driven system. Instead, I stated in the article that I firmly believed how good physical database design was far and away the leading component of superior database performance. In addition, I showed that Oracle's own research illustrated how poor design was the main culprit behind database downtime (planned or unplanned). In the years since then, I've not changed my stance and still think that any DBA who wants a high-performance database has got to invest in intelligent and savvy physical design to produce the kind of response times that make end users smile instead of scream.

One of the reasons I'm very excited about the release of MySQL 5.1 is that it contains a potent new weapon for designing supercharged databases that any MySQL DBA should quickly learn how to use and exploit. By smartly using the new 5.1 partitioning feature, a DBA can oftentimes dramatically improve the performance of most any VLDB or data warehouse they happen to be in charge of.

What is Partitioning?

Partitioning is a physical database design technique that many data modelers and DBAs are quite familiar with. Although partitioning can be used to accomplish a number of various objectives, the main goal is to reduce the amount of data read for particular SQL operations so that overall response time is reduced.

There are two major forms of partitioning:

1. Horizontal Partitioning - this form of partitioning segments table rows so that distinct groups of physical row-based datasets are formed that can be addressed individually (one partition) or collectively (one-to-all partitions). All columns defined to a table are found in each set of partitions so no actual table attributes are missing. An example of horizontal partitioning might be a table that contains ten years worth of historical invoice data being partitioned into ten distinct partitions, where each partition contains a single year's worth of data.

2. Vertical Partitioning - this partitioning scheme is traditionally used to reduce the width of a target table by splitting a table vertically so that only certain columns are included in a particular dataset, with each partition including all rows. An example of vertical partitioning might be a table that contains a number of very wide text or BLOB columns that aren't addressed often being broken into two tables that has the most referenced columns in one table and the seldom-referenced text or BLOB data in another.

Before database vendors began building partitioning (mainly horizontal) into their engines, DBAs and data modelers had to physically design separate table structures to hold the desired partitions, which either held redundant data (separate tables with data that were based off a live parent table) or were linked together to form one logical parent object (usually via a view). This practice has since been made obsolete for the most part for horizontal partitioning, although it is sometimes still done for vertical partitioning.

Partitioning in MySQL 5.1

One of the great new features in MySQL 5.1 is support for horizontal partitioning. The really good news about MySQL and the new 5.1 partitioning feature is all the major forms of partitioning are supported:

1. Range - this partitioning mode allows a DBA to specify various ranges for which data is assigned. For example, a DBA may create a partitioned table that is segmented by three partitions that contain data for the 1980's, 1990's, and everything beyond and including the year 2000.

2. Hash - this partitioning mode allows a DBA to separate data based on a computed hash key that is defined on one or more table columns, with the end goal being an equal distribution of values among partitions. For example, a DBA may create a partitioned table that has ten partitions that are based on the table's primary key.

3. Key - a special form of Hash where MySQL guarantees even distribution of data through a system-generated hash key.

4. List - this partitioning mode allows a DBA to segment data based on a pre-defined list of values that the DBA specifies. For example, a DBA may create a partitioned table that contains three partitions based on the years 2004, 2005, and 2006.

5. Composite - this final partitioning mode allows a DBA to perform sub-partitioning where a table is initially partitioned by, for example range partitioning, but then each partition is segmented even further by another method (for example, hash).

There are a number of benefits that come with partitioning, but the two main advantages are:

Increased performance - during scan operations, the MySQL optimizer knows what partitions contain the data that will satisfy a particular query and will access only those necessary partitions during query execution. For example, a million row table may be broken up into ten different partitions in range style so that each partition contains 100,000 rows. If a query is issued that only needs data from one of the partitions, and a table scan operation is necessary, only 100,000 rows will be accessed instead of a million. Obviously, it is much quicker for MySQL to sample 100,000 rows than one million so the query will complete much sooner. The same benefit is derived should index access be possible as local partitioned indexes are created for partitioned tables. Finally, it is possible to stripe a partitioned table across different physical drives by specifying different file system/directory paths for specific partitions. This allows physical I/O contention to be reduced when multiple partitions are accessed at the same time.

Simplified data management - partitioning allows a DBA to have more control over how data is managed inside of the database. By intelligently creating partitions, a DBA can simplify how certain data operations are performed. For example, a DBA can drop specific partitions in a partitioned table while the remaining partitions remain intact (as opposed to crafting a fragmentation-producing mass delete operation for the whole table). Further, partitions are automatically maintained by MySQL so the DBA doesn't have to manually separate and maintain a horizontal partitioning scheme for a table. For example, a DBA can create a history table that holds data for customers that are partitioned across various year ranges, and have those partitioned automatically enforced by the database server with no DBA intervention being necessary.

From a design-for-performance standpoint, we're mainly interested in point one above. By smartly using partitioning and matching the design to properly coded queries, a dramatic performance impact can be realized. Let's take a quick test drive of partitioning in MySQL 5.1 to see this in action. Note that all tests below were done on a Dell Optiplex box with a Pentium 4 3.00GHz processor, 1GB of RAM, running Fedora Core 4 and MySQL 5.1.6 alpha.

Partitioning in Action

To see the positive benefit partitioning can have on a database, let's create identical MyISAM tables that contain date sensitive information, but let's partition one and leave the other a standard heap table. For our partitioned table, we'll partition based on range and use a function that segments the data based on year:

mysql> CREATE TABLE part_tab

-> ( c1 int default NULL,

-> c2 varchar(30) default NULL,

-> c3 date default NULL

->

-> ) engine=myisam

-> PARTITION BY RANGE (year(c3)) (PARTITION p0 VALUES LESS THAN (1995),

-> PARTITION p1 VALUES LESS THAN (1996) , PARTITION p2 VALUES LESS THAN (1997) ,

-> PARTITION p3 VALUES LESS THAN (1998) , PARTITION p4 VALUES LESS THAN (1999) ,

-> PARTITION p5 VALUES LESS THAN (2000) , PARTITION p6 VALUES LESS THAN (2001) ,

-> PARTITION p7 VALUES LESS THAN (2002) , PARTITION p8 VALUES LESS THAN (2003) ,

-> PARTITION p9 VALUES LESS THAN (2004) , PARTITION p10 VALUES LESS THAN (2010),

-> PARTITION p11 VALUES LESS THAN MAXVALUE );

Query OK, 0 rows affected (0.00 sec)

Notice that we designed partitions for a particular year and finished with one catch-all partition to get anything that doesn't fall into any of the specific date partitions. Now let's create a mirror MyISAM table that's not partitioned:

mysql> create table no_part_tab

-> (c1 int(11) default NULL,

-> c2 varchar(30) default NULL,

-> c3 date default NULL) engine=myisam;

Query OK, 0 rows affected (0.02 sec)

Now let's create a procedure (thanks to Peter Gulutzan for the code…) that will fill our partitioned table with 8 million rows that distributes data fairly evenly across the various partitions. Once filled, we'll then insert the same data into our non-partitioned MyISAM clone table:

mysql> delimiter //

mysql> CREATE PROCEDURE load_part_tab()

-> begin

-> declare v int default 0;

-> while v < 8000000

-> do

-> insert into part_tab

-> values (v,'testing partitions',adddate('1995-01-01',(rand(v)*36520) mod 3652));

-> set v = v + 1;

-> end while;

-> end

-> //

Query OK, 0 rows affected (0.00 sec)

mysql> delimiter ;

mysql> call load_part_tab();

Query OK, 1 row affected (8 min 17.75 sec)

mysql> insert into no_part_tab select * from part_tab;

Query OK, 8000000 rows affected (51.59 sec)

Records: 8000000 Duplicates: 0 Warnings: 0

With our tables now ready, let's issue a simple date range query on both tables - the non-partitioned table first and then the partitioned table - followed by EXPLAIN's, and see what MySQL does:

mysql> select count(*) from no_part_tab where

-> c3 > date '1995-01-01' and c3 < date '1995-12-31';

+----------+

| count(*) |

+----------+

| 795181 |

+----------+

1 row in set (38.30 sec)

mysql> select count(*) from part_tab where

-> c3 > date '1995-01-01' and c3 < date '1995-12-31';

+----------+

| count(*) |

+----------+

| 795181 |

+----------+

1 row in set (3.88 sec)

mysql> explain select count(*) from no_part_tab where

-> c3 > date '1995-01-01' and c3 < date '1995-12-31'G

*************************** 1. row ***************************

id: 1

select_type: SIMPLE

table: no_part_tab

type: ALL

possible_keys: NULL

key: NULL

key_len: NULL

ref: NULL

rows: 8000000

Extra: Using where

1 row in set (0.00 sec)

mysql> explain partitions select count(*) from part_tab where

-> c3 > date '1995-01-01' and c3 < date '1995-12-31'G

*************************** 1. row ***************************

id: 1

select_type: SIMPLE

table: part_tab

partitions: p1

type: ALL

possible_keys: NULL

key: NULL

key_len: NULL

ref: NULL

rows: 798458

Extra: Using where

1 row in set (0.00 sec)

The power of proper partition and query design can easily be seen as the partitioned table access delivers a whopping 90% response time reduction over the non-partitioned table. The EXPLAIN plans showcase why this is (notice the new EXPLAIN syntax for partitioned objects) as only the first partition in the partitioned table is accessed with all others being skipped.

As a MySQL DBA, it's easy to get excited about the potential benefits that partitioning can provide, but you always want to make sure that the tool you use for database design matches the requirements and scenario of your particular application. Partitioning is best suited for VLDB's that contain a lot of query activity that targets specific portions/ranges of one or more database tables. Of course, other situations lend themselves to partitioning as well (e.g. data archiving, etc.)

A Quick Side Note on Vertical Partitioning

Although MySQL 5.1 automates horizontal partitioning, don't lose sight of vertical partitioning schemes when designing your databases. Although you have to do vertical partitioning manually, you can benefit from the practice in certain circumstances. For example, let's say you didn't normally need to reference or use the VARCHAR column defined in our previously shown partitioned table. Would the elimination of this column help query speed? Let's find out:

mysql> desc part_tab;

+-------+-------------+------+-----+---------+-------+

| Field | Type | Null | Key | Default | Extra |

+-------+-------------+------+-----+---------+-------+

| c1 | int(11) | YES | | NULL | |

| c2 | varchar(30) | YES | | NULL | |

| c3 | date | YES | | NULL | |

+-------+-------------+------+-----+---------+-------+

3 rows in set (0.03 sec)

mysql> alter table part_tab drop column c2;

Query OK, 8000000 rows affected (42.20 sec)

Records: 8000000 Duplicates: 0 Warnings: 0

mysql> desc part_tab;

+-------+---------+------+-----+---------+-------+

| Field | Type | Null | Key | Default | Extra |

+-------+---------+------+-----+---------+-------+

| c1 | int(11) | YES | | NULL | |

| c3 | date | YES | | NULL | |

+-------+---------+------+-----+---------+-------+

2 rows in set (0.00 sec)

mysql> select count(*) from part_tab where

-> c3 > date '1995-01-01' and c3 < date '1995-12-31';

+----------+

| count(*) |

+----------+

| 795181 |

+----------+

1 row in set (0.34 sec)

By removing the VARCHAR column from the design, you actually get another 90+% reduction in query response time. Beyond partitioning, this speaks to the effect wide tables can have on queries and why you should always ensure that all columns defined to a table are actually needed.

Wrap Up

A short article like this can't possibly cover all the benefits and mechanics of MySQL 5.1's partitioning, but a few notes of interest include:

* All storage engines support partitioning (MyISAM, Archive, InnoDB, etc.)

* Indexing support for partitioned tables include local indexes, which mirror each partition in a one-to-one scheme. In other words, if a partitioned table has ten partitions, then a local index for that table would also contain ten partitions.

* Metadata regarding partitioned tables can be found in the INFORMATION_SCHEMA database, with a new PARTITIONS table being available.

* All SHOW commands support the return of partitioned table and index metadata.

* Maintenance functions and a number of other operations can be performed on partitions (rather than acting on a full table), including:

o ADD PARTITION

o DROP PARTITION

o COALESCE PARTITION

o REORGANIZE PARTITION

o ANALYZE PARTITION

o CHECK PARTITION

o OPTIMIZE PARTITION

o REBUILD PARTITION

o REPAIR PARTITION

From a performance standpoint, the main take-away is that MySQL 5.1 partitioning is a powerful new tool that can be used in many physical database designs to dramatically improve performance and ease DBA management burdens. For more information on MySQL partitioning, you can visit out the online reference manual at http://dev.mysql.com/doc/refman/5.1/en/partitioning.html and visit the MySQL forums as there is a forum section devoted to partitioning, which can be referenced at http://forums.mysql.com/list.php?106.

Download a copy of MySQL 5.1 (which is now is beta) today and give partitioning a try. I think you will be pleased with all the new possibilities partitioning provides when it comes to creating a top-notch physical database design, which is the number one contributor to overall database performance.

[d] By: Guest
Date: 0000-00-00-00:00:00--
Response
What is 1 + 100

Just Updated::