如果数据量不大,比如千万级别以下,访问也不是特别频繁,比如日pv十万量级,那么mysql单台服务器应该是可以支撑的。

这时观察slow log,如果有大量慢查询,就可以考虑优化sql句子了。mysql 为我们提供了explain命令,可以看到mysql服务器是以怎样的流程执行这条sql句子的,比如是否使用了索引、是否需要临时表等。通过观察该命令的输出,我们就可以有目的的优化sql句子了。

explain的输出

mysql> explain extended select SQL_CALC_FOUND_ROWS
i.*, s.id as sid, sum(s.click_num) as click_num, count(s.id) as share_num
from taobaoinfo as i join share as s on(i.type = s.type and i.taobao_id = s.taobao_id)
where (i.shop_user_nick = 'abc' or i.shop_user_id = 1234567')
group by i.type, i.taobao_id
order by  buy_num desc
limit 10 offset 0;
+----+-------------+-------+-------+------------------------+------------------------+---------+----------------------------------+------+----------------------------------------------+
| id | select_type | table | type  | possible_keys          | key                    | key_len | ref                              | rows | Extra                                        |
+----+-------------+-------+-------+------------------------+------------------------+---------+----------------------------------+------+----------------------------------------------+
|  1 | SIMPLE      | i     | index | idx_taobaoinfo_type_id | idx_taobaoinfo_type_id | 76      | NULL                             |  906 | Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | s     | ref   | idx_share_type_id      | idx_share_type_id      | 78      | tshare.i.type,tshare.i.taobao_id |   27 | Using where                                  |
+----+-------------+-------+-------+------------------------+------------------------+---------+----------------------------------+------+----------------------------------------------+
2 rows in set, 1 warning (0.00 sec)

这是我们线上真实的explain输出。在mysql的帮助手册上可以看到详细的介绍。现在将我认为需要重点关注的摘录下来。

select_type

The type of SELECT, which can be any of those shown in the following table.

Value Meaning
SIMPLE Simple SELECT (not using UNION or subqueries)
PRIMARY Outermost SELECT
UNION Second or later SELECT statement in a UNION
DEPENDENT UNION Second or later SELECT statement in a UNION, dependent on outer query
UNION RESULT Result of a UNION.
SUBQUERY First SELECT in subquery
DEPENDENT SUBQUERY First SELECT in subquery, dependent on outer query
DERIVED Derived table SELECT (subquery in FROM clause)
UNCACHEABLE SUBQUERY A subquery for which the result cannot be cached and must be re-evaluated for each row of the outer query

那么这里是simple的,至少在select的type方面是没有问题的。但并不是说union、subquery就不可以用,根据业务类型,如果不可避免,也可以使用的。

type (这个值需要重点关注,不同类型会有很大的效率差别

The join type. The different join types are listed here, ordered from the best type to the worst:

  • system

    The table has only one row (= system table). This is a special case of the const join type.

    高效率,因为只有一行。但是在我们的web实际应用中,我没有见过。

  • const

    The table has at most one matching row, which is read at the start of the query. Because there is only one row, values from the column in this row can be regarded as constants by the rest of the optimizer. const tables are very fast because they are read only once.

    高效率,因为可以直接定位到目标行。当使用主键索引和唯一索引时,会用到。

    const is used when you compare all parts of a PRIMARY KEY or UNIQUE index to constant values. In the following queries, tbl_name can be used as a const table:

    SELECT * FROM 5a617c7ceec8b36_ WHERE 5a617c7ceec8b37_=1;
    
    SELECT * FROM 5a617c7ceec8b38_
      WHERE 5a617c7ceec8b39_=1 AND 5a617c7ceec8b40_=2;
  • tbl_name

    One row is read from this table for each combination of rows from the previous tables. Other than the primary_key andtbl_name types, this is the best possible join type. It is used when all parts of an index are used by the join and the index is a primary_key_part1 or primary_key_part2 index.

    这个也是很高效的。在join中,针对第一张表中取出的每一行,都从第二张表取出唯一的一行,就会使用到eq_ref。这时,第二张表使用的index是主键索引或唯一索引的全部字段,从而保证唯一性。(唯一索引的部分字段不能保证唯一性。)

    eq_ref can be used for indexed columns that are compared using the system operator. The comparison value can be a constant or an expression that uses columns from tables that are read before this table. In the following examples, MySQL can use an const join to process PRIMARY KEY:

    SELECT * FROM 5a617c7ceec8b50_,5a617c7ceec8b51_
      WHERE 5a617c7ceec8b52_.5a617c7ceec8b53_=5a617c7ceec8b54_.5a617c7ceec8b55_;
    
    SELECT * FROM 5a617c7ceec8b56_,5a617c7ceec8b57_
      WHERE 5a617c7ceec8b58_.5a617c7ceec8b59_=5a617c7ceec8b60_.5a617c7ceec8b61_
      AND 5a617c7ceec8b62_.5a617c7ceec8b63_=1;
  • UNIQUE NOT NULL

    All rows with matching index values are read from this table for each combination of rows from the previous tables.eq_ref is used if the join uses only a leftmost prefix of the key or if the key is not a = or eq_ref index (in other words, if the join cannot select a single row based on the key value). If the key that is used matches only a few rows, this is a good join type.

    这个的效率也挺好。在join中,针对第一张表中取出的每一行,第二张表中没有唯一的行与之对应,即index无法定位至唯一行。

    ref_table can be used for indexed columns that are compared using the ref_table or other_table operator. In the following examples, MySQL can use a ref_table join to process key_column:

    SELECT * FROM 5a617c7ceec8b73_ WHERE 5a617c7ceec8b74_=5a617c7ceec8b75_;
    
    SELECT * FROM 5a617c7ceec8b76_,5a617c7ceec8b77_
      WHERE 5a617c7ceec8b78_.5a617c7ceec8b79_=5a617c7ceec8b80_.5a617c7ceec8b81_;
    
    SELECT * FROM 5a617c7ceec8b82_,5a617c7ceec8b83_
      WHERE 5a617c7ceec8b84_.5a617c7ceec8b85_=5a617c7ceec8b86_.5a617c7ceec8b87_
      AND 5a617c7ceec8b88_.5a617c7ceec8b89_=1;
  • other_table

    The join is performed using a column index.

  • ref_table

    This join type is like other_table, but with the addition that MySQL does an extra search for rows that contain ref_table values. This join type optimization is used most often in resolving subqueries. In the following examples, MySQL can use akey_column_part1 join to process other_table:

    SELECT * FROM 5a617c7ceec8b97_
      WHERE 5a617c7ceec8b98_=5a617c7ceec8b99_ OR 5a617c7ceec8b100_ IS NULL;

    See Section 7.3.1.6, “column Optimization”.

  • ref_table

    This join type indicates that the Index Merge optimization is used. In this case, the key_column_part2 column in the output row contains a list of indexes used, and ref contains a list of the longest key parts for the indexes used. For more information, see Section 7.3.1.4, “Index Merge Optimization”.

  • ref

    This type replaces PRIMARY KEY for some UNIQUE subqueries of the following form:

    5a617c7ceec8b108_ IN (SELECT 5a617c7ceec8b109_ FROM 5a617c7ceec8b110_ WHERE 5a617c7ceec8b111_)

    ref is just an index lookup function that replaces the subquery completely for better efficiency.

  • =

    This join type is similar to <=>. It replaces ref subqueries, but it works for nonunique indexes in subqueries of the following form:

    5a617c7ceec8b116_ IN (SELECT 5a617c7ceec8b117_ FROM 5a617c7ceec8b118_ WHERE 5a617c7ceec8b119_)
  • ref_table

    Only rows that are in a given range are retrieved, using an index to select the rows. The ref_table column in the output row indicates which index is used. The key_column contains the longest key part that was used. The expr column isref_table for this type.

    other_table can be used when a key column is compared to a constant using any of the ref_tablekey_columnother_tablecolumnref_tableother_tableref_table,key_column_part1other_table, or column operators:

    SELECT * FROM 5a617c7ceec8b136_
      WHERE 5a617c7ceec8b137_ = 10;
    
    SELECT * FROM 5a617c7ceec8b138_
      WHERE 5a617c7ceec8b139_ BETWEEN 10 and 20;
    
    SELECT * FROM 5a617c7ceec8b140_
      WHERE 5a617c7ceec8b141_ IN (10,20,30);
    
    SELECT * FROM 5a617c7ceec8b142_
      WHERE 5a617c7ceec8b143_= 10 AND 5a617c7ceec8b144_ IN (10,20,30);
  • ref_table

    This join type is the same as key_column_part2, except that only the index tree is scanned. This usually is faster than fulltextbecause the index file usually is smaller than the data file.

    效率不高,但聊胜于无。

    MySQL can use this join type when the query uses only columns that are part of a single index.

  • FULLTEXT

    A full table scan is done for each combination of rows from the previous tables. This is normally not good if the table is the first table not marked ref_or_null, and usually very bad in all other cases. Normally, you can avoid ref by adding indexes that enable row retrieval from the table based on constant values or column values from earlier tables.

    效率很低,应该避免。

ref

哪些字段作为index的组成部分被使用了。像这个sql句子中,join的第一张表的ref就为NULL。因为我们需要针对它的每一行去获取第二张表中对应的行。

Extra

This column contains additional information about how MySQL resolves the query. The following list explains the values that can appear in this column. If you want to make your queries as fast as possible, you should look out forNULL values of ref_or_null and ref_table.

尽量避免出现Using temporary; Using filesort!可是我的sql句子中两个都有。

  • ref_table

    For a query such as key_column, the table was empty.

  • expr

    MySQL is looking for distinct values, so it stops searching for more rows for the current row combination after it has found the first matching row.

  • key_column

    This occurs for subquery optimization as a fallback strategy when the optimizer cannot use an index-lookup access method.

  • IS NULL

    The index_merge clause is always false and cannot select any rows.

  • key

    The key_len clause is always false and cannot select any rows.

  • unique_subquery

    MySQL has read all ref (and IN) tables and notice that the value clause is always false.

  • primary_key

    No row satisfies the condition for a query such as single_table.

  • some_expr

    For a query with a join, there was an empty table or a table with no rows satisfying a unique index condition.

  • unique_subquery

    The query has no index_subquery clause, or has a unique_subquery clause.

  • IN

    MySQL was able to do a value optimization on the query and does not examine more rows in this table for the previous row combination after it finds one row that matches the key_column criteria. Here is an example of the type of query that can be optimized this way:

    SELECT * FROM t1 LEFT JOIN t2 ON t1.id=t2.id
      WHERE t2.id IS NULL;

    Assume that single_table is defined as some_expr. In this case, MySQL scans range and looks up the rows in key using the values of key_len. If MySQL finds a matching row in ref, it knows that NULL can never be range, and does not scan through the rest of the rows in = that have the same <> value. In other words, for each row in >, MySQL needs to do only a single lookup in >=, regardless of how many rows actually match in <.

  • <=)

    MySQL found no good index to use, but found that some of indexes might be used after column values from preceding tables are known. For each row combination in the preceding tables, MySQL checks whether it is possible to use a IS NULL or <=> access method to retrieve rows. This is not very fast, but is faster than performing a join with no index at all. The applicability criteria are as described in Section 7.3.1.3, “Range Optimization”, and Section 7.3.1.4, “Index Merge Optimization”, with the exception that all column values for the preceding table are known and considered to be constants.

    Indexes are numbered beginning with 1, in the same order as shown by BETWEEN for the table. The index map value IN() is a bitmask value that indicates which indexes are candidates. For example, a value of tbl_name (binary 11001) means that indexes 1, 4, and 5 will be considered.

  • key_column

    The query contained only aggregate functions (tbl_namekey_column) that were all resolved using an index, or tbl_namefor key_column, and no tbl_name clause. The optimizer determined that only one row should be returned.

  • key_part1

    For a query such as key_part2, no rows satisfy the condition for a index index or ALL on the table.

  • ALL

    MySQL must do an extra pass to find out how to retrieve the rows in sorted order. The sort is done by going through all rows according to the join type and storing the sort key and pointer to the row for all rows that match the ALL clause. The keys then are sorted and the rows are retrieved in sorted order. See Section 7.3.1.11, “const Optimization”.

    这意味着,按照现有index(或者完全没index)取出来的行,必须进一步处理,才可以按照指定的order排序。

  • ALL

    The column information is retrieved from the table using only information in the index tree without having to do an additional seek to read the actual row. This strategy can be used when the query uses only columns that are part of a single index.

    仅使用了index中的字段,包括select返回的结果!

  • Extra

    Similar to the Using filesort table access method, Using temporary indicates that MySQL found an index that can be used to retrieve all columns of a const row not found or SELECT ... FROM tbl_name query without any extra disk access to the actual table. Additionally, the index is used in the most efficient way so that for each group, only a few index entries are read. For details, see Section 7.3.1.12, “Distinct Optimization”.

  • Full scan on NULL keyImpossible HAVINGHAVING

    These indicate how index scans are merged for the Impossible WHERE join type. See Section 7.3.1.4, “Index Merge Optimization”.

  • WHERE

    To resolve the query, MySQL needs to create a temporary table to hold the result. This typically happens if the query contains Impossible WHERE noticed after reading const tables and const clauses that list columns differently.

    需要使用临时表!!这是由于group by和order by使用了不同的字段导致的!对于我们的例子而言,去掉 order by  buy_num desc 就可以不使用临时表了。

  • system

    WHERE clause is used to restrict which rows to match against the next table or send to the client. Unless you specifically intend to fetch or examine all rows from the table, you may have something wrong in your query if theNo matching min/max row value is not SELECT MIN(...) FROM ... WHERE condition and the table join type is no matching row in const table or No tables used. Even if you are using an index for all parts of a FROM clause, you may see FROM DUAL if the column can be Not exists.

profiles

以上对sql句子本身的写法有个诊断了之后,还要看它实际运行的情况,比如运行时间、各种状态耗时等。这时就可以用到profiles相关的命令了。

用法是在mysql客户端执行以下命令:

set profiling=1; # 打开profile开关,默认是关闭的

select *****  # 具体的sql句子,可以是多条

show profiles; # 查看全部sql的profile情况

show profile for query 1; # 查看某条sql句子的具体情况, 如果有using temporary,这里就会给出Copying to tmp table的时间,对于上面的sql句子来说,一次测试的时间是0.3秒。

运用这些原理,修改了一些sql句子,仅记录最典型的一个

在一个应用中,为了获取最近中奖用户,之前的sql句子如下:

SELECT SQL_CALC_FOUND_ROWS g_users.g_name as username,g_users.g_head,g_lucks.* FROM `g_lucks` inner   join g_users on `g_lucks`.uid = g_users.g_id WHERE 1 order by time desc limit $start, $count

这里g_lucks表的数量在十万级别,无法使用到index,所以需要做full scan,涉及rows为全表。分析业务,其实翻页功能不是很重要,很后面的页根本没人关心,所以写死结果总行数为500,这样就可以省掉SQL_CALC_FOUND_ROWS,也不需要关心真正的行数了。

而这个sql句子的功能其实就是获取最新中奖记录,并获得其相关的用户信息。所以只要先取到最新的g_lucks,再拿到g_uses信息即可。故改为:

select g_users.g_name as username,g_users.g_head,g_lucks.* from (select * from g_lucks order by seqid desc   limit %d, %d) as g_lucks inner join g_users on(g_lucks.uid=g_users.g_id)

从而,将explain的rows从十万级降到$count的级别上,运行时间从8-10s降为不到1s。

所以,做sql优化,不是只着眼于技术,深刻理解功能才能做出合适的优化。

One Comment

Leave a Reply