×

Loading...
Ad by
  • 推荐 OXIO 加拿大高速网络,最低月费仅$40. 使用推荐码 RCR37MB 可获得一个月的免费服务
Ad by
  • 推荐 OXIO 加拿大高速网络,最低月费仅$40. 使用推荐码 RCR37MB 可获得一个月的免费服务

第一点严重反对。我正好在微调下面一个query。900000+条数据

query 1:
SELECT BSKey,
InKey,
MAX(Weight) AS Weight,
100.*MAX(Weight)/(SELECT score FROM metaScore WHERE field='_FullScore')
FROM EntityMatch
GROUP BY BSKey, InKey

query 2:
SELECT BSKey,
InKey,
MAX(Weight) AS Weight,
100.*MAX(Weight)/score
FROM EntityMatch, metaScore
WHERE field='_FullScore'
GROUP BY BSKey, InKey,score

猜猜结果如何?
query 1:
SQL Server Execution Times:
CPU time = 4578 ms, elapsed time = 16600 ms.

query 2:
SQL Server Execution Times:
CPU time = 2501 ms, elapsed time = 14950 ms.

CPU时间居然要快一倍!
Report

Replies, comments and Discussions:

  • 工作学习 / 专业技术讨论 / Interesting MS SQL server performance results
    本文发表在 rolia.net 枫下论坛TABLE_ONE:
    ItemID, other fields (8 million+ records)

    TABLE_TWO
    ItemID, other fields (150000+ records)

    Query A:

    SELECT ItemID FROM TABLE_TWO WHERE ItemID NOT IN (SELECT ItemID FROM TABLE_ONE)

    Query B (Same result as Query A):

    SELECT T2.ItemID FROM TABLE_TWO T2 LEFT JOIN TABLE_ONE T1 ON T1. ItemID = T2. ItemID WHERE T1.ItemID IS NULL

    Query C:

    SELECT ItemID FROM TABLE_ONE WHERE ItemID NOT IN (SELECT ItemID FROM TABLE_TWO)

    Query D (Same result as Query C):

    SELECT T1.ItemID FROM TABLE_ONE T1 LEFT JOIN TABLE_TWO T2 ON T1. ItemID = T2. ItemID WHERE T2.ItemID IS NULL

    Results:

    No any index in both tables:
    A: return about 120000 records and spends 36 seconds
    B: return about 120000 records and spends 35 seconds
    C: return about 8200000 records and spends 1:23 seconds
    D: return about 8200000 records and spends 1:25 seconds

    Built clustered index on ItemID in both tables
    A: return about 120000 records and spends 57 seconds
    B: return about 120000 records and spends 57 seconds
    C: return about 8200000 records and spends 1:40 seconds
    D: return about 8200000 records and spends 1:43 seconds更多精彩文章及讨论,请光临枫下论坛 rolia.net
    • must be very interesting.... however it is friday afternoon now.....
    • 试过 NOT EXISTS 了吗?
    • how did you get the time? 如果你用SET STATISTICS TIME ON会比较准确些。很多情况Server本身没太大工作量,时间主要花在数据返回的过程中(包括网络传输,用户端显示),如果看总时间的话,不太能区分query的好坏。
      • Good point. How to show cup time in SELECT query?
        • 如果你不是真要那些数据,SELECT COUNT(*)就行了
          • Unfortunately, it doesn’t work.
        • Under SQL Query Analyzer (Or Management Studio), before SELECT query, turn on the STATISTICS TIME. You will see the detail CPU time in Messages tab.
          SET STATISTICS TIME ON

          SELECT ItemID FROM TABLE_TWO WHERE ItemID NOT IN (SELECT ItemID FROM TABLE_ONE)
          • Thanks,
    • 很有趣的测试,说明两点: 1 sql server会优化sql,结果基本一样,所以sql语句不用考虑优化。 2 index对于not in 没有用,server始终是用表扫描。
      • 第一点严重反对。我正好在微调下面一个query。900000+条数据
        query 1:
        SELECT BSKey,
        InKey,
        MAX(Weight) AS Weight,
        100.*MAX(Weight)/(SELECT score FROM metaScore WHERE field='_FullScore')
        FROM EntityMatch
        GROUP BY BSKey, InKey

        query 2:
        SELECT BSKey,
        InKey,
        MAX(Weight) AS Weight,
        100.*MAX(Weight)/score
        FROM EntityMatch, metaScore
        WHERE field='_FullScore'
        GROUP BY BSKey, InKey,score

        猜猜结果如何?
        query 1:
        SQL Server Execution Times:
        CPU time = 4578 ms, elapsed time = 16600 ms.

        query 2:
        SQL Server Execution Times:
        CPU time = 2501 ms, elapsed time = 14950 ms.

        CPU时间居然要快一倍!
      • 你的query1写的也太神奇,server估计看不懂。
        • 简单的一个通过subquery从metadata中拿一个常数作运算,很神奇么?不管如何,说明依靠Server作优化基本行不通,说“sql语句不用考虑优化”基本是误导。:)
        • 你不用抬杠,server又不是万能,什么语句都认识, 我们优化的原则是尽量避免表扫描。
          • 你的认识还停留在"FULL TABLE SCAN IS EVIL"的误区。用索引有时候读盘次数更多。楼主碰到的这种有趣的现象在ORACLE中是看不到的。如果用索引不合算,优化器将会忽略索引,并不会强行使用而降低PERFORMANCE
      • 我会这样写
        Declare @score int
        select @score = score FROM metaScore WHERE field='_FullScore'

        SELECT BSKey,InKey, MAX(Weight) AS Weight,
        100.*MAX(Weight)/@score
        FROM EntityMatch
        GROUP BY BSKey, InKey,score

        这样更简单明了
        • 这是用在view里面的,所以~~~
          • 所以你该用存储过程:) Q2的写法是较好,Q1用于结果集不大的情况。
            • 呵呵,还在推销你的SP。我这个例子可不是用在web application的。只能是table or view。
              • 要不要试一下这个Q3?
                Q3:
                SELECT vw_match.*
                ,100*vw_match.Weight/metaScore.score
                FROM (SELECT BSKey,
                InKey,
                MAX(Weight) AS Weight
                FROM EntityMatch
                GROUP BY BSKey, InKey
                ) vw_match
                ,metaScore
                WHERE metaScore.field='_FullScore'
                • 不错,又快了一些。本来很简单的一个query,没想得那么复杂。现在看来,即使一个简单的query,还是有很多油水可以榨的。
    • 2 index对于not in 没有用,server始终是用表扫描。
    • 如果ItemID是整数,我建议作另外一个实验,改成varchar类型, 在测试一下结果。
      • ItemID is varchar type.
    • by sql1,2 in test result set 1 and set 2, I am guessing that sequencial scan is faster than using index in your invironment.
    • More interesting thing
      More interesting thing is that when I apply Display Estimated Execution Plan to Q1 and Q2. It gives exactly same plan:
      Starting from scanning both tables, then Hash Match/Left Anti Semi Join, finally SELECT operation.

      It clearly shows that database server does have ability to optimize query, at least to those not too complicated query.

      Of course, the server is not superpower. When in some complicated tasks, experienced developers are more intelligent then the machine.
      • 你的QUERY A 和 B 用的时间基本一样,可以猜想计划也一样,有什么INTERESTING的?难道你是说加了索引后,执行计划也一样,但是时间变长了?
        • I'm still testing. Results may be diffrent.
        • CPU spending for indexed and non indexed queries
          Per BDBS’s suggestion, I put ‘SET STATISTICS ON’.

          Because I noticed that in my previous results, in indexed query it returned sorted result (although without ORDER BY clause), but non-indexed query returned unsorted rows. Therefore, ORDER BY clause is added in query.

          After redoing the Q1, results show indexed query spends less cpu time, (about 56.2%) but more elapsed time (about 136%)
          • elapsed time要选多次运行的平均值才作数,因为Server和Network等在不同时间会有不同的忙的状态。如果大多数情况Q1的total elapsed time都比Q2长,这个问题就会比较有意思了,得好好查查了。
          • 有没有磁盘IO的统计数字?我猜用了索引后IO更多。尽量在只有一个用户的环境下做实验;在运行不同的查询之前把CACHE清理干净,这样才比较客观。
            • SET STATISTICS IO ON 可以看这个。
              • IO Data
                In order to exclude some issues, the queries were run in local machine rather than in server. And each time execute DBCC DROPCLEANBUFFERS to clean cache first.

                IO Data:

                Non-indexed
                Scan count 1, logical reads 475, physical reads 0, read-ahead reads 478

                Indexed:
                Scan count 1, logical reads 477, physical reads 166, read-ahead reads 476.
                • 看来我没猜措,physical reads增加了那么多。不明白MS SQL的优化器为什么还是选用了索引。
                  • 没看懂。没有索引,查询要scan table,却没有读硬盘?另外选用索引也许是因为磁盘IO性能可以提高,例如硬盘缓存、内存缓存甚至RAID等。
                    • 虽然我不用SQLSERVER,但GOOGLE可知Read-Ahead Reads也是读盘操作。不管你磁盘如何优化,开销就是开销,避免读盘就可以减少开销。
                      有趣的是,某文章说在优化查询时应该忽略physical read,因为那是DBA的事,logical reads才重要,难怪MS的优化器会选索引呢,哈哈。
          • Index Tuning Wizard
            After research, I realize that there is a tool, Index Tuning Wizard (ITW), in SQL Server 2000 can be used to improve query performance.

            Apply ITW to my Q2, it recommends only set non-clustered index to ItemID in TABLE_ONE.

            After set the index and run Q2, there is following result:

            With the index CPU time improved 11.2% (from 8478 ms to 7528 ms) and elapsed time improved 57% (from 33578 ms to 14488 ms).

            BTW, although my testing query is very simple, ITW can used to pretty complicated query.

            Finally thanks to every one in discussion.
            • TABLE TWO上的索引如果不去掉,会影响速度吗?如果你有不同的报表,需要运行不同的查询,ITW给出的建议有矛盾怎么办?如果建立一个新索引可能会降低其他查询的效率,这MS的优化器也太那个了。
    • 1. if itemID is vchar, do NOT use cluster index .2. return too many rows - try to avoid this in real solution design,
      if return 120k of 150k clustered index will not be used, non-clustered index maybe used because of small row size (only itemid is returned, like covering index)