×

Loading...
Ad by
  • 最优利率和cashback可以申请特批,好信用好收入offer更好。请点链接扫码加微信咨询,Scotiabank -- Nick Zhang 6478812600。
Ad by
  • 最优利率和cashback可以申请特批,好信用好收入offer更好。请点链接扫码加微信咨询,Scotiabank -- Nick Zhang 6478812600。

探讨一下提高数据搜索性能的方法

I have a big fat table having 10 million rows. This table has three major columns as CurrencyCode, DealType and ID. There are 9 distinct CurrencyCode, 5 distinct DealType and 1 million distinct Ids.I created a clustered index on CurrencyCode,DealType, ID.

I have a SQL query as "select xxx,xxx,xxx... from this_table where CurrencyCode='USD' and DealType='PE' and ID in (1,2,3,xxx...)". Usually, number of IDs will go up to 10000 as a cap.With 10000 IDs in the query, i start to get data back in 6 seconds and get all data back in 28 seconds.In the execution plan, it shows 98% time is spent on table scanning.

Is it possible to make the query run faster? like getting data from 2nd second? Is the order of index right.Is there other way to do the select? When i load data in this table, it is done by currency in 9 patches, but i don't really know this will help or not.

thank you for your inputs.
Report

Replies, comments and Discussions:

  • 工作学习 / 学科技术讨论 / 探讨一下提高数据搜索性能的方法
    I have a big fat table having 10 million rows. This table has three major columns as CurrencyCode, DealType and ID. There are 9 distinct CurrencyCode, 5 distinct DealType and 1 million distinct Ids.I created a clustered index on CurrencyCode,DealType, ID.

    I have a SQL query as "select xxx,xxx,xxx... from this_table where CurrencyCode='USD' and DealType='PE' and ID in (1,2,3,xxx...)". Usually, number of IDs will go up to 10000 as a cap.With 10000 IDs in the query, i start to get data back in 6 seconds and get all data back in 28 seconds.In the execution plan, it shows 98% time is spent on table scanning.

    Is it possible to make the query run faster? like getting data from 2nd second? Is the order of index right.Is there other way to do the select? When i load data in this table, it is done by currency in 9 patches, but i don't really know this will help or not.

    thank you for your inputs.
    • Check www.infobright.org.
      我在这个公司上班,性能应该满足你的要求。
      几点注意:
      - currencycode, dealtype defined as comment 'lookup'
      - replaceIN (... 10K of them) with subquery
    • Is this Oracle? Do you have PK or UK?
      • Oracle没有clustered index这种东西
        如果是ORACLE, 这个表应该按前两列的组合做LIST分区,在ID建本地索引,而1万个随机ID做查询条件我觉得不太可能,应该想办法转成RANGE SCAN.
        • Based on my reading. It seems to clustered index the column ID would be more effecient in this case.
          Because the clustered index works will on the columns that contain a large number of distinct values, and if columns that are accessed sequentially, it would be better.
          • 按你这样每个ID读出来还要再读表数据再筛选其他两个条件,效率更低。
    • try three separate indexes instead of one...
      • 觉的做两个index 好点,前两个条件一个,然后id单独一个index.
        • 好在哪?前两个所有组合才45种,能起什么作用?楼主的索引顺序(针对这个查询而言)是正确的。我猜测他用SQL SERVER不知道能否像ORACLE那样分区,也就是切成几大块,前两列就没必要包含在索引里面。
          • 是MS SQL。曾经想过把它分成9个小的table或view。但有同样类似的另一table,最后要18个,这也太难搞了。table中文怎样翻译?
            • TABLE就是“表”。你在这学的IT?
              • 不是的,只是忘记了。现在连写中文都有问题。
    • 你这up to 10000的1,2,3,xxx...是从哪里获得的?建议把这个弄成subquery跟主table join。
      • 不成。这些ID是随机的,不连续。不知道把它们排列一下再搜索会不会提高速度。主要是不清楚MS SQL 是怎样用这些ID作搜索的。
        • 再随机也得有个来头吧。不然你这个query怎么来呢?
          • 这些ID是OLAP drill-through来的。
            • 插入到临时表再JOIN. 要是ORACLE只允许最多1000个在IN里面。
      • 谢谢你,table join 确实是比 in 快。想偷懒还是不行的。
    • 哈哈,让我想到一个方法,现在是1秒钟就开始有数据返回。优化一下再公开。
      • 如果你是要追求快速响应而不是所有数据都返回,缩小列表分成多批就行了。
      • 今天work from home,中午上网打了一小时COD,死了无数遍。接着又上网买了200股suncor,将rolia刷了几遍,忽然让我想起很多年前有同样类似的问题 - 还是table join table 比 in 来的快。
        本文发表在 rolia.net 枫下论坛不要以为我懒和工作轻松,昨晚可是工作到凌晨才睡觉的。

        i have to create a inline table with help of a home brew function

        select a.*
        from fat_table a, (select * from dbo.[ConvertCommaSeparatedStringToIntTable]('399451, 403177, 386076, xxx,xxxx,xxx')) b
        where a.currencycode='USD'
        and a.dealtypecode='PE'
        and a.id = b.item
        order by a.companyName asc, a.InvestmentDate desc

        /* function */
        CREATE FUNCTION [dbo].[ConvertCommaSeparatedStringToIntTable]
        (
        @commaSeparatedString varchar(max)
        )
        RETURNS @TempList TABLE
        (
        item int
        )
        AS
        BEGIN
        DECLARE @item varchar(50), @Pos int
        SET @commaSeparatedString = LTRIM(RTRIM(@commaSeparatedString))+ ','
        SET @Pos = CHARINDEX(',', @commaSeparatedString, 1)
        IF REPLACE(@commaSeparatedString, ',', '') <> ''
        BEGIN
        WHILE @Pos > 0
        BEGIN
        SET @item = LTRIM(RTRIM(LEFT(@commaSeparatedString, @Pos - 1)))
        IF isnumeric(@item)=1
        BEGIN
        INSERT INTO @TempList (item) VALUES (cast(@item as int))
        END
        SET @commaSeparatedString = RIGHT(@commaSeparatedString, LEN(@commaSeparatedString) - @Pos)
        SET @Pos = CHARINDEX(',', @commaSeparatedString, 1)
        END
        END
        RETURN
        END更多精彩文章及讨论,请光临枫下论坛 rolia.net
        • 类似ORACLE里面的TABLE()函数。但ORACLE的优化器会假设返回的集合为某个尺寸(一般8K行)有时候会产生不好的计划。
    • Go cloud. Map reduce.