Replies, comments and Discussions:

  • A question about a SQL query
    A table in a SQL database has hundreds of columns. It is also a big fact table with millions of rows. If every column in a row is NULL, I need to delete that row from the table. Is there any way to do that? It can be a SQL statement or an SSIS task, but not an external program.

    Thanks.
    • Use a make-table query and add a condition to each column: <> ""
      • A make-table query exists only in Access, though. From what I understand, he is complaining that the table is too big, and he is not going to type every column name and concatenate them into the condition.
        There is a way to do it: list all the column names automatically, construct a dynamic T-SQL statement from them, and run it.
    • You can generate the query dynamically.
      DECLARE @sql VARCHAR(5000)

      SET @sql = 'DELETE tableName WHERE 1 = 1'

      SELECT @sql = @sql + ' AND ' + name + ' IS NULL' from syscolumns where id = OBJECT_ID('tableName')

      EXEC (@sql)
      • Expert!
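      A hedged variant of the dynamic query above, assuming SQL Server 2005 or later. It reads the column list from sys.columns, wraps each name in QUOTENAME so unusual column names do not break the statement, and uses NVARCHAR(MAX) so that roughly 150 predicates cannot be truncated. `dbo.tableName` is the thread's placeholder, and the guard against an empty predicate is an addition of this sketch: without it, a table with no nullable columns would leave the statement as `WHERE 1 = 1` and delete every row.

      ```sql
      DECLARE @table sysname;
      DECLARE @predicate NVARCHAR(MAX);
      DECLARE @sql NVARCHAR(MAX);

      SET @table = N'dbo.tableName';   -- placeholder name, as in the thread
      SET @predicate = N'';

      -- Build "[col1] IS NULL AND [col2] IS NULL ..." from the nullable columns.
      -- This uses the same variable-concatenation pattern as the reply above.
      SELECT @predicate = @predicate
                        + CASE WHEN @predicate = N'' THEN N'' ELSE N' AND ' END
                        + QUOTENAME(c.name) + N' IS NULL'
      FROM sys.columns AS c
      WHERE c.object_id = OBJECT_ID(@table)
        AND c.is_nullable = 1;

      -- Run only when at least one nullable column was found.
      IF @predicate <> N''
      BEGIN
          SET @sql = N'DELETE FROM ' + @table + N' WHERE ' + @predicate;
          EXEC sp_executesql @sql;
      END
      ```

      Swapping `DELETE FROM` for `SELECT COUNT(*) FROM` gives a dry run that reports how many rows would be removed.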
    • first thought
      • lost the format
        I forgot to mention that this table has a primary key, which is definitely not NULL. I don't want to write something like column1 IS NULL AND column2 IS NULL ... AND columnN IS NULL.

        I think I can do this:

        Unpivot by the primary key to create a virtual table with two columns (id, column_value), then create a temp table:

        create table #temp (id int)
        insert into #temp (id)
        select id
        from unpivot_table
        group by id
        having sum(isnull(column_value, 0)) = 0

        If this is OK, I will delete any rows whose id is in #temp.

        I am not familiar with UNPIVOT, so I am reading up on it. Will update this post soon.
        • If the only issue is that there is a non-nullable field, a small tweak is enough.
          DECLARE @sql VARCHAR(5000)

          SET @sql = 'DELETE tableName WHERE 1 = 1'

          SELECT @sql = @sql + ' AND ' + name + ' IS NULL' FROM syscolumns WHERE id = OBJECT_ID('tableName') AND IsNullable = 1

          EXEC (@sql)
          • I mean this big table has some rows having all columns except its key column. I need to delete these rows.
              • I mean this big table has some rows in which every column except the key column is NULL. I need to delete these rows.
              • Doesn't #3848320 solve this problem for you?
                • Or this?
                  DECLARE @sql VARCHAR(5000)

                  SET @sql = 'DELETE tableName WHERE 1 = 1'

                  SELECT @sql = @sql + ' AND ' + name + ' IS NULL' FROM syscolumns WHERE id = OBJECT_ID('tableName') AND name <> 'key_column_name'

                  EXEC (@sql)
                  • In that case, this would be better:
                    DECLARE @sql VARCHAR(5000)

                    SET @sql = 'DELETE tableName WHERE 1 = 1'

                    SELECT @sql = @sql + ' AND ' + name + ' IS NULL'
                    FROM syscolumns
                    WHERE id = OBJECT_ID('tableName')
                    AND name NOT IN (SELECT COLUMN_NAME FROM INFORMATION_SCHEMA.KEY_COLUMN_USAGE WHERE TABLE_NAME = 'tableName')

                    EXEC (@sql)
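                    • Heh, are you sure that what he calls the key column is the kind of key column you have in mind?
                      • Not so sure anymore. :(
                    • You copied my idea too. Just kidding.
            • "this big table has some rows having all columns except its key column" - How is that possible? How could a key column be nullable? Please describe it in Chinese.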
    • 2nd thought, because I found...
      In my SELECT DISTINCT, record 6 doesn't show up. Hmm... That means anybody who doesn't show up at the party will be...

      CREATE TABLE #pvt (VendorID int, Emp1 int, Emp2 int,
      Emp3 int, Emp4 int, Emp5 int)
      GO
      INSERT INTO #pvt VALUES (1,4,3,5,4,4)
      INSERT INTO #pvt VALUES (2,4,1,5,5,5)
      INSERT INTO #pvt VALUES (3,null,null,null,null,4)
      INSERT INTO #pvt VALUES (4,4,2,null,null,4)
      INSERT INTO #pvt VALUES (5,5,1,5,5,null)
      INSERT INTO #pvt VALUES (6,null,null,null,null,null)
      INSERT INTO #pvt VALUES (7,null,0,1,2,4)
      GO

      select * from #pvt

      --Unpivot the table.
      SELECT VendorID, Employee, Orders
      FROM
      (SELECT VendorID, Emp1, Emp2, Emp3, Emp4, Emp5
      FROM #pvt) p
      UNPIVOT
      (Orders FOR Employee IN
      (Emp1, Emp2, Emp3, Emp4, Emp5)
      )AS unpvt
      GO

      select distinct VendorID
      from
      (SELECT VendorID, Emp1, Emp2, Emp3, Emp4, Emp5
      FROM #pvt) p
      UNPIVOT
      (Orders FOR Employee IN
      (Emp1, Emp2, Emp3, Emp4, Emp5)
      )AS unpvt


      select vendorid
      from #pvt
      group by vendorid
      having sum(Emp1)=8

      drop table #pvt
      • Doesn't #3848320 work? Why make it so complicated? Is this a brain teaser?
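      The observation in the post above (row 6 vanishes from the unpivoted result because UNPIVOT emits no rows for NULL values) can be turned into a delete directly. This is a minimal sketch on a cut-down copy of the sample table, assuming VendorID is never NULL, since NOT IN misbehaves when the subquery returns NULLs. UNPIVOT also requires all listed columns to share one data type, and the column list still has to be typed or generated, so it does not remove the need for dynamic SQL on a 150-column table.

      ```sql
      CREATE TABLE #pvt (VendorID int NOT NULL PRIMARY KEY,
                         Emp1 int, Emp2 int, Emp3 int, Emp4 int, Emp5 int);
      INSERT INTO #pvt VALUES (3, NULL, NULL, NULL, NULL, 4);
      INSERT INTO #pvt VALUES (6, NULL, NULL, NULL, NULL, NULL);
      INSERT INTO #pvt VALUES (7, NULL, 0, 1, 2, 4);

      -- A VendorID missing from the unpivoted set has NULL in every listed
      -- column, so those are exactly the rows to delete.
      DELETE FROM #pvt
      WHERE VendorID NOT IN (
          SELECT unpvt.VendorID
          FROM (SELECT VendorID, Emp1, Emp2, Emp3, Emp4, Emp5 FROM #pvt) AS p
          UNPIVOT (Orders FOR Employee IN (Emp1, Emp2, Emp3, Emp4, Emp5)) AS unpvt
      );

      SELECT VendorID FROM #pvt ORDER BY VendorID;  -- VendorID 3 and 7 remain

      DROP TABLE #pvt;
      ```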
    • You can also use the powerful INFORMATION_SCHEMA views.
      DECLARE @SQL VARCHAR(5000)

      SET @SQL = 'DELETE FROM YOUR_TABLE_NAME WHERE 1=1 '

      SELECT @SQL = @SQL + ' AND ' + COLUMN_NAME + ' IS NULL ' FROM INFORMATION_SCHEMA.COLUMNS WHERE IS_NULLABLE = 'YES' AND TABLE_NAME = 'YOUR_TABLE_NAME'

      EXEC (@SQL)
      • Copying my code like this
        will make people think you are my sock-puppet account. Heh, just kidding.
    • Here is the sample
      The following is the sample table. In this table, VendorID is the primary key and all other columns are nullable. After inserting rows with the following statements, I want to delete row 6 because all of its other columns are NULL. This sample table has 6 columns. The real one has about 150 columns.

      CREATE TABLE #pvt (VendorID int, Emp1 int, Emp2 int,
      Emp3 int, Emp4 int, Emp5 int)
      GO
      INSERT INTO #pvt VALUES (1,4,3,5,4,4)
      INSERT INTO #pvt VALUES (2,4,1,5,5,5)
      INSERT INTO #pvt VALUES (3,null,null,null,null,4)
      INSERT INTO #pvt VALUES (4,4,2,null,null,4)
      INSERT INTO #pvt VALUES (5,5,1,5,5,null)
      INSERT INTO #pvt VALUES (6,null,null,null,null,null)
      INSERT INTO #pvt VALUES (7,null,0,1,2,4)
      GO
      • As I understand it, when you created the table you neither set VendorID as the PK nor declared it non-nullable, so you should explicitly exclude it in the query:
        SELECT @SQL = @SQL + ' AND ' + COLUMN_NAME + ' IS NULL ' FROM INFORMATION_SCHEMA.COLUMNS WHERE IS_NULLABLE = 'YES' AND TABLE_NAME = 'YOUR_TABLE_NAME' AND COLUMN_NAME <> 'VendorID'
    • bdbs(不多不少), I tried your query
      It generated a statement like:

      delete from table_name
      where column1 is null and column2 is null and ...and...

      It worked, for sure. But as for the performance, I need to run a test.

      Thanks.
    • Data is loading
      This table is a financing statement table storing data collected from all over the world. It has 147 columns. One is the primary key column; the others are nullable float columns. I am loading all the data through several SSIS packages. It will take at least about 3 hours to update the whole database.

      I will post the query performance result tomorrow. The query was written by bdbs(不多不少).

      Thanks, everyone, especially bdbs, and have a good weekend.
      • And the performance is ...?
    • Finally, I got the data loaded into the database
      On a desktop with a dual-core 3.4 GHz CPU, 4 GB of memory, Windows XP2, and SQL 2005 with SP2, against a table with 144 columns and 596,000 rows, I ran this query:

      DECLARE @sql VARCHAR(max)
      SET @sql = 'select count(*) from cf WHERE 1 = 1'
      SELECT @sql = @sql + ' AND ' + name + ' IS NULL' FROM syscolumns WHERE id = OBJECT_ID('cf') AND IsNullable = 1
      EXEC (@sql)

      It took 15 seconds to finish. Row count is 0.
      • ROW count = 0? Does it mean you deleted nothing?
        • The OP switched to select count(*), which shows that no dirty data was found.
          • Proving the correctness of a query is a very tricky question.
            If you run something like:

            DELETE FROM Your_Table WHERE 1=2
            SELECT COUNT(*) FROM Your_Table WHERE 1=2

            What do you get?
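            • 0. 0. I don't see why you are asking such a simple question, and I don't see what is tricky about it either.
              • The point is that you cannot use the delete condition itself to prove that you deleted the correct rows.
                • Forgive my slowness, I still don't get it.
                  • No matter what your delete condition is, once the delete has run, SELECT COUNT(*) FROM TABLE_NAME with that same condition always returns 0. How can you depend on it to find out whether your delete condition is right!?
                    • I have read this back and forth several times, and I think you did not understand the earlier posts. Otherwise, I am still at #3859253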
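      One way to sidestep the disagreement above is to capture what the delete actually removes, instead of re-running the same condition afterwards. The sketch below uses the OUTPUT clause (available since SQL Server 2005) inside a transaction, on a cut-down copy of the sample table. The predicate is written by hand here rather than generated, and the table variable `@deleted` is a name introduced only for this illustration. It does not prove the predicate is right, but it makes the removed rows visible before the change is committed.

      ```sql
      CREATE TABLE #pvt (VendorID int NOT NULL PRIMARY KEY, Emp1 int, Emp2 int);
      INSERT INTO #pvt VALUES (1, 4, 3);
      INSERT INTO #pvt VALUES (6, NULL, NULL);
      INSERT INTO #pvt VALUES (7, NULL, 0);

      -- A table variable keeps its contents even if the transaction is rolled back.
      DECLARE @deleted TABLE (VendorID int, Emp1 int, Emp2 int);

      BEGIN TRANSACTION;

      DELETE FROM #pvt
      OUTPUT deleted.VendorID, deleted.Emp1, deleted.Emp2 INTO @deleted
      WHERE Emp1 IS NULL AND Emp2 IS NULL;

      -- Inspect the removed rows; in this sample only VendorID 6 is captured.
      SELECT VendorID, Emp1, Emp2 FROM @deleted;

      -- Use ROLLBACK TRANSACTION instead if the captured rows look wrong.
      COMMIT TRANSACTION;

      DROP TABLE #pvt;
      ```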
      • You did not take your primary key out of the query.
        The following one is better, and it also catches empty strings:

        declare @sql varchar(max)
        set @sql='select * from cf where 1=1 '

        select @sql=@sql + ' and (' + NAME + ' IS NULL OR '+ NAME +'='''')'
        FROM SYSCOLUMNS
        WHERE ID=OBJECT_ID('cf') AND
        NAME != 'ID' -- exclude the identity (key) column

        exec (@sql) -- run the generated statement

        The above query will select the empty rows.
        • IsNullable = 1 is enough to eliminate the primary key
          • you are right.
      • That's why you got 0 rows.
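    • This kind of dirty data should be filtered out at load time.
      • I think that is exactly what the OP is doing: preprocessing the data before loading it.
        • "It is also a big fact table having millions of rows"... that does not sound like staging data.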
      • The reason why we do this
        To improve performance, we denormalize many related tables into a single fact table by pivoting. In the middle of the process it is very hard to tell whether a combination will produce a row with all NULLs. So we need to do the data cleaning at the end of the ETL.
        • When you insert an empty row, is it still in the middle of the process? If yes, then this row should be inserted into the staging table and deleted when the ETL is done; if no, then you have the choice not to insert this row.
          • Yes. You will have one row from here, one row from there. When you merge these rows into a single row, you don't know beforehand whether the merged row will have all NULL values.
            • Then use a staging table. Before you insert into the target table, run your dynamic SQL to delete this row from the staging table. Since an intermediate data set is usually not big, performance is not a concern.
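      A minimal sketch of the staging-table suggestion above. `#staging` and `#target` are hypothetical stand-ins with two float columns instead of the real 140-plus, and the all-NULL predicate is written by hand; on the real table it would come from the dynamic SQL discussed earlier in the thread.

      ```sql
      CREATE TABLE #staging (ID int NOT NULL PRIMARY KEY, A float, B float);
      CREATE TABLE #target  (ID int NOT NULL PRIMARY KEY, A float, B float);

      INSERT INTO #staging VALUES (1, 1.5, NULL);
      INSERT INTO #staging VALUES (2, NULL, NULL);   -- all-NULL row to be dropped

      -- Clean the (small) staging set first ...
      DELETE FROM #staging
      WHERE A IS NULL AND B IS NULL;

      -- ... then load only what survived into the fact table.
      INSERT INTO #target (ID, A, B)
      SELECT ID, A, B
      FROM #staging;

      SELECT ID FROM #target;   -- only ID 1 is loaded

      DROP TABLE #staging;
      DROP TABLE #target;
      ```

      The same predicate could instead go in the WHERE clause of the INSERT ... SELECT, negated, which skips the empty rows without a separate delete step.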