This topic has been archived. It cannot be replied.
-
工作学习 / 专业技术讨论 / 请教一个SQL QUERY问题A table in SQL database has hundreds of columns. It is also a big fact table having millions of rows. If all columns have null value, I need to delete this row from this table. Is there any way to do that? It can be a SQL statement or SSIS task but not a external program.
Thanks.
-madeinchina(强力万能胶);
2007-8-3
{283}
(#3848058@0)
-
make table query, add condition to column: <> ""
-7y7(醉里吴音);
2007-8-3
(#3848147@0)
-
Make table query only in access tho. from what i understand is that he is complaining the table is too big and he is not going to type the column name and concatenate them together as the condition.there is a way to do it. by auto list all the column name and contruct a dynamic T-SQL. and do what you need to do.
-sunday8(sunday8);
2007-8-3
{115}
(#3848183@0)
-
可以动态生成query。DECLARE @sql VARCHAR(5000)
SET @sql = 'DELETE tableName WHERE 1 = 1'
SELECT @sql = @sql + ' AND ' + name + ' IS NULL' from syscolumns where id = OBJECT_ID('tableName')
EXEC (@sql)
-bdbs(不多不少);
2007-8-3
{196}
(#3848202@0)
-
高手.
-findinghouse(不写错别字。8);
2007-8-3
(#3848249@0)
-
first thought
-madeinchina(强力万能胶);
2007-8-3
{610}
(#3848285@0)
-
lost the format
-madeinchina(强力万能胶);
2007-8-3
{615}
(#3848296@0)
-
如果仅仅是为了有non-nullable的field,稍微修正一下就可以了。DECLARE @sql VARCHAR(5000)
SET @sql = 'DELETE tableName WHERE 1 = 1'
SELECT @sql = @sql + ' AND ' + name + ' IS NULL' FROM syscolumns WHERE id = OBJECT_ID('tableName') AND IsNullable = 1
EXEC (@sql)
-bdbs(不多不少);
2007-8-3
{216}
(#3848320@0)
-
I mean this big table has some rows having all columns except its key column. I need to delete these rows.
-madeinchina(强力万能胶);
2007-8-3
(#3848345@0)
-
I mean this big table has some rows having all columns value NULL except its key column. I need to delete these rows.
-madeinchina(强力万能胶);
2007-8-3
(#3848359@0)
-
#3848320 难道不能解决你的 这个 问题么?
-bdbs(不多不少);
2007-8-3
(#3848386@0)
-
或者?DECLARE @sql VARCHAR(5000)
SET @sql = 'DELETE tableName WHERE 1 = 1'
SELECT @sql = @sql + ' AND ' + name + ' IS NULL' FROM syscolumns WHERE id = OBJECT_ID('tableName') AND name <> key_column_name
EXEC (@sql)
-guestagain(guest again);
2007-8-3
{225}
(#3848531@0)
-
那样的话,不如这样DECLARE @sql VARCHAR(5000)
SET @sql = 'DELETE tableName WHERE 1 = 1'
SELECT @sql = @sql + ' AND ' + name + ' IS NULL'
FROM syscolumns
WHERE id = OBJECT_ID('tableName')
AND name NOT IN (SELECT COLUMN_NAME FROM INFORMATION_SCHEMA.KEY_COLUMN_USAGE WHERE TABLE_NAME = 'tableName')
EXEC (@sql)
-bdbs(不多不少);
2007-8-3
{312}
(#3848556@0)
-
呵呵,你确定他所谓的key column是你认为的那种key column?
-guestagain(guest again);
2007-8-3
(#3848596@0)
-
不确定了。:(
-bdbs(不多不少);
2007-8-3
(#3848772@0)
-
You copy my idea too. Just kidding.
-deep_blue(BLUE);
2007-8-4
(#3849991@0)
-
"this big table has some rows having all columns except its key column" - 这怎么可能?key column怎么可能是nullable的?你用中文表述吧。
-bdbs(不多不少);
2007-8-3
(#3848382@0)
-
2nd thought, becasue I found...
-madeinchina(强力万能胶);
2007-8-3
{1051}
(#3848351@0)
-
#3848320 不行么? 想那么复杂干吗?脑筋急转弯?
-guestagain(guest again);
2007-8-3
(#3848515@0)
-
You can also use powerful INFORMATION SCHEMA view.DECLARE @SQL VARCHAR(5000)
SET @SQL = 'DELETE FROM YOUR_TABLE_NAME WHERE 1=1 '
SELECT @SQL = @SQL + ' AND ' + COLUMN_NAME + ' IS NULL ' from INFORMATION_SCHEMA.COLUMNS Where IS_NULLABLE = 'YES' and TABLE_NAME = YOUR_TABLE_NAME
EXEC (@SQL)
-deep_blue(BLUE);
2007-8-3
{258}
(#3848436@0)
-
你这样抄袭我的CODE会让人以为你是俺的马甲呢。呵呵,开玩笑。
-bdbs(不多不少);
2007-8-3
{40}
(#3848510@0)
-
Here is the sample
-madeinchina(强力万能胶);
2007-8-3
{706}
(#3848695@0)
-
In my understanding, when you create table, you don’t actual neither set VendorID as PK or specify it as non-nullable, you should explicitly exclude it in query:SELECT @SQL = @SQL + ' AND ' + COLUMN_NAME + ' IS NULL ' from INFORMATION_SCHEMA.COLUMNS Where IS_NULLABLE = 'YES' and TABLE_NAME = YOUR_TABLE_NAME AND COLUMN_NAME <>’VendorID’
-deep_blue(BLUE);
2007-8-4
{185}
(#3849997@0)
-
bdbs(不多不少), i tried your queryit generated a statement like:
delete from table_name
where column1 is null and column2 is null and ...and...
it worked, for sure. But the performance..., I need to do a test.
Thanks.
-madeinchina(强力万能胶);
2007-8-3
{194}
(#3848729@0)
-
数据加载中This table is a financing statement table storing all data collected from the whole world. It has 147 columns. One is primary key column, others are float type, nullable. I am loading all data in through several SSIS packages. It will take about 3 hours (at least) to update the whole database.
I will post the query performance result tomorrow. The query is created by bdbs(不多不少).
Thanks, everyone, especially to bdbs and have a good weekend.
-madeinchina(强力万能胶);
2007-8-3
{454}
(#3848842@0)
-
And the performance is ...?
-bdbs(不多不少);
2007-8-8
(#3856425@0)
-
Finally, I got the data loaded in databaseOn a desktop with dual-core 3.4G, 4G memory, Windows XP2, SQL 2005 with SP2. A table having 144 columns, 596000 rows of records, running this query
DECLARE @sql VARCHAR(max)
SET @sql = 'select count(*) from cf WHERE 1 = 1'
SELECT @sql = @sql + ' AND ' + name + ' IS NULL' FROM syscolumns WHERE id = OBJECT_ID('cf') AND IsNullable = 1
EXEC (@sql)
It took 15 seconds to finish. Row count is 0.
-madeinchina(强力万能胶);
2007-8-8
{408}
(#3857787@0)
-
ROW count = 0? Does it mean you deleted nothing?
-deep_blue(BLUE);
2007-8-9
(#3858376@0)
-
LZ 改用select count(*) 了,说明未发现脏数据。
-bdbs(不多不少);
2007-8-9
(#3858741@0)
-
It is a very trick question to prove the correctness of a query.If you run something like:
DELETE FROM Your_Table WHERE 1=2
SELECT COUNT(*) FROM Your_Table WHERE 1=2
What do you get?
-deep_blue(BLUE);
2007-8-9
{125}
(#3859127@0)
-
0。0。不懂你问这么简单的问题干什么。也不明白什么地方trick了。
-bdbs(不多不少);
2007-8-9
(#3859160@0)
-
The point is you cannot use deleting condition to prove you deleted correct rows.
-deep_blue(BLUE);
2007-8-9
(#3859178@0)
-
恕我愚钝,还是不懂。
-bdbs(不多不少);
2007-8-9
(#3859253@0)
-
No matter what your deleting condition is, your
SELECT COUNT(*) FROM TABLE_NAME deleting condition
Always return 0.
How can you depended it find out your deleting condition is right!?
-deep_blue(BLUE);
2007-8-9
(#3859590@0)
-
来来回回看了好几遍,我想你是没看明白前面的贴子吧。不然就是我仍旧#3859253
-bdbs(不多不少);
2007-8-9
(#3859663@0)
-
you did not take your primary key out from the quary.The following one will be better and even including empty strings:
declare @sql varchar(max)
set @sql='select * from cf where 1=1 '
select @sql=@sql + ' and (' + NAME + ' IS NULL OR '+ NAME +'='''')'
FROM SYSCOLUMNS
WHERE ID=OBJECT_ID('cf') AND
NAME != 'ID' ---Identity Columns
the above quary will selsect the empty rows.
-winstonwx(绿芒果);
2007-8-9
{363}
(#3858490@0)
-
IsNullable = 1 is enough to eliminate the primary key
-bdbs(不多不少);
2007-8-9
(#3858739@0)
-
you are right.
-winstonwx(绿芒果);
2007-8-9
(#3859418@0)
-
That's why you got 0 rows.
-winstonwx(绿芒果);
2007-8-9
(#3858494@0)
-
这种脏数据应该在加载时就过滤掉。
-newkid(newkid);
2007-8-9
(#3858569@0)
-
我想LZ就是在做这项工作,在加载数据前对数据进行预处理。
-bdbs(不多不少);
2007-8-9
(#3858744@0)
-
It is also a big fact table having millions of rows...这不像是staging数据呀
-newkid(newkid);
2007-8-9
(#3858761@0)
-
The reason why we do thisIn order to improve performance, we denormalize lots of related tables into a single fact table by pivoting. It is very hard to tell the combination will have a row with all nulls in the middle of the process. So, we need to do the data cleaning at the end of the ETL.
-madeinchina(强力万能胶);
2007-8-9
{268}
(#3858845@0)
-
When you insert an empty row, is it still in the middle of the process? If yes, then this row should be inserted to the staging table and deleted when ETL is done; if no, then you have a choice not to insert this row
-newkid(newkid);
2007-8-9
(#3858865@0)
-
Yes. You will have one row from here, one row from there. When you merge these rows into single row, you don't know this row will have all null values beforehand.
-madeinchina(强力万能胶);
2007-8-9
(#3858901@0)
-
Then use a staging table. Before you insert into the target table, run your dynamic SQL to delete this row from the staging table. Since intermediate data set is usually not big, performance is not a concern.
-newkid(newkid);
2007-8-9
(#3858960@0)