WOW, IT IS MUCH MORE THAN I EXPECTED! -contact(Contact) @佛州华人论坛:佛州枫下论坛 The Rolia Forum of Florida

工作学习 / IT技术讨论 / 闲来无事, 出道ORACLE SQL的题考考大家. 这是在下几个月前工作中遇到的问题, 看似简单, 其实不然. -contact(Contact); 2001-11-13 {429} (#260222@0)
Table T has a few columns. Column A in T is not unique. Write a SQL statement returns all the rows in T where column A value is non-unique, and group by A. Please note, you can write a complex SQL statement, but no functions, procedures, and triggers are allowed.

For instance, I have table T:

A B

1 B1
2 B2
3 B3
2 B4
3 B5
3 B6
4 B7

after running the SQL, the result is

A B

2 B2
2 B4
3 B3
3 B5
3 B6

Correction -contact(Contact); 2001-11-13 {200} (#260228@0)
"no functions, procedures, and triggers are allowed" should be "no user defined functions, procedures, and triggers are allowed". Built in functions like AVG, MAX, DECODE, COUNT etc. are allowed.
为什么只能用一句STATEMENT?用PROCEDURE是很好做的 -bigrice(大米); 2001-11-13 (#260233@0)

That is the difficult part. Because the stupid third party software only takes statement. -contact(Contact); 2001-11-13 (#260241@0)

answer! -cloud2001(坐看云起); 2001-11-13 {78} (#260246@0)
SELECT * FROM T
WHERE A IN (
SELECT A FROM T
GROUP BY A HAVING COUNT(A)>1)

哈哈, 和我的一样. 其实也不太难. -contact(Contact); 2001-11-13 (#260249@0)

a piece of cake! -cloud2001(坐看云起); 2001-11-13 (#260252@0)

But since you used sub-query, you lose the benefits of index. -contact(Contact); 2001-11-13 (#260258@0)

DX, is that possible to get the data set without sub-query? That is beyond my ablity, please tell me. Thanks. -cloud2001(坐看云起); 2001-11-13 (#260268@0)

Sorry about late. I was answering a phone call. -contact(Contact); 2001-11-13 {121} (#260300@0)
This one seems working:

select T2.A, T1.B from T T1, T T2
where T1.A = T2.A group by T2.A, T1.B having count(T2.A)>1

Graceful! -abba(Spitfire); 2001-11-14 {397} (#260691@0)
I tried this way this evening. But after several minutes I gave up. I find it difficult for me to get focused at home. Yeah, the performance could be bad when using sub-query, especially when the data volume is large.

By the way, I like to write join conditions in the JOIN clause, instead of WHERE. I feel that things are clearer in this way.

Thanks! I've certainly learnt something today.

Thank you! We all learned something! I am really glad to see that. -contact(Contact); 2001-11-15 (#262832@0)

So do I. : ) -abba(Spitfire); 2001-11-15 (#262892@0)

How come you are named as "walk to the edge of fortune" -- "XING DAO SHUI JIN SHU"? passing away from a luck of falling into love? -lottery(©_©); 2001-11-14 (#260738@0)

it is amazing -zuokanyunqi(坐看云起); 2001-11-15 {134} (#263049@0)
hi,nice to meet you,we have the same name,can you tell me why you want to use this name,you can send email to zhenghao@gosympatico.ca.

不过是一般的SQL的问题罢了。 -checkme(checkme); 2001-11-13 (#260254@0)

对, 但在当时下一开始让我们的DBA写, 居然写不出. 所以在下只能自己写. 在此出题纯属好玩. -contact(Contact); 2001-11-13 (#260262@0)

让DBA写SQL有点象让网管写程序，有点不相干。 -checkme(checkme); 2001-11-13 (#260273@0)

在北美做DBA这么简单的东东都不知道哪行!. 不过你也能看出去年硅谷雇人多么机不择食, 什么烂人都能当然DBA. -contact(Contact); 2001-11-13 (#260305@0)
不敢苟同。如果他写不出这个SQL STATEMENT，我会怀疑他对数据库的理解，进而影响他对数据库结构的设计。 -abba(Spitfire); 2001-11-13 (#260308@0)

我的见解 -sameway(sameway); 2001-11-13 {128} (#260617@0)
一看就是没有开发经验，这是最最基本的东西，数据转换的时候经常用。加拿大这地总养些动嘴皮子的人，不过这也难怪，前几年IT工作太好找了

同意 -benbendan(笨笨蛋); 2001-11-18 (#266311@0)

Contact, Can you help me explain the following question? Thank you. -hongri(hongri); 2001-11-14 {322} (#260982@0)
select T2.A, T1.B from T T1, T T2
where T1.B = T2.B group by T2.A, T1.B having count(T2.A)>1

I changed 'where' , before the 'having' the result is same as the old one, but after 'having', the result is totally different. So what's the function of 'HAVING'? I think it just process the data before it, right?

我觉的可以用一个减操作，但好久未用oracle,忘了名字。取出全部记录，然后减A为单一值的记录，行不？ -rabbitbug(兔八哥); 2001-11-14 {54} (#260724@0)
云MM的SQL嵌套速度有很大影响，如果记录数一多，虽然简洁.
Will this work? select A, B from (select A, B, lag(A, 1) over (order by A) as A1, lead(A, 1) over (order by A) as A2 from T) where A = A1 or A = A2 -fromztoa(fromZtoA); 2001-11-14 (#260764@0)
Did you test the statement for so many rows?something wrong? -cloud2001(坐看云起之山尽水穷); 2001-11-14 {356} (#261113@0)
I tried a table with 33 rows, almost no difference(20 msec--subquery, 40 msec--your statement).
I tried a table with 480184 rows using the sub-query one, it took 35 sec to get back data without index, 23 sec to get back with index, but I tried the one you provided, I waited for at least half an hour, still no data come back, I have to kill the process.

Can you answer my question? -hongri(hongri); 2001-11-14 (#261165@0)

My answer are select * from T T1 where exists(select A from T where A=T1.A group by A having count(1)>1); or select * from T T1 where exists(select A from T group by A having count(1)>1 and A=T1.A); -cyt(会飞的鹿); 2001-11-14 {27} (#261114@0)
But what's the difference?

If you change into and B=a.B, you can find different ,but I don't know why -hongri(hongri); 2001-11-14 (#261180@0)

select T.A, T.B from T join (select A,count(B) CNT from T group by A Having count(B) > 1 ) Z on T.A = Z.A -tianqi21(); 2001-11-14 (#261202@0)
Why you guys make it so complicated? select * from t group by a having count(*) >1 order by a -aabbcc(abc); 2001-11-14 (#261370@0)

哥们，你试过吗？其实cyt的答案，我最喜欢，性能最好，两者并无不同。还可以：select * from T t1 where exists(select 1 from t t2 where t1.a=t2.a having count(0)>1)； -onewayticket(onewayticket); 2001-11-14 (#261589@0)

aabbcc hasn't got the point yet. But, -abba(Spitfire); 2001-11-14 {184} (#261707@0)
do you really think cyt's solution is better than that of #260300, in terms of performance? To my knowledge, join is much better than sub-query, especially when there is usable index.

I am pretty sure. 请看分析。 -onewayticket(onewayticket); 2001-11-14 {3083} (#261831@0)
本文发表在 rolia.net 枫下论坛我在table t的a列上加了index t_i, 这样能看出#260246,#260300,#260764并没有用到index (note: 对小型table来说，性能并不一定好）

cloud2001（坐看云起)的答案: (#260246):

SQL> SELECT * FROM T
2 WHERE A IN (
3 SELECT A FROM T
4 GROUP BY A HAVING COUNT(A)>1) ;

A B
---------- ----------
2 B2
3 B3
2 B4
3 B5
3 B6

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=6 Card=67 Bytes=2211)
1 0 HASH JOIN (Cost=6 Card=67 Bytes=2211)
2 1 VIEW OF 'VW_NSO_1' (Cost=4 Card=82 Bytes=1066)
3 2 FILTER
4 3 SORT (GROUP BY) (Cost=4 Card=82 Bytes=1066)
5 4 TABLE ACCESS (FULL) OF 'T' (Cost=1 Card=82 Bytes=1066)
6 1 TABLE ACCESS (FULL) OF 'T' (Cost=1 Card=82 Bytes=1640)

contact(Contact)的答案 (#260300):

SQL> select T2.A, T1.B from T T1, T T2
2 where T1.a = T2.a group by T2.A, T1.B having count(T2.A)>1;

A B
---------- ----------
2 B2
2 B4
3 B3
3 B5
3 B6

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=5 Card=67 Bytes=2211)
1 0 FILTER
2 1 SORT (GROUP BY) (Cost=5 Card=67 Bytes=2211)
3 2 HASH JOIN (Cost=3 Card=67 Bytes=2211)
4 3 TABLE ACCESS (FULL) OF 'T' (Cost=1 Card=82 Bytes=1640)
5 3 TABLE ACCESS (FULL) OF 'T' (Cost=1 Card=82 Bytes=1066)

cyt（会飞的鹿）的答案 (#261114):
(note: 应加上group by, 这样比较起来才公平，加group by后性能要差一些）

SQL> select * from T T1
2 where exists
3 (select A from T where A=T1.A group by A having count(1)>1)
4 group by a,b;

A B
---------- ----------
2 B2
2 B4
3 B3
3 B5
3 B6

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=3 Card=5 Bytes=100)
1 0 SORT (GROUP BY) (Cost=3 Card=5 Bytes=100)
2 1 FILTER
3 2 TABLE ACCESS (FULL) OF 'T' (Cost=1 Card=5 Bytes=100)
4 2 FILTER
5 4 SORT (GROUP BY NOSORT) (Cost=1 Card=1 Bytes=13)
6 5 INDEX (RANGE SCAN) OF 'T_I' (NON-UNIQUE) (Cost=1 Card=1 Bytes=13)

fromztoa(fromZtoA)的答案 (#260764):

SQL> select A, B from
2 (select A, B, lag(A, 1) over (order by A) as A1,
3 lead(A, 1) over (order by A) as A2
4 from T)
5 where A = A1 or A = A2;

A B
---------- ----------
2 B2
2 B4
3 B3
3 B5
3 B6

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=3 Card=82 Bytes=3772)
1 0 VIEW (Cost=3 Card=82 Bytes=3772)
2 1 WINDOW (SORT)
3 2 TABLE ACCESS (FULL) OF 'T' (Cost=1 Card=82 Bytes=1640)更多精彩文章及讨论，请光临枫下论坛 rolia.net

高人！ -abba(Spitfire); 2001-11-14 {329} (#261909@0)
老兄是真正的DBA吧。我是做开发的，用了4，5年SQL，也搞过performance tuning，但没有系统研究过理论。今天借这个机会请教一下：怎样分析execution plan？有没有好的网站介绍一下？先行谢过！

不过，我的经验仍然顽固地告诉我：#260300会用到‘T_I'，并且join好于sub_query。我现在没条件，你能不能弄几万条记录试一下？（我的经验以前都挺管用啊！）

再看看这个 -onewayticket(onewayticket); 2001-11-15 {2441} (#262617@0)
本文发表在 rolia.net 枫下论坛你可以用下面的sql生成几万条记录:
CREATE SEQUENCE sq1
INCREMENT BY 1
START WITH 1
MAXVALUE 1000
CYCLE
NOCACHE
ORDER;

CREATE SEQUENCE sq2
INCREMENT BY 1
START WITH 1
MAXVALUE 500
CYCLE
NOCACHE
ORDER;

declare
i INTEGER := NULL;
c varchar2(10);
begin
for i in 1..50000 loop
select 'B'||sq2.nextval into c from dual;
insert into t values(sq1.nextval,c);
end loop;
commit;
exception
when others then null;
end;

然后再分析:

我是用的cost based optimizer.
alter session set optimizer_mode=all_rows;
而且对table t 做了分析:
analyze table t estimate statistics;

结果仍是:

SQL> select t2.a, t1.b from t t1, t t2
2 where t1.a=t2.a group by t2.a, t1.b having count(t2.a)>1

997 B497
998 B498

A B
---------- ----------
999 B499
1000 B500

1003 rows selected.

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=6271 Card=17678 Bytes=176780)
1 0 FILTER
2 1 SORT (GROUP BY) (Cost=6271 Card=17678 Bytes=176780)
3 2 HASH JOIN (Cost=67 Card=2500600 Bytes=25006000)
4 3 TABLE ACCESS (FULL) OF 'T' (Cost=8 Card=50006 Bytes=150018)
5 3 TABLE ACCESS (FULL) OF 'T' (Cost=8 Card=50006 Bytes=350042)

SQL> select * from t t1
2 where exists
3 (select 1 from t t2 where t2.a=t1.a having count(1)>1)
4 group by a, b;

A B
---------- ----------
997 B497
998 B498
999 B499
1000 B500

1003 rows selected.

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=12 Card=2500 Bytes=17500)
1 0 SORT (GROUP BY) (Cost=12 Card=2500 Bytes=17500)
2 1 FILTER
3 2 TABLE ACCESS (FULL) OF 'T' (Cost=8 Card=2500 Bytes=17500)
4 2 FILTER
5 4 SORT (AGGREGATE)
6 5 INDEX (RANGE SCAN) OF 'T_I' (NON-UNIQUE) (Cost=1 Card=50 Bytes=150)

1. 用explain plan 之前,必须生成plan_table, 用utlxplan.sql 生成. 在$ORACLE_HOME/rdbms/admin下.
2. 选择cost-based or rule based (Oracle 推荐cost based). 可以在parameter file 或session 或instance级设: 如:
session 级:
alter session set optimizer_mode=all_rows;
3.可以从table plan_table看结果或用: set autotrace on explain 设(我用的这种方式)更多精彩文章及讨论，请光临枫下论坛 rolia.net

谢指教！ -abba(Spitfire); 2001-11-15 {119} (#262662@0)
你的解释真详细，多谢！可惜我手头没有可用的系统，没法试。我一直用DB2，不知道有没有类似的分析工具，你是否碰巧也知道？：）

db2exfmt, db2expln, visual explain, db2vexp, etc -onewayticket(onewayticket); 2001-11-15 (#262981@0)

Thanks! -abba(Spitfire); 2001-11-15 (#263022@0)

谢谢你的论据，我也只是凭经验。很希望和大家多交流，特别是几种不同database 的比较。我在用Oracle 和SQL Server。希望了解一些DB2和Sybase. Please send me mail, I'd like to share you my phone number. -cyt(会飞的鹿); 2001-11-14 (#261955@0)
YES. Is there any problem with that statement? -aabbcc(abc); 2001-11-15 (#262875@0)

why no one answer my question. Thank you DXS -hongri(hongri); 2001-11-15 {322} (#262415@0)
select T2.A, T1.B from T T1, T T2
where T1.B = T2.B group by T2.A, T1.B having count(T2.A)>1

I changed 'where' , before the 'having' the result is same as the old one, but after 'having', the result is totally different. So what's the function of 'HAVING'? I think it just process the data before it, right?

能列出我你的sql语句吗? -onewayticket(onewayticket); 2001-11-15 (#262622@0)
I answered part of your question in #262621. As far as HAVING, please get any SQL book and look into the chapter of GROUP BY for greater details. -contact(Contact); 2001-11-15 (#262626@0)

WOW, IT IS MUCH MORE THAN I EXPECTED! -contact(Contact); 2001-11-15 {1519} (#262621@0)
本文发表在 rolia.net 枫下论坛My very first solution was same as cloud's. Because logically it is the clearest way to get the work done. My self join answer, in fact, was just a variation of cloud's answer by removing the sub-query. Among the answers with sub-query, apparently cyt's is the best.

I share with abba's view completely. The question I posted here was way much simplified from the real life problem I faced. In real life, I needed to join it with something like other 6 or 7 tables using, for instance, the value of B. Some of those tables are huge, and with different index etc. Using sub-query as part of the complex query risks the chance to be benefit from the index in tables, and may result horrible performance. The original SQL for the original problem was one page long. Therefor in the answer I posted here, the highest priority in my mind was removing any sub-queries, rather than the fastest performance of querying this single table. In this sense, abba is absolutely correct.

Thank you for your excellent execution plan, onewayticket! As you can tell I am not a dba, but I got your idea. I definitely learned something.

To answer hongri's question, you cannot change T1.a=T2.a to T1.b=T2.b. Your query looked like working but indeed it didn't because the original table happened to have the values in B column unique. Change your table by making B column to 'B1' for all the rows, your will find your query doesn't work.

It has been fun to share ideas with you folks. Let's keep going!更多精彩文章及讨论，请光临枫下论坛 rolia.net

Hi! Like to disscuss any questions with you. -cloud2001(坐看云起之山尽水穷); 2001-11-15 {404} (#262630@0)
When I saw your question, I just want to answer you as soon as possible, yes, I admit at that time I did not even think about performance, so I am not a dba either. :) Although it seems a simple question, as you and abba said, it is more complicated in real life, but I really learn much from it. I really appreciate it. You make rolia IT more like a technical forum, which is the place I am looking for.

Absolutely -contact(Contact); 2001-11-15 (#262673@0)

Do you know any bbs or community which discuss application server or java middleware solution, I can not find a good one! Thanks. -cloud2001(坐看云起之山尽水穷); 2001-11-15 (#262702@0)

I don't think any is really good. Try google to search. We wrote our own framework at our workplace instead of using an application server. For J2EE, you may take a look Orion server. -contact(Contact); 2001-11-15 (#262833@0)

要是T－SQL就太容易了： "select T1.A,T1.B from T T1 join T T2 ON T1.A=T2.A anf T1.B <> T2.B Group By T1.A,T1.B" -benbendan(笨笨蛋); 2001-11-18 (#266310@0)

更正一下：select T1.A,T1.B from T T1 join T T2 on T1.A=T2.A Group By T1.A , T1.B Having count(T1.A)>1 -benbendan(笨笨蛋); 2001-11-18 (#266339@0)

@Florida

WOW, IT IS MUCH MORE THAN I EXPECTED!

Replies, comments and Discussions:

More Topics