Why is this query really slow with 70k+ rows? - mysql

First of all, this is my table structure:
CREATE TABLE IF NOT EXISTS `site_forum_comments` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`forum_id` int(11) NOT NULL,
`user_id` int(11) NOT NULL,
`data` int(11) NOT NULL,
`comment` longtext NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=1 ;
Before importing my backup, the table had only 10-15 rows. I built a ranking system based on the number of comments, and this query worked flawlessly:
SELECT u.id, u.username, COUNT(f.id) AS rank
FROM site_users AS u
LEFT JOIN site_forum_comments AS f ON (f.user_id = u.id)
GROUP BY u.id
ORDER BY rank DESC
LIMIT :l
But now, with more than 70k rows inserted, the script won't even load and just crashes the server.
What have I possibly done wrong? Is this problem about the query specifically or is it the table structure?
Thanks in advance, cheers!

This is your query:
SELECT u.id, u.username, COUNT(f.id) AS rank
FROM site_users u LEFT JOIN
site_forum_comments f
ON f.user_id = u.id
GROUP BY u.id
ORDER BY rank DESC
LIMIT :l
Because you are choosing the highest-ranked users, you can probably use an inner join rather than an outer join. In any case, this version doesn't offer many optimization opportunities, but you do need an index on site_forum_comments(user_id, id).
You might get better performance with the same index and a correlated subquery:
SELECT u.id, u.username,
(SELECT COUNT(*)
FROM site_forum_comments f
WHERE f.user_id = u.id
) as rank
FROM site_users u
ORDER BY rank DESC
LIMIT :l;
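A minimal sketch of the index plus correlated-subquery idea, run through Python's built-in sqlite3 module. Assumptions: SQLite stands in for MySQL (its planner differs, so this shows the logic rather than MySQL timings), and the sample rows, the alias comment_count, and the index name idx_comments_user are made up for illustration.

```python
import sqlite3

# Assumption: in-memory SQLite as a stand-in for MySQL; sample data is invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE site_users (id INTEGER PRIMARY KEY, username TEXT NOT NULL);
    CREATE TABLE site_forum_comments (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        comment TEXT NOT NULL
    );
    -- The index suggested in the answer (hypothetical name).
    CREATE INDEX idx_comments_user ON site_forum_comments (user_id, id);
""")
conn.executemany("INSERT INTO site_users VALUES (?, ?)",
                 [(1, "alice"), (2, "bob"), (3, "carol")])
conn.executemany(
    "INSERT INTO site_forum_comments (user_id, comment) VALUES (?, ?)",
    [(1, "a"), (2, "b"), (2, "c"), (2, "d"), (1, "e")])

# One indexed count per user, then sort and keep the top :l users.
ranking = conn.execute("""
    SELECT u.id, u.username,
           (SELECT COUNT(*) FROM site_forum_comments f
             WHERE f.user_id = u.id) AS comment_count
    FROM site_users u
    ORDER BY comment_count DESC
    LIMIT :l
""", {"l": 2}).fetchall()
print(ranking)  # [(2, 'bob', 3), (1, 'alice', 2)]
```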

You are currently joining all users to their comments without an index on the user_id column, and that is slow.
The following query finds the top-ranked user first and joins only that one row with the site_users table (using the index on site_users.id), so it should be faster.
SELECT site_users.id, site_users.username, a.rank
FROM (
SELECT user_id, COUNT(*) as rank
FROM site_forum_comments
GROUP BY user_id
ORDER BY rank DESC
LIMIT 1
) AS a
LEFT JOIN site_users ON a.user_id = site_users.id
Note that this query returns no row for a user whose rank is 0 (a user with no comments).
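The difference between aggregating first and joining from the users table can be seen on a tiny data set. This is a sketch under assumptions: SQLite stands in for MySQL, the rows and the alias comment_count are invented, and LIMIT 3 is used instead of LIMIT 1 so the missing user is visible.

```python
import sqlite3

# Assumption: in-memory SQLite as a stand-in for MySQL; sample data is invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE site_users (id INTEGER PRIMARY KEY, username TEXT NOT NULL);
    CREATE TABLE site_forum_comments (
        id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL);
    INSERT INTO site_users VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
    INSERT INTO site_forum_comments (user_id) VALUES (1), (2), (2);
""")

# Aggregate the comments first, then join only the surviving rows to users.
top_first = conn.execute("""
    SELECT u.id, u.username, a.comment_count
    FROM (SELECT user_id, COUNT(*) AS comment_count
          FROM site_forum_comments
          GROUP BY user_id
          ORDER BY comment_count DESC
          LIMIT 3) AS a
    LEFT JOIN site_users u ON a.user_id = u.id
    ORDER BY a.comment_count DESC
""").fetchall()

# Joining from the users table keeps users who have no comments at all.
all_users = conn.execute("""
    SELECT u.id, u.username, COUNT(f.id) AS comment_count
    FROM site_users u
    LEFT JOIN site_forum_comments f ON f.user_id = u.id
    GROUP BY u.id
    ORDER BY comment_count DESC
    LIMIT 3
""").fetchall()

print(top_first)  # [(2, 'bob', 2), (1, 'alice', 1)]
print(all_users)  # [(2, 'bob', 2), (1, 'alice', 1), (3, 'carol', 0)]
```

Carol has no comments, so she never appears in the aggregated derived table and cannot be joined back in.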

Related

What Am I Missing? MySQL Left Join Most Newest Entry From 2nd Table

I need a fresh pair of eyes on this. I have two tables, one of which has users and the second which contains login records, multiple records for each user. What I'm trying to do is select all entries from the first table, and the most recent record from the second table, e.g., a list of all users but only show the most recent activity. Both tables have auto increment in the ID column.
My code currently is thus:
SELECT u.user_id, u.name, u.email, r.rid, r.user_id
FROM users AS u
LEFT JOIN login_records AS r ON r.user_id = u.user_id
WHERE
r.rid = (
SELECT MAX( rid )
FROM login_records
WHERE user_id = u.user_id
)
I've scoured answers to similar questions on SO and tried all of them, but results have been either returning nothing or only getting odd results (not necessarily the newest one). ID in both tables is auto-increment, so I thought it should be a relatively simple matter to get the only or highest ID for a particular user, but it either returns nothing or a completely different selection each time.
It's my first time using JOIN - do I have the wrong JOIN? Do I need to ORDER or GROUP things differently?
Thanks for your help. It's got to be something simple, since Danny Coulombe's answer appearing here seems to work for other users.
You will need a subquery I believe:
https://www.db-fiddle.com/f/2wudMDVxReYJz4FEyG19Va/0
CREATE TABLE users (
user_id INT UNSIGNED NOT NULL
AUTO_INCREMENT PRIMARY KEY
);
CREATE TABLE users_logins (
user_login_id INT UNSIGNED NOT NULL
AUTO_INCREMENT PRIMARY KEY,
user_id INT UNSIGNED NOT NULL
);
INSERT INTO users SELECT 1;
INSERT INTO users SELECT 2;
INSERT INTO users_logins SELECT 1,1;
INSERT INTO users_logins SELECT 2,1;
INSERT INTO users_logins SELECT 3,1;
INSERT INTO users_logins SELECT 4,1;
INSERT INTO users_logins SELECT 5,2;
INSERT INTO users_logins SELECT 6,2;
And the query:
SELECT
u.user_id, ul.latest_login_id
FROM users u
LEFT JOIN
(
SELECT user_id, MAX(user_login_id) latest_login_id
FROM users_logins
GROUP BY user_id
) ul ON u.user_id = ul.user_id
You have to ORDER BY the column you want to sort on, in descending order, for example ORDER BY last_login DESC.
Replace last_login with the column you want to order by, but you must first list that column after SELECT.
How about replacing every rid in the WHERE clause and the correlated subquery with record_id?
SELECT u.user_id, u.name, u.email, r.rid, r.record_id, r.user_id
FROM test_users AS u
LEFT JOIN test_login_records AS r ON r.user_id = u.user_id
WHERE
(r.record_id = (
SELECT MAX(record_id)
FROM test_login_records
WHERE user_id = u.user_id
) OR r.record_id is null);
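Why the extra IS NULL branch matters can be checked on a few rows. A minimal sketch, assuming SQLite as a stand-in for MySQL and using the question's table and column names (login_records, rid) with invented data; user 3 has no login records.

```python
import sqlite3

# Assumption: in-memory SQLite as a stand-in for MySQL; sample data is invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE login_records (rid INTEGER PRIMARY KEY, user_id INTEGER);
    INSERT INTO users VALUES (1, 'ann'), (2, 'ben'), (3, 'cy');
    -- user 3 has never logged in
    INSERT INTO login_records VALUES (1, 1), (2, 1), (3, 2), (4, 1);
""")

SUBQUERY = "(SELECT MAX(rid) FROM login_records WHERE user_id = u.user_id)"

# Filtering on the joined table alone: for user 3 both sides are NULL,
# NULL = NULL is not true, so the row is dropped despite the LEFT JOIN.
strict = conn.execute(f"""
    SELECT u.user_id, r.rid
    FROM users u
    LEFT JOIN login_records r ON r.user_id = u.user_id
    WHERE r.rid = {SUBQUERY}
    ORDER BY u.user_id
""").fetchall()

# Adding the IS NULL branch keeps users without any login record.
keep_all = conn.execute(f"""
    SELECT u.user_id, r.rid
    FROM users u
    LEFT JOIN login_records r ON r.user_id = u.user_id
    WHERE (r.rid = {SUBQUERY} OR r.rid IS NULL)
    ORDER BY u.user_id
""").fetchall()

print(strict)    # [(1, 4), (2, 3)]
print(keep_all)  # [(1, 4), (2, 3), (3, None)]
```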

mysql explain slow where on left joined table

I'm experimenting with MySQL and thinking about how to solve a problem I expect to run into later. I want to retrieve statuses that were posted by my friends (specific user ids) or posted inside a group I follow.
CREATE TABLE `status` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`user_id` int(11) NOT NULL,
`status` text COLLATE utf8_unicode_ci NOT NULL,
PRIMARY KEY (`id`),
KEY `IDX_F23501207E3C61F9` (`user_id`)
) ENGINE=InnoDB AUTO_INCREMENT=1567559 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
CREATE TABLE `group_status` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`group_id` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `IDX_F23501207E3C61F9` (`group_id`)
) ENGINE=InnoDB AUTO_INCREMENT=1000001 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
I fed both tables with 1M rows.
The query I am running:
SELECT s.id, s.status, gs.group_id
FROM status s
LEFT JOIN group_status gs
ON s.id = gs.id
WHERE
s.user_id IN (55883,122024,442468,846269,903941,980896,192660,20608,525056,563457)
OR gs.group_id IN (78,79,79,80,80,83,84,85,86,87,88,89,89,91,92,92,94,98)
ORDER BY s.id DESC
LIMIT 15
The result:
Question one:
Shouldn't the Extra column say "Using index" instead of "Using where"?
Question two:
Why is the response time so slow? It takes 2.3 s.
Edit after Tim's answer:
I guess the filesort behaviour is normal when using UNION, no?
Why does the second row of the EXPLAIN show 'Using where' when the third shows 'Using where; Using index'?
At how many rows returned by each SELECT do you think this would get slow?
The UNION query seems to be super fast, but each SELECT currently returns only a few rows. I will try selecting more rows in each SELECT.
When you have an OR across different columns, MySQL may use none of your indexes.
Usually we can solve the problem by using UNION to combine two separate queries, each matching one of the criteria.
SELECT id, status, group_id FROM
(
SELECT s.id, s.status, gs.group_id
FROM status s
LEFT JOIN group_status gs
ON s.id = gs.id
WHERE
s.user_id IN (55883,122024,442468,846269,903941,980896,192660,20608,525056,563457)
UNION
SELECT s.id, s.status, gs.group_id
FROM status s
LEFT JOIN group_status gs
ON s.id = gs.id
WHERE
gs.group_id IN (78,79,79,80,80,83,84,85,86,87,88,89,89,91,92,92,94,98)
) t
ORDER BY id DESC
LIMIT 15
However, in your case this may NOT help if either query returns a large number of records.
Your status column is defined as TEXT, which may cause the filesort. You can change it to a long VARCHAR to see whether the filesort goes away. Or try this to avoid the worst-case scenario:
SELECT ss.id, group_id, ss.status
FROM (
SELECT id, group_id FROM
(
SELECT s.id, gs.group_id
FROM status s
LEFT JOIN group_status gs
ON s.id = gs.id
WHERE
s.user_id IN (55883,122024,442468,846269,903941,980896,192660,20608,525056,563457)
UNION
SELECT s.id, gs.group_id
FROM status s
LEFT JOIN group_status gs
ON s.id = gs.id
WHERE
gs.group_id IN (78,79,79,80,80,83,84,85,86,87,88,89,89,91,92,92,94,98)
) t
ORDER BY id DESC
LIMIT 15
) f
JOIN status ss
ON f.id =ss.id
ORDER BY ss.id DESC
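That the UNION rewrite returns the same rows as the OR version can be checked on a small data set. This is a sketch under assumptions: SQLite stands in for MySQL (so it says nothing about MySQL's index choice or timings), the rows are generated, and the index names idx_status_user and idx_group are hypothetical.

```python
import sqlite3

# Assumption: in-memory SQLite as a stand-in for MySQL; data is generated.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE status (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL,
                         status TEXT NOT NULL);
    CREATE TABLE group_status (id INTEGER PRIMARY KEY,
                               group_id INTEGER NOT NULL);
    CREATE INDEX idx_status_user ON status (user_id);
    CREATE INDEX idx_group ON group_status (group_id);
""")
conn.executemany("INSERT INTO status VALUES (?, ?, ?)",
                 [(i, i % 5, "s%d" % i) for i in range(1, 21)])
conn.executemany("INSERT INTO group_status VALUES (?, ?)",
                 [(i, i % 7) for i in range(1, 21)])

# Original shape: one query with OR across columns of two tables.
or_rows = conn.execute("""
    SELECT s.id, s.status, gs.group_id
    FROM status s
    LEFT JOIN group_status gs ON s.id = gs.id
    WHERE s.user_id IN (1) OR gs.group_id IN (2)
    ORDER BY s.id DESC
    LIMIT 15
""").fetchall()

# Rewrite: one branch per criterion; UNION removes rows matched by both.
union_rows = conn.execute("""
    SELECT id, status, group_id FROM (
        SELECT s.id AS id, s.status AS status, gs.group_id AS group_id
        FROM status s
        LEFT JOIN group_status gs ON s.id = gs.id
        WHERE s.user_id IN (1)
        UNION
        SELECT s.id AS id, s.status AS status, gs.group_id AS group_id
        FROM status s
        LEFT JOIN group_status gs ON s.id = gs.id
        WHERE gs.group_id IN (2)
    ) t
    ORDER BY id DESC
    LIMIT 15
""").fetchall()

print([row[0] for row in union_rows])  # [16, 11, 9, 6, 2, 1]
```

Row 16 matches both criteria and still appears once, because UNION (unlike UNION ALL) removes duplicates.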

Summary Join query taking too long

I am trying to get the max record of each 'telephone_number' where process_status='0' and for that I am using the below query.
SELECT ID, CUSTID, telephone_number, TOTAL_USAGE, ACCOUNT_STATUS
FROM SPRINTABLE_DATA t
JOIN (SELECT MAX( id ) AS maxid FROM SPRINTABLE_DATA GROUP BY telephone_number)dt
ON t.id = dt.maxid WHERE process_status = '0'
AND RESET_FLAG = '0'
ORDER BY id DESC limit 0,700
The above query gives me the desired result, but the problem is that it is too slow.
My table has about 20 million rows, and this query sometimes takes 15-20 minutes.
What can be done to improve this?
This is the structure:
CREATE TABLE `SPRINTABLE_DATA` (
`ID` bigint(11) NOT NULL AUTO_INCREMENT,
`CUSTID` int(11) DEFAULT NULL,
`telephone_number` varchar(20) DEFAULT NULL,
`TOTAL_USAGE` int(11) DEFAULT NULL,
`PROCESS_STATUS` tinyint(4) DEFAULT '0',
`RESET_FLAG` tinyint(4) DEFAULT '0',
`RESET_REASON` varchar(10) DEFAULT NULL,
`PLAN_ID` varchar(20) DEFAULT NULL,
`ACCOUNT_STATUS` varchar(30) DEFAULT NULL,
PRIMARY KEY (`ID`),
KEY `telephone_number` (`telephone_number`),
KEY `CALL_CUST` (`CALL_START_TIME`,`CUSTID`),
KEY `telephone_number1` (`telephone_number `,`PROCESS_STATUS`,`SOC_ADDED`),
KEY `CURRENT_USAGE` (`CURRENT_USAGE`),
KEY `TOTAL_USAGE` (`TOTAL_USAGE`)
) ENGINE=InnoDB AUTO_INCREMENT=36392272 DEFAULT CHARSET=latin1
It seems that you are looking for the 700 most recently called numbers. (If that is not correct, please edit your question.)
Your query follows a good practice for retrieving the latest log row for each item (telephone number in your case), as follows, in your subquery.
SELECT MAX( id ) AS id
FROM SPRINTABLE_DATA
GROUP BY telephone_number
To optimize the performance of this subquery, you need a compound index on two fields: (telephone_number, id), in that order. If you don't have that index, add it in. This is to allow a so-called loose index scan, an extraordinarily efficient way of satisfying a query.
Secondly, you're looking for (I presume) a small subset of your data. Presumably you have plenty more than 700 distinct telephone_number values. This means you're sorting a lot of data with ORDER BY only to discard it with LIMIT. So, let's do a deferred join, sorting a minimal number of columns, and then retrieving all the information you need.
Here's how to get the ID values of the 700 rows you need:
SELECT q.ID /* get our 700 records */
FROM SPRINTABLE_DATA q
JOIN (
SELECT MAX( id ) AS id
FROM SPRINTABLE_DATA
GROUP BY telephone_number
) r ON q.id = r.id
WHERE q.process_status = '0'
AND q.RESET_FLAG = '0'
ORDER BY q.ID DESC
LIMIT 0,700
This pulls out 700 id numbers. You need to do some experimenting with indexes to find out what helps the most to optimize this. It's possible that an index on
process_status, RESET_FLAG, id
will help. It's also possible that changing the order of columns in the index will help, like this:
id, process_status, RESET_FLAG
Try them both.
Finally, we'll use this as a subquery to carry out the join (the so-called deferred join) to fetch the actual detail records. This technique gets rid of the need for sorting all that data.
SELECT t.ID, t.CUSTID, t.telephone_number, t.TOTAL_USAGE, t.ACCOUNT_STATUS
FROM SPRINTABLE_DATA t
JOIN (
SELECT q.ID /* get our 700 records */
FROM SPRINTABLE_DATA q
JOIN (
SELECT MAX( id ) AS id
FROM SPRINTABLE_DATA
GROUP BY telephone_number
) r ON q.id = r.id
WHERE q.process_status = '0'
AND q.RESET_FLAG = '0'
ORDER BY q.ID DESC
LIMIT 0,700
) s ON t.ID = s.ID
ORDER BY t.ID DESC
This will yield the same results, but will be faster.
Now, finally, if it's possible to select the latest calls from the 700 numbers that meet your criteria, you can simplify this query a lot. This will change your result set in a subtle way, though. In that case your call-selection subquery will look like this:
SELECT MAX( id ) AS id /* 700 matching numbers */
FROM SPRINTABLE_DATA
WHERE process_status = '0'
AND reset_flag = '0'
GROUP BY telephone_number
ORDER BY ID desc
LIMIT 0,700
With a compound covering index on
reset_flag, process_status, telephone_number, ID
this query will be quite fast. Your final query in this case would be
SELECT t.ID, t.CUSTID, t.telephone_number, t.TOTAL_USAGE, t.ACCOUNT_STATUS
FROM SPRINTABLE_DATA t
JOIN (
SELECT MAX( id ) AS id /* 700 matching numbers */
FROM SPRINTABLE_DATA
WHERE process_status = '0'
AND reset_flag = '0'
GROUP BY telephone_number
ORDER BY ID desc
LIMIT 0,700
) s ON t.ID = s.ID
ORDER BY t.ID DESC
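The "subtle" change in the result set between the two variants shows up as soon as a number's latest row fails the filter. A minimal sketch, assuming SQLite as a stand-in for MySQL, a cut-down version of the table, invented rows, a hypothetical index name idx_phone_id, and integer literals instead of the quoted '0':

```python
import sqlite3

# Assumption: in-memory SQLite as a stand-in for MySQL; sample data is invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE SPRINTABLE_DATA (
        ID INTEGER PRIMARY KEY,
        telephone_number TEXT,
        PROCESS_STATUS INTEGER DEFAULT 0,
        RESET_FLAG INTEGER DEFAULT 0
    );
    CREATE INDEX idx_phone_id ON SPRINTABLE_DATA (telephone_number, ID);
    INSERT INTO SPRINTABLE_DATA VALUES
        (1, '555-0001', 0, 0),
        (2, '555-0002', 0, 0),
        (3, '555-0001', 1, 0),  -- latest row for 555-0001, already processed
        (4, '555-0003', 0, 0);
""")

# Variant 1: take the latest row per number, then apply the filter.
latest_then_filter = [row[0] for row in conn.execute("""
    SELECT q.ID
    FROM SPRINTABLE_DATA q
    JOIN (SELECT MAX(ID) AS ID
          FROM SPRINTABLE_DATA
          GROUP BY telephone_number) r ON q.ID = r.ID
    WHERE q.PROCESS_STATUS = 0 AND q.RESET_FLAG = 0
    ORDER BY q.ID DESC
    LIMIT 700
""")]

# Variant 2: apply the filter first, then take the latest matching row.
filter_then_latest = [row[0] for row in conn.execute("""
    SELECT MAX(ID) AS max_id
    FROM SPRINTABLE_DATA
    WHERE PROCESS_STATUS = 0 AND RESET_FLAG = 0
    GROUP BY telephone_number
    ORDER BY max_id DESC
    LIMIT 700
""")]

print(latest_then_filter)  # [4, 2]
print(filter_then_latest)  # [4, 2, 1]
```

Number 555-0001 drops out of the first variant because its newest row is already processed, while the second variant falls back to its older unprocessed row.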
Made a slight modification to your query
SELECT ID, CUSTID, telephone_number, TOTAL_USAGE, ACCOUNT_STATUS
FROM SPRINTABLE_DATA t
JOIN (SELECT telephone_number,MAX( id ) AS maxid FROM SPRINTABLE_DATA GROUP BY telephone_number)dt
ON t.id = dt.maxid WHERE process_status = '0'
AND RESET_FLAG = '0'
ORDER BY id DESC limit 0,700
Add these indexes if they are not there already
ALTER TABLE SPRINTABLE_DATA ADD KEY (telephone_number,id);
ALTER TABLE SPRINTABLE_DATA ADD KEY (process_status,reset_flag,id);
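Another option, which may be faster, is a correlated subquery. The subquery has to be compared against the row's id and correlated on telephone_number: an EXISTS around an aggregate without GROUP BY is always true, because such a subquery always returns exactly one row.
SELECT ID, CUSTID, telephone_number, TOTAL_USAGE, ACCOUNT_STATUS
FROM SPRINTABLE_DATA t
WHERE t.process_status = '0'
AND t.RESET_FLAG = '0'
AND t.id = (SELECT MAX( tt.id ) FROM SPRINTABLE_DATA tt WHERE tt.telephone_number = t.telephone_number)
ORDER BY id DESC limit 0,700
For this, the (telephone_number,id) index serves the subquery and the (process_status,reset_flag,id) index serves the outer filter and sort.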

Optimising a working MYSQL statement

Background
I have a table of "users", a table of "content", and a table of "content_likes". When a user "likes" an item of content, a relation is added to "content_likes". Simple.
Now what I am trying to do is order content based on the number of likes it has received. This is relatively easy, however, I only want to retrieve 10 items at a time and then with a lazy load I am retrieving the next 10 items and so forth. If the select was ordered by time it would be easy to do the offset in the select statement, however, due to the ordering by number of "likes" I need another column I can offset by. So I've added a "rank" column to the result set, then on the next call of 10 items I can offset by this.
This query WORKS and does what I need to do. However, I am concerned about performance. Could anyone advise on optimising this query. Or even possibly a better way of doing it.
DB SCHEMA
CREATE TABLE `users` (
`id` int(11) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8
CREATE TABLE `content` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`owner_id` int(11) NOT NULL,
`added` int(11) NOT NULL,
`deleted` int(11) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8
CREATE TABLE `content_likes` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`content_id` int(11) NOT NULL,
`user_id` int(11) NOT NULL,
`added` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8
*columns omitted for simplicity
Breakdown of query
group content_id in content_likes relations table, and order by likes desc
add a column "rank" (or row number) to result set and order by this
join "content" table so that any content with a deleted flag can be omitted
only return results where "rank" (or row number) is greater than variable
limit result set to 10
THE MYSQL
SELECT
results.content_id, results.likes, results.rank
FROM
(
SELECT
t1.content_id, t1.likes, @rn:=@rn+1 AS rank
FROM
(
SELECT
cl.content_id,
COUNT(cl.content_id) AS likes
FROM
content_likes cl
GROUP BY
cl.content_id
ORDER BY
likes DESC,
added DESC
) t1, (SELECT @rn:=0) t2
ORDER BY
rank ASC
) results
LEFT JOIN
content c
ON
(c.id = results.content_id)
WHERE
c.deleted <> 1
AND
results.rank > :lastRank
LIMIT
10
MYSQL ALTERNATIVE
SELECT
*
FROM
(
SELECT
results.*, @rn:=@rn+1 AS rank
FROM
(
SELECT
c.id, cl.likes
FROM
content c
INNER JOIN
(SELECT content_id, COUNT(content_id) AS likes FROM content_likes GROUP BY content_id ORDER BY likes DESC, added DESC) cl
ON
c.id = cl.content_id
WHERE
c.deleted <> 1
AND
c.added > :timeago
LIMIT
100
) results, (SELECT @rn:=0) t2
) final
WHERE
final.rank > :lastRank
LIMIT
5
The "Alternative" MySQL query also works as I would like it to. Content is ordered by the number of likes from users, and I can offset by passing in the last row number. What I have attempted to do here is limit the result sets, so that if and when the tables get large, performance isn't hindered too badly. In this example only content from within a timespan is returned, limited to 100 rows. Then I can offset by the row number (lazy load/pagination).
Any help or advice always appreciated. I am relatively a newbie to mysql so be kind :)
You can eliminate the subquery:
SELECT results.content_id, results.likes, results.rank
FROM (SELECT cl.content_id, COUNT(cl.content_id) AS likes, @rn:=@rn+1 AS rank
FROM content_likes cl cross join
(SELECT @rn:=0) t2
GROUP BY cl.content_id
ORDER BY likes DESC, added DESC
) results LEFT JOIN
content c
ON c.id = results.content_id
WHERE c.deleted <> 1 AND
results.rank > :lastRank
LIMIT 10;
However, I don't think that will have an appreciable effect on performance. What you should probably do is store the last number of likes and the last "added" value and use these to filter the data. The query needs a little fixing up, because added is not unambiguously defined in the ORDER BY clause:
SELECT results.content_id, results.likes, results.rank, results.added
FROM (SELECT cl.content_id, COUNT(cl.content_id) AS likes, MAX(cl.added) as added, @rn:=@rn+1 AS rank
FROM content_likes cl cross join
(SELECT @rn := :lastRank) t2
GROUP BY cl.content_id
HAVING likes < :likes or
(likes = :likes and MAX(cl.added) < :added)
ORDER BY likes DESC, added DESC
) results LEFT JOIN
content c
ON c.id = results.content_id
WHERE c.deleted <> 1 AND
results.rank > :lastRank
LIMIT 10;
This will at least reduce the number of rows that need to be sorted.
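The idea of filtering by the last seen likes count and "added" value (often called keyset pagination) can be sketched without the rank variable at all. Assumptions: SQLite stands in for MySQL, the rows are invented, the join to content for the deleted flag is left out for brevity, and fetch_page, PAGE_SQL, and last_added are hypothetical names.

```python
import sqlite3

# Assumption: in-memory SQLite as a stand-in for MySQL; sample data is invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE content_likes (
        id INTEGER PRIMARY KEY, content_id INTEGER NOT NULL,
        user_id INTEGER NOT NULL, added INTEGER NOT NULL);
""")
rows = [(10, 1, 100), (10, 2, 101), (10, 3, 102),   # content 10: 3 likes
        (20, 1, 103), (20, 2, 104),                 # content 20: 2 likes
        (30, 1, 105), (30, 2, 106),                 # content 30: 2 likes
        (40, 1, 107)]                               # content 40: 1 like
conn.executemany(
    "INSERT INTO content_likes (content_id, user_id, added) VALUES (?, ?, ?)",
    rows)

# The aggregate filter lives in HAVING, because it depends on COUNT and MAX.
PAGE_SQL = """
    SELECT content_id, COUNT(*) AS likes, MAX(added) AS last_added
    FROM content_likes
    GROUP BY content_id
    HAVING :likes IS NULL
        OR COUNT(*) < :likes
        OR (COUNT(*) = :likes AND MAX(added) < :added)
    ORDER BY likes DESC, last_added DESC
    LIMIT :page_size
"""

def fetch_page(last=None, page_size=2):
    """Return the page that follows `last`, a row from the previous page."""
    params = {"likes": last[1] if last else None,
              "added": last[2] if last else None,
              "page_size": page_size}
    return conn.execute(PAGE_SQL, params).fetchall()

page1 = fetch_page()
page2 = fetch_page(last=page1[-1])
print(page1)  # [(10, 3, 102), (30, 2, 106)]
print(page2)  # [(20, 2, 104), (40, 1, 107)]
```

Two items with the same likes count and the same latest "added" value would need an extra tie-breaker (for example content_id) in both the ORDER BY and the filter; this sketch does not handle that case.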

Is there a more efficient way to write this query?

Ok imagine the following DB structure
USERS:
id | name | company_id
1 John 1
2 Jane 1
3 Jack 2
4 Jill 3
COMPANIES:
id | name
1 CompanyA
2 CompanyB
3 CompanyC
4 CompanyD
First I want to SELECT all the companies that have more than one user
SELECT
`c`.`name`
FROM `companies` AS `c`
LEFT JOIN `users` AS `u` ON `c`.`id` = `u`.`company_id`
GROUP BY `c`.`id`
HAVING COUNT(`u`.`id`) > 1
Easy enough. Now I want to SELECT all the users that belong to a company that has more than one user. I have this combined query, but I think it is not efficient:
SELECT * FROM `users` WHERE `company_id` = (
SELECT
`c`.`id`
FROM `companies` AS `c`
LEFT JOIN `users` AS `u` ON `c`.`id` = `u`.`company_id`
GROUP BY `c`.`id`
HAVING COUNT(`u`.`id`) > 1
)
Basically I take the id returned from the first query (companies that have more than 1 user) and then query the users table to find all users with that company.
Why not
SELECT * FROM users u GROUP BY u.company_id HAVING COUNT(u.id) > 1
You don't really need any information from the companies table according to the data you say needs returning. "Now I want to SELECT all the users that belong to a company that has more than one user."
try this:
SELECT u.id,u.name,u.company_id FROM users u
inner join companies c on u.company_id = c.id
group by c.id
having count(u.id) > 1
Simplest way to get the users only is probably to keep the subquery but eliminate the join; since it's not a correlated subquery, it should be fairly efficient (obviously an index on company_id helps here);
SELECT u.* FROM USERS u WHERE company_id IN (
SELECT company_id FROM USERS GROUP BY company_id HAVING COUNT(*)>1
);
You could for example rewrite it as a LEFT JOIN, but I suspect it will actually be less efficient since you'd most likely need to use a DISTINCT when using a JOIN;
SELECT DISTINCT u.*
FROM USERS u
LEFT JOIN USERS u2
ON u.company_id=u2.company_id AND u.id<>u2.id
WHERE u2.id IS NOT NULL;
An SQLfiddle to test both.
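Both forms can be checked against the sample data from the question. A minimal sketch, assuming SQLite as a stand-in for MySQL and a hypothetical index name idx_users_company; it also runs a bare GROUP BY on users for comparison, which SQLite accepts but which yields one row per company rather than every user.

```python
import sqlite3

# Assumption: in-memory SQLite as a stand-in for MySQL; rows are the
# question's sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, company_id INTEGER);
    CREATE INDEX idx_users_company ON users (company_id);
    INSERT INTO users VALUES
        (1, 'John', 1), (2, 'Jane', 1), (3, 'Jack', 2), (4, 'Jill', 3);
""")

# Uncorrelated IN subquery: companies with more than one user, then their users.
in_subquery = conn.execute("""
    SELECT u.id, u.name FROM users u
    WHERE u.company_id IN (
        SELECT company_id FROM users GROUP BY company_id HAVING COUNT(*) > 1)
    ORDER BY u.id
""").fetchall()

# Self-join: a user qualifies if another user shares the same company.
self_join = conn.execute("""
    SELECT DISTINCT u.id, u.name
    FROM users u
    LEFT JOIN users u2
        ON u.company_id = u2.company_id AND u.id <> u2.id
    WHERE u2.id IS NOT NULL
    ORDER BY u.id
""").fetchall()

# Grouping the users table directly collapses each company to a single row.
grouped_only = conn.execute("""
    SELECT * FROM users GROUP BY company_id HAVING COUNT(id) > 1
""").fetchall()

print(in_subquery)        # [(1, 'John'), (2, 'Jane')]
print(self_join)          # [(1, 'John'), (2, 'Jane')]
print(len(grouped_only))  # 1
```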
Try also a semi-join query:
SELECT *
FROM users u
WHERE EXISTS (
SELECT null FROM users u1
WHERE u.company_id=u1.company_id
AND u.id <> u1.id
)
demo --> http://www.sqlfiddle.com/#!2/12dc34/2
Assuming that id is the primary key column, creating an index on the company_id column should give better performance.
If you are really obsessed with the performance of this query, create a composite index on columns company_id + id:
CREATE INDEX very_fast ON users( company_id, id );
Could you try this?
SELECT users.*
FROM users INNER JOIN
(
SELECT company_id
FROM users
GROUP BY company_id
HAVING COUNT(*) > 1
) x USING(company_id);
You should have an index INDEX(company_id)
Performance Test
I have tested 3 queries in answers.
Q1 = sub-query (with GROUP BY) and INNER JOIN
Q2 = LEFT JOIN and IS NOT NULL
Q3 = EXISTS
All queries return the same result. The test was done with the TPC-H lineitem table, and the problem is "find the lineitem rows that belong to an order with more than 1 item".
Test Results
It depends on whether you want to retrieve the FIRST N rows or all rows.
Q1 (get FIRST 10K rows) : 2.85 sec
Q2 (get FIRST 10K rows) : 0.03 sec
Q3 (get FIRST 10K rows) : 0.03 sec
Q1 (get all rows) : 8.19 sec
Q2 (get all rows) : 34.12 sec
Q3 (get all rows) : 29.54 sec
Schema and DATA
mysql> SELECT SQL_NO_CACHE COUNT(*) FROM lineitem\G
*************************** 1. row ***************************
COUNT(*): 11997996
1 row in set (1.68 sec)
mysql> SHOW CREATE TABLE lineitem\G
*************************** 1. row ***************************
Table: lineitem
Create Table: CREATE TABLE `lineitem` (
`l_orderkey` int(11) NOT NULL,
`l_partkey` int(11) NOT NULL,
`l_suppkey` int(11) NOT NULL,
`l_linenumber` int(11) NOT NULL,
`l_quantity` decimal(15,2) NOT NULL,
`l_extendedprice` decimal(15,2) NOT NULL,
`l_discount` decimal(15,2) NOT NULL,
`l_tax` decimal(15,2) NOT NULL,
`l_returnflag` char(1) NOT NULL,
`l_linestatus` char(1) NOT NULL,
`l_shipDATE` date NOT NULL,
`l_commitDATE` date NOT NULL,
`l_receiptDATE` date NOT NULL,
`l_shipinstruct` char(25) NOT NULL,
`l_shipmode` char(10) NOT NULL,
`l_comment` varchar(44) NOT NULL,
PRIMARY KEY (`l_orderkey`,`l_linenumber`),
KEY `l_orderkey` (`l_orderkey`),
KEY `l_partkey` (`l_partkey`,`l_suppkey`),
CONSTRAINT `lineitem_ibfk_1` FOREIGN KEY (`l_orderkey`) REFERENCES `orders` (`o_orderkey`),
CONSTRAINT `lineitem_ibfk_2` FOREIGN KEY (`l_partkey`, `l_suppkey`) REFERENCES `partsupp` (`ps_partkey`, `ps_suppkey`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
1 row in set (0.00 sec)
Queries
Q1 FIRST 10K
SELECT SQL_NO_CACHE DISTINCT u.l_orderkey, u.l_linenumber
FROM lineitem u INNER JOIN
(
SELECT l_orderkey
FROM lineitem
GROUP BY l_orderkey
HAVING COUNT(*) > 1
) x USING (l_orderkey)
LIMIT 10000;
Q2 FIRST 10K
SELECT SQL_NO_CACHE DISTINCT u.l_orderkey, u.l_linenumber
FROM lineitem u
LEFT JOIN lineitem u2
ON u.l_orderkey=u2.l_orderkey AND u.l_linenumber<>u2.l_linenumber
WHERE u2.l_linenumber IS NOT NULL
LIMIT 10000;
Q3 FIRST 10K
SELECT SQL_NO_CACHE DISTINCT u.l_orderkey, u.l_linenumber
FROM lineitem u
WHERE EXISTS (
SELECT null FROM lineitem u1
WHERE u.l_orderkey=u1.l_orderkey
AND u.l_linenumber <> u1.l_linenumber
)
LIMIT 10000;
retrieve entire rows
Q1 ALL
SELECT SQL_NO_CACHE COUNT(*)
FROM lineitem u INNER JOIN
(
SELECT l_orderkey
FROM lineitem
GROUP BY l_orderkey
HAVING COUNT(*) > 1
) x USING (l_orderkey);
Q2 ALL
SELECT SQL_NO_CACHE COUNT(*)
FROM lineitem u
LEFT JOIN lineitem u2
ON u.l_orderkey=u2.l_orderkey AND u.l_linenumber<>u2.l_linenumber
WHERE u2.l_linenumber IS NOT NULL;
Q3 ALL
SELECT SQL_NO_CACHE COUNT(*)
FROM lineitem u
WHERE EXISTS (
SELECT null FROM lineitem u1
WHERE u.l_orderkey=u1.l_orderkey
AND u.l_linenumber <> u1.l_linenumber
);