Interesting MySQL 5.6 behavior

I have this short query:
SELECT candidate.ID
FROM users u
JOIN users candidate ON candidate.a = u.a AND candidate.b < 1
JOIN user_meta meta ON candidate.id = meta.user_id
WHERE u.id = 1
AND candidate.count > 0
ORDER BY meta.updated_at DESC
LIMIT 100
It finishes in around 8s, which I think is far too slow, so I started to investigate a bit. I tried experimenting with the join conditions:
SELECT candidate.ID
FROM users u
JOIN users candidate ON candidate.a = u.a AND candidate.b < 2
JOIN user_meta meta ON candidate.id = meta.user_id
WHERE u.id = 1
AND candidate.count > 0
ORDER BY meta.updated_at DESC
LIMIT 100
Interestingly enough, this finishes in ~80ms. The only thing that changed is candidate.b < 1 becoming candidate.b < 2.
Running EXPLAIN yields the following plan for both queries:
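id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
--------------------------------------------------------------------------------------
1 | SIMPLE | u | const | PRIMARY,index_a | PRIMARY | 4 | const | 1 | NULL
1 | SIMPLE | meta | index | PRIMARY | index_meta_on_updated_at | 5 | NULL | 100 | Using index
1 | SIMPLE | candidate | eq_ref | PRIMARY,index_a | PRIMARY | 4 | db.meta.user_id | 1 | Using where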
I have probably missed something, but what can cause this behavior?

Could you provide more information about:
- The table definitions (you can use DESCRIBE)
- How many records each table has
- Whether the tables have indexes
For troubleshooting in MySQL you can use EXPLAIN EXTENDED, as Mjh suggested. Post the EXPLAIN output for each query; it will help us give you better advice or even help you improve the query.

Alternative of NOT IN On MySQL

I have a query
SELECT DISTINCT phoneNum
FROM `Transaction_Register`
WHERE phoneNum NOT IN (SELECT phoneNum FROM `Subscription`)
LIMIT 0 , 1000000
It takes too long to execute because the Transaction_Register table has millions of records.
Is there any alternative to the above query? I would be grateful for any suggestions.
An alternative would be to use a LEFT JOIN:
select distinct t.phoneNum
from Transaction_Register t
left join Subscription s
on t.phoneNum = s.phoneNum
where s.phoneNum is null
LIMIT 0 , 1000000;
See SQL Fiddle with Demo
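To check that the rewrite really returns the same rows, here is a minimal sketch that runs the original NOT IN, the LEFT JOIN version and a NOT EXISTS version against a tiny data set. It uses Python's sqlite3 module as a stand-in for MySQL, and the sample phone numbers are made up for illustration. It also shows one behavioral difference worth knowing: if the subquery can return NULL, NOT IN yields no rows at all, while the LEFT JOIN form is unaffected.

```python
import sqlite3

# SQLite stands in for MySQL here; table and column names follow the question,
# the sample rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Transaction_Register (phoneNum TEXT);
    CREATE TABLE Subscription (phoneNum TEXT);
    INSERT INTO Transaction_Register VALUES ('111'), ('222'), ('222'), ('333');
    INSERT INTO Subscription VALUES ('111');
""")

LEFT_JOIN = """
    SELECT DISTINCT t.phoneNum
    FROM Transaction_Register t
    LEFT JOIN Subscription s ON t.phoneNum = s.phoneNum
    WHERE s.phoneNum IS NULL
    ORDER BY t.phoneNum
"""
NOT_EXISTS = """
    SELECT DISTINCT t.phoneNum
    FROM Transaction_Register t
    WHERE NOT EXISTS (SELECT 1 FROM Subscription s WHERE s.phoneNum = t.phoneNum)
    ORDER BY t.phoneNum
"""
NOT_IN = """
    SELECT DISTINCT phoneNum
    FROM Transaction_Register
    WHERE phoneNum NOT IN (SELECT phoneNum FROM Subscription)
    ORDER BY phoneNum
"""

def run(sql):
    """Return the first column of every result row as a list."""
    return [row[0] for row in conn.execute(sql)]

left_join_rows = run(LEFT_JOIN)
not_exists_rows = run(NOT_EXISTS)
not_in_rows = run(NOT_IN)

# A NULL in the subquery makes every NOT IN comparison unknown,
# so no row qualifies; the LEFT JOIN form does not change.
conn.execute("INSERT INTO Subscription VALUES (NULL)")
not_in_rows_with_null = run(NOT_IN)
left_join_rows_with_null = run(LEFT_JOIN)
```

Which form is fastest depends on the MySQL version and the available indexes, so it is worth comparing EXPLAIN output for each on the real tables.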
I doubt whether LEFT JOIN truly performs better than NOT IN. I just ran a few tests with the following table structure (please correct me if I am wrong):
account (id, ....) [42,884 rows, index by id]
play (account_id, playdate, ...) [61,737 rows, index by account_id]
(1) Query with LEFT JOIN
SELECT * FROM
account LEFT JOIN play ON account.id = play.account_id
WHERE play.account_id IS NULL
(2) Query with NOT IN
SELECT * FROM
account WHERE
account.id NOT IN (SELECT play.account_id FROM play)
Speed test with LIMIT 0,N:
LIMIT 0,N | 100 | 150 | 200 | 250
-------------------------------------------------------------------------
LEFT JOIN | 3.213s | 4.477s | 5.881s | 7.472s
NOT IN | 2.200s | 3.261s | 4.320s | 5.647s
Difference | 1.013s | 1.216s | 1.561s | 1.825s
As I increase the limit, the difference gets larger and larger.
With EXPLAIN
(1) Query with LEFT JOIN
SELECT_TYPE | TABLE | TYPE | ROWS | EXTRA
-------------------------------------------------
SIMPLE | account | ALL | 42,884 |
SIMPLE | play | ALL | 61,737 | Using where; not exists
(2) Query with NOT IN
SELECT_TYPE | TABLE | TYPE | ROWS | EXTRA
-------------------------------------------------
SIMPLE | account | ALL | 42,884 | Using where
DEPENDENT SUBQUERY | play | INDEX | 61,737 | Using where; Using index
It seems the LEFT JOIN does not make use of the index.
LOGIC
(1) Query with LEFT JOIN
Without an index, a nested-loop LEFT JOIN between account and play compares up to 42,884 * 61,737
= 2,647,529,508 row pairs, and then checks whether play.account_id is NULL.
(2) Query with NOT IN
A binary search takes about log2(N) steps to test whether an item exists, and log2(61,737) rounds up to 16. That means 42,884 * 16 = 686,144 steps.
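The two estimates above can be reproduced with a few lines of Python. This is only a back-of-the-envelope model: it assumes a nested-loop join with no usable index for case (1) and treats an index probe as a plain binary search for case (2), which is a simplification of how MySQL's B-tree indexes behave.

```python
import math

accounts = 42_884  # rows in account, from the post
plays = 61_737     # rows in play, from the post

# Worst case for a nested-loop join with no usable index:
# every account row is compared with every play row.
pair_comparisons = accounts * plays

# Index probe modelled as a binary search: about log2(N) steps per lookup,
# rounded up to a whole number of steps.
steps_per_lookup = math.ceil(math.log2(plays))
index_steps = accounts * steps_per_lookup

print(pair_comparisons, steps_per_lookup, index_steps)
```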

mysql join optimization for big query

I have a big query like this, and I can't completely rebuild the application because of the customer:
SELECT count(AdvertLog.id) as count,AdvertLog.advert,AdvertLog.ut_fut_tstamp_dmy as day,
AdvertLog.operation,
Advert.allow_clicks,
Advert.slogan as name,
AdvertLog.log,
(User.tx_reality_credit
+-20
-(SELECT COUNT(advert_log.id) FROM advert_log WHERE ut_fut_tstamp_dmy <= day AND operation = 0 AND advert IN (168))
+(SELECT IF(ISNULL(SUM(log)),0,SUM(log)) FROM advert_log WHERE ut_fut_tstamp_dmy <= day AND operation IN (1, 2) AND advert = 40341 )) AS points
FROM `advert_log` AS AdvertLog
LEFT JOIN `tx_reality_advert` Advert ON Advert.uid = AdvertLog.advert
LEFT JOIN `fe_users` AS User ON (User.uid = Advert.user or User.uid = AdvertLog.advert)
WHERE User.uid = 40341 and AdvertLog.id>0
GROUP BY AdvertLog.ut_fut_tstamp_dmy, AdvertLog.advert
ORDER BY AdvertLog.ut_fut_tstamp_dmy_12 DESC,AdvertLog.operation,count DESC,name
LIMIT 0, 15
It takes 1.5s approximately which is too long.
Indexes:
User.uid
AdvertLog.advert
AdvertLog.operation
AdvertLog.advert
AdvertLog.ut_fut_tstamp_dmy
AdvertLog.id
Advert.user
AdvertLog.log
Output of Explain:
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
--------------------------------------------------------------------------------------
1 | PRIMARY | User | const | PRIMARY | PRIMARY | 4 | const | 1 | Using temporary; Using filesort
1 | PRIMARY | AdvertLog | range | PRIMARY,advert | PRIMARY | 4 | NULL | 21427 | Using where
1 | PRIMARY | Advert | eq_ref | PRIMARY | PRIMARY | 4 | etrend.AdvertLog.advert | 1 | Using where
3 | DEPENDENT SUBQUERY | advert_log | ref | ut_fut_tstamp_dmy,operation,advert | advert | 5 | const | 1 | Using where
2 | DEPENDENT SUBQUERY | advert_log | index_merge | ut_fut_tstamp_dmy,operation,advert | advert,operation | 5,2 | NULL | 222 | Using intersect(advert,operation); Using where
Can anyone help me? I have tried different things, but with no improvement.
The query is pretty large, and I'd expect it to take a fair bit of time, but you could try adding an index on Advert.uid if one isn't present. Other than that, someone with much better SQL-fu than I will have to answer this.
First, your WHERE clause filters on a specific User.uid, yet there is an index on advert_log by advert (the user ID). So, first, change the WHERE clause to reflect this:
Where
AdvertLog.advert = 40341
Then, change the "LEFT JOIN" to the user table into a plain "JOIN".
Finally (without a full rewrite of the query), I would tack on the "STRAIGHT_JOIN" keyword:
select STRAIGHT_JOIN
... rest of query ...
This tells the optimizer to join the tables in the order explicitly stated in the query.
Another area to optimize would be to pre-query the "points" (counts and logs based on advert and operation) once and pull the answer from that (as a subquery) instead of running two dependent subqueries. I'd be interested to know how much the above WHERE, JOIN and STRAIGHT_JOIN changes help.
Additionally, look at the join to the user table: it matches on EITHER AdvertLog.advert (a user ID) OR Advert.user (another user ID, which does not appear to be the same one, since the join between advert_log and tx_reality_advert is based on Advert.uid), unless that is an incorrect assumption. This could give erroneous results because you are testing for MULTIPLE user IDs: the advert user, and whoever else is the "user" in tx_reality_advert. It then becomes unclear which user's credit is being applied to the "points" calculation.
To better understand the relationship and context, can you clarify what is IN these tables, from the advert_log to tx_reality_advert perspective, and how Advert, UID and User relate?

MySQL: Ordering By Using Two Counts From Another Table?

I have one sql table that looks like this called "posts":
id | user
--------------------------------
0 | tim
1 | tim
2 | bob
And another called "votes" that stores either upvotes or downvotes on the posts in the "posts" table:
id | postID | type
--------------------------------
0 | 0 | 0
1 | 2 | 1
2 | 0 | 1
3 | 0 | 1
4 | 3 | 0
In this table, the 'type' is either a 0 for downvote or 1 for upvote.
How would I go about ordering posts by "tim" by the number of (upvotes - downvotes) the post has?
SELECT
p.id,
p.user,
SUM(v.type * 2 - 1) AS votecount
FROM posts p
LEFT JOIN votes v ON p.id = v.postID
WHERE p.user = 'tim'
GROUP BY p.id, p.user
ORDER BY votecount DESC
UPDATE – p and v explained.
In this query, p and v are aliases of, respectively, posts and votes. An alias is essentially an alternative name and it is defined only within the scope of the statement that declares it (in this case, the SELECT statement). Not only a table can have an alias, but a column too. In this query, votecount is an alias of the column represented by the SUM(v.type * 2 - 1) expression. But presently we are talking only about tables.
Before I go on with the explanation of table aliases, I'll briefly explain why you may need to prefix column names with table names, like posts.id as opposed to just id. Basically, when a query references more than one table, like in this case, you may find it quite useful always to prefix column names with the respective table names. That way, when you are revisiting an old script, you can always tell which column belongs to which table without having to look up the structures of the tables referenced. Also, it is mandatory to include the table reference when omitting it creates ambiguity as to which table the column belongs to. (In this case, referencing the id column without referencing the posts table does create an ambiguous situation, because each table has its own id.)
Now, a large and complex query may be difficult to read when you write out complete table names before column names. This is where (short) aliases come in handy: they make a query easier to read and understand, although I've already learnt that not all people share that opinion, and so you should judge for yourself: this question contains two versions of the same query, one with long-named table references and the other with short-aliased ones, as well as an opinion (in a comment to one of the answers) why aliases are not suitable.
Anyway, using short table aliases in this particular query may not be as beneficial as in some more complex statements. It's just that I'm used to aliasing tables whenever the query references more than one.
This MySQL documentation article contains the official syntax for aliasing tables in MySQL (which is actually the same as in standard SQL).
Not tested, but should work:
select posts.id, sum(if(type = 0, -1, 1)) as score
from posts join votes on posts.id = votes.postID
where user = 'tim'
group by posts.id
order by score
Do you plan to conquer SO? ;-)
Edit: I cut out the subquery since in MySQL it's unnecessary. The original query was portable, but the subquery is unnecessary for MySQL.
select
p.id, SUM(case
when v.type = 0 then -1
when v.type = 1 then 1
else 0 end) as VoteCount
from
posts p
left join votes v
on p.id = v.postid
where
p.`user` = 'tim'
group by
p.id
order by
VoteCount desc
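The LEFT JOIN approach can be checked against the sample data from the question with a short script. This is a minimal sketch that uses Python's sqlite3 module as a stand-in for MySQL. The COALESCE wrapper is an addition not present in the answers: it turns the NULL sum of a post without votes into 0, so that such a post sorts as a score of zero.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, user TEXT);
    CREATE TABLE votes (id INTEGER PRIMARY KEY, postID INTEGER, type INTEGER);
    -- Sample rows copied from the question.
    INSERT INTO posts VALUES (0, 'tim'), (1, 'tim'), (2, 'bob');
    INSERT INTO votes VALUES (0, 0, 0), (1, 2, 1), (2, 0, 1), (3, 0, 1), (4, 3, 0);
""")

# type * 2 - 1 maps a downvote (0) to -1 and an upvote (1) to +1.
# COALESCE (an addition here) reports 0 for posts that have no votes.
ranked = conn.execute("""
    SELECT p.id, p.user, COALESCE(SUM(v.type * 2 - 1), 0) AS votecount
    FROM posts p
    LEFT JOIN votes v ON p.id = v.postID
    WHERE p.user = 'tim'
    GROUP BY p.id, p.user
    ORDER BY votecount DESC
""").fetchall()

print(ranked)
```

Post 0 has one downvote and two upvotes, so it scores 1 and comes first; post 1 has no votes and scores 0.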

From what MySQL index would this query benefit?

UPDATE: Added the query that runs second most often
(it may be needed when deciding on an index):
SELECT m.time, m.message, m.receiver_uid AS receiver, m.sender_uid AS sender
FROM messages AS m, users AS u
WHERE u.uid = '$coID'
AND ( (m.receiver_uid = '$meID' AND m.sender_uid = '$coID') OR
(m.receiver_uid = '$coID' AND m.sender_uid = '$meID') )
ORDER BY m.time DESC
$meID is the ID of the user who runs the query,
$coID is the ID of the contact.
I've got a somewhat big query and it runs every time a user visits my page:
SELECT m2.message, m2.time, m2.sender_uid AS sender, m2.receiver_uid AS receiver,
m.contact, u.ufirstname
FROM ( SELECT CASE
WHEN sender_uid = '$me' THEN receiver_uid
ELSE sender_uid
END AS contact,
MAX(time) AS maxtime
FROM messages
WHERE sender_uid = '$me' OR receiver_uid = '$me'
GROUP BY CASE
WHEN sender_uid = '$me' THEN receiver_uid
ELSE sender_uid
END ) AS m
INNER JOIN messages m2 ON m.maxtime = m2.time
AND ((m2.sender_uid = '$me' AND m2.receiver_uid = m.Contact)
OR (m2.receiver_uid = '$me' AND m2.sender_uid = m.Contact))
INNER JOIN users AS u ON m.contact = u.uid
ORDER BY time DESC
$me is the ID of the user who runs the query
This query will (successfully) retrieve:
the LAST MESSAGE from EVERY 'CONVERSATION', ordered by TIME.
So it gets the last message (whether sent or received) in every PM session,
then sorts those by time and retrieves the contact's information.
Please tell me if I didn't explain it correctly.
My MySQL table looks like this:
receiver_id | sender_id | message | time
From what index(es) would this query benefit?
(The users table already has a primary key on the ID, so the part where the join retrieves the contact's name should be efficient.)
EXPLAIN OUTPUTs:
The BIG query:
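id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
--------------------------------------------------------------------------------------
1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 4 | Using temporary; Using filesort
1 | PRIMARY | m2 | ALL | NULL | NULL | NULL | NULL | 42 | Using where
1 | PRIMARY | u | eq_ref | PRIMARY | PRIMARY | 4 | m.contact | 1 | Using where
2 | DERIVED | messages | ALL | NULL | NULL | NULL | NULL | 42 | Using where; Using temporary; Using filesort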
The query in the update part:
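id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
--------------------------------------------------------------------------------------
1 | SIMPLE | u | const | PRIMARY | PRIMARY | 4 | const | 1 | Using index; Using filesort
1 | SIMPLE | m | ALL | NULL | NULL | NULL | NULL | 42 | Using where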
As your messages table grows, this query will become slower and slower. Depending on the number of conversations the user is part of, performance will degrade rapidly. While individual indexes on messages.time, messages.sender_uid and messages.receiver_uid will help for now, no index will help in the long run unless you trim your messages table, especially once you have more than a few hundred thousand messages.
I would suggest maintaining an association table that links a user to a conversation and its last message ID. Something like:
user_id | conversation_id | message_id
You then look up this table instead of running a complicated and expensive query. This greatly reduces the number of scans needed on your messages table. While it does slightly increase complexity, performance will not degrade as much as with your query above.
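A minimal sketch of how such a lookup table could be maintained, using Python's sqlite3 module as a stand-in for MySQL. Several things here are assumptions rather than part of the original schema: the table name last_message, an auto-incrementing id column on messages, the use of the other participant's ID as the conversation_id, and the helper names send_message and latest_per_conversation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE messages (
        id INTEGER PRIMARY KEY,          -- assumed surrogate key
        sender_uid INTEGER, receiver_uid INTEGER,
        message TEXT, time INTEGER
    );
    -- Hypothetical lookup table: one row per (user, conversation).
    CREATE TABLE last_message (
        user_id INTEGER, conversation_id INTEGER, message_id INTEGER,
        PRIMARY KEY (user_id, conversation_id)
    );
""")

def send_message(sender, receiver, text, time):
    """Store a message and refresh the lookup row for both participants."""
    cur = conn.execute(
        "INSERT INTO messages (sender_uid, receiver_uid, message, time) "
        "VALUES (?, ?, ?, ?)",
        (sender, receiver, text, time),
    )
    message_id = cur.lastrowid
    # Assumption: the other participant's ID doubles as the conversation ID.
    for user, contact in ((sender, receiver), (receiver, sender)):
        conn.execute(
            "INSERT OR REPLACE INTO last_message VALUES (?, ?, ?)",
            (user, contact, message_id),
        )

def latest_per_conversation(me):
    """Return (contact, message) pairs for one user, newest first."""
    return conn.execute(
        """SELECT lm.conversation_id, m.message
           FROM last_message lm
           JOIN messages m ON m.id = lm.message_id
           WHERE lm.user_id = ?
           ORDER BY m.time DESC""",
        (me,),
    ).fetchall()

send_message(1, 2, "hi", 100)
send_message(2, 1, "hello", 110)
send_message(3, 1, "ping", 105)
inbox = latest_per_conversation(1)
```

The read side becomes a primary-key range lookup plus one join, at the cost of two extra writes per message. In MySQL the same refresh could be written with REPLACE INTO or INSERT ... ON DUPLICATE KEY UPDATE.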
I found by trial and error that indexing the time column decreases the load time.
So that will probably be the answer to my question.

Slow query takes .0007s? Why is this in my slowlog?

SELECT vt.vtid, vt.tag, vt.typeid, vt.id, vt.count, tt.type, u.username, vt.date_added, tc.context, tc.contextid
FROM ( vt, tt, u )
LEFT JOIN tc ON ( vt.vtid = tc.vtid AND tc.userid = vt.userid )
WHERE vt.typeid = tt.typeid
AND vt.verified =0
AND vt.userid = u.userid
ORDER BY vt.date_added DESC
LIMIT 1
takes .0007s to complete
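id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
--------------------------------------------------------------------------------------
1 | SIMPLE | vt | ref | typeid,userid,verified | verified | 1 | const | 9 | Using where; Using filesort
1 | SIMPLE | tt | eq_ref | PRIMARY | PRIMARY | 4 | vt.typeid | 1 |
1 | SIMPLE | tc | ref | vtid | vtid | 4 | vt.vtid | 3 |
1 | SIMPLE | u | eq_ref | PRIMARY | PRIMARY | 4 | vt.userid | 1 | Using where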
How can I change this to not show up in the slow query log?
Just a guess: it's possible that you have the log-queries-not-using-indexes flag set. According to the documentation, it may cause queries to be logged in the slow log even if indexes are used.
I'm pretty sure that a1ex07 is correct.
However if you want to speed this query up slightly you can change your index on tc from being an index on vtid to being an index on (vtid, userid). Compound keys like that are much faster if you're joining on both keys, and are almost exactly as fast if you're just joining on the first field.
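The compound-index suggestion can be illustrated with a small experiment. This sketch uses Python's sqlite3 module and SQLite's EXPLAIN QUERY PLAN as a stand-in for MySQL's EXPLAIN; the index name idx_tc_vtid_userid and the helper plan are made up for illustration, and the tc table is reduced to the columns that appear in the query. The general rule it demonstrates, that a compound index serves lookups on its leading column(s) but not on a trailing column alone, applies to MySQL's B-tree indexes as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tc (vtid INTEGER, userid INTEGER, context TEXT, contextid INTEGER);
    -- Compound index replacing the single-column index on vtid.
    CREATE INDEX idx_tc_vtid_userid ON tc (vtid, userid);
""")

def plan(sql):
    """Return the planner's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each row holds the human-readable detail text.
    return " ".join(row[-1] for row in rows)

# Both columns constrained: the compound index is used.
both_columns = plan("SELECT context FROM tc WHERE vtid = 1 AND userid = 2")
# Only the leading column constrained: the same index still applies.
first_column = plan("SELECT context FROM tc WHERE vtid = 1")
# Only the trailing column constrained: the index cannot be used for the lookup.
second_column = plan("SELECT context FROM tc WHERE userid = 2")

print(both_columns)
print(first_column)
print(second_column)
```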