Complex query optimization to improve speed - MySQL

I have the following query that I would like to optimize:
SELECT
    *, @rownum := @rownum + 1 AS rank
FROM (
    SELECT
        SUM(a.id = 1) AS KILLS,
        SUM(a.id = 2) AS DEATHS,
        SUM(a.id = 3) AS WINS,
        tb1.totalPlaytime,
        p.playerName
    FROM (
        SELECT
            player_id,
            SUM(pg.timeEnded - pg.timeStarted) AS totalPlaytime
        FROM playergame pg
        INNER JOIN player p
            ON pg.player_id = p.id
        WHERE pg.game_id IN (1, 2, 3)
        GROUP BY p.id
        ORDER BY p.playerName ASC
    ) tb1
    INNER JOIN playeraction pa
        ON pa.player_id = tb1.player_id
    INNER JOIN action a
        ON pa.action_id = a.id
    INNER JOIN player p
        ON pa.player_id = p.id
    GROUP BY p.id
    ORDER BY KILLS DESC
) tb2
WHERE tb2.playerName LIKE "%"
Somehow I have the feeling that this is not well suited for MySQL. I keep a lot of actions in different tables for a good statistical approach, but this slows everything down. (Perhaps big data?)
This is my model
Now, I tried the following:
Combining joins in a view
I combined the many JOINs into a VIEW. This gave me no improvement.
Index the tables
I indexed the frequently used keys; this did speed things up, but I can't manage to get the entire result set below 0.613s.
Start from the action table and use left joins
This gave me a somewhat different approach, yet the joins remain slow (the first example is still the fastest).
indexes:
Any hints, tips, additions, or improvements are welcome.

I removed my previous answer as it was wrong and did not help. Here I am just summarizing our conversation in the comments, with some additional comments of my own.
There are several ways to speed up the query.
Make sure you are not making any redundant queries.
Do as few joins as possible.
Make indexes on multiple columns if possible.
Make indexes clustered if needed/possible http://dev.mysql.com/doc/refman/5.0/en/innodb-index-types.html
Regarding the query you wrote in the question:
Remove ORDER BY in the inner query
Remove the INNER JOIN in the inner query and replace GROUP BY p.id with GROUP BY player_id (see the sketch below)
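For illustration, a sketch of what the inner query could look like after those two changes, reusing the table and column names from the question (treat it as a sketch, not the exact final query):
SELECT
    pg.player_id,
    SUM(pg.timeEnded - pg.timeStarted) AS totalPlaytime
FROM playergame pg
WHERE pg.game_id IN (1, 2, 3)
GROUP BY pg.player_id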
A few words on where indexes make sense and where they do not.
In your case it would not make sense to have an index on game_id alone in the playergame table, because that would probably return loads of rows. So that is all you can do about the innermost query.
The joins can also be optimized a bit if you know what to expect from the tables, i.e., the amount of data they may face. You may think of it as the question: are you building the database behind an MMO game or an FPS? An MMO will have millions of users per game; an FPS will have only a few. Also, different types of games may have different actions. That implies you may try to optimize the query by making the index more precise. If you are able to state in the join that game_id IN (...), then creating an index on the tuple (game_id, id) might help; a sketch follows.
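A sketch of such a composite index; the table, the columns, and the index name here are my assumptions based on the query above, so adapt them to your actual model:
CREATE INDEX idx_playergame_game_player ON playergame (game_id, player_id);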
Wildcard in the WHERE clause. You may try to create an index on playerName, but it will only work if you search with the wildcard at the end of your search string; for one at the beginning you would need a separate index (for example on a reversed copy of the column) and hope that the query optimizer is smart enough to switch between them each time you run a query.
Keep in mind that more indexes mean slower inserts and deletes, so keep as few as possible.
Another thing would be to redesign the structure a bit. You may still keep the database normalized, but maybe it would be useful to have a table with a summary of some games. You could have a table summarizing the games that happened before yesterday, so your query would only aggregate today's data and then join both tables if needed (a sketch follows). Then you could optimize further by either creating an index on a timestamp column or partitioning the table by day. Everything depends on the load you expect.
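A rough sketch of that idea; the playergame_summary table and the use of timeStarted as the day boundary are hypothetical and only meant to show the shape of the approach:
-- Pre-aggregated playtime for everything before today (refreshed by a nightly job)
CREATE TABLE playergame_summary (
    player_id INT NOT NULL,
    totalPlaytime BIGINT NOT NULL,
    PRIMARY KEY (player_id)
);

-- At query time, combine the summary with today's raw rows
SELECT player_id, SUM(totalPlaytime) AS totalPlaytime
FROM (
    SELECT player_id, totalPlaytime
    FROM playergame_summary
    UNION ALL
    SELECT player_id, SUM(timeEnded - timeStarted)
    FROM playergame
    WHERE timeStarted >= CURDATE()
    GROUP BY player_id
) combined
GROUP BY player_id;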
The topic is rather deep, so everything depends on what is the story behind the data.

Related

MySQL performance - cross join vs left join

I am wondering how MySQL (or its underlying engine) processes the queries.
There are two sets of queries below (one uses a left join and the other a cross join) which will eventually give the same result.
My question is: how come the processing times of the two sets of queries are similar?
What I expected is that the first query would run quicker, because with the left join the size of the intermediate "table" won't keep expanding, while the second query makes the intermediate "table" relatively larger (my assumption is that the computer needs to build the result of the cross join from the multiple tables before it can go ahead and apply the WHERE clause).
select s.*, a.score as score_01, b.score as score_02
from student s
left join (select * from sc where cid = '01') a using (sid)
left join (select * from sc where cid = '02') b using (sid)
where a.score > b.score;
select s.*, a.score as score_01, b.score as score_02
from student s
,(select * from sc where cid = '01') a
,(select * from sc where cid = '02') b
where a.score > b.score and a.sid = b.sid and s.sid = a.sid;
I tried both sets of queries and expected the processing time for the first one to be shorter, but that is not the case.
Add this to sc:
INDEX(sid, cid, score)
Better yet, if you have a useless id on sc, replace it with
PRIMARY KEY(sid, cid)
(Assuming that pair is Unique.)
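A sketch of the corresponding DDL, assuming the table is called sc and that (sid, cid) really is unique; use one of the two options, not both:
-- Option 1: keep the table as it is and add the composite index
ALTER TABLE sc ADD INDEX idx_sid_cid_score (sid, cid, score);

-- Option 2: if the surrogate id is not needed elsewhere, drop it and
-- promote (sid, cid) to the primary key
ALTER TABLE sc DROP COLUMN id, ADD PRIMARY KEY (sid, cid);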
With either of those fixes, I expect both of your queries run at similar speed, and faster than currently.
For further discussion, please provide SHOW CREATE TABLE.
Addressing some of the Comments
MySQL ignores the keywords INNER, OUTER, and CROSS. So, it is up to the WHERE to figure out whether it is "inner" or "outer".
MySQL throws the ON and WHERE conditions together (except when it matters for LEFT), then decides what is used for filtering (WHERE) so it may be able to do that first. Then other conditions (which belonged in ON) help it get to the 'next' table.
So... Please use ON to say how the tables are related; use WHERE for filtering. (And don't use the old comma-join.)
That is, MySQL will [usually] look at one table at a time, doing a "Nested Loop Join" (NLJ) to get to the next.
There are many possible ways to evaluate a JOIN; MySQL ponders which one might be best, then uses that.
The order of non-LEFT JOINs does not matter, nor does the order of expressions AND'd together in WHERE.
In some situations, a HAVING expression can be (and is) moved to the WHERE clause.
Although FROM comes before WHERE, the two get somewhat tangled up together. But, in general, the clauses are required to be in a certain order, and that order is logically the order that things have to happen in.
It is up to the Optimizer to combine steps. For example
WHERE a = 1
ORDER BY b
and the table has INDEX(a,b) -- The index will be used to do both, essentially at the same time. Ditto for
SELECT a, MAX(b)
...
GROUP BY a
ORDER BY a
which can hop through the BTree index on (a,b) and deliver the results without an extra sort pass for either the GROUP BY or the ORDER BY.
SELECT x is executed after WHERE y = 'abc' -- Well, in some sense it is. But if you have INDEX(y,x), the Optimizer is smart enough to grab the x values while it is performing the WHERE.
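A small sketch of that covering-index situation, using a made-up table so the names are obviously not from the question:
-- Hypothetical table: the composite index covers both the filter column and the selected column
CREATE TABLE demo (
    id INT PRIMARY KEY,
    y  VARCHAR(20),
    x  INT,
    INDEX idx_y_x (y, x)
);

-- The WHERE is resolved from the index, and x is read from the same index entries,
-- so the base rows never need to be visited (EXPLAIN shows "Using index")
EXPLAIN SELECT x FROM demo WHERE y = 'abc';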
When a WHERE references more than one table of a JOIN, the Optimizer has a quandary. Which table should it start its NLJ with? It has some statistics to help make the decision, but it does not always get it right. It will usually
filter on one of the tables
NLJ to get to the next table, meanwhile throwing in any WHERE clauses for that table in with the ON clause.
Repeat for other tables.
When there is both a WHERE and an ORDER BY, the Optimizer will usually filter first, then sort. But sometimes (not always correctly) it will decide to use an index for the ORDER BY (thereby eliminating the sort) and filter as it reads the table. LIMIT, which is logically done last, further muddies the decision.
MySQL does not have FULL OUTER JOIN. It can be simulated with two JOINs and a UNION. (It is only very rarely needed.)
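A sketch of that simulation with two placeholder tables t1 and t2 joined on id:
-- Emulates  t1 FULL OUTER JOIN t2 ON t1.id = t2.id
SELECT t1.*, t2.*
FROM t1
LEFT JOIN t2 ON t1.id = t2.id
UNION ALL
SELECT t1.*, t2.*
FROM t1
RIGHT JOIN t2 ON t1.id = t2.id
WHERE t1.id IS NULL;   -- keep only the t2 rows that had no match in t1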

MySQL - Large database, improve search time

I have two rather big tables (threads and posts) that include a ton of forum posts. I really have to improve my search time. Even doing a normal search where COLUMN = VALUE takes 15 seconds. Doing a LIKE often crashes the entire website (timeout).
Here's a picture of my site and two tables:
The threads table contains about 430,000 rows.
The posts table contains about 2,700,000 rows.
And I need to combine these in a query to get the results I want.
Don't bother about the search boxes on the website for now. Let's just start off with this query right here and start improving this one first.
SELECT p.id, t.id, t.title, t.threadstarter, t.replies, t.views, t.board, p.dateposted FROM threads t
JOIN posts p
ON t.id = p.threadid
WHERE t.title = 'sell'
GROUP BY t.id
This query will take about 15 seconds to get all threads and posts where the thread title is "sell". How would I improve this, making it just a second or two? Is this even possible with MySQL in two tables with these sizes?
And from there on, I would have to use a LIKE (unless there is another method), because the users on the website will most likely not search for an exact match, and I'd want to include any title that contains the word "sell". So that would be like this:
SELECT p.id, t.id, t.title, t.threadstarter, t.replies, t.views, t.board, p.dateposted FROM threads t
JOIN posts p
ON t.id = p.threadid
WHERE t.title LIKE '%sell%'
GROUP BY t.id
Which I am not even going to bother measuring; it crashes the website (it takes too long to execute). So this one really(!) needs improvement.
How should I even approach this? Should I even use MySQL? What options do I have? I do not want a user to sit and wait 30-300 seconds for a query to finish. At most 5 seconds.
Is this possible, with such large tables?
I've heard that using MATCH and AGAINST could be better than a COLUMN LIKE "VALUE". But then I need to make the columns FULLTEXT. Are there any downsides to doing that?
If there's anyone out there that's worked with a ~3 million row MySQL database, then please let me know how you handled it (if you did).
Make use of an INDEX. Try to create an index on the table which has more records (or on the master table); even though it is an inner join, it will still make joining the two easier.
Plus, I don't really understand the use of GROUP BY without any aggregation, since the query just selects columns.
CREATE INDEX Index_NAME ON
threads(title);
The correct way to express your first query is:
SELECT p.id, t.id, t.title, t.threadstarter, t.replies, t.views, t.board, p.dateposted
FROM threads t JOIN
posts p
ON t.id = p.threadid
WHERE t.title = 'sell' AND
p.dateposted = (SELECT MIN(p2.dateposted) FROM posts p2 WHERE p2.threadid = p.threadid);
This gets rid of the GROUP BY, so it might improve performance. In particular, you want indexes on the following (DDL sketched below):
threads(title, id)
posts(threadid, dateposted)
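A sketch of the DDL for those two indexes (the index names are arbitrary):
CREATE INDEX idx_threads_title_id ON threads (title, id);
CREATE INDEX idx_posts_threadid_dateposted ON posts (threadid, dateposted);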
Give these two articles a read:
how to optimize mysql queries for speed and performance
MySQL Optimization
LIKE with a leading wild card must scan all 430,000 rows:
WHERE t.title LIKE '%sell%'
Change to this:
WHERE MATCH(t.title) AGAINST('+sell' IN BOOLEAN MODE)
and have
FULLTEXT(title)
With that setup, the query can go directly to the few rows that have the 'word' sell in it.
Caveat: There are restrictions on what FULLTEXT can search for -- only "words", not "stop words", only words of a certain minimum length, etc.
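A sketch of that setup, assuming threads uses an engine that supports FULLTEXT (MyISAM, or InnoDB on MySQL 5.6+); the index name is arbitrary:
ALTER TABLE threads ADD FULLTEXT INDEX ft_title (title);

-- The query from the question then filters with MATCH ... AGAINST instead of LIKE
SELECT p.id, t.id, t.title, t.threadstarter, t.replies, t.views, t.board, p.dateposted
FROM threads t
JOIN posts p ON t.id = p.threadid
WHERE MATCH(t.title) AGAINST('+sell' IN BOOLEAN MODE);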

What is the difference between subquery and a joined subquery?

What is the difference between these two MySQL queries?
select t.id,
(select count(c.id) from comment c where c.topic_id = t.id) as comments_count
from topic t;
AND
select t.id, comments.count from topic t
left join
(
select count(c.id) count,c.topic_id from comment c group by topic_id
) as comments on t.id = comments.topic_id
I know there's not much information. I just wanted to know when to use a subquery versus a joined subquery, and what the difference between them is.
Thanks
This is a good question, but I would also add a third option (the more standard way of doing this):
select t.id, count(c.topic_id) as count
from topic left join
comment c
on t.id = c.topic_id
group by t.id;
The first way is often the most efficient in MySQL. MySQL can take advantage of an index on comment(topic_id) to generate the count. This may be true in other databases as well, but it is particularly noticeable in MySQL, which does not use indexes for GROUP BY in practice.
The second query does the aggregation and then a join. The subquery is materialized, adding additional overhead, and then the join cannot use an index on comment. It could possibly use an index on topic, but the left join may make that option less likely. (You would need to check the execution plan in your environment.)
The third option would be equivalent to the first in many databases, but not in MySQL. It does the join to comment (taking advantage of an index on comment(topic_id), if available). However, it then incurs the overhead of a file sort for the final aggregation.
Reluctantly, I must admit that the first choice is often the best in terms of performance in MySQL, particularly if the right indexes are available. Without indexes, any of the three might be the best choice. For instance, without indexes, the second is the best if comments is empty or has very few topics.
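For completeness, a sketch of the index this answer keeps referring to (the name is arbitrary):
CREATE INDEX idx_comment_topic_id ON comment (topic_id);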

MySQL Selecting things where a condition on a row is met 2 or more times, but showing the two or more results

If I use GROUP BY then I will get just 1 row per group. For example
Sessions table: SessionId (other things)
Actions table: ActionId, SessionId, (other things)
With:
SELECT S.*, A.*
FROM ActionList A
JOIN SessionList S ON A.SessionId = S.SessionId
WHERE 1 /* various criteria to filter */
ORDER BY S.SessionId DESC, ActionId DESC;
Thus showing me the most recent session at the top. Now I want to look at only sessions with 2 or more actions.
If I use GROUP BY A.SessionId then I can get COUNT(ActionId) and use HAVING to look only at rows with the required count, but I won't get both (or more) rows, just the one.
I suspect I can do this by JOINing to a table with SessionIds and the count of action IDs, but I'm fairly new to joins (I could do this via a subquery and ANY).
If a view would help, I would create a view of the form:
SELECT SessionId, COUNT(*) FROM Actions GROUP BY SessionId;
Or put this in brackets and JOIN on it (but I confess I'd have to look up 3-table joins).
What is the neatest way to do this?
Also, is this where "foreign keys" come into play? That would probably stop the "ambiguity errors" I get if I don't qualify SessionId. I've avoided them for fear of TRIGGERs; I also didn't know about JOINs and just used subqueries until recently. I've realised it is stupid to avoid things that were added to help.
Additionally, I'm quite timid with joins because I know what they do in the worst case: if I JOIN a table with m rows to another with n rows, I could end up with m*n rows. That could be VERY large! I'm dealing with large tables (as in: the schema won't fit in RAM large), so that is quite scary. I do know MySQL optimises well (it is able to move things from HAVING to WHERE and so forth), but still!
If you want to look at sessions with two or more actions, then use a join:
select sl.*
from SessionList sl join
(select SessionId, count(*) as cnt
from Actions
group by SessionId
) a
on sl.SessionId = a.SessionId and cnt > 1;
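If you also want to see the individual action rows of those sessions (as the question title suggests), a sketch along the same lines, using the table names from the question's query, would be:
SELECT S.*, A.*
FROM ActionList A
JOIN SessionList S ON A.SessionId = S.SessionId
JOIN (
    SELECT SessionId, COUNT(*) AS cnt
    FROM ActionList
    GROUP BY SessionId
    HAVING COUNT(*) > 1          -- same filter as "cnt > 1" above
) c ON c.SessionId = S.SessionId
ORDER BY S.SessionId DESC, A.ActionId DESC;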

Improve JOIN query speed

I have this simple join that works great but is HORRIBLY slow, I think because the tech table is very large. There are many instances of uid, as the table tracks a timestamp per uid, thus the DISTINCT. What is the best way to speed this query up?
SELECT DISTINCT tech.uid,
listing.empno,
listing.firstname,
listing.lastname
FROM tech,
listing
WHERE tech.uid = listing.empno
ORDER BY listing.empno ASC
First, add an index on tech.uid and on listing.empno in their respective tables.
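For example, a sketch of that DDL (index names are arbitrary):
CREATE INDEX idx_tech_uid ON tech (uid);
CREATE INDEX idx_listing_empno ON listing (empno);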
After you are sure there are indexes you can try to re-write your query like this:
SELECT DISTINCT tech.uid, listing.EmpNo, listing.FirstName, listing.LastName
FROM listing INNER JOIN tech ON tech.uid = listing.EmpNo
ORDER BY listing.EmpNo ASC;
If it's still not fast enough, put the word EXPLAIN before the query to get some hints about the execution plan of the query.
EXPLAIN SELECT DISTINCT tech.uid, listing.EmpNo, listing.FirstName, listing.LastName
FROM listing INNER JOIN tech ON tech.uid = listing.EmpNo
ORDER BY listing.EmpNo ASC;
Post the EXPLAIN results so we can get better insight.
Hope it helps,
This is a very simple query. The only thing you can do in SQL is add indexes on the fields used in the JOIN/WHERE and ORDER BY clauses (tech.uid, listing.empno), if there are no indexes yet.
If there are JOIN fields with NULL values, they may ruin your performance. You should filter them out in the WHERE clause (WHERE tech.uid IS NOT NULL AND listing.empno IS NOT NULL). If there are many rows joining on a NULL field, that data may produce a Cartesian-product-like result, which may contain an enormous number of rows.
You may also change the MySQL configuration. There are many options useful for performance tuning, such as key_buffer_size, sort_buffer_size, tmp_table_size, max_heap_table_size, read_buffer_size, etc. A sketch follows.
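A minimal sketch of adjusting a few of those variables at runtime; the values are purely illustrative assumptions and should be derived from your actual workload and available RAM (and made permanent in the server configuration file if they help):
-- Illustrative values only; tune them to your workload and memory
SET GLOBAL key_buffer_size     = 256 * 1024 * 1024;  -- MyISAM index cache
SET GLOBAL sort_buffer_size    = 4 * 1024 * 1024;    -- per-connection sort buffer
SET GLOBAL tmp_table_size      = 64 * 1024 * 1024;   -- in-memory temp table limit ...
SET GLOBAL max_heap_table_size = 64 * 1024 * 1024;   -- ... raise together with tmp_table_size
SET GLOBAL read_buffer_size    = 1 * 1024 * 1024;    -- buffer for sequential table scans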