Creating Temporary Column Order By Slow - mysql

So basically what I'm trying to do is get gained experience, ordering it, then only displaying top 5 or 50. Now note that I'm not SQL expert but I have knowledge of indexes as well as file sorting. The query that I have is filesorting most likely due to "gained_xp" not being a index-- let alone even a column as it's only temporary. There's no clear explanation how to fix this as I'm trying to contain it all in one query. I'm trying to sort nearly 13k rows with that number only expanding. I'd also need the number of rows to be dynamic as well as the time since. Any help would be appreciated. Thank you
Explain Output: Using where; Using temporary; Using filesort
Indexes include: time userid override overallXp overallLevel overallRank
The closest I've gotten to order all rows (which never ends up completing and ends in a mysql reboot) are:
SELECT FROM_UNIXTIME(time, '%Y-%m-%d'), t_u.userid as uid, MAX(t_u.OverallXP)-(SELECT overallXP FROM track_updates WHERE `userid` = t_u.userid AND `time`>'1394323200' ORDER BY id ASC LIMIT 1) as gained_xp
FROM track_updates t_u
WHERE t_u.time>'1394323200'
GROUP BY t_u.userid
If I'd run a query that selects only one user and works correctly is:
SELECT FROM_UNIXTIME(time, '%Y-%m-%d'), (t_u.overallXP)-(SELECT overallXP FROM track_updates WHERE `userid`='1' ORDER BY `id` ASC LIMIT 1) as gained_xp, t_u.userid
FROM track_updates t_u
WHERE t_u.userid='1' AND t_u.time>'1393632000'
ORDER BY t_u.time DESC
LIMIT 1
Sample Data per request:
____________________________________________________________________
| id | userid | time | overallLevel | overallXP | overallRank |
| 1 | 1 | 1394388114 | 1 | 1 | 1 |
| 2 | 1 | 1394389114 | 2 | 10 | 1 |
| 3 | 2 | 1394388114 | 1 | 1 | 2 |
| 4 | 2 | 1394389114 | 1 | 5 | 2 |
| 5 | 2 | 1394390114 | 2 | 7 | 2 |
Output (most recent time; gained xp current-initial; ordered by gain_xp):
____________________________________________
| id | time | userid | gained_xp |
| 1 | March 9th 2014 | 1 | 9 |
| 2 | March 9th 2014 | 2 | 6 |

Related

Why should I write the rest of columns into GROUP BY when there is an aggregate function?

I have this table structure:
// mytable
+----+------+-------+-------------+
| id | type | score | unix_time |
+----+------+-------+-------------+
| 1 | 1 | 5 | 1463508841 |
| 2 | 1 | 10 | 1463508842 |
| 3 | 2 | 5 | 1463508843 |
| 4 | 1 | 5 | 1463508844 |
| 5 | 2 | 15 | 1463508845 |
| 6 | 1 | 10 | 1463508846 |
+----+------+-------+-------------+
And here is my query:
SELECT SUM(score), unix_time
FROM mytable
WHERE 1
GROUP BY type
And here is the output:
+-------+-------------+
| score | unix_time |
+-------+-------------+
| 30 | 1463508841 |
| 20 | 1463508843 |
+-------+-------------+
Ok, all fine .. Just there is a thing: Professional people suggest me to write unix_time into GROUP BY. They believe doing that is the base of grouping and aggregate function.
Well why really should I write a (almost) unique column into GROUP BY? If I do that then each row will be a separated group and there will be a lot of extra rows which are useless:
+-------+-------------+
| score | unix_time |
+-------+-------------+
| 30 | 1463508841 |
| 30 | 1463508842 |
| 20 | 1463508843 |
| 30 | 1463508844 |
| 20 | 1463508845 |
| 30 | 1463508846 |
+-------+-------------+
See? There is a lot of extra rows. So why doing that is an standard thing? Why everybody tell me MySQL does work without doing that but no database else doesn't .. Well I really don't understand why should I do that ..!
May please someone make it clear for me and explain me how GROUP BY works exactly? Is that different than my understanding?
Not having unix_time in the GROUP BY clause is a non-standard MySQL hack that I would totally stay away from. The values for unix_type across all the rows with the same type are completely different. How do you know which unix_time should appear?
In your example, you seem perfectly content to use a completely arbitrary value of unix_time per group.
However this is a recipe for disaster. What does it even mean to pick some totally arbitrary value from a group? What if the unix_times were spread out by days or weeks or even years? Which one would you take then?
The reason the pros are telling you to put it in the group by clause is so that the result makes sense! Another approach is to leave unix_time out of the select completely, as the result you are getting shouldn't be relied upon.
Maybe you need something like this:
SELECT type,
SUM(score) as sum_of_score,
MIN(unix_time) as start_unix_time,
MAX(unix_time) as end_unix_time
FROM mytable
WHERE 1
GROUP BY type

Calculating row indices with subquery having joins, results in A*B examined rows

This question is derived from a one I started previously: Incorrect row index when grouping
Due to different natures, I'm asking here and will provide the answer back there once I have resolved this issue.
I thought about subqueries, and came up with this:
SELECT
mq.*,
#indexer := #indexer + 1 AS indexer
FROM
(
SELECT
p.id,
p.tag_id,
p.title,
p.created_at
FROM
`posts` AS p
LEFT JOIN
`votes` AS v
ON p.id = v.votable_id
AND v.votable_type = "Post"
AND v.deleted_at IS NULL
WHERE
p.deleted_at IS NULL
GROUP BY
p.id
) AS mq
JOIN
(SELECT #indexer := 0) AS i
Which actually works, I get the desired result:
+----+--------+------------------------------------+---------------------+---------+
| id | tag_id | title | created_at | indexer |
+----+--------+------------------------------------+---------------------+---------+
| 2 | 2 | PostPostPost | 2014-10-23 23:53:15 | 1 |
| 3 | 3 | Title | 2014-10-23 23:56:13 | 2 |
| 4 | 2 | GIFGIFIGIIF | 2014-10-23 23:59:03 | 3 |
| 5 | 2 | GIFGIFIGIIF | 2014-10-23 23:59:03 | 4 |
| 6 | 4 | My new avatar | 2014-10-26 22:22:30 | 5 |
| 7 | 5 | Hi, haiii, oh Hey ! | 2014-10-26 22:38:10 | 6 |
| 8 | 6 | Mclaren testing stealth technology | 2014-10-26 22:44:15 | 7 |
| 9 | 7 | Just random thoughts while pooping | 2014-10-26 22:50:03 | 8 |
+----+--------+------------------------------------+---------------------+---------+
The problem now is... I ran a EXPLAIN query, to see how fast it works. And, I have a number there that is really bugging me:
Well, the number is obvious: 252 * 1663 = 419076.
This worries me, though - is the row count normal there, or I have to optimize the query? And if so, then how do I optimize this one?
As of MySQL version 5.7 all joins are treated as nested loop joins.
MySQL resolves all joins using a nested-loop join method. This means that MySQL reads a row from the first table, and then finds a matching row in the second table, the third table, and so on.
So to answer your question... no, you won't be able to get that row count down. However, by adding indexes to your join columns you may be able to achieve faster results but your row count will be the same.

MySQL: optimize query for scoring calculation

I have a data table that I use to do some calculations. The resulting data set after calculations looks like:
+------------+-----------+------+----------+
| id_process | id_region | type | result |
+------------+-----------+------+----------+
| 1 | 4 | 1 | 65.2174 |
| 1 | 5 | 1 | 78.7419 |
| 1 | 6 | 1 | 95.2308 |
| 1 | 4 | 1 | 25.0000 |
| 1 | 7 | 1 | 100.0000 |
+------------+-----------+------+----------+
By other hand I have other table that contains a set of ranges that are used to classify the calculations results. The range tables looks like:
+----------+--------------+---------+
| id_level | start | end | status |
+----------+--------------+---------+
| 1 | 0 | 75 | Danger |
| 2 | 76 | 90 | Alert |
| 3 | 91 | 100 | Good |
+----------+--------------+---------+
I need to do a query that add the corresponding 'status' column to each value when do calculations. Currently, I can do that adding the following field to calculation query:
select
...,
...,
[math formula] as result,
(select status
from ranges r
where result between r.start and r.end) status
from ...
where ...
It works ok. But when I have a lot of rows (more than 200K), calculation query become slow.
My question is: there is some way to find that 'status' value without do that subquery?
Some one have worked on something similar before?
Thanks
Yes, you are looking for a subquery and join:
select s.*, r.status
from (select s.*
from <your query here>
) s left outer join
ranges r
on s.result between r.start and r.end
Explicit joins often optimize better than nested select. In this case, though, the ranges table seems pretty small, so this may not be the performance issue.

Efficient MySQL Table Setup and Queries

Suppose I have the following database setup (a simplified version from what I actually have):
Table: news_posting (500,000+ entries)
| --------------------------------------------------------------|
| posting_id | name | is_active | released_date | token |
| 1 | posting_1 | 1 | 2013-01-10 | 123 |
| 2 | posting_2 | 1 | 2013-01-11 | 124 |
| 3 | posting_3 | 0 | 2013-01-12 | 125 |
| --------------------------------------------------------------|
PRIMARY posting_id
INDEX sorting ON (is_active, released_date, token)
Table: news_category (500 entries)
| ------------------------------|
| category_id | name |
| 1 | category_1 |
| 2 | category_2 |
| 3 | category_3 |
| ------------------------------|
PRIMARY category_id
Table: news_cat_match (1,000,000+ entries)
| ------------------------------|
| category_id | posting_id |
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 2 | 2 |
| 3 | 2 |
| 1 | 3 |
| 2 | 3 |
| ------------------------------|
UNIQUE idx (category_id, posting_id)
My task is as follows. I must get a list of 50 latest news postings (at some offset) that are active, that are before today's date, and that are in one of the 20 or so categories that are specified in the request. Before I choose the 50 news postings to return, I must sort the appropriate news postings by token in descending order. My query is currently similar to the following:
SELECT DISTINCT posting_id
FROM news_posting np
INNER JOIN news_cat_match ncm ON (ncm.posting_id = np.posting_id AND ncm.category_id IN (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20))
WHERE np.is_active = 1
AND np.released_date < '2013-01-28'
ORDER BY np.token DESC LIMIT 50
With just one specified category_id the query does not involve a filesort and is reasonably fast because it does not have to process removal of duplicate results. However, calling EXPLAIN on the above query that has multiple category_id's returns a table that says that there is filesort to be done. And, the query is extremely slow on my data set.
Is there any way to optimize the table setup and/or the query?
I was able to get the above query to run even faster than with a single-value category list version by rewriting it as follows:
SELECT posting_id
FROM news_posting np
WHERE np.is_active = 1
AND np.released_date < '2013-01-28'
AND EXISTS (
SELECT ncm.posting_id
FROM news_cat_match ncm
WHERE ncm.posting_id = np.posting_id
AND ncm.category_id IN (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20)
LIMIT 1
)
ORDER BY np.token DESC LIMIT 50
This now takes under a second on my data set.
The sad part is that this is even faster than if there is just one category_id specified. That's because the subset of news items is bigger than with just one category_id, so it finds the results more quickly.
Now my next question is whether this can be optimized for cases when a category has only few news that are spread in time?
The following is still pretty slow on my development machine. Although it's fast enough on the production server, I would like to optimize this if possible.
SELECT DISTINCT posting_id
FROM news_posting np
INNER JOIN news_cat_match ncm ON (ncm.posting_id = np.posting_id AND ncm.category_id = 1)
WHERE np.is_active = 1
AND np.released_date < '2013-01-28'
ORDER BY np.token DESC LIMIT 50
Does anyone have any further suggestions?

SQL 'COUNT' not returning what I expect, and somehow limiting results to one row

Some background: an 'image' is part of one 'photoshoot', and may be a part of zero or many 'galleries'. My tables:
'shoots' table:
+----+--------------+
| id | name |
+----+--------------+
| 1 | Test shoot |
| 2 | Another test |
| 3 | Final test |
+----+--------------+
'images' table:
+----+-------------------+------------------+
| id | original_filename | storage_location |
+----+-------------------+------------------+
| 1 | test.jpg | store/test.jpg |
| 2 | test.jpg | store/test.jpg |
| 3 | test.jpg | store/test.jpg |
+----+-------------------+------------------+
'shoot_images' table:
+----------+----------+
| shoot_id | image_id |
+----------+----------+
| 1 | 1 |
| 1 | 2 |
| 3 | 3 |
+----------+----------+
'gallery_images' table:
+------------+----------+
| gallery_id | image_id |
+------------+----------+
| 1 | 1 |
| 1 | 2 |
| 2 | 3 |
| 3 | 1 |
| 4 | 1 |
+------------+----------+
What I'd like to get back, so I can say 'For this photoshoot, there are X images in total, and these images are featured in Y galleries:
+----+--------------+-------------+---------------+
| id | name | image_count | gallery_count |
+----+--------------+-------------+---------------+
| 3 | Final test | 1 | 1 |
| 2 | Another test | 0 | 0 |
| 1 | Test shoot | 2 | 4 |
+----+--------------+-------------+---------------+
I'm currently trying the SQL below, which appears to work correctly but only ever returns one row. I can't work out why this is happening. Curiously, the below also returns a row even when 'shoots' is empty.
SELECT shoots.id,
shoots.name,
COUNT(DISTINCT shoot_images.image_id) AS image_count,
COUNT(DISTINCT gallery_images.gallery_id) AS gallery_count
FROM shoots
LEFT JOIN shoot_images ON shoots.id=shoot_images.shoot_id
LEFT JOIN gallery_images ON shoot_images.image_id=gallery_images.image_id
ORDER BY shoots.id DESC
Thanks for taking the time to look at this :)
You are missing the GROUP BY clause:
SELECT
shoots.id,
shoots.name,
COUNT(DISTINCT shoot_images.image_id) AS image_count,
COUNT(DISTINCT gallery_images.gallery_id) AS gallery_count
FROM shoots
LEFT JOIN shoot_images ON shoots.id=shoot_images.shoot_id
LEFT JOIN gallery_images ON shoot_images.image_id=gallery_images.image_id
GROUP BY 1, 2 -- Added this line
ORDER BY shoots.id DESC
Note: The SQL standard allows GROUP BY to be given either column names or column numbers, so GROUP BY 1, 2 is equivalent to GROUP BY shoots.id, shoots.name in this case. There are many who consider this "bad coding practice" and advocate always using the column names, but I find it makes the code a lot more readable and maintainable and I've been writing SQL since before many users on this site were born, and it's never cause me a problem using this syntax.
FYI, the reason you were getting one row before, and not getting and error, is that in mysql, unlike any other database I know, you are allowed to omit the group by clause when using aggregating functions. In such cases, instead of throwing a syntax exception, mysql returns the first row for each unique combination of non-aggregate columns.
Although at first this may seem abhorrent to SQL purists, it can be incredibly handy!
You should look into the MySQL function group by.