MySQL: optimize query for scoring calculation - mysql

I have a data table that I use to do some calculations. The resulting data set after calculations looks like:
+------------+-----------+------+----------+
| id_process | id_region | type | result |
+------------+-----------+------+----------+
| 1 | 4 | 1 | 65.2174 |
| 1 | 5 | 1 | 78.7419 |
| 1 | 6 | 1 | 95.2308 |
| 1 | 4 | 1 | 25.0000 |
| 1 | 7 | 1 | 100.0000 |
+------------+-----------+------+----------+
By other hand I have other table that contains a set of ranges that are used to classify the calculations results. The range tables looks like:
+----------+--------------+---------+
| id_level | start | end | status |
+----------+--------------+---------+
| 1 | 0 | 75 | Danger |
| 2 | 76 | 90 | Alert |
| 3 | 91 | 100 | Good |
+----------+--------------+---------+
I need to do a query that add the corresponding 'status' column to each value when do calculations. Currently, I can do that adding the following field to calculation query:
select
...,
...,
[math formula] as result,
(select status
from ranges r
where result between r.start and r.end) status
from ...
where ...
It works ok. But when I have a lot of rows (more than 200K), calculation query become slow.
My question is: there is some way to find that 'status' value without do that subquery?
Some one have worked on something similar before?
Thanks

Yes, you are looking for a subquery and join:
select s.*, r.status
from (select s.*
from <your query here>
) s left outer join
ranges r
on s.result between r.start and r.end
Explicit joins often optimize better than nested select. In this case, though, the ranges table seems pretty small, so this may not be the performance issue.

Related

SQL: How to get last date in groupby for getting unique records?

I am working on a data where I have to use multiple joins and figures out that one of the table is producing duplicates as I applied Group by on dates as well and b/c of different dates my query takes in duplicate values.
I wrote following query
SELECT
ll.ID,
ll.EST_DT
gg.col1 ,
ll.EST_CLAIM_DT,
gg.col2
FROM table gg
inner join
(select substr(ID,1,instr(ID,'-',7)-1) EST_ID,
max(est_dt) as EST_DT,
max(EST_CLAIM_DT) as EST_CLAIM_DT
from table group by substr(gg.ID,1,instr(ID,'-',7)-1)) ll
on substr(ID,1,instr(gg.ID,'-',7)-1)=substr(ll.ID,1,instr(ll.ID,'-',7)-1)
GROUP BY
ll.ID,
ll.EST_DT
gg.col1 ,
ll.EST_CLAIM_DT,
gg.col2
Table looks like this:
+-----------------+------------+----------------+------+------+
| ID | est_date | est_claimed_dt | col1 | col2 |
+-----------------+------------+----------------+------+------+
| EST-U-1040452-1 | 28/02/2019 | 28/02/2019 | 50 | 50 |
| EST-U-1040452-2 | 5/10/2020 | 5/10/2020 | 50 | 50 |
+-----------------+------------+----------------+------+------+
Desired output
+---------+-----------+----------------+------+------+
| ID | est_date | est_claimed_dt | col1 | col2 |
+---------+-----------+----------------+------+------+
| 1040452 | 5/10/2020 | 5/10/2020 | 50 | 50 |
+---------+-----------+----------------+------+------+
I get this error as well
Negative sub string length not allowed
P.S. I have search SO for this issue and it helped but couldn't get it to work.

Only return an ordered subset of the rows from a joined table

Given a structure like this in a MySQL database
#data_table
(id) | user_id | time | (...)
#relations_table
(id) | user_id | user_coach_id | (...)
we can select all data_table rows belonging to a certain user_coach_id (let's say 1) with
SELECT rel.`user_coach_id`, dat.*
FROM `relations_table` rel
LEFT JOIN `data_table` dat ON rel.`uid` = dat.`uid`
WHERE rel.`user_coach_id` = 1
ORDER BY val.`time` DESC
returning something like
| user_coach_id | id | user_id | time | data1 | data2 | ...
| 1 | 9 | 4 | 15 | foo | bar | ...
| 1 | 7 | 3 | 12 | oof | rab | ...
| 1 | 6 | 4 | 11 | ofo | abr | ...
| 1 | 4 | 4 | 5 | foo | bra | ...
(And so on. Of course time are not integers in reality but to keep it simple.)
But now I would like to query (ideally) only up to an arbitrary number of rows from data_table per distinct user_id but still have those ordered (i.e. newest first). Is that even possible?
I know I can use GROUP BY user_id to only return 1 row per user, but then the ordering doesn't work and it seems kind of unpredictable which row will be in the result. I guess it's doable with a subquery, but I haven't figured it out yet.
To limit the number of rows in each GROUP is complicated. It is probably best done with an #variable to count, plus an outer query to throw out the rows beyond the limit.
My blog on Groupwise Max gives some hints of how to do such.

SQL 'COUNT' not returning what I expect, and somehow limiting results to one row

Some background: an 'image' is part of one 'photoshoot', and may be a part of zero or many 'galleries'. My tables:
'shoots' table:
+----+--------------+
| id | name |
+----+--------------+
| 1 | Test shoot |
| 2 | Another test |
| 3 | Final test |
+----+--------------+
'images' table:
+----+-------------------+------------------+
| id | original_filename | storage_location |
+----+-------------------+------------------+
| 1 | test.jpg | store/test.jpg |
| 2 | test.jpg | store/test.jpg |
| 3 | test.jpg | store/test.jpg |
+----+-------------------+------------------+
'shoot_images' table:
+----------+----------+
| shoot_id | image_id |
+----------+----------+
| 1 | 1 |
| 1 | 2 |
| 3 | 3 |
+----------+----------+
'gallery_images' table:
+------------+----------+
| gallery_id | image_id |
+------------+----------+
| 1 | 1 |
| 1 | 2 |
| 2 | 3 |
| 3 | 1 |
| 4 | 1 |
+------------+----------+
What I'd like to get back, so I can say 'For this photoshoot, there are X images in total, and these images are featured in Y galleries:
+----+--------------+-------------+---------------+
| id | name | image_count | gallery_count |
+----+--------------+-------------+---------------+
| 3 | Final test | 1 | 1 |
| 2 | Another test | 0 | 0 |
| 1 | Test shoot | 2 | 4 |
+----+--------------+-------------+---------------+
I'm currently trying the SQL below, which appears to work correctly but only ever returns one row. I can't work out why this is happening. Curiously, the below also returns a row even when 'shoots' is empty.
SELECT shoots.id,
shoots.name,
COUNT(DISTINCT shoot_images.image_id) AS image_count,
COUNT(DISTINCT gallery_images.gallery_id) AS gallery_count
FROM shoots
LEFT JOIN shoot_images ON shoots.id=shoot_images.shoot_id
LEFT JOIN gallery_images ON shoot_images.image_id=gallery_images.image_id
ORDER BY shoots.id DESC
Thanks for taking the time to look at this :)
You are missing the GROUP BY clause:
SELECT
shoots.id,
shoots.name,
COUNT(DISTINCT shoot_images.image_id) AS image_count,
COUNT(DISTINCT gallery_images.gallery_id) AS gallery_count
FROM shoots
LEFT JOIN shoot_images ON shoots.id=shoot_images.shoot_id
LEFT JOIN gallery_images ON shoot_images.image_id=gallery_images.image_id
GROUP BY 1, 2 -- Added this line
ORDER BY shoots.id DESC
Note: The SQL standard allows GROUP BY to be given either column names or column numbers, so GROUP BY 1, 2 is equivalent to GROUP BY shoots.id, shoots.name in this case. There are many who consider this "bad coding practice" and advocate always using the column names, but I find it makes the code a lot more readable and maintainable and I've been writing SQL since before many users on this site were born, and it's never cause me a problem using this syntax.
FYI, the reason you were getting one row before, and not getting and error, is that in mysql, unlike any other database I know, you are allowed to omit the group by clause when using aggregating functions. In such cases, instead of throwing a syntax exception, mysql returns the first row for each unique combination of non-aggregate columns.
Although at first this may seem abhorrent to SQL purists, it can be incredibly handy!
You should look into the MySQL function group by.

MySQL: Sort by group and field

I have a table with the following (simplified) structure:
INT id,
INT type,
INT sort
What I need is a SELECT that sorts my data in a way, so that:
all rows of the same type are in sequency, sorted ascendingly by sort internally, and
all "blocks" of one type are sorted by their minimum sort.
Example:
If the table looks like this:
| id | type | sort |
| 1 | 1 | 3 |
| 2 | 3 | 5 |
| 3 | 3 | 1 |
| 4 | 2 | 4 |
| 5 | 1 | 2 |
| 6 | 2 | 6 |
The query should sort the result like this:
| id | type | sort |
| 3 | 3 | 1 |
| 2 | 3 | 5 |
| 5 | 1 | 2 |
| 1 | 1 | 3 |
| 4 | 2 | 4 |
| 6 | 2 | 6 |
I hope this makes it clear enough.
Looks to me, as this should be a very common requirement, but I didn't find any examples close enough to be able to transfer it to my use case on my own. I suppose I can't avoid at least one subquery, but I didn't figure it out on my own.
Any help is appreciated, thanks in advance.
By the way: I'm going to use this query with CakePHP 2.1, so if you know of a comfortable way to do it with Cake, please let me know.
This is simpler than it initially sounds. I believe the following should do the trick:
SELECT a.id, a.type, a.sort
FROM Some_Table as a
JOIN (SELECT type, MIN(sort) as min
FROM Some_Table
GROUP BY type) as b
ON b.type = a.type
ORDER BY b.min, a.type, a.sort
For best (fastest) results, you're probably going to want an index on (type, sort).
You want an additional sort by a.type (instead of (b.min, a.sort)), in case there are two groups with the same sort value (would result in mixed rows). If there are no duplicate values, you can remove it.
sort and type are reserved words on some databases and can cause you problems.
Have you tried?
ORDER BY TYPE DESC, SORT ASC

MySQL using GROUP BY to group by multiple columns

I'd like to use GROUP BY multiple columns, I think it's best to start with an example:
SELECT
eventsviews.eventId,
showsActive.showId,
showsActive.venueId,
COUNT(*) AS count
FROM eventsviews
INNER JOIN events ON events.eventId = eventsviews.eventId
INNER JOIN showsActive ON showsActive.eventId = eventsviews.eventId
WHERE events.status = 1
GROUP BY showsActive.venueId, showsActive.showId, showsActive.eventId
ORDER BY count DESC
LIMIT 100;
Output:
| *eventId* | *showId* | *venueId* | *count* |
+-----------+----------+-----------+---------+
[...snip...]
| 95 | 92099 | 9770 | 32 |
| 95 | 105472 | 10702 | 32 |
| 3804 | 41225 | 8165 | 17 |
| 3804 | 41226 | 8165 | 17 |
| 923 | 2866 | 5451 | 14 |
| 923 | 20184 | 5930 | 14 |
[...snip...]
What I would like instead:
| *eventId* | *showId* | *venueId* | *count* |
+-----------+----------+-----------+---------+
| 95 | 92099 | 9770 | 32 |
| 3804 | 41226 | 8165 | 17 |
| 923 | 20184 | 5930 | 14 |
So, I want my data grouped by eventId, but only once for each showId and venueId ...
I actually have a SQL query that does that, but it has 8 subqueries and is as slow as a T-Ford ... And since this is executed on every page load, speeding things up looks like a good idea!
There are a few questions like this, and I've tried many different things, but I've been at this query for an hour and I can't seem to get it to work as I want :-(
Thanks!
You probably want either a min or a max on showid, and then not include it in the group by, I can't tell which because looking at your "prefered" resultset, you have both.
If you want your data grouped by eventId, group just by eventId and you'll get exactly the result you're looking for.
This is a MySQL feature (?) that it allows you to select non-aggregate columns, in which case it will return the first row available. In other DBMS it's achieved by DISTINCT ON, which is not available in MySQL.