MySQL: Sort by group and field

I have a table with the following (simplified) structure:
INT id,
INT type,
INT sort
What I need is a SELECT that sorts my data so that:
all rows of the same type are contiguous, sorted ascending by sort within each block, and
all "blocks" of one type are ordered by their minimum sort.
Example:
If the table looks like this:
| id | type | sort |
| 1 | 1 | 3 |
| 2 | 3 | 5 |
| 3 | 3 | 1 |
| 4 | 2 | 4 |
| 5 | 1 | 2 |
| 6 | 2 | 6 |
The query should sort the result like this:
| id | type | sort |
| 3 | 3 | 1 |
| 2 | 3 | 5 |
| 5 | 1 | 2 |
| 1 | 1 | 3 |
| 4 | 2 | 4 |
| 6 | 2 | 6 |
I hope this makes it clear enough.
It seems to me that this should be a very common requirement, but I couldn't find any examples close enough to transfer to my use case on my own. I suppose I can't avoid at least one subquery, but I couldn't figure it out.
Any help is appreciated, thanks in advance.
By the way: I'm going to use this query with CakePHP 2.1, so if you know of a convenient way to do it with Cake, please let me know.

This is simpler than it initially sounds. I believe the following should do the trick:
SELECT a.id, a.type, a.sort
FROM Some_Table as a
JOIN (SELECT type, MIN(sort) as min
FROM Some_Table
GROUP BY type) as b
ON b.type = a.type
ORDER BY b.min, a.type, a.sort
For best (fastest) results, you're probably going to want an index on (type, sort).
You need the additional sort by a.type (rather than just b.min, a.sort) in case two groups share the same minimum sort value; without it, the rows of those groups could be interleaved. If the minimum values are guaranteed to be unique, you can remove it.
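A quick way to sanity-check this ordering without a MySQL server is to run the same query against an in-memory SQLite database through Python's standard-library sqlite3 module. This is a sketch that assumes SQLite treats this plain JOIN / GROUP BY / ORDER BY the same way MySQL does; the alias min is renamed to min_sort, and the helper name sort_by_group_min is made up for illustration.

```python
import sqlite3


def sort_by_group_min(rows):
    """Order rows by their type's minimum sort, then by type, then by sort."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Some_Table (id INTEGER, type INTEGER, sort INTEGER)")
    conn.executemany("INSERT INTO Some_Table VALUES (?, ?, ?)", rows)
    query = """
        SELECT a.id, a.type, a.sort
        FROM Some_Table AS a
        JOIN (SELECT type, MIN(sort) AS min_sort
              FROM Some_Table
              GROUP BY type) AS b
          ON b.type = a.type
        ORDER BY b.min_sort, a.type, a.sort
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


# Sample data from the question: (id, type, sort)
sample = [(1, 1, 3), (2, 3, 5), (3, 3, 1), (4, 2, 4), (5, 1, 2), (6, 2, 6)]
for row in sort_by_group_min(sample):
    print(row)
```

Run on the sample data, this lists the rows in id order 3, 2, 5, 1, 4, 6, which is the result asked for in the question.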

sort and type are reserved words on some databases and can cause you problems.
Have you tried this?
ORDER BY TYPE DESC, SORT ASC

Related

How can I merge two strings of comma-separated numbers in MySQL?

For example, there are three rooms, some of which appear on more than one row:
1|gold_room|1,2,3
2|silver_room|1,2,3
3|brown_room|2,4,6
4|brown_room|3
5|gold_room|4,5,6
From this, I'd like to get:
gold_room|1,2,3,4,5,6
brown_room|2,3,4,6
silver_room|1,2,3
How can I achieve this?
I've tried select * from room group by name;, but it only prints the first row. I also know that CONCAT() can combine two string values.
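Please use the query below (col2 and col3 being the room-name and number-list columns of a table called data):
select col2, GROUP_CONCAT(col3) from data group by col2;
Note that this only concatenates the stored lists as they are; it does not remove duplicate numbers or sort them.
Here is a test case: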
https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=ab35e8d66ffe3ac6436c17faf97ee9af
I'm not assuming that the lists on separate rows have no elements in common.
First create a table of integers.
mysql> create table n (n int primary key);
mysql> insert into n values (1),(2),(3),(4),(5),(6);
You can join this to your rooms table using the FIND_IN_SET() function. Note that this join cannot be optimized with an index: it will execute N full table scans. But it does create an interim set of rows:
mysql> select * from n inner join rooms on find_in_set(n.n, rooms.csv) order by rooms.room, n.n;
+---+----+-------------+-------+
| n | id | room | csv |
+---+----+-------------+-------+
| 2 | 3 | brown_room | 2,4,6 |
| 3 | 4 | brown_room | 3 |
| 4 | 3 | brown_room | 2,4,6 |
| 6 | 3 | brown_room | 2,4,6 |
| 1 | 1 | gold_room | 1,2,3 |
| 2 | 1 | gold_room | 1,2,3 |
| 3 | 1 | gold_room | 1,2,3 |
| 4 | 5 | gold_room | 4,5,6 |
| 5 | 5 | gold_room | 4,5,6 |
| 6 | 5 | gold_room | 4,5,6 |
| 1 | 2 | silver_room | 1,2,3 |
| 2 | 2 | silver_room | 1,2,3 |
| 3 | 2 | silver_room | 1,2,3 |
+---+----+-------------+-------+
Use GROUP BY to reduce these rows to one row per room. Use GROUP_CONCAT() to put the integers together into a comma-separated list.
mysql> select room, group_concat(distinct n.n order by n.n) as csv
from n inner join rooms on find_in_set(n.n, rooms.csv) group by rooms.room;
+-------------+-------------+
| room | csv |
+-------------+-------------+
| brown_room | 2,3,4,6 |
| gold_room | 1,2,3,4,5,6 |
| silver_room | 1,2,3 |
+-------------+-------------+
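The same interim-join-then-group idea can be tried in SQLite, which has no FIND_IN_SET(). In this sketch a hypothetical Python function, registered with sqlite3's create_function(), stands in for it (mimicking MySQL: 1-based position, 0 if absent, NULL for NULL input). SQL produces the ordered DISTINCT (room, n) pairs, and the final GROUP_CONCAT step is done in Python with itertools.groupby. As in the answer, the helper table n must contain every number that can appear in a list.

```python
import sqlite3
from itertools import groupby


def find_in_set(needle, csv):
    """Hypothetical stand-in for MySQL's FIND_IN_SET(): 1-based position or 0."""
    if needle is None or csv is None:
        return None
    items = csv.split(",")
    needle = str(needle)
    return items.index(needle) + 1 if needle in items else 0


def merge_room_lists(rooms):
    conn = sqlite3.connect(":memory:")
    conn.create_function("find_in_set", 2, find_in_set)
    # Helper table of integers; it must cover every number used in the lists.
    conn.execute("CREATE TABLE n (n INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO n VALUES (?)", [(i,) for i in range(1, 7)])
    conn.execute("CREATE TABLE rooms (id INTEGER, room TEXT, csv TEXT)")
    conn.executemany("INSERT INTO rooms VALUES (?, ?, ?)", rooms)
    # Interim rows: one per (room, number) pair, duplicates removed, sorted.
    pairs = conn.execute(
        "SELECT DISTINCT rooms.room, n.n FROM n "
        "JOIN rooms ON find_in_set(n.n, rooms.csv) "
        "ORDER BY rooms.room, n.n"
    ).fetchall()
    conn.close()
    # Python equivalent of GROUP_CONCAT(DISTINCT n.n ORDER BY n.n) ... GROUP BY room
    return {
        room: ",".join(str(num) for _, num in group)
        for room, group in groupby(pairs, key=lambda pair: pair[0])
    }


rooms = [
    (1, "gold_room", "1,2,3"),
    (2, "silver_room", "1,2,3"),
    (3, "brown_room", "2,4,6"),
    (4, "brown_room", "3"),
    (5, "gold_room", "4,5,6"),
]
print(merge_room_lists(rooms))
```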
I think this is a lot of work, and impossible to optimize. I don't recommend it.
The problem is that you are storing comma-separated lists of numbers, and then you want to query it as if the elements in the list are discrete values. This is a problem for SQL.
It would be much better if you did not store your numbers in a comma-separated list. Store multiple rows per room, with one number per row. You can run a wider variety of queries if you do this, and it will be more flexible.
For example, the query you asked about, producing the numbers as a comma-separated list, becomes simpler, and you don't need the extra n table:
select room, group_concat(n order by n) as csv from rooms group by room
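Here is a minimal sketch of the recommended layout, again using SQLite from Python as a stand-in for MySQL. The comma-separated lists are split once into a rooms table with one number per row (columns room and n, as in the query above); the composite primary key together with INSERT OR IGNORE (SQLite syntax; MySQL's equivalent is INSERT IGNORE) drops duplicate numbers. The ordered concatenation that group_concat(n order by n) performs in MySQL is done in Python here, and the helper names are made up for illustration.

```python
import sqlite3
from itertools import groupby


def normalize_rooms(csv_rows):
    """Split comma-separated lists into one (room, n) row each; return a connection."""
    conn = sqlite3.connect(":memory:")
    # One number per row; the primary key rejects duplicate (room, n) pairs.
    conn.execute(
        "CREATE TABLE rooms (room TEXT NOT NULL, n INTEGER NOT NULL, "
        "PRIMARY KEY (room, n))"
    )
    for _id, room, csv in csv_rows:
        conn.executemany(
            "INSERT OR IGNORE INTO rooms (room, n) VALUES (?, ?)",
            [(room, int(x)) for x in csv.split(",")],
        )
    return conn


def room_lists(conn):
    """Rebuild one comma-separated list per room from the normalized rows."""
    rows = conn.execute("SELECT room, n FROM rooms ORDER BY room, n").fetchall()
    return {
        room: ",".join(str(n) for _, n in group)
        for room, group in groupby(rows, key=lambda r: r[0])
    }


legacy = [
    (1, "gold_room", "1,2,3"),
    (2, "silver_room", "1,2,3"),
    (3, "brown_room", "2,4,6"),
    (4, "brown_room", "3"),
    (5, "gold_room", "4,5,6"),
]
conn = normalize_rooms(legacy)
print(room_lists(conn))
conn.close()
```

Once the data is stored this way, other queries (for example "which rooms contain 4?") become plain WHERE n = 4 lookups that can use the index.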
See also my answer to Is storing a delimited list in a database column really that bad?

Why should I write the rest of columns into GROUP BY when there is an aggregate function?

I have this table structure:
// mytable
+----+------+-------+-------------+
| id | type | score | unix_time |
+----+------+-------+-------------+
| 1 | 1 | 5 | 1463508841 |
| 2 | 1 | 10 | 1463508842 |
| 3 | 2 | 5 | 1463508843 |
| 4 | 1 | 5 | 1463508844 |
| 5 | 2 | 15 | 1463508845 |
| 6 | 1 | 10 | 1463508846 |
+----+------+-------+-------------+
And here is my query:
SELECT SUM(score), unix_time
FROM mytable
WHERE 1
GROUP BY type
And here is the output:
+-------+-------------+
| score | unix_time |
+-------+-------------+
| 30 | 1463508841 |
| 20 | 1463508843 |
+-------+-------------+
OK, all fine. There's just one thing: experienced people suggest that I add unix_time to the GROUP BY. They believe doing that is the basis of grouping and aggregate functions.
But why should I add an (almost) unique column to the GROUP BY? If I do that, each row becomes a separate group and I get a lot of useless extra rows:
+-------+-------------+
| score | unix_time |
+-------+-------------+
| 30 | 1463508841 |
| 30 | 1463508842 |
| 20 | 1463508843 |
| 30 | 1463508844 |
| 20 | 1463508845 |
| 30 | 1463508846 |
+-------+-------------+
See? There are a lot of extra rows. So why is doing that the standard thing? Why does everybody tell me that MySQL works without it but no other database does? I really don't understand why I should do that!
Could someone please make it clear for me and explain exactly how GROUP BY works? Is it different from my understanding?
Not having unix_time in the GROUP BY clause is a non-standard MySQL hack that I would totally stay away from. The values of unix_time across all the rows with the same type are completely different. How do you know which unix_time should appear?
In your example, you seem perfectly content to use a completely arbitrary value of unix_time per group.
However this is a recipe for disaster. What does it even mean to pick some totally arbitrary value from a group? What if the unix_times were spread out by days or weeks or even years? Which one would you take then?
The reason the pros are telling you to put it in the group by clause is so that the result makes sense! Another approach is to leave unix_time out of the select completely, as the result you are getting shouldn't be relied upon.
Maybe you need something like this:
SELECT type,
SUM(score) as sum_of_score,
MIN(unix_time) as start_unix_time,
MAX(unix_time) as end_unix_time
FROM mytable
WHERE 1
GROUP BY type
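For illustration, here is the suggested query run against the question's sample data, using SQLite from Python as a stand-in for MySQL. The no-op WHERE 1 is dropped, and an ORDER BY type is added only to make the row order deterministic. Every selected column is either listed in the GROUP BY or wrapped in an aggregate, so each value in the result has a defined meaning.

```python
import sqlite3

# Sample rows from the question: (id, type, score, unix_time)
rows = [
    (1, 1, 5, 1463508841),
    (2, 1, 10, 1463508842),
    (3, 2, 5, 1463508843),
    (4, 1, 5, 1463508844),
    (5, 2, 15, 1463508845),
    (6, 1, 10, 1463508846),
]


def summarize_by_type(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mytable (id INTEGER, type INTEGER, score INTEGER, unix_time INTEGER)"
    )
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", rows)
    # Every non-grouped column is aggregated, so the result is well defined.
    result = conn.execute(
        """
        SELECT type,
               SUM(score) AS sum_of_score,
               MIN(unix_time) AS start_unix_time,
               MAX(unix_time) AS end_unix_time
        FROM mytable
        GROUP BY type
        ORDER BY type
        """
    ).fetchall()
    conn.close()
    return result


for row in summarize_by_type(rows):
    print(row)
```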

Optimize SQL-Query that is using REGEXP in a JOIN

I have the following situation:
Table Words:
| ID | WORD |
|----|--------|
| 1 | us |
| 2 | to |
| 3 | belong |
| 4 | are |
| 5 | base |
| 6 | your |
| 7 | all |
| 8 | is |
| 9 | yours |
Table Sentence:
| ID | SENTENCE |
|----|-------------------------------------------|
| 1 | <<7>> <<6>> <<5>> <<4>> <<3>> <<2>> <<1>> |
| 2 | <<7>> <<8>> <<9>> |
I want to replace each <<(\d)>> with the corresponding word from the Words table.
So the result should be
| ID | SENTENCE |
|----|--------------------------------|
| 1 | all your base are belong to us |
| 2 | all is yours |
What I came up with is the following SQL code:
SELECT id, GROUP_CONCAT(word ORDER BY pos SEPARATOR ' ') AS sentence FROM (
SELECT sentence.id, words.word, LOCATE(words.id, sentence.sentence) AS pos
FROM sentence
LEFT JOIN words
ON (sentence.sentence REGEXP CONCAT('<<',words.id,'>>'))
) AS TEMP
GROUP BY id
I made a sqlfiddle for this:
http://sqlfiddle.com/#!2/634b8/4
The code basically works, but I'd like to ask you pros whether there is a way to do this without a derived table or without a filesort in the execution plan.
You should make a table with one entry per word, so your sentense (sic) can be built by joining on that table. It would look something like this:
SentenceId, wordId, location
2, 7, 1
2, 8, 2
2, 9, 3
The way you have it set up, you are not taking advantage of your database: you are basically putting several data points into one field.
The location field (it is tempting to call it "order", but as this is an SQL keyword, don't do it, you'll hate yourself) can be used to 'sort' the sentence.
(and you might want to rename sentense to sentence?)
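A sketch of that layout, using SQLite from Python as a stand-in for MySQL. The link table name sentence_words and its column names are assumptions based on the SentenceId, wordId, location example above. The words are fetched in location order through a plain equi-join and joined with spaces in Python; in MySQL the same step could be done in SQL with GROUP_CONCAT(w.word ORDER BY sw.location SEPARATOR ' ') ... GROUP BY sw.sentence_id.

```python
import sqlite3
from itertools import groupby

WORDS = [(1, "us"), (2, "to"), (3, "belong"), (4, "are"), (5, "base"),
         (6, "your"), (7, "all"), (8, "is"), (9, "yours")]
# Hypothetical link table rows: (sentence_id, word_id, location)
SENTENCE_WORDS = [(1, 7, 1), (1, 6, 2), (1, 5, 3), (1, 4, 4), (1, 3, 5),
                  (1, 2, 6), (1, 1, 7), (2, 7, 1), (2, 8, 2), (2, 9, 3)]


def build_sentences():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE words (id INTEGER PRIMARY KEY, word TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE sentence_words (sentence_id INTEGER, word_id INTEGER, "
        "location INTEGER, PRIMARY KEY (sentence_id, location))"
    )
    conn.executemany("INSERT INTO words VALUES (?, ?)", WORDS)
    conn.executemany("INSERT INTO sentence_words VALUES (?, ?, ?)", SENTENCE_WORDS)
    # A plain equi-join ordered by position; no REGEXP or LOCATE needed.
    rows = conn.execute(
        """
        SELECT sw.sentence_id, w.word
        FROM sentence_words AS sw
        JOIN words AS w ON w.id = sw.word_id
        ORDER BY sw.sentence_id, sw.location
        """
    ).fetchall()
    conn.close()
    return {
        sentence_id: " ".join(word for _, word in group)
        for sentence_id, group in groupby(rows, key=lambda r: r[0])
    }


print(build_sentences())
```

The primary key on (sentence_id, location) also serves as the index that returns the rows already in sentence/location order.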

MySQL: optimize query for scoring calculation

I have a data table that I use to do some calculations. The resulting data set after calculations looks like:
+------------+-----------+------+----------+
| id_process | id_region | type | result |
+------------+-----------+------+----------+
| 1 | 4 | 1 | 65.2174 |
| 1 | 5 | 1 | 78.7419 |
| 1 | 6 | 1 | 95.2308 |
| 1 | 4 | 1 | 25.0000 |
| 1 | 7 | 1 | 100.0000 |
+------------+-----------+------+----------+
On the other hand, I have another table containing a set of ranges that are used to classify the calculation results. The ranges table looks like:
+----------+-------+-----+--------+
| id_level | start | end | status |
+----------+-------+-----+--------+
| 1 | 0 | 75 | Danger |
| 2 | 76 | 90 | Alert |
| 3 | 91 | 100 | Good |
+----------+-------+-----+--------+
I need a query that adds the corresponding 'status' column to each result while doing the calculations. Currently, I do this by adding the following field to the calculation query:
select
...,
...,
[math formula] as result,
(select status
from ranges r
where result between r.start and r.end) status
from ...
where ...
It works OK, but when I have a lot of rows (more than 200K), the calculation query becomes slow.
My question is: is there some way to find that 'status' value without that subquery?
Has anyone worked on something similar before?
Thanks
Yes, you are looking for a subquery and join:
select s.*, r.status
from (<your query here>) s
left outer join ranges r
on s.result between r.start and r.end
Explicit joins often optimize better than nested select. In this case, though, the ranges table seems pretty small, so this may not be the performance issue.
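A sketch of the join form, with SQLite from Python standing in for MySQL and a hypothetical calc table standing in for <your query here>. Because END is a keyword in SQLite, the range columns are renamed range_start and range_end for this example only.

```python
import sqlite3

# (id_level, range_start, range_end, status); renamed because END is an SQLite keyword
RANGES = [(1, 0, 75, "Danger"), (2, 76, 90, "Alert"), (3, 91, 100, "Good")]
# Hypothetical stand-in for the output of the calculation query
RESULTS = [(1, 4, 1, 65.2174), (1, 5, 1, 78.7419), (1, 6, 1, 95.2308),
           (1, 4, 1, 25.0), (1, 7, 1, 100.0)]


def classify(results, ranges):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ranges (id_level INTEGER, range_start REAL, "
        "range_end REAL, status TEXT)"
    )
    conn.execute(
        "CREATE TABLE calc (id_process INTEGER, id_region INTEGER, "
        "type INTEGER, result REAL)"
    )
    conn.executemany("INSERT INTO ranges VALUES (?, ?, ?, ?)", ranges)
    conn.executemany("INSERT INTO calc VALUES (?, ?, ?, ?)", results)
    # Derived table s, then one range lookup per row via the join condition.
    rows = conn.execute(
        """
        SELECT s.id_region, s.result, r.status
        FROM (SELECT id_process, id_region, type, result FROM calc) AS s
        LEFT OUTER JOIN ranges AS r
          ON s.result BETWEEN r.range_start AND r.range_end
        ORDER BY s.result
        """
    ).fetchall()
    conn.close()
    return rows


for row in classify(RESULTS, RANGES):
    print(row)
```

With these ranges, a result that falls into a gap (for example 75.5, between 75 and 76) matches no range; the LEFT OUTER JOIN keeps that row with a NULL status instead of dropping it.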

SQL 'COUNT' not returning what I expect, and somehow limiting results to one row

Some background: an 'image' is part of one 'photoshoot', and may be a part of zero or many 'galleries'. My tables:
'shoots' table:
+----+--------------+
| id | name |
+----+--------------+
| 1 | Test shoot |
| 2 | Another test |
| 3 | Final test |
+----+--------------+
'images' table:
+----+-------------------+------------------+
| id | original_filename | storage_location |
+----+-------------------+------------------+
| 1 | test.jpg | store/test.jpg |
| 2 | test.jpg | store/test.jpg |
| 3 | test.jpg | store/test.jpg |
+----+-------------------+------------------+
'shoot_images' table:
+----------+----------+
| shoot_id | image_id |
+----------+----------+
| 1 | 1 |
| 1 | 2 |
| 3 | 3 |
+----------+----------+
'gallery_images' table:
+------------+----------+
| gallery_id | image_id |
+------------+----------+
| 1 | 1 |
| 1 | 2 |
| 2 | 3 |
| 3 | 1 |
| 4 | 1 |
+------------+----------+
What I'd like to get back, so I can say 'For this photoshoot, there are X images in total, and these images are featured in Y galleries':
+----+--------------+-------------+---------------+
| id | name | image_count | gallery_count |
+----+--------------+-------------+---------------+
| 3 | Final test | 1 | 1 |
| 2 | Another test | 0 | 0 |
| 1 | Test shoot | 2 | 4 |
+----+--------------+-------------+---------------+
I'm currently trying the SQL below, which appears to work correctly but only ever returns one row. I can't work out why this is happening. Curiously, the below also returns a row even when 'shoots' is empty.
SELECT shoots.id,
shoots.name,
COUNT(DISTINCT shoot_images.image_id) AS image_count,
COUNT(DISTINCT gallery_images.gallery_id) AS gallery_count
FROM shoots
LEFT JOIN shoot_images ON shoots.id=shoot_images.shoot_id
LEFT JOIN gallery_images ON shoot_images.image_id=gallery_images.image_id
ORDER BY shoots.id DESC
Thanks for taking the time to look at this :)
You are missing the GROUP BY clause:
SELECT
shoots.id,
shoots.name,
COUNT(DISTINCT shoot_images.image_id) AS image_count,
COUNT(DISTINCT gallery_images.gallery_id) AS gallery_count
FROM shoots
LEFT JOIN shoot_images ON shoots.id=shoot_images.shoot_id
LEFT JOIN gallery_images ON shoot_images.image_id=gallery_images.image_id
GROUP BY 1, 2 -- Added this line
ORDER BY shoots.id DESC
Note: MySQL accepts either column names or column positions in GROUP BY (positions are not part of the current SQL standard), so GROUP BY 1, 2 is equivalent to GROUP BY shoots.id, shoots.name in this case. There are many who consider this "bad coding practice" and advocate always using the column names, but I find it makes the code a lot more readable and maintainable. I've been writing SQL since before many users on this site were born, and it's never caused me a problem using this syntax.
FYI, the reason you were getting one row before, rather than an error, is that MySQL, unlike most other databases, lets you select non-aggregated columns alongside aggregate functions without listing them in a GROUP BY clause. With no GROUP BY at all, the whole result set is treated as a single group, so you get exactly one row (which is also why a row comes back even when shoots is empty), and the non-aggregated columns take their values from an indeterminate row of that group. Since MySQL 5.7.5, the ONLY_FULL_GROUP_BY SQL mode is enabled by default and rejects such queries.
Although at first this may seem abhorrent to SQL purists, it can be incredibly handy!
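To see the effect of the added GROUP BY, here is the corrected query run on the question's sample data, using SQLite from Python as a stand-in for MySQL. The images table is omitted because the query never reads it, and the GROUP BY names the columns instead of numbering them. Note that with this data COUNT(DISTINCT gallery_images.gallery_id) gives 3 for shoot 1 (galleries 1, 3 and 4); the 4 in the desired output corresponds to counting image-gallery placements, i.e. COUNT(gallery_images.gallery_id) without DISTINCT.

```python
import sqlite3


def shoot_summary():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE shoots (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE shoot_images (shoot_id INTEGER, image_id INTEGER);
        CREATE TABLE gallery_images (gallery_id INTEGER, image_id INTEGER);
        INSERT INTO shoots VALUES (1, 'Test shoot'), (2, 'Another test'), (3, 'Final test');
        INSERT INTO shoot_images VALUES (1, 1), (1, 2), (3, 3);
        INSERT INTO gallery_images VALUES (1, 1), (1, 2), (2, 3), (3, 1), (4, 1);
        """
    )
    # One group per shoot; LEFT JOINs keep shoots without images (counts become 0).
    rows = conn.execute(
        """
        SELECT shoots.id, shoots.name,
               COUNT(DISTINCT shoot_images.image_id) AS image_count,
               COUNT(DISTINCT gallery_images.gallery_id) AS gallery_count
        FROM shoots
        LEFT JOIN shoot_images ON shoots.id = shoot_images.shoot_id
        LEFT JOIN gallery_images ON shoot_images.image_id = gallery_images.image_id
        GROUP BY shoots.id, shoots.name
        ORDER BY shoots.id DESC
        """
    ).fetchall()
    conn.close()
    return rows


for row in shoot_summary():
    print(row)
```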
You should look into MySQL's GROUP BY clause.