Order By order, parent_id and group? - mysql

I'm not sure how to phrase what I'm trying to do, so I'm having trouble searching for it. I have a table of pages, each with an id, title, order and parent_id; a page whose parent_id is NULL is considered a top-level page. I can almost sort this correctly by parent_id and order with the following query:
select `id`, `title`, `order`, `parent_id`
from pages
order by `order` AND COALESCE(`parent_id`, `order`), `parent_id` is not null, `order`
The query spits out the following:
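| id | title | order | parent_id |
|----------|-----------|-------|-----------|
| 107fa138 | video | 0 | NULL |
| 8eeda86c | mn | 2 | NULL |
| cac640ad | xxe title | 3 | NULL |
| 1ce4d070 | sdfsdfsdf | 4 | NULL |
| b45dc24d | another | 1 | 8eeda86c |
| d3490141 | hello | 9 | 8eeda86c |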
This is almost what I want. Ideally, each row with a parent_id would appear directly under the row with that id, so the sort order would look like this:
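| id | title | order | parent_id |
|----------|-----------|-------|-----------|
| 107fa138 | video | 0 | NULL |
| 8eeda86c | mn | 2 | NULL |
| b45dc24d | another | 1 | 8eeda86c |
| d3490141 | hello | 9 | 8eeda86c |
| cac640ad | xxe title | 3 | NULL |
| 1ce4d070 | sdfsdfsdf | 4 | NULL |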
I don't even know how I would go about this. If someone can point me in the right direction that would be very awesome.

I think your problem is here:
order by `order` AND COALESCE(`parent_id`, `order`), ...
This probably isn't doing what you think it's doing. AND is a boolean operator, so it will be as if you had written this expression:
order by (`order` <> 0) AND (COALESCE(`parent_id`, `order`) <> 0), ...
That is, if both order and the other term are nonzero, then 1, else 0.
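To see the boolean behaviour in isolation, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so the snippet runs anywhere; for plain integer operands both engines evaluate AND to 0 or 1 in the same way). The table `t` and its values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER)")  # hypothetical demo table
conn.executemany("INSERT INTO t VALUES (?, ?)", [(0, 5), (2, 0), (2, 3), (4, 9)])

# "a AND b" collapses each row to 1 (both nonzero) or 0, so as a sort key
# it can only split the rows into two buckets; it never sorts by a, then b.
and_values = conn.execute("SELECT a, b, a AND b FROM t ORDER BY a, b").fetchall()
print(and_values)
```

Only the last two rows, where both columns are nonzero, get the value 1.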
I think the following would get closer to what you describe:
order by COALESCE(`parent_id`, `id`),
`parent_id` is not null,
`order`
Demo:
create table pages ( id varchar(10) primary key, title text, `order` int, parent_id varchar(10) );
insert into pages (id, title, `order`, parent_id) values
('107fa138', 'video', 0, NULL),
('8eeda86c', 'mn', 2, NULL),
('cac640ad', 'xxe title', 3, NULL),
('1ce4d070', 'sdfsdfsdf', 4, NULL),
('b45dc24d', 'another', 1, '8eeda86c'),
('d3490141', 'hello', 9, '8eeda86c');
select `id`, `title`,
COALESCE(parent_id, id) as cpi,
parent_id is not null as pinn,
`order`,
`parent_id`
from pages
order by COALESCE(`parent_id`, `id`), `parent_id` is not null, `order`
+----------+-----------+----------+------+-------+-----------+
| id | title | cpi | pinn | order | parent_id |
+----------+-----------+----------+------+-------+-----------+
| 107fa138 | video | 107fa138 | 0 | 0 | NULL |
| 1ce4d070 | sdfsdfsdf | 1ce4d070 | 0 | 4 | NULL |
| 8eeda86c | mn | 8eeda86c | 0 | 2 | NULL |
| b45dc24d | another | 8eeda86c | 1 | 1 | 8eeda86c |
| d3490141 | hello | 8eeda86c | 1 | 9 | 8eeda86c |
| cac640ad | xxe title | cac640ad | 0 | 3 | NULL |
+----------+-----------+----------+------+-------+-----------+
By adding in the columns that show the sorting expressions we can see how the sort occurs.
First it sorts by cpi alphabetically. This prefers the parent_id if it is set, but defaults to id.
For ties of cpi, then it sorts by pinn. So 0 comes before 1.
For ties of pinn (i.e. when multiple rows have a value 1), then it sorts by order.
Is this not what you wanted?
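Note that the demo output above orders the top-level pages by id rather than by their own order value. If the families should follow the parent's order instead, one possible variation (my own sketch, not part of the answer above) is to self-join each row to its parent and sort by the parent's order first. It assumes only one level of nesting, and it runs on Python's built-in sqlite3 as a stand-in for MySQL, so identifiers are quoted with double quotes where MySQL would use backticks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE pages (id TEXT PRIMARY KEY, title TEXT, "order" INTEGER, parent_id TEXT)'
)
conn.executemany(
    "INSERT INTO pages VALUES (?, ?, ?, ?)",
    [
        ("107fa138", "video", 0, None),
        ("8eeda86c", "mn", 2, None),
        ("cac640ad", "xxe title", 3, None),
        ("1ce4d070", "sdfsdfsdf", 4, None),
        ("b45dc24d", "another", 1, "8eeda86c"),
        ("d3490141", "hello", 9, "8eeda86c"),
    ],
)

query = '''
SELECT p.id, p.title, p."order", p.parent_id
FROM pages AS p
LEFT JOIN pages AS parent ON parent.id = p.parent_id
ORDER BY COALESCE(parent."order", p."order"),  -- position of the whole family
         COALESCE(p.parent_id, p.id),          -- keep families with equal order apart
         p.parent_id IS NOT NULL,              -- parent row before its children
         p."order"                             -- children by their own order
'''
titles = [row[1] for row in conn.execute(query)]
print(titles)
```

With the sample data this yields video, mn, another, hello, xxe title, sdfsdfsdf, which is the order the question asks for.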

GROUP BY + HAVING ignore row

What I want is to select every race record along with its record holder and best time. I looked at similar questions and found three queries that were faster than the rest.
The problem is that one of them completely ignores the race whose record is held by userid 2.
These are my tables, indexes, and some sample data:
CREATE TABLE `races` (
`raceid` smallint(5) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(20) NOT NULL,
PRIMARY KEY (`raceid`),
UNIQUE KEY `name` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
CREATE TABLE `users` (
`userid` mediumint(8) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(20) NOT NULL,
PRIMARY KEY (`userid`),
UNIQUE KEY `name` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
CREATE TABLE `race_times` (
`raceid` smallint(5) unsigned NOT NULL,
`userid` mediumint(8) unsigned NOT NULL,
`time` mediumint(8) unsigned NOT NULL,
PRIMARY KEY (`raceid`,`userid`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
INSERT INTO `races` (`raceid`, `name`) VALUES
(1, 'Doherty'),
(3, 'Easter Basin Naval S'),
(5, 'Flint County'),
(6, 'Fort Carson'),
(4, 'Glen Park'),
(2, 'Palomino Creek'),
(7, 'Tierra Robada');
INSERT INTO `users` (`userid`, `name`) VALUES
(1, 'Player 1'),
(2, 'Player 2');
INSERT INTO `race_times` (`raceid`, `userid`, `time`) VALUES
(1, 1, 51637),
(1, 2, 50000),
(2, 1, 148039),
(3, 1, 120516),
(3, 2, 124773),
(4, 1, 101109),
(6, 1, 89092),
(6, 2, 89557),
(7, 1, 77933),
(7, 2, 78038);
So if I run these 2 queries:
SELECT rt1.raceid, r.name, rt1.userid, p.name, rt1.time
FROM race_times rt1
LEFT JOIN users p ON (rt1.userid = p.userid)
JOIN races r ON (r.raceid = rt1.raceid)
WHERE rt1.time = (SELECT MIN(rt2.time) FROM race_times rt2 WHERE rt1.raceid = rt2.raceid)
GROUP BY r.name;
or..
SELECT rt1.*, r.name, p.name
FROM race_times rt1
LEFT JOIN users p ON p.userid = rt1.userid
JOIN races r ON r.raceid = rt1.raceid
WHERE EXISTS (SELECT NULL FROM race_times rt2 WHERE rt2.raceid = rt1.raceid
GROUP BY rt2.raceid HAVING MIN(rt2.time) >= rt1.time);
I receive correct results as shown below:
raceid | name | userid | name | time |
-------+----------------------+--------+----------+--------|
1 | Doherty | 2 | Player 2 | 50000 |
3 | Easter Basin Naval S | 1 | Player 1 | 120516 |
6 | Fort Carson | 1 | Player 1 | 89092 |
4 | Glen Park | 1 | Player 1 | 101109 |
2 | Palomino Creek | 1 | Player 1 | 148039 |
7 | Tierra Robada | 1 | Player 1 | 77933 |
and here is the faulty query:
SELECT rt.raceid, r.name, rt.userid, p.name, rt.time
FROM race_times rt
LEFT JOIN users p ON p.userid = rt.userid
JOIN races r ON r.raceid = rt.raceid
GROUP BY r.name
HAVING rt.time = MIN(rt.time);
and the result is this:
raceid | name | userid | name | time |
-------+----------------------+--------+----------+--------|
3 | Easter Basin Naval S | 1 | Player 1 | 120516 |
6 | Fort Carson | 1 | Player 1 | 89092 |
4 | Glen Park | 1 | Player 1 | 101109 |
2 | Palomino Creek | 1 | Player 1 | 148039 |
7 | Tierra Robada | 1 | Player 1 | 77933 |
As you can see, race "Doherty" (raceid: 1) is owned by "Player 2" (userid: 2) and it is not shown along with the rest of race records (which are all owned by userid 1). What is the problem?
HAVING is a post filter: the query gathers all the results, and HAVING then filters them further. Here the GROUP BY compacts the rows of each group into one, and for the non-aggregated columns MySQL is free to pick the value from any row in the group, in practice usually the first one it reads. Since Player 1 is the first entry for race 1, that is the row HAVING sees, and it is filtered out because its time does not equal the MIN(time) for the group.
This is why the other queries you posted use a subquery. My personal preference is the first example, as to me it's slightly easier to read. Performance-wise they should be the same.
While it's not a bad idea to try to avoid subqueries in the WHERE clause, that advice mostly applies when you can get the same result with a JOIN. Other times a JOIN cannot produce the result and a subquery is required.
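For this particular problem a JOIN formulation does exist: compute the best time per race in a derived table and join back to it. The sketch below is one way to write that, not a query from the question; the alias names `best` and `best_time` are my own. It runs on Python's built-in sqlite3 as a stand-in for MySQL, with the column types simplified.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE races (raceid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE users (userid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE race_times (raceid INTEGER, userid INTEGER, time INTEGER,
                         PRIMARY KEY (raceid, userid));
INSERT INTO races VALUES (1,'Doherty'),(3,'Easter Basin Naval S'),(5,'Flint County'),
                         (6,'Fort Carson'),(4,'Glen Park'),(2,'Palomino Creek'),
                         (7,'Tierra Robada');
INSERT INTO users VALUES (1,'Player 1'),(2,'Player 2');
INSERT INTO race_times VALUES (1,1,51637),(1,2,50000),(2,1,148039),(3,1,120516),
                              (3,2,124773),(4,1,101109),(6,1,89092),(6,2,89557),
                              (7,1,77933),(7,2,78038);
""")

query = """
SELECT rt.raceid, r.name, rt.userid, u.name, rt.time
FROM race_times AS rt
JOIN (SELECT raceid, MIN(time) AS best_time   -- one row per race
      FROM race_times
      GROUP BY raceid) AS best
  ON best.raceid = rt.raceid AND best.best_time = rt.time
JOIN races AS r ON r.raceid = rt.raceid
LEFT JOIN users AS u ON u.userid = rt.userid
ORDER BY r.name
"""
records = conn.execute(query).fetchall()
for row in records:
    print(row)
```

Because every selected column comes from a single matching row, no non-aggregated column is left for the engine to pick arbitrarily, and Doherty is reported with Player 2. If two users tie for the best time, both rows are returned.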

SQL select with conditions to find other record with better value

I have the following table with some data:
SET SQL_MODE = "NO_AUTO_VALUE_ON_ZERO";
CREATE TABLE `activities` (
`id` int(10) UNSIGNED NOT NULL,
`project_id` int(10) UNSIGNED NOT NULL,
`user_id` int(10) UNSIGNED NOT NULL,
`task_hour` double(8,2) NOT NULL,
`validated` tinyint(1) NOT NULL DEFAULT '0'
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
INSERT INTO `activities` (`id`, `project_id`, `user_id`, `task_hour`, `validated`) VALUES
(1, 1, 1, 10.00, 1),
(2, 1, 1, 20.00, 0),
(3, 2, 1, 5.00, 1),
(4, 3, 1, 30.00, 0);
When I do a SELECT user_id,project_id,task_hour,validated FROM activities, here is what I get:
| user_id | project_id | task_hour | validated |
|---------|------------|-----------|-----------|
| 1 | 1 | 10 | true |
| 1 | 1 | 20 | false |
| 1 | 2 | 5 | true |
| 1 | 3 | 30 | false |
I would like to get the following result from a select:
| user_id | task_hour_total |
|---------|-----------------|
| 1 | 45 |
This result comes from the sum of task_hour for user 1 with the condition that the task_hour can be added only if validated is true OR in case validated is false, that there is not a record in the table for the same user_id and project_id with validated is true.
So the reasoning for each line would be:
| user_id | project_id | task_hour | validated |
|---------|------------|-----------|-----------|
| 1 | 1 | 10 | true | -> include in the sum because validated is true
| 1 | 1 | 20 | false | -> do not include in the sum because validated is false and there is the first record which has same user_id, same project_id and validated is true
| 1 | 2 | 5 | true | -> include in the sum because validated is true
| 1 | 3 | 30 | false | -> include in the sum because validated is false and there is no record in this table for user_id 1 and project_id 3 where validated is true
I have tried the following but it tells me that this is not the right structure in mysql. This is a first test to get a column to say if it found another record in the db with validated = true for same user_id and project_id:
select #u = user_id, #p = project_id,task_hour,validated
case when (select count(*) from activities where user_id = #u and project_id = #p and validated = true) > 1 then 'validated found' end as found
from activities
Thank you if you can help me on this one...
This would be very easy with the standard SQL function ROW_NUMBER, which ranks the records, but MySQL only supports it from version 8.0 on. The ranking is simple: per user_id and project_id you want the better record. Better means validated true is preferred to false.
In MySQL true is 1 and false is 0. So you want the maximum validated per user_id and project_id. You can use an IN clause for this.
select user_id, sum(task_hour) as task_hour_total
from activities
where (user_id, project_id, validated) in
(
select user_id, project_id, max(validated)
from activities
group by user_id, project_id
)
group by user_id;
Still a simple query. The difference to the ROW_NUMBER method is that records must be read twice.
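For comparison, here is a rough sketch of the ROW_NUMBER approach mentioned above. It assumes an engine with window functions (MySQL 8.0 or later; the snippet itself runs on Python's built-in sqlite3, which needs SQLite 3.25 or newer). The alias names `rn` and `ranked` are my own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE activities (id INTEGER, project_id INTEGER, user_id INTEGER,"
    " task_hour REAL, validated INTEGER)"
)
conn.executemany(
    "INSERT INTO activities VALUES (?, ?, ?, ?, ?)",
    [(1, 1, 1, 10.0, 1), (2, 1, 1, 20.0, 0), (3, 2, 1, 5.0, 1), (4, 3, 1, 30.0, 0)],
)

query = """
SELECT user_id, SUM(task_hour) AS task_hour_total
FROM (
    SELECT user_id, task_hour,
           -- rank 1 is the best record per user and project: validated first
           ROW_NUMBER() OVER (PARTITION BY user_id, project_id
                              ORDER BY validated DESC) AS rn
    FROM activities
) AS ranked
WHERE rn = 1
GROUP BY user_id
"""
totals = conn.execute(query).fetchall()
print(totals)
```

One difference to keep in mind: ROW_NUMBER keeps exactly one row per user and project, so if a project had several rows with the same validated value, only one of them would be summed, whereas the IN query sums all of them. RANK() instead of ROW_NUMBER() would keep such ties.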
Ok, I found a way to do it. It is not very elegant but it works:
SELECT user_id,sum(task_hour)
FROM
(SELECT * FROM activities a1 WHERE a1.project_id NOT IN (SELECT project_id FROM activities as a2 WHERE validated = 1)
UNION SELECT * FROM activities WHERE validated = 1)
AS temp_table
GROUP BY user_id
If anyone knows a better solution than this, don't hesitate to post it; otherwise I will stay with this long and complex select.
I found it simple to do with the following query. I hope it helps you.
SELECT user_id,
SUM(task_hour)
FROM activities
WHERE validated = 1
OR project_id NOT IN (SELECT project_id
FROM activities
WHERE validated = 1)
GROUP BY user_id;
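One caveat with the NOT IN subquery used in the last two queries: it looks only at project_id, so a validated record from a different user on the same project would also suppress the row. The sample data has a single user, so it does not show this. The sketch below adds one hypothetical row (id 5, user 2, project 3) to illustrate the difference, and compares the query above with a NOT EXISTS variant correlated on both user_id and project_id. It runs on Python's built-in sqlite3 as a stand-in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE activities (id INTEGER, project_id INTEGER, user_id INTEGER,"
    " task_hour REAL, validated INTEGER)"
)
conn.executemany(
    "INSERT INTO activities VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, 1, 10.0, 1),
        (2, 1, 1, 20.0, 0),
        (3, 2, 1, 5.0, 1),
        (4, 3, 1, 30.0, 0),
        (5, 3, 2, 7.0, 1),  # hypothetical extra row: another user validated project 3
    ],
)

# Query from the answer above: the subquery ignores user_id.
uncorrelated = conn.execute("""
    SELECT user_id, SUM(task_hour) FROM activities
    WHERE validated = 1
       OR project_id NOT IN (SELECT project_id FROM activities WHERE validated = 1)
    GROUP BY user_id ORDER BY user_id
""").fetchall()

# Variant correlated on both user_id and project_id.
correlated = conn.execute("""
    SELECT a.user_id, SUM(a.task_hour) FROM activities AS a
    WHERE a.validated = 1
       OR NOT EXISTS (SELECT 1 FROM activities AS v
                      WHERE v.user_id = a.user_id
                        AND v.project_id = a.project_id
                        AND v.validated = 1)
    GROUP BY a.user_id ORDER BY a.user_id
""").fetchall()
print(uncorrelated, correlated)
```

With the extra row, the uncorrelated query drops user 1's 30 hours on project 3, while the correlated one still reports 45 for user 1.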

How to filter on master table in a left join query

I have 2 tables POST and COMMENT,
each post has ID, TITLE, CATEGORY_ID AND USER_ID
each comment has ID, COMMENT, POST_ID AND USER_ID
I want to list all posts having category_id=2 and comment.user_id=1
CREATE TABLE post
(`ID` int, `TITLE` varchar(9), `CATEGORY_ID` int, `USER_ID` int)
;
INSERT INTO post
(`ID`, `TITLE`, `CATEGORY_ID`, `USER_ID`)
VALUES
(1, 'My post A', 1, 1),
(2, 'My post B', 2, 1),
(3, 'My post C', 2, 2)
;
CREATE TABLE comment
(`ID` int, `COMMENT` varchar(12), `POST_ID` int, `USER_ID` int)
;
INSERT INTO comment
(`ID`, `COMMENT`, `POST_ID`, `USER_ID`)
VALUES
(1, 'My comment X', 1, 1),
(2, 'My comment Y', 2, 1),
(3, 'My comment Z', 1, 2)
;
This command fetches all posts, including the one with category_id = 1:
SELECT post.*, comment.comment, comment.post_id, comment.user_id c_user_id
FROM post
LEFT JOIN COMMENT
ON POST.id = COMMENT.post_id
AND COMMENT.user_id=1
AND POST.category_id =2
I get this
+----+-----------+-------------+---------+--------------+---------+-----------+
| ID | TITLE     | CATEGORY_ID | USER_ID | COMMENT      | POST_ID | C_USER_ID |
+----+-----------+-------------+---------+--------------+---------+-----------+
| 1  | My post A | 1           | 1       | null         | null    | null      |
| 2  | My post B | 2           | 1       | My comment Y | 2       | 1         |
| 3  | My post C | 2           | 2       | null         | null    | null      |
+----+-----------+-------------+---------+--------------+---------+-----------+
and I'd like to get this: all posts with category_id = 2 (so 2 records), with the comment shown where user_id 1 has commented and null otherwise:
+----+-----------+-------------+---------+--------------+---------+-----------+
| ID | TITLE     | CATEGORY_ID | USER_ID | COMMENT      | POST_ID | C_USER_ID |
+----+-----------+-------------+---------+--------------+---------+-----------+
| 2  | My post B | 2           | 1       | My comment Y | 2       | 1         |
| 3  | My post C | 2           | 2       | null         | null    | null      |
+----+-----------+-------------+---------+--------------+---------+-----------+
Thank you in advance for your help
That rule about putting conditions in the on clause for left outer joins . . . well, it applies to conditions on the second table, not the first. So, put that condition in a where clause:
SELECT post.*, comment.comment, comment.post_id, comment.user_id c_user_id
FROM post LEFT JOIN
COMMENT
ON POST.id = COMMENT.post_id AND COMMENT.user_id=1
WHERE POST.category_id = 2;
Conceptually, the left outer join works like this: take a row in the first table, then find all matching rows in the second table subject to the on condition. Keep all the matches. If there are no matches, keep the row in the first table anyway, with NULLs for the second table's columns.
Guess what. This is still the logic, even when the on condition refers only to the first table. So, a condition on the first table placed in the on clause of a left outer join does not filter out any of its rows.
All of this similarly applies to right outer joins with all the table references reversed.
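The difference between the two placements can be checked side by side. This is a small sketch using the tables from the question, run on Python's built-in sqlite3 as a stand-in for MySQL; the variable names are my own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE post (id INTEGER, title TEXT, category_id INTEGER, user_id INTEGER);
CREATE TABLE comment (id INTEGER, comment TEXT, post_id INTEGER, user_id INTEGER);
INSERT INTO post VALUES (1,'My post A',1,1),(2,'My post B',2,1),(3,'My post C',2,2);
INSERT INTO comment VALUES (1,'My comment X',1,1),(2,'My comment Y',2,1),
                           (3,'My comment Z',1,2);
""")

# Condition on the first table inside ON: post 1 is still returned.
in_on = conn.execute("""
    SELECT post.id, post.title, comment.comment
    FROM post
    LEFT JOIN comment
      ON post.id = comment.post_id AND comment.user_id = 1 AND post.category_id = 2
    ORDER BY post.id
""").fetchall()

# Same condition moved to WHERE: post 1 is filtered out.
in_where = conn.execute("""
    SELECT post.id, post.title, comment.comment
    FROM post
    LEFT JOIN comment
      ON post.id = comment.post_id AND comment.user_id = 1
    WHERE post.category_id = 2
    ORDER BY post.id
""").fetchall()
print(in_on)
print(in_where)
```

The first query returns all three posts (post 1 merely loses its comment), while the second returns only posts 2 and 3, with a NULL comment for post 3.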

how to group by with a sql subqueries

I can't think clearly at the moment, I want to return counts by station_id, an example of output would be:
station 1 has 3 fb post, 6 linkedin posts, 5 email posts
station 2 has 3 fb post, 6 linkedin posts, 5 email posts
So I need to group by the station id, my table structure is
CREATE TABLE IF NOT EXISTS `posts` (
`post_id` bigint(11) NOT NULL auto_increment,
`station_id` varchar(25) NOT NULL,
`user_id` varchar(25) NOT NULL,
`dated` datetime NOT NULL,
`type` enum('fb','linkedin','email') NOT NULL,
PRIMARY KEY (`post_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=x ;
The query I have so far reports station 0 as having 2 linkedin posts when it has only one (there are 2 in the whole table, though):
SELECT Station_id, (select count(*) FROM posts WHERE type = 'linkedin') AS linkedin_count, (select count(*) FROM posts WHERE type = 'fb') AS fb_count, (select count(*) FROM posts WHERE type = 'email') AS email_count FROM `posts` GROUP BY station_id;
Or, the fastest way, avoiding joins and subselects to get it in the exact format you want:
SELECT
station_id,
SUM(CASE WHEN type = 'linkedin' THEN 1 ELSE 0 END) AS 'linkedin',
SUM(CASE WHEN type = 'fb' THEN 1 ELSE 0 END) AS 'fb',
SUM(CASE WHEN type = 'email' THEN 1 ELSE 0 END) AS 'email'
FROM posts
GROUP BY station_id;
Outputs:
+------------+----------+------+-------+
| station_id | linkedin | fb | email |
+------------+----------+------+-------+
| 1 | 3 | 2 | 5 |
| 2 | 2 | 0 | 1 |
+------------+----------+------+-------+
You may also want to put an index on there to speed it up
ALTER TABLE posts ADD INDEX (station_id, type);
Explain output:
+----+-------------+-------+-------+---------------+------------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+------------+---------+------+------+-------------+
| 1 | SIMPLE | posts | index | NULL | station_id | 28 | NULL | 13 | Using index |
+----+-------------+-------+-------+---------------+------------+---------+------+------+-------------+
As implied by gnif's answer, having three correlated subqueries has a performance overhead. Depending on the DBMS you're using, it could perform similarly to a self join done three times.
gnif's methodology ensures that the table is only read once, without the need for joins, correlated subqueries, etc.
The immediately obvious down-side of gnif's answer is that you don't ever get records for 0's. If there are no fb types, you just don't get a record. If that is not an issue, I'd go with his answer. If it is an issue, however, here is a version with similar methodology to gnif, but matching your output format...
SELECT
station_id,
SUM(CASE WHEN type = 'linkedin' THEN 1 ELSE 0 END) AS linkedin_count,
SUM(CASE WHEN type = 'fb' THEN 1 ELSE 0 END) AS fb_count,
SUM(CASE WHEN type = 'email' THEN 1 ELSE 0 END) AS email_count
FROM
posts
GROUP BY
station_id
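To try the single-pass counting and the zero-row caveat together, here is a small sketch run on Python's built-in sqlite3 as a stand-in for MySQL. The 13 sample rows are made up to match the output tables shown in this thread, and the table is reduced to the columns the queries need.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts (post_id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " station_id TEXT NOT NULL, type TEXT NOT NULL)"
)
# Made-up sample rows: station 2 has no 'fb' posts at all.
sample = (
    [("1", "linkedin")] * 3 + [("1", "fb")] * 2 + [("1", "email")] * 5
    + [("2", "linkedin")] * 2 + [("2", "email")]
)
conn.executemany("INSERT INTO posts (station_id, type) VALUES (?, ?)", sample)

# One scan of the table; a missing type shows up as 0.
pivoted = conn.execute("""
    SELECT station_id,
           SUM(CASE WHEN type = 'linkedin' THEN 1 ELSE 0 END) AS linkedin_count,
           SUM(CASE WHEN type = 'fb'       THEN 1 ELSE 0 END) AS fb_count,
           SUM(CASE WHEN type = 'email'    THEN 1 ELSE 0 END) AS email_count
    FROM posts
    GROUP BY station_id
    ORDER BY station_id
""").fetchall()

# Grouping by (station_id, type): a missing type produces no row.
grouped = conn.execute("""
    SELECT station_id, type, COUNT(*)
    FROM posts
    GROUP BY station_id, type
    ORDER BY station_id, type
""").fetchall()
print(pivoted)
print(grouped)
```

The pivoted form reports fb_count = 0 for station 2, whereas the grouped form has no ('2', 'fb') row.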
Give this a go:
SELECT station_id, type, count(*) FROM posts GROUP BY station_id, type
The output format will be a little different from what you're attempting to get, but it should provide the statistics you're trying to retrieve. Also, since it's a single query, it is much faster.
-- Edit, added example result set
+------------+----------+----------+
| station_id | type | count(*) |
+------------+----------+----------+
| 1 | fb | 2 |
| 1 | linkedin | 3 |
| 1 | email | 5 |
| 2 | linkedin | 2 |
| 2 | email | 1 |
+------------+----------+----------+
try this:
SELECT p.Station_id,
(select count(*) FROM posts WHERE type = 'linkedin' and station_id=p.station_id) AS linkedin_count,
(select count(*) FROM posts WHERE type = 'fb' and station_id=p.station_id) AS fb_count,
(select count(*) FROM posts WHERE type = 'email' and station_id=p.station_id) AS email_count
FROM `posts` p GROUP BY station_id

Mysql, complex ORDER BY

Two columns town and priority.
I need to sort the table so that towns with priority=1 come first and are not sorted by name, while the rest are sorted by name ASC.
How would I do that?
Thanks ;)
Update
SELECT *
FROM map_towns
ORDER BY priority DESC, town
Like this, but so that priority were from 1 to 12+ instead of 12 to 1.
Like that:
town priority
b_town1 1
a_town2 2
d_town3 3
c_town4 4
a_town5 NULL
b_town6 NULL
c_town7 NULL
d_town8 NULL
etc...
By default, MySQL sorts NULLs first in ascending order.
I created a small test case (rows inserted non-sorted on purpose).
create table map_towns(
town varchar(30) not null
,priority int null
);
insert into map_towns(town, priority) values('d_town3', 3);
insert into map_towns(town, priority) values('a_town2', 2);
insert into map_towns(town, priority) values('c_town4', 4);
insert into map_towns(town, priority) values('b_town1', 1);
insert into map_towns(town, priority) values('b_town6', NULL);
insert into map_towns(town, priority) values('d_town8', NULL);
insert into map_towns(town, priority) values('a_town5', NULL);
insert into map_towns(town, priority) values('c_town7', NULL);
The following query should do what you ask for.
select town
,priority
,isnull(priority)
from map_towns
order by isnull(priority), priority, town;
+---------+----------+------------------+
| town | priority | isnull(priority) |
+---------+----------+------------------+
| b_town1 | 1 | 0 |
| a_town2 | 2 | 0 |
| d_town3 | 3 | 0 |
| c_town4 | 4 | 0 |
| a_town5 | NULL | 1 |
| b_town6 | NULL | 1 |
| c_town7 | NULL | 1 |
| d_town8 | NULL | 1 |
+---------+----------+------------------+
See the MySQL documentation for ISNULL().
My idea:
SELECT * FROM Towns
ORDER BY IF(priority = 1, 0, 1) ASC,
town ASC;
Simply make the priority default to 0, and then you can sort each town based on a number. I would normally call such a column something like DisplayOrder, which in your terms would be your priority.
Something like this:
SELECT * FROM Towns
ORDER BY priority ASC,
name ASC;
So if you have something like
id, name, priority
-----------------------
1, Smithtown, 0
2, Rocktown, 2
3, Georgetown, 1
4, Rockton, 2
The ordering then would be
1, Smithtown, 0
3, Georgetown, 1
4, Rockton, 2
2, Rocktown, 2
SELECT *
FROM map_towns
ORDER BY
priority IS NULL, priority, town
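The NULLs-last ordering can be checked quickly with the test data from the earlier answer. This sketch runs on Python's built-in sqlite3 as a stand-in for MySQL; both engines put NULLs first in ascending order by default, so the `priority IS NULL` key (0 for a set priority, 1 for NULL) has the same effect in each.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE map_towns (town TEXT NOT NULL, priority INTEGER)")
conn.executemany(
    "INSERT INTO map_towns VALUES (?, ?)",
    [
        ("d_town3", 3), ("a_town2", 2), ("c_town4", 4), ("b_town1", 1),
        ("b_town6", None), ("d_town8", None), ("a_town5", None), ("c_town7", None),
    ],
)

# Rows with a priority sort first by priority; the NULL rows follow, by town name.
towns = [
    row[0]
    for row in conn.execute(
        "SELECT town FROM map_towns ORDER BY priority IS NULL, priority, town"
    )
]
print(towns)
```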