sql query execution order question for group by and select - mysql

the table structure:
create table t_hr_ship (
shipment_id int,
shipper_id int,
date_time date,
pickup_state varchar(20),
dropoff_state varchar(20));
Here are some data in this table:
insert into t_hr_ship values
(1, 1, "2018-01-01", "WA", "OR"),
(2, 1, "2018-01-02", "WA", "OR"),
(3, 1, "2018-01-03", "WA", "OR"),
(4, 1, "2018-01-04", "WA", "OR"),
(5, 2, "2018-01-05", "WA", "OR"),
(6, 3, "2018-01-06", "WA", "OR"),
(7, 2, "2018-02-01", "OR", "WA"),
(8, 4, "2018-02-02", "OR", "WA"),
(9, 3, "2018-02-03", "WA", "CA"),
(10, 5, "2018-02-04", "CA", "OR"),
(11, 2, "2018-03-05", "WA", "TX"),
(12, 3, "2018-01-06", "OR", "CA");
the question is to get the top 3 busiest routes in Jan and Feb. note that the route is the same for "WA" to "OR" and "OR" to "WA" (the order of the two end points doesn't matter as long as they are the same two end points).
the solution is as below:
select case when s.pickup_state < s.dropoff_state then s.pickup_state else s.dropoff_state end as pickup,
case when s.pickup_state > s.dropoff_state then s.pickup_state else s.dropoff_state end as dropoff,
count(s.shipment_id) as no_of_shipment
from t_hr_ship s
where month(s.date_time) in ("01","02")
group by pickup, dropoff
order by no_of_shipment desc
limit 3;
this does get what I expect. my question is: I read in an online resource that the sql query execution order is "from -> where -> group by -> having -> select -> order by -> limit". if this is true, then this solution should not work, because the pickup and dropoff aliases defined in select can't be used in group by. am I missing anything?

You can use least() and greatest() to group by consistently:
select
least(pickup_state, dropoff_state) pickup,
greatest(pickup_state, dropoff_state) dropoff,
count(*) as no_of_shipment
from t_hr_ship s
where month(date_time) in (1, 2)
group by pickup, dropoff
order by no_of_shipment desc
limit 3;
Note that, unlike some other RDBMS (SQL Server, for example), MySQL allows the use of column aliases in the GROUP BY clause (and also in the ORDER BY clause, but that is common in most RDBMS).
Demo on DB Fiddle:
pickup | dropoff | no_of_shipment
:----- | :------ | -------------:
OR | WA | 8
CA | OR | 2
CA | WA | 1

Execution order is NOT determined by the query. SQL is not a procedural language, it is a declarative language.
The SELECT statement is describing the result set. In fact, the ultimate execution path may have little resemblance to the actual query -- although MySQL is not as sophisticated as other databases.
What is specified is the order for interpreting the meaning of column aliases in the query. That is what you are referring to.
Some databases, such as MySQL, relax the standard and allow column aliases in the GROUP BY. It is as simple as that.

It works because MySQL lets you use the alias name. To group by a select column you have 3 options: repeat the full expression, use the alias, or use the column's position in the select list:
case when s.pickup_state < s.dropoff_state then s.pickup_state else s.dropoff_state end
pickup
1
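The three options can be checked side by side. Below is a minimal sketch, assuming an in-memory SQLite database (via Python's sqlite3) as a stand-in for MySQL; SQLite also happens to accept the alias in GROUP BY, and strftime('%m', ...) replaces month(). The names GROUP_BY_OPTIONS and top_routes are made up for this illustration.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table t_hr_ship (shipment_id int, shipper_id int, "
    "date_time text, pickup_state text, dropoff_state text)"
)
conn.executemany(
    "insert into t_hr_ship values (?, ?, ?, ?, ?)",
    [
        (1, 1, "2018-01-01", "WA", "OR"), (2, 1, "2018-01-02", "WA", "OR"),
        (3, 1, "2018-01-03", "WA", "OR"), (4, 1, "2018-01-04", "WA", "OR"),
        (5, 2, "2018-01-05", "WA", "OR"), (6, 3, "2018-01-06", "WA", "OR"),
        (7, 2, "2018-02-01", "OR", "WA"), (8, 4, "2018-02-02", "OR", "WA"),
        (9, 3, "2018-02-03", "WA", "CA"), (10, 5, "2018-02-04", "CA", "OR"),
        (11, 2, "2018-03-05", "WA", "TX"), (12, 3, "2018-01-06", "OR", "CA"),
    ],
)

# The same CASE expressions as in the question: smaller state first.
PICKUP = "case when pickup_state < dropoff_state then pickup_state else dropoff_state end"
DROPOFF = "case when pickup_state > dropoff_state then pickup_state else dropoff_state end"

# Three ways to name the grouping columns (hypothetical helper dict).
GROUP_BY_OPTIONS = {
    "expression": f"{PICKUP}, {DROPOFF}",
    "alias": "pickup, dropoff",
    "position": "1, 2",
}


def top_routes(group_by):
    """Run the top-3 query with the given GROUP BY clause."""
    sql = f"""
        select {PICKUP} as pickup, {DROPOFF} as dropoff, count(*) as no_of_shipment
        from t_hr_ship
        where strftime('%m', date_time) in ('01', '02')
        group by {group_by}
        order by no_of_shipment desc
        limit 3
    """
    return conn.execute(sql).fetchall()


results = {name: top_routes(clause) for name, clause in GROUP_BY_OPTIONS.items()}
for name, rows in results.items():
    print(name, rows)
```

All three variants should return the same three routes; only the alias form depends on the database relaxing the standard.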

Related

How to select a mysql column values which contains Y first and N second

I want query in mysql to select a column values which contains Y and N.
Below is my table
If I use this query
"SELECT * from hotel where standard='Y' OR standard='N' group by hotel_code";
This query picks rows based on insert id, but that is not my requirement: it should pick a 'Y' row first, and only then an 'N' row.
I want to select exactly these rows:
2 ---- 123 ------Y
4 -----324 ------Y
6 -----456 ------N or 5 ------456 -- N any row from when N appear
7 -----987 ------Y
Thanks in advance!!!
My previous answer had a problem: it raised an error related to only_full_group_by when the query was executed in MySQL. I have since created a local database myself and come up with the correct SQL that you need. Here it is.
SELECT min(origin), hotel_code, max(standard) as std from hotel
where standard='Y' OR standard='N'
group by hotel_code
order by std desc;
And after executing the sql, here's the result that I have got.
1 123 Y
3 324 Y
7 987 Y
5 456 N
I am sharing the create table and insert statements so that anyone can check by themselves if the query is okay.
create table hotel (
origin integer auto_increment primary key,
hotel_code integer not null,
standard varchar(1) not null
);
INSERT INTO `hotel` (`origin`, `hotel_code`, `standard`)
VALUES
(1, 123, 'Y'),
(2, 123, 'N'),
(3, 324, 'N'),
(4, 324, 'Y'),
(5, 456, 'N'),
(6, 456, 'N'),
(7, 987, 'N'),
(8, 987, 'Y');
Hope that helps!
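One thing to be aware of: min(origin) and max(standard) are computed independently, so the origin returned is not necessarily the id of a 'Y' row (for hotel 324 the 'Y' row is origin 4, not 3). If the id of an actual 'Y' row is wanted, a conditional aggregate is one option. This is only a sketch, run against in-memory SQLite through Python's sqlite3 as a stand-in for MySQL; the alias picked_origin and the extra sort on hotel_code are my own additions.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table hotel (origin integer primary key, "
    "hotel_code integer not null, standard text not null)"
)
conn.executemany(
    "insert into hotel (origin, hotel_code, standard) values (?, ?, ?)",
    [(1, 123, "Y"), (2, 123, "N"), (3, 324, "N"), (4, 324, "Y"),
     (5, 456, "N"), (6, 456, "N"), (7, 987, "N"), (8, 987, "Y")],
)

# Prefer the smallest origin among the 'Y' rows of a hotel;
# fall back to the smallest origin overall when the hotel has no 'Y' row.
hotels = conn.execute(
    """
    select coalesce(min(case when standard = 'Y' then origin end),
                    min(origin)) as picked_origin,
           hotel_code,
           max(standard) as std
    from hotel
    where standard in ('Y', 'N')
    group by hotel_code
    order by std desc, hotel_code
    """
).fetchall()

for row in hotels:
    print(row)
```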
Here is what you need:
SELECT origin, hotel_code, CASE COUNT(DISTINCT standart)
WHEN 1 AND standart = "N" THEN "N"
WHEN 1 AND standart = "Y" THEN "Y"
WHEN 2 THEN "Y"
END as standart
FROM hotel
GROUP BY hotel_code ORDER BY standart DESC
Results:
origin hotel_code standart
1 123 Y
3 324 Y
9 888 Y
7 987 Y
6 456 N
SQLFiddle: http://sqlfiddle.com/#!9/17bd53/3/0
There are many ways to solve this; for simplicity I picked this one:
SELECT h2.origin, h2.hotel_code, h2.standard
FROM (SELECT * FROM hotel WHERE standard = 'Y') h1
JOIN hotel h2 on h1.hotel_code = h2.hotel_code
ORDER BY h2.hotel_code, h2.standard;
Enjoy it!

Nested Set Query to retrieve all ancestors of each node

I have a MySQL query that I thought was working fine to retrieve all the ancestors of each node, starting from the top node, down to its immediate node. However when I added a 5th level to the nested set, it broke.
Below are example tables, queries and SQL Fiddles:
Four Level Nested Set:
CREATE TABLE Tree
(title varchar(20) PRIMARY KEY,
`tree` int,
`left` int,
`right` int);
INSERT Tree
VALUES
("Food", 1, 1, 18),
('Fruit', 1, 2, 11),
('Red', 1, 3, 6),
('Cherry', 1, 4, 5),
('Yellow', 1, 7, 10),
('Banana', 1, 8, 9),
('Meat', 1, 12, 17),
('Beef', 1, 13, 14),
('Pork', 1, 15, 16);
The Query:
SELECT t0.title node
,(SELECT GROUP_CONCAT(t2.title)
FROM Tree t2
WHERE t2.left<t0.left AND t2.right>t0.right
ORDER BY t2.left) ancestors
FROM Tree t0
GROUP BY t0.title;
The returned result for node Banana is Food,Fruit,Yellow - Perfect.
When I run the same query on the 5 level table below, the 5th level nodes come back in the wrong order:
CREATE TABLE Tree
(title varchar(20) PRIMARY KEY,
`tree` int,
`left` int,
`right` int);
INSERT Tree
VALUES
("Food", 1, 1, 24),
('Fruit', 1, 2, 13),
('Red', 1, 3, 8),
('Cherry', 1, 4, 7),
('Cherry_pie', 1, 5, 6),
('Yellow', 1, 9, 12),
('Banana', 1, 10, 11),
('Meat', 1, 14, 23),
('Beef', 1, 15, 16),
('Pork', 1, 17, 22),
('Bacon', 1, 18, 21),
('Bacon_Sandwich', 1, 19, 20);
The returned result for Bacon_Sandwich is Bacon,Food,Meat,Pork, which is not the right order; it should be Food,Meat,Pork,Bacon.
I am not sure what is happening because I don't really understand subqueries well enough. Can anyone shed any light on this?
EDIT AFTER INVESTIGATION:
Woah!! Looks like writing all this out and reading up about ordering with GROUP_CONCAT gave me some inspiration.
Adding ORDER BY to the actual GROUP_CONCAT function and removing from the end of the subquery solved the issue. I now receive Food,Meat,Pork,Bacon for the node Bacon_Sandwich
SELECT t0.title node
,(SELECT GROUP_CONCAT(t2.title ORDER BY t2.left)
FROM Tree t2
WHERE t2.left<t0.left AND t2.right>t0.right
) ancestors
FROM Tree t0
GROUP BY t0.title;
I still have no idea why though. Having ORDER BY at the end of the subquery works for 4 levels but not for 5?!?!
If someone could explain what the issue is and why moving the ORDER BY fixes it, I'd be most grateful.
First, it's important to understand that you have an implicit GROUP BY:
If you use a group function in a statement containing no GROUP BY clause, it is equivalent to grouping on all rows.
To make the point more understandable I'll leave out subqueries and reduce the problem to the banana. Banana is the set [10, 11]. The correct sorted ancestors are those:
SELECT "banana" as node, GROUP_CONCAT(title ORDER by `left`)
FROM Tree WHERE `left` < 10 AND `right` > 11
GROUP BY node;
The ORDER BY must be in GROUP_CONCAT() as you want the aggregation function to sort. ORDER BY outside sorts by the aggregated results (i.e. the result of GROUP_CONCAT()). The fact that it worked until level 4 is just luck. ORDER BY has no effect on an aggregate function. You would get the same results with or without the ORDER BY:
SELECT GROUP_CONCAT(title)
FROM Tree WHERE `left` < 10 AND `right` > 11
/* ORDER BY `left` */
It might help to understand what
SELECT GROUP_CONCAT(title ORDER BY left) FROM Tree WHERE … ORDER BY left
does:
1. Get a selection (WHERE), which results in three rows in an undefined order:
("Food")
("Yellow")
("Fruit")
2. Aggregate the result into one row (implicit GROUP BY) in order to be able to use an aggregate function:
(("Food","Yellow", "Fruit"))
3. Fire the aggregate function (GROUP_CONCAT(title ORDER BY left)) on it, i.e. order by left and then concatenate:
("Food,Fruit,Yellow")
4. And now finally it sorts that result (ORDER BY). As it's only one row, sorting changes nothing:
("Food,Fruit,Yellow")
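For comparison, an outer ORDER BY does do its job when the rows are not aggregated. The sketch below is an alternative, not the approach from the question: it fetches the ancestor rows one per row, sorted by left, and joins them in application code. It assumes in-memory SQLite through Python's sqlite3 as a stand-in for MySQL, and the function name ancestors is made up for the example.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    'create table Tree (title text primary key, tree int, "left" int, "right" int)'
)
conn.executemany(
    "insert into Tree values (?, ?, ?, ?)",
    [
        ("Food", 1, 1, 24), ("Fruit", 1, 2, 13), ("Red", 1, 3, 8),
        ("Cherry", 1, 4, 7), ("Cherry_pie", 1, 5, 6), ("Yellow", 1, 9, 12),
        ("Banana", 1, 10, 11), ("Meat", 1, 14, 23), ("Beef", 1, 15, 16),
        ("Pork", 1, 17, 22), ("Bacon", 1, 18, 21), ("Bacon_Sandwich", 1, 19, 20),
    ],
)


def ancestors(title):
    """Return the ancestors of a node, top-most first, as a comma-separated string."""
    rows = conn.execute(
        '''
        select t2.title
        from Tree t0
        join Tree t2 on t2."left" < t0."left" and t2."right" > t0."right"
        where t0.title = ?
        order by t2."left"   -- sorts plain rows, so it takes effect here
        ''',
        (title,),
    ).fetchall()
    return ",".join(row[0] for row in rows)


print(ancestors("Bacon_Sandwich"))
print(ancestors("Banana"))
```

Because nothing is aggregated in SQL here, the sort is applied to the individual ancestor rows before they are joined into one string.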
You can get the result using JOIN or SUB-QUERY.
Using JOIN:
SELECT t0.title node, GROUP_CONCAT(t2.title ORDER BY t2.left) ancestors
FROM Tree t0
LEFT JOIN Tree t2 ON t2.left < t0.left AND t2.right > t0.right
GROUP BY t0.title;
Using SUB-QUERY:
SELECT t0.title node,
(SELECT GROUP_CONCAT(t2.title ORDER BY t2.left)
FROM Tree t2 WHERE t2.left<t0.left AND t2.right>t0.right) ancestors
FROM Tree t0
GROUP BY t0.title;
OUTPUT
| NODE | ANCESTORS |
|----------------|-----------------------|
| Bacon | Food,Meat,Pork |
| Bacon_Sandwich | Food,Meat,Pork,Bacon |
| Banana | Food,Fruit,Yellow |
| Beef | Food,Meat |
| Cherry | Food,Fruit,Red |
| Cherry_pie | Food,Fruit,Red,Cherry |
| Food | (null) |
| Fruit | Food |
| Meat | Food |
| Pork | Food,Meat |
| Red | Food,Fruit |
| Yellow | Food,Fruit |
In your subquery you used ORDER BY after the WHERE clause, which doesn't affect the output: it sorts the single aggregated row, not the values being concatenated. Without an ORDER BY inside it, GROUP_CONCAT() gives no guaranteed order; here the rows appear to have been read in ascending order of the title column (the primary key), which is why the output looks alphabetical.
That is why your first query returned Food,Fruit,Yellow for node Banana: it is the ascending order of title, which happens to match the tree order.
But your second result for Bacon_Sandwich is Bacon,Food,Meat,Pork, because in ascending order Bacon comes before Food.
If you want to order the result by the left column, then you have to specify ORDER BY inside the GROUP_CONCAT() function as above. Check both of my queries.
I recommend using JOIN instead of SUB-QUERY for better performance.

SQL: Query formatting

I have a SQL query with three columns. The first is Year (categorical), the second is Site (categorical) and the last is temperature (float). The rows are unique combinations of Year X Site. For example:
Current query result
Year, Site, Temp
1, 1, x11
1, 2, x12
1, 3, x13
2, 1, x21
2, 2, x22
2, 3, x23
3, 1, x31
3, 2, x32
3, 3, x33
I would like to have each site as a different column, while keeping years as rows. For example:
Desired query result
Year, TSite1, TSite2, TSite3
1, x11, x12, x13
2, x21, x22, x23
3, x31, x32, x33
Any ideas on how to do a query that results in this format? I would not mind using a temporary table or a view to store the information.
Thanks in advance.
SELECT Year,
MAX(CASE WHEN Site=1 THEN Temp END) as Tsite1, -- no ELSE: non-matching rows yield NULL, which MAX ignores
MAX(CASE WHEN Site=2 THEN Temp END) as Tsite2,
MAX(CASE WHEN Site=3 THEN Temp END) as Tsite3
FROM t -- t stands for your table or query
GROUP BY Year;
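A common pitfall with this conditional-aggregation pattern is combining MIN with ELSE 0: for positive temperatures the 0 from the non-matching rows wins. Here is a small sketch to illustrate, assuming in-memory SQLite through Python's sqlite3 as a stand-in for MySQL; the table name t, the made-up temperatures and the helper pivot are assumptions for the example.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("create table t (year int, site int, temp real)")
# Made-up temperatures: year 1 / site 2 -> 12.0, year 3 / site 1 -> 31.0, ...
conn.executemany(
    "insert into t values (?, ?, ?)",
    [(y, s, 10.0 * y + s) for y in (1, 2, 3) for s in (1, 2, 3)],
)


def pivot(aggregate, else_part=""):
    """Build and run one pivot query: one column per site, one row per year."""
    cols = ", ".join(
        f"{aggregate}(case when site = {s} then temp{else_part} end) as tsite{s}"
        for s in (1, 2, 3)
    )
    sql = f"select year, {cols} from t group by year order by year"
    return conn.execute(sql).fetchall()


pivoted = pivot("max")                  # NULLs from other sites are ignored
with_else_zero = pivot("min", " else 0")  # the 0 from other sites is the minimum

print(pivoted)
print(with_else_zero)
```

With these positive values the MIN ... ELSE 0 variant returns 0 in every cell, while the variant without ELSE returns the temperatures.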
A pivot query is one approach (as mentioned in the comments). If you just want a comma-delimited list of temperatures per year, then you can do that with group_concat().
select year, group_concat(temp order by site separator ', ') as temps
from t
group by year;
I realize this may not be exactly what you want -- you lose the type information for temp by converting it to a string for example. But then again, it may be what you need if you just want to see the temps or export them to another tool.

SQL Query for exact match in many to many relation

I have the following tables(only listing the required attributes)
medicine (id, name),
generic (id, name),
med_gen (med_id references medicine(id),gen_id references generic(id), potency)
Sample Data
medicine
(1, 'Crocin')
(2, 'Stamlo')
(3, 'NT Kuf')
generic
(1, 'Hexachlorodine')
(2, 'Methyl Benzoate')
med_gen
(1, 1, '100mg')
(1, 2, '50ml')
(2, 1, '100mg')
(2, 2, '60ml')
(3, 1, '100mg')
(3, 2, '50ml')
I want all the medicines which are equivalent to a given medicine. Medicines are equivalent to each other if they have the same generics as well as the same potency for each generic. In the above sample data, all three have the same generics, but only 1 and 3 also have the same potency for the corresponding generics. So 1 and 3 are equivalent medicines.
I want to find out equivalent medicines given a medicine id.
NOTE : One medicine may have any number of generics. Medicine table has around 102000 records, generic table around 2200 and potency table around 200000 records. So performance is a key point.
NOTE 2 : The database used is MySQL.
One way to do it in MySQL is to leverage the GROUP_CONCAT() function:
SELECT g.med_id
FROM
(
SELECT med_id, GROUP_CONCAT(gen_id ORDER BY gen_id) gen_id, GROUP_CONCAT(potency ORDER BY gen_id) potency -- potencies in generic order, so each stays paired with its generic
FROM med_gen
WHERE med_id = 1 -- here 1 is med_id for which you're trying to find analogs
) o JOIN
(
SELECT med_id, GROUP_CONCAT(gen_id ORDER BY gen_id) gen_id, GROUP_CONCAT(potency ORDER BY gen_id) potency
FROM med_gen
WHERE med_id <> 1 -- here 1 is med_id for which you're trying to find analogs
GROUP BY med_id
) g
ON o.gen_id = g.gen_id
AND o.potency = g.potency;
Output:
| MED_ID |
|--------|
| 3 |
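For what it's worth, the same match can also be expressed without string concatenation, as a self-join plus a row count. This is an alternative technique (a relational-division style query), not the answer's method. The sketch assumes that (med_id, gen_id) is unique in med_gen, and runs against in-memory SQLite through Python's sqlite3 as a stand-in for MySQL; equivalent_medicines is a made-up helper name.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("create table med_gen (med_id int, gen_id int, potency text)")
conn.executemany(
    "insert into med_gen values (?, ?, ?)",
    [(1, 1, "100mg"), (1, 2, "50ml"),
     (2, 1, "100mg"), (2, 2, "60ml"),
     (3, 1, "100mg"), (3, 2, "50ml")],
)

# A candidate b is equivalent to the given medicine a when every
# (generic, potency) pair of a matches one of b, and b has no extra pairs.
# Assumes (med_id, gen_id) is unique.
EQUIVALENT_SQL = """
    select b.med_id
    from med_gen a
    join med_gen b
      on b.gen_id = a.gen_id
     and b.potency = a.potency
     and b.med_id <> a.med_id
    where a.med_id = :med_id
    group by b.med_id
    having count(*) = (select count(*) from med_gen where med_id = :med_id)
       and count(*) = (select count(*) from med_gen m where m.med_id = b.med_id)
    order by b.med_id
"""


def equivalent_medicines(med_id):
    """Return the ids of medicines with exactly the same generic/potency pairs."""
    return [row[0] for row in conn.execute(EQUIVALENT_SQL, {"med_id": med_id})]


print(equivalent_medicines(1))
print(equivalent_medicines(2))
```

Whether this performs better than the GROUP_CONCAT() version on the table sizes mentioned in the question would need to be measured; an index on (gen_id, potency) is the kind of thing that might help.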

MySQL: Joins vs. Bitwise operator, and performance thereof

There are a number of questions about this subject, but mine is more specific to performance concerns.
With regards to an object, I want to track a multitude of 'attributes', each with a multitude of discrete 'values' (each attribute have between 3 and 16 valid 'values'.) For instance, consider tracking military personnel. The attributes/values might be (not real, I totally made these up):
attribute: {values}
languages_spoken: {english, spanish, russian, chinese, …. }
certificates: {infantry, airborne, pilot, tank_driver…..}
approved_equipment: {m4, rocket_launcher, shovel, super_secret_radio_thingy….}
approved_operations: {reconnaissance, logistics, invasion, cooking, ….}
awards_won: {medal_honor, purple_heart, ….}
… and so on.
One way to do this - the way I want to do this - is to have a personnel table and an attributes table:
personnel table => [id, name, rank, address…..]
personnel_attributes table => [personnel_id, attribute_id, value_id]
along with the associated attributes and values tables.
So if personnel_id=31415 is approved for logistics, there would be the following entry in the personnel_attributes table:
personnel_id | attribute_id | value_id
31415 | 3 | 2
where 3 = attribute_id for "approved_operations" and 2 = value_id for "logistics" (sorry formatting spaces didn't line up.)
Then a search to find all personnel who speak english OR spanish, AND who are infantry OR airborne, AND who can operate a shovel OR super_secret_radio_thingy would be something like:
SELECT t1.personnel_id
FROM personnel_attributes t1, personnel_attributes t2, personnel_attributes t3
WHERE ((t1.attribute_id = 1 and t1.value_id = 1) OR (t1.attribute_id = 1 and t1.value_id = 2))
AND ((t2.attribute_id = 2 and t1.value_id = 1) OR (t2.attribute_id = 2 and t1.value_id = 2))
AND ((t3.attribute_id = 3 and t1.value_id = 3) OR (t3.attribute_id = 3 and t1.value_id = 4))
AND t2.personnel_id = t1.personnel_id
AND t3.personnel_id = t1.personnel_id;
Assuming this isn't a totally stupid way to write the SQL query, the problem is that it's very slow (even with seemingly relevant indexes.)
So I am toying with using bitwise operators instead, where each attribute is a column in a table and each value is a bit. The same search would be:
SELECT personnel_id FROM personnel_attributes
WHERE language & b'00000011'
AND certificates & b'00000011'
AND approved_operations & b'00001100';
I know this does a full table scan, but in my experiments with 350,000 sample personnel, and 16 attributes each, the first method took 20 seconds whereas the bitwise method took 38 milliseconds!
Am I doing something wrong here? Are these the performance results I should expect?
Thanks!
Using the bitwise operation will require evaluating all of the rows. I believe your problem can be solved with a change to your original SELECT statement and how you're joining your tables.
To make it a little easier to read, I've changed attribute values to words instead of integers so it's less confusing while reading through my example, but obviously you can leave them as integers and the concept would still work:
CREATE TABLE PERSONNEL (
ID INT,
NAME VARCHAR(20)
);
CREATE TABLE PERSONNEL_ATTRIBUTES (
PERSONNEL_ID INT,
ATTRIB_ID INT,
ATTRIB_VALUE VARCHAR(20)
);
INSERT INTO PERSONNEL VALUES (1, 'JIM SMITH');
INSERT INTO PERSONNEL VALUES (2, 'JANE DOE');
INSERT INTO PERSONNEL_ATTRIBUTES VALUES (1, 1, 'English');
INSERT INTO PERSONNEL_ATTRIBUTES VALUES (1, 1, 'Spanish');
INSERT INTO PERSONNEL_ATTRIBUTES VALUES (1, 1, 'Russian');
INSERT INTO PERSONNEL_ATTRIBUTES VALUES (1, 3, 'Logistics');
INSERT INTO PERSONNEL_ATTRIBUTES VALUES (1, 3, 'Infantry');
INSERT INTO PERSONNEL_ATTRIBUTES VALUES (2, 1, 'English');
INSERT INTO PERSONNEL_ATTRIBUTES VALUES (2, 3, 'Infantry');
SELECT P.ID, P.NAME, PA1.ATTRIB_VALUE AS DESIRED_LANGUAGE, PA2.ATTRIB_VALUE AS APPROVED_OPERATION
FROM PERSONNEL P
JOIN PERSONNEL_ATTRIBUTES PA1 ON P.ID = PA1.PERSONNEL_ID AND PA1.ATTRIB_ID = 1
JOIN PERSONNEL_ATTRIBUTES PA2 ON P.ID = PA2.PERSONNEL_ID AND PA2.ATTRIB_ID = 3
WHERE PA1.ATTRIB_VALUE = 'Spanish' AND (PA2.ATTRIB_VALUE = 'Infantry' OR PA2.ATTRIB_VALUE = 'Airborne');
I have the same question of whether to use django-bitfield or a separate table for flags.
Inspired by your experiment, I used a 3.5m record table (InnoDB) and ran count() and retrieve queries for both variants. The result was astonishing: approx. 5 sec vs. 40 sec; bitfield wins.
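For anyone who wants to try the bitfield variant, here is a minimal sketch of how the masks could be built and queried. The bit assignments, the table name personnel_flags and the sample rows are all assumptions made up for the illustration, and in-memory SQLite (through Python's sqlite3) stands in for MySQL, so it says nothing about the timings reported above.

```python
import sqlite3
from enum import IntFlag


# Hypothetical bit assignments: one bit per allowed value of an attribute.
class Language(IntFlag):
    ENGLISH = 1
    SPANISH = 2
    RUSSIAN = 4
    CHINESE = 8


class Certificate(IntFlag):
    INFANTRY = 1
    AIRBORNE = 2
    PILOT = 4
    TANK_DRIVER = 8


class Equipment(IntFlag):
    M4 = 1
    ROCKET_LAUNCHER = 2
    SHOVEL = 4
    RADIO = 8


conn = sqlite3.connect(":memory:")
conn.execute(
    "create table personnel_flags (personnel_id integer primary key, "
    "language int, certificates int, approved_equipment int)"
)
people = [
    (1, Language.ENGLISH | Language.RUSSIAN, Certificate.INFANTRY, Equipment.SHOVEL),
    (2, Language.CHINESE, Certificate.AIRBORNE, Equipment.SHOVEL),
    (3, Language.SPANISH, Certificate.PILOT, Equipment.RADIO),
    (4, Language.SPANISH, Certificate.AIRBORNE, Equipment.M4),
    (5, Language.ENGLISH | Language.SPANISH,
     Certificate.INFANTRY | Certificate.AIRBORNE, Equipment.RADIO),
]
# Store the flags as plain integers.
conn.executemany(
    "insert into personnel_flags values (?, ?, ?, ?)",
    [(pid, int(lang), int(cert), int(equip)) for pid, lang, cert, equip in people],
)

# OR within an attribute becomes one mask; AND across attributes stays AND.
matches = [
    row[0]
    for row in conn.execute(
        """
        select personnel_id
        from personnel_flags
        where (language & ?) != 0
          and (certificates & ?) != 0
          and (approved_equipment & ?) != 0
        order by personnel_id
        """,
        (
            int(Language.ENGLISH | Language.SPANISH),
            int(Certificate.INFANTRY | Certificate.AIRBORNE),
            int(Equipment.SHOVEL | Equipment.RADIO),
        ),
    )
]
print(matches)
```

The trade-off is the one already discussed in this thread: the filter cannot use an ordinary index, so every row is evaluated, but each row is a few integer operations instead of several joined lookups.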