DISTINCT on one value from a group selects - mysql

I have the following SQL query:
select devices_device.id , devices_device.code, sss.id as "site_id", sss.name as "site_name"
from devices_device
inner join st_site_site sss on devices_device.site_id = sss.id
where devices_device.deleted = false
order by devices_device.id, devices_device.start_date
I now get a list of device IDs, some of which are repeated. I want to apply a distinct so that I keep only the first record for every device (and, due to the order by on start_date, that would be the most recent device record for that device).
How do I do this? If I do
select distinct devices_device.id , devices_device.code, sss.id as "site_id", sss.name as "site_name"
from devices_device
inner join st_site_site sss on devices_device.site_id = sss.id
where devices_device.deleted = false
order by devices_device.id, devices_device.start_date
nothing happens

You can use the ROW_NUMBER() window function to identify the row you want. Then filtering out the other ones is easy.
For example:
select *
from (
select
d.id, d.start_date, d.code,
s.id as "site_id", s.name as "site_name",
row_number() over(partition by d.id order by start_date desc) as rn
from devices_device d
inner join st_site_site s on d.site_id = s.id
where d.deleted = false
) x
where rn = 1
order by id, start_date
In this query the ROW_NUMBER() value will be 1 for the latest row in each device group. The final filter rn = 1 then removes every other row.
NOTE: In case of ties (two rows with the same most recent start_date), this query still returns a single row for the device, but which of the tied rows it picks is arbitrary.
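If the arbitrary pick on ties is a problem, one option is to add a unique column as a secondary sort key. The following is only a minimal sketch: it runs against an in-memory SQLite database (version 3.25 or later, for window functions) as a stand-in for MySQL 8, it leaves out the site join and the deleted filter, and the row_id column is a hypothetical unique key that is not part of the original table.

```python
import sqlite3

# SQLite (3.25+ for window functions) stands in for MySQL 8 here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE devices_device (
        row_id INTEGER PRIMARY KEY,  -- hypothetical unique key, not in the question
        id INT, code INT, start_date TEXT
    );
    INSERT INTO devices_device (row_id, id, code, start_date) VALUES
        (1, 1, 10, '2020-10-01'),
        (2, 1, 20, '2020-10-01'),  -- ties with row_id 1 on start_date
        (3, 2, 30, '2020-09-01');
""")

rows = conn.execute("""
    SELECT id, code
    FROM (
        SELECT d.id, d.code,
               ROW_NUMBER() OVER (
                   PARTITION BY d.id
                   ORDER BY d.start_date DESC, d.row_id DESC  -- tie-breaker
               ) AS rn
        FROM devices_device d
    ) x
    WHERE rn = 1
    ORDER BY id
""").fetchall()

print(rows)  # [(1, 20), (2, 30)]
```

With the tie-breaker in place, device 1 always resolves to the tied row with the higher row_id, so repeated runs return the same row.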

You should probably use a GROUP BY. Something like:
select distinct devices_device.id , devices_device.code, sss.id as "site_id",
sss.name as "site_name"
from devices_device
inner join st_site_site sss on devices_device.site_id = sss.id
where devices_device.deleted = false
group by devices_device.id
order by devices_device.start_date

You could test for the min start date
drop table if exists devices_device,st_site_site;
create table devices_device(id int,code int,site_id int,start_date date,deleted int);
create table st_site_site(id int,name varchar(10));
insert into devices_device values(1,10,1,'2020-10-01',0),(1,20,1,'2020-09-01',0);
insert into st_site_site values(1,'aaa');
select devices_device.id , devices_device.code, sss.id as "site_id", sss.name as "site_name"
from devices_device
inner join st_site_site sss on devices_device.site_id = sss.id
where devices_device.deleted = false and
devices_device.start_date = (select min(d1.start_date) from devices_device d1 where d1.id = devices_device.id)
order by devices_device.id;
+------+------+---------+-----------+
| id | code | site_id | site_name |
+------+------+---------+-----------+
| 1 | 20 | 1 | aaa |
+------+------+---------+-----------+
1 row in set (0.001 sec)

Related

Get only 1 result per group of ID

I have a list of records (domestic_helper_idcards) and I want to return only one card per staff (domestic_helper_id) that is not deleted (is_deleted = 0), and that has the card_expiration_date furthest in the future (latest expiry date).
I have tried grouping and so on, but can't get it to work. Code below:
SELECT * FROM domestic_helper_idcard
where
is_deleted = 0
order by card_expiration_date desc
This returns the following (image):
I want only records with ID 4 and 5 to be returned. Anyone?
You could use a join with a subquery grouped by domestic_helper_id and an aggregate function, e.g. max():
SELECT d.*
FROM domestic_helper_idcard d
inner join (
select domestic_helper_id, max(id) max_id
from domestic_helper_idcard
where is_deleted = 0
group by domestic_helper_id
) t on t.domestic_helper_id = d.domestic_helper_id and t.max_id = d.id
order by d.card_expiration_date desc
And, as suggested by Jens after clarification, using the max card_expiration_date:
SELECT d.*
FROM domestic_helper_idcard d
inner join (
select domestic_helper_id, max(card_expiration_date) max_date
from domestic_helper_idcard
where is_deleted = 0
group by domestic_helper_id
) t on t.domestic_helper_id = d.domestic_helper_id and t.max_date = d.card_expiration_date
order by d.card_expiration_date desc
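A minimal sketch of the max-date join, run against an in-memory SQLite database as a stand-in for MySQL. The sample rows are invented for illustration, and the extra d.is_deleted = 0 filter on the outer table is an assumption on my part: it guards against a deleted card that happens to share the latest expiry date with a valid one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE domestic_helper_idcard (
        id INT, domestic_helper_id INT,
        card_expiration_date TEXT, is_deleted INT
    );
    -- Invented sample rows, not the data from the question.
    INSERT INTO domestic_helper_idcard VALUES
        (1, 10, '2020-01-01', 0),
        (2, 10, '2022-01-01', 0),
        (3, 10, '2022-01-01', 1),  -- deleted card with the same latest date
        (4, 20, '2021-06-01', 0);
""")

cards = conn.execute("""
    SELECT d.id, d.domestic_helper_id, d.card_expiration_date
    FROM domestic_helper_idcard d
    INNER JOIN (
        SELECT domestic_helper_id, MAX(card_expiration_date) AS max_date
        FROM domestic_helper_idcard
        WHERE is_deleted = 0
        GROUP BY domestic_helper_id
    ) t ON t.domestic_helper_id = d.domestic_helper_id
       AND t.max_date = d.card_expiration_date
    WHERE d.is_deleted = 0  -- assumed extra guard, see note above
    ORDER BY d.domestic_helper_id
""").fetchall()

print(cards)  # [(2, 10, '2022-01-01'), (4, 20, '2021-06-01')]
```

Without the outer filter, the deleted card with id 3 would also be returned for helper 10, because it matches the same maximum date.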

Why I can not join this query with Max date?

I have an issue with the following MySQL query: it fails when the max date is introduced, as shown below.
I get the following error
Error Code: 1054. Unknown column 'order_items.ORDER_ITEM_ID' in 'where clause'
SET @UserID = 160;
SET @OrderDateTime = '2018-11-13 09:23:45';
SELECT
order_items.ORDER_ID,
listing_region.LIST_REGION_REGION_ID,
listings.LISTING_ID,
order_items.ORDER_REQUIRED_DATE_TIME,
listings.LISTING_NICK_NAME,
order_items.ORDER_QUANTITY,
order_price.ORDER_PRICE_ID,
order_items.ORDER_PORTION_SIZE,
t.LATEST_DATE,
t.ORDER_STATUS
FROM order_status_change, order_items
INNER JOIN listings ON listings.LISTING_ID = order_items.ORDER_LISTING_ID
INNER JOIN listing_region ON listing_region.LIST_REGION_LISTING_ID = listings.LISTING_ID
INNER JOIN order_price ON order_price.ORDERP_ITEM_ID = order_items.ORDER_ITEM_ID
INNER JOIN
(
SELECT MAX(order_status_change.ORDER_STATUS_CHANGE_DATETIME) AS LATEST_DATE, order_status_change.ORDER_ITEM_ID, order_status_change.ORDER_STATUS
FROM order_status_change
WHERE order_status_change.ORDER_ITEM_ID = order_items.ORDER_ITEM_ID
) AS t ON order_status_change.ORDER_ITEM_ID = t.ORDER_ITEM_ID AND order_status_change.ORDER_STATUS_CHANGE_DATETIME = t.LATEST_DATE
WHERE ((order_items.ORDER_USER_ID = @UserID) AND DATE(order_items.ORDER_REQUIRED_DATE_TIME) = DATE(@OrderDateTime))
Any help ?
I have assumed you can join order_status_change on order_items.ID = order_status_change.ORDER_ITEM_ID
If that is valid then I think this will achieve what you are after:
SET @UserID = 160;
SET @OrderDateTime = '2018-11-13 09:23:45';
SELECT
order_items.ORDER_ID
, listing_region.LIST_REGION_REGION_ID
, listings.LISTING_ID
, order_items.ORDER_REQUIRED_DATE_TIME
, listings.LISTING_NICK_NAME
, order_items.ORDER_QUANTITY
, order_price.ORDER_PRICE_ID
, order_items.ORDER_PORTION_SIZE
, t.LATEST_DATE
, order_status_change.ORDER_STATUS
FROM order_items
INNER JOIN listings ON listings.LISTING_ID = order_items.ORDER_LISTING_ID
INNER JOIN listing_region ON listing_region.LIST_REGION_LISTING_ID = listings.LISTING_ID
INNER JOIN order_price ON order_price.ORDERP_ITEM_ID = order_items.ORDER_ITEM_ID
INNER JOIN order_status_change ON order_items.ID = order_status_change.ORDER_ITEM_ID
INNER JOIN (
SELECT
MAX( mc.ORDER_STATUS_CHANGE_DATETIME ) AS LATEST_DATE
, mc.ORDER_ITEM_ID
FROM order_status_change AS mc
GROUP BY
mc.ORDER_ITEM_ID
) AS t
ON order_status_change.ORDER_ITEM_ID = t.ORDER_ITEM_ID
AND order_status_change.ORDER_STATUS_CHANGE_DATETIME = t.LATEST_DATE
WHERE order_items.ORDER_USER_ID = @UserID
AND DATE( order_items.ORDER_REQUIRED_DATE_TIME ) = DATE( @OrderDateTime )
You need to avoid this in future:
FROM order_status_change , order_items
That comma between the 2 table names IS a join, but it is from an older syntax and it is LOWER in precedence than the other joins of your query. Also, by default this comma based join acts as an equivalent to a cross join which MULTIPLIES the number of rows. In brief, please do NOT USE commas between table names.
The other issue is that you were missing a group by clause. I believe you just want to get the "latest" date from this aggregation; once that is determined, link back to that table to get the status relevant to that date (i.e. you can't group by status in the subquery, otherwise you get the latest dates, one for each status).
Here's a simplified version to illustrate the problem.
DROP TABLE IF exists t,t1;
create table t (id int);
create table t1(id int,dt date);
insert into t values (1),(2);
insert into t1 values (1,'2018-01-01'),(1,'2018-02-01'),(2,'2018-01-01');
select t.*,t2.maxdt
from t
join (select max(dt) maxdt,t1.id from t1 where t1.id = t.id) t2
on t2.id = t.id;
ERROR 1054 (42S22): Unknown column 't.id' in 'where clause'
You could group by in the sub query and then the on clause will come into play
select t.*,t2.maxdt
from t
join (select max(dt) maxdt,t1.id from t1 group by t1.id) t2
on t2.id = t.id;
+------+------------+
| id | maxdt |
+------+------------+
| 1 | 2018-02-01 |
| 2 | 2018-01-01 |
+------+------------+
2 rows in set (0.00 sec)
If you want an answer closer to your problem, please add sample data and expected output to the question as text or to sqlfiddle.

MySQL select with group and one to many relations condition

For example have such structure:
CREATE TABLE clicks
(`date` varchar(50), `sum` int, `id` int)
;
CREATE TABLE marks
(`click_id` int, `name` varchar(50), `value` varchar(50))
;
where click can have many marks
So example data:
INSERT INTO clicks
(`sum`, `id`, `date`)
VALUES
(100, 1, '2017-01-01'),
(200, 2, '2017-01-01')
;
INSERT INTO marks
(`click_id`, `name`, `value`)
VALUES
(1, 'utm_source', 'test_source1'),
(1, 'utm_medium', 'test_medium1'),
(1, 'utm_term', 'test_term1'),
(2, 'utm_source', 'test_source1'),
(2, 'utm_medium', 'test_medium1')
;
I need to get aggregated values of clicks, grouped by date, for the clicks that contain all of the selected marks.
I make request:
select
c.date,
sum(c.sum)
from clicks as c
left join marks as m ON m.click_id = c.id
where
(m.name = 'utm_source' AND m.value='test_source1') OR
(m.name = 'utm_medium' AND m.value='test_medium1') OR
(m.name = 'utm_term' AND m.value='test_term1')
group by date
and get 2017-01-01 = 700, but I want to get 100, because only click 1 has all of the marks.
Or if condition will be
(m.name = 'utm_source' AND m.value='test_source1') OR
(m.name = 'utm_medium' AND m.value='test_medium1')
I need to get 300 instead of 600
I found an answer by first getting the distinct click_id values in one query and then summing and grouping by date with a whereIn condition, but on the real database, which is very large and uses UUIDs as ids, this request executes extremely slowly. Any advice on how to get it to work properly?
You can achieve it using below queries:
When there are three conditions, you have to pass HAVING count(*) >= 3:
SELECT cc.DATE
,sum(cc.sum)
FROM clicks AS cc
INNER JOIN (
SELECT id
FROM clicks AS c
LEFT JOIN marks AS m ON m.click_id = c.id
WHERE (
m.NAME = 'utm_source'
AND m.value = 'test_source1'
)
OR (
m.NAME = 'utm_medium'
AND m.value = 'test_medium1'
)
OR (
m.NAME = 'utm_term'
AND m.value = 'test_term1'
)
GROUP BY id
HAVING count(*) >= 3
) AS t ON cc.id = t.id
GROUP BY cc.DATE
When there are two conditions, you have to pass HAVING count(*) >= 2:
SELECT cc.DATE
,sum(cc.sum)
FROM clicks AS cc
INNER JOIN (
SELECT id
FROM clicks AS c
LEFT JOIN marks AS m ON m.click_id = c.id
WHERE (
m.NAME = 'utm_source'
AND m.value = 'test_source1'
)
OR (
m.NAME = 'utm_medium'
AND m.value = 'test_medium1'
)
GROUP BY id
HAVING count(*) >= 2
) AS t ON cc.id = t.id
GROUP BY cc.DATE
Demo: http://sqlfiddle.com/#!9/fe571a/35
Hope this works for you...
You're getting 700 because the join generates multiple rows for each click ID. There are three rows in the marks table for click ID=1 (sum=100) and two rows for click ID=2 (sum=200). After the join we have 3 rows with sum=100 and 2 rows with sum=200, so adding these sums gives 700. To fix this you have to aggregate on click_id too, as illustrated below:
select
c.date,
sum(c.sum)
from clicks as c
inner join (select click_id from marks where (name = 'utm_source' AND
value='test_source1') OR (name = 'utm_medium' AND value='test_medium1')
OR (name = 'utm_term' AND value='test_term1')
group by click_id) as m
ON m.click_id = c.id
group by c.date;
I found the right way myself, which works on large amounts of data
The main goal is to have the request build a single joined table whose conditions do not depend on the amount of data in the results, so the best way is:
select
c.date,
sum(c.sum)
from clicks as c
join marks as m1 ON m1.click_id = c.id
join marks as m2 ON m2.click_id = c.id
join marks as m3 ON m3.click_id = c.id
where
(m1.name = 'utm_source' AND m1.value='test_source1') AND
(m2.name = 'utm_medium' AND m2.value='test_medium1') AND
(m3.name = 'utm_term' AND m3.value='test_term1')
group by date
So we need as many joins as we have conditions.
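A minimal sketch of the one-join-per-condition approach, using the sample data from the question and an in-memory SQLite database as a stand-in for MySQL. The helper sum_by_date is a hypothetical name, and the sketch assumes each (click_id, name, value) combination appears at most once in marks; duplicates would multiply the sums again.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE clicks ("date" TEXT, "sum" INT, id INT);
    CREATE TABLE marks (click_id INT, name TEXT, value TEXT);
    INSERT INTO clicks ("sum", id, "date") VALUES
        (100, 1, '2017-01-01'),
        (200, 2, '2017-01-01');
    INSERT INTO marks (click_id, name, value) VALUES
        (1, 'utm_source', 'test_source1'),
        (1, 'utm_medium', 'test_medium1'),
        (1, 'utm_term', 'test_term1'),
        (2, 'utm_source', 'test_source1'),
        (2, 'utm_medium', 'test_medium1');
""")

def sum_by_date(conn, conditions):
    """Sum clicks per date, keeping only clicks that match every (name, value) pair."""
    joins, params = [], []
    for i, (name, value) in enumerate(conditions):
        alias = f"m{i}"  # generated alias, never taken from user input
        joins.append(
            f"JOIN marks AS {alias} ON {alias}.click_id = c.id "
            f"AND {alias}.name = ? AND {alias}.value = ?"
        )
        params += [name, value]
    sql = ('SELECT c."date", SUM(c."sum") FROM clicks AS c '
           + " ".join(joins) + ' GROUP BY c."date"')
    return conn.execute(sql, params).fetchall()

three = sum_by_date(conn, [("utm_source", "test_source1"),
                           ("utm_medium", "test_medium1"),
                           ("utm_term", "test_term1")])
two = sum_by_date(conn, [("utm_source", "test_source1"),
                         ("utm_medium", "test_medium1")])

print(three)  # [('2017-01-01', 100)]
print(two)    # [('2017-01-01', 300)]
```

Each inner join keeps at most one marks row per click, so a click contributes its sum exactly once and only when all conditions are met.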

How can those two SQL statements be combined into one?

I wrote these two SQL statements and would like to combine them; one is based on the results of the other. I checked this post, but it doesn't seem to be results-based. How could I achieve this?
First sql:
SELECT
`potential`.*,
`customer`.`ID` as 'FID_customer'
FROM
`os_potential` as `potential`,
`os_customer` as `customer`
WHERE `potential`.`FID_author` = :randomID
AND `potential`.`converted` = 1
AND `potential`.`street` = `customer`.`street`
AND `potential`.`zip` = `customer`.`zip`
AND `potential`.`city` = `customer`.`city`;
Second sql:
SELECT
sum(`order`.`price_customer`) as 'Summe'
FROM
`os_order` as `order`,
`RESUTS_FROM_PREVIOUS_SQL_STATEMENT` as `results`
WHERE `order`.`FID_status` = 10
AND `results`.`FID_customer` = `order`.`FID_customer`;
I would like to get everything from first sql + the 'Summe' from second sql.
TABLES
1.Potentials:
+----+------------+-----------+--------+-----+------+
| ID | FID_author | converted | street | zip | city |
+----+------------+-----------+--------+-----+------+
2.Customers:
+----+--------+-----+------+
| ID | street | zip | city |
+----+--------+-----+------+
3.Orders:
+----+--------------+----------------+
| ID | FID_customer | price_customer |
+----+--------------+----------------+
SELECT p.*
, c.ID FID_customer
, o.summe
FROM os_potential p
JOIN os_customer c
ON c.street = p.street
AND c.zip = p.zip
AND c.city = p.city
JOIN
( SELECT FID_customer
, SUM(price_customer) Summe
FROM os_order
WHERE FID_status = 10
GROUP
BY FID_customer
) o
ON o.FID_customer = c.ID
WHERE p.FID_author = :randomID
AND p.converted = 1
;
You would just write a single query like this:
SELECT sum(o.price_customer) as Summe
FROM os_potential p JOIN
os_customer c
ON p.street = c.street AND p.zip = c.zip AND p.city = c.city JOIN
os_order o
ON o.FID_customer = c.ID
WHERE p.FID_author = :randomID AND p.converted = 1 AND
o.FID_status = 10 ;
Notes:
Never use commas in the FROM clause. Always use explicit JOIN syntax with conditions in an ON clause.
Table aliases are easier to follow when they are short. Abbreviations of the table names are commonly used.
Backticks are only necessary when the table/column name needs to be escaped. Yours don't need to be escaped.
If the 1st query returns 1 record per customer, then simply join the 3 tables, keep the sum and use the group by clause:
SELECT
`potential`.*,
`customer`.`ID` as 'FID_customer',
sum(`order`.`price_customer`) as Summe
FROM
`os_potential` as `potential`
INNER JOIN
`os_customer` as `customer`
ON `potential`.`street` = `customer`.`street`
AND `potential`.`zip` = `customer`.`zip`
AND `potential`.`city` = `customer`.`city`
LEFT JOIN
`os_order` as `order`
ON `customer`.`ID` = `order`.`FID_customer`
AND `order`.`FID_status` = 10
WHERE `potential`.`FID_author` = :randomID
AND `potential`.`converted` = 1
GROUP BY `customer`.`ID`, <list all fields from potential table>
If the 1st query may return multiple records per customer, then you need to do the summing in a subquery:
SELECT
`potential`.*,
`customer`.`ID` as 'FID_customer',
`order`.Summe
FROM
`os_potential` as `potential`
INNER JOIN
`os_customer` as `customer`
ON `potential`.`street` = `customer`.`street`
AND `potential`.`zip` = `customer`.`zip`
AND `potential`.`city` = `customer`.`city`
LEFT JOIN
(SELECT FID_customer, sum(price_customer) as Summe
FROM `os_order`
WHERE FID_status=10
GROUP BY FID_customer
) as `order`
ON `customer`.`ID` = `order`.`FID_customer`
WHERE `potential`.`FID_author` = :randomID
AND `potential`.`converted` = 1
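A minimal sketch of the pre-aggregated subquery variant, run against an in-memory SQLite database as a stand-in for MySQL. The rows are invented for illustration, and the join condition uses customer.ID since the Customers table shown in the question has no FID_customer column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE os_potential (ID INT, FID_author INT, converted INT,
                               street TEXT, zip TEXT, city TEXT);
    CREATE TABLE os_customer (ID INT, street TEXT, zip TEXT, city TEXT);
    CREATE TABLE os_order (ID INT, FID_customer INT,
                           price_customer INT, FID_status INT);
    -- Invented sample rows, not data from the question.
    INSERT INTO os_potential VALUES
        (1, 7, 1, 'Main St', '1000', 'Bern'),
        (2, 7, 1, 'Side St', '2000', 'Basel');
    INSERT INTO os_customer VALUES
        (1, 'Main St', '1000', 'Bern'),
        (2, 'Side St', '2000', 'Basel');
    INSERT INTO os_order VALUES
        (1, 1, 50, 10),
        (2, 1, 70, 10),
        (3, 1, 999, 5);  -- wrong status, must not be summed
""")

result = conn.execute("""
    SELECT p.ID, c.ID AS FID_customer, o.Summe
    FROM os_potential AS p
    INNER JOIN os_customer AS c
        ON p.street = c.street AND p.zip = c.zip AND p.city = c.city
    LEFT JOIN (
        SELECT FID_customer, SUM(price_customer) AS Summe
        FROM os_order
        WHERE FID_status = 10
        GROUP BY FID_customer
    ) AS o ON c.ID = o.FID_customer
    WHERE p.FID_author = :randomID AND p.converted = 1
    ORDER BY p.ID
""", {"randomID": 7}).fetchall()

print(result)  # [(1, 1, 120), (2, 2, None)]
```

Because the orders are summed before the join, each potential row gets exactly one total, and the LEFT JOIN keeps customers without matching orders (with a NULL Summe).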
I think you should use a subselect, but be careful with the number of results, it's not the best for performance.
You can do something like this:
SELECT n1, n2, (select count(1) from whatever_table) as n3, n4 from whatever_table
Note that the subselect must return just 1 result; otherwise you'll get an error.

How to get the full result of one column and the max of that column's value?

SELECT Max(c.vendor_id),c.vendors_id FROM (SELECT distinct a.vendor_id FROM service_master a,products b,vendors v,`vendor_addresses` ad WHERE a.cat_id= 242 AND a.service_id = b.s_sid AND a.is_active =1 AND b.isproductactive = 1 AND v.vendorid = a.vendor_id AND ad.vendorchild_id = a.vendor_id AND v.isvendoractive = 1 LIMIT 10) c ORDER BY c.vendor_id
Questions:
1)I want full result in vendor_id column
2)Max(vendor_id)result
How to get result in single query?
Not tested. But please try this.
select t1.id,t2.id
from detail t1
left join(
select max(id) as id
from detail
) t2 on 1 = 1
select id, ( select max(id) from detail ) as max
from detail
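A minimal sketch of the idea behind the first query: pairing every row with the overall maximum by joining to a one-row subquery. It runs against an in-memory SQLite database as a stand-in for MySQL; the detail table and its values are placeholders, as in the answer, and max_id is an alias chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE detail (id INT);
    INSERT INTO detail VALUES (3), (1), (2);  -- placeholder values
""")

pairs = conn.execute("""
    SELECT t1.id, t2.max_id
    FROM detail t1
    CROSS JOIN (SELECT MAX(id) AS max_id FROM detail) t2
    ORDER BY t1.id
""").fetchall()

print(pairs)  # [(1, 3), (2, 3), (3, 3)]
```

The subquery returns a single row, so the cross join repeats the maximum next to every id without changing the row count.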