Get largest values from multiple columns from latest timestamps in MySQL

I'm trying to get a list of the *_usedpc values across multiple similar columns, ordered descending to get the worst offenders. Also, I need to select only the values from the most recent timestamp for each sys_id.
Example data:
Sys_id | timestamp | disk0_usedpc | disk1_usedpc | disk2_usedpc
---
1 | 2016-05-06 15:24:10 | 75 | 45 | 35
1 | 2016-04-06 15:24:10 | 70 | 40 | 30
2 | 2016-05-06 15:24:10 | 23 | 28 | 32
3 | 2016-05-06 15:24:10 | 50 | 51 | 55
Desired result (assuming limit 2 for example):
1 | 2016-05-06 15:24:10 | disk0_usedpc | 75
3 | 2016-05-06 15:24:10 | disk2_usedpc | 55
I know I can get the max from each column using GREATEST/MAX and grouping on timestamp to get only the latest values, but I can't figure out how to get the whole ordered list (not just the max/greatest from each column, but the "5 highest values across all 3 disk columns").
EDIT: I set up a SQLFiddle page:
http://sqlfiddle.com/#!9/82202/1/0
EDIT2: I'm very sorry about the delay. I was able to get all three solutions to work, thank you. If #PetSerAl can put his solution in an answer, I'll mark it as accepted, as this solution allowed me to very smoothly customise further.

You can join the vm_disk table with a three-row derived table to create a separate row for each of your disks. Then, as you have one row per disk, you can easily filter or sort them.
select
    `sys_id`,
    `timestamp`,
    concat('disk', `disk`, '_usedpc') as `name`,
    case `disk`
        when 0 then `disk0_usedpc`
        when 1 then `disk1_usedpc`
        when 2 then `disk2_usedpc`
    end as `usedpc`
from
    `vm_disk` join
    (
        select 0 as `disk`
        union all
        select 1
        union all
        select 2
    ) as `t`
where
    (`sys_id`, `timestamp`) in (
        select
            `sys_id`,
            max(`timestamp`)
        from `vm_disk`
        group by `sys_id`
    )
order by `usedpc` desc
limit 5
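For clarity, here is what the unpivot step on its own produces, before the latest-timestamp filter and the LIMIT are applied (just an illustrative query against the same vm_disk table, not part of the original answer):
select
    `sys_id`,
    `timestamp`,
    `disk`,
    case `disk`
        when 0 then `disk0_usedpc`
        when 1 then `disk1_usedpc`
        when 2 then `disk2_usedpc`
    end as `usedpc`
from `vm_disk`
join (
    select 0 as `disk`
    union all
    select 1
    union all
    select 2
) as `t`
-- every vm_disk row now appears three times, once per disk column,
-- so ordering or filtering by `usedpc` works on single values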

Maybe something like this would work... I know it may look pretty redundant but it could save overhead caused by doing multiple joins to the same table:
SELECT md.Sys_id,
       md.`timestamp`,
       CASE
           WHEN md.disk0_usedpc > md.disk1_usedpc
            AND md.disk0_usedpc > md.disk2_usedpc
               THEN 'disk0_usedpc'
           WHEN md.disk1_usedpc > md.disk0_usedpc
            AND md.disk1_usedpc > md.disk2_usedpc
               THEN 'disk1_usedpc'
           ELSE 'disk2_usedpc'
       END AS pcname,
       CASE
           WHEN md.disk0_usedpc > md.disk1_usedpc
            AND md.disk0_usedpc > md.disk2_usedpc
               THEN md.disk0_usedpc
           WHEN md.disk1_usedpc > md.disk0_usedpc
            AND md.disk1_usedpc > md.disk2_usedpc
               THEN md.disk1_usedpc
           ELSE md.disk2_usedpc
       END AS pcusage
FROM mydatabase md
WHERE (md.Sys_id, md.`timestamp`) IN (
          SELECT Sys_id, MAX(`timestamp`)
          FROM mydatabase
          GROUP BY Sys_id
      )
ORDER BY pcusage DESC

Try this:
select
    t1.sys_id, t1.`timestamp`,
    case locate(greatest(disk0_usedpc, disk1_usedpc, disk2_usedpc),
                concat_ws(',', disk0_usedpc, disk1_usedpc, disk2_usedpc))
        when 1 then 'disk0_usedpc'
        when 1 + length(concat(disk0_usedpc, ',')) then 'disk1_usedpc'
        when 1 + length(concat(disk0_usedpc, ',', disk1_usedpc, ',')) then 'disk2_usedpc'
    end as usedpc,
    greatest(disk0_usedpc, disk1_usedpc, disk2_usedpc) as amount
from yourtable t1
join (
    select max(`timestamp`) as `timestamp`, sys_id
    from yourtable
    group by sys_id
) t2 on t1.sys_id = t2.sys_id and t1.`timestamp` = t2.`timestamp`
order by t1.`timestamp` desc
-- limit 2
SQLFiddle Demo
How it works: the subquery gets the latest row for each sys_id group, which is one of many ways to do that. Then you need the greatest value among disk0_usedpc, disk1_usedpc and disk2_usedpc; as you wrote in your question, the GREATEST function is the plan, so greatest(disk0_usedpc, disk1_usedpc, disk2_usedpc) as amount gives you the amount.
But you also want that column's name, so I used locate together with concat and concat_ws (which saves writing the separator, a comma, over and over).
Let's take row 1 | 2016-05-06 15:24:10 | 75 | 45 | 35 as an example:
concat_ws(',', disk0_usedpc, disk1_usedpc, disk2_usedpc) gives us "75,45,35"; in this string 75 starts at position 1, 45 at position 4 and 35 at position 7.
As you can see, locate(greatest(disk0_usedpc, disk1_usedpc, disk2_usedpc), concat_ws(',', disk0_usedpc, disk1_usedpc, disk2_usedpc)) returns 1, so the greatest column is disk0_usedpc; that is how the CASE maps the position back to the column name.
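If you want to see the position arithmetic on its own, you can evaluate the expressions against the literal values from that row (a throwaway check, independent of the table):
select concat_ws(',', 75, 45, 35)                               as joined,      -- '75,45,35'
       locate(greatest(75, 45, 35), concat_ws(',', 75, 45, 35)) as pos,         -- 1 -> disk0_usedpc
       1 + length(concat(75, ','))                              as disk1_start, -- 4
       1 + length(concat(75, ',', 45, ','))                     as disk2_start  -- 7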

Related

How to assign to each row a number of times a value appears in the whole table?

I'm trying to run an SQL query on Vertica but I can't find a way to get the results I need.
Let's say I have a table showing:
productID
campaignID (ID of the sales campaign)
calendarYearWeek (calendar week when the campaign was active; usually campaigns are active for 5 days)
countryOrigin (in which country was the product sold, as it's international sales)
valueLocal (price in local currency)
What I need to do is to find products sold in different countries and compare their prices between markets.
Sometimes the campaigns are available only in one country, sometimes in more, so to avoid having hundreds of thousands of unnecessary rows that I can't compare to others, I want to distill only those products that were available in more than 1 countryOrigin.
What's important - a product can be available in different campaigns with a different price.
That's why in my SELECT statement I added a new column:
calendarYearWeek||productID||campaignID AS uniqueItem - that way I know that I'm checking the price only for a specific product in a specific campaign during a specific week of year.
The table is also joined with another table to get exchange rates etc., and the result is GROUPed BY, so in each row I have a price and an average exchange rate for a given uniqueItem in a specific country.
If I run this query, it works but even just for this year it gives me several million results, most of which I don't need because these are products sold only in one country and I need to compare prices across different markets.
So what I thought I need is to assign to each row a number of times a uniqueItem value appears in the whole table. If it's 1 - then the product is sold only in one country and I don't have to care about it. If it's 2 or 3 - this is what I need. Then I can filter out the unnecessary results in the WHERE clause ( > 1) and I can work on a smaller, better data set.
I tried different combinations of COUNT, and I tried ROW_NUMBER + OVER(PARTITION BY) (works only partially: when a product is available in 2 or more countries it counts the rows, but I cannot simply filter out "1" because then I'd lose the "first" country on the list). I thought about MATCH_RECOGNIZE, but I've never used it before and I don't think it's available in Vertica.
Sorry if it's messy, but I'm not really advanced in SQL and English is not my native language.
Do you have any ideas how to get only the data I need?
What I have now is:
SELECT
a.originCountry,
a.calendarYearWeek,
a.productID,
a.campaignId,
a.valueLocal,
ROUND(AVG(b.exchange_rate),4),
a.calendarYearWeek||a.productID||a.campaignID AS uniqueItem
FROM table1 a
LEFT JOIN table2 b
ON a.reportDate = b.reportDate
AND a.originCountry = b.originCountry
WHERE a.originCountry IN ('ES', 'DE', 'FR')
GROUP BY 3, 4, 7, 1, 5, 2
ORDER BY 3, 4, 1
----------
I need some sample data - so I make up a few rows.
You need to find the identifying grouping columns of the combinations that occur more than once, in a sub select or a common table expression, and join that back with table1.
You need to formulate the average as an OLAP function if you want the country back in the report.
WITH
-- input, don't use in final query ..
table1(originCountry,calendarYearWeek,productID,campaignId,valuelocal,reportDate) AS (
SELECT 'ES',202203,43,142,100.50, DATE '2022-01-19'
UNION ALL SELECT 'DE',202203,43,142,135.00, DATE '2022-01-19'
UNION ALL SELECT 'FR',202203,43,142, 98.75, DATE '2022-01-19'
UNION ALL SELECT 'ES',202203,44,147,198.75, DATE '2022-01-19'
UNION ALL SELECT 'DE',202203,44,147,205.00, DATE '2022-01-19'
UNION ALL SELECT 'FR',202203,44,147,198.75, DATE '2022-01-19'
UNION ALL SELECT 'es',202203,49,150, 1.25, DATE '2022-01-19'
)
,
table2(originCountry,reportDate,exchange_rate) AS (
SELECT 'ES',DATE '2022-01-19', 1
UNION ALL SELECT 'DE',DATE '2022-01-19', 1
UNION ALL SELECT 'FR',DATE '2022-01-19', 1
)
-- end of input; real query starts here, replace following comma with "WITH" ..
,
-- you need the unique ident grouping values to join with ..
selgrp AS (
SELECT
a.calendarYearWeek
, a.productID
, a.campaignId
FROM table1 a
GROUP BY
a.calendarYearWeek
, a.productID
, a.campaignId
HAVING COUNT(*) > 1
-- chk calendarYearWeek | productID | campaignId
-- chk ------------------+--------+--------
-- chk 202203 | 43 | 142
-- chk 202203 | 44 | 147
)
SELECT
a.originCountry
, a.calendarYearWeek
, a.productID
, a.campaignId
, a.valueLocal
, AVG(b.exchange_rate) OVER w::NUMERIC(9,4) AS avg_exch_rate
-- a.calendarYearWeek||a.productID||a.campaignID AS uniqueItem
FROM table1 a
JOIN selgrp USING(calendarYearWeek,productID,campaignId)
LEFT JOIN table2 b
ON a.reportDate = b.reportDate
AND a.originCountry = b.originCountry
WHERE UPPER(a.originCountry) IN ('ES', 'DE', 'FR')
WINDOW w AS (PARTITION BY a.calendarYearWeek,a.productID,a.campaignID)
ORDER BY 3, 4, 1
-- out originCountry | calendarYearWeek | productID | campaignId | valueLocal | avg_exch_rate
-- out ---------------+------------------+-----------+------------+------------+---------------
-- out DE | 202203 | 43 | 142 | 135.00 | 1.0000
-- out ES | 202203 | 43 | 142 | 100.50 | 1.0000
-- out FR | 202203 | 43 | 142 | 98.75 | 1.0000
-- out DE | 202203 | 44 | 147 | 205.00 | 1.0000
-- out ES | 202203 | 44 | 147 | 198.75 | 1.0000
-- out FR | 202203 | 44 | 147 | 198.75 | 1.0000
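As a side note (not part of the answer above), the per-row occurrence count that the question tried to build with ROW_NUMBER can also be written as an analytic COUNT and filtered in an outer query; a minimal sketch against the same made-up table1:
SELECT *
FROM (
    SELECT
        a.originCountry
      , a.calendarYearWeek
      , a.productID
      , a.campaignId
      , a.valueLocal
      -- one row per country in this sample, so the row count per combination
      -- equals the number of countries the product/campaign/week appears in
      , COUNT(*) OVER (PARTITION BY a.calendarYearWeek, a.productID, a.campaignId) AS occurrences
    FROM table1 a
) s
WHERE occurrences > 1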

Mysql: Subtraction between rows and sum with other table

I have two tables, both with a Time column of timestamp type that is filled by default when the row is created. Table1 is updated approximately every 10 seconds:
Time | Val_1a | Val_2a | Val_3a
2021-11-06 13:59:53 | 15 | 10 | 35
2021-11-06 14:00:02 | 12 | 15 | 34
.................
2021-11-06 14:05:25 | 11 | 13 | 35
2021-11-06 14:05:35 | 11 | 17 | 36
Table2 is updated every hour after mathematical operations on table1:
Time | Var_1b | Var_2b | Var_3b
2021-11-06 11:00:00 | 2 | 15 | 30
2021-11-06 12:00:00 | 8 | 12 | 32
2021-11-06 13:00:00 | 12 | 11 | 35
What I would like to get but I'm not able to do in any way, is:
Check that the last table1.Val_2a value is greater than the first table1.Val_2a value written at the beginning of the current hour (with the tables above, check whether 17 > 15). If this condition is not met, the entire query must return 0; otherwise:
2a) If the last row in table2 refers to the previous day, then the query result is simply the difference of the two table1.Val_2a values (17 - 15 = 2)
2b) Otherwise their difference is calculated as at point 2a (17-15 = 2) and it is added to the table2.Var_1b value (2 + 12 = 14)
I hope I was able to explain it clearly, and that it is all possible with a single query. Thanks everyone for the support.
Sorry for adding an answer, but I couldn't add the image in a comment.
This is the query I used to test the CASE clause:
SELECT t1.dtm, t1.Val_2a2, t1.Val_2a1,
       CASE WHEN Val_2a2 > Val_2a1
            THEN Val_2a2 - Val_2a1 ELSE 0 END AS ValF
FROM (SELECT DATE_FORMAT(time, '%Y-%m-%d %H:00:00') dtm,
             SUBSTRING_INDEX(GROUP_CONCAT(Val_2a ORDER BY time), ',', 1) Val_2a1,
             SUBSTRING_INDEX(GROUP_CONCAT(Val_2a ORDER BY time DESC), ',', 1) Val_2a2
      FROM table1
      GROUP BY dtm) t1
and this is the unexpected result:
(image: query result)
It is possible in a single query, but different people will have different methods of doing it. Whatever the method, I personally think the most important part is to keep the logic intact. The details you've provided in your question got me assuming that this might be the kind of query you're looking for:
SELECT t1.dtm, t1.Val_2a2, t1.Val_2a1, t2.Val_1b2,
CASE WHEN Val_2a2 > Val_2a1
THEN Val_2a2-Val_2a1+Val_1b2 ELSE 0 END AS ValF
FROM
(SELECT DATE_FORMAT(time, '%Y-%m-%d %H:00:00') dtm,
SUBSTRING_INDEX(GROUP_CONCAT(Val_2a ORDER BY time),',',1) Val_2a1 ,
SUBSTRING_INDEX(GROUP_CONCAT(Val_2a ORDER BY time DESC),',',1) Val_2a2
FROM table1
GROUP BY dtm) t1
LEFT JOIN
(SELECT DATE(time) dtm,
SUBSTRING_INDEX(GROUP_CONCAT(Val_1b ORDER BY time DESC),',',1) Val_1b2
FROM table2
GROUP BY dtm) t2
ON DATE(t1.dtm)=t2.dtm;
Demo fiddle
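One small caveat, not covered in the query above: when the latest table2 row is from the previous day, the LEFT JOIN finds no match and Val_1b2 is NULL, so the addition returns NULL instead of the plain difference required by point 2a. Wrapping the joined value in COALESCE covers that case (same query, only the CASE branch changes):
SELECT t1.dtm, t1.Val_2a2, t1.Val_2a1, t2.Val_1b2,
       CASE WHEN Val_2a2 > Val_2a1
            -- COALESCE falls back to 0 when no table2 row matched today's date
            THEN Val_2a2 - Val_2a1 + COALESCE(Val_1b2, 0) ELSE 0 END AS ValF
FROM
  (SELECT DATE_FORMAT(time, '%Y-%m-%d %H:00:00') dtm,
          SUBSTRING_INDEX(GROUP_CONCAT(Val_2a ORDER BY time), ',', 1) Val_2a1,
          SUBSTRING_INDEX(GROUP_CONCAT(Val_2a ORDER BY time DESC), ',', 1) Val_2a2
   FROM table1
   GROUP BY dtm) t1
LEFT JOIN
  (SELECT DATE(time) dtm,
          SUBSTRING_INDEX(GROUP_CONCAT(Val_1b ORDER BY time DESC), ',', 1) Val_1b2
   FROM table2
   GROUP BY dtm) t2
ON DATE(t1.dtm) = t2.dtm;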
Hoping it can help someone else: after some more tests, this is the final query I got, considering I just need a value on the fly without needing to store it.
Of course, any feedback from the experts is more than appreciated.
Thanks to all
SELECT
CASE WHEN
(ABS(t1.Val_2a2) - ABS(t1.Val_2a1)) BETWEEN 0 AND 30
THEN t1.Val_2a2-t1.Val_2a1+t2.Val_1b2
ELSE t2.Val_1b2
END AS My_result
FROM
(SELECT DATE_FORMAT(Time, '%Y-%m-%d %H:00:00') dtm,
(SELECT Val_2a FROM table1 WHERE Time >= DATE_FORMAT(NOW(),"%Y-%m-%d %H:00:00") ORDER BY Time LIMIT 1) Val_2a1,
(SELECT Val_2a FROM table1 WHERE Time >= DATE_FORMAT(NOW(),"%Y-%m-%d %H:00:00") ORDER BY Time DESC LIMIT 1) Val_2a2
FROM table1
GROUP BY dtm
ORDER BY Time DESC LIMIT 1) t1
LEFT JOIN
(SELECT (Time) dtm,
(Val_1b) Val_1b2
FROM table2
GROUP BY dtm ORDER BY dtm DESC LIMIT 1) t2
ON DATE(t1.dtm)= DATE(t2.dtm)

SQL Count events with duration per hour

I have data of an event with duration (say, eating a meal at a restaurant) and I want to know for any given hour how many events were taking place. The data looks like this:
Event | Start Time | End Time
-----------------------------------------
1 | 12:03 | 14:20
2 | 12:30 | 12:50
3 | 13:05 | 14:45
4 | 14:01 | 14:49
I also have "Duration" available as an alternative to "End Time". The result I'm looking for would be like this:
Hour | Count
-----------------------
12 | 2
13 | 2
14 | 3
During hour 12, there were two events happening (1 & 2), hour 13 also had two events (1 & 3) and hour 14 had three events (1, 3, & 4).
I can do this programmatically with a loop, and I can count when the events start (or end) in SQL, but I'd really like to bridge the gap and do this entirely in SQL; I can't think of a way.
One possible solution (works with MySQL v5.6+ and SQLite3):
create table hours(Hour int);
insert into hours values
(0),(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12),
(13),(14),(15),(16),(17),(18),(19),(20),(21),(22),(23);
create table log(Event int,StartTime varchar(5),EndTime varchar(5));
insert into log values
(1,'12:03','14:20'),
(2,'12:30','12:50'),
(3,'13:05','14:45'),
(4,'14:01','14:49');
-- ------------------------------------------------------------------------------
select Hour,count(Event) Count
from log join hours
on Hour between substr(StartTime,1,2) and substr(EndTime,1,2)
group by Hour;
If you are running MySQL 8.0, you could use UNION ALL, window functions and aggregation: each event contributes +1 at its start hour and -1 just after its end hour, and the running sum of those deltas gives the number of events in progress for each hour, like so:
select hr, sum(sum(cnt)) over(order by hr) cnt
from (
select hour(start_time) hr, 1 cnt from mytable
union all select hour(end_time) + 1, -1 from mytable
) t
group by hr
Demo on DB Fiddle:
hr | cnt
-: | --:
12 | 2
13 | 2
14 | 3
15 | 0
If you do not have MySQL 8, then create a table hour:
CREATE TABLE hour (
hr INT PRIMARY KEY
);
INSERT INTO hour(hr) VALUES
(0),(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),
(12),(13),(14),(15),(16),(17),(18),(19),(20),(21),(22),(23);
And then:
select h.hr, count(*) as cnt from hour h
join mytable m on h.hr between hour(m.Start_Time) and hour(m.End_Time)
group by hr
order by hr
;
See Db-Fiddle

configure query to bring rows which have more than 1 entry

How to get those entries which have more than 1 records?
If it doesn't make sense... let me explain:
From the below table I want to access the sum of the commission of all rows where type is joining and "they have more than 1 entry with same downmem_id".
I have this query, but it doesn't consider the more-than-one-entry scenario...
$search = "SELECT sum(commission) as income FROM `$database`.`$memcom` where type='joining'";
Here's the table:
id mem_id commission downmem_id type time
2 1 3250 2 joining 2019-09-22 13:24:40
3 45 500 2 egbvegr new time
4 32 20 2 vnsjkdv other time
5 23 2222 2 vfdvfvf some other time
6 43 42 3 joining time
7 32 353 5 joining time
8 54 35 5 vsdvsdd time
Here's the expected result: it should be the sum of the rows with id 2 and 7 only,
i.e. 3250 + 353 = 3603.
It shouldn't include id no 6 because it has only 1 row with the same downmem_id.
Please help me to make this query.
Another approach is two levels of aggregation:
select sum(t.commission) income
from (select sum(case when type = 'joining' then commission end) as commission
from t
group by downmem_id
having count(*) > 1
) t;
The main advantage to this approach is that this more readily supports more complex conditions on the other members of each group -- such as at most one "joining" record or both "joining" records and no more than two "vnsjkdv" records.
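For instance (a hypothetical variant, not from the original answer), restricting each group to exactly one "joining" row and at most two "vnsjkdv" rows only needs extra HAVING conditions:
select sum(t.commission) income
from (select sum(case when type = 'joining' then commission end) as commission
      from t
      group by downmem_id
      having count(*) > 1
         and sum(type = 'joining') = 1     -- exactly one "joining" row in the group
         and sum(type = 'vnsjkdv') <= 2    -- and no more than two "vnsjkdv" rows
     ) t;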
Use EXISTS:
select sum(t.commission) income
from tablename t
where t.type = 'joining'
and exists (
select 1 from tablename
where id <> t.id and downmem_id = t.downmem_id
)
See the demo.
Results:
| income |
| ----- |
| 3603 |
You can use a subquery that finds all downmem_id values having more than one occurrence in the table.
SELECT Sum(commission) AS income
FROM tablename
WHERE type = 'joining'
AND downmem_id IN (SELECT downmem_id
FROM tablename t
GROUP BY downmem_id
HAVING Count(id) > 1);
DEMO

SQL SUM + Distinct

I want to know the query to display the sum of the amounts of the various clients without counting duplicates, using the SUM function and DISTINCT.
I used:
SELECT DISTINCT id_721z, SUM(montant) AS somme_montant
FROM `roue_ventes_cb`
WHERE `date_transaction` between '2015/01/01' and '2015/01/21';
But the result is not displayed correctly. I have this data:
id_721z | montant
1 | 15
1 | 15
2 | 22
2 | 22
2 | 22
I would like to get total_montant = 37 (15 + 22), not:
id_721z | montant
1 | 30
2 | 66
SELECT SUM(montant) AS somme_montant
FROM (
SELECT DISTINCT id_721z, montant
FROM `roue_ventes_cb`
WHERE `date_transaction` between '2015/01/01' and '2015/01/21'
) AS t
Alternatively you could use SUM(DISTINCT montant), as below. This sums all the different montant values, but if two ids happen to have the same montant it will only be counted once:
SELECT id_721z, SUM(DISTINCT montant) AS somme_montant
FROM `roue_ventes_cb`
WHERE `date_transaction` between '2015/01/01' and '2015/01/21';
So I would prefer emiro's answer in any case. It is safer, and DISTINCT carries a performance penalty anyway.