MySQL select/delete every row except the first one per day

My problem is the following:
I have a database that receives reports from a server and saves the data in the reports table.
I want to select and delete every report made on the same day, except for the first one.
I've already tried to select the reports that are on the same day:
WITH res as (
select
cis_anlagen.name as plant,
ReportTimestamp,
LAG(ReportTimestamp, 1) OVER (
partition by cis_anlagen.name
ORDER BY ReportTimestamp
) prevTime
from reports
inner join hosts_to_apps using (HostToAppId)
join hosts using(HostId)
Left join cis_anlagen on hosts.anId = cis_anlagen.anId )
select
plant,
ReportTimestamp,
prevTime
from res
where DATEDIFF(ReportTimestamp, prevTime) = 0;
This gives me every report made on the same day, but I still need to exclude the first one.

I want to delete ... every report made on the same day, except for the first one
DELETE t1
FROM reports t1
JOIN reports t2 ON t1.HostToAppId = t2.HostToAppId
AND DATE(t1.ReportTimestamp) = DATE(t2.ReportTimestamp)
WHERE t1.ReportTimestamp > t2.ReportTimestamp
I.e. we delete a row when another row with the same HostToAppId and the same date but a smaller (earlier) ReportTimestamp exists.
If two or more rows for the same HostToAppId have exactly the same ReportTimestamp, and it is the earliest one of that day, all of them are kept.
I want to ... select every report made on the same day, except for the first one.
SELECT DISTINCT t1.*
FROM reports t1
JOIN reports t2 ON t1.HostToAppId = t2.HostToAppId
AND DATE(t1.ReportTimestamp) = DATE(t2.ReportTimestamp)
WHERE t1.ReportTimestamp > t2.ReportTimestamp
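To see what the self-join actually matches, here is a minimal sketch that runs it in SQLite through Python's sqlite3 module, as a stand-in for MySQL. The sample rows and the id column are made up. SQLite has no multi-table `DELETE t1 FROM ... JOIN`, so the sketch deletes by id using the same join. It also shows why DISTINCT matters in the SELECT: a report with two earlier reports on the same day joins twice.

```python
import sqlite3

# Hypothetical sample rows: (id, HostToAppId, ReportTimestamp).
# ISO-8601 strings compare correctly as text, so < and > work on them.
rows = [
    (1, 10, "2021-03-01 08:00:00"),
    (2, 10, "2021-03-01 09:00:00"),
    (3, 10, "2021-03-01 10:00:00"),
    (4, 10, "2021-03-02 07:30:00"),
    (5, 20, "2021-03-01 11:00:00"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reports (id INTEGER PRIMARY KEY, HostToAppId INTEGER, ReportTimestamp TEXT)"
)
conn.executemany("INSERT INTO reports VALUES (?, ?, ?)", rows)

# t1 qualifies when an earlier report t2 exists for the same HostToAppId
# on the same calendar day, i.e. t1 is not the first report of that day.
join_sql = """
SELECT {cols}
FROM reports t1
JOIN reports t2 ON t1.HostToAppId = t2.HostToAppId
 AND DATE(t1.ReportTimestamp) = DATE(t2.ReportTimestamp)
WHERE t1.ReportTimestamp > t2.ReportTimestamp
"""

# Without DISTINCT, report 3 appears twice (it has two earlier reports).
raw_matches = len(conn.execute(join_sql.format(cols="t1.id")).fetchall())
to_delete = sorted(r[0] for r in conn.execute(join_sql.format(cols="DISTINCT t1.id")))

# SQLite has no multi-table DELETE, so delete by id via the same join.
delete_sql = "DELETE FROM reports WHERE id IN (" + join_sql.format(cols="t1.id") + ")"
conn.execute(delete_sql)
remaining = [r[0] for r in conn.execute("SELECT id FROM reports ORDER BY id")]
```

After running, only the first report per HostToAppId and day (ids 1, 4 and 5) is left.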

Instead of using LAG, you could use row_number.
If you use ROW_NUMBER() partitioned by the date (not the whole timestamp, just the day) and by HostToAppId (which I am assuming is a unique identifier of some kind), the reports are numbered per HostToAppId and per day, which lets you exclude anything whose row number is not 1.
I can't test it right now, but I would go with something like this:
WITH res as (
select
cis_anlagen.name as plant,
ReportTimestamp,
row_number() OVER (
partition by cis_anlagen.name, date(ReportTimestamp)
ORDER BY ReportTimestamp
) rn
from reports
inner join hosts_to_apps using (HostToAppId)
join hosts using(HostId)
Left join cis_anlagen on hosts.anId = cis_anlagen.anId )
select
plant,
ReportTimestamp,
rn from res where rn <> 1;
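A minimal sketch of the ROW_NUMBER() idea, run in SQLite (3.25 or newer, for window functions) through Python's sqlite3 module as a stand-in for MySQL 8.0. To keep it self-contained, the plant name is stored directly in a hypothetical reports table instead of being joined through hosts_to_apps, hosts and cis_anlagen; the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reports (id INTEGER PRIMARY KEY, plant TEXT, ReportTimestamp TEXT)")
conn.executemany(
    "INSERT INTO reports VALUES (?, ?, ?)",
    [
        (1, "A", "2021-03-01 08:00:00"),
        (2, "A", "2021-03-01 12:00:00"),
        (3, "A", "2021-03-02 09:00:00"),
        (4, "B", "2021-03-01 07:00:00"),
        (5, "B", "2021-03-01 07:30:00"),
    ],
)

# Number the reports per plant and per calendar day; rn = 1 is the first one,
# so rn <> 1 keeps every later report of the same day.
sql = """
WITH res AS (
  SELECT id, plant, ReportTimestamp,
         ROW_NUMBER() OVER (
           PARTITION BY plant, DATE(ReportTimestamp)
           ORDER BY ReportTimestamp
         ) AS rn
  FROM reports
)
SELECT id, plant, ReportTimestamp, rn FROM res WHERE rn <> 1 ORDER BY id
"""
later_reports = conn.execute(sql).fetchall()
```

Unlike the self-join, each row is numbered exactly once, so no DISTINCT is needed; ties on the same timestamp get arbitrary but distinct numbers, so only one of them keeps rn = 1.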

Why all the joins? You want to delete everything except the oldest row per day, so delete all rows for which another row exists on the same day but at an earlier time:
delete from reports
where exists
(
select null
from (select * from reports) older
where date(older.reporttimestamp) = date(reports.reporttimestamp)
and older.reporttimestamp < reports.reporttimestamp
);
Demo: https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=c8a941f3813ae60e14e32dadef46b361
You may wonder about (select * from reports) older. This is because of a MySQL peculiarity that forbids selecting directly from the same table you are deleting from. In other DBMSs it would simply be from reports older.
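A minimal sketch of the correlated `DELETE ... WHERE EXISTS`, run in SQLite through Python's sqlite3 module. SQLite is one of the "other DBMS" mentioned above, so it uses the plain `from reports older` form without the derived-table wrapper MySQL needs. As in the answer, rows are compared across the whole table, not per host; the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reports (id INTEGER PRIMARY KEY, reporttimestamp TEXT)")
conn.executemany(
    "INSERT INTO reports VALUES (?, ?)",
    [
        (1, "2021-03-01 08:00:00"),
        (2, "2021-03-01 09:15:00"),
        (3, "2021-03-02 06:00:00"),
        (4, "2021-03-02 06:00:00"),  # same minimal timestamp as id 3: both survive
        (5, "2021-03-02 18:00:00"),
    ],
)

# Delete a row when an earlier row exists on the same day. Unlike MySQL,
# SQLite lets the subquery read the table being deleted from directly.
conn.execute("""
DELETE FROM reports
WHERE EXISTS (
  SELECT NULL
  FROM reports older
  WHERE DATE(older.reporttimestamp) = DATE(reports.reporttimestamp)
    AND older.reporttimestamp < reports.reporttimestamp
)
""")
kept = [row[0] for row in conn.execute("SELECT id FROM reports ORDER BY id")]
```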

Related

How to aggregate SQL query and display result in a different column?

I have a MySQL query that pulls a set of records from the database. I want to aggregate them in a slightly different way for use in my application. When the record set contains duplicate rows with the same ticker value, the query should sum est_units and est_trans_value and display the results in new columns, total_est_units and total_est_trans_value. If there is no duplicate with the same ticker value, it should display est_units as total_est_units and est_trans_value as total_est_trans_value. How can I do this? Can you please help me modify this query?
SQL:
SELECT
oc.*
FROM
order_confirm_daily oc
INNER JOIN
(SELECT
id, ticker, MAX(est_order_time) AS mts
FROM
order_confirm_daily
WHERE DATE(est_order_time) LIKE '2021-04-26%'
GROUP BY ticker) ds ON ds.ticker = oc.ticker
AND oc.est_order_time = ds.mts;
Sample Data:
Desired results: two new derived columns, "total_est_units" and "total_est_trans_value", which display the sum of est_units and est_trans_value respectively when multiple rows share the same ticker (here "TNA", highlighted in the screenshot).
I see. You just want window functions:
select oc.*,
sum(est_units) over (partition by ticker) as total_est_units,
sum(est_trans_value) over (partition by ticker) as total_est_trans_value
from order_confirm_daily oc;
EDIT:
In older versions of MySQL, you would use JOIN and GROUP BY:
select *
from order_confirm_daily oc join
(select ticker, sum(est_units) as total_est_units,
sum(est_trans_value) as total_est_trans_value
from order_confirm_daily oc
group by ticker
) oct
using (ticker);
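A minimal sketch, run in SQLite (3.25 or newer) through Python's sqlite3 module as a stand-in for MySQL, checking that the window-function version and the JOIN/GROUP BY version return the same per-row totals. The rows are made-up values, since the question's sample data was only a screenshot. A ticker that occurs once simply gets its own values as totals, which matches the requirement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE order_confirm_daily "
    "(id INTEGER PRIMARY KEY, ticker TEXT, est_units INTEGER, est_trans_value REAL)"
)
conn.executemany(
    "INSERT INTO order_confirm_daily VALUES (?, ?, ?, ?)",
    [(1, "TNA", 100, 1500.0), (2, "TNA", 50, 760.0), (3, "SPY", 10, 4100.0)],
)

# Window version: every row keeps its own columns plus the per-ticker sums.
window_sql = """
SELECT id, ticker,
       SUM(est_units) OVER (PARTITION BY ticker) AS total_est_units,
       SUM(est_trans_value) OVER (PARTITION BY ticker) AS total_est_trans_value
FROM order_confirm_daily
ORDER BY id
"""

# Older-MySQL version: aggregate per ticker, then join back to every row.
join_sql = """
SELECT oc.id, oc.ticker, oct.total_est_units, oct.total_est_trans_value
FROM order_confirm_daily oc
JOIN (SELECT ticker, SUM(est_units) AS total_est_units,
             SUM(est_trans_value) AS total_est_trans_value
      FROM order_confirm_daily
      GROUP BY ticker) oct
  USING (ticker)
ORDER BY oc.id
"""

window_rows = conn.execute(window_sql).fetchall()
join_rows = conn.execute(join_sql).fetchall()
```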

Table is specified twice, both as a target for 'UPDATE' and as a separate source for data in mysql

I have the query below in MySQL, where I want to check whether the branch id and year of the finance type in branch_master equal the branch id and year in manager, and if so, update the status in the manager table for that branch id.
UPDATE manager as m1
SET m1.status = 'Y'
WHERE m1.branch_id IN (
SELECT m2.branch_id FROM manager as m2
WHERE (m2.branch_id,m2.year) IN (
(
SELECT DISTINCT branch_id,year
FROM `branch_master`
WHERE type = 'finance'
)
)
)
but I am getting the error
Table 'm1' is specified twice, both as a target for 'UPDATE' and as a separate source for data
This is a typical MySQL thing and can usually be circumvented by selecting from the table as a derived table, i.e. instead of
FROM manager AS m2
use
FROM (select * from manager) AS m2
The complete statement:
UPDATE manager
SET status = 'Y'
WHERE branch_id IN
(
select branch_id
FROM (select * from manager) AS m2
WHERE (branch_id, year) IN
(
SELECT branch_id, year
FROM branch_master
WHERE type = 'finance'
)
);
The correct answer is in this SO post.
The problem with the accepted answer here is, as already mentioned several times, that it creates a full copy of the whole table. This is far from optimal and is the most space-hungry option. The idea is to materialize only the subset of data used for the update, so in your case it would look like this:
UPDATE manager as m1
SET m1.status = 'Y'
WHERE m1.branch_id IN (
SELECT * FROM(
SELECT m2.branch_id FROM manager as m2
WHERE (m2.branch_id,m2.year) IN (
SELECT DISTINCT branch_id,year
FROM `branch_master`
WHERE type = 'finance')
) t
)
Basically, you just wrap your previous source-of-data query inside
SELECT * FROM (...) t
Try to use the EXISTS operator:
UPDATE manager as m1
SET m1.status = 'Y'
WHERE EXISTS (SELECT 1
FROM (SELECT m2.branch_id
FROM branch_master AS bm
JOIN manager AS m2
WHERE bm.type = 'finance' AND
bm.branch_id = m2.branch_id AND
bm.year = m2.year) AS t
WHERE t.branch_id = m1.branch_id);
Note: The query uses an additional nesting level, as proposed by @Thorsten, as a means to circumvent the Table is specified twice error.
Demo here
Try:
UPDATE manager as m1
SET m1.status = 'Y'
WHERE m1.branch_id IN (
(SELECT DISTINCT branch_id
FROM branch_master
WHERE type = 'finance'))
AND m1.year IN ((SELECT DISTINCT year
FROM branch_master
WHERE type = 'finance'))
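Note that two independent IN lists are not the same filter as matching the (branch_id, year) pair: they accept any finance branch_id combined with any finance year. A minimal sketch in SQLite (3.15 or newer, for row values) through Python's sqlite3 module, with made-up rows, compares which manager rows each WHERE clause would select for the UPDATE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE manager (branch_id INTEGER, year INTEGER, status TEXT DEFAULT 'N');
CREATE TABLE branch_master (branch_id INTEGER, year INTEGER, type TEXT);
INSERT INTO branch_master VALUES (1, 2020, 'finance'), (2, 2021, 'finance');
INSERT INTO manager (branch_id, year) VALUES (1, 2020), (1, 2021), (2, 2020);
""")

def matched(where_clause):
    # Return the (branch_id, year) rows of manager that satisfy where_clause.
    sql = f"SELECT branch_id, year FROM manager WHERE {where_clause} ORDER BY branch_id, year"
    return conn.execute(sql).fetchall()

# Two independent IN lists: any finance branch_id with any finance year.
independent = matched(
    "branch_id IN (SELECT branch_id FROM branch_master WHERE type = 'finance') "
    "AND year IN (SELECT year FROM branch_master WHERE type = 'finance')"
)

# Row-value IN: the (branch_id, year) pair itself must be a finance pair.
paired = matched(
    "(branch_id, year) IN (SELECT branch_id, year FROM branch_master WHERE type = 'finance')"
)
```

Here the independent version also picks (1, 2021) and (2, 2020), which are not finance pairs in branch_master.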
The problem I had with the accepted answer is that it creates a copy of the whole table, which wasn't an option for me: I tried to execute it, but after several hours I had to cancel it.
A very fast way, if you have a huge amount of data, is to create a temporary table:
Create TMP table
CREATE TEMPORARY TABLE tmp_manager
(branch_id bigint not null,
year datetime null);
Populate TMP table
insert into tmp_manager (branch_id, year)
select m.branch_id, m.year
from manager as m
inner join branch_master as b on b.branch_id = m.branch_id and b.year = m.year
where b.type = 'finance';
Update with join
UPDATE manager as m
inner JOIN tmp_manager as tmp_m on m.branch_id = tmp_m.branch_id and m.year = tmp_m.year
SET m.status = 'Y';
This is by far the fastest way:
UPDATE manager m
INNER JOIN branch_master b on m.branch_id=b.branch_id AND m.year=b.year
SET m.status='Y'
WHERE b.type='finance'
Note that if it is a 1:n relationship the SET command will be run more than once. In this case that is no problem. But if you have something like "SET price=price+5" you cannot use this construction.
Maybe not a solution, but some thoughts about why it doesn't work in the first place:
Reading data from a table and also writing data into that same table is a somewhat ill-defined task. In what order should the data be read and written? Should newly written data be considered when reading it back from the same table? MySQL refusing to execute this isn't just because of a limitation, it's because it's not a well-defined task.
The solutions involving SELECT ... FROM (SELECT * FROM table) AS tmp just dump the entire content of a table into a temporary table, which can then be used in any further outer queries, like for example an update query. This forces the order of operations to be: Select everything first into a temporary table and then use that data (instead of the data from the original table) to do the updates.
However if the table involved is large, then this temporary copying is going to be incredibly slow. No indexes will ever speed up SELECT * FROM table.
I might have a slow day today... but isn't the original query identical to this one, which shouldn't have any problems?
UPDATE manager as m1
SET m1.status = 'Y'
WHERE (m1.branch_id, m1.year) IN (
SELECT DISTINCT branch_id,year
FROM `branch_master`
WHERE type = 'finance'
)
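The original query and this simplified one are not always identical: the original updates every manager row whose branch_id has at least one finance (branch_id, year) match, whatever that row's own year is, while the simplified one updates only the matching (branch_id, year) rows. They can differ when a branch_id appears in manager with several years. A minimal sketch in SQLite (which has no "specified twice" restriction, so the original runs as written, minus the m1 alias) through Python's sqlite3 module, with made-up rows:

```python
import sqlite3

SETUP = """
CREATE TABLE manager (branch_id INTEGER, year INTEGER, status TEXT DEFAULT 'N');
CREATE TABLE branch_master (branch_id INTEGER, year INTEGER, type TEXT);
INSERT INTO branch_master VALUES (1, 2020, 'finance'), (2, 2020, 'retail');
INSERT INTO manager (branch_id, year) VALUES (1, 2020), (1, 2021), (2, 2020);
"""

# The question's query: filters on branch_id only in the outer WHERE.
ORIGINAL = """
UPDATE manager SET status = 'Y'
WHERE branch_id IN (
  SELECT m2.branch_id FROM manager AS m2
  WHERE (m2.branch_id, m2.year) IN (
    SELECT DISTINCT branch_id, year FROM branch_master WHERE type = 'finance'))
"""

# The simplified query: filters on the (branch_id, year) pair.
SIMPLIFIED = """
UPDATE manager SET status = 'Y'
WHERE (branch_id, year) IN (
  SELECT DISTINCT branch_id, year FROM branch_master WHERE type = 'finance')
"""

def updated_rows(update_sql):
    # Run the UPDATE on a fresh in-memory database and report which rows got 'Y'.
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    conn.execute(update_sql)
    return conn.execute(
        "SELECT branch_id, year FROM manager WHERE status = 'Y' ORDER BY branch_id, year"
    ).fetchall()

original_result = updated_rows(ORIGINAL)
simplified_result = updated_rows(SIMPLIFIED)
```

Which behaviour is wanted depends on whether "update status against branch id" means all years of that branch or only the matching year.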

Getting previous row in MySQL

I'm stuck on a MySQL problem that I haven't been able to solve yet. I have the following query, which returns the month-year and the number of new users for each period on my platform:
select
u.period ,
u.count_new as new_users
from
(select DATE_FORMAT(u.registration_date,'%Y-%m') as period, count(distinct u.id) as count_new from users u group by DATE_FORMAT(u.registration_date,'%Y-%m')) u
order by period desc;
The result is the table:
period,new_users
2016-10,103699
2016-09,149001
2016-08,169841
2016-07,150672
2016-06,148920
2016-05,160206
2016-04,147715
2016-03,173394
2016-02,157743
2016-01,173013
So, for each month-year, I need to calculate the difference between its count and the count of the previous month-year. I need a result table like this:
period,new_users
2016-10,calculate(103699 - 149001)
2016-09,calculate(149001- 169841)
2016-08,calculate(169841- 150672)
2016-07,So on...
2016-06,...
2016-05,...
2016-04,...
2016-03,...
2016-02,...
2016-01,...
Any ideas? =/
Thanks
You should be able to use an approach similar to one I posted in another SO question. You are on a good track to start: your query already gets the counts and orders them in the final direction you need. By using inline MySQL user variables, you can keep a holding column with the previous record's value, use it as the basis for computing the next result, and then set the variable to the new value for each subsequent cycle.
The JOIN to the SqlVars alias does not have any "ON" condition, as SqlVars returns only a single row and therefore does not multiply the rows of the result.
select
u.period,
if( @prevCount = -1, 0, u.count_new - @prevCount ) as new_users,
@prevCount := u.count_new as HoldColumnForNextCycle
from
( select
DATE_FORMAT(u.registration_date,'%Y-%m') as period,
count(distinct u.id) as count_new
from
users u
group by
DATE_FORMAT(u.registration_date,'%Y-%m') ) u
JOIN ( select @prevCount := -1 ) as SqlVars
order by
u.period desc;
You may have to play with it a little, as there is no "starting" point in counts, so the first entry in either sort direction may look strange. I am starting the @prevCount variable at -1, so the first record processed gets a new-user count of 0 in the new_users column. Then, whatever the distinct new-user count was for that record, I assign it back to @prevCount as the basis for the next record processed. Yes, it is an extra column in the result set; it can be ignored, but it is needed. Again, it is just a per-line placeholder, and you can see in the result how it gets its value as each line progresses...
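As an alternative to user variables (whose evaluation order MySQL does not guarantee), MySQL 8.0 and later offer the LAG() window function. This swapped-in technique is sketched here in SQLite (3.25 or newer) through Python's sqlite3 module, using the monthly counts from the question's result table; the monthly_counts table is a hypothetical stand-in for the grouped subquery u.

```python
import sqlite3

# Monthly counts taken from the question's result table.
monthly = [
    ("2016-01", 173013), ("2016-02", 157743), ("2016-03", 173394),
    ("2016-04", 147715), ("2016-05", 160206), ("2016-06", 148920),
    ("2016-07", 150672), ("2016-08", 169841), ("2016-09", 149001),
    ("2016-10", 103699),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE monthly_counts (period TEXT, count_new INTEGER)")
conn.executemany("INSERT INTO monthly_counts VALUES (?, ?)", monthly)

# LAG() reads count_new from the previous period in ascending order;
# the first period has no predecessor, so its difference is NULL.
rows = conn.execute("""
SELECT period,
       count_new - LAG(count_new) OVER (ORDER BY period) AS new_users
FROM monthly_counts
ORDER BY period DESC
""").fetchall()
```

The first rows come out as 2016-10 with -45302 (103699 - 149001) and 2016-09 with -20840 (149001 - 169841), matching the desired table; the window's ORDER BY is independent of the final display order.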
I would create a temp table with two columns and then fill it using a cursor that does something like this (I don't remember the exact syntax, so this is just pseudo-code):
@val = CURSOR.col2 - (select col2 from OriginalTable t2 where t2.Period = (CURSOR.Period - 1))
INSERT INTO tmpTable (Period, NewUsers) VALUES (CURSOR.Period, @val)
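The cursor idea can be sketched procedurally in Python with the sqlite3 module, standing in for a MySQL stored procedure. The loop plays the role of the cursor and remembers the previous row's count instead of running a subquery per row. Assumptions: "previous period" means the preceding row in ascending order (subtracting 1 from a 'YYYY-MM' string does not work directly), and OriginalTable/tmpTable reuse the hypothetical names from the pseudo-code, filled with three months from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE OriginalTable (Period TEXT, col2 INTEGER)")
conn.executemany(
    "INSERT INTO OriginalTable VALUES (?, ?)",
    [("2016-08", 169841), ("2016-09", 149001), ("2016-10", 103699)],
)
conn.execute("CREATE TEMP TABLE tmpTable (Period TEXT, NewUsers INTEGER)")

# Walk the periods in ascending order, like a cursor, keeping the previous
# row's count; the first period has nothing to compare with, so it gets NULL.
prev = None
periods = conn.execute("SELECT Period, col2 FROM OriginalTable ORDER BY Period").fetchall()
for period, count in periods:
    val = None if prev is None else count - prev
    conn.execute("INSERT INTO tmpTable (Period, NewUsers) VALUES (?, ?)", (period, val))
    prev = count

result = conn.execute("SELECT Period, NewUsers FROM tmpTable ORDER BY Period DESC").fetchall()
```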

MySQL How can I select the time difference between two log entries over a set?

I have a large set of log entries that I need to do an analysis on. What I want to do is select the time difference between the start and the complete entry, for each set of logs. How would I go about doing this?
Sample Log Set:
ExecuteTime, Entry, TrackKey
7.2408730984, Start update, 487996b0608b0006bfbe5501217c76bde879b728
9.9038851261, Complete update, 487996b0608b0006bfbe5501217c76bde879b728
I assume the start time is the minimum and the end time is the maximum. If that isn't the case, this query will give wrong results.
select TrackKey, (MAX(ExecuteTime) - MIN(ExecuteTime)) as time_difference
FROM table
WHERE Entry IN('Start update', 'Complete update')
GROUP BY TrackKey
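A minimal sketch of the MAX - MIN query, run in SQLite through Python's sqlite3 module with the sample log set from the question. The table name logs is hypothetical (the answer's `table` is a placeholder and a reserved word).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (ExecuteTime REAL, Entry TEXT, TrackKey TEXT)")
key = "487996b0608b0006bfbe5501217c76bde879b728"
conn.executemany(
    "INSERT INTO logs VALUES (?, ?, ?)",
    [(7.2408730984, "Start update", key), (9.9038851261, "Complete update", key)],
)

# One row per TrackKey: latest minus earliest ExecuteTime among the two entries.
rows = conn.execute("""
SELECT TrackKey, MAX(ExecuteTime) - MIN(ExecuteTime) AS time_difference
FROM logs
WHERE Entry IN ('Start update', 'Complete update')
GROUP BY TrackKey
""").fetchall()
```

For the sample set the difference is about 2.6630120277, subject to floating-point rounding.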
Untested:
SELECT * FROM Table AS t1 LEFT JOIN (SELECT TrackKey, MIN(ExecuteTime) AS Min, MAX(ExecuteTime) AS Max FROM Table GROUP BY TrackKey) AS t2 ON t1.TrackKey = t2.TrackKey
Updated:
SELECT
t1.TrackKey,
(t3.End - t2.Start) AS Total
FROM Table AS t1
LEFT JOIN
(
SELECT
TrackKey,
ExecuteTime AS Start
FROM
Table
WHERE
Entry = "Start update"
) AS t2 ON t1.TrackKey = t2.TrackKey
LEFT JOIN
(
SELECT
TrackKey,
ExecuteTime AS End
FROM
Table
WHERE
Entry = "Complete update"
) AS t3 ON t1.TrackKey = t3.TrackKey
WHERE t1.Entry = "Start update"
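Pairing the start row with the completion row, instead of using MAX - MIN, does not depend on the start being the smallest time. A condensed, hedged sketch of that pairing in SQLite through Python's sqlite3 module: it starts directly from the "Start update" rows rather than from a third copy of the table, uses the hypothetical table name logs and the aliases start_time/end_time, and adds a made-up TrackKey "b" that never completed to show that the LEFT JOIN yields NULL for it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (ExecuteTime REAL, Entry TEXT, TrackKey TEXT)")
conn.executemany(
    "INSERT INTO logs VALUES (?, ?, ?)",
    [
        (7.2408730984, "Start update", "a"),
        (9.9038851261, "Complete update", "a"),
        (3.5, "Start update", "b"),  # never completed
    ],
)

# Pair each start row with the completion row of the same TrackKey.
rows = conn.execute("""
SELECT s.TrackKey, c.end_time - s.start_time AS Total
FROM (SELECT TrackKey, ExecuteTime AS start_time
      FROM logs WHERE Entry = 'Start update') AS s
LEFT JOIN (SELECT TrackKey, ExecuteTime AS end_time
           FROM logs WHERE Entry = 'Complete update') AS c
  ON s.TrackKey = c.TrackKey
ORDER BY s.TrackKey
""").fetchall()
```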

MySQL: how to change the following query to update each row of a table?

I have an event occurring once a day. I have 2 tables:
application
rating
Basically, each application has an avg_score that is the average of all the feedback scores given by users, which are stored in the score field of the rating table. I wrote an event that refreshes this value once a day:
CREATE EVENT MY_DAILY_UPDATE
ON SCHEDULE EVERY 1 DAY STARTS '2011-07-23 23:30:00'
DO
UPDATE application
SET `avg_score`= (SELECT AVG(`score`) as new_score
FROM `rating`
WHERE `ID_APPLICATION` = 1)
WHERE `APPLICATION_ID` = 1
It works, but only for the application with ID = 1, because I hard-coded that value.
Instead, I need my query to update the avg_score field for each application in the application table.
So I think I need to replace the value 1 with a variable ID (e.g. WHERE APPLICATION_ID = ID_VARIABLE), and this variable should take the id value of each app in the application table (1, 2, 3, 4, etc.), but I have no idea how to change my query.
Change your sub-query to reference the values in the outer query. (This makes it a correlated sub-query.)
UPDATE application
SET avg_score = (
SELECT AVG(score)
FROM rating
WHERE ID_APPLICATION = application.APPLICATION_ID
)
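A minimal sketch of the correlated UPDATE, run in SQLite through Python's sqlite3 module as a stand-in for the statement inside the MySQL event; the rows are made up. It also shows a side effect worth knowing: an application with no ratings gets NULL, because AVG over an empty set is NULL. An UPDATE with an inner join on the grouped ratings would leave such an application untouched instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE application (APPLICATION_ID INTEGER PRIMARY KEY, avg_score REAL);
CREATE TABLE rating (ID_APPLICATION INTEGER, score INTEGER);
INSERT INTO application (APPLICATION_ID, avg_score) VALUES (1, NULL), (2, NULL), (3, 4.0);
INSERT INTO rating VALUES (1, 4), (1, 5), (2, 2);
""")

# The subquery refers to application.APPLICATION_ID of the row being updated,
# so it is evaluated once per application row.
conn.execute("""
UPDATE application
SET avg_score = (
  SELECT AVG(score) FROM rating
  WHERE ID_APPLICATION = application.APPLICATION_ID
)
""")
scores = conn.execute(
    "SELECT APPLICATION_ID, avg_score FROM application ORDER BY APPLICATION_ID"
).fetchall()
```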
Alternatively, as you're doing this for "all values", just join on the sub-query...
UPDATE
application
INNER JOIN
(
SELECT ID_APPLICATION, AVG(score) AS score FROM rating GROUP BY ID_APPLICATION
)
AS averages
ON averages.ID_APPLICATION = application.APPLICATION_ID
SET
application.avg_score = averages.score
UPDATE application
SET `avg_score`=
(SELECT AVG(`score`) as new_score
FROM `rating`
WHERE `ID_APPLICATION` = `application`.`APPLICATION_ID`)