MySQL: Select value of current and next row

I am trying to learn a better way to achieve the desired result of a SELECT query (details below). Thank you in advance.
MySQL version: 5.7
Table:
id int(11)
product_number int(8)
service_group int(4)
datetime datetime
value int(6)
All columns except value are indexed.
The MySQL table has the following data:
id,product_number,service_group,datetime,value
1,1234,1,2022-02-10 00:00:00,0
2,1234,1,2022-02-10 00:01:30,25
3,1234,1,2022-02-10 00:02:30,11
4,1234,2,2022-02-10 01:00:30,0
5,1234,2,2022-02-10 01:01:30,65
6,1234,2,2022-02-10 01:02:30,55
In essence, the value for each product within a service group is recorded against the wrong row: the correct value for the "current" row is actually recorded against the next row for the same product within the same service group, and the last row of each group should get 0. The correct output should look like this:
id,product_number,service_group,datetime,value
1,1234,1,2022-02-10 00:00:00,25
2,1234,1,2022-02-10 00:01:30,11
3,1234,1,2022-02-10 00:02:30,0
4,1234,2,2022-02-10 01:00:30,65
5,1234,2,2022-02-10 01:01:30,55
6,1234,2,2022-02-10 01:02:30,0
The query below returns the correct results, but it seems hugely inefficient. What would be a better way to do this in MySQL? Thank you.
SELECT
a.id,
a.product_number,
a.service_group,
a.datetime,
COALESCE((
SELECT b.value FROM products b
WHERE b.product_number = a.product_number AND b.service_group = a.service_group
AND b.datetime > a.datetime
ORDER BY b.datetime ASC
LIMIT 1
), 0) AS value
FROM products a;

If there are no skipped ids (the numbers are in sequence), you could probably use a simple SELECT like one of the following:
1.
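SELECT
a.id,
a.product_number,
a.service_group,
a.datetime,
COALESCE((SELECT b.value FROM products b
WHERE b.id = a.id + 1
AND b.product_number = a.product_number
AND b.service_group = a.service_group), 0) AS value
FROM products a;
2.
SELECT
a.id,
a.product_number,
a.service_group,
a.datetime,
COALESCE(b.value, 0) AS value
FROM products a
LEFT JOIN products b ON b.id = a.id + 1
AND b.product_number = a.product_number
AND b.service_group = a.service_group;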
Note that both query 1 and query 2 assume that id is the primary key, since it appears to be an incrementing value.
Either way, run EXPLAIN on each query so you can analyze which one is the most efficient.
More importantly, if the data is "wrongly recorded", I suggest fixing it: put your service into maintenance mode and correct the data with an UPDATE query.
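A minimal sketch of the consecutive-id join (approach 2), assuming Python's built-in sqlite3 module as a stand-in for MySQL (the join syntax is the same in both) and the sample rows from the question. It shows why the join is a LEFT JOIN restricted to the same product and service group: an INNER JOIN on b.id = a.id + 1 loses the last row, and without the group conditions the last row of one group would pair with the first row of the next.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL products table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (id INTEGER PRIMARY KEY, product_number INTEGER,"
    " service_group INTEGER, datetime TEXT, value INTEGER)"
)
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1234, 1, "2022-02-10 00:00:00", 0),
        (2, 1234, 1, "2022-02-10 00:01:30", 25),
        (3, 1234, 1, "2022-02-10 00:02:30", 11),
        (4, 1234, 2, "2022-02-10 01:00:30", 0),
        (5, 1234, 2, "2022-02-10 01:01:30", 65),
        (6, 1234, 2, "2022-02-10 01:02:30", 55),
    ],
)

# An INNER JOIN on consecutive ids drops row 6, which has no row 7.
inner_count = conn.execute(
    "SELECT COUNT(*) FROM products a INNER JOIN products b ON b.id = a.id + 1"
).fetchone()[0]

# LEFT JOIN keeps every row; the extra conditions stop values leaking across
# product/service-group boundaries, and COALESCE maps "no next row" to 0.
rows = conn.execute(
    """
    SELECT a.id, COALESCE(b.value, 0) AS value
    FROM products a
    LEFT JOIN products b
      ON b.id = a.id + 1
     AND b.product_number = a.product_number
     AND b.service_group = a.service_group
    ORDER BY a.id
    """
).fetchall()

print(inner_count)
print(rows)
```

Row 3 gets 0 because id 4 belongs to service group 2, and row 6 gets 0 because there is no id 7.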
Edit:
Based on your comment ("Hi, Gunawan. Thank you for your suggestion. Unfortunately IDs will not be in sequences to support the proposed approach."),
you could alter the subquery in (1) slightly to:
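SELECT b.value
FROM products b
WHERE b.id > a.id
AND b.product_number = a.product_number
AND b.service_group = a.service_group
ORDER BY b.id ASC LIMIT 1
so the full query becomes:
SELECT
a.id,
a.product_number,
a.service_group,
a.datetime,
COALESCE((SELECT b.value FROM products b
WHERE b.id > a.id
AND b.product_number = a.product_number
AND b.service_group = a.service_group
ORDER BY b.id ASC LIMIT 1), 0) AS value
FROM products a;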
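A sketch of the "next higher id" variant, again assuming sqlite3 as a stand-in for MySQL. The ids below are hypothetical, with deliberate gaps, and the subquery is restricted to the same product_number and service_group so that a group's last row does not pick up the next group's first value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (id INTEGER PRIMARY KEY, product_number INTEGER,"
    " service_group INTEGER, datetime TEXT, value INTEGER)"
)
# Hypothetical non-sequential ids; the other columns match the question.
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?, ?, ?)",
    [
        (10, 1234, 1, "2022-02-10 00:00:00", 0),
        (13, 1234, 1, "2022-02-10 00:01:30", 25),
        (14, 1234, 1, "2022-02-10 00:02:30", 11),
        (20, 1234, 2, "2022-02-10 01:00:30", 0),
        (27, 1234, 2, "2022-02-10 01:01:30", 65),
        (31, 1234, 2, "2022-02-10 01:02:30", 55),
    ],
)

# For each row, take the value of the row with the next higher id
# in the same product/service group; 0 when there is none.
rows = conn.execute(
    """
    SELECT a.id,
           COALESCE((SELECT b.value FROM products b
                     WHERE b.id > a.id
                       AND b.product_number = a.product_number
                       AND b.service_group = a.service_group
                     ORDER BY b.id ASC
                     LIMIT 1), 0) AS value
    FROM products a
    ORDER BY a.id
    """
).fetchall()

print(rows)
```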

I think what you need in this case is the window function LEAD(); the MySQL reference manual's section on window functions has examples and clarification. Note that window functions require MySQL 8.0 or later, so they are not available in MySQL 5.7, the version mentioned in the question.
In summary, LEAD() looks at the next record for the given column, and LAG() looks at the previous one.
So in this example, I am asking for the LEAD() of the record (the next one in line) and getting the value column of that record. In LEAD(t.value, 1, 0), the 1 is how many records to look ahead (here, one), and the last parameter, 0, is what is returned if no such record exists.
The OVER (...) part, with its ORDER BY, defines the order in which the records are considered, in this case by datetime.
I included both value and NextValue so you can see the results; you can remove the extra column once you see and understand how it works.
But since each service group stands on its own, and you don't want to carry a value over from another group, you also need the PARTITION BY clause. I added it as a separate column, so you can see the results both with and without it.
select
t.id,
t.product_number,
t.service_group,
t.datetime,
t.value,
-- lead (t.value, 1, 0)
-- over (order by t.datetime) as NextValue,
-- With PARTITION BY but without the ", 1, 0" arguments (unlike the sample above),
-- the last row of each service group gets NULL instead of
-- the first value of the next service group.
lead (t.value)
over ( PARTITION BY
t.product_number,
t.service_group
order by
t.datetime) as NextValuePerServiceGroup
from
products t
order by
t.product_number,
t.service_group,
t.datetime;
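A sketch that runs LEAD() over the sample rows, assuming Python's sqlite3 module with SQLite 3.25 or later (the first version with window functions) as a stand-in for MySQL 8.0. It compares LEAD without a default, which yields NULL at the end of each partition, with LEAD(t.value, 1, 0), which reproduces the expected output from the question.

```python
import sqlite3

# Window functions need SQLite 3.25+ (and MySQL 8.0+).
assert sqlite3.sqlite_version_info >= (3, 25, 0)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (id INTEGER PRIMARY KEY, product_number INTEGER,"
    " service_group INTEGER, datetime TEXT, value INTEGER)"
)
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1234, 1, "2022-02-10 00:00:00", 0),
        (2, 1234, 1, "2022-02-10 00:01:30", 25),
        (3, 1234, 1, "2022-02-10 00:02:30", 11),
        (4, 1234, 2, "2022-02-10 01:00:30", 0),
        (5, 1234, 2, "2022-02-10 01:01:30", 65),
        (6, 1234, 2, "2022-02-10 01:02:30", 55),
    ],
)

rows = conn.execute(
    """
    SELECT t.id,
           -- No default: the last row of each partition gets NULL.
           LEAD(t.value) OVER (PARTITION BY t.product_number, t.service_group
                               ORDER BY t.datetime) AS NextValuePerServiceGroup,
           -- Offset 1, default 0: matches the expected output.
           LEAD(t.value, 1, 0) OVER (PARTITION BY t.product_number, t.service_group
                                     ORDER BY t.datetime) AS CorrectedValue
    FROM products t
    ORDER BY t.product_number, t.service_group, t.datetime
    """
).fetchall()

for row in rows:
    print(row)
```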

Related

Find duplicate values in a table via SQL

I have a table called stocktransfer
Rows truncated to keep it simple here.
As shown in the image, the problem is that the same transaction number is duplicated across two different invoice numbers, which is incorrect in the context of the business logic.
Duplicate transactions are expected as long as they are under the same invoice number.
I wrote this query, but it does not help, since it also returns the expected duplicates.
Select strefno,sttran,STDATE,count(sttran)
From postfrh
Group By sttran,strefno,STDATE
Having Count(sttran) >1
Order By sttran
Can anyone please help me write a query that finds duplicated transactions where the invoice numbers are different?
strefno = transaction number
sttran = invoice number
STDATE = date
SELECT strefno,
sttran,
STDATE,
row_number ( )
OVER ( PARTITION BY strefno
ORDER BY STDATE ) AS `rowNumber`
FROM postfrh
WHERE strefno IN
(SELECT strefno
FROM postfrh
GROUP BY strefno
HAVING count( DISTINCT sttran ) > 1 )
ORDER BY strefno;
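A sketch of why the filter should count distinct invoice numbers, assuming sqlite3 (3.25+ for ROW_NUMBER) as a stand-in for MySQL 8.0 and a few hypothetical rows: T1 repeats under the same invoice (expected), while T2 appears under two invoices (the error).

```python
import sqlite3

assert sqlite3.sqlite_version_info >= (3, 25, 0)  # needed for ROW_NUMBER()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE postfrh (strefno TEXT, sttran TEXT, STDATE TEXT)")
# Hypothetical data: strefno = transaction number, sttran = invoice number.
conn.executemany(
    "INSERT INTO postfrh VALUES (?, ?, ?)",
    [
        ("T1", "INV1", "2022-01-01"),
        ("T1", "INV1", "2022-01-02"),  # same invoice: an expected duplicate
        ("T2", "INV2", "2022-01-01"),
        ("T2", "INV3", "2022-01-03"),  # different invoice: the real problem
        ("T3", "INV4", "2022-01-05"),
    ],
)

# COUNT(sttran) > 1 also flags T1; COUNT(DISTINCT sttran) > 1 only flags T2.
count_all = [r[0] for r in conn.execute(
    "SELECT strefno FROM postfrh GROUP BY strefno"
    " HAVING COUNT(sttran) > 1 ORDER BY strefno")]
count_distinct = [r[0] for r in conn.execute(
    "SELECT strefno FROM postfrh GROUP BY strefno"
    " HAVING COUNT(DISTINCT sttran) > 1 ORDER BY strefno")]

# The answer's ROW_NUMBER query with the DISTINCT filter.
rows = conn.execute(
    """
    SELECT strefno, sttran, STDATE,
           ROW_NUMBER() OVER (PARTITION BY strefno ORDER BY STDATE) AS rowNumber
    FROM postfrh
    WHERE strefno IN (SELECT strefno FROM postfrh
                      GROUP BY strefno HAVING COUNT(DISTINCT sttran) > 1)
    ORDER BY strefno, rowNumber
    """
).fetchall()

print(count_all, count_distinct)
print(rows)
```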
You are probably looking for something like this. I don't have the exact table so I cannot be sure.
select distinct a.tnum
from postfrh as a
, postfrh as b
where a.tnum = b.tnum
and b.inum != a.inum;
(tnum = transaction number, inum = invoice number)
There are several ways to approach the problem, but the query above works by joining two instances of the table. The first condition in the WHERE clause keeps only pairs of rows with the same transaction number; the second filters out pairs that also have the same invoice number.
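A sketch of the self-join on hypothetical data, assuming sqlite3 as a stand-in for MySQL and mapping tnum to strefno and inum to sttran (the column meanings given in the question). Each mismatched pair is found in both orders, which is why the raw join returns one transaction number twice and DISTINCT collapses it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE postfrh (strefno TEXT, sttran TEXT, STDATE TEXT)")
# Hypothetical rows: T1 repeats under one invoice, T2 spans two invoices.
conn.executemany(
    "INSERT INTO postfrh VALUES (?, ?, ?)",
    [
        ("T1", "INV1", "2022-01-01"),
        ("T1", "INV1", "2022-01-02"),
        ("T2", "INV2", "2022-01-01"),
        ("T2", "INV3", "2022-01-03"),
        ("T3", "INV4", "2022-01-05"),
    ],
)

# Same transaction number, different invoice number.
pairs = conn.execute(
    "SELECT a.strefno FROM postfrh AS a, postfrh AS b"
    " WHERE a.strefno = b.strefno AND b.sttran != a.sttran"
).fetchall()

distinct = [r[0] for r in conn.execute(
    "SELECT DISTINCT a.strefno FROM postfrh AS a, postfrh AS b"
    " WHERE a.strefno = b.strefno AND b.sttran != a.sttran")]

print(len(pairs), distinct)
```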

SQL: FOR Loop in SELECT query

Is there a way to go through a FOR LOOP in a SELECT-query? (1)
I am asking because I do not know how to collect, in a single SELECT query, some data from table t_2 for each row of table t_1 (see the UPDATE below for an example). Yes, it's true that we can GROUP BY a UNIQUE INDEX, but what if there isn't one? Or how do I request all rows from t_1, each concatenated with a specific related row from t_2? So it seems that in a perfect world we would be able to loop through a table with a proper SQL command (R). Maybe ANY(...) will help?
Here I've tried to find the maximum count of repetitions among all values of column prop in table t.
That is, I've tried to do something like Pandas' t.groupby(prop).max() in an SQL query (Q1):
SELECT Max(C) FROM (SELECT Count(t_1.prop) AS C
FROM t AS t_1
WHERE t_1.prop = ANY (SELECT prop
FROM t AS t_2));
But it only throws the error:
Every derived table must have its own alias.
I don't understand this error. Why does it happen? (2)
Yes, we can implement Pandas' value_counts(...) much more easily with SELECT prop, COUNT(*) FROM t GROUP BY prop. But I wanted to do it in a "looping" way, staying in a "single non-grouping SELECT query" mode, for reason (R).
This subquery, which attempts to imitate Pandas' t.value_counts(...) (Q2):
SELECT Count(t_1.prop) AS C FROM t AS t_1 WHERE t_1.prop = ANY(SELECT prop FROM t AS t_2)
results in 6, which is simply the number of rows in t. The result is logical: the ANY clause returned TRUE for every row, and once all rows had been gathered, COUNT(...) returned the number of gathered (i.e. all) rows.
By the way, it seems to me that the "full" query (Q1) should return that same 6.
So the main question is: how can I loop in such a query? Is that possible at all?
UPDATE
I found the answer to question (2) thanks to Luuk: I just assigned an alias to the (...) subquery in SELECT Max(C) FROM (...) AS sq, and it worked. And of course, I got 6. So question (1) is still unclear.
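A sketch contrasting the aliased Q1 with a grouped "maximum repetition count", assuming sqlite3 as a stand-in for MySQL and a hypothetical six-row table t. SQLite has no ANY operator, so IN (subquery) is used in its place; MySQL documents IN as an alias for = ANY when used with a subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (prop TEXT)")
# Hypothetical data: 'x' repeats 3 times, 'y' twice, 'z' once (6 rows).
conn.executemany("INSERT INTO t VALUES (?)", [("x",), ("x",), ("x",), ("y",), ("y",), ("z",)])

# Q1 with an alias: the inner query has no GROUP BY, so it yields a single
# count over every row that matches some prop value, i.e. all 6 rows.
q1 = conn.execute(
    "SELECT MAX(C) FROM (SELECT COUNT(t_1.prop) AS C FROM t AS t_1"
    " WHERE t_1.prop IN (SELECT prop FROM t AS t_2)) AS sq"
).fetchone()[0]

# Grouping first gives one count per value; MAX then picks the largest.
max_count = conn.execute(
    "SELECT MAX(C) FROM (SELECT COUNT(*) AS C FROM t GROUP BY prop) AS sq"
).fetchone()[0]

print(q1, max_count)
```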
I've also tried to do an iteration this way (Q3):
SELECT (SELECT prop_2 FROM t_2 WHERE t_2.prop_1 = t_1.prop) AS isq FROM t_1;
Here, in t_2, prop_2 relates to prop_1 (a.k.a. prop in t_1) as many-to-one. So our isq (inner select query) returns several prop_2 values for each prop value in t_1.
And that is why (Q3) throws the error:
Subquery returns more than 1 row.
Again, logical. So, I couldn't create a loop in a single non-grouping SELECT-query.
This query will return the value for b with the highest count:
SELECT b, count(*)
FROM table1
GROUP BY b
ORDER BY count(*) DESC
LIMIT 1;
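A sketch of the GROUP BY query, assuming sqlite3 as a stand-in for MySQL and hypothetical rows for table1 (columns a and b as in the answer).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (a INTEGER, b TEXT)")
# Hypothetical data: 'p' appears 3 times, 'q' twice, 'r' once.
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [(1, "p"), (2, "p"), (3, "q"), (4, "p"), (5, "q"), (6, "r")],
)

# Count per b, sort by that count descending, keep the top row.
top = conn.execute(
    "SELECT b, COUNT(*) FROM table1 GROUP BY b ORDER BY COUNT(*) DESC LIMIT 1"
).fetchone()

print(top)
```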
EDIT: Without GROUP BY
SELECT b,C1
FROM (
SELECT
b,
ROW_NUMBER() OVER (PARTITION BY B ORDER BY A) C1,
ROW_NUMBER() OVER (PARTITION BY B ORDER BY A DESC) C2
FROM table1
) x
WHERE x.C2=1
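A sketch of the ROW_NUMBER() variant on the same hypothetical data, assuming sqlite3 3.25+ as a stand-in for MySQL 8.0. C1 numbers rows in ascending order and C2 in descending order, so the row with C2 = 1 is the last one in its partition and its C1 equals the partition's row count. The query therefore returns one row per b with its count; ORDER BY b is added here only to make the output order deterministic, and ordering by C1 DESC with LIMIT 1 would keep just the most frequent value.

```python
import sqlite3

assert sqlite3.sqlite_version_info >= (3, 25, 0)  # needed for ROW_NUMBER()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (a INTEGER, b TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [(1, "p"), (2, "p"), (3, "q"), (4, "p"), (5, "q"), (6, "r")],
)

rows = conn.execute(
    """
    SELECT b, C1
    FROM (
        SELECT b,
               ROW_NUMBER() OVER (PARTITION BY b ORDER BY a) C1,
               ROW_NUMBER() OVER (PARTITION BY b ORDER BY a DESC) C2
        FROM table1
    ) x
    WHERE x.C2 = 1
    ORDER BY b
    """
).fetchall()

print(rows)
```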

MySQL DISTINCT with more than one column (remove duplicates)

My table is called training_session.
I am trying to print out some information from my data, but I do not want any duplicates. I still get duplicates somehow; can someone tell me what I am doing wrong?
SELECT DISTINCT athlete_id AND duration FROM training_session
SELECT DISTINCT athlete_id, duration FROM training_session
It works perfectly if I use only one column, but when I add another, it does not work.
I think you misunderstood the use of DISTINCT.
There is a big difference between using DISTINCT and GROUP BY.
They can look similar, but they serve different purposes.
You use DISTINCT if you want to show a set of columns without repeating any row. That means you don't need calculations or group (aggregate) functions. DISTINCT applies to the whole selected row, so adding more columns to your SELECT can produce more distinct rows, because each row becomes a new combination of values.
You use GROUP BY if you want one row per distinct value of certain selected columns, together with group functions that calculate data related to each group. Therefore you use GROUP BY when you want to use group functions.
Please check the group functions you can use at this link:
https://dev.mysql.com/doc/refman/8.0/en/group-by-functions.html
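A sketch of the difference, assuming sqlite3 as a stand-in for MySQL and hypothetical training_session rows. It also shows what the first query in the question does: athlete_id AND duration is a single logical expression (1, 0, or NULL), not two columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE training_session (athlete_id INTEGER, duration INTEGER)")
# Hypothetical rows: athlete 1 has two durations, athlete 2 has one.
conn.executemany(
    "INSERT INTO training_session VALUES (?, ?)",
    [(1, 30), (1, 30), (1, 45), (2, 60), (2, 60)],
)

# DISTINCT on one column: one row per athlete.
one_col = [r[0] for r in conn.execute(
    "SELECT DISTINCT athlete_id FROM training_session ORDER BY athlete_id")]

# DISTINCT on two columns: one row per (athlete_id, duration) combination.
two_cols = conn.execute(
    "SELECT DISTINCT athlete_id, duration FROM training_session"
    " ORDER BY athlete_id, duration").fetchall()

# "athlete_id AND duration" is a boolean expression, so only 1 comes back here.
and_expr = conn.execute(
    "SELECT DISTINCT athlete_id AND duration FROM training_session").fetchall()

# GROUP BY with a group function: one row per athlete plus a calculation.
grouped = conn.execute(
    "SELECT athlete_id, MAX(duration) FROM training_session"
    " GROUP BY athlete_id ORDER BY athlete_id").fetchall()

print(one_col, two_cols, and_expr, grouped)
```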
EDIT 1:
It seems like you are trying to get the "latest" record for each athlete. I'll assume the current scenario, where there is no ID column.
Here is my alternate solution:
SELECT a.athlete_id ,
( SELECT b.duration
FROM training_session as b
WHERE b.athlete_id = a.athlete_id -- connect
ORDER BY [latest column to sort] DESC
LIMIT 1
) last_duration
FROM training_session as a
GROUP BY a.athlete_id
ORDER BY a.athlete_id
This is a correlated subquery in the SELECT list (a scalar subquery). With the help of LIMIT 1, it returns only the topmost record; such a subquery must not return more than one row, or MySQL raises an error.
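A sketch of the correlated subquery, assuming sqlite3 as a stand-in for MySQL and a hypothetical session_date column filling the role of [latest column to sort].

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# session_date is a hypothetical column standing in for [latest column to sort].
conn.execute(
    "CREATE TABLE training_session (athlete_id INTEGER, duration INTEGER, session_date TEXT)"
)
conn.executemany(
    "INSERT INTO training_session VALUES (?, ?, ?)",
    [
        (1, 30, "2022-01-01"),
        (1, 45, "2022-01-05"),
        (2, 60, "2022-01-02"),
        (2, 20, "2022-01-03"),
    ],
)

# For each athlete, the subquery returns the duration of the most recent session.
rows = conn.execute(
    """
    SELECT a.athlete_id,
           (SELECT b.duration
            FROM training_session AS b
            WHERE b.athlete_id = a.athlete_id
            ORDER BY b.session_date DESC
            LIMIT 1) AS last_duration
    FROM training_session AS a
    GROUP BY a.athlete_id
    ORDER BY a.athlete_id
    """
).fetchall()

print(rows)
```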
MySQL's DISTINCT clause is used to filter out duplicate rows.
If your query was SELECT DISTINCT athlete_id FROM training_session then your output would be:
athlete_id
----------
1
2
3
4
5
6
As soon as you add another column to your query (in your example, the column called duration), DISTINCT applies to the combination of athlete_id and duration, and each such combination in your result is unique, hence the results you're getting. In other words, the query is working correctly.

MySQL - DATEDIFF LOOKUP previous row (same ID)

I'm trying to create an extra column in my SQL query where I can identify whether the user_id generated a subsequent (or even a third) row.
If so, calculate the DATEDIFF between the connection time and the previous row's disconnection time.
If no duplicate user_id, the response should be NULL.
Here's a screenshot of my data and notes:
I tried the DATEDIFF formula, but without success.
Could someone help me on this? I really would appreciate any input.
SELECT id,
user_id,
connected_at,
disconnected_at,
IFNULL(DateDiff('second', Lookup(disconnected_at), -1), connected_at)
FROM data
ORDER BY id, user_id, connected_at
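When comparing values in different rows, you need to join the table to itself to get the other row. In MySQL, DATEDIFF() takes two arguments and returns a difference in days, so use TIMESTAMPDIFF(SECOND, ...) for seconds, and a LEFT JOIN so that rows without an earlier session get NULL. Try this:
SELECT
a.id,
a.user_id,
a.connected_at,
a.disconnected_at,
TIMESTAMPDIFF(SECOND, b.`disconnected_at`, a.`connected_at`) AS `time_diff`
FROM `data` a
LEFT JOIN `data` b
ON a.`user_id` = b.`user_id` AND b.`disconnected_at` <= a.`connected_at`
Note that if a user has more than two sessions, this join matches every earlier session, not only the immediately previous one.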
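A sketch that looks only at the immediately previous session, swapping the self-join for the LAG() window function (available in MySQL 8.0+ and SQLite 3.25+). It assumes sqlite3 as a stand-in and hypothetical rows. In MySQL, TIMESTAMPDIFF(SECOND, prev_disconnected_at, connected_at) would compute the gap; SQLite has no TIMESTAMPDIFF, so the difference of strftime('%s', ...) Unix timestamps is used instead.

```python
import sqlite3

assert sqlite3.sqlite_version_info >= (3, 25, 0)  # needed for LAG()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data (id INTEGER, user_id INTEGER, connected_at TEXT, disconnected_at TEXT)"
)
# Hypothetical sessions: user 10 has three, user 20 has one.
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?, ?)",
    [
        (1, 10, "2022-02-10 08:00:00", "2022-02-10 08:30:00"),
        (2, 20, "2022-02-10 09:00:00", "2022-02-10 09:10:00"),
        (3, 10, "2022-02-10 08:45:00", "2022-02-10 09:00:00"),
        (4, 10, "2022-02-10 10:00:00", "2022-02-10 10:05:00"),
    ],
)

rows = conn.execute(
    """
    WITH ordered AS (
        -- Previous session's disconnection time for the same user (NULL if none).
        SELECT id, user_id, connected_at,
               LAG(disconnected_at) OVER (PARTITION BY user_id
                                          ORDER BY connected_at) AS prev_disconnected_at
        FROM data
    )
    SELECT id, user_id,
           -- Seconds between the previous disconnection and this connection.
           CAST(strftime('%s', connected_at) AS INTEGER)
             - CAST(strftime('%s', prev_disconnected_at) AS INTEGER) AS time_diff
    FROM ordered
    ORDER BY id
    """
).fetchall()

print(rows)
```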

Selecting the second column in a complex mysql query

I'm trying to create a query for reporting purposes, and I'm failing when I try to select the date (second column) from multiple joins (selects).
The query is something like this:
SELECT a.Name, COALESCE(a.CustomerFaxes,0) AS 'CustomerFaxes', COALESCE(b.PoliceFaxes,0) AS 'PoliceFaxes', COALESCE(c.CustomerWebposts,0)AS 'CustomerWebposts', COALESCE(d.PoliceWebposts,0) AS 'PoliceWebposts', COALESCE(e.Letters,0) AS 'Letters'
FROM jobs j
LEFT JOIN (SELECT ...
GROUP BY 1,2) a ON ...
LEFT JOIN (SELECT ...
GROUP BY 1,2) b ON ...
LEFT JOIN (SELECT ...
GROUP BY 1,2) c ON ...
LEFT JOIN (SELECT ...
GROUP BY 1,2) d ON ...
LEFT JOIN (SELECT ...
GROUP BY 1,2) e ON ..
JOIN table cu ON ...
JOIN table2 a ON ...
WHERE j.UserID IN (1,2,3,4,5,6,7,8,9)
AND j.receivedontime BETWEEN UNIX_TIMESTAMP (20150302) AND UNIX_TIMESTAMP (20150310)
GROUP BY 1;
This results in something like
User CustomerFaxes PoliceFaxes CustomerWebposts PoliceWebposts Letters
There are a lot of conditions in between. All subqueries have a date as their second value; I want to select it and group by it.
Any of you know a way?
A few things:
First, this is going to cause you a headache
COALESCE(a.value2,0) AS 'value2', COALESCE(b.value2,0) AS 'value2'
You are trying to give two different columns in the same result set the same name.
Based on your comments, it sounds like what you really want is to get all date values in one column, using the date field from a first, then b, then c. You can do this by plugging them all into a single coalesce statement.
COALESCE(a.date,b.date,c.date) as Date
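A sketch of COALESCE picking the first non-NULL date across LEFT JOINed subqueries, assuming sqlite3 as a stand-in for MySQL. The tables users, faxes and webposts and their columns are hypothetical, and the date column is named DateReceived, as suggested below.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables standing in for the jobs table and two of the subqueries.
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("CREATE TABLE faxes (name TEXT, fax_date TEXT, n INTEGER)")
conn.execute("CREATE TABLE webposts (name TEXT, post_date TEXT, n INTEGER)")
conn.executemany("INSERT INTO users VALUES (?)", [("alice",), ("bob",)])
conn.executemany("INSERT INTO faxes VALUES (?, ?, ?)", [("alice", "2015-03-02", 3)])
conn.executemany(
    "INSERT INTO webposts VALUES (?, ?, ?)",
    [("bob", "2015-03-04", 5), ("alice", "2015-03-05", 1)],
)

# COALESCE returns its first non-NULL argument: the fax date when there is one,
# otherwise the webpost date. Missing counts become 0.
rows = conn.execute(
    """
    SELECT u.name,
           COALESCE(a.fax_date, b.post_date) AS DateReceived,
           COALESCE(a.n, 0) AS Faxes,
           COALESCE(b.n, 0) AS Webposts
    FROM users u
    LEFT JOIN faxes a ON a.name = u.name
    LEFT JOIN webposts b ON b.name = u.name
    ORDER BY u.name
    """
).fetchall()

print(rows)
```

Note that for alice the fax date wins even though a later webpost date exists; COALESCE only falls back when the earlier argument is NULL.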
Second, you should probably know that column order is not particularly significant in SQL. You never select columns by column number. Naming your columns with an index is confusing (what if someone reorders your query?) and doesn't help other users figure out what your query is trying to do. In the long run, readability matters more than anything else. You don't want to come back 6 months later and wonder what the heck your query was supposed to do.
Third, don't use date as a column name. DATE is a keyword in SQL and is reserved in some implementations. Use a more informative name like 'DateReceived' or 'DateOpened' or 'DateOfBirth' or 'TheDateTheEarthStoodStill'.
Just use column names rather than numbers, and then select it from the relevant subquery - it's as simple as that.
For example -
SELECT c.Date AS Date
or
SELECT d.Date AS Date
whichever subquery you want the Date from, just use that.