MySQL - Get previous row with a same identifier - mysql

I need help constructing a MySQL statement where I need to find previous rows in the same table.
My data looks like this:
history_id (auto increment), object_id (exists multiple times), timestamp, ...
example:
1, 2593, 2018-08-07 09:37:21
2, 2593, 2018-08-07 09:52:54
3, 15, 2018-08-07 10:41:15
4, 2593, 2018-08-07 09:57:36
Some properties of this data:
the higher the auto-increment value gets, the later the timestamp is for the same object_id
it is possible that there is only one row for one object_id at all
the combination of object_id and timestamp is always unique, no duplicates are possible
For every row I need to find the most recent previous row with the same object_id.
I found this post: https://dba.stackexchange.com/questions/24014/how-do-i-get-the-current-and-next-greater-value-in-one-select and worked through the examples but I was not able to solve my problem.
I just tested around a bit and got to this point:
SELECT
    i1.history_id,
    i1.object_id,
    i1.timestamp AS state_time,
    i2.timestamp AS previous_time
FROM
    history AS i1
LEFT JOIN (
    select timestamp as timestamp, history_id as history_id, object_id as object_id
    from history
    group by object_id
) AS i2 ON i2.object_id = i1.object_id AND i2.history_id < i1.history_id
Now I just need to restrict the subquery so that I get only the row with the highest history_id for each object_id, but it doesn't work with LIMIT 1, because then I get only one row overall.
Do you have any idea how to solve this problem? Or perhaps you have better and more efficient techniques?
Performance matters here because I have 3.1 million rows and growing.
Thank you!

The best direction is to use a window function. A simple LAG(timestamp) would do the job with the proper ORDER BY clause. See here: https://dev.mysql.com/doc/refman/8.0/en/window-function-descriptions.html#function_lag
But if all you need is
to cut of the subquery that I only get the highest value of history_id for each row but its not working when I use limit 1
Then change subquery from
select timestamp as timestamp,history_id as history_id,object_id as object_id
from history
group by object_id
to
select object_id as object_id, MAX(history_id) as history_id, MAX(timestamp) as timestamp
from history
group by object_id
In general you should not SELECT more columns than you have in the GROUP BY clause, unless they are wrapped in an aggregate function.
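A runnable sketch of the LAG() approach, using Python's sqlite3 module as a stand-in for MySQL 8.0 (the LAG() syntax is the same in both; table and sample data are taken from the question):

```python
import sqlite3

# In-memory copy of the question's `history` table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE history (history_id INTEGER PRIMARY KEY, object_id INT, timestamp TEXT);
    INSERT INTO history VALUES
        (1, 2593, '2018-08-07 09:37:21'),
        (2, 2593, '2018-08-07 09:52:54'),
        (3, 15,   '2018-08-07 10:41:15'),
        (4, 2593, '2018-08-07 09:57:36');
""")

# LAG(timestamp) looks back one row within each object_id partition,
# ordered by history_id, and returns NULL when no previous row exists.
rows = conn.execute("""
    SELECT history_id,
           object_id,
           timestamp AS state_time,
           LAG(timestamp) OVER (PARTITION BY object_id
                                ORDER BY history_id) AS previous_time
    FROM history
    ORDER BY history_id
""").fetchall()

for r in rows:
    print(r)
```

Row 4 (object_id 2593) correctly picks up row 2's timestamp as its previous_time, while the single row for object_id 15 gets NULL.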

Related

SQL: FOR Loop in SELECT query

Is there a way to go through a FOR loop in a SELECT query? (1)
I am asking because I do not know how to collect, in a single
SELECT query, some data from table t_2 for each row of
table t_1 (please see UPDATE for an example). Yes, it's true that we can GROUP BY a UNIQUE INDEX, but
what if one is not present? Or how do we request all rows from t_1, each concatenated with a specific related row from t_2? So it seems like, in a perfect world, we would be able to loop through a table with a proper SQL command (R). Maybe ANY(...) will help?
Here I've tried to find the maximal count of repetitions in the column prop among all values of that column in table t.
I.e. I've tried to carry out something like Pandas'
t.groupby(prop).max() in an SQL query (Q1):
SELECT Max(C) FROM (SELECT Count(t_1.prop) AS C
FROM t AS t_1
WHERE t_1.prop = ANY (SELECT prop
FROM t AS t_2));
But it only throws the error:
Every derived table must have its own alias.
I don't understand this error. Why does it happen? (2)
Yes, we can implement Pandas' value_counts(...) far more easily
with SELECT prop, COUNT(*) ... GROUP BY prop. But I wanted to do it in a "looping" way, staying in a "single non-grouping SELECT-query mode", for reason (R).
This sub-query, which attempts to imitate Pandas' t.value_counts(...) (Q2):
SELECT Count(t_1.prop) AS C FROM t AS t_1 WHERE t_1.prop = ANY(SELECT prop FROM t AS t_2)
results in 6, which is simply the number of rows in t. The result is logical: the ANY clause simply returned TRUE for every row, and once all rows had been gathered, COUNT(...) returned the number of gathered (i.e. all) rows.
By the way, it seems to me that the "full" previous SELECT query (Q1) should return that very 6.
So, the main question is: how does one loop in such a query? Is that
possible at all?
UPDATE
The answer to question (2) was found here, thanks to
Luuk. I just assigned an alias to the (...) subquery, as in SELECT Max(C) FROM (...) AS sq, and it worked. And of course, I got 6. So question (1) is still open.
I've also tried to do an iteration this way (Q3):
SELECT (SELECT prop_2 FROM t_2 WHERE t_2.prop_1 = t_1.prop) AS isq FROM t_1;
Here, in t_2, prop_2 is related to prop_1 (a.k.a. prop in t_1) as many-to-one. So, along the way, our isq (inner select query) returns several (rows of) prop_2 values for each prop value in t_1.
And that is why (Q3) throws the error:
Subquery returns more than 1 row.
Again, logical. So, I couldn't create a loop in a single non-grouping SELECT-query.
This query will return the value for b with the highest count:
SELECT b, count(*)
FROM table1
GROUP BY b
ORDER BY count(*) DESC
LIMIT 1;
see: DBFIDDLE
EDIT: Without GROUP BY
SELECT b, C1
FROM (
    SELECT
        b,
        ROW_NUMBER() OVER (PARTITION BY b ORDER BY a) C1,
        ROW_NUMBER() OVER (PARTITION BY b ORDER BY a DESC) C2
    FROM table1
) x
WHERE x.C2 = 1
see: DBFIDDLE
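Both variants from this answer can be sketched runnably with Python's sqlite3 module standing in for MySQL 8.0 (a small hypothetical table1 with columns a and b, as assumed by the answer):

```python
import sqlite3

# Toy table1: column b holds 'x' three times and 'y' twice.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (a INTEGER PRIMARY KEY, b TEXT);
    INSERT INTO table1 (b) VALUES ('x'), ('y'), ('x'), ('x'), ('y');
""")

# Variant 1: GROUP BY + ORDER BY count DESC + LIMIT 1.
top = conn.execute("""
    SELECT b, COUNT(*) FROM table1
    GROUP BY b ORDER BY COUNT(*) DESC LIMIT 1
""").fetchone()
print(top)  # ('x', 3)

# Variant 2: no GROUP BY. Two opposite ROW_NUMBER() numberings per
# partition; on the row where the descending numbering C2 equals 1,
# the ascending numbering C1 equals the partition's row count.
rows = conn.execute("""
    SELECT b, C1 FROM (
        SELECT b,
               ROW_NUMBER() OVER (PARTITION BY b ORDER BY a)      AS C1,
               ROW_NUMBER() OVER (PARTITION BY b ORDER BY a DESC) AS C2
        FROM table1
    ) x
    WHERE x.C2 = 1
""").fetchall()
print(rows)
```

Variant 2 returns one count per b value without any GROUP BY, which is the "non-grouping" flavor the asker was after.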

MySql: Select value of current and next row

I am trying to learn a better way of achieving the desired result of a select query - details below, thank you in advance.
MySQL version: 5.7
Table:
id int(11)
product_number int(8)
service_group int (4)
datetime datetime
value int (6)
Indexes on all but value column.
MySql table has the following data:
id,product_number, service_group,datetime,value
1,1234,1,2022-02-10 00:00:00,0
2,1234,1,2022-02-10 00:01:30,25
3,1234,1,2022-02-10 00:02:30,11
4,1234,2,2022-02-10 01:00:30,0
5,1234,2,2022-02-10 01:01:30,65
6,1234,2,2022-02-10 01:02:30,55
In essence, the value for each product within the service group is wrongly recorded: the correct value for the "current" row is actually recorded against the next row for the same product within the same service group. The correct output should look like this:
id,product_number, service_group,datetime,value
1,1234,1,2022-02-10 00:00:00,25
2,1234,1,2022-02-10 00:01:30,11
3,1234,1,2022-02-10 00:02:30,0
4,1234,2,2022-02-10 01:00:30,65
5,1234,2,2022-02-10 01:01:30,55
6,1234,2,2022-02-10 01:02:30,0
The below query is what seems to be hugely inefficient way of returning the correct results - what would be a better way to go about this in MySql? Thank you.
Select
    a.id,
    a.product_number,
    a.service_group,
    a.datetime,
    (
        Select b.value FROM products b
        Where b.product_number = a.product_number AND b.service_group = a.service_group
            AND b.datetime > a.datetime
        Order by b.datetime ASC
        Limit 1
    )
FROM products a
If there are no skipped ids (the numbers are in sequence), then you could probably use a simple select like one of those below:
1.
Select
a.id,
a.product_number,
a.service_group,
a.datetime,
(Select b.value FROM products b Where b.id = a.id+1)
FROM products a
2.
Select
a.id,
a.product_number,
a.service_group,
a.datetime,
b.value
FROM products a
INNER JOIN products b ON b.id = a.id+1
Note that both SQL 1 and 2 assume your id is the primary key, as I see it's an incrementing value.
Either way, you need to run an EXPLAIN query so you can analyze which one is the most efficient.
And more importantly, if the data is "wrongly recorded", I suggest fixing it: put your service into maintenance mode and repair the data with an UPDATE query.
Edit:
based on your comment "Hi, Gunawan. Thank you for your suggestion. Unfortunately IDs will not be in sequences to support the proposed approach."
You could alter the subquery in (1) a bit to
Select b.value
FROM products b
Where b.id > a.id order by id asc limit 1
so it became
Select
a.id,
a.product_number,
a.service_group,
a.datetime,
(Select b.value FROM products b Where b.id > a.id order by b.id asc limit 1)
FROM products a
I think what you need in THIS case is the window function LEAD(), which can be found here, with examples and clarification.
In summary, LEAD() looks at the NEXT record for the given column in question, while LAG() looks at the prior one.
So in this example, I am asking for the LEAD() of the record (the next one in line) and getting the VALUE column of that record. The 1 says how many records to skip ahead, in this case 1. The last parameter, 0, is what should be returned if no such record exists.
The second half of the clause, the ORDER BY clause, identifies how you want the records returned, in this case in datetime sequence.
I included both value and NEXTVALUE so you can see the results, but you can remove the extra column once you see and understand how it works.
But since each service group stands on its own, and you don't want to carry over the value from another, you need the PARTITION BY clause as well. So I added that as an additional column... both so you can see how the results work with or without it, and the impact on the query you need.
select
    t.id,
    t.product_number,
    t.service_group,
    t.datetime,
    t.value,
    -- lead (t.value, 1, 0)
    --   over (order by t.datetime) as NextValue,
    -- without the 1, 0 in the lead call as in the sample above,
    -- you can see the NULLs are not getting the value
    -- from the next row when the service group changes.
    lead (t.value) over (
        PARTITION BY t.service_group
        order by t.datetime
    ) as NextValuePerServiceGroup
from
    Tmp1 t
order by
    t.service_group,
    t.datetime
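A runnable sketch of the LEAD() fix, using Python's sqlite3 module as a stand-in (note the question specifies MySQL 5.7, where window functions are unavailable; LEAD() requires MySQL 8.0, so on 5.7 the correlated-subquery answers above are the fallback). Table and data come from the question:

```python
import sqlite3

# In-memory copy of the question's `products` table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, product_number INT,
                           service_group INT, datetime TEXT, value INT);
    INSERT INTO products VALUES
        (1, 1234, 1, '2022-02-10 00:00:00', 0),
        (2, 1234, 1, '2022-02-10 00:01:30', 25),
        (3, 1234, 1, '2022-02-10 00:02:30', 11),
        (4, 1234, 2, '2022-02-10 01:00:30', 0),
        (5, 1234, 2, '2022-02-10 01:01:30', 65),
        (6, 1234, 2, '2022-02-10 01:02:30', 55);
""")

# LEAD(value, 1, 0): take value from the next row in the partition,
# defaulting to 0 on the last row of each service group.
rows = conn.execute("""
    SELECT id, product_number, service_group, datetime,
           LEAD(value, 1, 0) OVER (PARTITION BY service_group
                                   ORDER BY datetime) AS corrected_value
    FROM products
    ORDER BY id
""").fetchall()

for r in rows:
    print(r)
```

The corrected_value column comes out as 25, 11, 0, 65, 55, 0, matching the desired output in the question.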

MySQL: Ensure consecutiveness of numbers in a column

I need some help for a MySQL query.
Let's assume we have a table with the columns "id", "unique_id" and "consecutive_id". The numbers in column "id" are NOT always consecutive, while we have to keep consecutiveness in column "consecutive_id". Basically, every row should get its own consecutive number, but sometimes rows occur that should share the same consecutive number; such rows have the same value in column "unique_id". I need a query that finds, within a group of rows sharing one consecutive ID, the first ID whose unique ID is not shared by any other row.
I created a little fiddle at https://www.db-fiddle.com/f/hy8SACLyM2D65H2ZY31c2f/0 to demonstrate my issue. As you can see, IDs 3 and 5 have the same consecutive number (2). That's okay, as they share the same unique ID. IDs 9, 10, 12 and 14 also have the same consecutive number (4), but only IDs 9 and 10 have an identical unique ID. Therefore in this case the query should find ID 12.
Can you please help me with developing a solution for this?
Thank you so very much for your help in advance.
All the best,
Marianne
You can use COUNT(DISTINCT unique_id) to find the values of consecutive_id that have different unique_id.
SELECT consecutive_id
FROM test
GROUP BY consecutive_id
HAVING COUNT(DISTINCT unique_id) > 1
You can then join this with your original table, group by both unique_id and consecutive_id, and get the rows that just have a count of 1, which means they're not equal to the rest of the group.
Since there can be multiple outliers, you need another level of subquery that just gets the minimum outlier for each consecutive ID.
SELECT consecutive_id, MIN(id) AS id
FROM (
    SELECT a.consecutive_id, MIN(id) AS id
    FROM test AS a
    JOIN (
        SELECT consecutive_id
        FROM test
        GROUP BY consecutive_id
        HAVING COUNT(DISTINCT unique_id) > 1
    ) AS b ON a.consecutive_id = b.consecutive_id
    GROUP BY a.consecutive_id, a.unique_id
    HAVING COUNT(*) = 1
) AS x
GROUP BY consecutive_id
DEMO
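This answer's query can be checked end to end with sqlite3 standing in for MySQL. The fiddle's exact values aren't reproduced in the question, so the unique_id values below are reconstructed from its description (ids 3 and 5 legitimately share consecutive_id 2; ids 12 and 14 intrude on consecutive_id 4):

```python
import sqlite3

# Reconstructed sample data matching the question's description.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE test (id INTEGER PRIMARY KEY, unique_id TEXT, consecutive_id INT);
    INSERT INTO test VALUES
        (1,  'a', 1),
        (3,  'b', 2), (5,  'b', 2),
        (9,  'c', 4), (10, 'c', 4),
        (12, 'd', 4), (14, 'e', 4);
""")

# Inner join keeps only consecutive_ids with >1 distinct unique_id;
# HAVING COUNT(*) = 1 keeps the lone-unique_id outliers; the outer
# query takes the minimum outlier id per consecutive_id.
result = conn.execute("""
    SELECT consecutive_id, MIN(id) AS id
    FROM (
        SELECT a.consecutive_id, MIN(id) AS id
        FROM test AS a
        JOIN (
            SELECT consecutive_id
            FROM test
            GROUP BY consecutive_id
            HAVING COUNT(DISTINCT unique_id) > 1
        ) AS b ON a.consecutive_id = b.consecutive_id
        GROUP BY a.consecutive_id, a.unique_id
        HAVING COUNT(*) = 1
    ) AS x
    GROUP BY consecutive_id
""").fetchall()

print(result)  # [(4, 12)] -- id 12 is the first outlier, as expected
```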

Distinct Count Index - MYSQL

I have a simple table (MySQL - MyISAM):
EDIT: This is a 50M-record table; adding a new index isn't really something we can do.
actions (PRIMARY action_id, user_id, item_id, created_at)
Indexes:
action_id (action_id, PRIMARY)
user (user_id)
item_user (user_id, item)
created_user (user_id, created_at)
And the query:
SELECT count(distinct item_id) as c_c from actions where user_id = 1
The explain:
id  select_type  table   type  possible_keys         key      key_len  ref    rows
1   SIMPLE       action  ref   user_id,created_user  user_id  4        const  1415
This query takes around 7 seconds to run for users with over 1k entries. Any way to improve this?
I've tried the following and they are all worse:
SELECT count(*) from actions where user_id =1 group by item_id
SELECT count(item_id) from actions USE INDEX (item_user) where user_id = 1 group by item_Id
Can you test the following:
SELECT count(*) as c_c
from (
SELECT distinct item_id
from actions where user_id = 1
) as T1
In case you're using PHP, you can simplify your query to the following:
SELECT distinct item_id
FROM actions
WHERE user_id = 1
and then use mysql_num_rows to get the number of rows in your result?
Another option you could try, although it requires more work, is to:
1- Create another table that holds the total number of rows found for each user_id: a table with two columns, user_id and the total of items found for that user in the original table.
2- Schedule a job to run, say every hour, that updates this table with the total returned from the `actions` table. At that point you can just query your newly created table like this:
SELECT total
FROM actions_total
WHERE user_id = 1
This will be much faster when you need the final result because you're dealing with a single row instead of thousands. The drawback here is that you may not get an accurate result, depending on how often you run your job.
3- In case you decide not to use a job, you can still use the newly created table, but you will need to update (increment/decrement) its total each time you insert into or delete from the `actions` table.
N.B: Just trying to help
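For the first suggestion, here's a minimal runnable check that the derived-table form returns the same distinct count (sqlite3 standing in for MySQL; table and column names from the question, data invented):

```python
import sqlite3

# Toy `actions` table with a (user_id, item_id) index, as in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE actions (action_id INTEGER PRIMARY KEY, user_id INT, item_id INT);
    CREATE INDEX item_user ON actions (user_id, item_id);
    INSERT INTO actions (user_id, item_id) VALUES
        (1, 10), (1, 10), (1, 11), (1, 12), (2, 10);
""")

# Direct form from the question.
direct = conn.execute(
    "SELECT COUNT(DISTINCT item_id) FROM actions WHERE user_id = 1"
).fetchone()[0]

# Derived-table form from the answer; on MySQL this can let the
# optimizer satisfy the inner DISTINCT from the covering index alone.
derived = conn.execute("""
    SELECT COUNT(*) FROM (
        SELECT DISTINCT item_id FROM actions WHERE user_id = 1
    ) AS T1
""").fetchone()[0]

print(direct, derived)  # 3 3
```

User 1 has items 10, 10, 11, 12, so both forms count 3 distinct items.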

MySQL: Grabbing the latest ID from duplicate records within a table

I'm trying to grab the latest ID from a duplicate record within my table, without using a timestamp to check.
SELECT *
FROM `table`
WHERE `title` = "bananas"
-
table
id title
-- -----
1 bananas
2 apples
3 bananas
Ideally, I want to grab the ID 3
I'm slightly confused by the SELECT in your example, but hopefully you will be able to piece this together from my example.
If you want to return the latest row, you can simply use the MAX() function:
SELECT MAX(id) FROM TABLE
Though I definitely recommend trying to determine what makes that row the "latest". If it's just because it has the highest [id], you may want to consider what happens down the road. What if you want to combine two databases that use the same data? Going off the [id] column might not be the best decision. If you can, I suggest adding a [LastUpdated] or [Added] datestamp column to your design.
I'm assuming the IDs are auto-incremented.
You can count how many rows you have, store that in a variable, and then set the WHERE clause to check for the variable that stores the row count.
BUT this is a hack solution, because if you delete a row and the ID is not decremented, you can end up skipping an ID.
select max(a.id) from mydb.myTable a join mydb.myTable b on a.id <> b.id and a.title=b.title;
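A runnable sketch with sqlite3 standing in for MySQL, using the question's sample data: the WHERE-constrained MAX() answers the literal question for one title, and a GROUP BY ... HAVING variant finds the latest id for every duplicated title without the self-join:

```python
import sqlite3

# The question's sample table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (id INTEGER PRIMARY KEY, title TEXT);
    INSERT INTO t VALUES (1, 'bananas'), (2, 'apples'), (3, 'bananas');
""")

# Latest id for one known title.
latest = conn.execute(
    "SELECT MAX(id) FROM t WHERE title = 'bananas'"
).fetchone()[0]
print(latest)  # 3

# Latest id for every title that actually has duplicates.
dups = conn.execute("""
    SELECT title, MAX(id)
    FROM t
    GROUP BY title
    HAVING COUNT(*) > 1
""").fetchall()
print(dups)  # [('bananas', 3)]
```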