Retrieving last row inserted in table for each "parameter" - mysql

I have a table, currently about 1.3M rows, which stores measured data points for a number of different parameters. There are about 30 parameters.
Table:
* id
* station_id (int)
* comp_id (int)
* unit_id (int)
* p_id (int)
* timestamp
* value
I have a UNIQUE index on: (station_id, comp_id, unit_id, p_id, timestamp)
Because the timestamp differs for every parameter, I have difficulty sorting by the timestamp (I have to use a group by).
So today I select the last value for each parameter with this query:
select p_id, timestamp, value
from (select p_id, timestamp, value
from table
where station_id = 3 and comp_id = 9112 and unit_id = 1 and
p_id in (1,2,3,4,5,6,7,8,9,10)
order by timestamp desc
) table_x
group by p_id;
This query takes about 3 seconds to execute.
Even though I have the index mentioned above, the optimizer uses filesort to find the values.
Querying for only 1 specific parameter:
select p_id, timestamp, value from table where station_id = 3 and comp_id = 9112 and unit_id = 1 and p_id =1 order by timestamp desc limit 1;
Takes no time (0.00).
I've also tried joining against a table in which I store the parameter IDs, without luck.
So, is there a simple (and fast) way to ask for the latest values for several different parameters at once?
Writing a procedure that loops and asks for each parameter individually seems much faster than asking for all of them at once, which I don't think is the way a database should be used.

Your query is incorrect. You are aggregating by p_id, but including other columns. These come from indeterminate rows, and the documentation is quite clear:
MySQL extends the use of GROUP BY so that the select list can refer to
nonaggregated columns not named in the GROUP BY clause. This means
that the preceding query is legal in MySQL. You can use this feature
to get better performance by avoiding unnecessary column sorting and
grouping. However, this is useful primarily when all values in each
nonaggregated column not named in the GROUP BY are the same for each
group. The server is free to choose any value from each group, so
unless they are the same, the values chosen are indeterminate.
Furthermore, the selection of values from each group cannot be
influenced by adding an ORDER BY clause.
The following should work:
select t.p_id, t.timestamp, t.value
from table t join
(select p_id, max(timestamp) as maxts
from table
where station_id = 3 and comp_id = 9112 and unit_id = 1 and
p_id in (1,2,3,4,5,6,7,8,9,10)
group by p_id
) tt
on tt.p_id = t.p_id and tt.maxts = t.timestamp
where t.station_id = 3 and t.comp_id = 9112 and t.unit_id = 1;
The best index for this query is a composite index on table(station_id, comp_id, unit_id, p_id, timestamp).
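As a rough check of the join-on-max approach, here is a minimal sketch that runs the same query shape against an in-memory SQLite database from Python. SQLite stands in for MySQL here, and the table name `measurements` and the sample rows are assumptions made up for illustration, not data from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE measurements (
        id INTEGER PRIMARY KEY,
        station_id INTEGER, comp_id INTEGER, unit_id INTEGER, p_id INTEGER,
        timestamp TEXT, value REAL,
        UNIQUE (station_id, comp_id, unit_id, p_id, timestamp)
    )
""")
# Hypothetical sample data: two parameters with two readings each,
# plus one reading from another station that must be ignored.
rows = [
    (3, 9112, 1, 1, "2014-01-01 10:00:00", 1.0),
    (3, 9112, 1, 1, "2014-01-01 11:00:00", 1.5),
    (3, 9112, 1, 2, "2014-01-01 09:30:00", 7.0),
    (3, 9112, 1, 2, "2014-01-01 09:45:00", 7.2),
    (4, 9112, 1, 1, "2014-01-01 12:00:00", 99.0),
]
conn.executemany(
    "INSERT INTO measurements "
    "(station_id, comp_id, unit_id, p_id, timestamp, value) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    rows,
)

# Find the newest timestamp per p_id, then join back to fetch the full row.
latest = conn.execute("""
    SELECT t.p_id, t.timestamp, t.value
    FROM measurements t
    JOIN (SELECT p_id, MAX(timestamp) AS maxts
          FROM measurements
          WHERE station_id = 3 AND comp_id = 9112 AND unit_id = 1
            AND p_id IN (1, 2)
          GROUP BY p_id) tt
      ON tt.p_id = t.p_id AND tt.maxts = t.timestamp
    WHERE t.station_id = 3 AND t.comp_id = 9112 AND t.unit_id = 1
    ORDER BY t.p_id
""").fetchall()

for row in latest:
    print(row)
```

The outer WHERE repeats the station, component and unit filter so that a row from another station with the same p_id and timestamp cannot slip through the join.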

Related

Selecting Data from Normalized Tables

I'm stuck on trying to write this query; I think my brain is just a little fried tonight. I have a table that stores a record whenever a person executes an action (Clocking In, Clocking Out, Going on Lunch, Returning from Lunch), and I need to return a list of all the primary IDs for the people whose last action is not clock_out. The problem is that it needs to be a reasonably fast query.
Table Structure:
ID | person_id | status | datetime | shift_type
ID = Primary Key for this table
person_id = The ID I want to return if their status does not equal clock_out
status = clock_in, lunch_start, lunch_end, break_start, break_end, clock_out
datetime = The time the record was added
shift_type = Not Important
Previously I ran this query to find people who were still clocked in during a specific time period, but I now need it to work for any point in time. The queries I am trying scan thousands and thousands of records, which makes them far too slow.
I need to return a list of all the primary ID's for the people whose last action is not clock_out.
One option uses window functions, available in MySQL 8.0:
select id
from (
select t.*, row_number() over(partition by person_id order by datetime desc) rn
from mytable t
) t
where rn = 1 and status <> 'clock_out'
In earlier versions, one option uses a correlated subquery:
select t.id
from mytable t
where
t.datetime = (select max(t1.datetime) from mytable t1 where t1.person_id = t.person_id)
and t.status <> 'clock_out'
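The correlated-subquery form can be tried out with a small sketch like the one below. It uses Python's sqlite3 module as a stand-in for MySQL; the table name `timeclock` is borrowed from the asker's follow-up, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE timeclock (
        id INTEGER PRIMARY KEY,
        person_id INTEGER,
        status TEXT,
        datetime TEXT
    )
""")
# Hypothetical sample data: person 1 has clocked out, persons 2 and 3 have not.
conn.executemany(
    "INSERT INTO timeclock (person_id, status, datetime) VALUES (?, ?, ?)",
    [
        (1, "clock_in", "2020-01-01 08:00:00"),
        (1, "clock_out", "2020-01-01 17:00:00"),
        (2, "clock_in", "2020-01-01 09:00:00"),
        (2, "lunch_start", "2020-01-01 12:00:00"),
        (3, "clock_out", "2020-01-01 16:00:00"),
        (3, "clock_in", "2020-01-02 08:00:00"),
    ],
)

# Keep only each person's most recent row, then drop those that are clock_out.
still_in = conn.execute("""
    SELECT t.person_id, t.status
    FROM timeclock t
    WHERE t.datetime = (SELECT MAX(t1.datetime)
                        FROM timeclock t1
                        WHERE t1.person_id = t.person_id)
      AND t.status <> 'clock_out'
    ORDER BY t.person_id
""").fetchall()

for row in still_in:
    print(row)
```

An index on (person_id, datetime) would likely help the subquery on a large table, though that is an assumption about the workload rather than something measured here.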
After looking through it further, this was my solution -
SELECT * FROM (
SELECT `status`,`person_id` FROM `timeclock` ORDER BY `datetime` DESC
) AS tmp_table GROUP BY `person_id`
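This appeared to work because the derived table is ordered by datetime before the rows are grouped by person ID, but MySQL does not guarantee which row's values GROUP BY returns for non-aggregated columns, so the most recent row is not guaranteed.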

Filter duplicate records and count of occurrences corresponding to a filter condition in SQL

I have a table whose records are grouped by the key stu_class|stu_birth|stu_major. If there are duplicate records, the record with the smallest stu_id is selected. I need to count the total number of records that satisfy this condition.
Example:
Here, stu_id (100,101) are duplicate records based on the key, but I want to select only the record with the smallest stu_id, which is 100. Similarly, stu_id (102,104) are duplicate records, and I need to select stu_id 102.
The selected record count should then be 2. How can I get this count using SQL? That is, how can I get the total number of selected records, 2?
One method uses window functions:
select t.*
from (select t.*,
row_number() over (partition by stu_class, stu_birth, stu_major order by stu_id) as seqnum
from t
) t
where seqnum = 1;
This is available in MySQL starting with version 8.
An alternative uses a correlated subquery and might be faster, even in version 8:
select t.*
from t
where t.stu_id = (select min(t2.stu_id)
from t t2
where t2.stu_class = t.stu_class and t2.stu_birth = t.stu_birth and t2.stu_major = t.stu_major
);
This can take advantage of an index on (stu_class, stu_birth, stu_major, stu_id).
EDIT
If you just want the total records, then use aggregation:
select stu_class, stu_birth, stu_major, min(stu_id), count(*) as cnt
from t
group by stu_class, stu_birth, stu_major;
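To get a single total rather than one row per key, the grouped query can be wrapped in an outer COUNT(*). The sketch below shows this with Python's sqlite3 module standing in for MySQL. The stu_id values come from the question; the key values are hypothetical, since the original example table is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE t (
        stu_id INTEGER PRIMARY KEY,
        stu_class TEXT,
        stu_birth TEXT,
        stu_major TEXT
    )
""")
# stu_id values follow the question; class, birth and major are made up.
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        (100, "A", "2000-01-01", "math"),
        (101, "A", "2000-01-01", "math"),
        (102, "B", "2001-05-05", "physics"),
        (104, "B", "2001-05-05", "physics"),
    ],
)

# One row per key, keeping the smallest stu_id in each group.
kept = conn.execute("""
    SELECT MIN(stu_id) AS stu_id
    FROM t
    GROUP BY stu_class, stu_birth, stu_major
    ORDER BY stu_id
""").fetchall()

# Number of selected records = number of distinct keys.
total = conn.execute("""
    SELECT COUNT(*)
    FROM (SELECT 1
          FROM t
          GROUP BY stu_class, stu_birth, stu_major) g
""").fetchone()[0]

print(kept, total)
```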

MySQL wrong results with GROUP BY and ORDER BY

I have a table user_comission_configuration_history and I need to select the last Comissions configuration from a user_id.
Tuples:
I'm trying with many queries, but, the results are wrong. My last SQL:
SELECT *
FROM(
SELECT * FROM user_comission_configuration_history
ORDER BY on_date DESC
) AS ordered_history
WHERE user_id = 408002
GROUP BY comission_id
The result of above query is:
But, the correct result is:
id user_id comission_id value type on_date
24 408002 12 0,01 PERCENTUAL 2014-07-23 10:45:42
23 408002 4 0,03 CURRENCY 2014-07-23 10:45:41
21 408002 6 0,015 PERCENTUAL 2014-07-23 10:45:18
What is wrong in my SQL?
This is your query:
SELECT *
FROM (SELECT *
FROM user_comission_configuration_history
ORDER BY on_date DESC
) AS ordered_history
WHERE user_id = 408002
GROUP BY comission_id;
One major problem with your query is that it uses a MySQL extension to group by that MySQL explicitly warns against. The extension is the use of columns in the select that are neither in the group by nor inside aggregation functions. The warning in the documentation is:
MySQL extends the use of GROUP BY so that the select list can refer to
nonaggregated columns not named in the GROUP BY clause. This means
that the preceding query is legal in MySQL. You can use this feature
to get better performance by avoiding unnecessary column sorting and
grouping. However, this is useful primarily when all values in each
nonaggregated column not named in the GROUP BY are the same for each
group. The server is free to choose any value from each group, so
unless they are the same, the values chosen are indeterminate.
So, the values returned in the columns are indeterminate.
Here is a pretty efficient way to get what you want (with "comission" spelled correctly in English):
SELECT *
FROM user_commission_configuration_history cch
WHERE NOT EXISTS (select 1
from user_commission_configuration_history cch2
where cch2.user_id = cch.user_id and
cch2.commission_id = cch.commission_id and
cch2.on_date > cch.on_date
) AND
cch.user_id = 408002;
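The NOT EXISTS pattern can be checked with a minimal sketch such as the following, which uses Python's sqlite3 module in place of MySQL. Rows 21, 23 and 24 are the expected rows from the question; rows 20 and 22 are hypothetical older entries added so that there is something to filter out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE user_commission_configuration_history (
        id INTEGER PRIMARY KEY,
        user_id INTEGER,
        commission_id INTEGER,
        value TEXT,
        type TEXT,
        on_date TEXT
    )
""")
conn.executemany(
    "INSERT INTO user_commission_configuration_history VALUES (?, ?, ?, ?, ?, ?)",
    [
        (20, 408002, 6, "0,02", "PERCENTUAL", "2014-07-23 10:45:00"),  # hypothetical
        (21, 408002, 6, "0,015", "PERCENTUAL", "2014-07-23 10:45:18"),
        (22, 408002, 4, "0,05", "CURRENCY", "2014-07-23 10:45:30"),    # hypothetical
        (23, 408002, 4, "0,03", "CURRENCY", "2014-07-23 10:45:41"),
        (24, 408002, 12, "0,01", "PERCENTUAL", "2014-07-23 10:45:42"),
    ],
)

# A row is the latest if no newer row exists for the same user and commission.
latest_ids = [
    row[0]
    for row in conn.execute("""
        SELECT cch.id
        FROM user_commission_configuration_history cch
        WHERE NOT EXISTS (SELECT 1
                          FROM user_commission_configuration_history cch2
                          WHERE cch2.user_id = cch.user_id
                            AND cch2.commission_id = cch.commission_id
                            AND cch2.on_date > cch.on_date)
          AND cch.user_id = 408002
        ORDER BY cch.on_date DESC
    """)
]

print(latest_ids)
```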
Here's one way to do what you're trying. It gets the max date for each user_id and comission_id and then joins this back to the base table to limit the results to just the max date for each comission_id.
SELECT *
FROM user_comission_configuration_history A
INNER JOIN (
SELECT User_ID, Comission_Id, max(on_Date) mOn_Date
FROM user_comission_configuration_history
Group by User_ID, Comission_Id
) B
on B.User_ID = A.User_Id
and B.Comission_Id = A.Comission_ID
and B.mOn_Date = A.on_date
WHERE A.user_id = 408002
ORDER BY A.on_Date desc;

Why is 'ORDER BY' needed to get correct result from MySQL join?

I have the following query:
SELECT t.ID, t.caseID, time
FROM tbl_test t
INNER JOIN (
SELECT ID, MAX( TIME )
FROM tbl_test
WHERE TIME <=1353143351
GROUP BY caseID
ORDER BY caseID DESC -- ERROR HERE!
) s
USING (ID)
It seems that I only get the correct result if I use the ORDER BY in the inner join. Why is that? I am using the ID for the join, so the order should have no effect.
If I remove the order by, I get too old entries from the database.
ID is the primary key, the caseID is a kind of object with multiple entries with different timestamps.
This query is ambiguous:
SELECT ID, MAX( TIME )
FROM tbl_test
WHERE TIME <=1353143351
GROUP BY caseID
It's ambiguous because it does not guarantee that it returns the ID of the row where the MAX(TIME) occurs. It returns the MAX(TIME) for each distinct value of caseID, but the value of other columns (like ID) is chosen arbitrarily from members of the group.
In practice, MySQL chooses the row that it finds first in the group as it scans rows in storage order.
Example:
caseID ID time
1 10 15:00
1 12 18:00
1 14 13:00
The max time is 18:00, which is the row with ID 12. But the query will return ID 10, simply because it's the first one in the group. If you were to reverse the order with ORDER BY, it would return ID 14. Still not the row where the max time is found, but it's from the other end of the group of rows.
Your query works with ORDER BY caseID DESC because, by coincidence, your Time values increase with the increasing ID.
This sort of query is actually an error in standard SQL and most other brands of SQL database. MySQL permits it, trusting that you know how to form an unambiguous query.
The fix is to use columns in the select-list only if they are unambiguous, that is, if they are in the GROUP BY clause, then each group is guaranteed to have only one distinct value:
SELECT caseID, MAX( TIME )
FROM tbl_test
WHERE TIME <=1353143351
GROUP BY caseID
SELECT t.ID, t.caseID, time
FROM tbl_test t
INNER JOIN (
SELECT caseID, MAX( TIME ) maxtime
FROM tbl_test
WHERE TIME <=1353143351
GROUP BY caseID
) s
ON t.caseID = s.caseID and t.time = s.maxtime
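The three-row example above can be replayed with a short sketch in Python's sqlite3 module, used here as a stand-in for MySQL. The rows for caseID 1 are taken from the example; the rows for caseID 2 are hypothetical, and the WHERE filter on time is left out to keep the sketch small.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl_test (ID INTEGER PRIMARY KEY, caseID INTEGER, time TEXT)"
)
conn.executemany(
    "INSERT INTO tbl_test (ID, caseID, time) VALUES (?, ?, ?)",
    [
        (10, 1, "15:00"),
        (12, 1, "18:00"),
        (14, 1, "13:00"),
        (11, 2, "09:00"),  # hypothetical second case
        (13, 2, "08:00"),  # hypothetical second case
    ],
)

# Group only by caseID, then join back on (caseID, max time) to get the ID.
newest = conn.execute("""
    SELECT t.ID, t.caseID, t.time
    FROM tbl_test t
    INNER JOIN (SELECT caseID, MAX(time) AS maxtime
                FROM tbl_test
                GROUP BY caseID) s
      ON t.caseID = s.caseID AND t.time = s.maxtime
    ORDER BY t.caseID
""").fetchall()

for row in newest:
    print(row)
```

For caseID 1 this returns ID 12, the row that actually holds the maximum time, regardless of the order in which rows are stored.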
You are seeing that issue because you are getting the MAX(TIME) per caseID, but since you are grouping by caseID and NOT ID, you are getting an arbitrary ID. That happens because when you use an aggregate function, like MAX, you must, for every non-grouped field in the select specify how you want to aggregate it. That means, if it's in the SELECT and NOT in the GROUP BY, you have to tell MySQL how to aggregate. If you don't then you get a RANDOM row (well, not random per se, but it's not going to be in an order that you necessarily expect).
The reason ORDER BY is working for you, is that it kind of tricks the query optimizer into sorting the results before grouping, which just so happens to produce the result you want, but be warned, that will not always be the case.
What you want is the ID that has the MAX(TIME) given a caseID. Which means your INNER join needs to connect by caseID (not ID) and time (which will give you 1 row per each 1 row in the outer table).
Barmar beat me to the actual query, but that's the way you want to go.

Determine total amount of top result returned

I would like to determine two things from a single query:
Most prevalent column in a table
The amount of times such column was located upon querying the table
Example Table:
user_id some_field
1 data
2 data
1 data
The above would return user_id # 1 as being the most prevalent in the table, and it would return (2) for the total amount of times that it was located in the table.
I have done my research and I came across two types of queries.
GROUP BY user_id ORDER BY COUNT(*) DESC
SUM
The problem is that I can't figure out how to use these two queries in conjunction with one another. For example, consider the following query which successfully returns the most prevalent column.
$top_user = "SELECT user_id FROM table_name GROUP BY user_id ORDER BY COUNT(*) DESC";
The above query returns "1" based on the example table shown above. Now, I would like to be able to return "2" for the total amount of times the user_id (1) was found in the table.
Is this by any chance possible?
Thanks,
Evan
You can include count(*) in the SELECT list:
SELECT user_id, count(*) as totaltimes from table_name
GROUP BY user_id ORDER BY count(*) DESC;
If you want only the first one:
SELECT user_id, count(*) as totaltimes from table_name
GROUP BY user_id ORDER BY count(*) DESC LIMIT 1;
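A minimal sketch of the LIMIT 1 query, run against the example table from the question using Python's sqlite3 module as a stand-in for MySQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (user_id INTEGER, some_field TEXT)")
# The three rows from the example table in the question.
conn.executemany(
    "INSERT INTO table_name (user_id, some_field) VALUES (?, ?)",
    [(1, "data"), (2, "data"), (1, "data")],
)

# Count rows per user, sort by that count, and keep only the top row.
top_user = conn.execute("""
    SELECT user_id, COUNT(*) AS totaltimes
    FROM table_name
    GROUP BY user_id
    ORDER BY COUNT(*) DESC
    LIMIT 1
""").fetchone()

print(top_user)
```

If two users tie for the highest count, which one LIMIT 1 returns is not defined unless a tie-breaking column is added to the ORDER BY.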