Selecting Data from Normalized Tables - mysql

I'm stuck on writing this query; I think my brain is just a little fried tonight. I have a table that stores a row whenever a person performs an action (clocking in, clocking out, going on lunch, returning from lunch), and I need to return a list of all the primary IDs for the people whose last action is not clock_out. The catch is that it needs to be a reasonably fast query.
Table Structure:
ID | person_id | status | datetime | shift_type
ID = Primary Key for this table
person_id = The ID I want to return if their status does not equal clock_out
status = clock_in, lunch_start, lunch_end, break_start, break_end, clock_out
datetime = The time the record was added
shift_type = Not Important
Previously I used this query to find people who were still clocked in during a specific time period; now I need it to work at any point in time. The queries I have tried scan thousands and thousands of records and are far too slow.

I need to return a list of all the primary IDs for the people whose last action is not clock_out.
One option uses window functions, available in MySQL 8.0:
select id
from (
select t.*, row_number() over(partition by person_id order by datetime desc) rn
from mytable t
) t
where rn = 1 and status <> 'clock_out'
In earlier versions, one option uses a correlated subquery:
select id
from mytable t
where
datetime = (select max(t1.datetime) from mytable t1 where t1.person_id = t.person_id)
and status <> 'clock_out'
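Both queries above can be tried without a MySQL server. The sketch below is a stand-in under stated assumptions: it uses Python's built-in sqlite3 module (SQLite 3.25 or later is needed for ROW_NUMBER()), and the sample rows and the helper name still_clocked_in are made up for illustration.

```python
import sqlite3

WINDOW_SQL = """
    SELECT id FROM (
        SELECT t.*, ROW_NUMBER() OVER (
            PARTITION BY person_id ORDER BY datetime DESC) AS rn
        FROM mytable t
    ) x
    WHERE rn = 1 AND status <> 'clock_out'
    ORDER BY id"""

# Correlated subquery: keep rows whose datetime is that person's maximum.
CORRELATED_SQL = """
    SELECT id FROM mytable t
    WHERE datetime = (SELECT MAX(t1.datetime) FROM mytable t1
                      WHERE t1.person_id = t.person_id)
      AND status <> 'clock_out'
    ORDER BY id"""


def still_clocked_in(conn, sql):
    """Hypothetical helper: ids of each person's latest row if it is not clock_out."""
    return [row[0] for row in conn.execute(sql)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, person_id INTEGER, "
             "status TEXT, datetime TEXT, shift_type TEXT)")
# Made-up sample data; ISO-8601 strings sort correctly as text.
conn.executemany(
    "INSERT INTO mytable (id, person_id, status, datetime) VALUES (?, ?, ?, ?)",
    [
        (1, 10, "clock_in", "2020-01-01 08:00:00"),
        (2, 10, "lunch_start", "2020-01-01 12:00:00"),  # person 10 is on lunch
        (3, 20, "clock_in", "2020-01-01 08:05:00"),
        (4, 20, "clock_out", "2020-01-01 16:00:00"),    # person 20 has left
        (5, 30, "clock_in", "2020-01-01 09:00:00"),     # person 30 is working
    ],
)
print(still_clocked_in(conn, WINDOW_SQL))      # [2, 5]
print(still_clocked_in(conn, CORRELATED_SQL))  # [2, 5]
```

On the real MySQL table, a composite index on (person_id, datetime) is the likely key to making either query fast; that index is a suggestion, not something the question says already exists.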

After looking through it further, this was my solution -
SELECT * FROM (
SELECT `status`,`person_id` FROM `timeclock` ORDER BY `datetime` DESC
) AS tmp_table GROUP BY `person_id`
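This appears to work because it groups all rows with the same person ID together, after the derived table has ordered them by datetime, so the most recent row seems to be selected. However, it relies on MySQL's nonstandard GROUP BY extension: the server may return status from any row in each group, and the ORDER BY in the derived table does not control which row is chosen. With ONLY_FULL_GROUP_BY enabled (the default since MySQL 5.7.5), the query is rejected.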

Related

Writing SQL with timestamps

The data
CREATE TABLE IF NOT EXISTS `transactions` (
`transactions_ts` timestamp ,
`user_id` int(6) unsigned NOT NULL,
`transaction_id` bigint,
`item` varchar(200), PRIMARY KEY(`transaction_id`)
) DEFAULT CHARSET=utf8;
INSERT INTO `transactions` (`transactions_ts`, `user_id`, `transaction_id`,`item` ) VALUES
('2016-06-18 13:46:51.0', 13811335,1322361417, 'glove'),
('2016-06-18 17:29:25.0', 13811335,3729362318, 'hat'),
('2016-06-18 23:07:12.0', 13811335,1322363995,'vase' ),
('2016-06-19 07:14:56.0',13811335,7482365143, 'cup'),
('2016-06-19 21:59:40.0',13811335,1322369619,'mirror' ),
('2016-06-17 12:39:46.0',3378024101,9322351612, 'dress'),
('2016-06-17 20:22:17.0',3378024101,9322353031,'vase' ),
('2016-06-20 11:29:02.0',3378024101,6928364072,'tie'),
('2016-06-20 18:59:48.0',13811335,1322375547, 'mirror');
The question: for each user, show the first item they ordered (first by time). I treat time as a whole timestamp (not date and time separately).
My attempt
select
min(transactions_ts) as first_trans,
user_id, item
from transactions
group by user_id
order by first_trans;
I'm sorry if this is a simple question, but someone tells me that my query is entirely wrong, and I have no other way to test that claim.
This is a little bit more complicated than you thought.
To start with: "for each user" would translate to GROUP BY user_id, not to GROUP BY user_id, item.
But with GROUP BY user_id, you'd need an aggregate function saying "the item for the minimum transactions_ts". MySQL doesn't have such an aggregate function; in your query, item is taken from an indeterminate row of each group, not necessarily the one with the minimum timestamp.
The obvious solution is to do this in two steps:
1. Find the first transaction per user.
2. Show the items for these transactions.
The query:
select *
from transactions
where (user_id, transactions_ts) in
(
select user_id, min(transactions_ts)
from transactions
group by user_id
);
Another way to word the task is: "Give me the transactions for which no older transaction for the same user exists".
The query:
select *
from transactions t
where not exists
(
select *
from transactions t2
where t2.user_id = t.user_id
and t2.transactions_ts < t.transactions_ts
);
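A quick way to check both queries against the question's own sample data is an in-memory database. This sketch assumes SQLite (via Python's sqlite3 module, version 3.15 or later for the (user_id, transactions_ts) IN (...) row-value form) behaves like MySQL for these queries; timestamps are stored as text, which sorts correctly in this fixed format.

```python
import sqlite3

# Rows from the question (23::07:12 typo corrected, fractional ".0" dropped).
ROWS = [
    ("2016-06-18 13:46:51", 13811335, 1322361417, "glove"),
    ("2016-06-18 17:29:25", 13811335, 3729362318, "hat"),
    ("2016-06-18 23:07:12", 13811335, 1322363995, "vase"),
    ("2016-06-19 07:14:56", 13811335, 7482365143, "cup"),
    ("2016-06-19 21:59:40", 13811335, 1322369619, "mirror"),
    ("2016-06-17 12:39:46", 3378024101, 9322351612, "dress"),
    ("2016-06-17 20:22:17", 3378024101, 9322353031, "vase"),
    ("2016-06-20 11:29:02", 3378024101, 6928364072, "tie"),
    ("2016-06-20 18:59:48", 13811335, 1322375547, "mirror"),
]

# Two steps: find each user's first timestamp, then fetch the matching rows.
IN_SQL = """
    SELECT user_id, item FROM transactions
    WHERE (user_id, transactions_ts) IN (
        SELECT user_id, MIN(transactions_ts) FROM transactions GROUP BY user_id)
    ORDER BY user_id"""

# "No older transaction for the same user exists."
NOT_EXISTS_SQL = """
    SELECT user_id, item FROM transactions t
    WHERE NOT EXISTS (
        SELECT 1 FROM transactions t2
        WHERE t2.user_id = t.user_id
          AND t2.transactions_ts < t.transactions_ts)
    ORDER BY user_id"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (transactions_ts TEXT, user_id INTEGER, "
             "transaction_id INTEGER PRIMARY KEY, item TEXT)")
conn.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?)", ROWS)

print(conn.execute(IN_SQL).fetchall())          # [(13811335, 'glove'), (3378024101, 'dress')]
print(conn.execute(NOT_EXISTS_SQL).fetchall())  # same result
```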
If you are using MySQL 8.0, the window function ROW_NUMBER() can be used to address your use case, as follows:
SELECT transactions_ts, user_id, item
FROM (
SELECT
transactions_ts,
user_id,
item,
ROW_NUMBER() OVER(PARTITION BY user_id ORDER BY transactions_ts) rn
FROM transactions
) x WHERE rn = 1
The inner query ranks each record by ascending timestamp within groups of records that share the same user_id. The outer query keeps only the first transaction of each user.
Demo on DB Fiddle:
transactions_ts | user_id | item
:------------------ | ---------: | :----
2016-06-18 13:46:51 | 13811335 | glove
2016-06-17 12:39:46 | 3378024101 | dress
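One difference between ROW_NUMBER() and the IN / NOT EXISTS forms shows up only when a user has two transactions with the same earliest timestamp: ROW_NUMBER() returns exactly one row per user, while the other forms return every tied row. The sketch below illustrates this in SQLite via Python's sqlite3 (3.25+ for window functions); the tied sample rows are invented, and adding transaction_id as a tie-breaker in the ORDER BY is an assumption, not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (transactions_ts TEXT, user_id INTEGER, "
             "transaction_id INTEGER PRIMARY KEY, item TEXT)")
# Invented rows: glove and hat share the user's earliest timestamp.
conn.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?)", [
    ("2016-06-18 13:46:51", 1, 101, "glove"),
    ("2016-06-18 13:46:51", 1, 102, "hat"),
    ("2016-06-19 07:14:56", 1, 103, "cup"),
])

# transaction_id is an added tie-breaker so exactly one row gets rn = 1.
WINDOW_SQL = """
    SELECT item FROM (
        SELECT item, ROW_NUMBER() OVER (
            PARTITION BY user_id
            ORDER BY transactions_ts, transaction_id) AS rn
        FROM transactions) x
    WHERE rn = 1"""

NOT_EXISTS_SQL = """
    SELECT item FROM transactions t
    WHERE NOT EXISTS (SELECT 1 FROM transactions t2
                      WHERE t2.user_id = t.user_id
                        AND t2.transactions_ts < t.transactions_ts)
    ORDER BY item"""

print([r[0] for r in conn.execute(WINDOW_SQL)])      # ['glove']
print([r[0] for r in conn.execute(NOT_EXISTS_SQL)])  # ['glove', 'hat']
```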
You can do it using a subquery to get the first transactions_ts for each user:
select user_id, item, transactions_ts
from transactions a
where transactions_ts=(select min(transactions_ts)
from transactions b
where b.user_id=a.user_id)
So:
1. The inner query gets the first transaction time for each user.
2. The outer query returns the row that has the time found in step 1.

Retrieving last row inserted in table for each "parameter"

I have a table, currently about 1.3M rows, which stores measured data points for a couple of different parameters. There are about 30 parameters.
Table:
* id
* station_id (int)
* comp_id (int)
* unit_id (int)
* p_id (int)
* timestamp
* value
I have a UNIQUE index on: (station_id, comp_id, unit_id, p_id, timestamp)
Because the timestamp differs for every parameter, I have difficulty sorting by timestamp (I have to use a GROUP BY).
So today I select the last value for each parameter by this query:
select p_id, timestamp, value
from (select p_id, timestamp, value
from table
where station_id = 3 and comp_id = 9112 and unit_id = 1 and
p_id in (1,2,3,4,5,6,7,8,9,10)
order by timestamp desc
) table_x
group by p_id;
This query takes about 3 seconds to execute.
Even though I have the index mentioned above, the optimizer uses a filesort to find the values.
Querying for only 1 specific parameter:
select p_id, timestamp, value from table where station_id = 3 and comp_id = 9112 and unit_id = 1 and p_id =1 order by timestamp desc limit 1;
Takes no time (0.00).
I've also tried joining the parameter IDs to a table in which I store the parameter IDs, without luck.
So, is there a simple (and fast) way to ask for the latest values of several different parameters at once?
Writing a procedure that loops and queries each parameter individually seems much faster than asking for all of them at once, which I don't think is the way a database is meant to be used.
Your query is incorrect. You are aggregating by p_id, but including other columns. These come from indeterminate rows, and the documentation is quite clear:
MySQL extends the use of GROUP BY so that the select list can refer to
nonaggregated columns not named in the GROUP BY clause. This means
that the preceding query is legal in MySQL. You can use this feature
to get better performance by avoiding unnecessary column sorting and
grouping. However, this is useful primarily when all values in each
nonaggregated column not named in the GROUP BY are the same for each
group. The server is free to choose any value from each group, so
unless they are the same, the values chosen are indeterminate.
Furthermore, the selection of values from each group cannot be
influenced by adding an ORDER BY clause.
The following should work:
select t.p_id, t.timestamp, t.value
from table t join
(select p_id, max(timestamp) as maxts
from table
where station_id = 3 and comp_id = 9112 and unit_id = 1 and
p_id in (1,2,3,4,5,6,7,8,9,10)
group by p_id
) tt
on tt.p_id = t.p_id and tt.maxts = t.timestamp
where t.station_id = 3 and t.comp_id = 9112 and t.unit_id = 1;
The best index for this query is a composite index on table(station_id, comp_id, unit_id, p_id, timestamp), which your existing UNIQUE index already provides.
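The corrected join can be checked on a toy table. In this sketch, SQLite (through Python's sqlite3) stands in for MySQL, the table is called measurements because table is a reserved word, latest_values is a hypothetical helper, and the rows are made up. The station 4 row shares a timestamp with station 3's latest p_id 1 reading, which shows why the outer query repeats the station/comp/unit filter: without it, that row would match the join too. The p_id IN (...) list is left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE measurements (
    id INTEGER PRIMARY KEY, station_id INT, comp_id INT, unit_id INT,
    p_id INT, timestamp TEXT, value REAL,
    UNIQUE (station_id, comp_id, unit_id, p_id, timestamp))""")
conn.executemany(
    "INSERT INTO measurements (station_id, comp_id, unit_id, p_id, timestamp, value) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    [
        (3, 9112, 1, 1, "2020-01-01 10:00", 1.0),
        (3, 9112, 1, 1, "2020-01-01 11:00", 1.5),   # latest for p_id 1 at station 3
        (3, 9112, 1, 2, "2020-01-01 09:30", 2.0),   # latest for p_id 2 at station 3
        (4, 9112, 1, 1, "2020-01-01 11:00", 99.0),  # other station, same timestamp
    ],
)

LATEST_SQL = """
    SELECT t.p_id, t.timestamp, t.value
    FROM measurements t
    JOIN (SELECT p_id, MAX(timestamp) AS maxts
          FROM measurements
          WHERE station_id = ? AND comp_id = ? AND unit_id = ?
          GROUP BY p_id) tt
      ON tt.p_id = t.p_id AND tt.maxts = t.timestamp
    WHERE t.station_id = ? AND t.comp_id = ? AND t.unit_id = ?
    ORDER BY t.p_id"""


def latest_values(conn, station, comp, unit):
    """Hypothetical helper: latest (p_id, timestamp, value) per parameter."""
    # The same three filters are bound twice: inside the subquery and outside it.
    return conn.execute(LATEST_SQL, (station, comp, unit) * 2).fetchall()


print(latest_values(conn, 3, 9112, 1))
# [(1, '2020-01-01 11:00', 1.5), (2, '2020-01-01 09:30', 2.0)]
```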

This slow MySQL Query needs improvement

This query works and provides the information I need, but it is very slow: it takes 18 seconds to aggregate a database of only 4,000 records.
I'm bringing it here to see if anyone has any advice on how to improve it.
SELECT COUNT( status ) AS quantity, status
FROM log_table
WHERE time_stamp
IN (SELECT MAX( time_stamp ) FROM log_table GROUP BY userid )
GROUP BY status
Here's what it does/what it needs to do in plain text:
I have a table full of logs, each log contains a "userid", "status" (integer between 1-12) and "time_stamp" (a time stamp of when the log was created). There may be many entries for a particular userid, but with a different time stamp and status. I'm trying to get the most recent status (based on time_stamp) for each userid, then count the occurrences of each most-recent status among all the users.
My initial idea was to use a subquery with GROUP BY userid. That was fast, but it always returned the first entry for each userid, not the most recent. If I could do GROUP BY userid using time_stamp DESC to identify which row should represent the group, that would be great. But of course ORDER BY inside a GROUP BY does not work.
Any suggestions?
The first thing to try is to make this an explicit join:
SELECT COUNT(lg.status) AS quantity, lg.status
FROM log_table lg join
(select userid, MAX( time_stamp ) as maxts
from log_table
GROUP BY userid
) lgu
on lgu.userid = lg.userid and lgu.maxts = lg.time_stamp
GROUP BY lg.status;
Another approach is to use a different where clause. This will work best if you have an index on log_table(userid, time_stamp). This approach is doing the filtering by saying "there is no timestamp bigger than this one for a given user":
SELECT COUNT(status) AS quantity, status
FROM log_table lg
WHERE not exists (select 1
from log_table lg2
where lg2.userid = lg.userid and lg2.time_stamp > lg.time_stamp
)
GROUP BY status;
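The sketch below runs the NOT EXISTS version next to the question's original query on a few invented rows, using SQLite via Python's sqlite3 as a stand-in for MySQL, plus the (userid, time_stamp) index suggested above. It also exposes a correctness problem in the original: its IN subquery is not correlated, so an older row of one user is counted whenever its time_stamp happens to equal another user's latest time_stamp.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log_table (userid INTEGER, status INTEGER, time_stamp TEXT)")
conn.execute("CREATE INDEX idx_user_ts ON log_table (userid, time_stamp)")
conn.executemany("INSERT INTO log_table VALUES (?, ?, ?)", [
    (1, 3, "2020-01-01 08:00"),
    (1, 5, "2020-01-02 08:00"),  # user 1 latest status: 5
    (2, 5, "2020-01-01 09:00"),  # user 2 latest status: 5
    (3, 5, "2020-01-01 09:00"),  # older row for user 3, same time as user 2's latest
    (3, 7, "2020-01-03 07:00"),  # user 3 latest status: 7
])

# The question's query: time_stamp only has to match *some* user's maximum.
ORIGINAL_SQL = """
    SELECT COUNT(status) AS quantity, status FROM log_table
    WHERE time_stamp IN (SELECT MAX(time_stamp) FROM log_table GROUP BY userid)
    GROUP BY status ORDER BY status"""

# Keep a row only if the same user has no later row.
NOT_EXISTS_SQL = """
    SELECT COUNT(status) AS quantity, status FROM log_table lg
    WHERE NOT EXISTS (SELECT 1 FROM log_table lg2
                      WHERE lg2.userid = lg.userid
                        AND lg2.time_stamp > lg.time_stamp)
    GROUP BY status ORDER BY status"""

print(conn.execute(ORIGINAL_SQL).fetchall())    # [(3, 5), (1, 7)]  (user 3 counted twice)
print(conn.execute(NOT_EXISTS_SQL).fetchall())  # [(2, 5), (1, 7)]
```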

SQL SELECT last entry without limiting

I have a table with 3 fields:
id
note
created_at
Is there a way in SQL, especially in Postgres, to select the value of the last note without having to use LIMIT 1?
Normal query:
select note from table order by created_at desc limit 1
I'm interested in something avoiding the limit since I'll need it as a subquery.
Simple version with a NOT EXISTS anti-semi-join:
SELECT note FROM tbl t
WHERE NOT EXISTS
(SELECT 1 FROM tbl t1 WHERE t1.created_at > t.created_at);
"Find a note where no other note was created later."
This shares the weakness of @Hogan's version that it can return multiple rows if created_at is not UNIQUE, as @Ollie already pointed out. Unlike @Hogan's query (max() is only defined for simple types), this one can be improved easily:
Compare row types
SELECT note FROM tbl t
WHERE NOT EXISTS
(SELECT 1 FROM tbl t1
WHERE (t1.created_at, t1.id) > (t.created_at, t.id));
Assuming you want the greatest id in case of a tie with created_at, and id is the primary key, therefore unique. This works in PostgreSQL and MySQL.
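The effect of the row-value comparison is easy to see on data with a tie. This is a sketch using SQLite through Python's sqlite3 module (row values need SQLite 3.15 or later), with invented rows in which ids 2 and 3 share the latest created_at.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, note TEXT, created_at TEXT)")
conn.executemany("INSERT INTO tbl VALUES (?, ?, ?)", [
    (1, "first", "2020-01-01 10:00"),
    (2, "tie a", "2020-01-02 10:00"),
    (3, "tie b", "2020-01-02 10:00"),  # same created_at as id 2
])

# Compares created_at only: both tied notes survive.
SIMPLE_SQL = """
    SELECT note FROM tbl t
    WHERE NOT EXISTS (SELECT 1 FROM tbl t1 WHERE t1.created_at > t.created_at)
    ORDER BY note"""

# Compares (created_at, id) lexicographically: the higher id wins the tie.
ROW_VALUE_SQL = """
    SELECT note FROM tbl t
    WHERE NOT EXISTS (SELECT 1 FROM tbl t1
                      WHERE (t1.created_at, t1.id) > (t.created_at, t.id))"""

print([r[0] for r in conn.execute(SIMPLE_SQL)])     # ['tie a', 'tie b']
print([r[0] for r in conn.execute(ROW_VALUE_SQL)])  # ['tie b']
```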
Window function
The same can be achieved with a window function in PostgreSQL:
SELECT note
FROM (
SELECT note, row_number() OVER (ORDER BY created_at DESC, id DESC) AS rn
FROM tbl t
) x
WHERE rn = 1;
MySQL before 8.0 lacks support for window functions. You can substitute a user variable like this:
SELECT note
FROM (
SELECT note, @rownum := @rownum + 1 AS rn
FROM tbl t
,(SELECT @rownum := 0) r
ORDER BY created_at DESC, id DESC
) x
WHERE rn = 1;
(SELECT @rownum := 0) r initializes the variable with 0 without an explicit SET command. Note that setting user variables within expressions is deprecated as of MySQL 8.0, where the window function version can be used instead.
If your id column is an autoincrementing primary key field, it's pretty easy. This assumes the latest note has the highest id. (That might not be true; only you know that!)
select *
from note
where id = (select max(id) from note)
It's here: http://sqlfiddle.com/#!2/7478a/1/0 for MySQL and here http://sqlfiddle.com/#!1/6597d/1/0 for PostgreSQL. Same SQL.
If your id column isn't set up so the latest note has the highest id, but still is a primary key (that is, still has unique values in each row), it's a little harder. We have to disambiguate identical dates; we'll do this by choosing, arbitrarily, the highest id.
select *
from note
where id = (
select max(id)
from note where created_at =
(select max(created_at)
from note
)
)
Here's an example: http://sqlfiddle.com/#!2/1f802/4/0 for MySQL.
Here it is for PostgreSQL (the SQL is the same, yay!) http://sqlfiddle.com/#!1/bca8c/1/0
Another possibility: maybe you want both notes shown together in one row if they were both created at the same exact time. Again, only you know that.
select group_concat(note separator '; ')
from note
where created_at = (select max(created_at) from note)
In PostgreSQL 9.0+, it's
select string_agg(note, '; ')
from note
where created_at = (select max(created_at) from note)
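For a quick check of the concatenation approach, SQLite also has group_concat(), though there the separator is passed as a second argument instead of MySQL's SEPARATOR keyword. The sketch below uses Python's sqlite3 with invented rows, and because SQLite does not guarantee the order of concatenated values, it sorts them before printing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE note (id INTEGER PRIMARY KEY, note TEXT, created_at TEXT)")
conn.executemany("INSERT INTO note VALUES (?, ?, ?)", [
    (1, "first", "2020-01-01 10:00"),
    (2, "tie a", "2020-01-02 10:00"),
    (3, "tie b", "2020-01-02 10:00"),  # created at the same instant as id 2
])

# All notes sharing the latest created_at, joined into a single row.
SQL = """SELECT group_concat(note, '; ') FROM note
         WHERE created_at = (SELECT MAX(created_at) FROM note)"""
joined = conn.execute(SQL).fetchone()[0]
print(sorted(joined.split("; ")))  # ['tie a', 'tie b']
```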
If you do have the possibility for duplicate created_at times and duplicate id values, and you don't want the group_concat effect, you are unfortunately stuck with LIMIT.
I'm not 100% sure about Postgres (I've actually never used it), but you can get the same effect with something like this if created_at is unique (or with any other unique column):
SELECT note FROM table WHERE created_at = (
SELECT MAX(created_at) FROM table
)
I may not know how to answer on this platform, but what I suggest works:
SELECT * FROM table GROUP BY field ORDER BY max(field) DESC;
This gets the last value of the field without LIMIT. It is commonly used in JOINed queries to get the last update time, such as the last message time, without limiting the output. Note that it returns one row per distinct value of field, latest first, rather than only the last row.

MySQL Subquery with main query data variable

OK, I need a MySQL guru here. I am trying to write a query that will serve as a notification system for when someone leaves a comment on an item that you have previously commented on. The 'drinkComments' table is very simple:
commentID, userID, drinkID, datetime, comment
I've written a query that gets all of the comments (not mine) on drinks I have previously commented on, but it still shows comments that were posted BEFORE my comment. This is as close as I can get to what I think should work, but it does not. Please help!
select @drinkID:=drinkComments.drinkID, commentID, drinkID, userID, comment, datetime
FROM drinkComments
WHERE `drinkID` IN
( select distinct drinkID from drinkComments where drinkComments.userID = 1)
AND drinkComments.dateTime > (
/*This gets the last date user commented on the main query's drinkID*/
select datetime FROM drinkComments WHERE drinkComments.userID = 1 AND drinkComments.drinkID = @drinkID ORDER BY datetime DESC LIMIT 1
)
ORDER BY datetime DESC
Why not start with a prequery of the user and all the drinks they've commented on, and as of what time (I don't know whether you have multiple comments per person for any given drink). Then find comments from everyone else AFTER your own comment's date/time...
This query should actually be faster as it is STARTING with only ONE USER's drink comments as a basis, THEN goes back to the comments table for those matching the drink ID and cutoff time.
SELECT STRAIGHT_JOIN
dc.*
from
( select
drinkID,
max( datetime ) UserID_DrinkCommentTime
FROM
drinkComments
WHERE
userID = 1
group by
drinkID ) PreQuery
join drinkComments dc
on PreQuery.DrinkID = dc.DrinkID
and dc.datetime > PreQuery.UserID_DrinkCommentTime
order by
dc.DateTime desc
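The prequery approach can be sketched in SQLite via Python's sqlite3 as a stand-in for MySQL. STRAIGHT_JOIN is a MySQL-only optimizer hint, so a plain JOIN is used here, and the sample comments are invented. Only the comment posted after user 1's latest comment, on a drink user 1 commented on, should come back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE drinkComments (commentID INTEGER PRIMARY KEY, "
             "userID INTEGER, drinkID INTEGER, datetime TEXT, comment TEXT)")
conn.executemany("INSERT INTO drinkComments VALUES (?, ?, ?, ?, ?)", [
    (1, 2, 100, "2020-01-01 09:00", "before me"),       # older than user 1's comment
    (2, 1, 100, "2020-01-01 10:00", "my comment"),
    (3, 3, 100, "2020-01-01 11:00", "reply after me"),  # should be reported
    (4, 3, 200, "2020-01-01 12:00", "other drink"),     # user 1 never commented here
])

# PreQuery: each drink the user commented on, with their latest comment time.
NEW_COMMENTS_SQL = """
    SELECT dc.commentID, dc.comment
    FROM (SELECT drinkID, MAX(datetime) AS UserID_DrinkCommentTime
          FROM drinkComments
          WHERE userID = ?
          GROUP BY drinkID) PreQuery
    JOIN drinkComments dc
      ON PreQuery.drinkID = dc.drinkID
     AND dc.datetime > PreQuery.UserID_DrinkCommentTime
    ORDER BY dc.datetime DESC"""

print(conn.execute(NEW_COMMENTS_SQL, (1,)).fetchall())  # [(3, 'reply after me')]
```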
I think you need to relate your innermost query to the outer query's row by drinkID. An alias defined in the IN subquery is only visible inside that subquery, so the correlation has to go through an alias on the outer table:
select dc.commentID, dc.drinkID, dc.userID, dc.comment, dc.datetime
FROM drinkComments AS dc
WHERE dc.drinkID IN
( select distinct a.drinkID from drinkComments AS a where a.userID = 1)
AND dc.dateTime > (
/*This gets the last date user 1 commented on the outer query's drinkID*/
select b.datetime FROM drinkComments AS b WHERE b.userID = 1 AND b.drinkID = dc.drinkID ORDER BY b.datetime DESC LIMIT 1
)
ORDER BY dc.datetime DESC