Can this MySQL query be more efficient? - mysql

UPDATE: Here's a sqlfiddle: http://sqlfiddle.com/#!2/e0822/1/0
I have a MySQL database of apps (itunes_id), each app id has a comments field. To preserve a history, every time a comment is changed, a new row of data is added. In the query below, I just want a list of the latest entry (highest id) of every app (itunes_id).
Here are the headers of my db:
id (key and auto increment)
itunes_id
comments
date
This query is getting the latest entry for a given itunes_id. How can I make this query more efficient?
SELECT * FROM (
SELECT * FROM (
SELECT * FROM Apps
ORDER BY id DESC
) AS apps1
GROUP BY itunes_id
) AS apps2
LIMIT 0 , 25

This query uses a subquery that separately gets the maximum ID for every itunes_ID. The result of the subquery is then joined back to the original table on two columns: itunes_ID and ID.
SELECT a.*
FROM Apps a
INNER JOIN
(
SELECT itunes_id, MAX(ID) max_id
FROM Apps
GROUP BY itunes_id
) b ON a.itunes_id = b.itunes_id AND
a.ID = b.max_ID
LIMIT 0, 25
For faster performance, create a compound index on the columns itunes_ID and ID, e.g.:
ALTER TABLE Apps ADD INDEX (itunes_ID, ID)
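To see the join-on-MAX(id) approach end to end, here is a minimal sketch that runs it against an in-memory SQLite database from Python. SQLite is only a stand-in for MySQL here; the sample rows, the index name idx_apps_itunes_id_id and the helper latest_per_app are made up for illustration, and the added ORDER BY is just there to make the output deterministic.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the MySQL table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Apps (
        id        INTEGER PRIMARY KEY AUTOINCREMENT,
        itunes_id INTEGER NOT NULL,
        comments  TEXT
    );
    -- Compound index on (itunes_id, id); the index name is hypothetical.
    CREATE INDEX idx_apps_itunes_id_id ON Apps (itunes_id, id);
    -- Hypothetical history: three versions for app 100, two for app 200.
    INSERT INTO Apps (itunes_id, comments) VALUES
        (100, 'first'), (200, 'draft'), (100, 'second'),
        (100, 'third'), (200, 'final');
""")


def latest_per_app(conn, limit=25):
    """Return the row with the highest id for every itunes_id."""
    return conn.execute(
        """
        SELECT a.id, a.itunes_id, a.comments
        FROM Apps a
        INNER JOIN (
            SELECT itunes_id, MAX(id) AS max_id
            FROM Apps
            GROUP BY itunes_id
        ) b ON a.itunes_id = b.itunes_id AND a.id = b.max_id
        ORDER BY a.itunes_id
        LIMIT ?
        """,
        (limit,),
    ).fetchall()


rows = latest_per_app(conn)
for row in rows:
    print(row)
```

Each app contributes exactly one row, because the derived table b holds a single (itunes_id, max_id) pair per app.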

For a similar approach, I use a "recent" boolean field to mark records containing the latest version. This requires an UPDATE query on every insert (deactivate the previous recent record), but allows for a quick select query. Alternatively, you could maintain two tables, one with the recent records, the other one with the history for each app.
EDIT: Maybe you can try a table similar to this:
id int not null auto_increment primary key
version int not null
main_id int null
recent boolean not null
app varchar(32) not null
comment varchar(200) null
You can use the column "main_id" to point to the record with version 1.
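As a rough sketch of how the "recent" flag could be maintained, the following uses Python's built-in sqlite3 module as a stand-in for MySQL. The table name app_comments and the helper add_version are hypothetical; the columns follow the layout suggested above. The previous recent row is deactivated and the new one inserted inside one transaction, so a reader should never see two recent rows for the same app.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE app_comments (          -- hypothetical table name
        id      INTEGER PRIMARY KEY AUTOINCREMENT,
        version INTEGER NOT NULL,
        main_id INTEGER NULL,            -- points to the version 1 row
        recent  INTEGER NOT NULL,        -- 1 = latest version, 0 = history
        app     VARCHAR(32) NOT NULL,
        comment VARCHAR(200) NULL
    )
""")


def add_version(conn, app, comment):
    """Insert a new version for `app` and mark it as the only recent one."""
    with conn:  # commits on success, rolls back if an exception is raised
        prev = conn.execute(
            "SELECT id, version, main_id FROM app_comments "
            "WHERE app = ? AND recent = 1",
            (app,),
        ).fetchone()
        if prev is None:
            version, main_id = 1, None
        else:
            prev_id, prev_version, prev_main = prev
            version = prev_version + 1
            # Version 1 has no main_id, so later versions point at its id.
            main_id = prev_main if prev_main is not None else prev_id
            conn.execute(
                "UPDATE app_comments SET recent = 0 WHERE id = ?", (prev_id,)
            )
        conn.execute(
            "INSERT INTO app_comments (version, main_id, recent, app, comment) "
            "VALUES (?, ?, 1, ?, ?)",
            (version, main_id, app, comment),
        )


add_version(conn, "app-a", "first")
add_version(conn, "app-b", "only")
add_version(conn, "app-a", "second")
add_version(conn, "app-a", "third")

# The "quick select": no grouping or subquery, just a filter on the flag.
latest = conn.execute(
    "SELECT app, version, main_id, comment FROM app_comments "
    "WHERE recent = 1 ORDER BY app"
).fetchall()
print(latest)
```

The trade-off is the one described above: every insert costs an extra UPDATE, but reading the latest versions becomes a plain filtered SELECT.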

SELECT * FROM (
SELECT * FROM (
SELECT * FROM Apps
ORDER BY id DESC
) AS apps1
GROUP BY itunes_id
) AS apps2
LIMIT 0 , 25
will not reliably select the latest record (you cannot assume the generated key always identifies the latest row). What you want is something like this:
SELECT * FROM (
SELECT * FROM (
SELECT * FROM Apps a
where a.some_date = (select max(a2.some_date) from Apps a2 where a2.itunes_id = a.itunes_id)
ORDER BY id DESC
) AS apps1
GROUP BY itunes_id
) AS apps2
LIMIT 0 , 25

I just want the latest entry (highest id) for a given app (itunes_id)
This will do it
SELECT id, comments FROM Apps WHERE id = (SELECT MAX(id) FROM Apps WHERE itunes_id = "iid");
or
SELECT id, comments FROM Apps WHERE itunes_id = "iid" ORDER BY id DESC LIMIT 1;
Where iid is the itunes id for which you want the latest comment.
Make sure id and itunes_id are indexed in a composite index for maximum efficiency.

Related

Selecting Data from Normalized Tables

I'm stuck trying to write this query; I think my brain is just a little fried tonight. I have a table that stores a record whenever a person executes an action (Clocking In, Clocking Out, Going on Lunch, Returning from Lunch), and I need to return a list of all the primary IDs for the people whose last action is not clock_out. The problem is that it needs to be a reasonably fast query.
Table Structure:
ID | person_id | status | datetime | shift_type
ID = Primary Key for this table
person_id = The ID I want to return if their status does not equal clock_out
status = clock_in, lunch_start, lunch_end, break_start, break_end, clock_out
datetime = The time the record was added
shift_type = Not Important
Previously, this query found people who were still clocked in during a specific time period, but now I need it to work for any point in time. The queries I am trying scan thousands and thousands of records, which makes them way too slow.
I need to return a list of all the primary ID's for the people whose last action is not clock_out.
One option uses window functions, available in MySQL 8.0:
select id
from (
select t.*, row_number() over(partition by person_id order by datetime desc) rn
from mytable t
) t
where rn = 1 and status <> 'clock_out'
In earlier versions, one option uses a correlated subquery:
select id
from mytable t
where
datetime = (select max(t1.datetime) from mytable t1 where t1.person_id = t.person_id)
and status <> 'clock_out'
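Here is a small sketch of the correlated-subquery variant, run against in-memory SQLite from Python as a stand-in for MySQL. The table name timeclock, the sample rows and the (person_id, datetime) index are assumptions for illustration; the index is the kind of thing that should let the inner MAX(datetime) lookup stay cheap per person, though actual plans depend on your server and data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE timeclock (
        id        INTEGER PRIMARY KEY,
        person_id INTEGER NOT NULL,
        status    TEXT NOT NULL,
        datetime  TEXT NOT NULL
    );
    -- Hypothetical index to support the per-person MAX(datetime) lookup.
    CREATE INDEX idx_timeclock_person_datetime ON timeclock (person_id, datetime);
    -- Hypothetical sample data.
    INSERT INTO timeclock (id, person_id, status, datetime) VALUES
        (1, 1, 'clock_in',    '2020-01-06 08:00:00'),
        (2, 2, 'clock_in',    '2020-01-06 08:05:00'),
        (3, 1, 'lunch_start', '2020-01-06 12:00:00'),
        (4, 2, 'clock_out',   '2020-01-06 16:00:00'),
        (5, 3, 'clock_in',    '2020-01-06 09:00:00'),
        (6, 3, 'clock_out',   '2020-01-06 17:00:00'),
        (7, 3, 'clock_in',    '2020-01-07 09:00:00');
""")

# Keep only each person's most recent row, then drop those who clocked out.
still_clocked_in = conn.execute("""
    SELECT t.person_id, t.status
    FROM timeclock t
    WHERE t.datetime = (
            SELECT MAX(t1.datetime)
            FROM timeclock t1
            WHERE t1.person_id = t.person_id
          )
      AND t.status <> 'clock_out'
    ORDER BY t.person_id
""").fetchall()
print(still_clocked_in)
```

Person 2 is excluded because their latest action is clock_out, while person 3 shows up again because they clocked back in the next day.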
After looking through it further, this was my solution -
SELECT * FROM (
SELECT `status`,`person_id` FROM `timeclock` ORDER BY `datetime` DESC
) AS tmp_table GROUP BY `person_id`
This works because it is grouping all of the same person ID's together, and then ordering them by the datetime and selecting the most recent.

Distinct on one column but retrieve all

I have a table with the following structure:
ID, VALUE, TIMESTAMP
(ID,TIMESTAMP) is primary key, so there might be more than one row for each ID. There are 5000 unique ids.
I want to retrieve the most recent entry for each ID.
My naive approach was:
SELECT * FROM table ORDER BY TIMESTAMP DESC LIMIT 5000;
For most cases this will give the correct result, but it is not guaranteed.
Because there might be 100k to 500k entries in the Table I would like to take performance into account.
What do you suggest?
TRY THIS
SELECT ID, VALUE, TIMESTAMP
FROM table
GROUP BY ID
HAVING MAX(TIMESTAMP)
ORDER BY TIMESTAMP DESC ;
OR
SELECT ID, VALUE, TIMESTAMP
FROM table
GROUP BY ID
ORDER BY TIMESTAMP DESC ;
Please try this:
SELECT * FROM
(SELECT ID,
VALUE,
TIMESTAMP,
ROW_NUMBER() OVER(PARTITION BY ID ORDER BY TIMESTAMP DESC) AS RN
FROM
TABLE ) T
WHERE RN=1 ;
If your DB supports the QUALIFY clause, you can also try the following:
SELECT * FROM
TABLE
QUALIFY ROW_NUMBER() OVER(PARTITION BY ID ORDER BY TIMESTAMP DESC)=1 ;
You need to find the latest timestamp for each ID. Try this:
SELECT *
FROM table AS a
WHERE TIMESTAMP = (
SELECT MAX(TIMESTAMP)
FROM table AS b
WHERE a.ID = b.ID
)

Retrieving last row inserted in table for each "parameter"

I have a table, currently about 1.3M rows, which stores measured data points for a number of different parameters. There are about 30 parameters.
Table:
* id
* station_id (int)
* comp_id (int)
* unit_id (int)
* p_id (int)
* timestamp
* value
I have a UNIQUE index on: (station_id, comp_id, unit_id, p_id, timestamp)
Because the timestamp differs for every parameter, I have difficulty sorting by the timestamp (I have to use a GROUP BY).
So today I select the last value for each parameter by this query:
select p_id, timestamp, value
from (select p_id, timestamp, value
from table
where station_id = 3 and comp_id = 9112 and unit_id = 1 and
p_id in (1,2,3,4,5,6,7,8,9,10)
order by timestamp desc
) table_x
group by p_id;
This query takes about 3 seconds to execute.
Even though I have the index mentioned above, the optimizer uses a filesort to find the values.
Querying for only 1 specific parameter:
select p_id, timestamp, value from table where station_id = 3 and comp_id = 9112 and unit_id = 1 and p_id =1 order by timestamp desc limit 1;
Takes no time (0.00).
I've also tried joining against the table in which I store the parameter IDs, without luck.
So, is there a simple (and fast) way to ask for the latest values of several different parameters?
Running a procedure that loops and queries each parameter individually seems much faster than asking for all of them at once, which I don't think is the way a database should be used.
Your query is incorrect. You are aggregating by p_id, but including other columns. These come from indeterminate rows, and the documentation is quite clear:
MySQL extends the use of GROUP BY so that the select list can refer to
nonaggregated columns not named in the GROUP BY clause. This means
that the preceding query is legal in MySQL. You can use this feature
to get better performance by avoiding unnecessary column sorting and
grouping. However, this is useful primarily when all values in each
nonaggregated column not named in the GROUP BY are the same for each
group. The server is free to choose any value from each group, so
unless they are the same, the values chosen are indeterminate.
Furthermore, the selection of values from each group cannot be
influenced by adding an ORDER BY clause.
The following should work:
select t.p_id, t.timestamp, t.value
from table t join
(select p_id, max(timestamp) as maxts
from table
where station_id = 3 and comp_id = 9112 and unit_id = 1 and
p_id in (1,2,3,4,5,6,7,8,9,10)
group by p_id
) tt
on tt.p_id = t.p_id and tt.maxts = t.timestamp
where t.station_id = 3 and t.comp_id = 9112 and t.unit_id = 1;
The best index for this query is a composite index on table(station_id, comp_id, unit_id, p_id, timestamp).

How do I do a dynamic UNION query in MySQL?

mytable has an auto-incrementing id column which is an integer, and for all intents and purposes in this case you can safely assume that the higher ID represents a more recent value. mytable also has an indexed column called group_id which is a foreign key to the groups table.
I want a quick and dirty query to select the 5 most recent rows for each group_id from mytable.
If there were only three groups, this would be easy, as I could do this:
SELECT * FROM `mytable` WHERE `group_id` = 1 ORDER BY `id` DESC LIMIT 5
UNION ALL
SELECT * FROM `mytable` WHERE `group_id` = 2 ORDER BY `id` DESC LIMIT 5
UNION ALL
SELECT * FROM `mytable` WHERE `group_id` = 3 ORDER BY `id` DESC LIMIT 5
However, there is not a fixed number of groups. Groups are determined by what's in the groups table, so there is an indeterminate number of them.
My thoughts so far:
I could grab a CURSOR on the groups table and build a new SQL query string, then EXECUTE it. However, that seems really messy and I'm hoping there's a better way of doing it.
I could grab a CURSOR on the groups table and insert things into a temporary table, then select from that. However, that also seems really messy.
I don't know if I could just grab a CURSOR and then start returning rows directly from there. Is there perhaps something similar to SQL Server's @table type variables?
What I'm hoping most of all is that I'm overthinking this and there is a way to do this in a SELECT statement.
Getting the n most recent rows per group is best handled with window functions, which other RDBMSs (SQL Server, PostgreSQL, Oracle, etc.) support. Unfortunately, MySQL before 8.0 does not have window functions, so as an alternative you can use user-defined variables to assign a rank to the rows that belong to the same group. In this case ORDER BY group_id, id DESC is important, because it orders the results properly per group.
SELECT c.*
FROM (
SELECT *,
@r:= CASE WHEN @g = group_id THEN @r + 1 ELSE 1 END rownum,
@g:=group_id
FROM mytable
CROSS JOIN(SELECT @g:=NULL ,@r:=0) t
ORDER BY group_id,id desc
) c
WHERE c.rownum <=5
Above query will give you 5 recent rows for each group_id and if you want to get more than 5 rows just change where filter of outer query to your desired number WHERE c.rownum <= n
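For comparison, here is a sketch of the same "n most recent rows per group" result written with ROW_NUMBER() instead of user-defined variables. This is a swapped-in technique, not the query above: it needs window function support (MySQL 8.0+, or SQLite 3.25+ as used here from Python as a stand-in). The sample data and the helper name recent_per_group are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, group_id INTEGER NOT NULL)"
)
# Hypothetical data: group 1 gets seven rows, group 2 gets three, interleaved.
conn.executemany(
    "INSERT INTO mytable (group_id) VALUES (?)",
    [(g,) for g in (1, 1, 2, 1, 1, 2, 1, 1, 2, 1)],
)


def recent_per_group(conn, n=5):
    """Return up to n rows with the highest ids for every group_id."""
    return conn.execute(
        """
        SELECT id, group_id
        FROM (
            SELECT id, group_id,
                   ROW_NUMBER() OVER (
                       PARTITION BY group_id ORDER BY id DESC
                   ) AS rownum
            FROM mytable
        ) ranked
        WHERE rownum <= ?
        ORDER BY group_id, id DESC
        """,
        (n,),
    ).fetchall()


top5 = recent_per_group(conn, 5)
print(top5)
```

Group 1 is cut down to its five newest rows, while group 2 has only three rows and returns all of them.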

Get Last conversation row from MySQL database table

I have a database in MYSQL and it has chat table which looks like this.
I am using this query for fetching these records
SELECT * FROM (
SELECT * FROM `user_chats`
WHERE sender_id =2 OR receiver_id =2
ORDER BY id DESC
) AS tbl
GROUP BY sender_id, receiver_id
But I need only the records with IDs 5 and 4. Basically, my requirement is to fetch the last message of each conversation between two users. Here the conversation between users 2 and 3 has two records, and we want only the last one, i.e. id = 5; id = 2 is not needed.
So how we can write a query for that result?
SELECT
*
FROM
user_chats uc
WHERE
not exists (
SELECT
1
FROM
user_chats uc2
WHERE
uc2.created > uc.created AND
(
(uc.sender_id = uc2.sender_id AND uc.receiver_id = uc2.receiver_id) OR
(uc.sender_id = uc2.receiver_id AND uc.receiver_id = uc2.sender_id)
)
)
The following gets you latest record (assuming that the bigger id, the later it was created) meeting your criteria:
SELECT * FROM `user_chats`
WHERE (`sender_id` =2 AND `receiver_id` =3) OR (`sender_id` =3 AND `receiver_id` =2)
ORDER BY `id` DESC
LIMIT 1
This is a good idea if id is the primary key and increases along with created. Otherwise (if you are not sure that id increases when created increases), replace the ORDER BY line with the following:
ORDER BY `created` DESC
Plus, in both cases, put proper indexes on: id (if it is your primary key, no additional index is needed), sender_id and receiver_id (preferably a composite index, meaning a single index covering both columns), and created (only if you use ORDER BY created DESC instead of ORDER BY id DESC).
Try GROUP BY LEAST(sender_id, receiver_id), GREATEST(sender_id, receiver_id)
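To illustrate the LEAST/GREATEST idea, here is a minimal sketch using Python's sqlite3 as a stand-in for MySQL. SQLite has no LEAST/GREATEST, so the two-argument scalar MIN(a, b) and MAX(a, b) play that role; in MySQL you would write LEAST and GREATEST instead. The sample rows are hypothetical (the original table was not shown), and the sketch assumes, as in the answer above, that a higher id means a later message.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_chats (
        id          INTEGER PRIMARY KEY,
        sender_id   INTEGER NOT NULL,
        receiver_id INTEGER NOT NULL,
        message     TEXT
    );
    -- Hypothetical messages; user 2 talks to users 3 and 1.
    INSERT INTO user_chats (id, sender_id, receiver_id, message) VALUES
        (1, 1, 4, 'unrelated'),
        (2, 2, 3, 'hi'),
        (3, 2, 1, 'hello'),
        (4, 1, 2, 'hello back'),
        (5, 3, 2, 'hi back');
""")


def last_messages(conn, user_id):
    """Return the newest message of every conversation involving user_id."""
    return conn.execute(
        """
        SELECT uc.id, uc.sender_id, uc.receiver_id, uc.message
        FROM user_chats uc
        JOIN (
            -- Normalising the pair to (smaller id, larger id) puts both
            -- directions of a conversation into the same group.
            SELECT MAX(id) AS last_id
            FROM user_chats
            WHERE sender_id = ? OR receiver_id = ?
            GROUP BY MIN(sender_id, receiver_id), MAX(sender_id, receiver_id)
        ) latest ON latest.last_id = uc.id
        ORDER BY uc.id DESC
        """,
        (user_id, user_id),
    ).fetchall()


result = last_messages(conn, 2)
print(result)
```

Joining back on MAX(id) rather than selecting the other columns directly from the grouped query avoids relying on which row the server happens to pick for non-aggregated columns.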