SQL queries optimization - mysql

I'm having trouble optimizing some SQL queries that take datetime fields into account.
First of all, my table structure is the following:
CREATE TABLE info (
id int NOT NULL auto_increment,
name varchar(20),
infoId int,
shortInfoId int,
text varchar(255),
token varchar(60),
created_at DATETIME,
PRIMARY KEY(id),
KEY(created_at));
After running EXPLAIN on some of the simple queries I added the created_at key, which improved the performance of most of them. I'm now having trouble with the following query:
SELECT min(created_at), max(created_at) from info order by id DESC limit 10000
With this query I want to get the timespan between the last 10k results.
After using explain I get the following results:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE info ALL NULL NULL NULL NULL 4 NULL
Any idea on how can I improve the performance of this query?

If you want to examine the 10k most recent rows (the first 10k when ordered by id descending), then you need a sub-query to achieve your goal:
SELECT MIN(created_at), MAX(created_at)
FROM (
SELECT created_at
FROM info
ORDER BY id DESC
LIMIT 10000
) tenK
The inner query gets the first 10k rows from the table, sorted by id in descending order (only the field created_at is needed). The outer query computes the minimum and maximum value of created_at from the result set generated by the inner query.
I didn't run an EXPLAIN on it but I think it says 'Using temporary' in the 'Extra' column (which is not good but you cannot do better for this request). However, 10,000 rows is not that much; it runs fast and the performance does not degrade as the table size increases.
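To make the difference concrete, here is a minimal sketch that runs both forms of the query. It uses Python's built-in sqlite3 module as a stand-in for MySQL, a ten-row sample table, and a limit of 3 instead of 10000; the sample data and variable names are assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE info (id INTEGER PRIMARY KEY, created_at TEXT)")
# Hypothetical sample data: ten rows, one per day, ids 1..10.
conn.executemany(
    "INSERT INTO info (id, created_at) VALUES (?, ?)",
    [(i, f"2024-01-{i:02d} 00:00:00") for i in range(1, 11)],
)

# Original shape: the aggregate collapses the whole table into one row,
# so ORDER BY and LIMIT have nothing left to restrict.
whole_table = conn.execute(
    "SELECT MIN(created_at), MAX(created_at) FROM info ORDER BY id DESC LIMIT 3"
).fetchone()

# Derived-table shape: LIMIT is applied first, the aggregate afterwards.
last_three = conn.execute(
    """
    SELECT MIN(created_at), MAX(created_at)
    FROM (SELECT created_at FROM info ORDER BY id DESC LIMIT 3) AS recent
    """
).fetchone()

print(whole_table)
print(last_three)
```

The first result spans the whole table, while the second covers only the three newest rows.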
Update:
Now I noticed this sentence in the question:
With this query I want to get the timespan between the last 10k results.
If you want to get the value of created_at of the most recent row and of the row that is 10k rows in the past, then you can use two simple queries that walk the primary key on id (the ORDER BY column) and run fast:
(
SELECT created_at
FROM info
ORDER BY id DESC
LIMIT 1
)
UNION ALL
(
SELECT created_at
FROM info
ORDER BY id DESC
LIMIT 9999,1
)
ORDER BY created_at
This query produces 2 rows, the first one is the value of created_at of the 10000th row in the past, the second one is the created_at of the most recent row (I assume created_at always grows).

SELECT min(created_at), max(created_at) from info order by id DESC limit 10000
The above query will give you one row containing the minimum and maximum created_at values from info table. Because it only returns 1 row, the order by and limit clauses don't come into play.
The 10000-th record from the end can be accessed with the order by & limit condition ORDER BY id DESC LIMIT 1 OFFSET 9999 (thanks #Mörre Noseshine for the correction)
So, we can write the intended query as follows:
SELECT
min_created_at.value,
max_created_at.value
FROM
(SELECT
created_at value
FROM info
ORDER BY id DESC
LIMIT 1 OFFSET 9999) min_created_at,
(SELECT
created_at value
FROM info
ORDER BY id DESC
LIMIT 1) max_created_at
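A small sketch of this two-endpoint approach, again using Python's sqlite3 module in place of MySQL. The helper name `endpoints`, the ten-row sample data, and the window size of 3 are assumptions for illustration; the timespan itself is computed in Python from the two returned values.

```python
import sqlite3
from datetime import datetime

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE info (id INTEGER PRIMARY KEY, created_at TEXT)")
# Hypothetical sample data: ten rows, one per day, ids 1..10.
conn.executemany(
    "INSERT INTO info (id, created_at) VALUES (?, ?)",
    [(i, f"2024-01-{i:02d} 00:00:00") for i in range(1, 11)],
)


def endpoints(conn, n):
    """Return (oldest, newest) created_at among the last n rows by id.

    Returns None when the table holds fewer than n rows, because the
    OFFSET sub-query is then empty and the cross join yields no row.
    """
    return conn.execute(
        """
        SELECT oldest.value, newest.value
        FROM (SELECT created_at AS value FROM info
              ORDER BY id DESC LIMIT 1 OFFSET ?) AS oldest,
             (SELECT created_at AS value FROM info
              ORDER BY id DESC LIMIT 1) AS newest
        """,
        (n - 1,),
    ).fetchone()


oldest, newest = endpoints(conn, 3)
timespan = datetime.fromisoformat(newest) - datetime.fromisoformat(oldest)
too_many = endpoints(conn, 20)  # more rows requested than exist

print(oldest, newest, timespan, too_many)
```

As in the answer above, this assumes created_at grows together with id; if it does not, the two endpoints are not necessarily the minimum and maximum of the window.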

Related

Selecting Data from Normalized Tables

I'm stuck trying to write this query; I think my brain is just a little fried tonight. I have a table that stores a record whenever a person executes an action (Clocking In, Clocking Out, Going on Lunch, Returning from Lunch), and I need to return a list of all the primary IDs for the people whose last action is not clock_out. The problem is that it needs to be a reasonably fast query.
Table Structure:
ID | person_id | status | datetime | shift_type
ID = Primary Key for this table
person_id = The ID I want to return if their status does not equal clock_out
status = clock_in, lunch_start, lunch_end, break_start, break_end, clock_out
datetime = The time the record was added
shift_type = Not Important
Previously I ran this query to find people who were still clocked in during a specific time period, but now I need it to work for any point in time. The queries I have tried go through thousands and thousands of records, which makes them far too slow.
I need to return a list of all the primary ID's for the people whose last action is not clock_out.
One option uses window functions, available in MySQL 8.0:
select id
from (
select t.*, row_number() over(partition by person_id order by datetime desc) rn
from mytable t
) t
where rn = 1 and status <> 'clock_out'
In earlier versions, one option uses a correlated subquery:
select id
from mytable t
where
datetime = (select max(t1.datetime) from mytable t1 where t1.person_id = t.person_id)
and status <> 'clock_out'
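The correlated-subquery variant can be tried out with the sketch below. It uses Python's sqlite3 module rather than MySQL, the table name `timeclock` from the question's follow-up, and five made-up rows; the index on (person_id, datetime) is an assumption added because the subquery looks up the maximum datetime per person.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE timeclock (
        id INTEGER PRIMARY KEY,
        person_id INTEGER NOT NULL,
        status TEXT NOT NULL,
        datetime TEXT NOT NULL
    )
    """
)
# Assumed helper index: lets the per-person MAX(datetime) be found directly.
conn.execute("CREATE INDEX idx_person_time ON timeclock (person_id, datetime)")

# Hypothetical sample data: person 1 has clocked out, 2 and 3 have not.
conn.executemany(
    "INSERT INTO timeclock (person_id, status, datetime) VALUES (?, ?, ?)",
    [
        (1, "clock_in", "2024-01-01 08:00:00"),
        (2, "clock_in", "2024-01-01 09:00:00"),
        (3, "clock_in", "2024-01-01 10:00:00"),
        (2, "lunch_start", "2024-01-01 12:00:00"),
        (1, "clock_out", "2024-01-01 17:00:00"),
    ],
)

# Keep only each person's latest row, then drop those that are clock_out.
still_in = conn.execute(
    """
    SELECT t.person_id, t.status
    FROM timeclock t
    WHERE t.datetime = (SELECT MAX(t1.datetime)
                        FROM timeclock t1
                        WHERE t1.person_id = t.person_id)
      AND t.status <> 'clock_out'
    ORDER BY t.person_id
    """
).fetchall()

print(still_in)
```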
After looking through it further, this was my solution -
SELECT * FROM (
SELECT `status`,`person_id` FROM `timeclock` ORDER BY `datetime` DESC
) AS tmp_table GROUP BY `person_id`
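This works because the subquery orders the rows by datetime (most recent first) and the outer query then groups them by person ID, keeping the first row of each group. Note that it relies on MySQL picking the first row per group, which the documentation does not guarantee.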

Optimize MIN & MAX query

My database table contains more than 10 million records. I am writing a query that uses MIN and MAX on the created_date column, which is already indexed. But when I run my SELECT statement it takes too long, and sometimes it times out without returning any output.
Is there any way to optimize my query? The query I am trying is below.
SELECT MIN(created_date) AS Min, MAX(created_date) as Max FROM network ORDER BY id DESC LIMIT 1000000
The above query is meant to give the MIN and MAX created_date from the latest 1,000,000 rows.
SELECT MIN(created_date) AS Min,
MAX(created_date) AS Max -- Get min and max from the 1M rows
FROM (
SELECT created_date
FROM network
ORDER BY created_date desc
LIMIT 1000000
) AS recent -- Collect the latest 1M rows
This index would help some:
INDEX(created_date)
Rereading question
The latest date is simply MAX(created_date). The millionth date is ( SELECT created_date FROM network ORDER BY created_date DESC LIMIT 1000000, 1 ).
So, this is the first choice:
SELECT ( SELECT created_date FROM network
ORDER BY created_date DESC LIMIT 1000000, 1 ) AS Min,
MAX(created_date) AS Max
FROM network;
Summary Table
CREATE TABLE Dates (
created_date DATETIME NOT NULL,
ct INT UNSIGNED NOT NULL,
PRIMARY KEY(created_date)
) ENGINE=InnoDB;
Then, every hour (or other unit of time), count the number of records and store it there.
To find MIN(created_date) is a bit messy; it means summing through that table to find when the count adds up to about 1M, and declaring the hour (or whatever) is when it happened.
An alternative (and probably better) approach is to capture the exact DATETIME of each 1000th row. This means probing network frequently and storing just the created_date (drop the ct column). Then this finds the approximate time of the row 1M rows ago:
SELECT created_date
FROM Dates
ORDER BY created_date DESC
LIMIT 1000, 1
(And use that as the subquery for Min.)
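The checkpoint idea can be sketched as follows, scaled down so it runs instantly: a checkpoint every 2 rows instead of every 1000, and a look-back of 6 rows instead of 1M. The sketch uses Python's sqlite3 module rather than MySQL, fills Dates in a single pass instead of by periodic probing, and assumes gapless ids so that `id % CHECKPOINT_EVERY = 0` picks every Nth row; the constants and sample data are hypothetical.

```python
import sqlite3

CHECKPOINT_EVERY = 2  # the answer uses 1000; kept tiny for the demo
ROWS_BACK = 6         # the answer uses 1,000,000

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE network (id INTEGER PRIMARY KEY, created_date TEXT)")
conn.execute("CREATE TABLE Dates (created_date TEXT PRIMARY KEY)")

# Hypothetical sample data: ten rows, one per day, ids 1..10.
conn.executemany(
    "INSERT INTO network (id, created_date) VALUES (?, ?)",
    [(i, f"2024-01-{i:02d} 00:00:00") for i in range(1, 11)],
)

# Record the created_date of every CHECKPOINT_EVERY-th row.
# Assumption: ids have no gaps, so the modulo test selects every Nth row.
conn.execute(
    "INSERT INTO Dates (created_date) "
    "SELECT created_date FROM network WHERE id % ? = 0",
    (CHECKPOINT_EVERY,),
)

# Approximate answer: step back ROWS_BACK / CHECKPOINT_EVERY checkpoints.
approx_min = conn.execute(
    "SELECT created_date FROM Dates "
    "ORDER BY created_date DESC LIMIT 1 OFFSET ?",
    (ROWS_BACK // CHECKPOINT_EVERY,),
).fetchone()[0]

# Exact answer from the big table, for comparison only.
exact_min = conn.execute(
    "SELECT created_date FROM network "
    "ORDER BY created_date DESC LIMIT 1 OFFSET ?",
    (ROWS_BACK,),
).fetchone()[0]

print(approx_min, exact_min)
```

In this sample the two values coincide because the newest row happens to be a checkpoint; in general the summary-table result can be off by up to one checkpoint interval.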

Mysql: order by two column, use filesort

I have trouble ordering two columns.
EXPLAIN SELECT * FROM articles WHERE option <>0 AND deleted=0 ORDER BY date_added DESC, category_id DESC LIMIT 25 OFFSET 500
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE articles ALL NULL NULL NULL NULL 437168 Using where; Using filesort
I added single-column indexes on option, deleted, date_added and category_id.
When I used:
EXPLAIN SELECT * FROM articles WHERE option <>0 AND deleted=0 ORDER BY date_added DESC LIMIT 25 OFFSET 500
or
EXPLAIN SELECT * FROM articles WHERE option <>0 AND deleted=0 ORDER BY category_id DESC LIMIT 25 OFFSET 500
the Extra column shows only "Using where".
I also tried adding a composite index on (option, deleted, date_added, category_id), but it only helps when I sort by one column.
It will be very hard to get MySQL to use an index for this query:
SELECT *
FROM articles
WHERE option <> 0 AND deleted = 0
ORDER BY date_added DESC
LIMIT 25 OFFSET 500
You can try a composite index: articles(deleted, date_added, option). By covering the WHERE and ORDER BY, MySQL might use it.
If you can add an optionflag column for equality testing (rather than <>), then write the query as:
SELECT *
FROM articles
WHERE optionflag = 1 AND deleted = 0
ORDER BY date_added DESC
LIMIT 25 OFFSET 500;
Then an index on articles(deleted, optionflag, date_added desc) would work well.
Otherwise a subquery might work for you:
SELECT a.*
FROM (SELECT *
FROM articles
WHERE deleted = 0
ORDER BY date_added DESC
) a
WHERE option <> 0
LIMIT 25 OFFSET 500;
This materializes the intermediate result, but it is doing an order by anyway. And, the final ordering is not guaranteed to be surfaced in the outer query, but it does work in practice (and is close to guaranteed because of the materialization).
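The effect of the composite index on the equality-only form of the query can be illustrated with the sketch below. It uses Python's sqlite3 module, so the planner and the plan text are SQLite's and not MySQL's; it only shows the general principle that equality columns followed by the ORDER BY column let the engine read rows already sorted. The index name and the `plan` helper are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE articles (
        id INTEGER PRIMARY KEY,
        optionflag INTEGER NOT NULL,
        deleted INTEGER NOT NULL,
        date_added TEXT NOT NULL,
        category_id INTEGER NOT NULL
    )
    """
)

QUERY = """
    SELECT * FROM articles
    WHERE optionflag = 1 AND deleted = 0
    ORDER BY date_added DESC
    LIMIT 25 OFFSET 500
"""


def plan(conn):
    """Return the textual detail of each EXPLAIN QUERY PLAN step."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY)]


plan_before = plan(conn)

# Equality columns first, the ORDER BY column last.
conn.execute(
    "CREATE INDEX idx_deleted_flag_date "
    "ON articles (deleted, optionflag, date_added)"
)
plan_after = plan(conn)

# SQLite reports a separate sort step as "USE TEMP B-TREE FOR ORDER BY".
sorts_before = any("ORDER BY" in step for step in plan_before)
sorts_after = any("ORDER BY" in step for step in plan_after)

print(plan_before, sorts_before)
print(plan_after, sorts_after)
```

On MySQL the equivalent check is to run EXPLAIN before and after adding the index and see whether "Using filesort" disappears from the Extra column.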

Retrieving last row inserted in table for each "parameter"

I have a table, currently about 1.3M rows, which stores measured data points for a couple of different parameters. There are about 30 parameters.
Table:
* id
* station_id (int)
* comp_id (int)
* unit_id (int)
* p_id (int)
* timestamp
* value
I have a UNIQUE index on: (station_id, comp_id, unit_id, p_id, timestamp)
Because the timestamp differs for every parameter, I have difficulty sorting by timestamp (I have to use a GROUP BY).
So today I select the last value for each parameter by this query:
select p_id, timestamp, value
from (select p_id, timestamp, value
from table
where station_id = 3 and comp_id = 9112 and unit_id = 1 and
p_id in (1,2,3,4,5,6,7,8,9,10)
order by timestamp desc
) table_x
group by p_id;
This query takes about 3 seconds to execute.
Even though I have the index mentioned above, the optimizer uses filesort to find the values.
Querying for only 1 specific parameter:
select p_id, timestamp, value from table where station_id = 3 and comp_id = 9112 and unit_id = 1 and p_id =1 order by timestamp desc limit 1;
Takes no time (0.00).
I've also tried joining against a table in which I store the parameter IDs, without luck.
So, is there a simple (and fast) way to ask for the latest values of several parameters at once?
A procedure that loops and queries each parameter individually seems much faster than asking for all of them at once, which I don't think is the way a database should be used.
Your query is incorrect. You are aggregating by p_id, but including other columns. These come from indeterminate rows, and the documentation is quite clear:
MySQL extends the use of GROUP BY so that the select list can refer to
nonaggregated columns not named in the GROUP BY clause. This means
that the preceding query is legal in MySQL. You can use this feature
to get better performance by avoiding unnecessary column sorting and
grouping. However, this is useful primarily when all values in each
nonaggregated column not named in the GROUP BY are the same for each
group. The server is free to choose any value from each group, so
unless they are the same, the values chosen are indeterminate.
Furthermore, the selection of values from each group cannot be
influenced by adding an ORDER BY clause.
The following should work:
select t.p_id, t.timestamp, t.value
from table t join
(select p_id, max(timestamp) as maxts
from table
where station_id = 3 and comp_id = 9112 and unit_id = 1 and
p_id in (1,2,3,4,5,6,7,8,9,10)
group by p_id
) tt
on tt.p_id = t.p_id and tt.maxts = t.timestamp
where t.station_id = 3 and t.comp_id = 9112 and t.unit_id = 1;
The best index for this query is a composite index on table(station_id, comp_id, unit_id, p_id, timestamp).
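The join-against-the-maximum pattern can be checked with the following sketch. It uses Python's sqlite3 module instead of MySQL, a hypothetical table name `measurements` in place of the question's placeholder `table`, integer timestamps, and a handful of made-up rows, including one from another station that shares a timestamp with the expected result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE measurements (
        id INTEGER PRIMARY KEY,
        station_id INTEGER, comp_id INTEGER, unit_id INTEGER, p_id INTEGER,
        timestamp INTEGER, value REAL,
        UNIQUE (station_id, comp_id, unit_id, p_id, timestamp)
    )
    """
)
# Hypothetical sample data; the last row belongs to a different station.
conn.executemany(
    "INSERT INTO measurements "
    "(station_id, comp_id, unit_id, p_id, timestamp, value) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    [
        (3, 9112, 1, 1, 10, 1.0),
        (3, 9112, 1, 1, 20, 2.0),
        (3, 9112, 1, 2, 5, 4.0),
        (3, 9112, 1, 2, 15, 5.0),
        (4, 9112, 1, 1, 20, 99.0),
    ],
)

# Inner query: latest timestamp per parameter for one station/comp/unit.
# Outer query: fetch the full row for each of those timestamps.
latest = conn.execute(
    """
    SELECT t.p_id, t.timestamp, t.value
    FROM measurements t
    JOIN (SELECT p_id, MAX(timestamp) AS maxts
          FROM measurements
          WHERE station_id = 3 AND comp_id = 9112 AND unit_id = 1
            AND p_id IN (1, 2)
          GROUP BY p_id) tt
      ON tt.p_id = t.p_id AND tt.maxts = t.timestamp
    WHERE t.station_id = 3 AND t.comp_id = 9112 AND t.unit_id = 1
    ORDER BY t.p_id
    """
).fetchall()

print(latest)
```

The outer WHERE repeats the station, component, and unit filter so that a row from another station with the same p_id and timestamp is not picked up by the join.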

Get Last conversation row from MySQL database table

I have a database in MYSQL and it has chat table which looks like this.
I am using this query for fetching these records
SELECT * FROM (
SELECT * FROM `user_chats`
WHERE sender_id =2 OR receiver_id =2
ORDER BY id DESC
) AS tbl
GROUP BY sender_id, receiver_id
But I need only the records with IDs 5 and 4. Basically, my requirement is to fetch the last message of each conversation between two users. Here the conversation between users 2 and 3 has 2 records and we want only the last one, i.e. id = 5; we don't need id = 2.
So how we can write a query for that result?
SELECT
*
FROM
user_chats uc
WHERE
not exists (
SELECT
1
FROM
user_chats uc2
WHERE
uc2.created > uc.created AND
(
(uc.sender_id = uc2.sender_id AND uc.receiver_id = uc2.receiver_id) OR
(uc.sender_id = uc2.receiver_id AND uc.receiver_id = uc2.sender_id)
)
)
The following gets you latest record (assuming that the bigger id, the later it was created) meeting your criteria:
SELECT * FROM `user_chats`
WHERE (`sender_id` =2 AND `receiver_id` =3) OR (`sender_id` =3 AND `receiver_id` =2)
ORDER BY `id` DESC
LIMIT 1
which would be a good idea, if id is primary key and it rises along with rising value of created. Otherwise (if you are not sure that id rises when created rises) replace ORDER BY line with the following:
ORDER BY `created` DESC
Plus, in both cases, put proper indexes on: id (if it is your primary key, then there is no need to put additional index on it), sender_id and receiver_id (preferably composite index, meaning the single index for both columns), created (if you want to use ORDER BY created DESC instead of ORDER BY id DESC - otherwise there is no need for that).
try GROUP BY LEAST(sender_id, receiver_id), GREATEST(sender_id, receiver_id)
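A sketch of the pair-normalizing GROUP BY, combined with MAX(id) so that the row chosen per conversation is well defined. It runs on Python's sqlite3 module, where the two-argument scalar MIN(a, b) and MAX(a, b) stand in for MySQL's LEAST and GREATEST; the five sample rows are invented to match the ids mentioned in the question, and it assumes that a larger id means a later message.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_chats ("
    "id INTEGER PRIMARY KEY, sender_id INTEGER, receiver_id INTEGER)"
)
# Hypothetical sample data: users 2 and 3 exchange messages 2 and 5,
# users 1 and 2 exchange message 4; the rest do not involve user 2.
conn.executemany(
    "INSERT INTO user_chats (id, sender_id, receiver_id) VALUES (?, ?, ?)",
    [(1, 1, 4), (2, 2, 3), (3, 4, 1), (4, 1, 2), (5, 3, 2)],
)

# Group (2,3) and (3,2) into one conversation by ordering the pair,
# take the highest id per conversation, then fetch those rows.
last_messages = conn.execute(
    """
    SELECT uc.id, uc.sender_id, uc.receiver_id
    FROM user_chats uc
    JOIN (SELECT MAX(id) AS last_id
          FROM user_chats
          WHERE sender_id = 2 OR receiver_id = 2
          GROUP BY MIN(sender_id, receiver_id), MAX(sender_id, receiver_id)
         ) latest ON latest.last_id = uc.id
    ORDER BY uc.id DESC
    """
).fetchall()

print(last_messages)
```

In MySQL the GROUP BY line would read GROUP BY LEAST(sender_id, receiver_id), GREATEST(sender_id, receiver_id), as suggested above.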