How to speed up a very slow MySQL query? - mysql

I have a very slow MySQL query that has become basically unusable since the table grew to over 5000 entries. It takes more than 30 seconds, so the server returns an error code and quits.
The query is:
SELECT
id,
user_id,
date
FROM
table
WHERE
id IN (
SELECT
MAX(id)
FROM
table
GROUP BY date
)
AND
company_id = '1'
AND
date > '1473700785'
AND
complete = '1'
AND
name = "random string"
ORDER BY id ASC
Structure:
id - int(11)
user_id - int(10)
company_id - int(11)
date - varchar(20)
complete - varchar(2)
name - varchar(75)
Do you have any idea what could be slowing it down? It used to work as expected with a much smaller table (under 1000 entries).

Apart from rewriting the IN subquery as a join (like below), the best method is indexing, as most people here suggested.
SELECT id, user_id, date
FROM table min
-- joining a subquery sometimes runs faster than IN / NOT IN
JOIN (
SELECT MAX(id) AS id
FROM table
GROUP BY date
)
max on max.id = min.id
WHERE min.company_id = '1'
AND min.date > '1473700785'
AND min.complete = '1'
AND min.name = "random string"
ORDER BY min.id ASC

First, you need an index on the date field.
You should also store the date as an integer, because the varchar column is compared as a string in this expression:
date > '1473700785'
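A minimal sketch of what that change might look like. It assumes every existing value in date is a numeric Unix timestamp (otherwise the conversion may fail or truncate), and ix_date is a hypothetical index name, not something from the question:

```sql
-- Assumption: all existing `date` values are numeric strings such as '1473700785'.
-- `table` and `date` are quoted because they are keywords in MySQL.
ALTER TABLE `table`
    MODIFY `date` INT UNSIGNED;

-- ix_date is a hypothetical name; any unused index name works.
CREATE INDEX ix_date ON `table` (`date`);

-- The filter can then compare numbers instead of strings:
-- ... AND `date` > 1473700785
```

Try this on a copy of the table first, since changing a column type rewrites the data.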

Indexing is good, but I don't see the need for a SUB-SELECT
SELECT
MAX(t.id) as id,
u.user_id,
t.date
FROM table t
JOIN table u ON u.id=MAX(t.id )
WHERE
t.company_id = '1'
AND
t.date > '1473700785'
AND
t.complete = '1'
AND
t.name = "random string"
GROUP BY t.date
ORDER BY t.id ASC

Related

Calculate the average date difference

This is the essential setup of the table (only the DDL for relevant columns is present). MySQL version 8.0.15
The intent is to show an average of date difference interval between orders.
CREATE TABLE final (
prim_id INT(11) NOT NULL AUTO_INCREMENT,
order_ID INT(11) NOT NULL,
cust_ID VARCHAR(45) NOT NULL,
created_at DATETIME NOT NULL,
item_name VARCHAR(255) NOT NULL,
cust_name VARCHAR(255) NOT NULL,
PRIMARY KEY (prim_id)
)
COLLATE='latin1_swedish_ci'
ENGINE=InnoDB
AUTO_INCREMENT=145699
Additional information:
cust_ID -> cust_name (one-to-many)
cust_ID -> order_ID (one-to-many)
order_ID -> item_name (one-to-many)
order_ID -> created_at (one-to-one)
prim_id -> *everything* (one-to-many)
I've thought of using min(created_at) and max(created_at) but that will exclude all the orders between oldest and newest. I need a more refined solution.
The end result should be like this:
The average time interval between consecutive orders (not just between the oldest and the newest, because there are quite often more than two), measured in days, next to a column showing the client's name (cust_name).
If I get this right, you might use a subquery that gets the date of the previous order. Use datediff() to get the difference between the dates and avg() to get the average of those differences.
SELECT f1.cust_id,
avg(datediff(f1.created_at,
(SELECT f2.created_at
FROM final f2
WHERE f2.cust_id = f1.cust_id
AND (f2.created_at < f1.created_at
OR f2.created_at = f1.created_at
AND f2.order_id < f1.order_id)
ORDER BY f2.created_at DESC,
f2.order_id DESC
LIMIT 1)))
FROM final f1
GROUP BY f1.cust_id;
Edit:
If there can be several rows for one order ID, as KIKO Software mentioned, we need to select from the distinct set of orders, like this:
SELECT f1.cust_id,
avg(datediff(f1.created_at,
(SELECT f2.created_at
FROM (SELECT DISTINCT f3.cust_id,
f3.created_at,
f3.order_id
FROM final f3) f2
WHERE f2.cust_id = f1.cust_id
AND (f2.created_at < f1.created_at
OR f2.created_at = f1.created_at
AND f2.order_id < f1.order_id)
ORDER BY f2.created_at DESC,
f2.order_id DESC
LIMIT 1)))
FROM (SELECT DISTINCT f3.cust_id,
f3.created_at,
f3.order_id
FROM final f3) f1
GROUP BY f1.cust_id;
This may fail if there can be two rows for an order with different customer IDs or different creation time stamps. But in that case the data is just complete garbage and needs to be corrected before anything else.
2nd Edit:
Or alternatively getting the maximum creation timestamp per order if these can differ:
SELECT f1.cust_id,
avg(datediff(f1.created_at,
(SELECT f2.created_at
FROM (SELECT max(f3.cust_id) cust_id,
max(f3.created_at) created_at,
f3.order_id
FROM final f3
GROUP BY f3.order_id) f2
WHERE f2.cust_id = f1.cust_id
AND (f2.created_at < f1.created_at
OR f2.created_at = f1.created_at
AND f2.order_id < f1.order_id)
ORDER BY f2.created_at DESC,
f2.order_id DESC
LIMIT 1)))
FROM (SELECT max(f3.cust_id) cust_id,
max(f3.created_at) created_at,
f3.order_id
FROM final f3
GROUP BY f3.order_id) f1
GROUP BY f1.cust_id;
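Since the question mentions MySQL 8.0.15, window functions are available, so the "previous order" lookup could also be sketched with LAG() instead of a correlated subquery. This is only a sketch against the final table from the question; the CTE names (orders, gaps) and the output column names are made up for illustration:

```sql
-- Sketch only: assumes MySQL 8.0+ and the `final` table from the question.
WITH orders AS (
    -- collapse multiple item rows into one row per order
    SELECT order_ID,
           MAX(cust_ID)    AS cust_ID,
           MAX(cust_name)  AS cust_name,
           MAX(created_at) AS created_at
    FROM final
    GROUP BY order_ID
),
gaps AS (
    -- days between each order and the same customer's previous order;
    -- NULL for a customer's first order, which AVG() ignores
    SELECT cust_ID,
           cust_name,
           DATEDIFF(created_at,
                    LAG(created_at) OVER (PARTITION BY cust_ID
                                          ORDER BY created_at, order_ID)) AS days_since_prev
    FROM orders
)
SELECT cust_ID,
       MAX(cust_name)       AS cust_name,
       AVG(days_since_prev) AS avg_days_between_orders
FROM gaps
GROUP BY cust_ID;
```

Because consecutive differences telescope, the result per customer should equal DATEDIFF(MAX(created_at), MIN(created_at)) / (COUNT(*) - 1) over the per-order rows, which can serve as a sanity check.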

MySql - Exclude default date from max inside case

I am working with a table of items with expiration dates; these items are assigned to users.
I want to get, for each user, the highest expiration date. The issue here is that default items are initialized with a '3000/01/01' expiration date that should be ignored if another item exists for that user.
I've got a query doing that:
SELECT
user_id as UserId,
CASE WHEN (YEAR(MAX(date_expiration)) = 3000)
THEN (
SELECT MAX(temp.date_expiration)
FROM user_items temp
WHERE YEAR(temp.date_expiration) <> 3000 and temp.user_id = UserId
)
ELSE MAX(date_expiration)
END as date_expiration
FROM user_items GROUP BY user_id
This works, but the query inside the THEN block hurts performance, and it is a huge table.
So, is there a better way to ignore the default date in the MAX operation when entering the CASE condition?
SELECT user_id,
COALESCE(
MAX(CASE WHEN YEAR(date_expiration) = 3000 THEN NULL ELSE date_expiration END),
MAX(date_expiration)
)
FROM user_items
GROUP BY
user_id
If there are few users but lots of entries per user in your table, you can try improving the query a little further:
SELECT user_id,
COALESCE(
(
SELECT date_expiration
FROM user_items uii
WHERE uii.user_id = uid.user_id
AND date_expiration < '3000-01-01'
ORDER BY
user_id DESC, date_expiration DESC
LIMIT 1
),
(
SELECT date_expiration
FROM user_items uii
WHERE uii.user_id = uid.user_id
ORDER BY
user_id DESC, date_expiration DESC
LIMIT 1
)
)
FROM (
SELECT DISTINCT
user_id
FROM user_items
) uid
You need an index on (user_id, date_expiration) for this to work fast.
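A sketch of how that index could be created and checked. The index name ix_user_expiration and the user id 42 are placeholders, not values from the question:

```sql
-- ix_user_expiration is a hypothetical index name.
CREATE INDEX ix_user_expiration ON user_items (user_id, date_expiration);

-- EXPLAIN shows whether the optimizer picks the index for the per-user lookup.
EXPLAIN
SELECT date_expiration
FROM user_items
WHERE user_id = 42          -- 42 is an arbitrary example user
  AND date_expiration < '3000-01-01'
ORDER BY user_id DESC, date_expiration DESC
LIMIT 1;
```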

Pageviews to sessions without loop

I have a bit of a challenging SQL problem: Let's say you have a table of pageviews which looks like this:
CREATE TABLE pageviews (
id INT(11) NOT NULL AUTO_INCREMENT,
user_id INT(11) NOT NULL,
timestamp DATETIME NOT NULL,
PRIMARY KEY (id)
)
In this table, you have a very large number of records (>100 million). From this data, you want to generate another table which looks like this:
CREATE TABLE sessions (
id INT(11) NOT NULL AUTO_INCREMENT,
user_id INT(11) NOT NULL,
started_at DATETIME NOT NULL,
ended_at DATETIME NOT NULL,
PRIMARY KEY (id)
)
The rule is that a session is any sequence of an arbitrary number of pageviews which does not contain any gap larger than 30 minutes.
Now I have managed to generate this table using a stored procedure which uses a loop to get the sessions:
DELIMITER |
CREATE PROCEDURE generate_sessions()
BEGIN
TRUNCATE sessions;
INSERT INTO sessions
SELECT NULL, p.user_id, p.timestamp, p.timestamp FROM pageviews p
LEFT JOIN pageviews p2 ON p2.user_id = p.user_id AND p2.timestamp > p.timestamp AND p2.timestamp < DATE_ADD(p.timestamp, INTERVAL 30 MINUTE)
WHERE p2.id IS NULL;
REPEAT
UPDATE sessions s
LEFT JOIN pageviews p ON p.user_id = s.user_id AND p.timestamp < s.started_at AND p.timestamp > DATE_SUB(s.started_at, INTERVAL 30 MINUTE)
SET s.started_at = p.timestamp
WHERE p.id IS NOT NULL;
UNTIL ROW_COUNT() = 0 END REPEAT;
END |
Basically, what the procedure does is to first get the latest pageview of any session, insert it into the table, and then iteratively backtrack until all sessions are complete.
Needless to say, this is incredibly slow. Anybody have a better solution, preferably one that involves only one query?
This is a hard problem in MySQL. You really want window functions for this.
But, there is a way. First, you need to define each session. For this, find the gaps that are greater than half an hour between pageviews. The following query looks backwards, so this is called PrevSessionEnd.
Next, because time is increasing, select the maximum of this value for all page views for a user that occur on or before a given page view. The result should be that every page view gets a value that is constant over a session. The first will be NULL, the second will be the maximum time stamp of the first session, and so on.
Then, group by this amount.
select USER_ID, MIN(timestamp) as started_at, MAX(timestamp) as ended_at
from (select pv.*,
(select MAX(prevSessionEnd)
from (select pv.*,
(select if(max(pv2.timestamp) < pv.timestamp - interval 30 minute,
max(pv2.timestamp), null)
from pageviews pv2
where pv2.user_id = pv.user_id and pv2.timestamp < pv.timestamp
) as PrevSessionEnd
from pageviews pv
) pv2
where pv.user_id = pv2.user_id and pv2.timestamp <= pv.timestamp
) as SessionGrouper
from pageviews pv
) pv
group by user_id, SessionGrouper
This particular query has not been tested, so it might have syntax errors.
I'm leaving the final insert up to you.
This will, in turn, run faster if you have an index on pageviews(user_id, timestamp). The subqueries can be resolved using only this index.
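For reference, on a server that does have window functions (MySQL 8.0 or later, which the question does not state, so treat that as an assumption), the same idea could be sketched as follows: flag each pageview that starts a session, turn the flags into a running session number, then group. The CTE names (ordered, numbered) and the column names prev_ts and session_no are made up for illustration:

```sql
-- Sketch only: assumes MySQL 8.0+ and the pageviews table from the question.
WITH ordered AS (
    -- previous pageview time of the same user (NULL for the first one)
    SELECT id,
           user_id,
           `timestamp`,
           LAG(`timestamp`) OVER (PARTITION BY user_id
                                  ORDER BY `timestamp`, id) AS prev_ts
    FROM pageviews
),
numbered AS (
    -- running count of session starts = session number within the user;
    -- a session starts at the first pageview or after a gap > 30 minutes
    SELECT user_id,
           `timestamp`,
           SUM(CASE WHEN prev_ts IS NULL
                      OR `timestamp` > prev_ts + INTERVAL 30 MINUTE
                    THEN 1 ELSE 0 END)
               OVER (PARTITION BY user_id
                     ORDER BY `timestamp`, id
                     ROWS UNBOUNDED PRECEDING) AS session_no
    FROM ordered
)
SELECT user_id,
       MIN(`timestamp`) AS started_at,
       MAX(`timestamp`) AS ended_at
FROM numbered
GROUP BY user_id, session_no;
```

This reads the table once per window pass instead of running a correlated subquery per row, and it would still benefit from the index on pageviews(user_id, timestamp).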

MYSQL Query : How to get values per category?

I have a huge table with millions of records that stores stock values by timestamp. The structure is as below:
Stock, timestamp, value
goog,1112345,200.4
goog,112346,220.4
Apple,112343,505
Apple,112346,550
I would like to query this table by timestamp. If the timestamp matches, all corresponding stock records should be returned; if there is no record for a stock at that timestamp, the immediately preceding one should be returned. In the above example, if I query by timestamp=1112345, the query should return 2 records:
goog,1112345,200.4
Apple,112343,505 (immediate previous record)
I have tried several different ways to write this query, but with no success, and I'm sure I'm missing something. Can someone help, please?
(SELECT `Stock`, `timestamp`, `value`
FROM `myTable`
WHERE `timestamp` = 1112345)
UNION ALL
(SELECT `Stock`, `timestamp`, `value`
FROM `myTable`
WHERE `timestamp` < 1112345
ORDER BY `timestamp` DESC
LIMIT 1)
Use select Stock, timestamp, value from thisTbl where timestamp = ? and fill in the timestamp with whatever it should be. Your demo query is available on this fiddle
I don't think there is an easy way to do this query. Here is one approach:
select tprev.*
from (select s.stock,
(select t.timestamp from t where t.stock = s.stock and t.timestamp <= YOURTIMESTAMP order by t.timestamp desc limit 1
) as prevtimestamp
from (select distinct stock
from t
) s
) s join
t tprev
on tprev.timestamp = s.prevtimestamp and tprev.stock = s.stock
This is getting the previous or equal timestamp for the record and then joining it back in. If you have indexes on (stock, timestamp) then this may be rather fast.
Another phrasing of it uses group by:
select tprev.*
from (select t.stock,
max(timestamp) as prevtimestamp
from t
where timestamp <= YOURTIMESTAMP
group by t.stock
) s join
t tprev
on tprev.timestamp = s.prevtimestamp and tprev.stock = s.stock
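If window functions are available (MySQL 8.0 or later, which is an assumption here since the question does not give a version), the "latest row at or before the timestamp, per stock" lookup could also be sketched with ROW_NUMBER(). The table name t follows the answer above, 1112345 is the example timestamp from the question, and the aliases ranked and rn are made up:

```sql
-- Sketch only: assumes MySQL 8.0+ and a table t(stock, timestamp, value).
SELECT stock, `timestamp`, `value`
FROM (
    SELECT stock,
           `timestamp`,
           `value`,
           -- rn = 1 marks the newest row per stock among those <= the target
           ROW_NUMBER() OVER (PARTITION BY stock
                              ORDER BY `timestamp` DESC) AS rn
    FROM t
    WHERE `timestamp` <= 1112345
) AS ranked
WHERE rn = 1;
```

A stock with an exact match gets that row; a stock without one gets its immediately preceding row, which is the behaviour the question asks for.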

mysql Group by : Slow query

Here is my query
SELECT file_id, file_name, file_date, file_email
FROM (SELECT *
FROM `file`
ORDER BY file_date DESC
) AS t
WHERE file_domains = ''
GROUP BY file_name
ORDER BY file_date DESC
LIMIT 0 , 100
The primary key is file_id and there is an index on file_name. The table has about 900k records.
The query takes about 2 seconds on my local computer.
Is there any way to optimize this query?
Thanks in advance.
Your query relies on a non-standard "feature" of MySQL (correction: one non-standard and one semi-standard feature), and there is no guarantee that it will not break in future versions of MySQL, once the optimizer is clever enough to recognise that the subquery is redundant.
Add an index on (file_domains, file_name, file_date) and try this version:
SELECT f.file_id, f.file_name, f.file_date, f.file_email
FROM
`file` AS f
JOIN
( SELECT file_name
, MAX(file_date) AS max_file_date
FROM `file`
WHERE file_domains = ''
GROUP BY file_name
ORDER BY max_file_date DESC
LIMIT 0 , 100
) AS fm
ON fm.file_name = f.file_name
AND fm.max_file_date = f.file_date
ORDER BY f.file_date DESC ;
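The suggested index could be created along these lines. The index name ix_domains_name_date is a placeholder, and if file_domains or file_name are long VARCHAR or TEXT columns, prefix lengths may be needed to stay within the index key size limit:

```sql
-- ix_domains_name_date is a hypothetical index name.
CREATE INDEX ix_domains_name_date
    ON `file` (file_domains, file_name, file_date);
```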
This intermediate query:
SELECT *
FROM `file`
ORDER BY file_date DESC
fetches all 900k records and sorts them by date, which is likely to be slow.