MySQL SELECT Query Optimisation

I recently came across a query in one of our office discussions:
SELECT t1.id, t1.name, t1.date AS date_filter,
       (SELECT t2.column_x
        FROM table_2 t2
        WHERE t2.date = date_filter
        LIMIT 1) AS column_x
FROM table_1 t1
WHERE t1.category_id = 10
ORDER BY t1.date
LIMIT 10;
The sub-query returns a column value from a second table that matches the date from the first table.
This query is not running at an optimised speed. What are some ways to improve its performance?
Cheers

It would help to have SHOW CREATE TABLE for both tables, plus EXPLAIN SELECT ...
Indexes needed:
t1: INDEX(category_id, date)
t2: INDEX(date)
The subquery does not make sense without an ORDER BY -- which "1" row do you want?
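Putting those suggestions together, a hedged sketch: correlate on t1.date directly (rather than the select-list alias) for portability, give the subquery a deterministic ORDER BY, and add the two indexes. The tiebreaker column t2.id and the index names are assumptions:

```sql
-- Illustrative index names
ALTER TABLE table_1 ADD INDEX idx_cat_date (category_id, date);
ALTER TABLE table_2 ADD INDEX idx_date (date);

SELECT t1.id, t1.name, t1.date AS date_filter,
       (SELECT t2.column_x
        FROM table_2 t2
        WHERE t2.date = t1.date   -- correlate on the column, not the alias
        ORDER BY t2.id DESC       -- assumed tiebreaker; makes the LIMIT 1 row deterministic
        LIMIT 1) AS column_x
FROM table_1 t1
WHERE t1.category_id = 10
ORDER BY t1.date
LIMIT 10;
```

Extending the t2 index to (date, column_x) could additionally make it covering for the subquery.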

Related

Create a new variable in SQL by groupby

I have 2 SQL tables, as follows:
First table t1:
Second table t2:
I need to calculate the count of the "Number" column grouped by the "Name" column from t1 and merge it with t2.
I wrote the following code, but it does not seem to work:
select *
from (
select Name, count(Number) as count
from t1
group by Name ) as a
join ( select *
from t2 ) as b
on a.Name = b.Name;
Can anyone figure out what is wrong? Thank you very much.
I think you want to use SUM() instead of COUNT(), because SUM() adds up the integer values, while COUNT() counts the number of occurrences.
And as also stated in the comments, multiple columns with the same name will create conflicts, so you have to select the wanted columns explicitly (that is usually a good idea anyway).
You could obtain your desired end result with this query:
select
SUM(Number),
t1.Name,
(select val1 FROM t2 WHERE t2.Name = t1.Name LIMIT 1) as val1
FROM t1
GROUP BY t1.Name
Example in sqlfiddle: http://sqlfiddle.com/#!9/04dddf/7
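If counting rows per Name really is the goal, the original derived-table join also works once the duplicate column names are disambiguated by listing the wanted columns explicitly; a sketch, assuming t2 carries a val1 column as in the query above:

```sql
SELECT a.Name, a.cnt, b.val1
FROM (SELECT Name, COUNT(Number) AS cnt
      FROM t1
      GROUP BY Name) AS a
JOIN t2 AS b
  ON a.Name = b.Name;
```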

MySQL Get exactly one entry per id by multiple ids

My problem is that I want this:
SELECT * FROM table
WHERE userId = 7243
ORDER BY date desc LIMIT 1
But for multiple ids in one request.
I tried this:
SELECT * FROM table
WHERE userId IN (7243, 1)
GROUP BY userId
ORDER BY date desc
But the ORDER BY seems to be ignored. Does anyone have a solution for me? Thank you.
If you want the max date record for each of the two IDs, then you may use a subquery:
SELECT t1.*
FROM yourTable t1
INNER JOIN
(
SELECT userId, MAX(date) AS max_date
FROM yourTable
WHERE userId IN (7243, 1)
GROUP BY userId
) t2
ON t1.userId = t2.userId AND t1.date = t2.max_date
WHERE
t1.userId IN (7243, 1);
This is just the greatest-value-per-group question with a slight twist, namely that you only want to see two of the possible groups in the output.
As @Raymond commented below, an index on (userId, date) should greatly speed up the t2 subquery. I am not sure whether this index would help beyond that, but it should make a difference.
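On MySQL 8.0+, the same greatest-row-per-group result can also be obtained with a window function, which avoids the self-join entirely; a sketch using the table and columns from the question:

```sql
SELECT *
FROM (SELECT t.*,
             ROW_NUMBER() OVER (PARTITION BY userId
                                ORDER BY date DESC) AS rn
      FROM yourTable t
      WHERE userId IN (7243, 1)) ranked
WHERE rn = 1;
```

One difference worth noting: if two rows tie on date, ROW_NUMBER picks a single arbitrary row per userId, whereas the join version returns every tied row.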

MySQL - Deleting rows based on column subset duplicates

I have a table containing roughly 5 million rows and 150 columns. However, there are several similar rows that I would like to consider duplicates if they share the same values for 3 columns : ID, Order and Name.
However, I don't just want to delete the duplicates at random: the row I consider the duplicate should be the one with the smaller Count value (Count being another column), or, if they have the same Count, the one with the earliest Date (Date being another column).
I have tried with the code below:
DELETE t1 FROM uploaddata_copy t1
JOIN uploaddata_copy t2
ON t2.Name = t1.Name
AND t2.ID = t1.ID
AND t2.Order = t1.Order
AND t2.Count < t1.Count
AND t2.Date < t1.Date
However (and this is probably due to my computer) it seems to run indefinitely (~25 mins) before timing out on the server, so I'm left unsure whether the code is correct and I just need to run it for longer, or whether it is inherently wrong and there is a quicker way of doing it.
A more accurate query would be:
DELETE t1
FROM uploaddata_copy t1 JOIN
uploaddata_copy t2
ON t2.Name = t1.Name AND
t2.ID = t1.ID AND
t2.Order = t1.Order AND
(t2.Count < t1.Count OR
t2.Count = t1.Count AND t2.Date < t1.Date
);
However, fixing the logic will not (in this case) improve performance. First, you want an index on uploaddata_copy(Name, Id, Order, Count, Date). This allows the "lookup" to be between the original data and only the index.
Second, start small. Add a LIMIT 1 or LIMIT 10 to see how long it takes to remove just a few rows. Deleting rows is a complicated process, because it affects the table, indexes, and the transaction log -- not to mention any triggers on the table.
If a lot of rows are being deleted, you might find it faster to re-create the table, but that depends heavily on the relative number of rows being removed.
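If re-creating the table turns out to be the faster route, one sketch on MySQL 8.0+ (ROW_NUMBER is not available in earlier versions) that keeps, per (ID, Order, Name), the row with the greatest Count and then the latest Date:

```sql
CREATE TABLE uploaddata_keep AS
SELECT * FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (PARTITION BY t.ID, t.`Order`, t.Name
                              ORDER BY t.`Count` DESC, t.`Date` DESC) AS rn
    FROM uploaddata_copy t
) ranked
WHERE rn = 1;
-- After verifying counts: drop the rn column, recreate indexes, then
-- RENAME TABLE uploaddata_copy TO uploaddata_old, uploaddata_keep TO uploaddata_copy;
```

Note that CREATE TABLE ... AS SELECT does not copy indexes or constraints, so those must be re-added before swapping the tables.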
Why the join? You want to delete rows when there exists a "better" record. So use an EXISTS clause:
delete from dup using uploaddata_copy as dup
where exists
(
select *
from uploaddata_copy better
where better.name = dup.name
and better.id = dup.id
and better.order = dup.order
and (better.count > dup.count or (better.count = dup.count and better.date > dup.date))
);
(Please check my comparisons. This is how I understand it: a better record for name + id + order has a greater count, or the same count and a later date. You consider the worse record an undesired duplicate that you want to delete.)
You'd have an index on uploaddata_copy(id, name, order) at least or better even on uploaddata_copy(id, name, order, count, date) for this delete statement to perform well.
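That index could be created with DDL along these lines (the index name is illustrative):

```sql
CREATE INDEX idx_dedupe
    ON uploaddata_copy (id, name, `order`, count, date);
```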
Please try with this:
DELETE t1 FROM uploaddata_copy t1
JOIN uploaddata_copy t2
ON t2.Name = t1.Name
AND t2.ID = t1.ID
AND t2.Order = t1.Order
AND (t2.Count < t1.Count
     OR (t2.Count = t1.Count AND t2.Date < t1.Date))
AND t2.primary_key != t1.primary_key

How to find latest record from two different tables

There are 2 tables, table1 and table2.
The first column, foreign_id, is the common column between both tables.
The data types of all the related columns are the same.
Now, we need to find the latest record based on the timestamp column for each foreign_id, taking rows from both tables, for example as below, along with an extra column, from_table, which shows which table each row was selected from.
One method that I can think of is
Combine both the tables
then, find the latest for each foreign_id column
Is there any better way to do this, as there could be 5,000+ rows in both tables?
Try this:
SELECT
t1.foreign_id,
MAX(t1.timestamp) max_time_table1,
MAX(t2.timestamp) max_time_table2
FROM table1 t1
LEFT JOIN table2 t2 USING (foreign_id)
GROUP BY foreign_id;
Note: This can be a bit slow if the number of records is quite large.
However you can also use this:
SELECT a.foreign_id,
IF(a.max_time_table1 > a.max_time_table2, a.max_time_table1, a.max_time_table2) latest_update
FROM(
SELECT
t1.foreign_id,
SUBSTRING_INDEX(GROUP_CONCAT(t1.timestamp ORDER BY t1.id DESC),',',1) max_time_table1,
SUBSTRING_INDEX(GROUP_CONCAT(t2.timestamp ORDER BY t2.id DESC),',',1) max_time_table2
FROM table1 t1
LEFT JOIN table2 t2 USING (foreign_id)
GROUP BY foreign_id) a;
Make sure the id columns in both tables are auto_increment.
From your explanation, this would do then:
SELECT
foreign_id,
CASE
WHEN max_time_table1 < max_time_table2 THEN max_time_table2
WHEN max_time_table2 < max_time_table1 THEN max_time_table1
END as timestamps
FROM(
SELECT
t1.foreign_id,
SUBSTRING_INDEX(GROUP_CONCAT(t1.timestamp ORDER BY t1.id DESC),',',1) max_time_table1,
SUBSTRING_INDEX(GROUP_CONCAT(t2.timestamp ORDER BY t2.id DESC),',',1) max_time_table2
FROM table1 t1
LEFT JOIN table2 t2 USING (foreign_id)
GROUP BY foreign_id) a;
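The combine-then-pick approach outlined in the question can also be written directly with UNION ALL, which produces the requested from_table column as well; a hedged sketch, assuming both tables have foreign_id and timestamp columns:

```sql
SELECT u.foreign_id,
       MAX(u.`timestamp`) AS latest,
       SUBSTRING_INDEX(GROUP_CONCAT(u.from_table
                       ORDER BY u.`timestamp` DESC), ',', 1) AS from_table
FROM (SELECT foreign_id, `timestamp`, 'table1' AS from_table FROM table1
      UNION ALL
      SELECT foreign_id, `timestamp`, 'table2' AS from_table FROM table2) u
GROUP BY u.foreign_id;
```

With only 5,000+ rows per table, this should remain fast even without extra indexes.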

How to optimize mysql on left join

I will try to explain at a very high level.
I have two complex SELECT queries(for the sake of example I reduce the queries to the following):
SELECT id, t3_id FROM t1;
SELECT t3_id, MAX(added) as last FROM t2 GROUP BY t3_id;
Query 1 returns 16k rows and query 2 returns 15k.
Each query individually takes less than 1 second to compute.
However, what I need is to sort the results using the added column of query 2. When I try to use a LEFT JOIN:
SELECT
t1.id, t1.t3_Id
FROM
t1
LEFT JOIN
(SELECT t3_id, MAX(added) as last FROM t2 GROUP BY t3_id) AS t_t2
ON t_t2.t3_id = t1.t3_id
GROUP BY t1.t3_id
ORDER BY t_t2.last
However, the execution time goes up to over a minute.
I would like to understand the reason:
what is the cause of such a huge explosion?
NOTE:
ALL the used columns on every table have been indexed, e.g.:
table t1 has an index on id, t3_Id
table t2 has indexes on t3_id and added
EDIT1
After @Tim Biegeleisen's suggestion, I changed the query to the following. Now the query executes in about 16 seconds. If I remove the ORDER BY, the query executes in less than 1 second. The ORDER BY is the sole reason for the slowdown.
SELECT
t1.id, t1.t3_Id
FROM
t1
LEFT JOIN
t2 ON t2.t3_id = t1.t3_id
GROUP BY t1.t3_id
ORDER BY MAX(t2.added)
Even though table t2 has an index on column t3_id, when you join t1 you are actually joining to a derived table, which either can't use the index, or can't use it completely effectively. Since t1 has 16K rows and you are doing a LEFT JOIN, this means the database engine will need to scan the entire derived table for each record in t1.
You should use MySQL's EXPLAIN to see what the exact execution strategy is, but my suspicion is that the derived table is what is slowing you down.
The correct query should be:
SELECT
t1.id,
t1.t3_Id,
MAX(t2.added) as last
FROM t1
LEFT JOIN t2 on t1.t3_Id = t2.t3_Id
GROUP BY t2.t3_id
ORDER BY last;
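If EXPLAIN still shows a filesort after this rewrite, a composite index covering both the join key and the aggregated column lets MAX(t2.added) be read straight from the index for each group. The question mentions indexes on t3_id and added individually; a single composite index over both is a different structure, and the name here is illustrative:

```sql
ALTER TABLE t2 ADD INDEX idx_t3_added (t3_id, added);
```

The ORDER BY on the aggregate will still sort the ~16k grouped rows, but that is cheap compared to per-row lookups against the derived table.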
This happens because a temporary table is generated and scanned for each record.
I think you could try to order everything after the records are available. Maybe:
select * from (
select t1.t3_id, t1.id, t2.last from
(select t3_id, max(id) as id from t1 group by t3_id) as t1
left join (select t3_id, max(added) as last from t2 group by t3_id) as t2
on t1.t3_id = t2.t3_id ) as xx
order by last