I want to improve my current query. I have a table called Incomes, which has a sourceId varchar field. I have a single SELECT for the fields I need, but I needed to add an extra field called isFirstTime to represent whether the row was the first time that sourceId was used. This is my current query:
SELECT DISTINCT
    `income`.*,
    CASE WHEN (
        SELECT `income2`.id
        FROM `income` AS `income2`
        WHERE `income2`.`sourceId` = `income`.`sourceId`
        ORDER BY `income2`.created ASC
        LIMIT 1
    ) = `income`.id THEN true ELSE false END AS isFirstIncome
FROM `income` AS `income`
WHERE `income`.incomeType IN ('passive', 'active') AND `income`.status = 'paid'
ORDER BY `income`.created DESC
LIMIT 50
The query works but slows down if I keep increasing the LIMIT or OFFSET. Any suggestions?
UPDATE 1:
Added WHERE statements used on the original query
UPDATE 2:
MYSQL version 5.7.22
You can achieve it using an ordered analytical (window) function.
You can use ROW_NUMBER or RANK to get the desired result. Note that window functions require MySQL 8.0 or later, so this will not run on the 5.7.22 mentioned in UPDATE 2.
The query below will give the desired output.
SELECT *,
       CASE
           WHEN ROW_NUMBER() OVER (PARTITION BY sourceId ORDER BY created ASC) = 1 THEN true
           ELSE false
       END AS isFirstIncome
FROM income
WHERE incomeType IN ('passive', 'active') AND status = 'paid'
ORDER BY created DESC
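Since window functions are unavailable on 5.7, here is a hedged sketch of the same idea using NOT EXISTS instead; it assumes created is unique per sourceId (on a tie, more than one row would be flagged as first):

SELECT i.*,
       NOT EXISTS (
           -- is there any earlier row for the same sourceId?
           SELECT 1
           FROM income AS prior
           WHERE prior.sourceId = i.sourceId
             AND prior.created < i.created
       ) AS isFirstIncome
FROM income AS i
WHERE i.incomeType IN ('passive', 'active') AND i.status = 'paid'
ORDER BY i.created DESC
LIMIT 50;

An index on (sourceId, created) lets each NOT EXISTS probe stop at the first match.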
My first thought is that isFirstIncome should be an extra column in the table. It should be populated as the data is inserted.
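A minimal sketch of that approach, assuming you can ALTER the table and control the INSERT path (the backfill assumes created is unique per sourceId):

ALTER TABLE income ADD COLUMN isFirstIncome TINYINT(1) NOT NULL DEFAULT 0;

-- Backfill: flag the earliest row of each sourceId.
UPDATE income AS i
JOIN (
    SELECT sourceId, MIN(created) AS first_created
    FROM income
    GROUP BY sourceId
) AS f ON f.sourceId = i.sourceId AND i.created = f.first_created
SET i.isFirstIncome = 1;

New inserts would then set the flag whenever no earlier row exists for that sourceId.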
If you don't like that, let's try to optimize the query...
Let's avoid doing the subquery more than 50 times. This requires turning the query inside-out. (It's like "explode-implode", where the query gathers lots of stuff, then sorts it and throws most of the rows away.)
To summarize:
do the least amount of effort to just identify the 50 rows.
JOIN to whatever tables are needed (including itself if appropriate); this is to get any other columns desired (including isFirstIncome).
SELECT i3.*,
( ... using i3 ... ) as isFirstIncome
FROM (
SELECT i1.id, i1.created  -- created is needed by the outer ORDER BY
FROM `income` AS i1
WHERE i1.incomeType IN ('passive', 'active')
AND i1.status = 'paid'
ORDER BY i1.created DESC
LIMIT 50
) AS i2
JOIN income AS i3 USING(id)
ORDER BY i2.created DESC -- yes, repeated
(I left out the computation of isFirstIncome; it is discussed in other Answers. But note that it will be executed at most 50 times.)
(The aliases -- i1, i2, i3 -- are numbered in the order they will be "used"; this is to assist in following the SQL.)
To assist in performance, add
INDEX(status, incomeType, created, id, sourceId)
It should help with my formulation, but probably not for the other versions. Your version would benefit from
INDEX(sourceId, created, id)
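For concreteness, a sketch of this formulation with the original correlated subquery plugged in (assuming the indexes above are in place; the subquery body is the one from the question, not something new):

SELECT  i3.*,
        (( SELECT i4.id
           FROM income AS i4
           WHERE i4.sourceId = i3.sourceId
           ORDER BY i4.created ASC
           LIMIT 1 ) = i3.id) AS isFirstIncome
FROM (
    SELECT i1.id, i1.created
    FROM income AS i1
    WHERE i1.incomeType IN ('passive', 'active')
      AND i1.status = 'paid'
    ORDER BY i1.created DESC
    LIMIT 50
) AS i2
JOIN income AS i3 USING(id)
ORDER BY i2.created DESC;

The correlated subquery now runs at most 50 times, once per surviving row.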
SELECT property.paon, property.saon, property.street, property.postcode, property.lastSalePrice, property.lastTransferDate,
epc.ADDRESS1, epc.POSTCODE, epc.TOTAL_FLOOR_AREA,
(
3959 * acos (
cos (radians(54.6921))
* cos(radians(property.latitude))
* cos(radians(property.longitude) - radians(-1.2175))
+ sin(radians(54.6921))
* sin(radians(property.latitude))
)
) AS distance
FROM property
RIGHT JOIN epc ON property.postcode = epc.POSTCODE AND CONCAT(property.paon, ', ', property.street) = epc.ADDRESS1
WHERE property.paon IS NOT NULL AND epc.TOTAL_FLOOR_AREA > 0
GROUP BY CONCAT(property.paon, ', ', property.street)
HAVING distance < 1.4
ORDER BY property.lastTransferDate DESC
LIMIT 10
table property has 22 million rows, table epc has 14 million rows
Without the GROUP BY and ORDER BY, it runs fast.
The property table often has the same property multiple times, but I need to select the one with the most recent lastTransferDate.
If there is a better approach, I'm open to it.
Here is the EXPLAIN of the query:
[EXPLAIN output attached as an image]
You can do a few things:
Create a new column so you don't need to compute CONCAT(property.paon, ', ', property.street) in the GROUP BY and the JOIN (this will speed it up a lot!); see the sketch below.
As JackHacks says, you need to create indexes at the right spots (property postcode and the newly created column, and epc postcode and address).
Keep filters such as epc.TOTAL_FLOOR_AREA > 0 in the WHERE clause rather than in HAVING, so rows are discarded before grouping; only the distance filter has to stay in HAVING, because it references a SELECT alias.
If you need more help, share an EXPLAIN of your query with us.
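A minimal sketch of the first two suggestions, assuming MySQL 5.7+ for the generated column (index names are hypothetical, and long address columns may need a prefix length):

ALTER TABLE property
    ADD COLUMN full_address VARCHAR(255) AS (CONCAT(paon, ', ', street)) STORED;

ALTER TABLE property ADD INDEX idx_property_addr (postcode, full_address);
ALTER TABLE epc      ADD INDEX idx_epc_addr      (POSTCODE, ADDRESS1);

The join condition can then become property.full_address = epc.ADDRESS1, and the GROUP BY can use property.full_address.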
Do you control the database? If you do, you could try adding indexes on the address and postcode columns (anything in the JOIN clause). That should probably speed up the query.
Edit: reread your question.
If the GROUP BY and ORDER BY clauses are slowing it down, I would try adding indexes on the columns referenced in those clauses.
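For instance, a hedged sketch of an index on the ORDER BY column (the name is hypothetical; the GROUP BY expression itself cannot be indexed directly unless it is materialized as a column, as the other answer suggests):

ALTER TABLE property ADD INDEX idx_property_last_transfer (lastTransferDate);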
This query takes more than 40 seconds to execute on a table that has 200k rows
SELECT
my_robots.*,
(
SELECT count(id)
FROM hpsi_trading
WHERE estado <= 1 and idRobot = my_robots.id
) as openorders,
apikeys.apikey,
apikeys.apisecret
FROM my_robots, apikeys
WHERE estado <= 1
and idRobot = '2'
and ready = '1'
and apikeys.id = my_robots.idApiKey
and (my_robots.id LIKE '%0'
OR my_robots.id LIKE '%1'
OR my_robots.id LIKE '%2')
I know it is because of the COUNT subquery, but how could I fix this efficiently?
Edit: [EXPLAIN output attached as an image]
Thanks.
Use GROUP BY instead
SELECT my_robots.*,
count(hpsi_trading.id) as openorders,  -- qualified: both tables have an id column
apikeys.apikey,
apikeys.apisecret
FROM my_robots
JOIN apikeys ON apikeys.id = my_robots.idApiKey
LEFT JOIN hpsi_trading ON hpsi_trading.idRobot = my_robots.id AND hpsi_trading.estado <= 1
WHERE estado <= 1 and
idRobot = '2' and
ready = '1' and
(
my_robots.id LIKE '%0' OR
my_robots.id LIKE '%1' OR
my_robots.id LIKE '%2'
)
GROUP BY my_robots.id, apikeys.apikey, apikeys.apisecret
Use explicit JOIN syntax. Some indexes will be needed to run it fast; however, the database structure is not clear from your post (or from your query).
The explain plan shows that the largest pain is selecting the data from the table hpsi_trading.
The challenge from the database's point of view is that the query contains a correlated subquery in the SELECT clause, which needs to be executed once for each result of the outer query (after filtering).
Replacing this subquery with a JOIN + GROUP BY will require MySQL to join between all these records (inflate) and only then deflate the data using GROUP BY, which might take time.
Instead, I would extract the subquery to a temporary table, which is grouped during creation, index it and join to it. That way, the subquery will run once, using a quick covering index, it will already group the data and only then join it to the other table.
So far, it's all pros. The con is that extracting a subquery to a temporary table might require more effort on the development side.
Please try this version and let me know if it helped (if not, please provide a fresh EXPLAIN plan screenshot):
Creating the temp table:
CREATE TEMPORARY TABLE IF NOT EXISTS temp1 AS
SELECT idRobot, COUNT(id) as openorders
FROM hpsi_trading
WHERE estado <= 1
GROUP BY idRobot;
The modified query:
SELECT
    my_robots.*,
    temp1.openorders,
    apikeys.apikey,
    apikeys.apisecret
FROM my_robots
-- explicit JOINs: mixing a comma join with LEFT JOIN would put my_robots
-- out of scope for the LEFT JOIN's ON clause in MySQL
JOIN apikeys ON apikeys.id = my_robots.idApiKey
LEFT JOIN temp1 ON temp1.idRobot = my_robots.id
WHERE
    -- estado, idRobot and ready assumed to be apikeys columns (see note below)
    estado <= 1 AND idRobot = '2'
    AND ready = '1'
    AND (my_robots.id LIKE '%0'
        OR my_robots.id LIKE '%1'
        OR my_robots.id LIKE '%2')
The indexes to add for this solution (I assumed from logic that estado, idRobot and ready are from the apikeys table. If that's not the case, let me know and I'll adjust the indexes):
ALTER TABLE `temp1` ADD INDEX `temp1_index_1` (idRobot);
ALTER TABLE `hpsi_trading` ADD INDEX `hpsi_trading_index_1` (idRobot, estado, id);
ALTER TABLE `apikeys` ADD INDEX `apikeys_index_1` (`idRobot`, `ready`, `id`, `estado`);
ALTER TABLE `my_robots` ADD INDEX `my_robots_index_1` (`idApiKey`);
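One usage note: TEMPORARY tables are visible only to the connection that created them, so run the CREATE and the SELECT on the same connection. When re-running, drop the old copy first:

DROP TEMPORARY TABLE IF EXISTS temp1;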
I have a big query (MySQL) to join several tables:
SELECT * FROM
`AuthLogTable`,
`AppTable`,
`Company`,
`LicenseUserTable`,
`LicenseTable`,
`LicenseUserPool`,
`PoolTable`
WHERE
`LicenseUserPool`.`UserID`=`LicenseUserTable`.`UserID` and
`LicenseUserTable`.`License`=`LicenseTable`.`License` and
LEFT(RIGHT(`AuthLogTable`.`User`, 17), 16)=`LicenseUserPool`.`UserID` and
`LicenseUserPool`.`PoolID`=`PoolTable`.`id` and
`Company`.`id`=`LicenseTable`.`CompanyID` and
`AuthLogTable`.`License` = `LicenseTable`.`License` and
`AppTable`.`AppID` = `AuthLogTable`.`AppID` AND
`PoolTable`.`id` IN (-1,1,2,4,15,16,17,5,18,19,43,20,3,6,8,10,29,30,7,11,12,24,25,26,27,28,21,23,22,31,32,33,34,35,36,37,38,39,40,41,42,-1)
ORDER BY
`AuthLogTable`.`AuthDate` DESC,
`AuthLogTable`.`AuthTime` DESC
LIMIT 0,20
I used EXPLAIN and it gives the following:
[EXPLAIN output attached as an image]
How to make this faster? It takes several seconds in a big table.
"Showing rows 0 - 19 ( 20 total, Query took 3.5825 sec)"
As far as I know, the fields used in the query are indexed in each table.
Indexes are set for AuthLogTable.
You can try running this query without the ORDER BY clause on your data and see if it makes a difference (also run EXPLAIN). If it does, consider adding an index or indexes on the fields you sort by. "Using temporary; Using filesort" means that a temp table is created and then sorted; without an index, that takes time.
As far as I know, join style doesn't make any difference, because the query is parsed into another form anyway. But you may still want to use ANSI join syntax (see also this question: ANSI joins versus "where clause" joins).
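A hedged sketch of the kind of index meant here (the name is hypothetical; before MySQL 8.0 the DESC keyword in an index definition is ignored, but an ascending index can still be scanned in reverse to satisfy the DESC ordering):

ALTER TABLE `AuthLogTable` ADD INDEX `idx_authlog_date_time` (`AuthDate`, `AuthTime`);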
First of all, consider modifying your query to use JOINs properly. Also, make sure that you have indexed the columns used in the JOIN ... ON clauses, the WHERE condition, and the ORDER BY clause.
select * from `AuthLogTable`
join `AppTable` on `AppTable`.`AppID` = `AuthLogTable`.`AppID`
join `LicenseTable` on `AuthLogTable`.`License` = `LicenseTable`.`License`
join `Company` on `Company`.`id`=`LicenseTable`.`CompanyID`
join `LicenseUserTable` on `LicenseUserTable`.`License`=`LicenseTable`.`License`
join `LicenseUserPool` on `LicenseUserPool`.`UserID`=`LicenseUserTable`.`UserID`
join `PoolTable` on `LicenseUserPool`.`PoolID`=`PoolTable`.`id`
where LEFT(RIGHT(`AuthLogTable`.`User`, 17), 16)=`LicenseUserPool`.`UserID`
and `PoolTable`.`id` IN (-1,1,2,4,15,16,17,5,18,19,43,20,3,6,8,10,29,30,7,11,12,24,25,26,27,28,21,23,22,31,32,33,34,35,36,37,38,39,40,41,42,-1)
order by `AuthLogTable`.`AuthDate` desc, `AuthLogTable`.`AuthTime` desc
limit 0,20;
Try the following query:
SELECT *
FROM `AuthLogTable`
JOIN `AppTable` ON (`AppTable`.`AppID` = `AuthLogTable`.`AppID`)
JOIN `LicenseUserPool` ON (LEFT(RIGHT(`AuthLogTable`.`User`, 17), 16)=`LicenseUserPool`.`UserID`)
JOIN `LicenseUserTable` ON (`LicenseUserPool`.`UserID`=`LicenseUserTable`.`UserID`)
JOIN `LicenseTable` ON (`AuthLogTable`.`License` = `LicenseTable`.`License`
AND `LicenseUserTable`.`License`=`LicenseTable`.`License`)
JOIN `Company` ON (`Company`.`id`=`LicenseTable`.`CompanyID`)
JOIN `PoolTable` ON (`LicenseUserPool`.`PoolID`=`PoolTable`.`id`)
WHERE `PoolTable`.`id` IN (-1,1,2,4,15,16,17,5,18,19,43,20,3,6,8,10,29,30,7,11,12,24,25,26,27,28,21,23,22,31,32,33,34,35,36,37,38,39,40,41,42,-1)
ORDER BY `AuthLogTable`.`AuthDate` DESC, `AuthLogTable`.`AuthTime` DESC LIMIT 0,20
I have a simple table of data, and I'd like to select the row that's at about the 40th percentile from the query.
I can do this right now by first querying to find the number of rows and then running another query that sorts and selects the nth row:
select count(*) as `total` from mydata;
which may return something like 93, and 93 * 0.4 ≈ 37
select * from mydata order by `field` asc limit 37,1;
Can I combine these two queries into a single query?
This will give you approximately the 40th percentile: it returns the row where about 40% of the rows are less than it. Since no row may fall exactly on the 40th percentile, it sorts rows by how far they are from that mark.
SELECT m1.field, m1.otherfield, count(m2.field)
FROM mydata m1 INNER JOIN mydata m2 ON m2.field<m1.field
GROUP BY
m1.field,m1.otherfield
ORDER BY
ABS(0.4-(count(m2.field)/(select count(*) from mydata)))
LIMIT 1
As an exercise in futility (your current solution would probably be faster and preferred), if the table is MyISAM (or you can live with the approximation of InnoDB):
SET @row = 0;
SELECT x.*
FROM information_schema.tables
JOIN (
    SELECT @row := @row + 1 AS `row`, mydata.*
    FROM mydata
    ORDER BY field ASC
) x
    ON x.row = ROUND(information_schema.tables.table_rows * 0.4)
WHERE information_schema.tables.table_schema = DATABASE()
  AND information_schema.tables.table_name = 'mydata';
There's also this solution, which uses a monster string built by GROUP_CONCAT. I had to raise the maximum length of its output, like so, to get it to work:
SET SESSION group_concat_max_len = 1000000;
MySQL wizards out there: feel free to comment on the relative performance of these methods.
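The linked solution itself isn't reproduced above, but a typical shape of that GROUP_CONCAT approach is sketched below (a sketch, assuming a numeric field and the session limit set above; CEIL(0.4 * COUNT(*)) matches the LIMIT 37,1 arithmetic from the question, i.e. the 38th of 93 rows):

SELECT SUBSTRING_INDEX(
           SUBSTRING_INDEX(
               GROUP_CONCAT(`field` ORDER BY `field`),   -- one big sorted, comma-separated string
               ',', CEIL(0.4 * COUNT(*))),               -- keep the first 40% of the entries
           ',', -1) AS percentile_40                     -- the last of those is the 40th-percentile value
FROM mydata;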