how do I make this query run faster? - mysql

I have this SQL query running on a PHP website. This is an old site and the query was built by a previous developer a few years ago. Now that the site data has grown to around 230 MB, the query has become pretty slow, taking around 15-20 seconds to execute. Is there any way I can make it run faster?
SELECT DISTINCT
NULL AS bannerID,
C1.url AS url,
LOWER(C1.Organization) AS company_name,
CONCAT(
'https://mywebsite.co.uk/logos/',
C1.userlogo
) AS logo_url
FROM
Company AS C1
INNER JOIN Vacancy AS V1 ON LOWER(V1.company_name) = LOWER(C1.Organization)
WHERE
V1.LiveDate <= CURDATE()
AND url = ''
AND V1.ClosingDate >= CURDATE()
AND C1.flag_show_logo = 1

As commented, your query is non-sargable because it wraps the join columns in the LOWER function, which stops MySQL from using a plain index on those columns.
Additionally, I suspect you can remove the DISTINCT by using EXISTS instead of joining your tables:
select null as bannerID,
c.url as url,
Lower(c.Organization) as company_name,
Concat('https://mywebsite.co.uk/logos/', c.userlogo) as logo_url
from Company c
where c.flag_show_logo = 1
and c.url = ''
and exists (
select * from Vacancy v
where v.LiveDate <= CURDATE()
and v.ClosingDate >= CURDATE()
and v.company_name = c.Organization
)

Avoid the sargability problem by changing the join condition to
ON V1.company_name = C1.Organization
and declaring those two columns with the same collation, namely a case-insensitive collation (one ending with "_ci").
And have these composite indexes:
C1: INDEX(flag_show_logo, url, Organization, userlogo)
V1: INDEX(company_name, LiveDate, ClosingDate)
(These indexes should help Stu's answer, too.)
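
To illustrate the two answers above, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL. The sample rows are invented, and SQLite's COLLATE NOCASE is only an assumed substitute for a MySQL "_ci" collation. It shows that a plain equality on case-insensitive columns matches differently cased names without LOWER, and that the EXISTS form returns one row per company, so no DISTINCT is needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- NOCASE stands in for a MySQL *_ci collation (assumption for this demo)
    CREATE TABLE Company (
        Organization   TEXT COLLATE NOCASE,
        url            TEXT,
        userlogo       TEXT,
        flag_show_logo INTEGER
    );
    CREATE TABLE Vacancy (
        company_name TEXT COLLATE NOCASE,
        LiveDate     TEXT,
        ClosingDate  TEXT
    );
    CREATE INDEX idx_company ON Company (flag_show_logo, url, Organization, userlogo);
    CREATE INDEX idx_vacancy ON Vacancy (company_name, LiveDate, ClosingDate);

    INSERT INTO Company VALUES ('Acme Ltd', '', 'acme.png', 1),
                               ('Hidden Co', '', 'hidden.png', 0);
    -- two live vacancies for the same company, spelled with different case
    INSERT INTO Vacancy VALUES ('ACME LTD', '2000-01-01', '2999-12-31'),
                               ('acme ltd', '2000-01-01', '2999-12-31'),
                               ('Hidden Co', '2000-01-01', '2999-12-31');
""")

# Plain join: one output row per matching vacancy, hence the original DISTINCT
join_rows = conn.execute("""
    SELECT LOWER(c.Organization)
    FROM Company AS c
    JOIN Vacancy AS v ON v.company_name = c.Organization
    WHERE c.flag_show_logo = 1 AND c.url = ''
      AND v.LiveDate <= DATE('now') AND v.ClosingDate >= DATE('now')
""").fetchall()

# EXISTS: at most one output row per company, no DISTINCT required
exists_rows = conn.execute("""
    SELECT LOWER(c.Organization)
    FROM Company AS c
    WHERE c.flag_show_logo = 1 AND c.url = ''
      AND EXISTS (SELECT 1 FROM Vacancy AS v
                  WHERE v.company_name = c.Organization
                    AND v.LiveDate <= DATE('now') AND v.ClosingDate >= DATE('now'))
""").fetchall()

print(len(join_rows), len(exists_rows))
```

On a real MySQL server, running EXPLAIN on both forms is the way to confirm that the composite indexes are actually used.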

Related

MySQL query optimisation, SQL Query is taking too much time

select *
from `attendance_marks`
where exists (select *
from `attendables`
where `attendance_marks`.`attendable_id` = `attendables`.`id`
and `attendable_type` = 'student'
and `attendable_id` = 258672
and `attendables`.`deleted_at` is null
)
and (`marked_at` between '2022-09-01 00:00:00' and '2022-09-30 23:59:59')
This query takes too long, approximately 7-10 seconds.
I am trying to optimize it, but I am stuck here.
Attendance_marks indexes
Attendables Indexes
Please help me optimize it a little bit.
For reference
number of rows in attendable = 80966
number of rows in attendance_marks = 1853696
Explain select
I think if we use JOINS instead of Sub-Query, then it will be more performant. Unfortunately, I don't have the exact data to be able to benchmark the performance.
select *
from attendance_marks
inner join attendables on attendables.id = attendance_marks.attendable_id
where attendable_type = 'student'
and attendable_id = 258672
and attendables.deleted_at is null
and (marked_at between '2022-09-01 00:00:00' and '2022-09-30 23:59:59')
I'm not sure if your business requirements allow changing the PK and adding indexes. In case they do:
Add an index to attendable_id.
I assume that attendables.id is the PK. If not, add an index to it, or preferably make it the PK.
If attendable_type has a lot of different values, consider adding an index there too.
If possible, don't use granularity down to the seconds field in marked_at; instead round to the nearest minute. In our case, we can round 2022-09-30 23:59:59 up to 2022-10-01 00:00:00.
select b.*
from `attendance_marks` AS am
JOIN `attendables` AS b ON am.`attendable_id` = b.`id`
WHERE b.`attendable_type` = 'student'
and b.`attendable_id` = 258672
and b.`deleted_at` is null
AND am.`marked_at` >= '2022-09-01'
AND am.`marked_at` < '2022-09-01' + INTERVAL 1 MONTH
and have these
am: INDEX(marked_at, attendable_id)
am: INDEX(attendable_id, marked_at)
b: INDEX(attendable_type, attendable_id, attendables)
Note that the datetime range works for any granularity.
(Be sure to check that I got the aliases for the correct tables.)
This formulation, with these indexes should allow the Optimizer to pick which table is more efficient to start with.
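
The remark that the half-open datetime range works for any granularity can be sketched as follows. This is a small demo under stated assumptions: it uses Python's sqlite3 with timestamps stored as text, and the three sample rows are invented. It compares the closed BETWEEN range from the question with a range of the form `>= start AND < next month`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE attendance_marks (id INTEGER PRIMARY KEY, marked_at TEXT)")
conn.executemany(
    "INSERT INTO attendance_marks (marked_at) VALUES (?)",
    [
        ("2022-09-01 00:00:00",),
        ("2022-09-30 23:59:59.500",),  # falls after the BETWEEN upper bound
        ("2022-10-01 00:00:00",),      # belongs to October
    ],
)

# Closed range: silently misses anything after 23:59:59 on the last day
closed = conn.execute(
    "SELECT COUNT(*) FROM attendance_marks "
    "WHERE marked_at BETWEEN '2022-09-01 00:00:00' AND '2022-09-30 23:59:59'"
).fetchone()[0]

# Half-open range: everything in September, nothing from October
half_open = conn.execute(
    "SELECT COUNT(*) FROM attendance_marks "
    "WHERE marked_at >= '2022-09-01' AND marked_at < '2022-10-01'"
).fetchone()[0]

print(closed, half_open)
```

The closed range only matters when the column stores fractional seconds, but the half-open form is correct either way, so there is no need to reason about the column's precision.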

How to simplify queries

How can I simplify this query? Can this query be simplified? I tried some
joins but the results were not the same as this query below. Please give me
some insights.
SELECT trafficbywebsite.`adwordsCampaignID`,
trafficbywebsite.adwordsAdGroupID, trafficbywebsite.adPlacementDomain,
trafficbywebsite.counts traffic, convertedtrafficbywebsite.counts
convertedclicks
FROM
(
SELECT `adwordsAdGroupID`, `adPlacementDomain`, COUNT(*) counts
FROM
(
SELECT GA_entrances.*
FROM
GA_entrances,
GA_conversions
WHERE
GA_entrances.clientId=GA_conversions.clientId
AND (eventLabel='myurl' OR eventLabel='myotherurl')
AND YEAR(GA_entrances.timestamp)>=2016
AND MONTH(GA_entrances.timestamp)>=6
AND YEAR(GA_conversions.timestamp)>=2016
AND MONTH(GA_conversions.timestamp)>=6
GROUP BY GA_entrances.clientId
) clickers
GROUP BY `adwordsAdGroupID`, `adPlacementDomain`
) convertedtrafficbywebsite
,(
SELECT `adwordsCampaignID`, `adwordsAdGroupID`, adPlacementDomain,
COUNT(*) counts
FROM
GA_entrances
WHERE
YEAR(timestamp)>=2016
AND MONTH(timestamp)>=6
GROUP BY `adwordsAdGroupID`, `adPlacementDomain`
) trafficbywebsite
WHERE
convertedtrafficbywebsite.counts>=(trafficbywebsite.counts/10)
ORDER BY traffic DESC
Without sample data it is difficult to be certain, but it appears unlikely you can remove one of the subqueries. What you can do, however, is improve the way you filter for the dates. The thing to avoid is applying functions to the data to suit your filter criteria. For example, you want data from 2016-06-01 onward, which is a single date, yet you are transforming every row of data to extract a year and a month.
AND YEAR(GA_entrances.timestamp) >= 2016
AND MONTH(GA_entrances.timestamp) >= 6
AND YEAR(GA_conversions.timestamp) >= 2016
AND MONTH(GA_conversions.timestamp) >= 6
;
There is no need for all those functions, just compare to a single date:
AND GA_entrances.timestamp >= '2016-06-01'
AND GA_conversions.timestamp >= '2016-06-01'
;
The other thing to avoid is using commas as a way to join tables. The ANSI standard JOIN syntax has been available for 25+ years. This is the antique way of joining:
FROM GA_entrances, GA_conversions
WHERE GA_entrances.clientId = GA_conversions.clientId
This is considered best practice:
SELECT GA_entrances.*
FROM GA_entrances
INNER JOIN GA_conversions ON GA_entrances.clientId = GA_conversions.clientId
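
One further point about the date filter, sketched here with Python's sqlite3 and invented rows: assuming the intent really is "from 2016-06-01 onward", the YEAR/MONTH pair is not just slower but also gives different results, because it drops January to May of every later year. SQLite's strftime() is used as a stand-in for MySQL's YEAR() and MONTH().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE GA_entrances (clientId INTEGER, timestamp TEXT)")
conn.executemany(
    "INSERT INTO GA_entrances VALUES (?, ?)",
    [
        (1, "2016-05-31 12:00:00"),  # before the cut-off
        (2, "2016-06-01 00:00:00"),  # on the cut-off
        (3, "2017-01-15 09:30:00"),  # later year, but month < 6
    ],
)

# Function-based filter, as in the question (strftime replaces YEAR/MONTH)
by_parts = [r[0] for r in conn.execute(
    "SELECT clientId FROM GA_entrances "
    "WHERE CAST(strftime('%Y', timestamp) AS INTEGER) >= 2016 "
    "AND CAST(strftime('%m', timestamp) AS INTEGER) >= 6 "
    "ORDER BY clientId")]

# Single comparison against the cut-off date, as suggested in the answer
by_date = [r[0] for r in conn.execute(
    "SELECT clientId FROM GA_entrances "
    "WHERE timestamp >= '2016-06-01' "
    "ORDER BY clientId")]

print(by_parts, by_date)
```

The January 2017 row (clientId 3) is only returned by the single-date comparison.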

Using JOIN with DISTINCT and prioritize one table

I am trying to combine data from 2 tables.
Those 2 tables both contain data from the same sensor (let's say a sensor that measures CO2 with 1 entry per 10 minutes).
The first table contains validated data. Let's call it station1_validated. The 2nd table contains raw data. Let's call this one station1_nrt.
While the raw-data table contains live data, the validated table contains only data points that are at least 1 month old. (It takes some time to validate the data and to check it manually afterwards; this happens only once every month.)
What I am trying to do now is to combine the data of those 2 tables to display live data on a website. However when validated data is available it should prioritize that data point over the raw data-point.
The relevant columns for this are:
timed [bigint(20)]: Contains the datetime as a unix timestamp in milliseconds from 1.1.1970
CO2 [double]: Contains the measured concentration of CO2 in ppm (parts per million)
I wrote this basic SQL:
SELECT
*
FROM
(SELECT
timed, CO2, '2' tab
FROM
station1_nrt
WHERE
TIMED >= 1386932400000
AND TIMED <= 1386939600000
AND TIMED NOT IN (SELECT
timed
FROM
station1_nrt
WHERE
CO2 IS NOT NULL
AND TIMED >= 1386932400000
AND TIMED <= 1386939600000) UNION SELECT
timed, CO2, '1' tab
FROM
station1_validated
WHERE
CO2 IS NOT NULL
AND TIMED >= 1386932400000
AND TIMED <= 1386939600000) a
ORDER BY timed
This does not work correctly as it selects only those data points where both tables have an entry.
I want to do this with a JOIN now, as it would be much faster, but I don't know how to do a JOIN with a DISTINCT (or something similar) that prioritizes one table. Could someone help me out with this (or explain it)?
You haven't mentioned if there exist records in station1_validated which don't exist in station1_nrt so I use FULL JOIN. If all rows from station1_validated exist in station1_nrt then you can use LEFT JOIN instead.
Something like this
SELECT IFNULL(n.timed,v.timed) as timed,
CASE WHEN v.timed IS NOT NULL THEN v.CO2 ELSE n.CO2 END as CO2,
CASE WHEN v.timed IS NOT NULL THEN '1' ELSE '2' END as tab
FROM station1_nrt as n
FULL JOIN station1_validated as v ON n.timed=v.timed AND v.CO2 IS NOT NULL
WHERE
( n.TIMED between 1386932400000 AND 1386939600000
or
v.TIMED between 1386932400000 AND 1386939600000
)
AND
(n.CO2 IS NOT NULL OR v.CO2 IS NOT NULL)
MySQL has an IF that would probably work for you. You would have to select specific columns, though, but you could build the query programmatically.
SELECT
IF(DATE_SUB(NOW(), INTERVAL 1 MONTH) < FROM_UNIXTIME(nrt.TIMED),
val.value,
nrt.value
) AS value
-- Similar for other values
FROM
station1_nrt AS nrt
JOIN station1_validated AS val USING(id)
ORDER BY TIMED
Note that the USING(id) is a placeholder. Presumably there is some indexed column you can join the two tables on.
You can join and then use IFs in the fields to choose the validated values if they exist. Something like:
SELECT
IFNULL(s1val.timed,s1.timed) AS timed,
IFNULL(s1val.CO2,s1.CO2) AS CO2,
IF(s1val.timed IS NULL, '2', '1') AS tab
FROM
station1_nrt s1
LEFT JOIN station1_validated s1val ON (s1.TIMED = s1val.TIMED)
WHERE
-- Any necessary where clauses
@Jim, @valex, @ExplosionPills
I managed to write a SQL SELECT that emulates a FULL OUTER JOIN (as there is no FULL JOIN in MySQL) and returns the validated value if it exists. If no validated data is available, it returns the raw value.
So this is the SQL I am using now:
SET @StartTime = 1356998400000;
SET @EndTime = 1386546000000;
SELECT
timed,
IFNULL (mergedData.validatedValue, mergedData.rawValue) as value
FROM
((SELECT
from_unixtime(timed / 1000) as timed,
rawData.NOX as rawValue,
validatedData.NOX as validatedValue
FROM
nabelnrt_bas as rawData
LEFT JOIN nabelvalidated_bas as validatedData using(timed)
WHERE
(rawData.timed > @StartTime
AND rawData.timed < @EndTime)
OR (validatedData.timed > @StartTime
AND validatedData.timed < @EndTime)
) UNION (
SELECT
from_unixtime(timed / 1000) as timed,
rawData.NOX as rawValue,
validatedData.NOX as validatedValue
FROM
nabelnrt_bas as rawData
RIGHT JOIN nabelvalidated_bas as validatedData using(timed)
WHERE
(rawData.timed > @StartTime
AND rawData.timed < @EndTime)
OR (validatedData.timed > @StartTime
AND validatedData.timed < @EndTime)
)
ORDER BY timed DESC) as mergedData
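
The same idea can be sketched in a slightly different shape: a LEFT JOIN from the raw table, plus a UNION ALL of the validated rows that have no raw counterpart. This is only an illustration under stated assumptions: it uses Python's sqlite3 in place of MySQL, the table names come from the question, and the sample values and the `:start` / `:end` parameter names are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE station1_nrt       (timed INTEGER PRIMARY KEY, CO2 REAL);
    CREATE TABLE station1_validated (timed INTEGER PRIMARY KEY, CO2 REAL);
    INSERT INTO station1_nrt       VALUES (1000, 400.0), (2000, 410.0);
    INSERT INTO station1_validated VALUES (2000, 405.5), (3000, 399.0);
""")

rows = conn.execute("""
    -- every raw row, overridden by the validated value when one exists
    SELECT n.timed AS timed,
           COALESCE(v.CO2, n.CO2) AS CO2,
           CASE WHEN v.CO2 IS NOT NULL THEN '1' ELSE '2' END AS tab
    FROM station1_nrt AS n
    LEFT JOIN station1_validated AS v ON v.timed = n.timed
    WHERE n.timed BETWEEN :start AND :end
    UNION ALL
    -- validated rows that have no raw counterpart at all
    SELECT v.timed, v.CO2, '1'
    FROM station1_validated AS v
    LEFT JOIN station1_nrt AS n ON n.timed = v.timed
    WHERE n.timed IS NULL
      AND v.timed BETWEEN :start AND :end
    ORDER BY timed
""", {"start": 0, "end": 5000}).fetchall()

for row in rows:
    print(row)
```

Because the second branch only returns rows that are missing from the raw table, the two branches cannot overlap, so UNION ALL is enough and the duplicate-removal pass of a plain UNION is avoided.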

How to get list of records from table where the time difference found using DATEDIFF function between 2 variables that are select queries themselves?

SET @startdate = (select LOG_TIME from log.time where sender='Japan' and receiver ='USA' and code=158);
SET @enddate = (select LOG_TIME from log.time where sender='Japan' and receiver ='USA' and code=189);
select * from log.time where DATEDIFF(minute, @startdate, @enddate) >= 10;
Here I want to use 2 variables (@startdate and @enddate), which are populated with multiple entries coming from the select queries used.
For the last line, I want the select query to return a list of records where the DATEDIFF result is greater than or equal to 10 minutes, using these 2 variables with multiple values.
P.S. I am using the Squirrel SQL Client 2.3.
The issue is that I have no idea whether it is possible to use multiple values for variables.
Please also advise or provide any solution to the above issue so that the query works in the end.
You can't use variables this way.
It's hard to tell for sure without seeing your table schema and sample data, but you should be able to do what you want using a JOIN with a query like this:
SELECT l1.*
FROM log.time l1 JOIN log.time l2
ON l1.sender = l2.sender
AND l1.receiver = l2.receiver
AND l1.code = 158
AND l2.code = 189
WHERE l1.sender = 'Japan'
AND l1.receiver = 'USA'
AND DATEDIFF(minute, l1.log_time, l2.log_time) >= 10
If you were to provide a table schema, sample data and desired output, then it'll be possible to test your query
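
In the absence of real data, here is a small sketch of the self-join idea with invented rows, run through Python's sqlite3. The table is named log_time because the demo does not model the `log` schema, and the minute difference is computed from epoch seconds via SQLite's strftime('%s'). On MySQL the corresponding expression would be TIMESTAMPDIFF(MINUTE, l1.log_time, l2.log_time). The sender and receiver literals are left out so that the row-by-row pairing is visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE log_time (sender TEXT, receiver TEXT, code INTEGER, log_time TEXT);
    INSERT INTO log_time VALUES
        ('Japan', 'USA',    158, '2020-01-01 10:00:00'),
        ('Japan', 'USA',    189, '2020-01-01 10:25:00'),
        ('Japan', 'Canada', 158, '2020-01-01 11:00:00'),
        ('Japan', 'Canada', 189, '2020-01-01 11:05:00');
""")

# Pair each code-158 row with the code-189 row of the same sender/receiver,
# then keep only the pairs that are at least 10 minutes apart.
rows = conn.execute("""
    SELECT l1.sender, l1.receiver,
           (strftime('%s', l2.log_time) - strftime('%s', l1.log_time)) / 60 AS minutes
    FROM log_time AS l1
    JOIN log_time AS l2
      ON l1.sender = l2.sender AND l1.receiver = l2.receiver
    WHERE l1.code = 158 AND l2.code = 189
      AND (strftime('%s', l2.log_time) - strftime('%s', l1.log_time)) / 60 >= 10
""").fetchall()

print(rows)
```

Each start row is compared with its own end row inside the join, which is what a scalar variable holding a single value cannot do.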

MySQL aggregate by month with running total [duplicate]

Closed 10 years ago.
Possible Duplicate:
Calculate a running total in MySQL
I'm monitoring the number of users created since 2011 in an application by month for a chart using MySQL and PHP. As a part of the query I would also like to include a running total.
SELECT
DATE_FORMAT(created,'%Y%m%d') as 'Date',
COUNT(item_id) as 'NewUsers'
FROM AP_user
WHERE YEAR(created) > 2011
AND user_groups = '63655'
AND user_active = 1
AND userID NOT IN $excludedUsers
GROUP BY MONTH(created) ASC
I'm able to return the "users by month" but how do I include a running total as a part of this query?
Unfortunately, MySQL versions before 8.0 don't provide analytic (window) functions like Oracle and SQL Server do.
One way to get a "running total" is to make use of a user variable, something like this:
SELECT t.Date
, t.NewUsers
, @rt := @rt + t.NewUsers AS `Running Total`
FROM (SELECT @rt := 0) i
JOIN (
SELECT DATE_FORMAT(created,'%Y%m%d') AS `Date`
, COUNT(item_id) as `NewUsers`
FROM AP_user
WHERE YEAR(created) > 2011
AND user_groups = '63655'
AND user_active = 1
AND userID NOT IN $excludedUsers
GROUP BY DATE_FORMAT(created,'%Y-%m')
ORDER BY DATE_FORMAT(created,'%Y-%m') ASC
) t
NOTE: The behavior of user variables as used above is not guaranteed in this context. But if we are careful with the query, we can get predictable, repeatable results in SELECT statements. The behavior of user variables may change in a future release, rendering this approach unworkable.
NOTE: I basically wrapped your query in parentheses and gave it an alias as an inline view (what MySQL calls a "derived table"). I made a few changes to your query: your GROUP BY has the potential to group January 2012 together with January 2013, so I changed that. I also added an ORDER BY clause.
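
On MySQL 8.0 and later, window functions are available, so the running total no longer has to rely on user variables. The sketch below shows the shape of such a query using Python's sqlite3 (which needs SQLite 3.25 or newer for window functions). The rows are invented, strftime() stands in for DATE_FORMAT(), and the extra filters from the question are left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE AP_user (item_id INTEGER PRIMARY KEY, created TEXT)")
conn.executemany(
    "INSERT INTO AP_user (created) VALUES (?)",
    [("2012-01-05",), ("2012-01-20",), ("2012-02-11",),
     ("2013-01-03",), ("2013-01-09",), ("2013-01-30",)],
)

rows = conn.execute("""
    SELECT ym, NewUsers,
           -- cumulative sum over the months seen so far
           SUM(NewUsers) OVER (ORDER BY ym) AS RunningTotal
    FROM (
        -- group by year AND month so January 2012 and January 2013 stay apart
        SELECT strftime('%Y-%m', created) AS ym, COUNT(item_id) AS NewUsers
        FROM AP_user
        WHERE created >= '2012-01-01'
        GROUP BY ym
    ) AS t
    ORDER BY ym
""").fetchall()

for row in rows:
    print(row)
```

The derived table keeps the same structure as the user-variable version: aggregate per month first, then accumulate over the ordered result.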