Cutting SELECT query time in MySQL

I'm using CodeIgniter 2, and in my database model I have a query that joins two tables and filters rows based on distance from a given geolocation.
SELECT users.id,
(3959 * acos(cos(radians(42.327612)) *
cos(radians(last_seen.lat)) * cos(radians(last_seen.lon) -
radians(-77.661591)) + sin(radians(42.327612)) *
sin(radians(last_seen.lat)))) AS distance
FROM users
JOIN last_seen ON users.id = last_seen.seen_id
WHERE users.age >= 18 AND users.age <= 30
HAVING distance < 50
I'm not sure if it's the distance calculation that is making this query take especially long. I do have over 300,000 rows in my users table, and the same amount in my last_seen table. I'm sure that plays a role.
But the age column in the users table is indexed, along with the id column.
The lat and lon columns in the last_seen table are also indexed.
Does anyone have ideas as to why this query takes so long and how I can improve it?
UPDATE
It turns out that this query actually runs pretty quickly. When I execute it in phpMyAdmin, it takes 0.56 seconds. Not too bad. But when I try to execute it with a third-party SQL client like Sequel Pro, it takes at least 20 seconds, and all of the other apps on my Mac slow down. When the query is executed by loading the script via jQuery's load() method, it takes around the same amount of time.
Looking at the network tab in Google Chrome's developer tools, it seems that the reason it takes so long to load is what's called TTFB, or Time To First Byte. It's taking forever.

To make this query faster, you need to limit the number of rows using an index before actually calculating the distance for each and every one of them. To do so, you can pre-filter the rows from last_seen based on their lat/lon using a rough bounding box for the desired distance.
The idea is that positions at the reference latitude can only be within 50 miles if their longitude falls within a certain range of the reference longitude, and vice versa.
For a 50-mile radius, RefLat ± 1 and RefLon ± 1 would be a good start to limit the rows before calculating the precise distance:
last_seen.lat BETWEEN 42.327612 - 1 AND 42.327612 + 1
AND last_seen.lon BETWEEN -77.661591 - 1 AND -77.661591 + 1
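Putting the rough box and the precise formula together, the query might look like this (a sketch built from the original query; only the two BETWEEN conditions are new):
SELECT users.id,
       (3959 * acos(cos(radians(42.327612)) *
        cos(radians(last_seen.lat)) * cos(radians(last_seen.lon) -
        radians(-77.661591)) + sin(radians(42.327612)) *
        sin(radians(last_seen.lat)))) AS distance
FROM users
JOIN last_seen ON users.id = last_seen.seen_id
WHERE users.age >= 18 AND users.age <= 30
  AND last_seen.lat BETWEEN 42.327612 - 1 AND 42.327612 + 1
  AND last_seen.lon BETWEEN -77.661591 - 1 AND -77.661591 + 1
HAVING distance < 50;
This way the indexes on lat and lon can narrow the candidate rows before the trigonometry runs on each of them.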

For this query:
SELECT users.id,
       (3959 * acos(cos(radians(42.327612)) * cos(radians(last_seen.lat)) *
        cos(radians(last_seen.lon) - radians(-77.661591)) +
        sin(radians(42.327612)) * sin(radians(last_seen.lat)))) AS distance
FROM users JOIN
     last_seen
     ON users.id = last_seen.seen_id
WHERE users.age >= 18 AND users.age <= 30
HAVING distance < 50;
The best index is users(age, id) and last_seen(seen_id). Unfortunately, the distance calculations are going to take a while, because they have to be calculated for every row. You might want to consider a GIS extension to MySQL to help with this type of query.
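For reference, creating those indexes is a one-time operation (the index names here are illustrative, not from the question):
-- Index names are placeholders; pick whatever fits your naming scheme.
CREATE INDEX idx_users_age_id ON users (age, id);
CREATE INDEX idx_last_seen_seen_id ON last_seen (seen_id);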

Related

MySQL - group by interval query optimisation

Some background first. We have a MySQL database with a "live currency" table. We use an API to pull the latest currency values for different currencies, every 5 seconds. The table currently has over 8 million rows.
Structure of the table is as follows:
id (INT 11 PK)
currency (VARCHAR 8)
value (DECIMAL)
timestamp (TIMESTAMP)
Now we are trying to use this table to plot the data on a graph. We are going to have various different graphs, e.g. Live, Hourly, Daily, Weekly, Monthly.
I'm having a bit of trouble with the query. Using the Weekly graph as an example, I want to output data from the last 7 days, in 15 minute intervals. So here is how I have attempted it:
SELECT *
FROM currency_data
WHERE ((currency = 'GBP')) AND (timestamp > '2017-09-20 12:29:09')
GROUP BY UNIX_TIMESTAMP(timestamp) DIV (15 * 60)
ORDER BY id DESC
This outputs the data I want, but the query is extremely slow. I have a feeling the GROUP BY clause is the cause.
Also, BTW, I have switched off the SQL mode ONLY_FULL_GROUP_BY, as it was forcing me to group by id as well, which returned incorrect results.
Does anyone know of a better way of doing this query which will reduce the time taken to run the query?
You may want to create summary tables for each of the graphs you want to do.
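For example, a summary table for the 15-minute graph might look like this sketch (the table name, the DECIMAL precision, and the choice of AVG as the per-bucket value are assumptions, not from the question):
-- Hypothetical 15-minute summary table, refreshed periodically (e.g. from cron).
CREATE TABLE currency_data_15m (
    currency     VARCHAR(8)    NOT NULL,
    bucket_start TIMESTAMP     NOT NULL,
    value        DECIMAL(20,6) NOT NULL,
    PRIMARY KEY (currency, bucket_start)
);
INSERT INTO currency_data_15m (currency, bucket_start, value)
SELECT currency,
       FROM_UNIXTIME(UNIX_TIMESTAMP(timestamp) DIV (15 * 60) * (15 * 60)),
       AVG(value)
FROM currency_data
WHERE timestamp > NOW() - INTERVAL 1 DAY
GROUP BY currency, FROM_UNIXTIME(UNIX_TIMESTAMP(timestamp) DIV (15 * 60) * (15 * 60))
ON DUPLICATE KEY UPDATE value = VALUES(value);
The graphs then read from the small summary table instead of scanning 8 million rows.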
If your data really is coming every 5 seconds, you can attempt something like:
SELECT *
FROM currency_data cd
WHERE currency = 'GBP' AND
timestamp > '2017-09-20 12:29:09' AND
UNIX_TIMESTAMP(timestamp) MOD (15 * 60) BETWEEN 0 AND 4
ORDER BY id DESC;
For both this query and your original query, you want an index on currency_data(currency, timestamp, id).
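Creating that index, if it does not exist yet (the index name is illustrative):
CREATE INDEX idx_currency_ts_id ON currency_data (currency, timestamp, id);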

Get next and previous records in an ordered list

I'm currently using PHP to parse a sorted list of records and get the previous and next record of a given record. My DB has grown a lot and the query is taking too much time.
I'd like to change the query so that it only returns the previous and next records and also takes less time. Maybe I need to add an index.
Here's the current query:
SELECT id
FROM events
WHERE latitude BETWEEN 48.772559580567 AND 48.952423619433
AND longitude BETWEEN 2.2154977888952 AND 2.4889022111048
ORDER BY start_time ASC, id
I have to sort by start_time and id, since several events can start at the same time; the sort has to be deterministic so that the next & prev buttons always traverse the same list.
This currently takes between 0.1 and 0.4 seconds, which is way too long. And then I still need to parse the 6000 returned rows in PHP to find the next and previous records.
I've read many similar topics, but none of them seemed to address speed. Mostly I've found "min(id) FROM ... WHERE recordid > targetid UNION max(id) FROM ... WHERE recordid < targetid", which means doubling the 0.1 to 0.4 seconds (see the sketch after the schema below).
Schema of the DB:
id int(11)
latitude double
longitude double
start_time timestamp
Indexes:
id PRIMARY
latitude
longitude
latitude+longitude
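For illustration, the min/max UNION pattern quoted above, adapted to the (start_time, id) sort, might look like this sketch (the datetime and id literals are placeholders for the current record's values; MySQL's row-constructor comparison keeps id as the tie-breaker):
-- Next record (first one sorting after the current record).
(SELECT id
 FROM events
 WHERE latitude BETWEEN 48.772559580567 AND 48.952423619433
   AND longitude BETWEEN 2.2154977888952 AND 2.4889022111048
   AND (start_time, id) > ('2020-06-01 20:00:00', 12345)
 ORDER BY start_time ASC, id ASC
 LIMIT 1)
UNION ALL
-- Previous record (first one sorting before the current record).
(SELECT id
 FROM events
 WHERE latitude BETWEEN 48.772559580567 AND 48.952423619433
   AND longitude BETWEEN 2.2154977888952 AND 2.4889022111048
   AND (start_time, id) < ('2020-06-01 20:00:00', 12345)
 ORDER BY start_time DESC, id DESC
 LIMIT 1);
Each branch returns at most one row, so the PHP side no longer has to walk the ~6000 rows.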

Speed up SQL SELECT with arithmetic and geometric calculations

This is a follow-up to my previous post How to improve wind data SQL query performance.
I have expanded the SQL statement to also perform the first part in the calculation of the average wind direction using circular statistics. This means that I want to calculate the average of the cosines and sines of the wind direction. In my PHP script, I will then perform the second part and calculate the inverse tangent and add 180 or 360 degrees if necessary.
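For reference, that second part boils down to a single expression if done in SQL instead; MySQL's ATAN2 handles the quadrant correction that makes the 180/360-degree adjustment necessary with plain ATAN. A sketch with sample values:
-- Example: mean sine = 0.5, mean cosine = -0.5 gives a mean direction of 135 degrees.
-- The MOD wraps negative angles into the 0-360 range.
SELECT MOD(DEGREES(ATAN2(0.5, -0.5)) + 360, 360) AS dir_mean_deg;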
The wind direction is stored in my table as voltages read from the sensor in the field 'dirvolt' so I first need to convert it to radians.
The user can look at historical wind data by stepping backwards using a pagination function, hence the use of LIMIT, whose values are set dynamically in my PHP script.
My SQL statement currently looks like this:
SELECT ROUND(AVG(speed),1) AS speed_mean, MAX(speed) as speed_max,
MIN(speed) AS speed_min, MAX(dt) AS last_dt,
AVG(SIN(2.04*dirvolt-0.12)) as dir_sin_mean,
AVG(COS(2.04*dirvolt-0.12)) as dir_cos_mean
FROM table
GROUP BY FLOOR(UNIX_TIMESTAMP(dt) / 300)
ORDER BY FLOOR(UNIX_TIMESTAMP(dt) / 300) DESC
LIMIT 0, 72
The query takes about 3-8 seconds to run depending on what value I use to group the data (300 in the code above).
In order for me to learn, is there anything I can do to optimize or improve the SQL statement otherwise?
SHOW CREATE TABLE table;
From that I can see if you already have INDEX(dt) (or equivalent). With that, we can modify the SELECT to be significantly faster.
But first, change the focus from 72 * 300 seconds' worth of readings to a datetime range: 72 * 300 s = 21,600 s, which is 6 hours.
Let's look at this query:
SELECT * FROM table
WHERE dt >= '...' - INTERVAL 6 HOUR
AND dt < '...';
The '...' would be the same datetime in both places. Does that run fast enough with the index?
If yes, then let's build the final query using that as a subquery:
SELECT FORMAT(AVG(speed), 1) AS speed_mean,
MAX(speed) as speed_max,
MIN(speed) AS speed_min,
MAX(dt) AS last_dt,
AVG(SIN(2.04*dirvolt-0.12)) as dir_sin_mean,
AVG(COS(2.04*dirvolt-0.12)) as dir_cos_mean
FROM
( SELECT * FROM table
WHERE dt >= '...' - INTERVAL 6 HOUR
AND dt < '...'
) AS x
GROUP BY FLOOR(UNIX_TIMESTAMP(dt) / 300)
ORDER BY FLOOR(UNIX_TIMESTAMP(dt) / 300) DESC;
Explanation: What you had could not use an index, hence had to scan the entire table (which is getting bigger and bigger). My subquery could use an index, hence was much faster. The effort for my outer query was not "too bad" since it worked with only N rows.
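If the index turns out to be missing, adding it is a single statement (note that table here is the question's placeholder name, which needs backticks because TABLE is a reserved word; substitute the real table name):
ALTER TABLE `table` ADD INDEX idx_dt (dt);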

query optimization with multiple sub-queries

I want to retrieve the number of meters in this month, minus the number of meters in the previous month, with the meter values matched up according to their respective codes, and then sum the whole thing.
There are about 8,000 records, but when I try to fetch just 5 records it takes 2.53 sec, and 100 records takes 1 min 1.57 sec. That really matters.
I have a query like this:
SELECT code hvCode,
IFNULL( (SELECT meter
FROM bmrpt
WHERE waktu_foto LIKE '2014-05%'
GROUP BY code HAVING code = hvCode),0 )
-IFNULL( (SELECT meter
FROM bmrpt WHERE waktu_foto LIKE '2014-04%'
GROUP BY code HAVING code = hvCode),0 )hasil
FROM bmrpt group by code;
Does anybody have an idea how to change the query so it is optimized?
Here is the SQLFiddle: http://www.sqlfiddle.com/#!2/495c0/1
Best regards
Though your question is unclear, try the subquery below, which reflects what I understand of it:
SELECT (SELECT COALESCE(SUM(`meter`), 0)
        FROM `table`
        WHERE code = 'hvCode' AND MONTH(`date_column`) = 5)
     - (SELECT COALESCE(SUM(`meter`), 0)
        FROM `table`
        WHERE code = 'hvCode' AND MONTH(`date_column`) = 4) AS difference;
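Alternatively, a single pass with conditional aggregation computes the difference for every code at once and avoids the per-row subqueries entirely (a sketch against the question's bmrpt table, assuming SUM is the intended aggregate, which the original query leaves ambiguous):
SELECT code,
       COALESCE(SUM(CASE WHEN waktu_foto LIKE '2014-05%' THEN meter END), 0)
     - COALESCE(SUM(CASE WHEN waktu_foto LIKE '2014-04%' THEN meter END), 0) AS hasil
FROM bmrpt
WHERE waktu_foto LIKE '2014-04%' OR waktu_foto LIKE '2014-05%'
GROUP BY code;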

subtract the data for every 5 minutes between two particular times

I have a problem with MySQL: I need to subtract the data between two particular times, for every 5 minutes, and then average the 5-minute data.
What I am doing now is:
select (avg(columnname)),convert((min(datetime) div 500)*500, datetime) + INTERVAL 5 minute as endOfInterval
from Databasename.Tablename
where datetime BETWEEN '2012-09-12 10:50:00' AND '2012-09-12 14:50:00'
group by datetime div 500;
This gives the cumulative average. Suppose I get 500 at 11:00 and 700 at 11:05; the average I need is (700 - 500) / 5 = 40.
But right now I am getting (500 + 700) / 5 = 240.
I don't need the cumulative average.
Kindly help me.
For the kind of average you're talking about, you don't want to aggregate multiple rows using a GROUP BY clause. Instead, you want to compute your result using exactly two different rows from the same table. This calls for a self-join:
SELECT (b.columnname - a.columnname)/5, a.datetime, b.datetime
FROM Database.Tablename a, Database.Tablename b
WHERE b.datetime = a.datetime + INTERVAL 5 MINUTE
AND a.datetime BETWEEN '2012-09-12 10:50:00' AND '2012-09-12 14:45:00'
a and b refer to two different rows of the same table. The WHERE clause ensures that they are exactly 5 minutes apart.
If there is no second row matching that temporal distance, no corresponding row will be included in the query result. If your table doesn't have data points at exactly five-minute intervals, so that you have to search for a suitable partner row instead, things become much more difficult. This answer might perhaps be adjusted for that use case, or you might implement this at the application level instead of on the database server.
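A rough sketch of that adjustment, pairing each row with the first row at least 5 minutes later and dividing by the actual gap (an assumption on my part, not part of the answer above):
SELECT (b.columnname - a.columnname) / TIMESTAMPDIFF(MINUTE, a.datetime, b.datetime) AS rate,
       a.datetime, b.datetime
FROM Database.Tablename a
JOIN Database.Tablename b
  ON b.datetime = (SELECT MIN(c.datetime)
                   FROM Database.Tablename c
                   WHERE c.datetime >= a.datetime + INTERVAL 5 MINUTE)
WHERE a.datetime BETWEEN '2012-09-12 10:50:00' AND '2012-09-12 14:45:00';
Note that this searches for a partner per row, so an index on the datetime column matters here.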