Using BETWEEN to create exclusive set of intervals - mysql

I have a table that has id and distance columns. I need to count the number of ids whose distance falls within a certain range.
For example, I want to count how many ids are between 1km and 2km in distance, so I'm using this code:
SELECT COUNT(distance)
FROM mytable
WHERE distance BETWEEN 1000 AND 2000
-- Returns a COUNT of 240,600
When I count the ids between 2km and 3km with the same query, the "2km" value is counted in both queries, because the BETWEEN operator is inclusive.
SELECT COUNT(distance)
FROM mytable
WHERE distance BETWEEN 2000 AND 3000
-- Returns a COUNT of 353,440
As I understand it the two queries above will both include rows where the distance is exactly 2000.
I'm curious to know if there is another way to count distance ranges like this (there are a lot of rows), or do I need to GROUP BY and then count?

You can try this sort of thing to get a histogram table:
SELECT (distance DIV 1000) * 1000 distance_from,
((distance DIV 1000) * 1000) + 999 distance_to,
COUNT(*) num
FROM mytable
GROUP BY (distance DIV 1000) * 1000
The expression distance DIV 1000 performs integer division (DIV is an infix operator in MySQL, not a function), so it creates groups of distance values when you use it with GROUP BY.
As you have discovered, BETWEEN has limited usefulness for handling adjacent ranges of numbers, because it includes both endpoints. You may want distance >= 1000 AND distance < 2000 for the range [1000, 2000).
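As a rough illustration of the bucketing idea above (integer-divide by the bucket width, then group), here is a minimal Python sketch. The function name bucket_counts and the sample values are made up for this example, and it assumes non-negative integer distances.

```python
from collections import Counter

def bucket_counts(distances, width=1000):
    """Count values per half-open bucket [start, start + width)."""
    # Integer division maps every value to exactly one bucket start.
    counts = Counter((d // width) * width for d in distances)
    # Rows of (bucket_start, bucket_end_exclusive, count), sorted by start.
    return [(start, start + width, n) for start, n in sorted(counts.items())]

# Hypothetical sample data: 2000 sits on a boundary but lands in one bucket only.
sample = [1000, 1500, 1999, 2000, 2500, 3000]
rows = bucket_counts(sample)
```

Because each value maps to a single bucket start, the per-bucket counts always add up to the number of rows, which is not the case with overlapping BETWEEN ranges.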

According to SQL specs, this expression:
WHERE distance BETWEEN 1000 AND 2000
is the same as:
WHERE distance >= 1000 AND distance <= 2000
From what I understand, you need to remove = from one of the endpoints to create mutually exclusive ranges. Depending on your definition of "between 1km and 2km", one of these sets of conditions should be used:
WHERE distance >= 0 AND distance < 1000
WHERE distance >= 1000 AND distance < 2000
WHERE distance >= 2000 AND distance < 3000
Or
WHERE distance <= 1000
WHERE distance > 1000 AND distance <= 2000
WHERE distance > 2000 AND distance <= 3000
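To see the double counting concretely, here is a small sketch that runs both styles of condition against an in-memory SQLite database from Python. SQLite is only a stand-in for MySQL here (BETWEEN is inclusive in both), and the table contents and the helper count_where are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, distance INTEGER)")
# Hypothetical data: two rows sit exactly on the 2000 boundary.
conn.executemany(
    "INSERT INTO mytable (distance) VALUES (?)",
    [(d,) for d in (1000, 1500, 2000, 2000, 2500, 3000)],
)

def count_where(condition):
    # The condition is a hard-coded, trusted SQL fragment in this sketch.
    sql = f"SELECT COUNT(*) FROM mytable WHERE {condition}"
    return conn.execute(sql).fetchone()[0]

# Inclusive ranges: rows with distance = 2000 are counted twice.
inclusive = [
    count_where("distance BETWEEN 1000 AND 2000"),
    count_where("distance BETWEEN 2000 AND 3000"),
]
# Half-open ranges: every row belongs to at most one range.
half_open = [
    count_where("distance >= 1000 AND distance < 2000"),
    count_where("distance >= 2000 AND distance < 3000"),
]
```

With this data the two inclusive counts add up to more than the six rows in the table, while the half-open counts do not overlap (the row at 3000 belongs to the next range).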

Related

List [rows] within range of plus or minus the average of a column

I am doing a homework problem: "List each of the clients’ name and weight, for those clients who are within 10 pounds of the average weight. (Do NOT use [an integer] as the average weight, you will have to use a function to find the average weight.)"
I have tried several things like: "WHERE c_weight BETWEEN avg(c_weight - 10) and avg(c_weight + 10)" to no avail.
My current code compiles:
SELECT c_first, c_last, c_weight
FROM client
WHERE c_weight between 160 and 180;
The code above is just a placeholder; I know the correction lies in my WHERE clause, I'm just not sure how to write it properly. Thanks for the help!
I would phrase this as:
SELECT c_first, c_last, c_weight
FROM client
WHERE c_weight - (SELECT AVG(c_weight) FROM client) BETWEEN -10 AND 10;
Note that we use a subquery to find the average weight across the entire table, and then compare each record's weight against that to see if it is within range.
To understand the logic in the WHERE clause, let's suppose that the average weight were 150. Then the range in the WHERE clause is saying:
c_weight - 150 >= -10 AND c_weight - 150 <= 10
This is identical to saying:
c_weight >= 140 AND c_weight <= 160
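For anyone who wants to try the subquery form on a toy data set, here is a sketch using Python's built-in sqlite3 module in place of MySQL. The client rows are invented so that the average comes out to exactly 150, and the ORDER BY is added only to make the result order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE client (c_first TEXT, c_last TEXT, c_weight REAL)")
# Hypothetical clients; the weights average to 150.
conn.executemany(
    "INSERT INTO client VALUES (?, ?, ?)",
    [
        ("Ann", "Lee", 125),
        ("Bob", "Ray", 140),
        ("Cy", "Fox", 150),
        ("Di", "Orr", 160),
        ("Ed", "Kim", 175),
    ],
)

query = """
    SELECT c_first, c_last, c_weight
    FROM client
    WHERE c_weight - (SELECT AVG(c_weight) FROM client) BETWEEN -10 AND 10
    ORDER BY c_weight
"""
matches = conn.execute(query).fetchall()
```

Here BETWEEN being inclusive is what you want: the clients at exactly 140 and 160 are kept, while 125 and 175 are filtered out.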

Cutting SELECT query time in MySQL

I'm using CodeIgniter 2 and in my database model, I have a query that joins two tables and filters rows based on distance from a given geolocation.
SELECT users.id,
(3959 * acos(cos(radians(42.327612)) *
cos(radians(last_seen.lat)) * cos(radians(last_seen.lon) -
radians(-77.661591)) + sin(radians(42.327612)) *
sin(radians(last_seen.lat)))) AS distance
FROM users
JOIN last_seen ON users.id = last_seen.seen_id
WHERE users.age >= 18 AND users.age <= 30
HAVING distance < 50
I'm not sure if it's the distance that is making this query take especially long. I do have over 300,000 rows in my users table. The same amount in my last_seen table. I'm sure that plays a role.
But, the age column in the users table is indexed along with the id column.
The lat and lon columns in the last_seen table are also indexed.
Does anyone have ideas as to why this query takes so long and how I can improve it?
UPDATE
It turns out that this query actually runs pretty quickly. When I execute this query in phpMyAdmin, it takes 0.56 seconds. Not too bad. But when I try to execute this query with a third-party SQL client like Sequel Pro, it takes at least 20 seconds and all of the other apps on my Mac slow down. When the query is executed by loading the script via jQuery's load() method, it takes around the same amount of time.
Upon viewing my network tab in Google Chrome's developer tools, it seems that the reason it's taking so long to load is because of what's called TTFB or Time To First Byte. It's taking forever.
To make this query faster, you need to limit the number of rows using an index before calculating the distance for each of them. To do so, you can restrict the rows from last_seen based on their lat/lon and a rough bound for the desired distance.
The idea is that a position with the same latitude as the reference point can only be within 50 miles if its longitude falls within a certain range of the reference longitude, and vice versa.
One degree of latitude is roughly 69 miles, and one degree of longitude at this latitude is roughly 51 miles, so for a 50-mile distance, RefLat ± 1 and RefLon ± 1 would be a good start to limit the rows before calculating the precise distance:
last_seen.lat BETWEEN 42.327612 - 1 AND 42.327612 + 1
AND last_seen.lon BETWEEN -77.661591 - 1 AND -77.661591 + 1
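If you would rather compute the box than hard-code one degree, something like the following Python sketch could produce the bounds to plug into the BETWEEN conditions. It reuses the 3959-mile Earth radius from the query; the function name bounding_box is made up, and the sketch assumes the search circle does not reach a pole or cross the 180° meridian.

```python
import math

EARTH_RADIUS_MILES = 3959.0  # same radius the distance formula in the query uses

def bounding_box(ref_lat, ref_lon, radius_miles):
    """Return (min_lat, max_lat, min_lon, max_lon) in degrees.

    Every point within radius_miles of the reference lies inside this box,
    assuming the circle stays clear of the poles and the 180° meridian.
    """
    angular_radius = radius_miles / EARTH_RADIUS_MILES  # radians
    lat_delta = math.degrees(angular_radius)
    # Widest longitude offset of a circle on a sphere: sin(dlon) = sin(r) / cos(lat).
    lon_delta = math.degrees(
        math.asin(math.sin(angular_radius) / math.cos(math.radians(ref_lat)))
    )
    return (ref_lat - lat_delta, ref_lat + lat_delta,
            ref_lon - lon_delta, ref_lon + lon_delta)

# Reference point and 50-mile radius taken from the question.
min_lat, max_lat, min_lon, max_lon = bounding_box(42.327612, -77.661591, 50)
```

For this reference point both offsets come out below one degree, so the ± 1 box is a safe, slightly generous approximation. The exact distance test is still needed afterwards, because the corners of the box are farther away than 50 miles.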
For this query:
SELECT users.id, (3959 * acos(cos(radians(42.327612)) * cos(radians(last_seen.lat)) * cos(radians(last_seen.lon) - radians(-77.661591)) + sin(radians(42.327612)) * sin(radians(last_seen.lat)))) AS distance
FROM users JOIN
last_seen
ON users.id = last_seen.seen_id
WHERE users.age >= 18 AND users.age <= 30
HAVING distance < 50;
The best indexes are users(age, id) and last_seen(seen_id). Unfortunately, the distance calculations are going to take a while, because they have to be calculated for every row. You might want to consider a GIS extension to MySQL to help with this type of query.

mysql large table with geo-locations - find intersections

I have a large table ( > 20 million rows ) with this structure
[ Id, IdUser (int), Latitude(double), Longitude (double), EventDateTime (datetime) ]
and I need to find all the moments where users have been in the same area( within 500 meters ).
What is the best solution for this?
First, so we don't have to write insanely complex SQL queries full of transcendental functions, let's define a stored function distance(lat1, lon1, lat2, lon2) that returns the distance between two points, each given as a latitude/longitude pair.
DELIMITER $$
DROP FUNCTION IF EXISTS distance$$
CREATE FUNCTION distance(
lat1 FLOAT, lon1 FLOAT,
lat2 FLOAT, lon2 FLOAT
) RETURNS FLOAT
NO SQL DETERMINISTIC
COMMENT 'Returns the distance in metres on the Earth
between two known points of latitude and longitude'
BEGIN
RETURN 111045 * DEGREES(ACOS(
COS(RADIANS(lat1)) *
COS(RADIANS(lat2)) *
COS(RADIANS(lon2) - RADIANS(lon1)) +
SIN(RADIANS(lat1)) * SIN(RADIANS(lat2))
));
END$$
DELIMITER ;
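For checking the formula outside the database, here is a Python sketch of the same spherical-law-of-cosines calculation, using the same 111045 metres per degree. The clamp is an addition of this sketch: rounding can push the cosine slightly past 1 for nearly identical points, in which case Python's math.acos raises an error and MySQL's ACOS() returns NULL.

```python
import math

METRES_PER_DEGREE = 111045.0  # same constant as the stored function

def distance(lat1, lon1, lat2, lon2):
    """Approximate great-circle distance in metres between two points."""
    cos_angle = (
        math.cos(math.radians(lat1))
        * math.cos(math.radians(lat2))
        * math.cos(math.radians(lon2) - math.radians(lon1))
        + math.sin(math.radians(lat1)) * math.sin(math.radians(lat2))
    )
    # Guard against floating-point drift outside acos's domain [-1, 1].
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return METRES_PER_DEGREE * math.degrees(math.acos(cos_angle))
```

A quick sanity check is that two points one degree of latitude apart on the same meridian come out at about 111045 metres.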
Now we need to compare pairs of items in your table to find coincidences. Let's say we want one-minute resolution on the time comparison. This query will do the trick, but take a while.
SELECT DISTINCT a.IdUser, b.IdUser,
DATE_FORMAT(a.EventDateTime, '%Y-%m-%d %H:%i:00') AS EventDateTime
FROM table a
JOIN table b
ON a.IdUser < b.IdUser /* compare different users */
AND a.EventDateTime >= b.EventDateTime - INTERVAL 1 HOUR
AND a.EventDateTime <= b.EventDateTime + INTERVAL 1 HOUR
AND distance(a.Latitude, a.Longitude, b.Latitude, b.Longitude) <= 500.0
This will work, giving a list of pairs of users and the hours in which they were near one another. But it won't be very fast.
You'll need to experiment with indexes. An index on (EventDateTime, IdUser) will probably help. You should probably also experiment with this query by adding a time restriction like this...
WHERE a.EventDateTime >= CURDATE() - INTERVAL 2 DAY
AND a.EventDateTime < CURDATE() - INTERVAL 1 DAY
so you don't take hours to run the query.
Now, let's try to do an optimization pass over the self-join, in an attempt to cut down the use of the distance function, and to use indexes better. In order to do this, we need to know that there are ~111045 m per degree of (north-south) latitude, so that 500 m is 500/111045 degrees.
This query will generate pairs of observations that are within 500m north-to-south of each other, then use a WHERE clause to further eliminate points that are still too far apart. That will reduce the use of the distance function.
SELECT a.IdUser, b.IdUser,
DATE_FORMAT(a.EventDateTime, '%Y-%m-%d %H:%i:00') AS EventDateTime
FROM table a
JOIN table b
ON a.IdUser < b.IdUser /* compare different users */
AND a.EventDateTime >= b.EventDateTime - INTERVAL 1 HOUR
AND a.EventDateTime <= b.EventDateTime + INTERVAL 1 HOUR
AND a.Latitude >= b.Latitude - (500.0/111045.0)
AND a.Latitude <= b.Latitude + (500.0/111045.0)
WHERE distance(a.Latitude, a.Longitude, b.Latitude, b.Longitude) <= 500.0
It is worth trying a compound covering index on (IdUser, EventDateTime, Latitude, Longitude) to try to optimize this query.
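The effect of the latitude band check can be sketched in plain Python. This is only an illustration of the filtering order (cheap band test first, exact distance second); it ignores the time condition, compares every pair in memory, and uses invented names (nearby_pairs, points) and sample coordinates.

```python
import math
from itertools import combinations

METRES_PER_DEGREE = 111045.0

def distance(lat1, lon1, lat2, lon2):
    """Approximate great-circle distance in metres (spherical law of cosines)."""
    cos_angle = (
        math.cos(math.radians(lat1)) * math.cos(math.radians(lat2))
        * math.cos(math.radians(lon2) - math.radians(lon1))
        + math.sin(math.radians(lat1)) * math.sin(math.radians(lat2))
    )
    return METRES_PER_DEGREE * math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))

def nearby_pairs(points, radius_m=500.0):
    """points: (user_id, lat, lon) tuples. Returns (pairs, distance_calls)."""
    lat_delta = radius_m / METRES_PER_DEGREE  # radius expressed in degrees of latitude
    pairs, calls = [], 0
    for (ua, lat_a, lon_a), (ub, lat_b, lon_b) in combinations(points, 2):
        if ua == ub:
            continue  # only compare different users
        if abs(lat_a - lat_b) > lat_delta:
            continue  # cheap north-south band check, no trigonometry needed
        calls += 1
        if distance(lat_a, lon_a, lat_b, lon_b) <= radius_m:
            pairs.append((min(ua, ub), max(ua, ub)))
    return pairs, calls

# Hypothetical observations: users 1 and 2 are about 111 m apart.
points = [(1, 40.0, -75.0), (2, 40.001, -75.0), (3, 41.0, -75.0), (4, 40.0, -74.0)]
pairs, calls = nearby_pairs(points)
```

Of the six possible pairs in this sample, only three pass the band check and reach the distance function, which is the saving the rewritten join is after. In the database the band conditions can additionally use an index, which this in-memory loop does not model.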

HAVING distance Greater than and Less than in the same Query

SELECT city, (6372.797 * acos(cos(radians({$latitude})) * cos(radians(`latitude_range`)) * cos(radians(`longitude_range`) - radians({$longitude})) + sin(radians({$latitude})) * sin(radians(`latitude_range`)))) AS distance FROM cities WHERE active = 1 HAVING distance > 25 ORDER BY distance ASC
I'd like to be able to grab all cities with a distance greater than 25 km and at most 50 km. Anything I try either returns all cities farther than 25 km or produces an error.
How does one go about adding HAVING distance > 25 AND distance <= 50 to my SQL query?
Exactly the way that you have in the question:
SELECT city, (6372.797 * acos(cos(radians({$latitude})) * cos(radians(`latitude_range`)) * cos(radians(`longitude_range`) - radians({$longitude})) + sin(radians({$latitude})) * sin(radians(`latitude_range`)))) AS distance
FROM cities
WHERE active = 1
HAVING distance > 25 and distance <= 50
ORDER BY distance ASC;
Just as a small note: using the HAVING clause to filter on column aliases (like distance) is a MySQL extension. In most databases, you would have to use a subquery.

MySQL select the nearest lower value in table

I have an SQL table that stores running times and a score associated with each time on the table.
/////////////////////
/ Time * Score /
/ 1531 * 64 /
/ 1537 * 63 /
/ 1543 * 61 /
/ 1549 * 60 /
/////////////////////
This is an example of 4 rows in the table. My question is: how do I select the nearest lower time?
EXAMPLE: If someone records a time of 1548, I want to return the score for 1543 (not 1549), which is 61.
Is there an SQL query I can use to do this? Thank you.
Use SQL's WHERE clause to filter the records, its ORDER BY clause to sort them and LIMIT (in MySQL) to obtain only the first result:
SELECT Score
FROM my_table
WHERE Time <= 1548
ORDER BY Time DESC
LIMIT 1
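The query can be tried locally with Python's built-in sqlite3 module, which also supports ORDER BY ... LIMIT. This is a sketch against SQLite rather than MySQL; the helper name score_for is made up, and the rows are the four from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (Time INTEGER, Score INTEGER)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [(1531, 64), (1537, 63), (1543, 61), (1549, 60)],
)

def score_for(recorded_time):
    """Return the score of the closest Time that is <= recorded_time, or None."""
    row = conn.execute(
        "SELECT Score FROM my_table WHERE Time <= ? ORDER BY Time DESC LIMIT 1",
        (recorded_time,),
    ).fetchone()
    return row[0] if row else None
```

Note that a recorded time below the smallest Time in the table matches no row, so the caller has to decide what score, if any, applies in that case.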