MySQL large table with geo-locations - find intersections

I have a large table (> 20 million rows) with this structure
[ Id, IdUser (int), Latitude (double), Longitude (double), EventDateTime (datetime) ]
and I need to find all the moments where users have been in the same area (within 500 meters).
What is the best solution for this?

First, so we don't have to write insanely complex SQL queries full of transcendental functions, let's define a stored function distance(lat1, lon1, lat2, lon2) to get ourselves a distance between two pairs of points.
DELIMITER $$
DROP FUNCTION IF EXISTS distance$$
CREATE FUNCTION distance(
lat1 FLOAT, lon1 FLOAT,
lat2 FLOAT, lon2 FLOAT
) RETURNS FLOAT
NO SQL DETERMINISTIC
COMMENT 'Returns the distance in metres on the Earth
between two known points of latitude and longitude'
BEGIN
RETURN 111045 * DEGREES(ACOS(
COS(RADIANS(lat1)) *
COS(RADIANS(lat2)) *
COS(RADIANS(lon2) - RADIANS(lon1)) +
SIN(RADIANS(lat1)) * SIN(RADIANS(lat2))
));
END$$
DELIMITER ;
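To sanity-check the stored function outside the database, the same spherical law of cosines formula can be sketched in Python. The name distance_m and the clamping of the cosine to [-1, 1] are my own additions, not part of the function above; the clamp only guards against rounding pushing the ACOS argument slightly past 1 for identical points.

```python
import math

METERS_PER_DEGREE = 111045  # same constant as the stored function


def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres via the spherical law of cosines."""
    cos_angle = (
        math.cos(math.radians(lat1))
        * math.cos(math.radians(lat2))
        * math.cos(math.radians(lon2) - math.radians(lon1))
        + math.sin(math.radians(lat1)) * math.sin(math.radians(lat2))
    )
    # Assumption (not in the SQL version): clamp to the valid domain of acos.
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return METERS_PER_DEGREE * math.degrees(math.acos(cos_angle))
```

One degree of latitude should come out at roughly 111045 m, and a point compared with itself at roughly zero.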
Now we need to compare pairs of items in your table to find coincidences. Let's say we want one-minute resolution on the reported time, and treat two observations as coinciding when they are within an hour of each other. This query will do the trick, but it will take a while.
SELECT DISTINCT a.IdUser, b.IdUser,
DATE_FORMAT(a.EventDateTime, '%Y-%m-%d %H:%i:00') AS EventDateTime
FROM table a
JOIN table b
ON a.IdUser < b.IdUser /* compare different users */
AND a.EventDateTime >= b.EventDateTime - INTERVAL 1 HOUR
AND a.EventDateTime <= b.EventDateTime + INTERVAL 1 HOUR
AND distance(a.Latitude, a.Longitude, b.Latitude, b.Longitude) <= 500.0
This will work, giving a list of pairs of users and the times (truncated to the minute) at which they were near one another. But it won't be very fast.
You'll need to experiment with indexes. Probably an index on (EventDateTime, IdUser) will help. You should probably also experiment with this query by adding a time restriction like this...
WHERE a.EventDateTime >= CURDATE() - INTERVAL 2 DAY
AND a.EventDateTime < CURDATE() - INTERVAL 1 DAY
so you don't take hours to run the query.
Now, let's try to do an optimization pass over the self-join, in an attempt to cut down the use of the distance function, and to use indexes better. In order to do this, we need to know that there are ~111045 m per degree of (north-south) latitude, so that 500 m is 500/111045 degrees.
This query will generate pairs of observations that are within 500m north-to-south of each other, then use a WHERE clause to further eliminate points that are still too far apart. That will reduce the use of the distance function.
SELECT a.IdUser, b.IdUser,
DATE_FORMAT(a.EventDateTime, '%Y-%m-%d %H:%i:00') AS EventDateTime
FROM table a
JOIN table b
ON a.IdUser < b.IdUser /* compare different users */
AND a.EventDateTime >= b.EventDateTime - INTERVAL 1 HOUR
AND a.EventDateTime <= b.EventDateTime + INTERVAL 1 HOUR
AND a.Latitude >= b.Latitude - (500.0/111045.0)
AND a.Latitude <= b.Latitude + (500.0/111045.0)
WHERE distance(a.Latitude, a.Longitude, b.Latitude, b.Longitude) <= 500.0
It is worth trying a compound covering index on (IdUser, EventDateTime, Latitude, Longitude) to try to optimize this query.
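The idea of the latitude pre-filter can be illustrated outside SQL with a small Python sketch. It is only an illustration under assumptions: the names near_pairs and distance_m are hypothetical, the time window is ignored for brevity, and points are plain (user_id, lat, lon) tuples. Sorting by latitude plays the role of the index, so the expensive distance call only runs for points within 500/111045 degrees north-to-south of each other.

```python
import math

METERS_PER_DEGREE = 111045


def distance_m(lat1, lon1, lat2, lon2):
    """Same spherical law of cosines formula as the stored function."""
    cos_angle = (
        math.cos(math.radians(lat1))
        * math.cos(math.radians(lat2))
        * math.cos(math.radians(lon2) - math.radians(lon1))
        + math.sin(math.radians(lat1)) * math.sin(math.radians(lat2))
    )
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding
    return METERS_PER_DEGREE * math.degrees(math.acos(cos_angle))


def near_pairs(points, radius_m=500.0):
    """Return the set of (user_a, user_b) pairs, user_a < user_b, within radius_m."""
    lat_delta = radius_m / METERS_PER_DEGREE  # radius in degrees of latitude
    by_lat = sorted(points, key=lambda p: p[1])
    pairs = set()
    for i, (ua, lat_a, lon_a) in enumerate(by_lat):
        for ub, lat_b, lon_b in by_lat[i + 1:]:
            if lat_b - lat_a > lat_delta:
                break  # every later point is even further north
            if ua != ub and distance_m(lat_a, lon_a, lat_b, lon_b) <= radius_m:
                pairs.add((min(ua, ub), max(ua, ub)))
    return pairs


points = [(1, 52.0, 4.0), (2, 52.001, 4.0), (3, 52.1, 4.0), (4, 52.0, 4.1)]
print(near_pairs(points))
```

User 4 passes the latitude filter but is several kilometres away in longitude, which is why the exact distance check is still needed afterwards.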

Related

How to SELECT all rows within a certain date/time range with a certain timestamp step size in MySQL?

I have a table that contains sensor data with a column timestamp that holds the unix timestamp of the time the sensor measurement has been taken.
Now I would like to SELECT all measurements within a certain date/time range with a specific time step.
I figured the first part out myself like you can see in my posted code snippet below.
// With $date_start and $date_stop in the format: '2010-10-01 12:00:00'
$result = mysqli_query($connection, "SELECT sensor_1
FROM sensor_table
WHERE timestamp >= UNIX_TIMESTAMP($date_start)
AND timestamp < UNIX_TIMESTAMP($date_stop)
ORDER BY timestamp");
Now is there a convenient way in MySQL to include a time step size into the same SELECT query?
My table contains thousands of measurements over months with one measurement taken every 5 seconds.
Now let's say I would like to SELECT measurements in between 2010-10-01 12:00:00 and 2010-10-02 12:00:00 but in this date/time range only SELECT one measurement every 10 minutes? (as my table contains measurements taken every 5 seconds).
Any smart ideas how to solve this in a single query?
(also other ideas are very welcome :))
Since you take one measurement every 5 seconds, the difference between $date_start and the first matching measurement cannot be greater than 4 seconds. We then take one entry every 600 seconds (allowing for some discrepancy from clock to clock...)
SELECT sensor_1
FROM sensor_table
WHERE timestamp >= UNIX_TIMESTAMP($date_start)
AND
timestamp < UNIX_TIMESTAMP($date_stop)
AND
((timestamp - UNIX_TIMESTAMP($date_start)) % 600) BETWEEN 0 AND 4
ORDER BY timestamp;
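To see which rows the modulo condition keeps, here is a small Python sketch of the same filter. The function name sample_every and its parameters are made up for illustration, and it assumes measurements arrive exactly every 5 seconds; with jitter a 10-minute window could match zero or two rows.

```python
def sample_every(timestamps, start, step=600, tolerance=4):
    """Keep timestamps whose offset from start is within [0, tolerance] modulo step."""
    return [
        t for t in timestamps
        if t >= start and (t - start) % step <= tolerance
    ]


# One hour of measurements every 5 seconds, first one 3 seconds after start.
start = 1000
measurements = list(range(start + 3, start + 3 + 3600, 5))
print(sample_every(measurements, start))
```

With this input exactly one measurement per 10-minute window survives, each 3 seconds after the window boundary.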
It is not elegant, but you can do:
SELECT s.sensor_1
FROM sensor_table s
WHERE s.timestamp >= UNIX_TIMESTAMP($date_start) AND
s.timestamp < UNIX_TIMESTAMP($date_stop) AND
s.timestamp = (SELECT MIN(s2.timestamp)
FROM sensor_table s2
WHERE s2.timestamp >= 60 * 10 * FLOOR(s.timestamp / (60 * 10)) AND
s2.timestamp < 60 * 10 * (1 + FLOOR(s.timestamp / (60 * 10)))
)
ORDER BY s.timestamp;
This selects the first in each 10 minute period.
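The bucketing behind that subquery can be sketched in a few lines of Python. The name first_per_bucket is hypothetical; the sketch assumes plain integer unix timestamps and buckets aligned to multiples of 600 seconds, as in the FLOOR expression above.

```python
def first_per_bucket(timestamps, bucket=600):
    """Return the earliest timestamp in each bucket of `bucket` seconds."""
    first = {}
    for t in sorted(timestamps):
        # setdefault keeps the first (smallest) timestamp seen for each bucket
        first.setdefault(t // bucket, t)
    return sorted(first.values())


print(first_per_bucket([0, 5, 599, 600, 1210, 1205]))
```

Unlike the modulo filter, this does not depend on the measurements being evenly spaced, at the cost of grouping every row.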
I think that you could use a simple stored procedure that walks the range in 10-minute steps and stores the first measurement of each step. A sketch in MySQL syntax:
CREATE TABLE StoreValuesId
(
valueId INT PRIMARY KEY
);
DELIMITER $$
CREATE PROCEDURE procedure_store(IN date_start DATETIME, IN date_stop DATETIME)
BEGIN
-- work in unix time, because sensor_table.timestamp holds unix timestamps
DECLARE step_start INT DEFAULT UNIX_TIMESTAMP(date_start);
DECLARE range_stop INT DEFAULT UNIX_TIMESTAMP(date_stop);
WHILE step_start < range_stop DO
-- store the first measurement in [step_start, step_start + 10 minutes)
INSERT IGNORE INTO StoreValuesId (valueId)
SELECT MIN(timestamp) FROM sensor_table
WHERE timestamp >= step_start
AND timestamp < LEAST(step_start + 600, range_stop)
HAVING MIN(timestamp) IS NOT NULL;
SET step_start = step_start + 600;
END WHILE;
END$$
DELIMITER ;
Then again, the syntax might need adjusting, but I hope you'll get the idea (I haven't played with SQL in a while).

stored dates in unixtime (INT 10) check interval

I have a table to store data for car rental purposes (reservations awaiting response). There is a field called 'initiated_datime' (INT, 10) where the date and time at which the reservation was initiated is stored (I already know that INT 10 is not the most efficient way, 2038 etc.).
I would like to view reservations that have been awaiting a response for more than 24 hours, for further reporting...
The following examples work for me
SELECT * FROM rentals WHERE rental_flag = 1 AND '".$cur_datetime_unixtime."' > (initiated_datime + 86400)
(where $cur_datetime_unixtime is created in PHP)
and
SELECT * FROM rentals WHERE rental_flag = 1 AND unix_timestamp(now()) > (initiated_datime + 86400)
Is there any way to replace (initiated_datime + 86400) with more efficient code? Something like: unix_timestamp(initiated_datime + 1 day interval)?
Thank you in advance!
Not in the general case; but in this instance, that inequality is equivalent to:
SELECT * FROM rentals WHERE rental_flag = 1
AND unix_timestamp(now() - INTERVAL 1 DAY) > initiated_datime
The most efficient way to fix this is to change your column to a DATETIME. You can then easily use date/time functions to manipulate your filter.

Postgres View / Function

I am trying to create a view which joins 2 tables but the results are dependent on the query. Here is an example of what I want to achieve.
I have a table called sessionaccounting and a table called sessionaccountingdailysplit.
sessionaccounting contains all our accounting data and sessionaccountingdailysplit is the sessionaccounting data split by date. They are joined by the foreign key sessionaccountingid
How the two tables work in unison is as follows:
for the row in sessionaccounting :
starttime - '2012-01-01', endtime - '2012-01-03', usage - 10000
for the rows in sessionaccountingdailysplit :
date - '2012-01-01', usage - 3000
date - '2012-01-02', usage - 5000
date - '2012-01-03', usage - 2000
Now what I want to do is if I run a view called vw_sessionaccounting as
SELECT *
FROM vw_sessionaccounting
WHERE starttime >= '2012-01-01' AND starttime <= '2012-01-02';
it must only sum the first two dates from sessionaccountingdailysplit and replace the usage in sessionaccounting accordingly for each affected row. (In most cases sessionaccountingdailysplit won't have a row as there was no split.)
So as above if I run
SELECT *
FROM sessionaccounting
WHERE starttime >= '2012-01-01' AND starttime <= '2012-01-02';
I will get the result of
starttime - '2012-01-01', endtime - '2012-01-03', usage - 10000
but if I run the query
SELECT *
FROM vw_sessionaccounting
WHERE starttime >= '2012-01-01'
AND starttime <= '2012-01-02';
I will get the result of
starttime - '2012-01-01', endtime - '2012-01-03', usage - 8000
Your question is a bit vague in several respects. But from what I gather and guess, your query (view) could look like this:
SELECT s.starttime
,s.endtime
,COALESCE(max(sd.date), s.endtime) AS effective_endtime_max_2_days
,COALESCE(sum(sd.usage), s.usage) AS usage_max_2_days
FROM sessionaccounting s
LEFT JOIN sessionaccountingdailysplit sd USING (sessionaccountingid)
WHERE sd.sessionaccountingid IS NULL -- no split ..
OR (starttime + 2) > sd.date -- .. or only the first two days of the split
GROUP BY s.starttime, s.endtime, s.usage -- s.usage is used outside an aggregate, so it must be grouped
Major points
Use a LEFT JOIN because:
... most cases sessionaccountingdailysplit won't have a row as there was no split
Only include the first two days: (starttime + 2) > sd.date
Be sure to include sessions without a split: WHERE sd.sessionaccountingid IS NULL OR
Use table aliases to get your monstrous table names out of the way:
FROM sessionaccounting s
sum() the usage for the first two days. If there was no split, take the original total usage instead: COALESCE(sum(sd.usage), s.usage) AS usage_max_2_days
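The COALESCE logic from the points above can be checked against the numbers in the question with a small Python sketch. The function name usage_first_days and the data layout are assumptions for illustration only; it also simplifies one case, returning 0 where the SQL would drop a session whose split rows all fall outside the window.

```python
from datetime import date, timedelta


def usage_first_days(starttime, total_usage, splits, days=2):
    """Usage over the first `days` days of a session.

    splits is a list of (date, usage) rows from the daily split table;
    an empty list means the session was never split.
    """
    if not splits:
        return total_usage  # no split rows: fall back to the session total
    cutoff = starttime + timedelta(days=days)
    return sum(usage for day, usage in splits if day < cutoff)


splits = [
    (date(2012, 1, 1), 3000),
    (date(2012, 1, 2), 5000),
    (date(2012, 1, 3), 2000),
]
print(usage_first_days(date(2012, 1, 1), 10000, splits))
print(usage_first_days(date(2012, 1, 1), 10000, []))
```

The first call sums only the rows for 2012-01-01 and 2012-01-02; the second falls back to the session's own usage.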

MySQL: Average interval between records

Assume this table:
id date
----------------
1 2010-12-12
2 2010-12-13
3 2010-12-18
4 2010-12-22
5 2010-12-23
How do I find the average intervals between these dates, using MySQL queries only?
For instance, the calculation on this table will be
(
( 2010-12-13 - 2010-12-12 )
+ ( 2010-12-18 - 2010-12-13 )
+ ( 2010-12-22 - 2010-12-18 )
+ ( 2010-12-23 - 2010-12-22 )
) / 4
----------------------------------
= ( 1 DAY + 5 DAY + 4 DAY + 1 DAY ) / 4
= 2.75 DAY
Intuitively, what you are asking should be equivalent to the interval between the first and last dates, divided by the number of dates minus 1.
Let me explain more thoroughly. Imagine the dates are points on a line (+ are dates present, - are dates missing, the first date is the 12th, and I changed the last date to Dec 24th for illustration purposes):
++----+---+-+
Now, what you really want to do is space your dates out evenly along this line, and find how long it is between each of them:
+--+--+--+--+
To do that, you simply take the number of days between the last and first days, in this case 24 - 12 = 12, and divide it by the number of intervals you have to space out, in this case 4: 12 / 4 = 3.
With a MySQL query
SELECT DATEDIFF(MAX(dt), MIN(dt)) / (COUNT(dt) - 1) FROM a;
This works on this table (with the values below it returns 3; with your values it returns 2.75):
CREATE TABLE IF NOT EXISTS `a` (
`dt` date NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
INSERT INTO `a` (`dt`) VALUES
('2010-12-12'),
('2010-12-13'),
('2010-12-18'),
('2010-12-22'),
('2010-12-24');
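The claim that the average gap equals (last - first) / (count - 1) is easy to verify in Python. This is just a check of the arithmetic, with hypothetical function names; the pairwise version is included to show that the intermediate dates cancel out.

```python
from datetime import date


def average_interval_days(dates):
    """(max - min) / (n - 1), the same formula as the DATEDIFF query."""
    if len(dates) < 2:
        raise ValueError("need at least two dates")
    return (max(dates) - min(dates)).days / (len(dates) - 1)


def average_interval_pairwise(dates):
    """Mean of the gaps between consecutive dates, computed the long way."""
    ordered = sorted(dates)
    gaps = [(b - a).days for a, b in zip(ordered, ordered[1:])]
    return sum(gaps) / len(gaps)


question_dates = [
    date(2010, 12, 12), date(2010, 12, 13), date(2010, 12, 18),
    date(2010, 12, 22), date(2010, 12, 23),
]
print(average_interval_days(question_dates))
print(average_interval_pairwise(question_dates))
```

Both functions give the same value for the dates in the question, because the sum of consecutive gaps telescopes to last minus first.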
If the ids are uniformly incremented without gaps, join the table to itself on id+1:
SELECT d.id, d.date, n.date, DATEDIFF(n.date, d.date) /* later date first, so the gap is positive */
FROM dates d
JOIN dates n ON(n.id = d.id + 1)
Then GROUP BY and average as needed.
If the ids are not uniform, do an inner query to assign ordered ids first.
I guess you'll also need to add a subquery to get the total number of rows.
Alternatively
Create an aggregate function that keeps track of the previous date, and a running sum and count. You'll still need to select from a subquery to force the ordering by date (actually, I'm not sure if that's guaranteed in MySQL).
Come to think of it, this is a much better way of doing it.
And Even Simpler
Just noting that Vegard's solution is much better.
The following query returns correct result
SELECT AVG(
DATEDIFF(i.date, (SELECT MAX(date)
FROM intervals WHERE date < i.date)
)
)
FROM intervals i
but it runs a dependent subquery which might be really inefficient with no index and on a larger number of rows.
You need to do a self join, get the differences using the DATEDIFF function, and then take the average.

Mysql: Select all data between two dates

I have a mysql table with data connected to dates. Each row has data and a date, like this:
2009-06-25 75
2009-07-01 100
2009-07-02 120
I have a mysql query that select all data between two dates. This is the query:
SELECT data FROM tbl WHERE date BETWEEN date1 AND date2
My problem is that I also need to get the rows between date1 and date2 even if there is no data for a day.
So my query would miss the dates that are empty between 2009-06-25 and 2009-07-01.
Can I in some way add these dates with just 0 as data?
You can use a concept that is frequently referred to as 'calendar tables'. Here is a good guide on how to create calendar tables in MySQL:
-- create some infrastructure
CREATE TABLE ints (i INTEGER);
INSERT INTO ints VALUES (0), (1), (2), (3), (4), (5), (6), (7), (8), (9);
-- only works for 100 days, add more ints joins for more
SELECT cal.date, COALESCE(tbl.data, 0) AS data -- 0 for days without a row
FROM (
SELECT '2009-06-25' + INTERVAL a.i * 10 + b.i DAY as date
FROM ints a JOIN ints b
ORDER BY a.i * 10 + b.i
) cal LEFT JOIN tbl ON cal.date = tbl.date
WHERE cal.date BETWEEN '2009-06-25' AND '2009-07-01';
You might want to create table cal instead of the subselect.
Select * from emp where joindate between date1 and date2;
But this query does not show the proper data.
E.g. the requested range is
1-jan-2013 to 12-jan-2013.
But it shows data for
1-jan-2013 to 11-jan-2013.
It's very easy to handle this situation.
You can use the BETWEEN clause in combination with date_sub( now( ) , INTERVAL 30 DAY )
AND NOW( )
SELECT
sc_cust_design.design_id as id,
sc_cust_design.main_image
FROM
sc_cust_design
WHERE
sc_cust_design.publish = 1
AND `datein` BETWEEN date_sub( now( ) , INTERVAL 30 DAY ) AND NOW( )
Happy Coding :)
Do you have a table that has all dates? If not, you might want to consider implementing a calendar table and left joining your data onto the calendar table.
IF YOU CAN AVOID IT.. DON'T DO IT
Databases aren't really designed for this; you are effectively trying to create data (albeit a list of dates) within a query.
For anyone who has an application layer above the DB query the simplest solution is to fill in the blank data there.
You'll more than likely be looping through the query results anyway and can implement something like this:
loop_date = start_date
while (loop_date <= end_date){
if(loop_date in db_data) {
output db_data for loop_date
}
else {
output default_data for loop_date
}
loop_date = loop_date + 1 day
}
The benefits of this are reduced data transmission; simpler, easier to debug queries; and no worry of over-flowing the calendar table.
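The loop above translates almost line for line into Python. This is a minimal sketch, assuming the query results have already been loaded into a dict keyed by date; fill_missing and db_data are names chosen here for illustration.

```python
from datetime import date, timedelta


def fill_missing(db_data, start_date, end_date, default=0):
    """Return one (date, value) pair per day, using default where no row exists."""
    filled = []
    loop_date = start_date
    while loop_date <= end_date:
        filled.append((loop_date, db_data.get(loop_date, default)))
        loop_date += timedelta(days=1)
    return filled


db_data = {date(2009, 6, 25): 75, date(2009, 7, 1): 100}
for day, value in fill_missing(db_data, date(2009, 6, 25), date(2009, 7, 1)):
    print(day, value)
```

The dict lookup keeps each day's check cheap, so the loop stays linear in the length of the date range.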
You must add 1 day to the end date, using: DATE_ADD('$end_date', INTERVAL 1 DAY)
You can use as an alternate solution:
SELECT * FROM TABLE_NAME WHERE `date` >= '2013-01-01'
AND `date` <= '2013-01-12'
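Why the last day goes missing, and why adding one day to the end date fixes it, can be shown with plain Python datetimes. This sketch assumes the column stores a date and a time; the function names are hypothetical. A bare end date is treated as midnight at the start of that day, so anything later on that day falls outside an inclusive comparison.

```python
from datetime import date, datetime, time, timedelta


def between_naive(ts, start_day, end_day):
    """Mimics BETWEEN with bare dates: both bounds are midnight."""
    lower = datetime.combine(start_day, time.min)
    upper = datetime.combine(end_day, time.min)
    return lower <= ts <= upper


def between_whole_days(ts, start_day, end_day):
    """Half-open range: from start_day up to, but excluding, end_day + 1 day."""
    lower = datetime.combine(start_day, time.min)
    upper = datetime.combine(end_day + timedelta(days=1), time.min)
    return lower <= ts < upper


afternoon = datetime(2013, 1, 12, 15, 30)
print(between_naive(afternoon, date(2013, 1, 1), date(2013, 1, 12)))
print(between_whole_days(afternoon, date(2013, 1, 1), date(2013, 1, 12)))
```

The half-open form also avoids accidentally including midnight of the following day, which an inclusive upper bound of end date + 1 day would do.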