I want to update a field in a MySQL table "Persons" once a week, with the average of the difference between two fields of the "Tasks" table, end_date and start_date:
PERSON:
+----------------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------------+-------------+------+-----+---------+-------+
| average_speed | int(11) | NO | | 0 | |
+----------------+-------------+------+-----+---------+-------+
TASKS:
+----------------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------------+-------------+------+-----+---------+-------+
| person_id | int(11) | NO | | NULL | |
| start_date | date | NO | | NULL | |
| end_date | date | NO | | NULL | |
+----------------+-------------+------+-----+---------+-------+
(tables are not complete).
average_speed = AVG(task.end_date - task.start_date)
Now, the Tasks table is really big, and I don't want to compute the average over every task for every person every week. (That's a solution, but I'm trying to avoid it.)
What's the best way to update the average_speed?
I thought about adding two columns to the person table:
"last_count": number of tasks counted so far for each person
"last_sum": running sum of (end_date - start_date) for each person
So that on a new update I could do something like average_speed = (last_sum + new_sum) / (last_count + new_count), where new_count is the number of tasks finished in the last week and new_sum is the sum of their durations.
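The running-totals idea can be sketched in a few lines of Python. This is only an illustration of the arithmetic, assuming the stored totals and the weekly totals use the same unit (for example minutes); the function name merge_average is hypothetical.

```python
def merge_average(last_sum, last_count, new_sum, new_count):
    """Fold one week's totals into the stored totals and return the new average.

    All four arguments are assumed to use the same unit (e.g. minutes).
    """
    total_sum = last_sum + new_sum
    total_count = last_count + new_count
    # Guard against a person who has no finished tasks at all.
    average = total_sum / total_count if total_count else 0
    return total_sum, total_count, average


# Three old tasks totalling 300 minutes plus one new task of 100 minutes.
totals = merge_average(300, 3, 100, 1)
```

Only the two stored totals and the last week's rows are touched, so the cost of each update does not grow with the size of the Tasks table.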
Is there a better solution/architecture?
EDIT:
To answer a comment, the query I would run is something like this:
SELECT
t.person_id,
count(t.id) as last_count,
sum(TIMESTAMPDIFF(MINUTE, t.start_date, t.end_date)) as last_sum,
avg(TIMESTAMPDIFF(MINUTE, t.start_date, t.end_date)) as last_avg
from tasks as t
where t.end_date BETWEEN DATE_SUB(CURDATE(), INTERVAL 1 WEEK) AND CURDATE()
group by t.person_id
And I can rely on a PHP script to fetch the result and do some calculations.
Having a periodic update to the table is a bad way to go for all the reasons you've listed above, and others.
If you have access to the code that writes to the Tasks table, that’s the best place to put the update. Add an Average field and calculate and set the value when you write the task end time.
If you don’t have access to the code, you can add a calculated field to the table that shows the average and let SQL figure it out during the execution of a query. This can slow queries down a little, but the data is always valid and SQL is smart enough to only calculate that value when it is needed.
A third (ugly) option is a trigger on the table that updates the value when appropriate. I’m not a fan of triggers because they hide business logic in unexpected places, but sometimes you just have to get the job done.
My SQL knowledge is rather weak and I come from procedural programming, so bear with me. I have a database that contains data from a weather station - these are collected each minute and the (important part of the) table is
MariaDB [weather]> describe readings;
+------------------+------------+------+-----+-------------------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------------+------------+------+-----+-------------------+-------+
| time | timestamp | NO | PRI | CURRENT_TIMESTAMP | |
| inside_temp | float | YES | | NULL | |
| outside_temp | float | YES | | NULL | |
+------------------+------------+------+-----+-------------------+-------+
I want to find all days on which outside_temp stayed within a given range, i.e. never dropped below one value and never rose above another.
I can code it externally using MySQL for queries like
select min(outside_temp), max(outside_temp) from readings where date(time)='2022-01-27';
and iterating over all days in the database to check the temperature values for each day separately, but I wonder if it is possible to do the selection with a single MySQL command (I suppose it is, just beyond my imagination).
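Something like select date(time), min(outside_temp), max(outside_temp) from readings group by date(time); would give you the minimum and maximum for every day. Adding a HAVING clause on those two aggregates, for example having min(outside_temp) >= 0 and max(outside_temp) <= 10 (with your own limits in place of 0 and 10), keeps only the days that meet the requirements.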
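A minimal sketch of the per-day filter, run against an in-memory SQLite database as a stand-in for MariaDB so it can be tried without a server. The sample readings, the limits 0.0 and 10.0, and the helper name days_within are made up for illustration; the GROUP BY / HAVING shape is the part that carries over.

```python
import sqlite3


def days_within(conn, low, high):
    """Return (day, min, max) for days whose readings all lie in [low, high]."""
    return conn.execute(
        """
        SELECT date(time) AS day, MIN(outside_temp), MAX(outside_temp)
        FROM readings
        GROUP BY date(time)
        HAVING MIN(outside_temp) >= ? AND MAX(outside_temp) <= ?
        ORDER BY day
        """,
        (low, high),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (time TEXT PRIMARY KEY, outside_temp REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [
        ("2022-01-27 00:00:00", 1.5),
        ("2022-01-27 12:00:00", 6.0),
        ("2022-01-28 00:00:00", -3.0),  # below the lower limit, so this day is dropped
        ("2022-01-28 12:00:00", 4.0),
    ],
)
result = days_within(conn, 0.0, 10.0)
```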
I have a table rate with the following structure (approximate):
CREATE TABLE `rate` (
`id` int(11) PRIMARY KEY NOT NULL AUTO_INCREMENT,
`from` date NOT NULL,
`to` date NOT NULL
)
And an (approximately) identical table stop_sale:
CREATE TABLE `stop_sale` (
`id` int(11) PRIMARY KEY NOT NULL AUTO_INCREMENT,
`from` date NOT NULL,
`to` date NOT NULL
)
Considering that, for each one, their time interval is the range of days they cover between their respective from and to fields:
I want to query these tables together in such a way that the time intervals do not overlap, but instead adjust so that stop_sale takes priority.
Example
rates
| id | from | to |
| 1 | "2018-01-05" | "2018-01-31" |
| 2 | "2018-02-01" | "2018-02-15" |
stop_sale
| id | from | to |
| 1 | "2018-01-11" | "2018-01-20" |
| 2 | "2018-02-01" | "2018-02-10" |
Desired Result
| rate_id | from | to |
| 1 | "2018-01-05" | "2018-01-10" |
| 0 | "2018-01-11" | "2018-01-20" |
| 1 | "2018-01-21" | "2018-01-31" |
| 0 | "2018-02-01" | "2018-02-10" |
| 2 | "2018-02-11" | "2018-02-15" |
Notice how the rate with id=1 gets split into two records based on the time interval of the stop_sale with id=1 (Note: ids are not important, just the time intervals).
In other words, the stop_sale time intervals are subtracted from the time intervals of rate, and are also included in the final result set.
Is this possible with SQL? And MySQL?
If so, how optimal a query is it? Is it better to handle this operation in PHP?
As far as I know there is no way to do this with just a SQL query. This could be solved iteratively within a stored function; however, there is no clean way to return the data. The best option would be to return a delimited string of the data.
An alternative is to build a stored procedure that periodically populates a table with the result data and have PHP query against that. The basic logic would be:
Pass in the starting date value (starting_date).
Create a temp table with the query:
select * from rate
where `from` >= starting_date
union
select * from stop_sale
where `from` >= starting_date
order by `from` asc
Iterate through the temp table:
get the first 'from' value.
get the next greater 'to' value.
look for a 'from' value < current 'to' value and > current 'from' value, but != any 'from' value already in the results table being populated.
if found, insert the current 'from' value and 'to' value - 1 day into the results table being populated.
else insert the current 'from' value and 'to' value into the results table being populated.
This basic logic could be done in PHP, although it may be more complicated since you would need to build an array of the rows returned from the above temp table query, run the logic on it, and build out an array of results. This would be less efficient than running the stored procedure, because once the stored procedure has run you only need to run it again on data newer than that run. Hence the starting_date parameter.
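For comparison, here is a sketch of the application-side variant in Python rather than PHP. It is not a line-by-line translation of the steps above: it walks each rate interval and cuts out every stop_sale interval that overlaps it. It assumes inclusive date ranges and rows shaped as (id, from, to); the function name split_rates is hypothetical.

```python
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)


def split_rates(rates, stop_sales):
    """Subtract stop_sale intervals from rate intervals (inclusive dates).

    Returns (rate_id, from, to) rows sorted by 'from'; stop_sale rows get rate_id 0.
    """
    result = [(0, s_from, s_to) for _, s_from, s_to in stop_sales]
    stops = sorted((s_from, s_to) for _, s_from, s_to in stop_sales)
    for rate_id, r_from, r_to in rates:
        cursor = r_from  # first day of the rate not yet emitted or blocked
        for s_from, s_to in stops:
            if s_to < cursor or s_from > r_to:
                continue  # this stop_sale does not touch the remaining part
            if s_from > cursor:
                result.append((rate_id, cursor, s_from - ONE_DAY))
            cursor = max(cursor, s_to + ONE_DAY)
        if cursor <= r_to:
            result.append((rate_id, cursor, r_to))
    return sorted(result, key=lambda row: row[1])


rates = [(1, date(2018, 1, 5), date(2018, 1, 31)), (2, date(2018, 2, 1), date(2018, 2, 15))]
stop_sales = [(1, date(2018, 1, 11), date(2018, 1, 20)), (2, date(2018, 2, 1), date(2018, 2, 10))]
rows = split_rates(rates, stop_sales)
```

With the example data from the question, rows contains the five intervals of the desired result in the same order.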
I have a table with more than 50 million rows.
trackpoint:
+----+------------+-------------------+
| id | created_at | tag |
+----+------------+-------------------+
| 1 | 1484407910 | visitorDevice643 |
| 2 | 1484407913 | visitorDevice643 |
| 3 | 1484407916 | visitorDevice643 |
| 4 | 1484393575 | anonymousDevice16 |
| 5 | 1484393578 | anonymousDevice16 |
+----+------------+-------------------+
where 'created_at' is the timestamp at which the row was added.
I also have a list of timestamps, for example like this one:
timestamps = [1502744400, 1502830800, 1502917200]
For every pair of consecutive timestamps in that list, I need to count the distinct tags of the rows whose created_at falls in the interval between them.
Using the Django ORM it looks like this:
step = 86400
for ts in timestamps[:-1]:
    trackpoint_set.filter(created_at__gte=ts, created_at__lt=ts + step).values('tag').distinct().count()
Because the actual timestamps list is much longer and the table has many rows, I end up getting a 500 timeout.
So my question is: how can I do this in ONE raw SQL query that joins the rows against the list of values, so that the result looks like [(1502744400, 650), (1502830800, 1550)...]
where the first value is the timestamp and the second is the count of unique tags in each interval.
First, index created_at. Next, build a range query such as created_at >= timestamp AND created_at < timestamp + step. Then run that query once per timestamp, one by one, rather than all at once.
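A minimal sketch of that approach, using an in-memory SQLite database in place of the real one so it runs anywhere. The index name, the helper name distinct_tags_per_interval, and the interval boundaries are made up for illustration; the rows are the sample rows from the question.

```python
import sqlite3


def distinct_tags_per_interval(conn, timestamps):
    """Count distinct tags in each half-open interval [t[i], t[i+1])."""
    # The index lets each range query read only the rows inside its interval.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_trackpoint_created_at ON trackpoint (created_at)"
    )
    query = (
        "SELECT COUNT(DISTINCT tag) FROM trackpoint "
        "WHERE created_at >= ? AND created_at < ?"
    )
    results = []
    for start, end in zip(timestamps, timestamps[1:]):
        (count,) = conn.execute(query, (start, end)).fetchone()
        results.append((start, count))
    return results


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trackpoint (id INTEGER PRIMARY KEY, created_at INTEGER, tag TEXT)")
conn.executemany(
    "INSERT INTO trackpoint (created_at, tag) VALUES (?, ?)",
    [
        (1484407910, "visitorDevice643"),
        (1484407913, "visitorDevice643"),
        (1484407916, "visitorDevice643"),
        (1484393575, "anonymousDevice16"),
        (1484393578, "anonymousDevice16"),
    ],
)
counts = distinct_tags_per_interval(conn, [1484352000, 1484438400, 1484524800])
```

The distinct count is done by the database, so each query returns a single number instead of shipping every tag back to the application.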
I have two tables one called slotLength and one called schedule. Here are their descriptions:
+------------+------+------+-----+----------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------+------+------+-----+----------+-------+
| slotLength | time | NO | PRI | 00:00:00 | |
+------------+------+------+-----+----------+-------+
+-----------+-------------+------+-----+----------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------+-------------+------+-----+----------+----------------+
| dayId | int(1) | NO | PRI | NULL | auto_increment |
| dayName | varchar(15) | YES | | NULL | |
| startHour | time | YES | | 08:00:00 | |
+-----------+-------------+------+-----+----------+----------------+
I know that they do not have the best design, but I am still learning. Also, they are just an experiment, so please ignore the mistakes in their design.
Before I proceed, let's assume that the slotLength table contains just one row and let's call its value slotSize.
I want to compute startHour + slotSize * n, where startHour represents values from the startHour column in the schedule table and n is a number. To be more specific, let's see an example:
if slotSize is "01:00:00" and a value from startHour is "09:00:00", then for startHour + slotSize * 3 I expect to get the value "12:00:00".
What should the query look like? Thank you.
I'm not sure you can multiply against a time field in SQL. Adding and subtracting is fairly easy, but multiplication would get a bit more complicated. You can use ADDTIME() for part of the equation, but I'm unsure of the multiplication part. Some heavier mathematics might have to come into play for that.
Here is a list of MySQL time functions
http://dev.mysql.com/doc/refman/5.5/en/date-and-time-functions.html
EDIT:
Perhaps something along the lines of converting to seconds, multiplying, and then converting back would work. I'm not overly familiar with MySQL, so hopefully someone will give you the query you're looking for, but this should get you started to try on your own.
SELECT ADDTIME(startHour, SEC_TO_TIME(TIME_TO_SEC(slotLength) * 3)) AS Calculated
FROM schedule, slotLength
Both tables are needed in the FROM clause: schedule supplies startHour and slotLength supplies the slot size. Since slotLength holds a single row, the implicit cross join yields one result per schedule row.
I believe this is close to what you're looking for. As I said, I'm unfamiliar with MySQL, so errors may lie in wait, although ADDTIME() accepts a plain TIME value as its first argument and does not need a date portion.
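The convert-to-seconds, multiply, convert-back idea can be checked outside the database. The sketch below does the same arithmetic in Python with timedelta, using the example values from the question; the helper names are hypothetical and the code assumes times in "HH:MM:SS" form.

```python
from datetime import timedelta


def parse_time(text):
    """Turn an 'HH:MM:SS' string into a timedelta."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def format_time(delta):
    """Turn a non-negative timedelta back into 'HH:MM:SS'."""
    total = int(delta.total_seconds())
    return f"{total // 3600:02d}:{total % 3600 // 60:02d}:{total % 60:02d}"


def slot_start(start_hour, slot_size, n):
    """Compute startHour + slotSize * n."""
    return format_time(parse_time(start_hour) + parse_time(slot_size) * n)


third_slot = slot_start("09:00:00", "01:00:00", 3)
```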
I am a total SQL noob; sorry.
I have a table
mysql> describe activity;
+--------------+---------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------------+---------+------+-----+---------+-------+
| user | text | NO | | NULL | |
| time_stamp | int(11) | NO | | NULL | |
| activity | text | NO | | NULL | |
| item | int | NO | | NULL | |
+--------------+---------+------+-----+---------+-------+
Normally activity is a two-step process: 1) "check out" and 2) "use".
An item cannot be checked out a second time unless it has been used.
Now I want to find any cases where an item was checked out but not used.
Being dumb, I would use two selects, one for check out and one for use, on the same item, then compare the timestamps.
Is there a SELECT statement that will help me select the items which were checked out but not used?
Tricky with the possibility of multiple checkouts. Or should I just code
loop over activity, from oldest to newest
if I find a checkout and there is no newer use time then I have a hit
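You could get the last timestamp of each checkout and use, and then compare them per item:
SELECT item,
MAX(IF(activity='check out', time_stamp, NULL)) AS last_co,
MAX(IF(activity='use', time_stamp, NULL)) AS last_use
FROM activity
GROUP BY item
HAVING last_co IS NOT NULL AND (last_use IS NULL OR last_use < last_co);
The explicit last_use IS NULL check is there because of how NULL comparison works: last_use < last_co evaluates to NULL, not true, when last_use is NULL, so an item that was never used would be dropped. Wrapping the comparison as NOT(last_use >= last_co) does not help, because NOT(NULL) is still NULL.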
Without proper indexing, this query will not perform very well though. Plus you might want to bound the query using a WHERE condition.
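To see the NULL handling in action, here is a small sketch against an in-memory SQLite database. It uses CASE instead of MySQL's IF() and a subquery instead of HAVING on aliases so that it stays portable, and it checks for a missing last_use explicitly; the sample rows and user names are invented for illustration.

```python
import sqlite3

QUERY = """
SELECT item FROM (
    SELECT item,
           MAX(CASE WHEN activity = 'check out' THEN time_stamp END) AS last_co,
           MAX(CASE WHEN activity = 'use' THEN time_stamp END) AS last_use
    FROM activity
    GROUP BY item
)
WHERE last_co IS NOT NULL AND (last_use IS NULL OR last_use < last_co)
ORDER BY item
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE activity (user TEXT, time_stamp INTEGER, activity TEXT, item INTEGER)")
conn.executemany(
    "INSERT INTO activity VALUES (?, ?, ?, ?)",
    [
        ("ann", 10, "check out", 1),
        ("ann", 20, "use", 1),        # item 1: checked out, then used
        ("bob", 10, "check out", 2),
        ("bob", 20, "use", 2),
        ("bob", 30, "check out", 2),  # item 2: checked out again, not used since
        ("cat", 5, "check out", 3),   # item 3: checked out, never used
    ],
)
pending = [row[0] for row in conn.execute(QUERY)]
```

Item 3 has no "use" row at all, so its last_use is NULL; it shows up in the result only because of the IS NULL branch.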