MySQL db for time-temperature values

I need your help to build my db the right way.
I need to store time-temperature values for different rooms of my house
and I want to use DyGraph to graph the data sets.
I want to implement different time windows: 1 hour, 24 hours, 48 hours,
1 week, ....
I will be detecting the temperature with a 15 minutes interval, so I will have 4 time-temperature values per hour.
Each room has an ID so the time-temperature values will be associated
to the proper room.
The table I built is very simple:
----------------------------------
| ID | DATE                | TEMP |
----------------------------------
| 1  | 2014-04-30 00:00:00 | 18.6 |
| 2  | 2014-04-30 00:00:00 | 18.3 |
| 3  | 2014-04-30 00:00:00 | 18.3 |
| 1  | 2014-04-30 00:15:00 | 18.5 |
----------------------------------
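A minimal DDL sketch of that table (column types assumed, not taken from the original):
CREATE TABLE temperature (
    id   INT,            -- room ID
    date DATETIME,       -- time of the reading
    temp DECIMAL(4,1)    -- temperature in degrees Celsius
);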
For some strange reason, when the number of rows gets to 500 or so, the
server becomes very slow.
Also, I have a web page where I can read the different temperatures of
the rooms: this page polls the server through AJAX every 5 seconds (because it needs
to be frequently updated!), but when the number of rows of the table
gets to around 500, it hangs.
I tried to split the table and I created a table for each room, then a
table for each time-window and now everything seems to be working fine.
Since I do not think this is the best/most efficient way to organize
this thing, I need your help to give it a better structure.
I use a php script to retrieve the temperature data for all the rooms of my house:
$query = "SELECT * FROM temperature t1
WHERE (id, date) IN
(SELECT id,MAX(date) FROM
temperature t2 GROUP BY id)";
this query allows me to collect the temperature values in an array called $options:
$result_set = mysql_query($query, $connection);
while ($rows = mysql_fetch_array($result_set)) {
    $options[] = $rows;
}
then, I json-encode the array:
$j = json_encode($options);
and send it to the ajax script, which shows the data on the web page:
echo $j;
In the ajax script, I save the data in a variable and then parse it:
var return_data = xhr.responseText;
var temperature = JSON.parse(return_data);
next I loop through the array to extract the temperature values and put it in the right place on the web page:
for (var j = 0; j < temperature.length; j++) {
    document.getElementById("TEMPArea" + j).innerHTML = temperature[j].temp + "°C";
}
This works fine as long as the rows in the 'temperature' table are less than 600 or so: polling every 5 seconds is not a problem.
Above 600, the page refresh gets slow and eventually it hangs and stops refreshing.
EDIT: Right now, I am working on a virtual machine with Windows 7 64bit, Apache, PHP and MySQL, 4GB RAM. Do you think this could be an issue?

I am not an expert; the code is pretty simple and straightforward, so I am having trouble detecting the cause.
Thanks again.

I think the query is the main source of problems:
it's a slow way of getting the answer you want (you can always run it in Workbench and study the output of EXPLAIN - see the manual for more details);
it implicitly supposes that all sensors will transmit at the same time, and as soon as that's not the case your output dataset won't be complete. Normally you'll want the latest data from each individual sensor.
so I propose a somewhat different approach:
add an index on date and one on id to speed up queries. The lack of a primary key is an issue, but let's first focus on solving the current slowness...
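For example (index names are just placeholders):
ALTER TABLE temperature ADD INDEX idx_temp_id (id);
ALTER TABLE temperature ADD INDEX idx_temp_date (date);
-- a single composite index on (id, date) would also cover the per-sensor lookup below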
obtain the list of available sensors - minimal solution
select distinct id from temperature;
but it would be better to store a list of available sensors in some other table - this query will also get slower as the number of records in temperature grows.
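For instance, a minimal sensors/rooms table could look like this (table and column names assumed):
CREATE TABLE rooms (
    id   INT PRIMARY KEY,   -- same id that is stored in temperature
    name VARCHAR(50)        -- e.g. 'living room'
);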
iterate over the results of that list to fetch the latest value for each of the sensors
select * from temperature
where id = (value obtained in previous step)
order by date desc
limit 1;
with this query you'll only get the most recent record associated with each sensor. Thanks to the indexes, the speed impact of a growing table should be minimal.
reassemble these results in a data structure to send to your client web page.
Also, as stated in the documentation, the mysql_* extension is deprecated and should not be used in new programs. Use mysqli_* or preferably PDO. Both of these extensions will also allow you to use parameterized queries, the only real protection against SQL injection issues. See here for a quick introduction on how to use them.

Related

How can this query be optimized for speed?

This query creates an export for UPS from the deliveries history:
select 'key'
, ACC.Name
, CON.FullName
, CON.Phone
, ADR.AddressLine1
, ADR.AddressLine2
, ADR.AddressLine3
, ACC.Postcode
, ADR.City
, ADR.Country
, ACC.Code
, DEL.DeliveryNumber
, CON.Email
, case
when CON.Email is not null
then 'Y'
else 'N'
end
Ship_Not_Option
, 'Y' Ship_Not
, 'ABCDEFG' Description_Goods
, '1' numberofpkgs
, 'PP' billing
, 'CP' pkgstype
, 'ST' service
, '1' weight
, null Shippernr
from ExactOnlineREST..GoodsDeliveries del
join ExactOnlineREST..Accounts acc
on ACC.ID = del.DeliveryAccount
join ExactOnlineREST..Addresses ADR
on ADR.ID = DEL.DeliveryAddress
join ExactOnlineREST..Contacts CON
on CON.ID = DEL.DeliveryContact
where DeliveryDate between $P{P_SHIPDATE_FROM} and $P{P_SHIPDATE_TO}
order
by DEL.DeliveryNumber
It takes many minutes to run. The number of deliveries and accounts grows by several hundred each day. Addresses and contacts are mostly 1:1 with accounts. How can this query be optimized for speed in Invantive Control for Excel?
Probably this query is run at most once per day, since the deliverydate does not contain a time component. Therefore, the number of rows selected from ExactOnlineREST..GoodsDeliveries is several hundred. Based upon the statistics given, the number of accounts, delivery addresses and contacts is also approximately several hundred.
Normally, such a query would be optimized by a solution such as "Exact Online query with joins runs more than 15 minutes", but that solution will not work here: the third value of a join_set(soe, orderid, 100) is the maximum number of rows on the left-hand side to be used with index joins. At this moment, the maximum number on the left-hand side is something like 125, based upon constraints on the URL length for OData requests to Exact Online. Please remember that the actual OData query is a GET using a URL, not a POST with unlimited size for the filter.
The alternatives are:
Split volume
Data Cache
Data Replicator
Have SQL engine or Exact Online adapted :-)
Split Volume
In a separate query select the eligible GoodsDeliveries and put them in an in-memory or database table using for instance:
create or replace table gdy#inmemorystorage as select ... from ...
Then create a temporary table per 100 or similar rows such as:
create or replace table gdysubpartition1#inmemorystorage as select ... from ... where rowidx$ between 0 and 99
... etc for 100, 200, 300, 400, 500
And then run the query several times, each time with a different gdysubpartition1..gdysubpartition5 instead of the original from ExactOnlineREST..GoodsDeliveries.
Of course, you can also avoid the use of intermediate tables by using an inline view like:
from (select * from goodsdeliveries where date... limit 100)
or alike.
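As a rough sketch of that inline-view variant (column list shortened; parameters and limit reuse what is already shown in the examples above):
select DEL.DeliveryNumber
,      ACC.Name
from   (select *
        from   ExactOnlineREST..GoodsDeliveries
        where  DeliveryDate between $P{P_SHIPDATE_FROM} and $P{P_SHIPDATE_TO}
        limit  100
       ) DEL
join   ExactOnlineREST..Accounts ACC
on     ACC.ID = DEL.DeliveryAccount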
Data Cache
When you run the query multiple times per day (unlikely, but I don't know), you might want to cache the Accounts in a relational database and update it every day.
You can also use 'local memorize results clipboard' and 'local save results clipboard to ...' to save the last results to a file manually, and later restore them using 'local load results clipboard from ...' and 'local insert results clipboard in table ...'. And maybe then 'insert into ... from exactonlinerest..accounts where datecreated > trunc(sysdate)'.
Data Replicator
With Data Replicator enabled, you can have replicas created and maintained automatically within an on-premise or cloud relational database for Exact Online API entities. For low latency, you will need to enable the Exact webhooks.
Have SQL Engine or Exact adapted
You can also register a request to have the SQL engine allow a higher number in the join_set hint, which would require addressing the Exact Online APIs in another way. Or register a request at Exact to also allow POST requests to the API with the filter in the body.

How do I make it so I don't have to query a million+ timestamps

Quick synopsis of the problem:
I am working on a graph page to map the performance of a device my company is working on.
I get a new statpoint (timestamp, stats, nodeid, volumeid, clusterid) every 2 seconds from every node.
This results in approximately 43k records per day per node per stat.
Now let's say I have 13 stats: that's 520k-ish records a day.
So a row would look something like:
timestamp            typeid  clusterid  nodeid  volumeid  value
01/02/2016 05:02:22  0       1          1       1         82.20
So, a brief explanation: we decided to go with MySQL because it's easily scalable in Amazon. I was using InfluxDB before, which could easily solve this problem, but there is no way to auto-scale InfluxDB in Amazon.
My ultimate goal is to get a return value (an array of objects, one per timestamp) that looks like:
[
  {
    node1-stat1: 20.0,
    node2-stat1: 23.2,
    node3-stat1: xx.x,
    node1-stat2: 20.0,
    node2-stat2: xx.x,
    node3-stat2: xx.x,
    timestamp: unixtimestamp
  },
  {
    node1-stat1: 20.0,
    node2-stat1: 23.2,
    node3-stat1: xx.x,
    node1-stat2: 20.0,
    node2-stat2: xx.x,
    node3-stat2: xx.x,
    timestamp: unixtimestamp + 2 seconds
  }
]
I currently have a query that gathers all the unique timestamps, then loops over those to get the values belonging to each timestamp, and puts the result in an object.
That produces the desired output, but it takes FOREVER and it's over a million queries.
Can something like this even be done in MySQL? Should I go back to a time-series DB and just deal with scaling it manually?
// EDIT //
I think I might have solved my problem:
SELECT data_points.*, data_types.friendly_name as friendly_name
FROM data_points, data_types
WHERE (cluster_id = '5'
AND data_types.id = data_points.data_type_id
AND unix_timestamp(timestamp) BETWEEN '1456387200' AND '1457769599')
ORDER BY timestamp, friendly_name, node_id, volume_id
This gives me all the fields I need.
I then loop over these datapoints and create a new "object" for each timestamp, adding the stats of every row that matches that timestamp to it.
This executes in under a second while going over a million records.
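One thing I might still change (untested): wrapping the column in unix_timestamp() keeps MySQL from using an index on the timestamp column, so comparing the bare column against FROM_UNIXTIME() should be friendlier to an index such as (cluster_id, timestamp), if one exists:
AND timestamp BETWEEN FROM_UNIXTIME(1456387200) AND FROM_UNIXTIME(1457769599)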
I will for sure try to see if swapping to a Timeseries db will make an improvement in the future.

Ruby on Rails SQL "SELECT" taking a long time

I am selecting records from a table named "bookings" that contains over 100,000 records. I am new to SQL and for some reason this is taking many seconds to finish, and even timing out on my production server:
def bookings_in_date_range(division, startdate, enddate)
  sql = "SELECT * FROM bookings WHERE division = '#{division}';"
  bookings = ActiveRecord::Base.connection.execute(sql) # all bookings from this division
  bookingsindaterange = bookings.select { |b| (parsedate(b["date"]) >= parsedate(startdate)) and (parsedate(b["date"]) <= parsedate(enddate)) } # refine to bookings in date range
end

def parsedate(date) # get date from mm/dd/yy format
  d = date.split("/")
  return Date.parse("#{d[2]}-#{d[0]}-#{d[1]}")
end
I also included the function I'm using to re-format the date; however, executing the SQL statement appears to be where the process is hanging up, based on my tests.
My goal is to select all "bookings" in a "division" within a specified date range. Existing code works faster for divisions with low numbers of bookings.
EDIT
Otávio's code below seems to speed things up a bit. However, my requirement is to see if a booking falls within a date range (on or after startdate and on or before enddate). I can't figure out how to get this logic into the .where statement, so I am running a loop like this:
bookings_start_thru_end = []
(startdate..enddate).each do |date|
  date_bookings = Booking.where("division = ? AND date = ?", division, date.strftime("%m/%d/%y"))
  date_bookings.each do |b|
    bookings_start_thru_end.push b
  end
end
Also, the issue with crashing was ActiveRecord session store filling up. I was dumping a bunch of data from the report into the session store to save it between requests to avoid doing another database query, but this was killing performance. The database query is still taking 5 seconds or so, but I can live with that.
Use EXPLAIN to see what the query execution plan is:
https://dev.mysql.com/doc/refman/5.6/en/explain.html
https://dev.mysql.com/doc/refman/5.6/en/using-explain.html
Now my guess is that you do not have indexes on the columns that you are referencing in your WHERE and that leads to table scans which are causing your query to run very slowly. But that is just my guess since I do not know your tables.
The indexes will be required whether you are using raw sql or active record (spit).
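For example, something along these lines (index names are placeholders; column names taken from the query above, MySQL syntax):
ALTER TABLE bookings ADD INDEX idx_bookings_division (division);
-- if the date column is a real DATE type, a composite index also helps the range filter:
ALTER TABLE bookings ADD INDEX idx_bookings_division_date (division, date);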
Whenever possible, you should avoid executing raw SQL in your applications. Prefer to use ActiveRecord interfaces; this will not only make your app more secure, it will also execute queries in a way they are optimized for.
In your case, refactor your bookings_in_date_range method to use ActiveRecord's .where method:
def bookings_in_date_range(division, enddate, startdate)
  YourModelName.where("division = ? AND enddate = ? AND startdate = ?", division, parsedate(enddate), parsedate(startdate))
end
To look for things in a range, use
YourModelName.where("division = ? AND enddate <= ? AND startdate >= ?",division, parsedate(enddate), parsedate(startdate))

Sql queries for getting information from GTFS files in Java

I'm working on a project for school which uses a GTFS database (MySQL).
I wrote some code that parses the GTFS files and inserts them into my MySQL DB (each file is a table in my DB).
I'm trying to write two SQL queries:
Given a stationId, time, and line number - I want to get all trips that pass by this station in the next 10 minutes.
Given a tripId, directionId and stopId - I want to get all the remaining stations in this trip (in order to draw on a map the stations to come in my trip).
Does anyone know how I can state these SQL queries in Java?
I tried this:
SELECT * FROM stops, routes, stop_times, calendar, trips
where departure_time > "08:24:00"
and departure_time < "16:40:00"
and route_short_name = "10"
and stops.stop_id = 29335
and stops.stop_id = stop_times.stop_id
and stop_times.trip_id = trips.trip_id
and calendar.service_id = trips.service_id
and calendar.sunday = 1
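For the 10-minute window I guess I would need to derive the upper bound from the given time instead of hardcoding both ends, something like this (untested; the literal values are just for illustration, and I think I was also missing the join between routes and trips):
SELECT trips.trip_id, routes.route_short_name, stop_times.departure_time
FROM stops, routes, stop_times, calendar, trips
where stops.stop_id = 29335
and stops.stop_id = stop_times.stop_id
and stop_times.trip_id = trips.trip_id
and trips.route_id = routes.route_id   -- join that was missing above
and routes.route_short_name = "10"
and calendar.service_id = trips.service_id
and calendar.sunday = 1
and stop_times.departure_time BETWEEN "08:24:00" AND ADDTIME("08:24:00", "00:10:00")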
I have fixed exactly this problem for GTFS data in Belgium. The code is available on GitHub:
https://github.com/iRail/MIVBSTIBResource/blob/master/MIVBSTIBStopTimesDao.php

How to find all database rows with a time AFTER a specific time

I have a "last_action" column in my users table that updates a unix timestamp every time a user takes an action. I want to select all users in my users table who have made an action on the site after a specific time (probably 15 minutes, the goal here is to make a ghetto users online list).
I want to do something like the following...
time = Time.now.to_i - 900 # save the timestamp 15 minutes ago in a variable
User.where(:all, :where => :last_action > time)
What's the best way to do this? Is there a way of doing it without using any raw SQL?
Maybe this will work?
User.where("users.last_action > ?", 15.minutes.ago)
Try this:
time = Time.now.to_i - 900
User.where("last_action > #{time}")
There might be a nicer/safer syntax to put variable arguments into the where clause, but since you are working with integers (timestamps) this should work and be safe.
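For reference, the raw SQL this boils down to is roughly the following (assuming MySQL and that last_action holds a unix timestamp; here the 15-minute cutoff is computed in SQL rather than in Ruby):
SELECT * FROM users WHERE last_action > UNIX_TIMESTAMP() - 900;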