SQL - MySQL - average group by and limit problems - mysql

I am collecting data from various remote sensors that send their data every so many seconds. I record the name of the remote sensor and the time difference since the last time I received data from that instrument. The data for each instrument comes in a random order and not at set intervals.
The table looks like:
id instname timediff
1 inst01 1000
2 inst02 1100
3 inst01 1210
4 inst03 900
etc.
The id column is auto incrementing.
What I am trying to do is get the average timediff for each instrument for the last 10 values of each instrument.
The closest I've got is:
SELECT
inst AS Instrument,
AVG(diff / 1000) AS Average
FROM
(SELECT
instname AS inst, timediff AS diff
FROM
log
WHERE
instname = 'Inst01'
ORDER BY id DESC
LIMIT 0 , 10) AS two
Obviously this only works for 1 instrument and I'm not convinced the limit is working properly either. I don't know the names of the instruments nor how many I'll be collecting data from.
How do I get the average timediff of the last 10 values for each instrument using SQL?

Somewhat painfully. I think the easiest way is to use variables. The following query enumerates the readings for each instrument:
select l.*,
(@rn := if(@i = instname, @rn + 1,
if(@i := instname, 1, 1)
)
) as rn
from log l cross join
(select @i := '', @rn := 0) params
order by instname, id desc;
You can then use this as a subquery to do your calculation:
select instname, avg(timediff)
from (select l.*,
(@rn := if(@i = instname, @rn + 1,
if(@i := instname, 1, 1)
)
) as rn
from log l cross join
(select @i := '', @rn := 0) params
order by instname, id desc
) l
where rn <= 10
group by instname;
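To check what the query should return on a small sample, the same logic can be sketched outside the database: number each instrument's rows newest first, keep the first n, and average them. This is only an illustrative Python sketch. The function name `avg_last_n` and the tuple layout `(id, instname, timediff)` are assumptions, not part of the original answer.

```python
from collections import defaultdict

def avg_last_n(rows, n=10):
    """rows: iterable of (id, instname, timediff) tuples.
    Returns {instname: average timediff over that instrument's n newest rows}."""
    # Newest first, mirroring "order by id desc" within each instrument.
    newest_first = sorted(rows, key=lambda r: r[0], reverse=True)
    kept = defaultdict(list)
    for _id, instname, timediff in newest_first:
        if len(kept[instname]) < n:  # same effect as "where rn <= n"
            kept[instname].append(timediff)
    return {inst: sum(vals) / len(vals) for inst, vals in kept.items()}

# Sample rows taken from the question's table.
rows = [(1, "inst01", 1000), (2, "inst02", 1100),
        (3, "inst01", 1210), (4, "inst03", 900)]
print(avg_last_n(rows))       # fewer than 10 rows each, so all rows are averaged
print(avg_last_n(rows, n=1))  # only the newest row per instrument
```

An instrument with fewer than n rows is simply averaged over whatever rows it has, which matches the behaviour of the `rn <= 10` filter.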

Try this. It was tested on a small data set but should work.
SELECT
inst AS Instrument,
diff AS Average
FROM
(SELECT
t1.instname AS inst,AVG(t1.timediff / 1000) AS diff
FROM
inst t1,inst t2
WHERE
t1.instname = t2.instname group by t1.instname ORDER BY t2.id DESC
LIMIT 0,10
) AS two

Related

SQL query to select last X entries for a certain non-primary field

I'm having difficulties setting up a slightly more advanced SQL query.
What I'm trying to do is to select the last 24 entries for every zr_miner_id, but I keep getting SQL timeouts (the table has around 40000 entries so far).
So let's say there's 200 entries for zr_miner_id 1 and 200 for zr_miner_id 2, I'd end up with 48 results.
So far, I've come up with the query below.
What this is supposed to do is to select each result in zec_results that has less than 24 newer entries with the same zr_miner_id.
I couldn't think of any better way to perform this task, but then again, I'm not that far advanced at SQL yet.
SELECT results_a.*
FROM zec_results results_a
WHERE (
SELECT COUNT(results_b.zr_id)
FROM zec_results AS results_b
WHERE results_b.zr_miner_id = results_a.zr_miner_id
AND results_b.zr_id >= results_a.zr_id
) <= 24
Use variables!
SELECT r.*
FROM (SELECT r.*,
(@rn := if(@m = r.zr_miner_id, @rn + 1,
if(@m := r.zr_miner_id, 1, 1)
)
) as rn
FROM zec_results r CROSS JOIN
(SELECT @m := -1, @rn := 0) params
ORDER BY r.zr_miner_id, r.zr_id DESC
) r
WHERE rn <= 24 ;
If you want to put the query into a view, then the above will not work. Performance on your approach might improve with an index on (zr_miner_id, zr_id).

Top 20 percent by id - MySQL

I am using a modified version of a query similar to the one in another question here: Convert SQL Server query to MySQL
Select *
from
(
SELECT tbl.*, @counter := @counter +1 counter
FROM (select @counter:=0) initvar, tbl
Where client_id = 55
ORDER BY ordcolumn
) X
where counter >= (80/100 * @counter)
ORDER BY ordcolumn;
tbl.* contains the field 'client_id', and I am attempting to get the top 20% of the records for each client_id in a single statement. Right now, if I give it a single client_id in the WHERE clause, it returns the correct results. If I give it multiple client_ids, however, it simply takes the top 20% of the combined record set instead of handling each client_id individually.
I'm aware of how to do this in most databases, but the logic in MySQL is eluding me. I get the feeling it involves some ranking and partitioning.
Sample data is pretty straightforward.
Client_id rate
1 1
1 2
1 3
(etc to rate = 100)
2 1
2 2
2 3
(etc to rate = 100)
Actual values aren't that clean, but it works.
As an added bonus...there is also a date field associated to these records and 1 to 100 exists for this client for multiple dates. I need to grab the top 20% of records for each client_id, year(date),month(date)
You need to do the enumeration for each client:
SELECT *
FROM (SELECT tbl.*,
(@rn := if(@c = client_id, @rn + 1,
if(@c := client_id, 1, 1)
)
) as rn
FROM (select @c := -1, @rn := 0) initvar CROSS JOIN tbl
ORDER BY client_id, ordcolumn
) t join
(SELECT client_id, COUNT(*) as cnt
FROM tbl
GROUP BY client_id
) tt
on t.client_id = tt.client_id
where rn >= (80/100 * tt.cnt)
ORDER BY ordcolumn;
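The per-client filter, keeping rows whose position within the client is at least 80% of that client's row count, can be illustrated with a small Python sketch. The function name `top_percent_per_client` is hypothetical, and the sketch assumes `rate` doubles as the ordering column, which the question does not state. The comparison uses integers (`rn * 100 >= percent * cnt`) to avoid floating-point rounding.

```python
from collections import defaultdict

def top_percent_per_client(rows, percent=80):
    """rows: iterable of (client_id, rate) tuples.
    Keeps, per client, the rows whose 1-based position rn in ascending
    rate order satisfies rn >= percent/100 * cnt."""
    by_client = defaultdict(list)
    for client_id, rate in rows:
        by_client[client_id].append(rate)
    kept = {}
    for client_id, rates in by_client.items():
        rates.sort()       # plays the role of "order by client_id, ordcolumn"
        cnt = len(rates)   # plays the role of the per-client COUNT(*)
        kept[client_id] = [rate for rn, rate in enumerate(rates, start=1)
                           if rn * 100 >= percent * cnt]
    return kept

sample = [(1, r) for r in range(1, 11)] + [(2, r) for r in range(1, 6)]
print(top_percent_per_client(sample))
```

Because the comparison is `>=`, the row sitting exactly at the 80% mark is included, so a client with 10 rows keeps 3 of them rather than 2.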
Using Gordon's answer as a starting point, I think this might be closer to what you need.
SELECT t.*
, (@counter := @counter+1) AS overallRow
, (@clientRow := if(@prevClient = t.client_id, @clientRow + 1,
if(@prevClient := t.client_id, 1, 1) -- This just updates @prevClient without creating an extra field, though it makes it a little harder to read
)
) AS clientRow
-- Alternatively (for everything done in clientRow):
-- , @clientRow := if(@prevClient = t.client_id, @clientRow + 1, 1) AS clientRow
-- , @prevClient := t.client_id AS extraField
-- This may be more reliable as well; I am not sure if the order
-- of evaluation of IF(,,) is reliable enough to guarantee
-- no side effects in the non-"alternatively" clientRow calculation.
FROM tbl AS t
INNER JOIN (
SELECT client_id, COUNT(*) AS c
FROM tbl
GROUP BY client_id
) AS cc ON t.client_id = cc.client_id
INNER JOIN (select @counter := 0, @prevClient := -1, @clientRow := 0) AS initvar ON 1 = 1
WHERE t.client_id = 55
HAVING clientRow * 5 < cc.c -- You can use a HAVING without a GROUP BY in MySQL
-- (note that clientRow is derived, so you cannot use it in the `WHERE`)
ORDER BY t.client_id, t.ordcolumn
;

calculating median for data mysql

I am trying to calculate median of time spent by people on a specific category. The whole dataset I have is around 500k rows but I have tried to summarize a snippet of it below
person category time spent (in mins)
roger dota 20
jim dota 50
joe call of duty 5
jim fallout 25
kathy GTA 40
alicia fallout 100
I have tried to use the query below, but I am getting nowhere.
SELECT x1.person, x1.time spent
from data x1, data x2
GROUP BY x1.val
HAVING SUM(SIGN(1-SIGN(x2.val-x1.val))) = (COUNT(*)+1)/2
A self-join on 500,000 rows is likely to be expensive. Why not just enumerate the rows and grab the one in the middle?
select d.*
from (select d.*, (@rn := @rn + 1) as rn
from data d cross join
(select @rn := 0) params
order by d.val
) d
where 2*rn in (@rn, @rn + 1);
The odd-looking where clause picks the row in the middle. With an even number of rows it is only an approximation, because it takes the lower of the two middle rows. Because you want the actual row values, you need the approximation. The normal calculation of the median itself would be:
select avg(d.val)
from (select d.*, (@rn := @rn + 1) as rn
from data d cross join
(select @rn := 0) params
order by d.val
) d
where 2*rn in (@rn, @rn + 1, @rn + 2);
EDIT:
The same logic works per person as well, but it needs a join to get each person's row count:
select d.person, avg(d.val) as median
from (select d.*,
(@rn := if(@p = person, @rn + 1,
if(@p := person, 1, 1)
)
) as rn
from data d cross join
(select @rn := 0, @p := '') params
order by person, d.val
) d join
(select person, count(*) as cnt
from data
group by person
) p
on d.person = p.person
where 2*rn in (p.cnt, p.cnt + 1, p.cnt + 2)
group by d.person;
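The row selection `2*rn in (n, n + 1)` used in the first query of this answer can be checked with a short Python sketch, where n is the total row count and rn the 1-based position in sorted order. The function name `middle_row` is hypothetical. The comparison against `statistics.median_low` is there only to confirm that the filter picks the middle value, or the lower of the two middle values when n is even.

```python
import statistics

def middle_row(values):
    """Return the value whose 1-based rank rn satisfies 2*rn in (n, n + 1)."""
    ordered = sorted(values)  # mirrors "order by d.val"
    n = len(ordered)          # final value of the row counter
    picked = [v for rn, v in enumerate(ordered, start=1)
              if 2 * rn in (n, n + 1)]
    return picked[0]          # exactly one rank matches for any n >= 1

times_even = [20, 50, 5, 25, 40, 100]  # "time spent" values from the question
times_odd = [20, 50, 5, 25, 40]

print(middle_row(times_even), statistics.median_low(times_even))
print(middle_row(times_odd), statistics.median_low(times_odd))
```

For an odd n only `2*rn = n + 1` can match, and for an even n only `2*rn = n` can match, which is why the filter always returns a single row.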

MySQL Query get the last N rows per Group

Suppose that I have a database which contains the following columns:
VehicleID|timestamp|lat|lon|
The same VehicleId may appear multiple times, each time with a different timestamp, so (VehicleId, timestamp) is the primary key.
Now I would like to get the last N measurements per VehicleId, or the first N measurements per VehicleId.
How can I list the last N rows per VehicleId according to an ordering column (in our case, timestamp)?
Example:
|VehicleId|Timestamp|
1|1
1|2
1|3
2|1
2|2
2|3
5|5
5|6
5|7
In MySQL, this is most easily done using variables:
select t.*
from (select t.*,
(@rn := if(@v = VehicleId, @rn + 1,
if(@v := VehicleId, 1, 1)
)
) as rn
from table t cross join
(select @v := -1, @rn := 0) params
order by VehicleId, timestamp desc
) t
where rn <= 3;

I need to find any 5 rows that match where clause and they occur "in a row" (they are neighbors)

I have a MySQL table for fictional fitness app.
Let's say that app is monitoring user progress on doing pushups day by day.
TrainingDays
id | id_user | date | number_of_pushups
Now, I need to find out whether a user has ever managed to do more than 100 pushups 5 days in a row.
I know this is probably doable by fetching all days and then making some php loops, but I wonder if there is possibility to do this in plain mysql...
In MySQL, the easiest way is to use variables. The following gets all sequences of days with 100 or more pushups:
select id_user, grp, count(*) as numdaysinarow
from (select (date - interval rn day) as grp, td.*
from (select td.*,
(@rn := if(@i = id_user, @rn + 1,
if(@i := id_user, 1, 1)
)
) as rn
from trainingdays td cross join
(select @rn := 0, @i := NULL) vars
where number_of_pushups >= 100
order by id_user, date
) td
) td
group by id_user, grp;
This uses the observation that when you subtract a sequence of numbers from a series of dates that increment, then the resulting value is constant.
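The observation can be illustrated outside the database with a small Python sketch for a single user: subtracting the 1-based row number from each qualifying date yields the same value for every day in an unbroken run, so grouping by that value groups the runs. The function name `longest_streak` is hypothetical, and the sketch assumes at most one record per day.

```python
from datetime import date, timedelta
from itertools import groupby

def longest_streak(days):
    """days: dates on which one user reached the pushup threshold.
    Returns the length of the longest run of consecutive days."""
    ordered = sorted(set(days))
    # date minus row number is constant within a run of consecutive dates
    keys = [d - timedelta(days=rn) for rn, d in enumerate(ordered, start=1)]
    # keys never decrease for sorted dates, so equal keys are always adjacent
    return max((len(list(group)) for _, group in groupby(keys)), default=0)

qualifying = [date(2024, 1, d) for d in (1, 2, 3, 5, 6, 7, 8, 9, 11)]
print(longest_streak(qualifying))
```

Here the days 1 to 3 share one key, the days 5 to 9 share another, and day 11 stands alone, so the longest run has length 5.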
To determine if there are 5 or more days in a row, use max():
select max(numdaysinarow)
from (select id_user, grp, count(*) as numdaysinarow
from (select (date - interval rn day) as grp, td.*
from (select td.*,
(@rn := if(@i = id_user, @rn + 1,
if(@i := id_user, 1, 1)
)
) as rn
from trainingdays td cross join
(select @rn := 0, @i := NULL) vars
where number_of_pushups >= 100
order by id_user, date
) td
) td
group by id_user, grp
) td;
Your app can then check the value against whatever minimum you like.
Note: this assumes that there is only one record per day. The above can easily be modified if you are looking for the sum of the number of pushups on each day.
The order of records shouldn't be relied on; ORDER BY, for example, can change the sequence.
However, a database gives you many functions that let you do more in SQL and less in PHP. What you want here is the SUM function. Combined with a WHERE clause, this should get you started:
SELECT SUM(number_of_pushups) AS sum_pushups
FROM TrainingDays
WHERE date >= :start_day
AND id_user = :user_id