I have a database table of temperature readings taken at many locations in a number of buildings. I need a query that returns true or false depending on whether more than 10% of the readings taken in a building on a given date are greater than X.
I am not looking for an average. If there are 100 measurements taken in a building on a date, and more than 10 of them are over X (say 80 degrees), then create a flag.
The table is laid out as
| Building # | Location # | Date       | Temperature |
| 123        | 555        | 2016-04-08 | 68.5        |
| 123        | 556        | 2016-04-08 | 70.2        |
| 123        | 557        | 2016-04-08 | 65.4        |
| 888        | 999        | 2013-03-22 | 80.4        |
Typically a building would have over 100 readings. There are many hundreds of building/date entries in the table.
Can this be done with a single MySQL query, and can you share that query with me?
I obviously haven't made my question clear.
The result I am looking for is a single True or False.
If more than 10% of the results for a building/date combination were over X (say 80), then show true, or some flag equal to true.
The known fields will be building and date. The location is not relevant and can be ignored. So, given the inputs building (123) and date (2016-04-08), are more than 10% of the entries in the table that have that building number and date greater than X (e.g. 80)? The only data to be tested are those for that building and date. So the query would end in:
where building_id = 123 AND date = '2016-04-08'
I am NOT looking for an average or a median. I am NOT looking to see a list of the data for that 10%. I am just looking for true or false.
You can use conditional aggregation, something like this:
select building, date,
(case when avg(temperature > x) > 0.1 then 'Y' else 'N' end) as flag
from t
group by building, date;
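As a rough check of the conditional-aggregation idea, here is a small Python sketch that runs the same query against an in-memory SQLite database. SQLite stands in for MySQL only because it ships with Python; the sample readings and the threshold of 80 are made up for illustration. Both engines evaluate `temperature > x` to 1 or 0, so `AVG` of that expression is the fraction of readings over the threshold.

```python
import sqlite3

def building_flags(readings, x):
    """Return (building, date, flag) rows; flag is 'Y' when more than 10% of readings exceed x."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE t (building INTEGER, location INTEGER, date TEXT, temperature REAL)"
    )
    conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", readings)
    sql = """
        SELECT building, date,
               CASE WHEN AVG(temperature > :x) > 0.1 THEN 'Y' ELSE 'N' END AS flag
        FROM t
        GROUP BY building, date
        ORDER BY building, date
    """
    result = conn.execute(sql, {"x": x}).fetchall()
    conn.close()
    return result

# Hypothetical sample: building 123 has 2 of 10 readings over 80, building 888 has 1 of 20.
sample = [(123, i, "2016-04-08", temp)
          for i, temp in enumerate([68.5, 70.2, 65.4, 81.0, 82.5, 60, 61, 62, 63, 64])]
sample += [(888, i, "2016-04-08", temp)
           for i, temp in enumerate([80.4] + [70.0] * 19)]

print(building_flags(sample, 80))
```

With this sample, building 123 (20% of readings over 80) is flagged and building 888 (5%) is not.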
To return building and date, and "create a flag" for rows where more than 10% of the readings for that building on that date are over a given value X ...
SELECT r.building
, DATE(r.date)
, ( SUM(r.reading > X ) > SUM(.10) ) AS _flag
FROM myreadings r
GROUP BY r.building, DATE(r.date)
Absent a more precise specification of the result set you want returned, this is only a guess.
FOLLOWUP
Based on the update to the question: to return a row for a single building and a single date, add a WHERE clause like the one shown in the question, and remove the building and date expressions from the SELECT list.
SELECT ( SUM(r.reading > X ) > SUM(.10) ) AS _flag
FROM myreadings r
WHERE r.building = '123'
AND r.date >= '2016-04-08'
AND r.date < '2016-04-08' + INTERVAL 1 DAY
If there are no rows for the given building and date, the query still returns a single row, because an aggregate query without GROUP BY always returns one row; SUM() over no rows is NULL, so _flag is NULL. If there is at least one row, and the number of rows that have a reading greater than X is more than 10% of the total number of rows, _flag has a value of 1 (TRUE). Otherwise, _flag has a value of 0 (FALSE).
If you want 0 rather than NULL when there are no matching rows, wrap the expression in IFNULL( ... , 0).
If you want the query to return string values 'TRUE' or 'FALSE', that can be accomplished as well.
Again, without an example of the expected result set to compare against, this is only a guess.
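To illustrate the single-building query and the half-open date range, here is a hedged Python sketch using an in-memory SQLite database in place of MySQL. The sample rows are invented, the column is assumed to hold a date and time as text, and SQLite's `date(:day, '+1 day')` is used as a stand-in for MySQL's `+ INTERVAL 1 DAY`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myreadings (building TEXT, date TEXT, reading REAL)")

# Hypothetical data: ten readings on 2016-04-08 (two over 80),
# plus hot readings on the next day that must not be counted.
day_one = [68.5, 70.2, 65.4, 81.0, 82.5, 60.0, 61.0, 62.0, 63.0, 64.0]
rows = [("123", "2016-04-08 %02d:00:00" % hour, temp) for hour, temp in enumerate(day_one)]
rows += [("123", "2016-04-09 %02d:00:00" % hour, 95.0) for hour in range(5)]
conn.executemany("INSERT INTO myreadings VALUES (?, ?, ?)", rows)

def flag_for(building, day, x):
    """Return 1 or 0 for one building and one calendar day."""
    sql = """
        SELECT SUM(r.reading > :x) > SUM(0.10) AS _flag
        FROM myreadings r
        WHERE r.building = :building
          AND r.date >= :day
          AND r.date <  date(:day, '+1 day')
    """
    params = {"x": x, "building": building, "day": day}
    return conn.execute(sql, params).fetchone()[0]

print(flag_for("123", "2016-04-08", 80))
print(flag_for("123", "2016-04-08", 90))
```

`SUM(0.10)` adds 0.10 once per row, so it equals 10% of the row count; comparing it with the count of readings over X avoids a division. The range comparison keeps every timestamp on the requested day and excludes the following day.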
I need to calculate the average value of fields, but two things need to happen:
1- Empty values should NOT be counted in the average.
2- If the field is empty, it must still be shown in the result (with avg = 0).
Imagine that I have this dataset:
-----------------------
Code | valField | Date
-----------------------
A | | 2020-09-08
B | 12 | 2020-09-09
A | 10 | 2020-09-08
B | 15 | 2020-09-09
B | | 2020-09-09
C | | 2020-09-09
So I need the average of the day. As you can see, we have:
A = { empty, 10 }
B = { 12, 15, empty }
C = { empty }
I need to make the average like this:
Average of A = 10
Average of B = (12+15)/2 (because we have 2 non-empty values)
Average of C = 0 (it has no values at all, but I need it to show in the result as 0)
So far I can meet each requirement, but not both at the same time.
This query shows the empty values BUT also counts empty fields in the average:
SELECT AVG(valField) FROM myTable;
So the average of B would be (12+15+0)/3, which is wrong!
This one ignores empty values, so the average is correct, but C is NOT shown:
SELECT AVG(valField) FROM myTable WHERE valField <> '';
How may I accomplish both requirements?
From your comment I understand that valField is defined as varchar, so you can use the following trick:
select
Code,
coalesce(avg(nullif(valField, '')), 0) as avg_value
from tbl
group by Code;
Test the query on SQLize.online
Here I used the NULLIF function to convert empty values to NULL before calculating the average.
I think you want:
SELECT code, COALESCE(AVG(valField), 0) FROM myTable GROUP BY code
This assumes valField is of a numeric datatype, and that by empty you mean null.
Here is what happens under the hood:
avg(), as most other aggregate functions, ignores null values
if all values are null, then avg() does return null; you can replace that with 0 using coalesce()
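The two points above can be checked with a short Python sketch against an in-memory SQLite database (a stand-in for MySQL; both ignore NULLs in `AVG`). It assumes, as this answer does, that valField is numeric and that "empty" means NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (Code TEXT, valField REAL, Date TEXT)")
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?, ?)",
    [
        ("A", None, "2020-09-08"),
        ("B", 12, "2020-09-09"),
        ("A", 10, "2020-09-08"),
        ("B", 15, "2020-09-09"),
        ("B", None, "2020-09-09"),
        ("C", None, "2020-09-09"),
    ],
)

# AVG skips the NULL rows; COALESCE turns the all-NULL group (C) into 0.
averages = conn.execute(
    "SELECT Code, COALESCE(AVG(valField), 0) FROM myTable GROUP BY Code ORDER BY Code"
).fetchall()
print(averages)
```

With the question's data, A averages to 10, B to 13.5, and C is still listed with 0.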
That should be easy: just write two queries, one that calculates the average over the non-null values and another that finds the codes with no values in the data, and combine them:
select round(avg(valField)) as avg, code from new where valField is not null group by Code
union all
select 0 as avg, code from new group by Code having avg(valField) is null;
In a MySQL database, prices are stored in a way like this:
98.06K
97.44K
929.14K
91.87K
2.66M
146.64K
14.29K
When I try to sort by price ASC or DESC, it returns an unexpected result.
Please suggest how I can sort by price when prices are stored in a form like
10K, 20M, 1.6B
The result I want is:
14.29K
91.87K
97.44K
98.06K
146.64K
929.14K
2.66M
MySQL ignores trailing non-digits when casting string to numeric. This will return the correct price:
price *
case right(price,1)
when 'K' then 1000
when 'M' then 1000000
when 'B' then 1000000000
else 1
end
Of course, you can order by this, but you had better apply it at load time and store the price in a numeric column.
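As a sketch of ordering by the converted value, the Python snippet below runs against an in-memory SQLite database. It does not rely on MySQL's implicit string-to-number cast; instead it strips the last character explicitly, which assumes every stored price ends in a one-letter suffix. The table name `prices` is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (price TEXT)")
conn.executemany(
    "INSERT INTO prices VALUES (?)",
    [(p,) for p in ["98.06K", "97.44K", "929.14K", "91.87K", "2.66M", "146.64K", "14.29K"]],
)

# Numeric part times the factor implied by the trailing letter.
sql = """
    SELECT price
    FROM prices
    ORDER BY CAST(substr(price, 1, length(price) - 1) AS REAL)
             * CASE substr(price, -1)
                   WHEN 'K' THEN 1000
                   WHEN 'M' THEN 1000000
                   WHEN 'B' THEN 1000000000
                   ELSE 1
               END
"""
sorted_prices = [row[0] for row in conn.execute(sql)]
print(sorted_prices)
```

The result lists the K prices in ascending order followed by 2.66M, matching the order asked for in the question.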
The problem lies in your data model. I understand that 2.66M is not necessarily exactly 2,660,000, which is why you don't want to store the whole number, but store '2.66M' instead to indicate the precision. This, however, is two pieces of information: the value and the precision, so use two columns:
mytable
value | unit
-------+-----
98.06 | K
97.44 | K
929.14 | K
91.87 | K
2.66 | M
146.64 | K
14.29 | K
Along with a lookup table:
units
unit | factor
-----+--------
K | 1000
M | 1000000
A possible query would be:
select *
from mytable
join units using (unit)
order by mytable.value * units.factor;
where you may want to extend the ORDER BY clause to something like
order by mytable.value * units.factor, units.factor;
or apply some rounding, for example, to account for the precision of two seemingly equal values.
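A minimal Python sketch of the two-table design, again using an in-memory SQLite database as a stand-in for MySQL. The table and column names follow the layout above; the column types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE units (unit TEXT PRIMARY KEY, factor INTEGER);
    CREATE TABLE mytable (value REAL, unit TEXT REFERENCES units(unit));
    INSERT INTO units VALUES ('K', 1000), ('M', 1000000);
    INSERT INTO mytable VALUES
        (98.06, 'K'), (97.44, 'K'), (929.14, 'K'), (91.87, 'K'),
        (2.66, 'M'), (146.64, 'K'), (14.29, 'K');
""")

# Sort by the real magnitude; the unit factor breaks ties between equal magnitudes.
ordered = conn.execute("""
    SELECT mytable.value, unit
    FROM mytable
    JOIN units USING (unit)
    ORDER BY mytable.value * units.factor, units.factor
""").fetchall()
print(ordered)
```

Supporting a new unit such as B then only needs one extra row in `units`, with no change to the query.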
It is possible, though not advisable:
https://dbfiddle.uk/?rdbms=mariadb_10.3&fiddle=0a837287c7646823fa6657706f9ae634
SELECT *
, CAST(LEFT(price, LENGTH(price) - 1) AS DECIMAL(10,2)) AS value
, RIGHT(price, 1) AS unit
, CASE RIGHT(price,1)
WHEN 'K' THEN 1000
WHEN 'M' THEN 1000000
WHEN 'B' THEN 1000000000
ELSE 1
END AS amount
FROM test1
ORDER BY amount, value;
Why not advisable? As the Explain in the dbfiddle shows, this query uses filesort for sorting, which is not very fast. If you do not have too many rows in your data, this should be no problem though.
I have a table that, for each ID, holds data in several bucket fields. I want a function that returns the sum of a range of buckets, with the start and end bucket fields passed as parameters.
So, if I had a table like this:
ID Bucket0 Bucket30 Bucket60 Bucket90 Bucket120
10 5.00 12.00 10.00 0.0 8.00
If I send in the ID and the parameters Bucket0, Bucket0, it would return only the value in the Bucket0 field: 5.00
If I send in the ID and the parameters Bucket30, Bucket120, it would return the sum of the buckets from 30 to 120, or (12+10+0+8) 30.00.
Is there a nicer way to write this other than a huge ugly
if parameter1=bucket0 and parameter2=bucket0
then select bucket0
else if parameter1=bucket0 and parameter2=bucket1
then select bucket0 + bucket1
else if parameter1=bucket0 and parameter2=bucket2
then select bucket0 + bucket1 + bucket2
and so on?
The table already exists, so I don't have a lot of control over that. I can make my parameters for the function however I want. I can safely say that if a set of buckets are wanted, none in the middle will be skipped, so specifying start and end buckets would work. I could have a single comma delimited string of all buckets wanted.
It would have been better if your table had been normalised, like this:
id | bucket | value
---+-----------+------
10 | bucket000 | 5
10 | bucket030 | 12
10 | bucket060 | 10
10 | bucket090 | 0
10 | bucket120 | 8
Also, the buckets should ideally have names that are easy to compare in ranges, so that bucket030 comes between bucket000 and bucket120 in the normal alphabetical order, which is not the case if you leave out the padded zeroes.
If the above normalisation is not possible, then use an unpivot clause to turn your current table into the structure depicted above:
select id, sum(value)
from (
select *
from mytable
unpivot (value for bucket_id in (bucket0 as 'bucket000',
bucket30 as 'bucket030',
bucket60 as 'bucket060',
bucket90 as 'bucket090',
bucket120 as 'bucket120'))
) normalised
where bucket_id between 'bucket000' and 'bucket060'
group by id
When you do this with parameter variables, make sure those parameters have the padded zeroes as well.
You could for instance ensure that as follows for parameter1:
if parameter1 like 'bucket%' then
parameter1 := 'bucket' || lpad(+substr(parameter1, 7), 3, '0');
end if;
...etc.
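The same unpivot-then-filter idea can be sketched in Python with an in-memory SQLite database. SQLite has no UNPIVOT clause, so this version swaps in a UNION ALL subquery to build the normalised shape; the helper names `pad_bucket` and `bucket_sum` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (id INTEGER, bucket0 REAL, bucket30 REAL,"
    " bucket60 REAL, bucket90 REAL, bucket120 REAL)"
)
conn.execute("INSERT INTO mytable VALUES (10, 5.00, 12.00, 10.00, 0.0, 8.00)")

def pad_bucket(name):
    """Normalise e.g. 'Bucket30' to 'bucket030' so names compare correctly as strings."""
    number = int(name.lower()[len("bucket"):])
    return "bucket%03d" % number

def bucket_sum(row_id, first, last):
    """Sum the buckets from first to last (inclusive) for one id."""
    sql = """
        SELECT SUM(value) FROM (
            SELECT id, 'bucket000' AS bucket_id, bucket0 AS value FROM mytable
            UNION ALL SELECT id, 'bucket030', bucket30 FROM mytable
            UNION ALL SELECT id, 'bucket060', bucket60 FROM mytable
            UNION ALL SELECT id, 'bucket090', bucket90 FROM mytable
            UNION ALL SELECT id, 'bucket120', bucket120 FROM mytable
        ) AS normalised
        WHERE id = ? AND bucket_id BETWEEN ? AND ?
    """
    params = (row_id, pad_bucket(first), pad_bucket(last))
    return conn.execute(sql, params).fetchone()[0]

print(bucket_sum(10, "Bucket0", "Bucket0"))
print(bucket_sum(10, "Bucket30", "Bucket120"))
```

For the sample row this gives 5.0 for Bucket0 alone and 30.0 for Bucket30 through Bucket120, as in the question.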
I have a table, which includes the following columns and data:
id dtime instance data dtype
1 2012-10-22 10000 d 1
2 2012-10-22 10000 d 1
..
7 2012-10-22 10004 d 1
..
15 2012-10-22 10000 # 1
16 2012-10-22 10004 d 1
17 2012-10-22 10000 d 1
I want to group sequences of 'd's in the data column, with the '#' at the end of the sequence.
This could have been done by grouping via the instance column, which is an individual stream of data, however there can be multiple sequences within the stream.
I also want to end a sequence if there are no data columns in the same instance for, say, 3 seconds after the last data of that instance and no '#'s have been found within that interval.
I have managed to do exactly this using cursors and while loops, which worked reasonably well for tables with 1000s of rows, however this query will be used on far more rows eventually, and these two methods would take around a minute with a dataset of just 3-5000 rows.
Reading on this website and others, it seems that set-based logic may be the way to go, however I can think of no way to do what I need without some kind of loop on each row that compares it to every other to build the 'sequences'.
If anyone could help, or point me in the direction of something that could, it would be greatly appreciated. :)
I would ideally like the data to be output in the following format:
datacount instance lastdata dtime
20 10000 # 2012-10-22
19 10000 d 2012-10-22
22 10004 # 2012-10-22
20 10022 # 2012-10-22
Where (datacount) is a count of the number of rows in a 'sequence' (which is the data leading up to a '#' or 3 second delay), (instance) is the instance ID from the original table, (lastdata) is the last data value in the sequence, (dtime) is the datetime value of the last data value.
Let me show you how to do this for the final '#'. The time difference follows a similar idea. The key idea is to get the next '#' after the current row. For this you need a correlated subquery. After that, you can do a group by:
select groupid, count(*) as NumInSeq, max(dtime) as LastDateTime
from (select t.*,
(select min(t2.id) from t t2 where t2.id > t.id and t2.data = '#'
) as groupid
from t
) t
group by groupid
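Here is a hedged Python sketch of the next-'#' grouping on a tiny invented data set, run against in-memory SQLite. It deviates from the query above in two labelled ways, both assumptions about the intended result: it uses `>=` so that each '#' row is counted with the sequence it ends, and it matches on `instance` so that interleaved streams are kept apart.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, dtime TEXT, instance INTEGER, data TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", [
    (1, "2012-10-22 10:00:00", 10000, "d"),
    (2, "2012-10-22 10:00:01", 10000, "d"),
    (3, "2012-10-22 10:00:01", 10004, "d"),
    (4, "2012-10-22 10:00:02", 10000, "#"),
    (5, "2012-10-22 10:00:02", 10004, "d"),
    (6, "2012-10-22 10:00:03", 10004, "#"),
    (7, "2012-10-22 10:00:04", 10000, "d"),
])

# groupid = id of the next '#' at or after the row, within the same instance.
# Rows with no later '#' get a NULL groupid (an unterminated sequence).
sql = """
    SELECT groupid, instance, COUNT(*) AS NumInSeq, MAX(dtime) AS LastDateTime
    FROM (SELECT t.*,
                 (SELECT MIN(t2.id)
                  FROM t t2
                  WHERE t2.id >= t.id
                    AND t2.instance = t.instance
                    AND t2.data = '#'
                 ) AS groupid
          FROM t
         ) AS g
    GROUP BY groupid, instance
"""
sequences = conn.execute(sql).fetchall()
for row in sequences:
    print(row)
```

Each instance here yields one three-row sequence ending in '#', and the trailing 'd' row of instance 10000 forms an open sequence of its own.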
Handling the time sequence is a bit more complicated. It is something like this:
select groupid, count(*) as NumInSeq, max(dtime) as LastDateTime,
(case when sum(case when data = '#' then 1 else 0 end) > 0 then '#' else 'd' end) as FinalData
from (select t.*,
(select min(t2.id)
from t t2
where t2.id > t.id and
(t2.data = '#' or UNIX_TIMESTAMP(t2.dtime) - UNIX_TIMESTAMP(t.dtime) < 3)
) as groupid
from t
) t
group by groupid
I have a table code_prices that looks something like this:
CODE | DATE | PRICE
ABC | 25-7-2011 | 2.81
ABC | 23-7-2011 | 2.52
ABC | 22-7-2011 | 2.53
ABC | 21-7-2011 | 2.54
ABC | 20-7-2011 | 2.58
ABC | 17-7-2011 | 2.42
ABC | 16-7-2011 | 2.38
The problem with the data set is that there are gaps in the dates, so I may want to look up the price of item ABC on the 18th, but there is no entry because the item wasn't sold on that date. In that case I would like to return the most recent historical entry for the price.
Say I query on the date 19-7-2011: I would like to return the entry for the 17th, then the next 10 available entries.
If however I query for the price of ABC on the 20th, I would want to return the price on the 20th and the next 10 prices after that...
What is the most efficient way to go about this, either in a SQL statement or using a stored proc?
I can think of just writing a stored proc which takes the date as a param and then queries for all rows where DATE >= QUERY-DATE, ordering by the date and selecting 11 items (via LIMIT). Then I basically need to see whether that set contains the query date. If it does, return it; otherwise I will need to return the 10 earliest entries out of those 11 and also do another query on the table to return the previous entry by getting the max date where date < QUERY-DATE. I am thinking there might be a better way, however I'm not an expert with SQL (clearly)...
Thanks!
This is for one specific code:
SELECT code, `date`, price
FROM code_prices
WHERE code = @inputCode
AND `date` >=
( SELECT MAX(`date`)
FROM code_prices
WHERE code = @inputCode
AND `date` <= @inputDate
)
ORDER BY `date`
LIMIT 11
For ABC and 19-7-2011, the above will give you the row for 17-7-2011 and the 10 subsequent rows (20-7-2011, 21-7-2011, etc.).
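The query above can be tried out with the following Python sketch on an in-memory SQLite database. It assumes the dates are stored in ISO `YYYY-MM-DD` form so that they sort correctly as text, and the helper name `prices_from` is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE code_prices (code TEXT, date TEXT, price REAL)")
conn.executemany("INSERT INTO code_prices VALUES (?, ?, ?)", [
    ("ABC", "2011-07-25", 2.81), ("ABC", "2011-07-23", 2.52), ("ABC", "2011-07-22", 2.53),
    ("ABC", "2011-07-21", 2.54), ("ABC", "2011-07-20", 2.58), ("ABC", "2011-07-17", 2.42),
    ("ABC", "2011-07-16", 2.38),
])

def prices_from(code, query_date, limit=11):
    """Rows starting at the latest entry on or before query_date, then the following ones."""
    sql = """
        SELECT code, date, price
        FROM code_prices
        WHERE code = :code
          AND date >= (SELECT MAX(date)
                       FROM code_prices
                       WHERE code = :code AND date <= :query_date)
        ORDER BY date
        LIMIT :limit
    """
    params = {"code": code, "query_date": query_date, "limit": limit}
    return conn.execute(sql, params).fetchall()

print(prices_from("ABC", "2011-07-19"))
print(prices_from("ABC", "2011-07-20"))
```

Querying for the 19th starts at the entry for the 17th, while querying for the 20th starts at the 20th itself. If no entry exists on or before the query date, the subquery yields NULL and no rows come back.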
I'm not entirely clear on what you want to achieve, but I'll have a go anyway. This searches for the ID of the row that contains a date less than or equal to your specified date. It then uses that ID to return all rows with an ID greater than or equal to that value. It assumes that you have a column other than the date column on which the rows can be ordered. This is because you said that the dates are non-linear - I assume that you must have some other way of ordering the rows.
SELECT id, code, dt, price
FROM code_prices
WHERE id >= (
SELECT id
FROM code_prices
WHERE dt <= '2011-07-24'
ORDER BY dt DESC
LIMIT 1 )
ORDER BY id
LIMIT 11;
Alternative with code condition - thanks to @ypercube for highlighting that ;-)
SELECT id, code, dt, price
FROM code_prices
WHERE code = 'ABC'
AND id >= (
SELECT id
FROM code_prices
WHERE dt <= '2011-07-23'
AND code = 'ABC'
ORDER BY dt DESC
LIMIT 1 )
ORDER BY id
LIMIT 11;