Let's say I have the following table:
+----+-------+
| Id | Value |
+----+-------+
| 1 | 2.0 |
| 2 | 8.0 |
| 3 | 3.0 |
| 4 | 9.0 |
| 5 | 1.0 |
| 6 | 4.0 |
| 7 | 2.5 |
| 8 | 6.5 |
+----+-------+
I want to plot these values, but since my real table has thousands of values, I thought about getting an average for each X rows. Is there any way for me to do so for, e.g., each 2 or 4 rows, like below:
2
+-----+------+
| 1-2 | 5.0 |
| 3-4 | 6.0 |
| 5-6 | 2.5 |
| 7-8 | 4.5 |
+-----+------+
4
+-----+------+
| 1-4 | 5.5 |
| 5-8 | 3.5 |
+-----+------+
Also, is there any way to make this X value dynamic, based on the total number of rows in my table? Something like, if I have 1000 rows, the average will be calculated based on each 200 rows (1000/5), but if I have 20, calculate it based on each 4 rows (20/5).
I know how to do that programmatically, but is there any way to do so using pure SQL?
EDIT: I need it to work on MySQL.
Depending on your DBMS, something like this will work:
SELECT
ChunkStart = Min(Id),
ChunkEnd = Max(Id),
Value = Avg(Value)
FROM
(
SELECT
Chunk = NTILE(5) OVER (ORDER BY Id),
*
FROM
YourTable
) AS T
GROUP BY
Chunk
ORDER BY
ChunkStart;
This creates 5 groups or chunks no matter how many rows there are, as you requested.
If you have no windowing functions you can fake it:
SELECT
ChunkStart = Min(Id),
ChunkEnd = Max(Id),
Value = Avg(Value)
FROM
YourTable
GROUP BY
(Id - 1) / (((SELECT Count(*) FROM YourTable) + 4) / 5)
;
I made some assumptions here: that Id begins at 1 with no gaps, and that you would want the last group to be too small rather than too big if the rows don't divide evenly. I also assumed that the division is integer division, as in MS SQL Server. In MySQL, / returns a decimal, so you would use DIV instead.
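The grouping trick above can be tried out with a minimal sketch like the following. It assumes the sample table from the question (named YourTable here, as in the answer), Ids running 1..N without gaps, and it runs the SQL through Python's built-in sqlite3 module only to stay self-contained. The helper names chunk_averages and dynamic_chunk_size are made up for illustration. SQLite truncates integer division with /, whereas MySQL would need DIV.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (Id INTEGER PRIMARY KEY, Value REAL)")
conn.executemany(
    "INSERT INTO YourTable (Id, Value) VALUES (?, ?)",
    [(1, 2.0), (2, 8.0), (3, 3.0), (4, 9.0),
     (5, 1.0), (6, 4.0), (7, 2.5), (8, 6.5)],
)


def chunk_averages(conn, chunk_size):
    """Average Value over consecutive runs of chunk_size Ids (hypothetical helper)."""
    # Integer / integer truncates in SQLite; in MySQL write (Id - 1) DIV ? instead.
    sql = """
        SELECT MIN(Id), MAX(Id), AVG(Value)
        FROM YourTable
        GROUP BY (Id - 1) / ?
        ORDER BY MIN(Id)
    """
    return conn.execute(sql, (chunk_size,)).fetchall()


def dynamic_chunk_size(conn, groups=5):
    """Rows per chunk so the table splits into at most `groups` chunks (hypothetical helper)."""
    (total,) = conn.execute("SELECT COUNT(*) FROM YourTable").fetchone()
    return max(1, (total + groups - 1) // groups)  # ceiling division


print(chunk_averages(conn, 2))
print(chunk_averages(conn, 4))
print(chunk_averages(conn, dynamic_chunk_size(conn)))
```

Computing the chunk size in a separate step keeps the GROUP BY expression simple; it could equally be inlined as a scalar subquery, as the answer does.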
You can use the modulo operator to act on every Nth row of the table. This example would get the average value for every 10th row:
select avg(Value) from some_table where id % 10 = 0;
You could then do a count of the rows in the table, apply some factor to that, and use that value as a dynamic interval:
select avg(Value) from some_table where id % (select round(count(*)/1000) from some_table) = 0;
You'll need to figure out the best interval based on the actual number of rows you have in the table of course.
EDIT:
Rereading your post, I realize this gets an average of every Nth row, not of each N sequential rows. I'm not sure if this would suffice, or if you specifically need sequential averages.
Look at the NTILE function (as in quartile, quintile, decile, percentile). You can use it to split your data evenly into a number of buckets - in your case it seems you would like five.
Then you can use AVG to calculate an average for each bucket.
NTILE is one of the SQL standard's window functions, so most DBMSes have it. Note that MySQL only added it in version 8.0.
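As a rough illustration of the NTILE plus AVG idea, here is a minimal sketch run through Python's sqlite3 module. It assumes the sample data from the question, a table named YourTable, and an SQLite build with window functions (3.25 or later). The helper name bucket_averages is invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (Id INTEGER PRIMARY KEY, Value REAL)")
conn.executemany(
    "INSERT INTO YourTable (Id, Value) VALUES (?, ?)",
    [(1, 2.0), (2, 8.0), (3, 3.0), (4, 9.0),
     (5, 1.0), (6, 4.0), (7, 2.5), (8, 6.5)],
)


def bucket_averages(conn, buckets):
    """Split rows into `buckets` groups by Id order and average each (hypothetical helper)."""
    # int() guarantees that only a number is interpolated into the SQL text.
    sql = f"""
        SELECT MIN(Id), MAX(Id), AVG(Value)
        FROM (
            SELECT Id, Value, NTILE({int(buckets)}) OVER (ORDER BY Id) AS bucket
            FROM YourTable
        ) AS t
        GROUP BY bucket
        ORDER BY MIN(Id)
    """
    return conn.execute(sql).fetchall()


print(bucket_averages(conn, 4))
print(bucket_averages(conn, 5))
```

When the row count does not divide evenly, NTILE makes the earlier buckets one row larger than the later ones, so no explicit chunk-size arithmetic is needed.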
You can try this:
CREATE TABLE #YourTable
(
ID int
,[Value] float
)
INSERT #YourTable (ID, [Value]) VALUES
(1,2.0)
,(2,8.0)
,(3,3.0)
,(4,9.0)
,(5,1.0)
,(6,4.0)
,(7,2.5)
,(8,6.5)
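SELECT
ID = CONVERT(VARCHAR(10), MIN(ID)) + '-' + CONVERT(VARCHAR(10), MAX(ID)) -- convert after aggregating so MIN/MAX compare numbers, not strings
,[Value] = AVG([Value])
FROM
(
SELECT
GRP = ((ROW_NUMBER() OVER(ORDER BY ID) -1) / 2) + 1 -- 2 = number of rows per group
,ID
,[Value]
FROM
#YourTable
) GrpTable
GROUP BY
GRP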
DROP TABLE #YourTable
The table
The query
SELECT
id, MAX(fecha_hora_carga) AS fecha_hora_carga
FROM
calibraciones_instrumentos
GROUP BY
instrumento_id
The result
It's returning the most recent fecha_hora_carga dates, but the ids are 24 and 28... I think they should be 27 and 29!
Why are the ids not corresponding with the date?
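The problem is that MySQL, when ONLY_FULL_GROUP_BY is disabled, lets you select columns that are neither aggregated nor listed in the GROUP BY.
It computes the MAX for each group, but the values of the other selected columns come from an arbitrary row of that group, not necessarily the row that holds the maximum.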
To get what you want, you have to use subqueries to pull the data you want.
Here is an example:
SELECT
t1.id,
t1.fecha_hora_carga
FROM
calibraciones_instrumentos AS t1
JOIN(
SELECT MAX(fecha_hora_carga) AS fecha_hora_carga,
instrumento_id
FROM
calibraciones_instrumentos
GROUP BY
instrumento_id
) AS t2
ON (t1.fecha_hora_carga = t2.fecha_hora_carga AND
t1.instrumento_id = t2.instrumento_id
);
Because you are misusing SQL. You have one column in the GROUP BY clause and that column isn't even being selected!
In most databases -- including recent versions of MySQL, where ONLY_FULL_GROUP_BY is enabled by default -- your query would generate an error because id is neither in the GROUP BY nor an argument to an aggregation function such as MIN().
So, MySQL is providing just an arbitrary id. I would expect an aggregation query to look like this:
SELECT instrumento_id, MAX(fecha_hora_carga) AS fecha_hora_carga
FROM calibraciones_instrumentos
GROUP BY instrumento_id;
Or, if you want the row with the maximum fecha_hora_carga for each instrumento_id, use filtering:
select ci.*
from calibraciones_instrumentos ci
where ci.fecha_hora_carga = (select max(ci2.fecha_hora_carga)
from calibraciones_instrumentos ci2
where ci2.instrumento_id = ci.instrumento_id
);
This is because your query is incorrect.
MAX is an aggregate function: it returns the maximum value of fecha_hora_carga within each group, not the row that holds it, so it won't give you the corresponding id.
See the following sample:
mysql> CREATE TABLE test_group_by (id INT NOT NULL AUTO_INCREMENT PRIMARY KEY, val1 INT, val2 INT);
mysql> INSERT INTO test_group_by (val1, val2) VALUES (10, 1), (6, 1), (18, 1), (22, 2), (4, 2);
mysql> SELECT * FROM test_group_by;
+----+------+------+
| id | val1 | val2 |
+----+------+------+
| 1 | 10 | 1 |
| 2 | 6 | 1 |
| 3 | 18 | 1 |
| 4 | 22 | 2 |
| 5 | 4 | 2 |
+----+------+------+
mysql> SELECT id, MAX(val1) FROM test_group_by GROUP BY val2;
+----+-----------+
| id | MAX(val1) |
+----+-----------+
| 1 | 18 |
| 4 | 22 |
+----+-----------+
The example above is a simplified representation of your table.
The MAX function does not retrieve a row, just the maximum value within each group. Because your query also asks for an id, MySQL picks one from an arbitrary row of the group (which id is returned cannot be said for sure). Here, id 1 is returned next to 18, although 18 belongs to the row with id 3.
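To get the id that actually belongs to each maximum, the row has to be looked up explicitly. Below is a minimal sketch using the test_group_by sample data and a correlated subquery, run through Python's sqlite3 module only so that it is self-contained. The query is plain SQL and should behave the same way in MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test_group_by ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, val1 INT, val2 INT)"
)
conn.executemany(
    "INSERT INTO test_group_by (val1, val2) VALUES (?, ?)",
    [(10, 1), (6, 1), (18, 1), (22, 2), (4, 2)],
)

# Keep only the rows whose val1 equals the maximum val1 of their val2 group.
rows = conn.execute(
    """
    SELECT t.id, t.val1, t.val2
    FROM test_group_by AS t
    WHERE t.val1 = (
        SELECT MAX(t2.val1)
        FROM test_group_by AS t2
        WHERE t2.val2 = t.val2
    )
    ORDER BY t.val2
    """
).fetchall()

print(rows)  # [(3, 18, 1), (4, 22, 2)]
```

If two rows of a group share the maximum, both are returned, which is usually preferable to an arbitrary pick.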
I have a table containing some similar rows representing objects for a game. I use this table as a way to select objects randomly. Of course, I do not know the size of the table in advance. My problem is that I would like a single query that returns the probability of selecting each object, and I don't know how to proceed.
I can get the total number of objects I have in my table:
select count(id) from objects_desert_tb;
Which returns
+-----------+
| count(id) |
+-----------+
| 81 |
+-----------+
1 row in set (0.00 sec)
and I have a query that returns the number of occurrences of every object in the table:
select name, (count(name)) from objects_desert_tb group by name;
which gives:
+-------------------+---------------+
| name | (count(name)) |
+-------------------+---------------+
| carrots | 5 |
| metal_scraps | 14 |
| plastic_tarpaulin | 8 |
| rocks_and_stones | 30 |
| wood_scraps | 24 |
+-------------------+---------------+
5 rows in set (0.00 sec)
Computing the probability for every object just consists of dividing count(name) by the total number of rows in the table. For example, for the carrots row, compute 5/81 from the two queries given above. I would like a single query that would return:
+-------------------+-------------------+
| carrots | 5/81 = 0.06172839 |
| metal_scraps | 0.1728... |
| plastic_tarpaulin | 0.09876... |
| rocks_and_stones | 0.37... |
| wood_scraps | 0.29... |
+-------------------+-------------------+
Is there a way to use the size of the table as a variable inside a SQL query? Maybe by nesting several queries?
Cross join your queries:
select c.name, c.counter / t.total probability
from (
select name, count(name) counter
from objects_desert_tb
group by name
) c cross join (
select count(id) total
from objects_desert_tb
) t
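A quick way to check the cross join is the sketch below, which rebuilds a table with the counts from the question and runs the query through Python's sqlite3 module. One assumption to flag: the counter is multiplied by 1.0 because SQLite (like several other engines) would otherwise do integer division and return 0, whereas MySQL's / already returns a decimal.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE objects_desert_tb (id INTEGER PRIMARY KEY, name TEXT)")

# Occurrence counts taken from the question (81 rows in total).
counts = {
    "carrots": 5,
    "metal_scraps": 14,
    "plastic_tarpaulin": 8,
    "rocks_and_stones": 30,
    "wood_scraps": 24,
}
conn.executemany(
    "INSERT INTO objects_desert_tb (name) VALUES (?)",
    [(name,) for name, n in counts.items() for _ in range(n)],
)

rows = conn.execute(
    """
    SELECT c.name, c.counter * 1.0 / t.total AS probability
    FROM (
        SELECT name, COUNT(name) AS counter
        FROM objects_desert_tb
        GROUP BY name
    ) AS c
    CROSS JOIN (
        SELECT COUNT(id) AS total
        FROM objects_desert_tb
    ) AS t
    ORDER BY c.name
    """
).fetchall()

probabilities = dict(rows)
for name, p in probabilities.items():
    print(f"{name}: {p:.8f}")
```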
In MySQL 8+, you would just use window functions:
select name, count(*) as cnt,
count(*) / sum(count(*)) over () as ratio
from objects_desert_tb
group by name;
I have the table below and I am trying to find each IID that appears in at least two rows, along with the average of its stats. The MySQL statement I have so far is:
select count(distinct IID) as counted from rate;
but it returns a single field with the value 15. The table is very small and only has 20 tuples. I am stuck and can't get any further than this.
UID| IID | stats
-----------------
1 | 1 | 3
1 | 1 | 4
1 | 3 | 1
2 | 3 | 1
2 | 3 | 1
2 | 1 | 3
2 | 2 | 4
The result I would like to see is
one row per IID that occurs two or more times, with the average of its stats. I have a feeling I have to group by IID, sum the stats, and divide by the count.
IID | stats
-----------------
1 | 3.5
3 | 1.5
If you really want an average then just use avg(stats). I couldn't see how to make the numbers match with your output so I thought perhaps you wanted a different divisor.
select IID, sum(stats) / (count(distinct UID) * 1.0) /* force floating-point division */
from rate
group by IID
having count(*) > 1
That is most likely correct. Your statement counts the number of distinct values in that column. For your case, you should use group by and having.
Why don't you use GROUP BY IID, which will group your data by IID, and then filter the groups with a HAVING clause?
The following query will return the average of stats for each iid that has more than one row:
select iid, avg(stats)
from rate t
group by iid
having count(*) > 1;
This seems like a reasonable result, but the values would be:
1 3.33
3 1
I am not sure what calculation you have in mind for the results in your question.
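The figures quoted above can be reproduced with a minimal sketch that loads the seven sample rows shown in the question into a table named rate and runs the GROUP BY / HAVING query through Python's sqlite3 module. The query itself is standard SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rate (UID INT, IID INT, stats INT)")
conn.executemany(
    "INSERT INTO rate (UID, IID, stats) VALUES (?, ?, ?)",
    [(1, 1, 3), (1, 1, 4), (1, 3, 1), (2, 3, 1), (2, 3, 1), (2, 1, 3), (2, 2, 4)],
)

# Average stats per IID, keeping only IIDs that occur in more than one row.
rows = conn.execute(
    """
    SELECT IID, AVG(stats) AS avg_stats
    FROM rate
    GROUP BY IID
    HAVING COUNT(*) > 1
    ORDER BY IID
    """
).fetchall()

for iid, avg_stats in rows:
    print(iid, round(avg_stats, 2))
```

IID 2 is dropped because it appears only once; HAVING filters on the aggregated groups, which a WHERE clause cannot do.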
I have a table like this:
Table: p
+----------------+
| id | w_id |
+---------+------+
| 5 | 8 |
| 5 | 10 |
| 5 | 8 |
| 5 | 10 |
| 5 | 8 |
| 6 | 5 |
| 6 | 8 |
| 6 | 10 |
| 6 | 10 |
| 7 | 8 |
| 7 | 10 |
+----------------+
What is the best SQL to get the following result? :
+-----------------------------+
| id | most_used_w_id |
+---------+-------------------+
| 5 | 8 |
| 6 | 10 |
| 7 | 8 |
+-----------------------------+
In other words, to get, per id, the most frequent related w_id.
Note that on the example above, id 7 is related to 8 once and to 10 once.
So, either (7, 8) or (7, 10) will do as result. If it is not possible to
pick up one, then both (7, 8) and (7, 10) on result set will be ok.
I have come up with something like:
select counters2.p_id as id, counters2.w_id as most_used_w_id
from (
select p.id as p_id,
w_id,
count(w_id) as count_of_w_ids
from p
group by id, w_id
) as counters2
join (
select p_id, max(count_of_w_ids) as max_counter_for_w_ids
from (
select p.id as p_id,
w_id,
count(w_id) as count_of_w_ids
from p
group by id, w_id
) as counters
group by p_id
) as p_max
on p_max.p_id = counters2.p_id
and p_max.max_counter_for_w_ids = counters2.count_of_w_ids
;
but I am not sure at all whether this is the best way to do it. And I had to repeat the same sub-query two times.
Any better solution?
Try using user-defined variables:
select id, w_id
FROM
( select T.*,
if(@id<>id,1,0) as is_first, -- 1 for the first (highest-count) row of each id
@id:=id FROM
(
select id, w_id, Count(*) as cnt FROM p Group by id, w_id
) as T,(SELECT @id:=0) as T1
ORDER BY id,cnt DESC
) as T2
WHERE is_first=1
Formal SQL
In fact, your solution is correct in terms of standard SQL. Why? Because you have to join values from the original data to the grouped data, so your query cannot be simplified. MySQL allows mixing non-grouped columns with aggregate functions, but the result is unreliable, so I do not recommend relying on that behavior.
MySQL
Since you're using MySQL, you can use variables. I'm not a big fan of them, but for your case they may be used to simplify things:
SELECT
c.*,
IF(@id!=id, @i:=1, @i:=@i+1) AS num,
@id:=id AS gid
FROM
(SELECT id, w_id, COUNT(w_id) AS w_count
FROM p
GROUP BY id, w_id
ORDER BY id DESC, w_count DESC) AS c
CROSS JOIN (SELECT @i:=-1, @id:=-1) AS init
HAVING
num=1;
So for your data result will look like:
+------+------+---------+------+------+
| id | w_id | w_count | num | gid |
+------+------+---------+------+------+
| 7 | 8 | 1 | 1 | 7 |
| 6 | 10 | 2 | 1 | 6 |
| 5 | 8 | 3 | 1 | 5 |
+------+------+---------+------+------+
Thus, you've found each id and its corresponding w_id. The idea is to enumerate the rows within each id, relying on the ordering done in the subquery, so that we need only the first row of each id (it holds the highest count).
This may be replaced with a single GROUP BY id, but, again, the server is free to choose any row in that case (it works because it happens to take the first row, but the documentation does not guarantee that in general).
One nice thing about this approach is that you can also select, for example, the 2nd or 3rd most frequent value; it's very flexible.
Performance
To increase performance, you can create an index on (id, w_id); it will be used for ordering and grouping the records. The variables and HAVING, however, force a row-by-row scan of the set derived by the inner GROUP BY. That is not as bad as a full scan of the original data, but it is still a drawback of the variable approach. On the other hand, doing it with a JOIN and subquery, as in your query, won't be much different, because a temporary table is created for the subquery result set too.
To be certain, you'll have to test. And keep in mind that you already have a valid solution which, by the way, isn't bound to DBMS-specific features and is good in terms of standard SQL.
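Try this query (it relies on MySQL taking the first row of each group for the non-grouped columns, which is not guaranteed):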
select p_id, ccc , w_id from
(
select p.id as p_id,
w_id, count(w_id) ccc
from p
group by id,w_id order by id,ccc desc) xxx
group by p_id having max(ccc)
You can also use this code if you do not want to rely on the first record of non-grouping columns
select p_id, ccc , w_id from
(
select p.id as p_id,
w_id, count(w_id) ccc
from p
group by id,w_id order by id,ccc desc) xxx
group by p_id having ccc=max(ccc);
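For comparison, here is a minimal sketch of a portable variant that avoids writing the counting subquery twice by naming it in a common table expression. It assumes a DBMS with WITH support (MySQL 8.0 or later; SQLite is used here through Python's sqlite3 module only to keep the example self-contained). The CTE name counters is borrowed from the question's aliases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE p (id INT, w_id INT)")
conn.executemany(
    "INSERT INTO p (id, w_id) VALUES (?, ?)",
    [(5, 8), (5, 10), (5, 8), (5, 10), (5, 8),
     (6, 5), (6, 8), (6, 10), (6, 10),
     (7, 8), (7, 10)],
)

rows = conn.execute(
    """
    WITH counters AS (
        -- how often each w_id occurs per id
        SELECT id, w_id, COUNT(*) AS cnt
        FROM p
        GROUP BY id, w_id
    )
    SELECT c.id, c.w_id AS most_used_w_id
    FROM counters AS c
    JOIN (
        -- highest occurrence count per id
        SELECT id, MAX(cnt) AS max_cnt
        FROM counters
        GROUP BY id
    ) AS m
      ON m.id = c.id AND m.max_cnt = c.cnt
    ORDER BY c.id, c.w_id
    """
).fetchall()

print(rows)  # [(5, 8), (6, 10), (7, 8), (7, 10)]
```

Ties are kept, so id 7 yields both (7, 8) and (7, 10), which the question states is acceptable.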
Is there a way to multiply a column with a predefined number based on another column? There are multiple predefined numbers that are used depending on the value in the column.
Example:
Table
Columns: persons_id,activity,scale
Values
1,swimming,4
1,baseball,2
1,basketball,3
2,swimming,6
2,basketball,3
If my predefined numbers are: 6 (swimming), 8 (baseball), 5 (basketball)
The output would look like this
1,swimming,4,24
1,baseball,2,16
1,basketball,3,15
2,swimming,6,36
2,basketball,3,15
Edit: Thank you everyone for contributing. I ended up using the solution from sgeddes.
Sure, you can use CASE:
SELECT Persons_Id, Activity, Scale,
Scale *
CASE
WHEN Activity = 'swimming' THEN 6
WHEN Activity = 'baseball' THEN 8
WHEN Activity = 'basketball' THEN 5
ELSE 1
END Total
FROM YourTable
Good luck.
Have another column called WEIGHT that multiplies the SCALE value. Perhaps you can calculate the product using a trigger to populate the column. Otherwise, a simple SELECT will do fine.
You can use this query:
select persons_id, activity, scale,
scale * case when activity = 'swimming' then 6
when activity = 'baseball' then 8
when activity = 'basketball' then 5 end as result
from Table1
A better solution, however, is to define a new table Coefficients(activity, coefficient)
and insert these rows:
'swimming', 6
'baseball', 8
'basketball', 5
then use something like this:
select persons_id, activity, scale, scale * coefficient as result
from Table1 inner join Coefficients on Table1.activity = Coefficients.activity
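The lookup-table approach can be sketched end to end as follows. The table and column names (Table1, Coefficients, coefficient) are taken from this answer; the column types are assumptions, and the SQL is run through Python's sqlite3 module only so that the example is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (persons_id INT, activity TEXT, scale INT)")
conn.execute("CREATE TABLE Coefficients (activity TEXT PRIMARY KEY, coefficient INT)")

conn.executemany(
    "INSERT INTO Table1 (persons_id, activity, scale) VALUES (?, ?, ?)",
    [(1, "swimming", 4), (1, "baseball", 2), (1, "basketball", 3),
     (2, "swimming", 6), (2, "basketball", 3)],
)
conn.executemany(
    "INSERT INTO Coefficients (activity, coefficient) VALUES (?, ?)",
    [("swimming", 6), ("baseball", 8), ("basketball", 5)],
)

# Multiply each scale by the coefficient stored for its activity.
rows = conn.execute(
    """
    SELECT t.persons_id, t.activity, t.scale, t.scale * c.coefficient AS result
    FROM Table1 AS t
    INNER JOIN Coefficients AS c ON t.activity = c.activity
    ORDER BY t.persons_id, t.activity
    """
).fetchall()

for row in rows:
    print(row)
```

With the multipliers in their own table, changing a coefficient is a single UPDATE instead of an edit to every query that embeds a CASE expression. Note that the inner join silently drops activities that have no coefficient row; a LEFT JOIN with a default would keep them.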
You can also use a table that stores the value or create a subquery that will return the multipliers:
select persons_id,
t.activity,
scale,
scale * s.val as result
from yourtable t
inner join
(
select 'swimming' activity, 6 val
union all
select 'baseball' activity, 8 val
union all
select 'basketball' activity, 5 val
) s
on t.activity = s.activity
The result is:
| PERSONS_ID | ACTIVITY | SCALE | RESULT |
--------------------------------------------
| 1 | swimming | 4 | 24 |
| 1 | baseball | 2 | 16 |
| 1 | basketball | 3 | 15 |
| 2 | swimming | 6 | 36 |
| 2 | basketball | 3 | 15 |