How to get average per group and figure out outliers in SQL - mysql

This is what my data looks like:
id | value | group
------------------
1 | 4 | abc
2 | 8 | def
3 | 100 | abc
4 | 8 | ghi
5 | 7 | abc
6 | 10 | ghi
I need to compute the average per group with outliers excluded (e.g. id = 3 for group = abc), and then display the outliers next to the averages. For the data above, I am expecting something like this as the result:
group = 'abc'
average = '5.5'
outlier = '100'

One method creates a subquery containing the stats for each group (mean and standard deviation), and then joins this back to the original table to determine which records are outliers, for which group.
SELECT t1.id,
t1.group AS `group`,
t2.valAvg AS average,
t1.value AS outlier
FROM yourTable t1
INNER JOIN
(
SELECT `group`, AVG(value) AS valAvg, STDDEV(value) AS valStd
FROM yourTable
GROUP BY `group`
) t2
ON t1.group = t2.group
WHERE ABS(t1.value - t2.valAvg) > t2.valStd -- any record whose value is MORE
-- than one standard deviation from
-- the mean is an outlier
Update:
It appears that, for some reason, your value column is actually varchar rather than a numeric type. This means you won't be able to do any math on it. So first, convert that column to integer via:
ALTER TABLE yourTable MODIFY value INTEGER;
If you only want outliers which are greater than the average then use the following WHERE clause:
WHERE t1.value - t2.valAvg > t2.valStd
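As a rough sketch of how the stats-subquery idea can also produce the average with outliers excluded, here is a small Python script that runs the query against an in-memory SQLite database. Several things here are assumptions and not part of the answer above: the table name t, the column name grp (standing in for `group`), the use of SQLite instead of MySQL, and the variance formula AVG(value * value) - AVG(value) * AVG(value), which replaces STDDEV because SQLite has no built-in standard deviation. Comparing the squared deviation with the variance is equivalent to comparing the absolute deviation with the population standard deviation.

```python
import sqlite3

# Sample rows from the question: (id, value, grp).
# "grp" is a hypothetical column name standing in for `group`.
ROWS = [(1, 4, "abc"), (2, 8, "def"), (3, 100, "abc"),
        (4, 8, "ghi"), (5, 7, "abc"), (6, 10, "ghi")]

QUERY = """
SELECT grp,
       AVG(CASE WHEN is_outlier = 0 THEN value END) AS average,
       MAX(CASE WHEN is_outlier = 1 THEN value END) AS outlier
FROM (
    -- Flag rows whose squared deviation exceeds the group's variance,
    -- i.e. rows more than one population standard deviation from the mean.
    SELECT t.grp,
           t.value,
           (t.value - s.mean) * (t.value - s.mean) > s.variance AS is_outlier
    FROM t
    JOIN (
        SELECT grp,
               AVG(value) AS mean,
               AVG(value * value) - AVG(value) * AVG(value) AS variance
        FROM t
        GROUP BY grp
    ) AS s ON s.grp = t.grp
) AS flagged
GROUP BY grp
ORDER BY grp
"""


def averages_without_outliers(rows):
    """Return (grp, average excluding outliers, largest outlier or None)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER, value INTEGER, grp TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


result = averages_without_outliers(ROWS)
for row in result:
    print(row)
```

This sketch reports only the largest outlier per group; a group with several outliers would need one output row per outlier, as in the join above. The one-pass variance formula can also lose precision on large values, so treat it as illustrative.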

You can exclude the value you don't need with a subquery:
select `group`, avg(value)
from my_table
where (`group`, value) not in (select `group`, max(value)
from my_table
group by `group`)
group by `group`

Related

First record combined with GROUP BY

Let's say I got a table "values" which contains the fields
id (int)
name (varchar)
value (float)
timestamp (int)
Now I want to calculate the highest, lowest and first value (timestamp based) for each name over the entire values table.
Is this possible to be achieved in one single performant query? I stumbled upon the 'first_value' function, but that one doesn't seem to work. I tried the following query, using joins, but also without success.
SELECT
a.name,
b.value as open,
MIN(a.value) as low,
MAX(a.value) as high
FROM values a
LEFT JOIN values b
ON a.name = b.name AND b.id = MIN(a.id)
GROUP BY a.name;
Isn't there some sort of function which would make something similar as this possible?
SELECT
name,
FIRST_VALUE(value) as open,
MIN(value) as low,
MAX(value) as high
FROM values
GROUP BY name
ORDER BY timestamp ASC;
Example data
id name value timestamp
1 USD 3 16540
2 EUR 5 16540
3 GBP 4 16540
4 EUR 2 16600
5 USD 4 16600
6 GBP 5 16600
7 USD 6 16660
8 EUR 7 16660
9 GBP 6 16660
10 USD 5 16720
11 EUR 5 16720
12 GBP 7 16720
13 EUR 8 16780
14 USD 7 16780
15 GBP 8 16780
Example output
name open low high
USD 3 3 7
EUR 5 2 8
GBP 4 4 8
I'm using MySQL-client version: 5.6.39
A tie should not be possible; if one occurs, I don't care which value gets picked.
If you are running MySQL 8.0, this can be quite easily solved with window functions:
select name, value open, low, high
from (
select
name,
value,
min(value) over(partition by name) low,
max(value) over(partition by name) high,
row_number() over(partition by name order by timestamp) rn
from mytable
) x
where rn = 1
Demo on DB Fiddle:
| name | open | low | high |
| ---- | ---- | --- | ---- |
| EUR | 5 | 2 | 8 |
| GBP | 4 | 4 | 8 |
| USD | 3 | 3 | 7 |
In earlier versions, you could:
use a correlated subquery to filter on the first record for each name
join the table with an aggregate query that computes the min and max of each name
Query:
select
t.name,
t.value open,
t0.low,
t0.high
from
mytable t
inner join (
select name, min(value) low, max(value) high from mytable group by name
) t0 on t0.name = t.name
where t.timestamp = (
select min(t1.timestamp) from mytable t1 where t1.name = t.name
);
Demo on MySQL 5.6 DB Fiddle: same results as above
This could also be achieved using inline subqueries (which may actually perform better):
select
t.name,
t.value open,
(select min(value) from mytable t1 where t1.name = t.name) low,
(select max(value) from mytable t1 where t1.name = t.name) high
from
mytable t
where timestamp = (
select min(t1.timestamp) from mytable t1 where t1.name = t.name
)
"in one single performant query"
Do it logically and let the DBMS worry about performance. If that isn't fast enough, check your indexes.
The value associated with the first timestamp requires a join. You can find the first timestamp easily enough. Getting a value from a row associated with a given row: that's what joins are for.
So, we have:
SELECT
v.name,
v.value as open,
v1.low,
v1.high
FROM `values` as v join (
select name,
min(timestamp) as timestamp,
min(value) as low,
max(value) as high
FROM `values`
GROUP BY name
) as v1
on v.name = v1.name and v.timestamp = v1.timestamp
This solution seems to have the best performance.
SELECT
name,
CAST(SUBSTRING_INDEX(GROUP_CONCAT(CAST(value AS CHAR) ORDER BY TIMESTAMP ASC), ',', 1) AS DECIMAL(10, 6)) AS open,
MIN(value) AS low,
MAX(value) AS high
FROM mytable
GROUP BY name
ORDER BY name ASC
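The join against a per-name aggregate described above can be tried out with a short Python script on an in-memory SQLite database. This is only a sketch under assumptions: the table is called quotes and the timestamp column ts (both hypothetical names, chosen to avoid the reserved word values), and SQLite stands in for MySQL 5.6.

```python
import sqlite3

# Example data from the question: (id, name, value, ts).
ROWS = [
    (1, "USD", 3, 16540), (2, "EUR", 5, 16540), (3, "GBP", 4, 16540),
    (4, "EUR", 2, 16600), (5, "USD", 4, 16600), (6, "GBP", 5, 16600),
    (7, "USD", 6, 16660), (8, "EUR", 7, 16660), (9, "GBP", 6, 16660),
    (10, "USD", 5, 16720), (11, "EUR", 5, 16720), (12, "GBP", 7, 16720),
    (13, "EUR", 8, 16780), (14, "USD", 7, 16780), (15, "GBP", 8, 16780),
]

QUERY = """
SELECT q.name, q.value AS open, agg.low, agg.high
FROM quotes AS q
JOIN (
    -- One row per name: earliest timestamp plus min and max value.
    SELECT name, MIN(ts) AS first_ts, MIN(value) AS low, MAX(value) AS high
    FROM quotes
    GROUP BY name
) AS agg
  ON agg.name = q.name AND agg.first_ts = q.ts
ORDER BY q.name
"""


def open_low_high(rows):
    """Return (name, open, low, high) per name, ordered by name."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE quotes (id INTEGER, name TEXT, value REAL, ts INTEGER)"
    )
    conn.executemany("INSERT INTO quotes VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


summary = open_low_high(ROWS)
for row in summary:
    print(row)
```

If two rows of the same name share the earliest timestamp, this join returns one output row per tied row, which matches the question's statement that ties are not expected.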

MySQL: Retrieve Values and Counts For Each

How can I count the occurrence of the field/column in SQL?
Example dataset:
A
A
A
A
B
B
C
I want:
A | 4
A | 4
A | 4
A | 4
B | 2
B | 2
C | 1
Is there any way to do it without using GROUP BY? So far, every answer I get makes my query return the following:
A | 4
B | 2
C | 1
select value, count(*) from table group by value
Use HAVING to further reduce the results, e.g. only values that occur more than 3 times:
select value, count(*) from table group by value having count(*) > 3
You could use a nested sub-select for this desired result set.
If the example table name is my_table and the column called col1:
select col1,
(select count(*) from my_table where col1 = t.col1) as Count
from my_table t;
Or if you want to remove the duplicates, use the distinct statement. It removes the duplicates of your result set.
select distinct col1,
(select count(*) from my_table where col1 = t.col1) as Count
from my_table t;
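To see the nested sub-select keep one output row per input row, here is a minimal Python sketch that runs it on an in-memory SQLite database. The table and column names follow the answer above (my_table, col1); using SQLite rather than MySQL, the alias cnt, and the added ORDER BY are assumptions made for the illustration.

```python
import sqlite3

QUERY = """
SELECT col1,
       (SELECT COUNT(*) FROM my_table AS i WHERE i.col1 = t.col1) AS cnt
FROM my_table AS t
ORDER BY col1
"""


def value_with_counts(values):
    """Return one (value, occurrences) pair per input row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE my_table (col1 TEXT)")
    conn.executemany("INSERT INTO my_table VALUES (?)", [(v,) for v in values])
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


counts = value_with_counts(["A", "A", "A", "A", "B", "B", "C"])
for row in counts:
    print(row)
```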

MYSQL Updating row to maximum value of similar rows

I have a table like this in MYSQL:
ID | NAME | VALUE |
----------------------------
1 | Bob | 1 |
2 | Bob | 2 |
3 | Jack | 5 |
4 | Jack | 8 |
5 | Jack | 10 |
and I'm trying to update the VALUE column to the highest value of rows with same NAME. So the result should be:
ID | NAME | VALUE |
----------------------------
1 | Bob | 2 |
2 | Bob | 2 |
3 | Jack | 10 |
4 | Jack | 10 |
5 | Jack | 10 |
I managed to get the max value like this:
SELECT MAX(Value) max FROM `table` GROUP BY Name having count(*) >1 AND MAX(Value) != MIN(Value)
But can't figure out how to put it in my update
Update table set Value = (SELECT MAX(Value) max FROM `table` GROUP BY Name having count(*) >1 AND MAX(Value) != MIN(Value))
Doesn't work. I'd appreciate any help.
This is easier than other answers are making it.
UPDATE MyTable AS t1 INNER JOIN MyTable AS t2 USING (Name)
SET Value = GREATEST(t1.Value, t2.Value);
You don't have to find the largest value. You just have to join each row to the set of rows with the same name, and set the Value to the greater Value of the two joined rows. This is a no-op on some rows, but it will apply to every row in turn.
http://sqlfiddle.com/#!9/f79a3/1
UPDATE t1
INNER JOIN (SELECT name, MAX(`value`) max_value
FROM t1 GROUP BY name) t2
ON t1.name = t2.name
SET t1.value = t2.max_value;
Create a temporary table consisting of ID NAME and MAX VALUE as follows:
CREATE TEMPORARY TABLE TABLE1 AS
SELECT NAME, MAX(Value) value FROM `table` GROUP BY Name having count(*) >1
AND MAX(Value) != MIN(Value);
Use this temporary table to do your update as follows:
UPDATE `table` AS Table_A
INNER JOIN TABLE1 AS Table_B
ON Table_A.NAME = Table_B.NAME
SET Table_A.value = Table_B.value;
Also, this code is somewhat of an approximation, as I am not familiar with MySQL but I am familiar with SQL.
Let me know if this doesn't help.
Simple left join would do the trick.
Try this out and let me know in case of any queries.
select a.id, a.name, b.value
from
`table` a
left join
(select name, max(value) as value from `table` group by name) b
on a.name = b.name;
You may use this query. The table is joined with a subquery (table t2) that contains the results you want to update your table with:
UPDATE `table` t1,
(SELECT Name, MAX(Value) maxv, MIN(Value) minv
FROM `table`
GROUP BY Name
HAVING COUNT(*)>1 AND maxv != minv) t2
SET t1.Value = t2.maxv
WHERE t1.Name = t2.Name;
If you want to know how the values will be updated, you can first run an equivalent SELECT query:
SELECT t1.*, t2.maxv
FROM `table` t1,
(SELECT Name, MAX(Value) maxv, MIN(Value) minv
FROM `table`
GROUP BY Name
HAVING COUNT(*)>1 AND maxv != minv) t2
WHERE t1.Name = t2.Name;
This query will display all the fields of table, followed by the new value maxv. You can check the current value and the new value, and if it looks fine, you may run the UPDATE query.
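The effect of setting every row to its group's maximum can be checked with a small Python sketch on an in-memory SQLite database. Note the assumptions: the table name people is hypothetical, and the update uses a correlated subquery because that is what SQLite accepts. MySQL rejects a subquery that reads the table being updated (error 1093), which is why the answers above go through a join with a derived table instead.

```python
import sqlite3

# Sample rows from the question: (id, name, value).
ROWS = [(1, "Bob", 1), (2, "Bob", 2), (3, "Jack", 5),
        (4, "Jack", 8), (5, "Jack", 10)]


def raise_to_group_max(rows):
    """Set each row's value to the maximum value among rows with the same name."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE people (id INTEGER, name TEXT, value INTEGER)")
    conn.executemany("INSERT INTO people VALUES (?, ?, ?)", rows)
    # SQLite form only; in MySQL use UPDATE ... INNER JOIN (SELECT ...) ... SET.
    conn.execute(
        """
        UPDATE people
        SET value = (SELECT MAX(p2.value)
                     FROM people AS p2
                     WHERE p2.name = people.name)
        """
    )
    result = conn.execute(
        "SELECT id, name, value FROM people ORDER BY id"
    ).fetchall()
    conn.close()
    return result


updated = raise_to_group_max(ROWS)
for row in updated:
    print(row)
```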

MySql select next lower number without using limit

Is it possible to select the next lower number from a table without using limit.
E.g.: if my table had 10, 3, 2, 1, I'm trying to do something like select * from table where col < 10.
The result I'm expecting is 3. I know I can use limit 1, but can it be done without that?
Try
SELECT MAX(no) no
FROM table1
WHERE no < 10
Output:
| NO |
------
| 3 |
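As a quick check of the MAX-with-filter idea, here is a minimal Python sketch against an in-memory SQLite database. The column name num and the helper next_lower are hypothetical, and SQLite stands in for MySQL; the query shape is the one shown above.

```python
import sqlite3


def next_lower(values, threshold):
    """Return the largest stored value strictly below threshold, or None."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (num INTEGER)")
    conn.executemany("INSERT INTO table1 VALUES (?)", [(v,) for v in values])
    # MAX over an empty set yields NULL, which Python sees as None.
    (result,) = conn.execute(
        "SELECT MAX(num) FROM table1 WHERE num < ?", (threshold,)
    ).fetchone()
    conn.close()
    return result


print(next_lower([10, 3, 2, 1], 10))
print(next_lower([10, 3, 2, 1], 1))
```

With the question's values, the first call returns 3 and the second returns None because nothing is lower than 1.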
Try this query
SELECT
*
FROM
(SELECT
@rid:=@rid+1 as rId,
a.*
FROM
tbl a
JOIN
(SELECT @rid:=0) b
ORDER BY
id DESC)tmp
WHERE rId=2;
SQL FIDDLE:
| RID | ID | TYPE | DETAILS |
------------------------------------
| 2 | 28 | Twitter | #sqlfiddle5 |
Another approach
select a.* from supportContacts a inner join
(select max(id) as id
from supportContacts
where
id in (select id from supportContacts where id not in
(select max(id) from supportContacts)))b
on a.id=b.id
SQL FIDDLE:
| ID | TYPE | DETAILS |
------------------------------
| 28 | Twitter | #sqlfiddle5 |
Alternatively, this query will always get the second highest number based on the inner where clause.
SELECT *
FROM
(
SELECT t.col,
(
SELECT COUNT(distinct t2.col)
FROM tableName t2
WHERE t2.col >= t.col
) as `rank`
FROM tableName t
WHERE col <= 10
) xx
WHERE `rank` = 2 -- <<== means second highest
As written, the following query returns the second-lowest distinct value in the table, because it counts the distinct values that are less than or equal to each candidate:
SELECT distinct col FROM table1 a
WHERE 2 = (SELECT count(DISTINCT(b.col)) FROM table1 b WHERE a.col >= b.col);
If you later want the third-lowest number, pass 3 in place of 2 in the WHERE clause.
To get the second-highest number instead (3 in the question's example, i.e. the next number below the maximum), change the condition of the WHERE clause in the inner query to
a.col <= b.col
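The counting technique above can be wrapped in a small helper to see how the constant selects the position. This is a sketch under assumptions: nth_highest is a hypothetical helper, SQLite stands in for MySQL, and the inner condition is written in the "higher" direction (b.col >= a.col), so n = 2 gives the second-highest distinct value.

```python
import sqlite3


def nth_highest(values, n):
    """Return the n-th highest distinct value, or None if there are fewer."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (col INTEGER)")
    conn.executemany("INSERT INTO table1 VALUES (?)", [(v,) for v in values])
    rows = conn.execute(
        """
        SELECT DISTINCT a.col
        FROM table1 AS a
        WHERE ? = (SELECT COUNT(DISTINCT b.col)
                   FROM table1 AS b
                   WHERE b.col >= a.col)
        """,
        (n,),
    ).fetchall()
    conn.close()
    return rows[0][0] if rows else None


print(nth_highest([10, 3, 2, 1], 2))
print(nth_highest([10, 10, 3, 3, 2], 2))
```

Because the count uses DISTINCT, duplicate values do not shift the ranking; the correlated count is evaluated per row, so this is likely to be slow on large tables.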

SQL statement for querying with multiple conditions including 3 most recent dates

I need help in finding the rows that correspond to the most recent date, the next most recent and the one after that, where some condition ABC is "Y" and group it by a column name XYZ ASC but XYZ can appear multiple times. So, say XYZ is 50, then for the rows in the three years, the XYZ will be 50. I have the following code that executes but returns only two rows out of thousands which is impossible. I tried executing just the date condition but it returned dates that were less than or equal to MAX(DATE)-3 as well. Don't know where I am going wrong.
select * from money.cash where DATE =(
select
MAX(DATE)
from
money.cash
where
DATE > (select MAX(DATE)-3 from money.cash)
)
GROUP BY XYZ ASC
having ABC = "Y";
The structure of the table is as follows (only a schematic, not the real thing).
Comp_ID DATE XYZ ABC $$$$ ....
1 2012-1-1 10 Y SOME-AMOUNT
2 2011-1-1 10 Y
3 2006-1-1 10 Y
4 2011-1-1 20 Y
5 2002-1-1 20 Y
6 2000-1-1 20 Y
7 1998-1-1 20 Y
The desired output would be the first three rows for XYZ=10 in ascending order and the rows with the 3 most recent dates for XYZ=20.
Last and important: this table's values keep changing as new data comes in, so the output (which will be in a new table) must reflect the changes in the original table above.
MySQL (at least before version 8.0, which added window functions) doesn't have functionality that is friendly to greatest-n-per-group queries.
One option would be...
- Find the MAX(Date) per group (XYZ)
- Then use that result to find the MAX(Date) of all records before that date
- Then do it again for all records before that date
It's really inefficient, but MySQL hasn't got the functionality required to do this efficiently. Sorry...
CREATE TABLE yourTable
(
comp_id INT,
myDate DATE,
xyz INT,
abc VARCHAR(1)
)
;
INSERT INTO yourTable SELECT 1, '2012-01-01', 10, 'Y';
INSERT INTO yourTable SELECT 2, '2011-01-01', 10, 'Y';
INSERT INTO yourTable SELECT 3, '2006-01-01', 10, 'Y';
INSERT INTO yourTable SELECT 4, '2011-01-01', 20, 'Y';
INSERT INTO yourTable SELECT 5, '2002-01-01', 20, 'Y';
INSERT INTO yourTable SELECT 6, '2000-01-01', 20, 'Y';
INSERT INTO yourTable SELECT 7, '1998-01-01', 20, 'Y';
SELECT
yourTable.*
FROM
(
SELECT
lookup.XYZ,
COALESCE(MAX(yourTable.myDate), lookup.MaxDate) AS MaxDate
FROM
(
SELECT
lookup.XYZ,
COALESCE(MAX(yourTable.myDate), lookup.MaxDate) AS MaxDate
FROM
(
SELECT
yourTable.XYZ,
MAX(yourTable.myDate) AS MaxDate
FROM
yourTable
WHERE
yourTable.ABC = 'Y'
GROUP BY
yourTable.XYZ
)
AS lookup
LEFT JOIN
yourTable
ON yourTable.XYZ = lookup.XYZ
AND yourTable.myDate < lookup.MaxDate
AND yourTable.ABC = 'Y'
GROUP BY
lookup.XYZ,
lookup.MaxDate
)
AS lookup
LEFT JOIN
yourTable
ON yourTable.XYZ = lookup.XYZ
AND yourTable.myDate < lookup.MaxDate
AND yourTable.ABC = 'Y'
GROUP BY
lookup.XYZ,
lookup.MaxDate
)
AS lookup
INNER JOIN
yourTable
ON yourTable.XYZ = lookup.XYZ
AND yourTable.myDate >= lookup.MaxDate
WHERE
yourTable.ABC = 'Y'
ORDER BY
yourTable.comp_id
;
DROP TABLE yourTable;
There are other options, but they're all a bit hacky. Search SO for greatest-n-per-group mysql.
My results using your example data:
Comp_ID | DATE | XYZ | ABC
------------------------------
1 | 2012-1-1 | 10 | Y
2 | 2011-1-1 | 10 | Y
3 | 2006-1-1 | 10 | Y
4 | 2011-1-1 | 20 | Y
5 | 2002-1-1 | 20 | Y
6 | 2000-1-1 | 20 | Y
Here's another way, hopefully more efficient than Dems' answer.
Test it with an index on (abc, xyz, date):
SELECT m.xyz, m.date -- for all columns: SELECT m.*
FROM
( SELECT DISTINCT xyz
FROM money.cash
WHERE abc = 'Y'
) AS dm
JOIN
money.cash AS m
ON m.abc = 'Y'
AND m.xyz = dm.xyz
AND m.date >= COALESCE(
( SELECT im.date
FROM money.cash AS im
WHERE im.abc = 'Y'
AND im.xyz = dm.xyz
ORDER BY im.date DESC
LIMIT 1
OFFSET 2 -- to get 3 latest rows per xyz
), DATE('1000-01-01') ) ;
If you have more than one row with the same (abc, xyz, date), the query may return more than 3 rows per xyz (all rows tied for 3rd place will be shown).
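The LIMIT 1 OFFSET 2 cutoff can be exercised with a short Python sketch on an in-memory SQLite database. The assumptions are worth stating: the table is called cash with a dt column holding ISO 'YYYY-MM-DD' text (hypothetical names, and text dates so that string comparison matches date order), SQLite stands in for MySQL, and '0000-01-01' plays the role of the fallback date for groups with fewer than three rows.

```python
import sqlite3

# Schematic rows from the question: (comp_id, dt, xyz, abc).
ROWS = [
    (1, "2012-01-01", 10, "Y"), (2, "2011-01-01", 10, "Y"),
    (3, "2006-01-01", 10, "Y"), (4, "2011-01-01", 20, "Y"),
    (5, "2002-01-01", 20, "Y"), (6, "2000-01-01", 20, "Y"),
    (7, "1998-01-01", 20, "Y"),
]

QUERY = """
SELECT m.comp_id
FROM (SELECT DISTINCT xyz FROM cash WHERE abc = 'Y') AS dm
JOIN cash AS m
  ON m.abc = 'Y'
 AND m.xyz = dm.xyz
 AND m.dt >= COALESCE(
        -- Third-latest date in this group; NULL if the group has < 3 rows.
        (SELECT im.dt
         FROM cash AS im
         WHERE im.abc = 'Y' AND im.xyz = dm.xyz
         ORDER BY im.dt DESC
         LIMIT 1 OFFSET 2),
        '0000-01-01')
ORDER BY m.comp_id
"""


def latest_three_per_group(rows):
    """Return comp_ids of the rows with the 3 most recent dates per xyz."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE cash (comp_id INTEGER, dt TEXT, xyz INTEGER, abc TEXT)"
    )
    conn.executemany("INSERT INTO cash VALUES (?, ?, ?, ?)", rows)
    result = [r[0] for r in conn.execute(QUERY)]
    conn.close()
    return result


top_ids = latest_three_per_group(ROWS)
print(top_ids)
```

On the schematic data this keeps rows 1 to 6 and drops row 7, the fourth-latest date for XYZ=20.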