Generated columns using aggregate functions - mysql

The starting point
Suppose I have a table devTest that looks like this:
+----+------+
| id | j |
+----+------+
| 1 | 5 |
| 2 | 9 |
| 3 | 4 |
| 4 | 7 |
+----+------+
The goal
I want a column specifying the row's deviation from the mean in the j column (expressed in standard deviations). That is, the table would look like this:
+----+------+------------+
| id | j | jDev |
+----+------+------------+
| 1 | 5 | -0.5637345 |
| 2 | 9 | 1.2402159 |
| 3 | 4 | -1.0147221 |
| 4 | 7 | 0.3382407 |
+----+------+------------+
What I've tried
>alter table devTest add column jDev decimal as ((j - avg(j)) / std(j));
To which I receive an error indicating that aggregate functions can't be used in the definition of a generated column:
ERROR 1901 (HY000): Function or expression 'avg()' cannot be used in the
GENERATED ALWAYS AS clause of `jDev`
Making this kind of column must be pretty common, so I'd love to know the best solution!

In standard SQL you'd do:
select id, j, (j - avg(j) over ()) / std(j) over () as jdev
from devtest;
But MySQL versions before 8.0 don't support window functions such as AVG() OVER (). In those versions, you must select the aggregate values separately:
select d.id, d.j, (d.j - agg.javg) / agg.jstd as jdev
from devtest d
cross join (select avg(j) as javg, std(j) as jstd from devtest) agg;
Then create a view as Benjamin Crouzier suggests in his answer:
CREATE VIEW v_devtest AS
select d.id, d.j, (d.j - agg.javg) / agg.jstd as jdev
from devtest d
cross join (select avg(j) as javg, std(j) as jstd from devtest) agg;
A computed column can only calculate its value from other values in the same record. So what you are trying to do cannot be done with a calculated column.
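One detail worth checking is which standard deviation the target table uses. The following is a minimal sketch in Python (not part of the original answers) that recomputes the numbers outside the database. The jDev values in the question match the sample standard deviation, whereas MySQL's STD() is the population standard deviation; STDDEV_SAMP() is the sample variant.

```python
import statistics

j = [5, 9, 4, 7]
mean = statistics.mean(j)

# Sample standard deviation (divides by n - 1), like STDDEV_SAMP().
sample_dev = [(x - mean) / statistics.stdev(j) for x in j]

# Population standard deviation (divides by n), like STD() / STDDEV_POP().
population_dev = [(x - mean) / statistics.pstdev(j) for x in j]

print([round(d, 7) for d in sample_dev])
print([round(d, 7) for d in population_dev])
```

If the sample variant is what you need, swapping std(j) for stddev_samp(j) in the queries above should give the values from the question.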

This error makes sense: any change to the table (say you add a row with j = 0) would change the average, which in turn would change the generated value in every row. That would be quite a bit of work for the query engine.
Another solution is to define a view instead:
CREATE VIEW j_dev (id, j, j_dev) AS
SELECT id, j,
(j - (SELECT avg(j) FROM devTest)) / (SELECT std(j) FROM devTest) AS j_dev
FROM devTest;
(The aggregates go in scalar subqueries; a bare avg(j) in the select list, without GROUP BY, would either be rejected or collapse the result to a single row.)
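To illustrate why a view fits here, the sketch below uses SQLite through Python's sqlite3 module as a stand-in for MySQL. That is an assumption made only so the example is self-contained; SQLite has no STD(), so a custom aggregate named std is registered to imitate it. The point is that the view recomputes every row's deviation whenever the table changes.

```python
import math
import sqlite3


class PopulationStd:
    """Custom aggregate standing in for MySQL's STD() (population std dev)."""

    def __init__(self):
        self.values = []

    def step(self, value):
        if value is not None:
            self.values.append(value)

    def finalize(self):
        if not self.values:
            return None
        mean = sum(self.values) / len(self.values)
        variance = sum((v - mean) ** 2 for v in self.values) / len(self.values)
        return math.sqrt(variance)


conn = sqlite3.connect(":memory:")
conn.create_aggregate("std", 1, PopulationStd)
conn.execute("CREATE TABLE devTest (id INTEGER PRIMARY KEY, j INTEGER)")
conn.executemany(
    "INSERT INTO devTest (id, j) VALUES (?, ?)",
    [(1, 5), (2, 9), (3, 4), (4, 7)],
)
conn.execute("""
    CREATE VIEW j_dev AS
    SELECT id, j,
           (j - (SELECT avg(j) FROM devTest)) / (SELECT std(j) FROM devTest) AS j_dev
    FROM devTest
""")

# Deviation of row 1 with the original four rows (mean 6.25).
before = conn.execute("SELECT j_dev FROM j_dev WHERE id = 1").fetchone()[0]

# Adding a row with j = 0 moves the mean to 5, so row 1 now sits on the mean.
conn.execute("INSERT INTO devTest (id, j) VALUES (5, 0)")
after = conn.execute("SELECT j_dev FROM j_dev WHERE id = 1").fetchone()[0]

print(before, after)
```

Nothing is stored for j_dev; each SELECT against the view reruns the aggregates, which is exactly the work a generated column is not allowed to do.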

Related

How to find data based on comma separated parameter in comma separated data in my SQL query

We have below data,
plant table
----------------------------
| name | classification |
| A | 1,4,7 |
| B | 2,3,7 |
| C | 3,4,9,8 |
| D | 1,5,6,9 |
The front end sends multiple parameters, such as "4,9",
and the expected output is:
plant table
---------------------------
| name | classification |
| A | 1,4,7 |
| C | 3,4,9,8 |
| D | 1,5,6,9 |
I already tried FIND_IN_SET, but it only works with a single parameter:
select * from plant o where find_in_set('4',classification ) <> 0
Another option is to run multiple queries: for the parameter "4,9" we would loop and run the query twice, once with 4 and once with 9. But that would consume too many resources, since the table has 10,000+ rows and there can be more than 5 parameters.
If the table design is bad practice, fair enough, but we cannot change it because the table belongs to a third party.
Any solution or any insight will be appreciated,
Thank you
Schema (MySQL v8.0)
CREATE TABLE broken_table (name CHAR(12) PRIMARY KEY,classification VARCHAR(12));
INSERT INTO broken_table VALUES
('A','1,4,7'),
('B','2,3,7'),
('C','3,4,9,8'),
('D','1,5,6,9');
Query #1
WITH RECURSIVE cte (n) AS
(
SELECT 1
UNION ALL
SELECT n + 1 FROM cte WHERE n < 5
)
SELECT DISTINCT x.name, x.classification FROM broken_table x JOIN cte
WHERE SUBSTRING_INDEX(SUBSTRING_INDEX(classification,',',n),',',-1) IN (4,9);
---------------------------
| name | classification |
| A | 1,4,7 |
| C | 3,4,9,8 |
| D | 1,5,6,9 |
EDIT:
or, for older versions...
SELECT DISTINCT x.name, x.classification FROM broken_table x JOIN
(
SELECT 1 n UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5
) cte
WHERE SUBSTRING_INDEX(SUBSTRING_INDEX(classification,',',n),',',-1) IN (4,9)
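The nested SUBSTRING_INDEX call is what picks the n-th element out of each list. As a rough illustration, here is a Python sketch with a hypothetical helper substring_index that imitates the MySQL function for non-zero counts; it is not a MySQL API, just a model of the behaviour the queries rely on.

```python
def substring_index(text, delim, count):
    """Rough Python model of MySQL's SUBSTRING_INDEX for non-zero counts."""
    parts = text.split(delim)
    if count > 0:
        # Everything before the count-th delimiter (whole string if too few).
        return delim.join(parts[:count])
    # Negative count: everything after the count-th delimiter from the right.
    return delim.join(parts[count:])


def nth_element(csv, n):
    # Inner call keeps the first n elements, outer call keeps the last of those.
    return substring_index(substring_index(csv, ",", n), ",", -1)


rows = {"A": "1,4,7", "B": "2,3,7", "C": "3,4,9,8", "D": "1,5,6,9"}
wanted = {"4", "9"}

# n runs from 1 to 5, mirroring the numbers generated in the queries.
matches = sorted(
    name
    for name, csv in rows.items()
    if any(nth_element(csv, n) in wanted for n in range(1, 6))
)
print(matches)
```

When n is larger than the list, the last element is simply returned again, which is why the queries need DISTINCT. The numbers 1 to 5 also cap the list length the queries can inspect at five elements.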
Let's just avoid the CSV altogether and fix your table design:
plant table
----------------------------
| name | classification |
| A | 1 |
| A | 4 |
| A | 7 |
| B | 2 |
| B | 3 |
| B | 7 |
| ... | ... |
Now with this design, you may use the following statement:
SELECT *
FROM plant
WHERE classification IN (?);
To the ? placeholder, you may bind your collection of values to match (e.g. (4,9)).
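A minimal sketch of the normalized design, using SQLite via Python's sqlite3 as a stand-in for MySQL (an assumption for the sake of a runnable example). With this driver, and with many others, a single ? cannot take a whole list, so the sketch generates one placeholder per value. DISTINCT is added so a plant matching several values is listed once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE plant ("
    " name TEXT, classification INTEGER,"
    " PRIMARY KEY (name, classification))"
)

source = {"A": [1, 4, 7], "B": [2, 3, 7], "C": [3, 4, 9, 8], "D": [1, 5, 6, 9]}
conn.executemany(
    "INSERT INTO plant (name, classification) VALUES (?, ?)",
    [(name, value) for name, values in source.items() for value in values],
)

wanted = [4, 9]
# One "?" per requested value; the values themselves are still bound safely.
placeholders = ", ".join("?" for _ in wanted)
sql = (
    "SELECT DISTINCT name FROM plant "
    f"WHERE classification IN ({placeholders}) ORDER BY name"
)
names = [row[0] for row in conn.execute(sql, wanted)]
print(names)  # ['A', 'C', 'D']
```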
You want OR logic, so you can use regular expressions. If every value were a single digit:
where classification regexp replace('4,9', ',', '|')
However, this would also match 42 and 19, which I'm guessing you do not want. So make it a little more complicated and anchor each value to the comma delimiters (or to the start or end of the string):
where classification regexp concat('(^|,)(', replace('4,9', ',', '|'), ')(,|$)')
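The comma-anchored idea can be checked quickly with Python's re module. This is only a sketch: build_pattern is a hypothetical helper, and the alternatives are grouped so that the start/end anchors apply to every value rather than only to the first and last one.

```python
import re


def build_pattern(params):
    """Turn '4,9' into '(^|,)(4|9)(,|$)'."""
    alternatives = "|".join(re.escape(p) for p in params.split(","))
    return f"(^|,)({alternatives})(,|$)"


pattern = re.compile(build_pattern("4,9"))

rows = {
    "A": "1,4,7",
    "B": "2,3,7",
    "C": "3,4,9,8",
    "D": "1,5,6,9",
    "E": "42,19",  # contains the digits 4 and 9, but not the values 4 or 9
    "F": "4",      # a single value with no commas at all
}
matches = [name for name, csv in rows.items() if pattern.search(csv)]
print(matches)  # ['A', 'C', 'D', 'F']
```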

SQL operator IN returns only DISTINCT

I have the following query:
SELECT class, subclass ,weight
FROM classes
WHERE classes.term in ('this','paper','present','this','and','this','this')
The above query returns only distinct values. For example I have the following table:
+-----------------------------------+
|class | subclass | term | weight |
+-----------------------------------+
| a | b | this | 3 |
| c | d | paper | 2 |
| e | f | sth | 1 |
+-----------------------------------+
the result I will get is
+-----------------------------------+
|class | subclass | term | weight |
+-----------------------------------+
| a | b | this | 3 |
| c | d | paper | 2 |
+-----------------------------------+
what I actually wanted is the following
+-----------------------------------+
|class | subclass | term | weight |
+-----------------------------------+
| a | b | this | 3 |
| a | b | this | 3 |
| a | b | this | 3 |
| a | b | this | 3 |
| c | d | paper | 2 |
+-----------------------------------+
Is there any other way to get all the results without IN "cutting" them down to distinct values?
The problem is that I cannot change that part: ('this','paper','present','this','and','this','this')
because it is not created by a query. It is a string of words I want to search.
Edit:
- In the original scenario the table contains more than 3000 different words and the actual string is generated by a function I do not have
rights to access and contains 300+ words with many duplicates.
- In the original scenario I want to add the weight of the word every
time it appears in the string
Edit2:
The result I expect is to sum the weights every time a term appears in string.
Expecting results like the following:
+-----------------------------------+
|class | subclass | term | weight |
+-----------------------------------+
| a | b | this | 12 |
| c | d | paper | 2 |
+-----------------------------------+
Is there any other solution?
Use a join:
select c.*
from (select 'this' as term union all
select 'paper' as term union all
select 'present' as term union all
select 'this' as term union all
select 'and' as term union all
select 'this' as term union all
select 'this' as term
) terms join
classes c
on c.term = terms.term;
This will work in both MySQL and SQLite.
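For the summed result asked for in Edit2, the same join can be grouped. Below is a sketch using SQLite via Python's sqlite3; the temporary table terms is an assumption standing in for the UNION ALL list, which is convenient when the word list arrives from application code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE classes (class TEXT, subclass TEXT, term TEXT, weight INTEGER)"
)
conn.executemany(
    "INSERT INTO classes VALUES (?, ?, ?, ?)",
    [("a", "b", "this", 3), ("c", "d", "paper", 2), ("e", "f", "sth", 1)],
)

words = ["this", "paper", "present", "this", "and", "this", "this"]
# One row per word, duplicates included, so the join repeats matching rows.
conn.execute("CREATE TEMP TABLE terms (term TEXT)")
conn.executemany("INSERT INTO terms VALUES (?)", [(w,) for w in words])

totals = conn.execute("""
    SELECT c.class, c.subclass, c.term, SUM(c.weight) AS weight
    FROM terms t
    JOIN classes c ON c.term = t.term
    GROUP BY c.class, c.subclass, c.term
    ORDER BY c.term DESC
""").fetchall()
print(totals)  # [('a', 'b', 'this', 12), ('c', 'd', 'paper', 2)]
```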
For reference, see this question on how to count the number of occurrences of a substring in a string:
SELECT m.*, (LENGTH('this paper present this and this this') - LENGTH(REPLACE('this paper present this and this this', term, ''))) / LENGTH(term) AS count
FROM myTable m;
Once you have the number of occurrences for each string, you can multiply that value by the weight to get the total, like this:
SELECT term, weight * (LENGTH('this paper present this and this this') - LENGTH(REPLACE('this paper present this and this this', term, ''))) / LENGTH(term) AS totalWeight
FROM myTable m;
Note that this solution does not take a separated list of words, but concatenates that list into one string.
Here is an SQL Fiddle example for you.
EDIT
If you want the sum of weights for all terms in the string, without regard to the terms themselves, you can just adjust the query to use the SUM() function, and don't use GROUP BY because you want to sum for the whole table:
SELECT SUM(weight * (LENGTH('this paper present this and this this') - LENGTH(REPLACE('this paper present this and this this', term, ''))) / LENGTH(term)) AS totalWeight
FROM myTable m;
EDIT 2
A little more explanation for the query based on lengths. You can break it up into multiple parts:
LENGTH('this paper present this and this this') returns the number of characters in the string you are searching
LENGTH(REPLACE(myString, term, '')) is the length of the string above with your term removed. (For example, for 'this' the total length is 37; removing 16 characters (4 for each occurrence) gives 21.)
By subtracting the second value from the first, you get the number of characters in the overall string that belong to your term (37 - 21 = 16).
Then it divides that by the length of term to get the number of occurrences: 16 characters, divided by 4 characters per occurrence, means the substring occurred 4 times (16 / 4 = 4). Try these steps again with 'paper' and you will see.
The above procedure is illustrated step by step in this SQL Fiddle.
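The same length arithmetic can be reproduced in a few lines of Python. This is a sketch; count_occurrences is a hypothetical helper name, and the weights are taken from the sample table in the question.

```python
def count_occurrences(text, term):
    # (total length - length with the term removed) / length of the term
    return (len(text) - len(text.replace(term, ""))) // len(term)


text = "this paper present this and this this"
weights = {"this": 3, "paper": 2, "sth": 1}

counts = {term: count_occurrences(text, term) for term in weights}
total_weight = sum(weights[term] * counts[term] for term in weights)

print(len(text))     # 37
print(counts)        # {'this': 4, 'paper': 1, 'sth': 0}
print(total_weight)  # 14
```

One caveat of this approach: it counts substrings, not whole words, so a term such as 'is' would be counted four times here because it occurs inside every 'this'.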

INSERT data from one table INTO another with the copies (as many as `quantity` field in first table says)

I have a MySQL table creatures:
id | name | base_hp | quantity
--------------------------------
1 | goblin | 5 | 2
2 | elf | 10 | 1
And I want to create creature_instances based on it:
id | name | actual_hp
------------------------
1 | goblin | 5
2 | goblin | 5
3 | elf | 10
The ids of creatures_instances are not important and do not need to relate to creatures.id.
How can I do this with MySQL alone, in the most efficient way in terms of execution time? A single query would be best, but a procedure is OK too. I use InnoDB.
I know that with the help of e.g. PHP I could:
- select each row separately,
- run a for($i=0; $i<$line->quantity; $i++) loop that inserts one row into creatures_instances per iteration.
The most efficient way is to do everything in SQL. It helps if you have a numbers table. Without one, you can generate the numbers in a subquery. The following works for quantities up to 4; it leaves out id, assuming that column is AUTO_INCREMENT, because copying creatures.id would produce duplicate keys:
insert into creatures_instances(name, actual_hp)
select c.name, c.base_hp
from creatures c join
(select 1 as n union all select 2 union all select 3 union all select 4
) n
on n.n <= c.quantity;
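As a quick check of the numbers-join technique, here is a sketch run against SQLite through Python's sqlite3 (a stand-in for MySQL, used only to keep the example self-contained). It assumes the instance ids are generated by the database, which is why the INSERT lists only name and actual_hp.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE creatures ("
    " id INTEGER PRIMARY KEY, name TEXT, base_hp INTEGER, quantity INTEGER)"
)
conn.execute(
    "CREATE TABLE creatures_instances ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT, actual_hp INTEGER)"
)
conn.executemany(
    "INSERT INTO creatures VALUES (?, ?, ?, ?)",
    [(1, "goblin", 5, 2), (2, "elf", 10, 1)],
)

# Each creature joins to as many number rows as its quantity says.
conn.execute("""
    INSERT INTO creatures_instances (name, actual_hp)
    SELECT c.name, c.base_hp
    FROM creatures c
    JOIN (SELECT 1 AS n UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4) n
      ON n.n <= c.quantity
    ORDER BY c.id
""")

instances = conn.execute(
    "SELECT name, actual_hp FROM creatures_instances"
).fetchall()
print(sorted(instances))  # [('elf', 10), ('goblin', 5), ('goblin', 5)]
```

For larger quantities the inline list of numbers would need to grow, or be replaced by a real numbers table.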

Is it possible to automatically have in a row a subtraction between two MySQL columns

I would like to have something like this :
+----------+------+-----+--------+
| image_id | good | bad | result |
+----------+------+-----+--------+
| 1 | 10 | 2 | x |
+----------+------+-----+--------+
| 2 | 4 | 1 | y |
+----------+------+-----+--------+
where x and y are calculated automatically as good - bad (10 - 2 and 4 - 1 respectively), avoiding negative numbers if possible.
I would like this value to change whenever good or bad changes.
I can do this in PHP, but is there a way to do this directly in MySQL?
Calculate the result and return no less than zero, so avoiding negative numbers:
SELECT image_id, good, bad, GREATEST(good-bad, 0) AS result from `table`;
Use this query:
select image_id, good, bad, GREATEST(good-bad, 0) as 'result' from tbl
This calculates the difference for each row and returns it (or 0 if the difference is negative) in another column named result.
As a general rule, try to avoid storing in a column the result of a calculation based entirely on other columns of the same table, especially when the calculation is as trivial as a simple difference.
You can simply write:
select image_id, good, bad, (good-bad) as result from mytable
What you could do is have this schema:
CREATE TABLE tbl (image_id INTEGER PRIMARY KEY, good INTEGER, bad INTEGER);
CREATE VIEW tbl_result AS SELECT image_id, good, bad, CAST(good AS SIGNED) - bad AS result FROM tbl;
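A small sketch of the view approach, again using SQLite via Python's sqlite3 as an assumed stand-in for MySQL. SQLite's two-argument MAX() plays the role of GREATEST() here, combining the view with the "no negative numbers" idea from the earlier answers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl (image_id INTEGER PRIMARY KEY, good INTEGER, bad INTEGER)"
)
conn.executemany("INSERT INTO tbl VALUES (?, ?, ?)", [(1, 10, 2), (2, 4, 1)])

# The view stores nothing; result is derived each time it is queried.
conn.execute("""
    CREATE VIEW tbl_result AS
    SELECT image_id, good, bad, MAX(good - bad, 0) AS result
    FROM tbl
""")

query = "SELECT image_id, result FROM tbl_result ORDER BY image_id"
first = conn.execute(query).fetchall()

# Changing bad is reflected immediately, and the result is clamped at zero.
conn.execute("UPDATE tbl SET bad = 12 WHERE image_id = 1")
second = conn.execute(query).fetchall()

print(first)   # [(1, 8), (2, 3)]
print(second)  # [(1, 0), (2, 3)]
```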

Combine count rows in MySQL

I've got a table in MySQL that looks roughly like:
value | count
-------------
Fred | 7
FRED | 1
Roger | 3
roger | 1
That is, it was created with string ops outside of MySQL, so the values are case- and trailing-whitespace-sensitive.
I want it to look like:
value | count
-------------
Fred | 8
Roger | 4
That is, managed by MySQL, with value a primary key. It's not important which one (of "Fred" or "FRED") is kept.
I know how to do this in code. I also know how to generate a list of problem values (with a self-join). But I'd like to come up with a SQL update/delete to migrate my table, and I can't think of anything.
If I knew that no pair of records had variants of one value, with the same count (like ("Fred",4) and ("FRED",4)), then I think I can do it with a self-join to copy the counts, and then an update to remove the zeros. But I have no such guarantee.
Is there something simple I'm missing, or is this one of those cases where you just write a short function outside of the database?
Thanks!
As an example of how to obtain the results you are looking for with a SQL query alone (TRIM handles the trailing whitespace, and the table name is quoted because table is a reserved word):
SELECT UPPER(TRIM(value)) AS name, SUM(count) AS qty FROM `table` GROUP BY name;
If you make a new table to hold the correct values, you can populate it from the above query with INSERT ... SELECT, like so:
INSERT INTO newtable SELECT UPPER(TRIM(value)) AS name, SUM(count) AS qty FROM `table` GROUP BY name;
Strangely, MySQL seems to do this for you. I just tested this in MySQL 5.1.47:
create table c (value varchar(10), count int);
insert into c values ('Fred',7), ('FRED',1), ('Roger',3), ('roger',1);
select * from c;
+-------+-------+
| value | count |
+-------+-------+
| Fred | 7 |
| FRED | 1 |
| Roger | 3 |
| roger | 1 |
+-------+-------+
select value, sum(count) from c group by value;
+-------+------------+
| value | sum(count) |
+-------+------------+
| Fred | 8 |
| Roger | 4 |
+-------+------------+
I was surprised to see MySQL transform the strings like that, and I'm not sure I can explain why it did that. I was expecting to have to get four distinct rows, and to have to use some string functions to map the values to a canonical form.
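A likely explanation, offered as an assumption rather than something the answer states: MySQL's default collations for string columns are case-insensitive, so GROUP BY treats 'Fred' and 'FRED' as the same value. The sketch below reproduces the effect in SQLite via Python's sqlite3, where comparisons are case-sensitive by default and COLLATE NOCASE opts in to case-insensitive grouping.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE c (value TEXT, "count" INTEGER)')
conn.executemany(
    "INSERT INTO c VALUES (?, ?)",
    [("Fred", 7), ("FRED", 1), ("Roger", 3), ("roger", 1)],
)

# Case-sensitive grouping: every spelling is its own group.
case_sensitive = conn.execute(
    'SELECT value, SUM("count") FROM c GROUP BY value'
).fetchall()

# Case-insensitive grouping: spellings that differ only in case are merged.
case_insensitive = conn.execute(
    'SELECT value, SUM("count") FROM c GROUP BY value COLLATE NOCASE'
).fetchall()

print(len(case_sensitive))                        # 4
print(sorted(total for _, total in case_insensitive))  # [4, 8]
```

Which spelling represents each merged group is not guaranteed, which matches the question's remark that it does not matter whether "Fred" or "FRED" is kept.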