Create a mysql function that accepts a result set as parameter? - mysql

Is it possible to create a mySQL function that accepts as a parameter the result set from a query?
Basically I have a lot of queries that will return a result set as follows:
id | score
70 | 25
71 | 7
72 | 215
74 | 32
75 | 710
76 | 34
78 | 998
79 | 103
80 | 3
I want to normalize the values such that they come to a range between 0 and 1.
The way I thought I'd do this was by applying calculation:
nscore = (score-min(score))/(max(score) - min(score))
to get following result
id | score
70 | 0.022
71 | 0.004
72 | 0.213
74 | 0.029
75 | 0.710
76 | 0.031
78 | 1.000
79 | 0.100
80 | 0.000
But I'm not able to come up with a query that returns the min and max along with the results, so I thought of using a function (I cannot use a stored procedure), but I couldn't find documentation on how to pass a result set.
Any help appreciated! Thanks!
EDIT:
The score field in result is a computed field. Cannot select it directly.
For example, a sample query that returns the above result:
select t.id as id, count(*) as score
from tbl t
inner join tbl2 t2 on t.idx = t2.idx
where t2.role in (.....)
just for demo purpose, not actual schema or query

No. MySQL doesn't support defining a function with a resultset as an argument.
Unfortunately, MySQL versions before 8.0 do not support Common Table Expressions (CTEs) or analytic (window) functions; both were added in MySQL 8.0.
To get this result from a query on those older versions, one way is to include the original query as an inline view, two times.
As an example:
SELECT t.id
, (t.score-s.min_score)/(s.max_score-s.min_score) AS normalized_score
FROM (
-- original query here
SELECT id, score FROM ...
) t
CROSS
JOIN ( SELECT MIN(r.score) AS min_score
, MAX(r.score) AS max_score
FROM (
-- original query here
SELECT id, score FROM ...
) r
) s
ORDER BY t.id
EDIT
Based on the query added to the question ...
SELECT q.id
, (q.score-s.min_score)/(s.max_score-s.min_score) AS normalized_score
FROM ( -- original query goes here
-- ------------------------
select t.id as id, count(*) as score
from tbl t
inner join tbl2 t2 on t.idx = t2.idx
where t2.role in (.....)
-- ------------------------
) q
CROSS
JOIN ( SELECT MIN(r.score) AS min_score
, MAX(r.score) AS max_score
FROM ( -- original query goes here
-- ------------------------
select t.id as id, count(*) as score
from tbl t
inner join tbl2 t2 on t.idx = t2.idx
where t2.role in (.....)
-- ------------------------
) r
) s
ORDER BY q.id
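On MySQL 8.0 or later, the duplicated inline view can be avoided with a CTE plus window aggregates. The following is only a minimal sketch: it runs the SQL through Python's sqlite3 module as a stand-in (assuming a bundled SQLite of 3.25 or newer, which supports window functions), and the table name scores is hypothetical, standing in for the original query.

```python
import sqlite3

# Sample (id, score) rows copied from the question.
# The table name "scores" is hypothetical; it stands in for the original query.
rows = [(70, 25), (71, 7), (72, 215), (74, 32), (75, 710),
        (76, 34), (78, 998), (79, 103), (80, 3)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (id INTEGER, score INTEGER)")
conn.executemany("INSERT INTO scores VALUES (?, ?)", rows)

# The CTE "q" holds the original query, so it is written only once.
# MIN/MAX with an empty OVER () are computed across the whole result set.
# "* 1.0" forces floating-point division in SQLite, and NULLIF avoids a
# division by zero when every score is identical (the result is then NULL).
query = """
WITH q AS (
    SELECT id, score FROM scores
)
SELECT id,
       (score - MIN(score) OVER ()) * 1.0
       / NULLIF(MAX(score) OVER () - MIN(score) OVER (), 0) AS nscore
FROM q
ORDER BY id
"""
normalized = conn.execute(query).fetchall()

for item_id, nscore in normalized:
    print(item_id, round(nscore, 3))
```

The same statement should carry over to MySQL 8.0 with the CTE body replaced by the real query; on earlier versions the double inline view above remains the fallback.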

Related

How to optimize a my sql query taking 2 mins and 25 secs to run

I have a table with 3 GB data (It will keep on increasing) and I need to display total sales, top category and top product (Maximum occurrence in the column).
Following is the query that's giving me the above mentioned result:
select t.category,
sum(t.sale) sales,
(select product
from demo
where category = t.category
group by product
order by count(*) desc
limit 1) top_product
from demo t
group by t.category
The above query takes approximately 2 mins and 25 seconds. I couldn't find any way to optimize it. Is there any other way that someone could recommend?
Example table:
category product sale
C1 P1 10
C2 P2 12
C3 P1 14
C1 P2 15
C1 P1 02
C2 P2 10
C2 P3 22
C3 P1 01
C3 P2 27
C3 P3 02
Output:
category Top product Total sales
C1 P1 27
C2 P2 44
C3 P1 44
Your query could be written like this:
SELECT g1.category, g1.sum_sale, g2.product
FROM (
SELECT category, SUM(sale) AS sum_sale
FROM demo
GROUP BY category
) AS g1
INNER JOIN (
SELECT category, product, COUNT(*) AS product_count
FROM demo
GROUP BY category, product
) AS g2 ON g1.category = g2.category
INNER JOIN (
SELECT category, MAX(product_count) AS product_count_max
FROM (
SELECT category, product, COUNT(*) AS product_count
FROM demo
GROUP BY category, product
) AS x
GROUP BY category
) AS g3 ON g2.category = g3.category AND g2.product_count = g3.product_count_max
Basically it tries to find the maximum count(*) per category and from that it calculates the product. It could benefit from appropriate indexes.
A MySQL-only hack is to use GROUP_CONCAT in combination with nested SUBSTRING_INDEX functions to get the first element of an ordered, comma-separated string.
It is not an ideal approach, but it reduces the number of subqueries required and may be efficient for your particular case.
You will also need to use SET SESSION group_concat_max_len = @@max_allowed_packet;.
We basically determine the sales and the count of occurrences for each product and category combination. This result set is then used as a derived table, and we use the GROUP_CONCAT() hack to determine the product with the maximum count in each category.
SET SESSION group_concat_max_len = @@max_allowed_packet;
SELECT
dt.category,
SUM(dt.sale_per_category_product) AS total_sales,
SUBSTRING_INDEX(
SUBSTRING_INDEX(
GROUP_CONCAT(dt.product ORDER BY dt.product_count_per_category DESC)
, ','
, 1
)
, ','
, -1
) AS top_product
FROM
(
SELECT
category,
product,
SUM(sale) AS sale_per_category_product,
COUNT(*) AS product_count_per_category
FROM demo
GROUP BY category, product
) AS dt
GROUP BY dt.category
Result (MySQL v5.7):
| category | total_sales | top_product |
| -------- | ----------- | ------------|
| C1 | 27 | P1 |
| C2 | 44 | P2 |
| C3 | 44 | P1 |
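Where window functions are available (MySQL 8.0 or later), the top product per category can also be picked with ROW_NUMBER() instead of the GROUP_CONCAT trick. This is a hedged sketch run against SQLite through Python's sqlite3 module (assuming SQLite 3.25 or newer); the tie-break on product name is an assumption, since the question does not say how ties should be resolved.

```python
import sqlite3

# Example rows copied from the question's "demo" table.
rows = [("C1", "P1", 10), ("C2", "P2", 12), ("C3", "P1", 14), ("C1", "P2", 15),
        ("C1", "P1", 2), ("C2", "P2", 10), ("C2", "P3", 22), ("C3", "P1", 1),
        ("C3", "P2", 27), ("C3", "P3", 2)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (category TEXT, product TEXT, sale INTEGER)")
conn.executemany("INSERT INTO demo VALUES (?, ?, ?)", rows)

# Step 1 (counts): one pass over the table, aggregated per (category, product).
# Step 2 (ranked): rank products inside each category by occurrence count and
#                  sum the per-product sales back up to the category level.
# Ties on the count are broken alphabetically by product (an assumption).
query = """
WITH counts AS (
    SELECT category, product, SUM(sale) AS sales, COUNT(*) AS cnt
    FROM demo
    GROUP BY category, product
),
ranked AS (
    SELECT category, product,
           ROW_NUMBER() OVER (PARTITION BY category
                              ORDER BY cnt DESC, product) AS rn,
           SUM(sales) OVER (PARTITION BY category) AS total_sales
    FROM counts
)
SELECT category, product AS top_product, total_sales
FROM ranked
WHERE rn = 1
ORDER BY category
"""
top_products = conn.execute(query).fetchall()
print(top_products)
```

Whether this is faster than the correlated subquery on a 3 GB table would need to be measured; an index on (category, product) is likely to help either way.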

How to get average per group and figure out outliers in SQL

This is what my data looks like:
id | value | group
------------------
1 | 4 | abc
2 | 8 | def
3 | 100 | abc
4 | 8 | ghi
5 | 7 | abc
6 | 10 | ghi
I need to figure out the averages per group where outliers (e.g. id = 3 for group = abc) are excluded, then display the outliers next to the averages. For the above data I am expecting something like this as the result:
group = 'abc'
average = '5.5'
outlier = '100'
One method creates a subquery containing the stats for each group (mean and standard deviation), and then joins this back to the original table to determine which records are outliers, for which group.
SELECT t1.id,
t1.group AS `group`,
t2.valAvg AS average,
t1.value AS outlier
FROM yourTable t1
INNER JOIN
(
SELECT `group`, AVG(value) AS valAvg, STDDEV(value) AS valStd
FROM yourTable
GROUP BY `group`
) t2
ON t1.group = t2.group
WHERE ABS(t1.value - t2.valAvg) > t2.valStd -- any record whose value is MORE
-- than one standard deviation from
-- the mean is an outlier
Update:
It appears that, for some reason, your value column is actually varchar rather than a numeric type. This means you won't be able to do math on it reliably. So first, convert that column to integer via:
ALTER TABLE yourTable MODIFY value INTEGER;
If you only want outliers which are greater than the average then use the following WHERE clause:
WHERE t1.value - t2.valAvg > t2.valStd
You can exclude the value you don't need with a subquery:
select `group`, avg(value)
from my_table
where (`group`, value) not in (select `group`, max(value)
from my_table
group by `group`)
group by `group`
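Neither query above returns the exact shape asked for (the average without outliers, with the outliers alongside). A possible sketch that combines both, using the one-standard-deviation rule, is shown below. It runs on SQLite via Python's sqlite3 module, so the standard deviation test is written as a comparison of squares (SQLite has no built-in STDDEV); the table name readings and the column name grp are hypothetical, chosen to avoid quoting the reserved word group.

```python
import sqlite3

# Rows copied from the question; "readings" and "grp" are hypothetical names.
rows = [(1, 4, "abc"), (2, 8, "def"), (3, 100, "abc"),
        (4, 8, "ghi"), (5, 7, "abc"), (6, 10, "ghi")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id INTEGER, value INTEGER, grp TEXT)")
conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)

# stats:   per-group mean and population variance (E[x^2] - E[x]^2).
# flagged: a row is an outlier when its squared distance from the mean exceeds
#          the variance, i.e. it lies more than one standard deviation away.
# Final SELECT: average over non-outliers, outliers listed next to it.
query = """
WITH stats AS (
    SELECT grp,
           AVG(value) AS mean,
           AVG(value * value) - AVG(value) * AVG(value) AS variance
    FROM readings
    GROUP BY grp
),
flagged AS (
    SELECT r.grp, r.value,
           (r.value - s.mean) * (r.value - s.mean) > s.variance AS is_outlier
    FROM readings AS r
    JOIN stats AS s ON s.grp = r.grp
)
SELECT grp,
       AVG(CASE WHEN is_outlier = 0 THEN value END) AS average,
       GROUP_CONCAT(CASE WHEN is_outlier = 1 THEN value END) AS outliers
FROM flagged
GROUP BY grp
ORDER BY grp
"""
summary = conn.execute(query).fetchall()
print(summary)
```

For group abc this yields an average of 5.5 with 100 reported as the outlier, matching the expected result in the question; groups without outliers get NULL in the outliers column.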

selecting multiple records where count is some value

There is a huge database with more than 500k values, but with only one table having all the data. I need to extract some of it for a given condition.
Table structure is like this,
column_a | column_b
A | 30
A | 40
A | 70
B | 25
B | 45
C | 10
C | 15
C | 25
I need to extract all the data having count(column_a) = 3. The catch is that I need to get all three records too, like this:
column_a | column_b
A | 30
A | 40
A | 70
C | 10
C | 15
C | 25
I have tried to do this with a query like this
select column_a,column_b group by column_a having count(*)=3;
Here I get the correct values for column_a, but only one record from each.
Thanks in advance,
Bhashithe
One approach is to INNER JOIN your original table to a subquery which identifies the column_a records which come in groups of exactly 3.
SELECT t1.column_a, t1.column_b
FROM `table` t1
INNER JOIN
(
-- groups of exactly 3 rows; `table` is quoted because TABLE is a reserved word
SELECT column_a, COUNT(*)
FROM `table`
GROUP BY column_a
HAVING COUNT(*) = 3
) t2
ON t1.column_a = t2.column_a
You can also use a nested query, if you want.
Here, the inner query fetches the column_a values that occur exactly 3 times, and the outer query displays all the matching records using the IN clause.
SELECT t.column_a, t.column_b FROM `table` t
WHERE t.column_a IN
(
SELECT t1.column_a FROM `table` t1
GROUP BY t1.column_a
HAVING COUNT(t1.column_a) = 3
)
ORDER BY t.column_a;
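If window functions are available (MySQL 8.0 or later), the group size can be attached to every row in a single pass, which avoids joining the table to itself. A small sketch, run on SQLite through Python's sqlite3 module (assuming SQLite 3.25 or newer); the table name pairs is hypothetical.

```python
import sqlite3

# Rows copied from the question; the table name "pairs" is hypothetical.
rows = [("A", 30), ("A", 40), ("A", 70), ("B", 25), ("B", 45),
        ("C", 10), ("C", 15), ("C", 25)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pairs (column_a TEXT, column_b INTEGER)")
conn.executemany("INSERT INTO pairs VALUES (?, ?)", rows)

# COUNT(*) OVER (PARTITION BY column_a) repeats the group size on each row,
# so the outer query can filter on it while keeping every row of the group.
query = """
SELECT column_a, column_b
FROM (
    SELECT column_a, column_b,
           COUNT(*) OVER (PARTITION BY column_a) AS group_size
    FROM pairs
) AS counted
WHERE group_size = 3
ORDER BY column_a, column_b
"""
triples = conn.execute(query).fetchall()
print(triples)
```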

SQL statement for querying with multiple conditions including 3 most recent dates

I need help in finding the rows that correspond to the most recent date, the next most recent and the one after that, where some condition ABC is "Y" and group it by a column name XYZ ASC but XYZ can appear multiple times. So, say XYZ is 50, then for the rows in the three years, the XYZ will be 50. I have the following code that executes but returns only two rows out of thousands which is impossible. I tried executing just the date condition but it returned dates that were less than or equal to MAX(DATE)-3 as well. Don't know where I am going wrong.
select * from money.cash where DATE =(
select
MAX(DATE)
from
money.cash
where
DATE > (select MAX(DATE)-3 from money.cash)
)
GROUP BY XYZ ASC
having ABC = "Y";
The structure of the table is as follows (only a schematic, not the real thing).
Comp_ID DATE XYZ ABC $$$$ ....
1 2012-1-1 10 Y SOME-AMOUNT
2 2011-1-1 10 Y
3 2006-1-1 10 Y
4 2011-1-1 20 Y
5 2002-1-1 20 Y
6 2000-1-1 20 Y
7 1998-1-1 20 Y
The desired o/p would be the first three rows for XYZ=10 in ascending order and the most recent 3 dates for XYZ=20.
LAST AND IMPORTANT: this table's values keep changing as new data comes in, so the o/p (which will be in a new table) must reflect the changes in the original table above.
MySQL versions before 8.0 don't have functionality that is friendly to greatest-n-per-group queries (8.0 added window functions such as ROW_NUMBER()).
One option would be...
- Find the MAX(Date) per group (XYZ)
- Then use that result to find the MAX(Date) of all records before that date
- Then do it again for all records before that date
It's really inefficient, but MySQL before 8.0 hasn't got the functionality required to do this efficiently. Sorry...
CREATE TABLE yourTable
(
comp_id INT,
myDate DATE,
xyz INT,
abc VARCHAR(1)
)
;
INSERT INTO yourTable SELECT 1, '2012-01-01', 10, 'Y';
INSERT INTO yourTable SELECT 2, '2011-01-01', 10, 'Y';
INSERT INTO yourTable SELECT 3, '2006-01-01', 10, 'Y';
INSERT INTO yourTable SELECT 4, '2011-01-01', 20, 'Y';
INSERT INTO yourTable SELECT 5, '2002-01-01', 20, 'Y';
INSERT INTO yourTable SELECT 6, '2000-01-01', 20, 'Y';
INSERT INTO yourTable SELECT 7, '1998-01-01', 20, 'Y';
SELECT
yourTable.*
FROM
(
SELECT
lookup.XYZ,
COALESCE(MAX(yourTable.myDate), lookup.MaxDate) AS MaxDate
FROM
(
SELECT
lookup.XYZ,
COALESCE(MAX(yourTable.myDate), lookup.MaxDate) AS MaxDate
FROM
(
SELECT
yourTable.XYZ,
MAX(yourTable.myDate) AS MaxDate
FROM
yourTable
WHERE
yourTable.ABC = 'Y'
GROUP BY
yourTable.XYZ
)
AS lookup
LEFT JOIN
yourTable
ON yourTable.XYZ = lookup.XYZ
AND yourTable.myDate < lookup.MaxDate
AND yourTable.ABC = 'Y'
GROUP BY
lookup.XYZ,
lookup.MaxDate
)
AS lookup
LEFT JOIN
yourTable
ON yourTable.XYZ = lookup.XYZ
AND yourTable.myDate < lookup.MaxDate
AND yourTable.ABC = 'Y'
GROUP BY
lookup.XYZ,
lookup.MaxDate
)
AS lookup
INNER JOIN
yourTable
ON yourTable.XYZ = lookup.XYZ
AND yourTable.myDate >= lookup.MaxDate
WHERE
yourTable.ABC = 'Y'
ORDER BY
yourTable.comp_id
;
DROP TABLE yourTable;
There are other options, but they're all a bit hacky. Search SO for greatest-n-per-group mysql.
My results using your example data:
Comp_ID | DATE | XYZ | ABC
------------------------------
1 | 2012-1-1 | 10 | Y
2 | 2011-1-1 | 10 | Y
3 | 2006-1-1 | 10 | Y
4 | 2011-1-1 | 20 | Y
5 | 2002-1-1 | 20 | Y
6 | 2000-1-1 | 20 | Y
Here's another way, hopefully more efficient than Dems' answer.
Test it with an index on (abc, xyz, date):
SELECT m.xyz, m.date -- for all columns: SELECT m.*
FROM
( SELECT DISTINCT xyz
FROM money.cash
WHERE abc = 'Y'
) AS dm
JOIN
money.cash AS m
ON m.abc = 'Y'
AND m.xyz = dm.xyz
AND m.date >= COALESCE(
( SELECT im.date
FROM money.cash AS im
WHERE im.abc = 'Y'
AND im.xyz = dm.xyz
ORDER BY im.date DESC
LIMIT 1
OFFSET 2 -- to get 3 latest rows per xyz
), DATE('1000-01-01') ) ;
If more than one row shares the same (abc, xyz, date), the query may return more than 3 rows per xyz (all rows tied for 3rd place will be shown).
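For completeness: on MySQL 8.0 or later, greatest-n-per-group can be expressed directly with ROW_NUMBER(). The sketch below runs the idea on SQLite through Python's sqlite3 module (assuming SQLite 3.25 or newer). The table name cash and the column name dt are illustrative, and dates are stored as ISO strings so that text ordering matches date ordering.

```python
import sqlite3

# Rows copied from the question's schematic table.
rows = [(1, "2012-01-01", 10, "Y"), (2, "2011-01-01", 10, "Y"),
        (3, "2006-01-01", 10, "Y"), (4, "2011-01-01", 20, "Y"),
        (5, "2002-01-01", 20, "Y"), (6, "2000-01-01", 20, "Y"),
        (7, "1998-01-01", 20, "Y")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cash (comp_id INTEGER, dt TEXT, xyz INTEGER, abc TEXT)")
conn.executemany("INSERT INTO cash VALUES (?, ?, ?, ?)", rows)

# Number the rows of each xyz group from newest to oldest, then keep the
# first three. Unlike the OFFSET-based query, ROW_NUMBER() never returns
# more than three rows per group, even when dates tie.
query = """
SELECT comp_id, dt, xyz, abc
FROM (
    SELECT c.*,
           ROW_NUMBER() OVER (PARTITION BY xyz ORDER BY dt DESC) AS rn
    FROM cash AS c
    WHERE abc = 'Y'
) AS ranked
WHERE rn <= 3
ORDER BY xyz, dt DESC
"""
latest = conn.execute(query).fetchall()
print(latest)
```

Because the result is computed at query time, wrapping such a statement in a view (rather than copying it into a new table) would be one way to keep the output in step with changes to the source table.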

Select distinct values from two columns

I have a table with the following structure:
itemId | direction | uid | created
133 0 17 1268497139
432 1 140 1268497423
133 0 17 1268498130
133 1 17 1268501451
I need to select distinct values for two columns - itemId and direction, so the output would be like this:
itemId | direction | uid | created
432 1 140 1268497423
133 0 17 1268498130
133 1 17 1268501451
In the original table we have two rows with itemId 133 and direction 0, but we need only the one of these rows with the latest created time.
Thank you for any suggestions!
Use:
SELECT t.itemid,
t.direction,
t.uid,
t.created
FROM `TABLE` t
JOIN (SELECT a.itemid,
a.direction,
MAX(a.created) AS max_created
FROM `TABLE` a
GROUP BY a.itemid, a.direction) b ON b.itemid = t.itemid
AND b.direction = t.direction
AND b.max_created = t.created
You have to use an aggregate (i.e. MAX) to get the largest created value per itemid and direction pair, and join that onto an unaltered copy of the table to get the values associated with the maximum created value for each pair.
select t1.itemid, t1.direction, t1.uid, t1.created
from (select t2.itemid, t2.direction, max(t2.created) as maxdate
from tbl t2
group by t2.itemid, t2.direction) x
inner join tbl t1
on t1.itemid = x.itemid
and t1.direction = x.direction
and t1.created = x.maxdate
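As a quick check of the join-on-maximum idea with both itemId and direction in the grouping, here is a small sketch run on SQLite through Python's sqlite3 module, using the rows from the question. The table name items is hypothetical.

```python
import sqlite3

# Rows copied from the question; the table name "items" is hypothetical.
rows = [(133, 0, 17, 1268497139), (432, 1, 140, 1268497423),
        (133, 0, 17, 1268498130), (133, 1, 17, 1268501451)]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (itemId INTEGER, direction INTEGER, uid INTEGER, created INTEGER)"
)
conn.executemany("INSERT INTO items VALUES (?, ?, ?, ?)", rows)

# The derived table finds the newest "created" per (itemId, direction) pair;
# joining back on all three columns recovers the full row for each pair.
query = """
SELECT t.itemId, t.direction, t.uid, t.created
FROM items AS t
JOIN (
    SELECT itemId, direction, MAX(created) AS max_created
    FROM items
    GROUP BY itemId, direction
) AS newest
  ON newest.itemId = t.itemId
 AND newest.direction = t.direction
 AND newest.max_created = t.created
ORDER BY t.created
"""
deduped = conn.execute(query).fetchall()
print(deduped)
```

If two rows of the same pair share the same created value, both are returned; a unique key would be needed to break such ties.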