Can these two SQL queries be combined? - mysql

I'm wondering if there's a way I can combine these two queries into one? I need to get the mean and std dev for each column in the company_feature table. I then need to take those two values and use them in an aggregation query on each row in the company_feature table.
/* Get mean and std dev for each feature column */
SELECT
AVG(F1) AS F1_mean,
STDDEV(F1) AS F1_std_dev
FROM company_feature_test cft;
/* Add averages for each feature to the following query */
SELECT
DATA.company_id,
(
CASE
WHEN DATA.in_ref_set = 0 AND DATA.size = 'SMALL'
THEN
1 * ((LN(DATA.F1 + 1) - :F1_mean) / :F1_std_dev ) * 1
WHEN DATA.in_ref_set = 0 AND DATA.size = 'MEDIUM'
THEN
2 * ((LN(DATA.F1 + 1) - :F1_mean) / :F1_std_dev ) * 2
WHEN DATA.in_ref_set = 0 AND DATA.size = 'LARGE'
THEN
3 * ((LN(DATA.F1 + 1) - :F1_mean) / :F1_std_dev ) * 3
WHEN DATA.in_ref_set = 0 AND DATA.size = 'VERY_LARGE'
THEN
4 * ((LN(DATA.F1 + 1) - :F1_mean) / :F1_std_dev ) * 4
ELSE
5 * ((LN(DATA.F1 + 1) - :F1_mean) / :F1_std_dev )
END
) AS feature_1
FROM (
SELECT company.in_ref_set, company.size, cft.*
FROM company_feature_test cft
JOIN company ON company.id = cft.company_id
GROUP BY company.id
) AS DATA
GROUP BY DATA.company_id;
The tables look like the following. There is a relation between company.id and company_feature.company_id.
company table
| id | ref_set | size |
| -- | --- | --- |
| 1 | 0 | SMALL |
| 2 | 1 | LARGE |
company_feature table
| company_id | F1 | F2 |
| --- | --- | --- |
| 1 | 5 | 10 |
| 2 | 15 | 20 |
The query outputs the following data:
| company_id | feature_1 |
| --- | --- |
| 1 | -1.66 |
| 2 | -1.44 |

Yes, you just cross join them:
SELECT
DATA.company_id,
(
CASE
WHEN DATA.in_ref_set = 0 AND DATA.size = 'SMALL'
THEN
1 * ((LN(DATA.F1 + 1) - TOTALS.F1_mean) / TOTALS.F1_std_dev ) * 1
WHEN DATA.in_ref_set = 0 AND DATA.size = 'MEDIUM'
THEN
2 * ((LN(DATA.F1 + 1) - TOTALS.F1_mean) / TOTALS.F1_std_dev ) * 2
WHEN DATA.in_ref_set = 0 AND DATA.size = 'LARGE'
THEN
3 * ((LN(DATA.F1 + 1) - TOTALS.F1_mean) / TOTALS.F1_std_dev ) * 3
WHEN DATA.in_ref_set = 0 AND DATA.size = 'VERY_LARGE'
THEN
4 * ((LN(DATA.F1 + 1) - TOTALS.F1_mean) / TOTALS.F1_std_dev ) * 4
ELSE
5 * ((LN(DATA.F1 + 1) - TOTALS.F1_mean) / TOTALS.F1_std_dev )
END
) AS feature_1
FROM (
SELECT company.in_ref_set, company.size, cft.*
FROM company_feature_test cft
JOIN company ON company.id = cft.company_id
GROUP BY company.id
) AS DATA
CROSS JOIN (
SELECT
AVG(F1) AS F1_mean,
STDDEV(F1) AS F1_std_dev
FROM company_feature_test cft
) AS TOTALS
Note that there's no need to group by in the outer query; there will already be only one row per company.
Note also that you still seem to be doing the conditional aggregation incorrectly, if that is what you are trying to do: assuming there are multiple rows in cft for each company, you will be selecting an arbitrary F1 for each company. Default settings in newer versions of MySQL (which enable ONLY_FULL_GROUP_BY) will prohibit this.
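As a rough illustration of why the cross join works: a subquery that aggregates the whole table returns exactly one row, so cross joining it attaches the same totals to every detail row. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption, made only so the example is self-contained). SQLite has no STDDEV, so only the mean is shown, using the sample rows from the question.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE company_feature (company_id INTEGER, F1 REAL)")
conn.executemany(
    "INSERT INTO company_feature VALUES (?, ?)",
    [(1, 5), (2, 15)],  # sample rows from the question
)

# The derived table t has exactly one row, so the cross join
# still yields one row per company, each carrying the overall mean.
rows = conn.execute(
    """
    SELECT d.company_id, d.F1, t.F1_mean
    FROM company_feature AS d
    CROSS JOIN (SELECT AVG(F1) AS F1_mean FROM company_feature) AS t
    ORDER BY d.company_id
    """
).fetchall()

for row in rows:
    print(row)
```

The same shape carries over to the full query: every column of the one-row TOTALS subquery is available in each CASE branch.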

Sum of repeating values in successive rows

I have a table like this:
| id | col1 | col2 | col3 |
| --- | --- | --- | --- |
| 10 | 1 | | 3 |
| 9 | 1 | 2 | 3 |
| 8 | | 2 | 3 |
| 7 | | 2 | 3 |
| 6 | 1 | 2 | |
| 5 | | | 3 |
Each column holds either a single fixed value or NULL. E.g. col1 is 1 or empty, col2 is 2 or empty.
For each row, I'd like to count the values that repeat between two successive rows only.
The result would look like this:
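| id | col1 | col2 | col3 | Count |
| --- | --- | --- | --- | --- |
| 10 | 1 | | 3 | 2 (shows the repeating values between id10 & id9 rows) |
| 9 | 1 | 2 | 3 | 2 (shows the repeating values between id9 & id8 rows) |
| 8 | | 2 | 3 | 1 |
| 7 | | 2 | | 1 |
| 6 | 1 | 2 | | 0 |
| 5 | | | 3 | |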
I googled and tried some queries I found on the web but couldn't get the right result. Thanks in advance for your help.
To further clarify, for example:
Row id10 has (1,,3) and row id9 has (1,2,3), so two values repeat and the count is 2.
If the ids are consecutive and there are no gaps, you can do it with a self join:
select
t.*,
coalesce((t.col1 = tt.col1), 0) +
coalesce((t.col2 = tt.col2), 0) +
coalesce((t.col3 = tt.col3), 0) count
from tablename t left join tablename tt
on tt.id = t.id - 1
Results:
| id | col1 | col2 | col3 | count |
| --- | ---- | ---- | ---- | ----- |
| 10 | 1 | | 3 | 2 |
| 9 | 1 | 2 | 3 | 2 |
| 8 | | 2 | 3 | 1 |
| 7 | | 2 | | 1 |
| 6 | 1 | 2 | | 0 |
| 5 | | | 3 | 0 |
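To see how the COALESCE-wrapped comparisons behave with NULLs, here is a small sketch that runs the same self join against an in-memory SQLite database from Python (SQLite is an assumption standing in for MySQL; both evaluate `a = b` to 1, 0 or NULL). The rows are taken from the results table above.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tablename (id INTEGER, col1 INTEGER, col2 INTEGER, col3 INTEGER)"
)
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?, ?, ?)",
    [
        (10, 1, None, 3),
        (9, 1, 2, 3),
        (8, None, 2, 3),
        (7, None, 2, None),
        (6, 1, 2, None),
        (5, None, None, 3),
    ],
)

# A comparison involving NULL yields NULL; COALESCE turns it into 0,
# so only real matches with the row at id - 1 are counted.
counts = conn.execute(
    """
    SELECT t.id,
           COALESCE(t.col1 = tt.col1, 0)
         + COALESCE(t.col2 = tt.col2, 0)
         + COALESCE(t.col3 = tt.col3, 0) AS cnt
    FROM tablename AS t
    LEFT JOIN tablename AS tt ON tt.id = t.id - 1
    ORDER BY t.id DESC
    """
).fetchall()

for row in counts:
    print(row)
```

The row with the lowest id has no partner in the left join, so all three comparisons are NULL and its count is 0.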
And if there are gaps...
SELECT a.id
, a.col1
, a.col2
, a.col3
, COALESCE(a.col1 = b.col1,0) + COALESCE(a.col2 = b.col2,0) + COALESCE(a.col3 = b.col3,0) n
FROM
( SELECT x.*
, MIN(y.id) y_id
FROM my_table x
JOIN my_table y
ON y.id > x.id
GROUP
BY x.id
) a
LEFT
JOIN my_table b
ON b.id = a.y_id;
Were you to restructure your schema, then you could do something like this instead...
DROP TABLE IF EXISTS my_table;
CREATE TABLE my_table
(id INT NOT NULL
,val INT NOT NULL
,PRIMARY KEY(id,val)
);
INSERT INTO my_table VALUES
(10,1),
(10,3),
( 9,1),
( 9,2),
( 9,3),
( 8,2),
( 8,3),
( 7,2),
( 7,3),
( 6,1),
( 6,2),
( 5,3);
SELECT a.id
, COUNT(b.id) total
FROM
( SELECT x.*
, MIN(y.id) next
FROM my_table x
JOIN my_table y
ON y.id > x.id
GROUP
BY x.id
, x.val
) a
LEFT
JOIN my_table b
ON b.id = a.next
AND b.val = a.val
GROUP
BY a.id;
+----+-------+
| id | total |
+----+-------+
| 5 | 0 |
| 6 | 1 |
| 7 | 2 |
| 8 | 2 |
| 9 | 2 |
+----+-------+
You can use:
select t1_ID, t1_col1,t1_col2,t1_col3, count
from
(
select t1.id as t1_ID, t1.col1 as t1_col1,t1.col2 as t1_col2,t1.col3 as t1_col3, t2.*,
case when t1.col1 = t2.col1 then 1 else 0 end +
case when t1.col2 = t2.col2 then 1 else 0 end +
case when t1.col3 = t2.col3 then 1 else 0 end as count
from tab t1
left join tab t2
on t1.id = t2.id + 1
order by t1.id
) t3
order by t1_ID desc;
If there are gaps between the id values, you can use user-defined variables to explicitly number the rows in their natural order in the table. The rest of the logic is the same as in the earlier answers: join each row number to the next row number to get the col1, col2 and col3 values, and use COALESCE when computing the count.
select derived_1.*,
coalesce((derived_1.col1 = derived_2.col1), 0) +
coalesce((derived_1.col2 = derived_2.col2), 0) +
coalesce((derived_1.col3 = derived_2.col3), 0) count
from (
select @row := @row + 1 as row_number,t1.*
from tablename t1,(select @row := 0) d1
) derived_1
left join (
select *
from (
select @row2 := @row2 + 1 as row_number,t2.*
from tablename t2,(select @row2 := 0) d2
) d3
) derived_2
on derived_1.row_number + 1 = derived_2.row_number;
Demo: https://www.db-fiddle.com/f/wAzb67zSEfbZKg5RywQvC8/1

How to get percentage of rows which are not NULL in a specific column?

I have a database setup like this:
Table called reviews
+ -------- +
| review |
+ -------- +
| awda |
| ggagw |
| okok |
| ligjr |
| kkfm |
| seff |
| oawr |
| abke |
| (null) |
| (null) |
| (null) |
| (null) |
| (null) |
| (null) |
| (null) |
+ -------- +
How do I get the percentage of rows whose review is NOT NULL?
A basic "formula" of what I want:
percentage = 100 * ( (Number of rows where review is not null) / (Total number of rows) )
For the example above, this would be:
percentage = 100 * ( ( 8 ) / ( 15) )
= 53.33333333
How can I achieve that by using only one MySQL query?
I think the simplest way is:
select avg( review is not null ) * 100
from reviews;
MySQL treats boolean expressions as numbers in a numeric context, with 0 for false and 1 for true.
A similar method does the division explicitly:
select 100*count(review) / count(*)
from reviews;
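Both variants can be checked quickly against the sample data. The sketch below uses Python's sqlite3 module as a stand-in for MySQL (an assumption); SQLite also treats `review IS NOT NULL` as 0 or 1. One difference is flagged in the comments: SQLite divides integers as integers, so the second query is written with `100.0`, whereas MySQL's `/` already returns a decimal.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reviews (review TEXT)")

# 8 non-NULL reviews and 7 NULLs, as in the question.
values = ["awda", "ggagw", "okok", "ligjr", "kkfm", "seff", "oawr", "abke"]
values += [None] * 7
conn.executemany("INSERT INTO reviews VALUES (?)", [(v,) for v in values])

# Variant 1: average of a 0/1 boolean expression.
(pct_avg,) = conn.execute(
    "SELECT AVG(review IS NOT NULL) * 100 FROM reviews"
).fetchone()

# Variant 2: COUNT(review) skips NULLs, COUNT(*) counts every row.
# 100.0 forces floating-point division in SQLite (not needed in MySQL).
(pct_count,) = conn.execute(
    "SELECT 100.0 * COUNT(review) / COUNT(*) FROM reviews"
).fetchone()

print(round(pct_avg, 2), round(pct_count, 2))
```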
-- Calculate the percentage with the formula
-- 100 * value / total
select _not_null, _null, percentage FROM (
select _not_null, _null, (100 * _not_null / ( _not_null + _null)) as percentage FROM (
-- The computation starts here:
-- add 1 for each row where the column is not null
-- and, separately, 1 for each row where it is null
select some_column,
sum( CASE WHEN review is not null THEN 1 ELSE 0 END) as _not_null,
sum( CASE WHEN review is null THEN 1 ELSE 0 END) as _null
from my_table
group by some_column
) as internal
) as internal2
-- You can also keep only the groups above a given percentage
where internal2.percentage > 75

Get record in range of multiples of 5

I have an existing week_table:
| start_date | end_date | weekno |
| --- | --- | --- |
| 1996-01-01 | 1996-01-05 | 1 |
| 1996-01-08 | 1996-01-12 | 2 |
| 1996-01-15 | 1996-01-19 | 3 |
| 1996-01-22 | 1996-01-26 | 4 |
| ... | ... | ... |
| 1998-12-21 | 1998-12-26 | 156 |
I am trying to extract the records in groups of 5 weeks. I am looking for results like:
| start_date | end_date | weekno_start | weekno_end |
| --- | --- | --- | --- |
| 1996-01-01 | 1996-02-02 | 1 | 5 |
| 1996-02-05 | 1996-03-08 | 6 | 10 |
| 1996-03-11 | 1996-04-12 | 11 | 16 |
I do get results, but the week numbers keep running past the maximum weekno in the database: for records beyond weekno 156 I get rows with NULL values.
How can I avoid the rows with NULLs and limit the view to the maximum weekno?
My current code is:
SELECT (t1.weekno * 5) - 4 AS start_id
,t3.start_date
,t4.end_date
,(t1.weekno * 5) AS end_id
FROM weekcon_table t1
LEFT JOIN weekcon_table t2 ON (t2.weekno = t1.weekno * 5)
LEFT JOIN weekcon_table t3 ON (t3.weekno = (t1.weekno * 5) - 4)
LEFT JOIN weekcon_table t4 ON (t4.weekno = (t1.weekno * 5))
Have you tried something like this:
select min(weekno) as `start_id`,
min(start_date) as `start_date`,
max(end_date) as `end_date`,
min(weekno) as `weekno_start`,
max(weekno) as `weekno_end`
from weekcon_table
group by ((weekno - 1) DIV 5)
order by ((weekno - 1) DIV 5) asc
Here is the output:
start_id start_date end_date weekno_start weekno_end
1 01/01/1996 26/01/1996 1 5
6 04/03/1996 24/02/1996 6 10
11 01/04/1996 30/03/1996 11 15
16 06/05/1996 27/04/1996 16 20
21 03/06/1996 25/05/1996 21 23
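The bucketing idea can be tried on generated data. This is a minimal sketch using Python's sqlite3 module as a stand-in for MySQL (an assumption): SQLite's `/` on integers behaves like MySQL's `DIV`, so `(weekno - 1) / 5` maps weeks 1 to 5 to bucket 0, weeks 6 to 10 to bucket 1, and so on. The 23 Monday-to-Friday weeks are hypothetical test data, not the data behind the output above.

```python
import sqlite3
from datetime import date, timedelta

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE weekcon_table (start_date TEXT, end_date TEXT, weekno INTEGER)"
)

# Hypothetical data: 23 consecutive Monday-to-Friday weeks from 1996-01-01.
first_monday = date(1996, 1, 1)
weeks = []
for n in range(1, 24):
    start = first_monday + timedelta(weeks=n - 1)
    end = start + timedelta(days=4)
    weeks.append((start.isoformat(), end.isoformat(), n))
conn.executemany("INSERT INTO weekcon_table VALUES (?, ?, ?)", weeks)

# Integer division puts every run of 5 week numbers into one group;
# the last group simply contains whatever weeks are left over.
groups = conn.execute(
    """
    SELECT MIN(start_date), MAX(end_date), MIN(weekno), MAX(weekno)
    FROM weekcon_table
    GROUP BY (weekno - 1) / 5
    ORDER BY (weekno - 1) / 5
    """
).fetchall()

for group in groups:
    print(group)
```

Because the groups are built only from rows that exist, no rows with NULLs appear past the maximum week number.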
I create two derived tables and assign a rank to each row.
The first one is for start_date: it contains each row where weekno % 5 = 1.
The second one is for end_date: it contains each row where weekno % 5 = 0, plus the last week of all.
Then I join them by rank.
You can change the select list to * if you want to see what is happening.
SELECT ini_range.start_date,
end_range.end_date,
ini_range.weekno,
end_range.weekno
FROM
(
SELECT r.* ,
(SELECT count(distinct r2.weekno)
FROM
(
SELECT *
FROM t_week
WHERE weekno % 5 = 1
) r2
WHERE r2.weekno <= r.weekno
) as rank
FROM
(
SELECT *
FROM t_week
WHERE weekno % 5 = 1
) r
) ini_range
JOIN
(
SELECT r.* ,
(SELECT count(distinct r2.weekno)
FROM
(
SELECT *
FROM t_week
WHERE weekno % 5 = 0
or weekno = (SELECT max(weekno) FROM t_week)
) r2
WHERE r2.weekno <= r.weekno
) as rank
FROM
(
SELECT *
FROM t_week
WHERE weekno % 5 = 0
or weekno = (SELECT max(weekno) FROM t_week)
) r
) end_range
ON ini_range.rank = end_range.rank
OUTPUT
| start_date | end_date | weekno | weekno |
|------------|------------|--------|--------|
| 01/01/1996 | 03/02/1996 | 1 | 5 |
| 05/02/1996 | 09/03/1996 | 6 | 10 |
| 11/03/1996 | 13/04/1996 | 11 | 15 |
| 15/04/1996 | 18/05/1996 | 16 | 20 |
| 20/05/1996 | 08/06/1996 | 21 | 23 |
Week 23 is the last week, so the final group has only 3 weeks instead of 5.
I found another solution
SELECT *
FROM t_week w_ini
JOIN t_week w_end
ON w_ini.weekno = w_end.weekno + 4
OR w_ini.weekno + 5 > w_end.weekno
WHERE
w_ini.weekno % 5 = 1
and w_ini.weekno < w_end.weekno
and(
w_end.weekno % 5 = 0 or
w_end.weekno = (SELECT max(weekno) FROM t_week)
)

MySQL calculate score by rank in percentage

I am using MySQL to build a rating system on top of my database. What I want to do is score each attribute according to its rank, using the calculation below. Here is an example table:
| ID | VALUE1 | VALUE2 |
| --- | --- | --- |
| 2 | 5 | 20 |
| 4 | 5 | 30 |
| 1 | 3 | 5 |
| 3 | 2 | 8 |
Here is the ideal output I need:
| ID | VALUE1 | RANK1 | Score1 | VALUE2 | RANK2 | Score2 |
| --- | --- | --- | --- | --- | --- | --- |
| 2 | 5 | 1 | 10 | 20 | 2 | 8.3 |
| 4 | 5 | 1 | 10 | 30 | 1 | 10 |
| 1 | 3 | 2 | 7.5 | 5 | 4 | 5 |
| 3 | 2 | 3 | 5 | 8 | 3 | 6.6 |
The formula for score calculation is
5+5*(MaxRank-rank)/(MaxRank-MinRank)
How can I generate multiple rankings like in the table? I have tried:
SELECT
@min_rank := 1 AS min_rank
, @max_rank1 := (SELECT COUNT(DISTINCT value1) FROM table) AS max_rank1
, @max_rank2 := (SELECT COUNT(DISTINCT value2) FROM table) AS max_rank2
;
SELECT
ID
, R1
, TRUNCATE(5.0+5.0 * (@max_rank1 - R1) / (@max_rank1 - @min_rank), 2) AS Score1
, R2
, TRUNCATE(5.0+5.0 * (@max_rank2 - R2) / (@max_rank2 - @min_rank), 2) AS Score2
FROM (
SELECT
ID
, value1
, FIND_IN_SET( `value1`, (SELECT GROUP_CONCAT(DISTINCT `value1` ORDER BY `value1` DESC) FROM table)) AS R1
, value2
, FIND_IN_SET( `value2`, (SELECT GROUP_CONCAT(DISTINCT `value2` ORDER BY `value2` DESC) FROM table)) AS R2
FROM table
) ranked_table;
It works fine for ranks below 170. My database has approximately 200+ ranks for some values, and ranks larger than 170 come back as 0. In that case, the scores for ranks > 170 are miscalculated. Thank you.
That looks nasty to calculate.
Something like this might do it
SELECT a.ID, a.VALUE1, Sub1.Rank1, (5.0+5.0 * (Sub3.MaxRank1 - Sub1.Rank1) / (Sub3.MaxRank1 - 1)) AS Score1, a.VALUE2, Sub2.Rank2, (5.0+5.0 * (Sub4.MaxRank2 - Sub2.Rank2) / (Sub4.MaxRank2 - 1)) AS Score2
FROM TestTable a
INNER JOIN (SELECT DISTINCT z.VALUE1, (SELECT ((COUNT(DISTINCT VALUE1) + 1)) FROM TestTable y WHERE z.VALUE1 < y.VALUE1) AS RANK1
FROM TestTable z
) Sub1 ON a.VALUE1 = Sub1.VALUE1
INNER JOIN (SELECT DISTINCT z.VALUE2, (SELECT ((COUNT(DISTINCT VALUE2) + 1)) FROM TestTable y WHERE z.VALUE2 < y.VALUE2) AS RANK2
FROM TestTable z
) Sub2 ON a.VALUE2 = Sub2.VALUE2
CROSS JOIN (SELECT COUNT(*) + 1 AS MaxRank1 FROM TestTable CROSS JOIN (SELECT MAX(VALUE1) AS MaxValue1 FROM TestTable) Sub3a WHERE VALUE1 < MaxValue1) Sub3
CROSS JOIN (SELECT COUNT(*) + 1 AS MaxRank2 FROM TestTable CROSS JOIN (SELECT MAX(VALUE2) AS MaxValue2 FROM TestTable) Sub4a WHERE VALUE2 < MaxValue2) Sub4
Note I am not sure on your score calculation. The equation you give doesn't appear to me to give the results in your example. But I might just be misreading it.
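As a cross-check of the formula from the question, the dense ranking and the score can be worked through in plain Python. This is only a sketch: the function name is hypothetical, rank 1 is assumed to be the largest distinct value, and MinRank is assumed to be 1, as in the question's own attempt. With the sample rows it gives 10, 10, 7.5, 5 for VALUE1 and roughly 8.33, 10, 5, 6.67 for VALUE2, which agrees with the expected table if the scores there are truncated to one decimal.

```python
def dense_rank_scores(values):
    """Return a (rank, score) pair per value; rank 1 is the largest distinct value."""
    distinct_desc = sorted(set(values), reverse=True)
    rank_of = {v: i + 1 for i, v in enumerate(distinct_desc)}
    min_rank, max_rank = 1, len(distinct_desc)
    span = max_rank - min_rank
    result = []
    for v in values:
        rank = rank_of[v]
        # With a single distinct value the formula would divide by zero;
        # returning the top score in that case is an assumption.
        score = 10.0 if span == 0 else 5 + 5 * (max_rank - rank) / span
        result.append((rank, score))
    return result


# Rows in the question's order: ID 2, 4, 1, 3.
value1 = [5, 5, 3, 2]
value2 = [20, 30, 5, 8]
scores1 = dense_rank_scores(value1)
scores2 = dense_rank_scores(value2)

for row in zip(scores1, scores2):
    print(row)
```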

Select a row and rows around it

Ok, let's say I have a table with photos.
What I want to do is display, on a page, the photo whose id is in the URI. Below the photo I want to show 10 thumbnails of nearby photos, with the current photo in the middle of the thumbnails.
Here's my query so far (this is just an example, I used 7 as id):
SELECT
A.*
FROM
(SELECT
*
FROM media
WHERE id < 7
ORDER BY id DESC
LIMIT 0, 4
UNION
SELECT
*
FROM media
WHERE id >= 7
ORDER BY id ASC
LIMIT 0, 6
) as A
ORDER BY A.id
But I get this error:
#1221 - Incorrect usage of UNION and ORDER BY
Only one ORDER BY clause can be defined for a UNION'd query. It doesn't matter if you use UNION or UNION ALL. MySQL does support the LIMIT clause on portions of a UNION'd query, but it's relatively useless without the ability to define the order.
MySQL (before version 8.0, which added window functions) also lacks ranking functions, which you need in order to deal with gaps in the data (ids missing because entries were deleted). Without them, the only alternative is to use an incrementing variable in the SELECT statement:
SELECT t.id,
@rownum := @rownum+1 as rownum
FROM MEDIA t, (SELECT @rownum := 0) r
Now we can get a consecutively numbered list of the rows, so we can use:
WHERE rownum BETWEEN @midpoint - ROUND(@midpoint/2)
AND @midpoint - ROUND(@midpoint/2) + @upperlimit
Using 7 as the value for @midpoint, ROUND(@midpoint/2) returns 4, so the lower bound @midpoint - ROUND(@midpoint/2) is 3. Because BETWEEN is inclusive at both ends, set @upperlimit to 9 to get 10 rows in total. Here's the full query:
SELECT x.*
FROM (SELECT t.id,
@rownum := @rownum+1 as rownum
FROM MEDIA t,
(SELECT @rownum := 0) r) x
WHERE x.rownum BETWEEN @midpoint - ROUND(@midpoint/2) AND @midpoint - ROUND(@midpoint/2) + @upperlimit
But if you still want to use LIMIT, you can use:
SELECT x.*
FROM (SELECT t.id,
@rownum := @rownum+1 as rownum
FROM MEDIA t,
(SELECT @rownum := 0) r) x
WHERE x.rownum >= @midpoint - ROUND(@midpoint/2)
ORDER BY x.id ASC
LIMIT 10
I resolved this with the code below:
SELECT A.* FROM (
(
SELECT * FROM gossips
WHERE id < 7
ORDER BY id DESC
LIMIT 2
)
UNION
(
SELECT * FROM gossips
WHERE id > 7
ORDER BY id ASC
LIMIT 2
)
) as A
ORDER BY A.id
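The idea of ordering and limiting each half separately, then sorting the combined result, can be sketched in Python with the sqlite3 module standing in for MySQL (an assumption). SQLite does not accept parenthesized SELECTs as UNION members, so each half is wrapped in a derived table instead; the `media` table with ids 1 to 12 is hypothetical test data.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE media (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO media VALUES (?)", [(i,) for i in range(1, 13)])

current_id = 7

# Each derived table applies its own ORDER BY and LIMIT;
# the final ORDER BY sorts the combined rows.
neighbours = [
    row[0]
    for row in conn.execute(
        """
        SELECT * FROM (SELECT id FROM media WHERE id < ? ORDER BY id DESC LIMIT 2)
        UNION ALL
        SELECT * FROM (SELECT id FROM media WHERE id > ? ORDER BY id ASC LIMIT 2)
        ORDER BY id
        """,
        (current_id, current_id),
    )
]

print(neighbours)
```

UNION ALL is used here because the two halves cannot overlap, so no de-duplication is needed.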
I don't believe that you can have an "order by" in different sections of a UNION. Could you just do something like this:
SELECT * FROM media where id >= 7 - 4 and id <= 7 + 4 ORDER BY id
I agree with the answer suggested by malonso (+1), but if you try it with id = 1, you will get only 5 thumbnails. I don't know if you want this behaviour. If you always want 10 thumbs, you can try:
select top 10 * from media where id > 7 - 4
The problem is that SELECT TOP is database dependent (it is a SQL Server clause). Other databases have similar clauses:
Oracle:
SELECT *
FROM media
WHERE ROWNUM <= 10
AND id > 7 - 4
MySQL:
SELECT *
FROM media
WHERE id > 7 - 4
LIMIT 10
So maybe you can use the last one.
If we do that, we will have the same problem when you want the last 10 thumbs. For example, if we have 90 thumbs and we are given id = 88 ... You can solve it by adding an OR condition. In MySQL it would be something like:
SELECT *
FROM media
WHERE id > 7 - 4
OR (Id+5) > (select COUNT(1) from media)
LIMIT 10
If you're happy to use temp tables, your original query could be broken down to use them.
CREATE TEMPORARY TABLE t1 AS
SELECT
*
FROM media
WHERE id < 7
ORDER BY id DESC
LIMIT 0, 4;
INSERT INTO t1
SELECT
*
FROM media
WHERE id >= 7
ORDER BY id ASC
LIMIT 0, 6;
select * from t1 order by id;
drop table t1;
Try union all instead. Union requires the server to ensure that the results are unique and this conflicts with your ordering.
I had to solve a similar problem, but needed to account for situations where we always get the same number of rows, even if the desired row is near the top or bottom of the result set (i.e. not exactly in the middle).
This solution is a tweak from OMG Ponies' response, but where the rownum maxes out at the desired row:
set @id = 7;
SELECT natSorted.id
FROM (
SELECT gravitySorted.* FROM (
SELECT Media.id, IF(id <= @id, @gravity := @gravity + 1, @gravity := @gravity - 1) AS gravity
FROM Media, (SELECT @gravity := 0) g
) AS gravitySorted ORDER BY gravity DESC LIMIT 10
) natSorted ORDER BY id;
Here's a breakdown of what's happening:
NOTE: In the example below I made a table with 20 rows and removed ids 6 and 9 to ensure that a gap in ids does not affect the results.
First we assign every row a gravity value that's centered around the particular row you're looking for (in this case where id is 7). The closer the row is to the desired row, the higher the value will be:
SET @id = 7;
SELECT Media.id, IF(id <= @id, @gravity := @gravity + 1, @gravity := @gravity - 1) AS gravity
FROM Media, (SELECT @gravity := 0) g
returns:
+----+---------+
| id | gravity |
+----+---------+
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
| 4 | 4 |
| 5 | 5 |
| 7 | 6 |
| 8 | 5 |
| 10 | 4 |
| 11 | 3 |
| 12 | 2 |
| 13 | 1 |
| 14 | 0 |
| 15 | -1 |
| 16 | -2 |
| 17 | -3 |
| 18 | -4 |
| 19 | -5 |
| 20 | -6 |
| 21 | -7 |
+----+---------+
Next we order all the results by the gravity value and limit on the desired number of rows:
SET @id = 7;
SELECT gravitySorted.* FROM (
SELECT Media.id, IF(id <= @id, @gravity := @gravity + 1, @gravity := @gravity - 1) AS gravity
FROM Media, (SELECT @gravity := 0) g
) AS gravitySorted ORDER BY gravity DESC LIMIT 10
returns:
+----+---------+
| id | gravity |
+----+---------+
| 7 | 6 |
| 5 | 5 |
| 8 | 5 |
| 4 | 4 |
| 10 | 4 |
| 3 | 3 |
| 11 | 3 |
| 2 | 2 |
| 12 | 2 |
| 1 | 1 |
+----+---------+
At this point we have all the desired ids, we just need to sort them back to their original order:
set @id = 7;
SELECT natSorted.id
FROM (
SELECT gravitySorted.* FROM (
SELECT Media.id, IF(id <= @id, @gravity := @gravity + 1, @gravity := @gravity - 1) AS gravity
FROM Media, (SELECT @gravity := 0) g
) AS gravitySorted ORDER BY gravity DESC LIMIT 10
) natSorted ORDER BY id;
returns:
+----+
| id |
+----+
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
| 7 |
| 8 |
| 10 |
| 11 |
| 12 |
+----+
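The three steps above can also be sketched outside SQL. The plain-Python version below is only an illustration with a hypothetical function name: it assigns the same gravity values, keeps the rows with the highest gravity, and sorts them back by id. One assumption to note is tie-breaking: Python's sort is stable, so rows with equal gravity keep ascending-id order, whereas MySQL makes no such promise for `ORDER BY gravity DESC`.

```python
def nearest_rows(ids, target_id, limit):
    """Pick `limit` ids clustered around target_id, returned in ascending order."""
    gravity = 0
    scored = []
    for row_id in sorted(ids):
        # Gravity climbs up to and including the target row, then falls.
        gravity += 1 if row_id <= target_id else -1
        scored.append((row_id, gravity))
    # Stable sort: ties on gravity keep their ascending-id order.
    top = sorted(scored, key=lambda pair: pair[1], reverse=True)[:limit]
    return sorted(row_id for row_id, _ in top)


# Ids 1..21 with 6 and 9 removed, matching the gravity table above.
ids = [i for i in range(1, 22) if i not in (6, 9)]
window = nearest_rows(ids, 7, 10)
print(window)
```

Calling `nearest_rows(ids, 21, 10)` returns the last ten ids, which is the "same number of rows near the bottom" behaviour the answer is after.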