MySQL rollup when some columns include NULL values - mysql

I have a dataset where it is somewhat common for fields to have NULL as a valid value. This causes an issue when I want to use the ROLLUP operator in MySQL, as I can't distinguish between the NULL values it generates as part of its subtotals/totals and the actual NULL values in the data.
My current query is as follows:
SELECT
COALESCE(car_score, "Total") AS car_score,
COUNT(DISTINCT id) AS volume
FROM cars_table
GROUP BY
car_score ASC WITH ROLLUP;
This provides me with the following table:
car_score | volume
---------------------------
Total | 500
1 | 100
2 | 200
3 | 300
4 | 400
5 | 500
Total | 2000
when I'd like it to be:
car_score | volume
---------------------------
NULL | 500
1 | 100
2 | 200
3 | 300
4 | 400
5 | 500
Total | 2000
This is a simple example, and it becomes more frustrating once I have multiple dimensions for the ROLLUP. The reason I can't just change the NULL value before to something else is that I also need to be able to aggregate the data in other parts of the application, so having a proper NULL is important to me.

One option is to wrap the query in a subquery that first replaces the genuine NULL values (the ones indicating missing data) with a placeholder. Then use COALESCE(), as you already do, to replace the NULL produced by the rollup with the string 'Total':
SELECT
COALESCE(t.car_score, 'Total') AS car_score,
COUNT(DISTINCT t.id) AS volume
FROM
(
SELECT COALESCE(car_score, 99) AS car_score, id
FROM cars_table
) t
GROUP BY t.car_score WITH ROLLUP
Here I have used 99 as a placeholder for car scores that were missing. You can use any placeholder you want other than NULL, as long as it cannot collide with a real score.
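The idea of keeping "real" NULLs apart from the total row can be tried without a MySQL server. The following is a minimal sketch using Python's built-in sqlite3 module. SQLite has no WITH ROLLUP, so the total row is emulated here with UNION ALL and an explicit is_total flag; that flag, the sample rows, and the variable names are assumptions made for illustration and are not part of the original query. MySQL 8.0 also provides a GROUPING() function that plays the same role as the flag when used together with WITH ROLLUP.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cars_table (id INTEGER PRIMARY KEY, car_score INTEGER)")
# Made-up sample data: two rows with a genuine NULL score.
conn.executemany(
    "INSERT INTO cars_table (car_score) VALUES (?)",
    [(None,), (None,), (1,), (2,), (2,), (2,)],
)

rows = conn.execute(
    """
    SELECT car_score, 0 AS is_total, COUNT(DISTINCT id) AS volume
    FROM cars_table
    GROUP BY car_score
    UNION ALL
    SELECT NULL, 1, COUNT(DISTINCT id)
    FROM cars_table
    ORDER BY is_total, car_score
    """
).fetchall()

# Only the row flagged as the total gets the 'Total' label;
# a NULL score coming from the data stays None.
labelled = [("Total" if is_total else score, volume) for score, is_total, volume in rows]
print(labelled)
```

Because the label is derived from the flag rather than from the score column, no placeholder value such as 99 has to be reserved.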

Related

MySQL COUNT results for first two entries on an ID in one table only if value is set in the first row but not in the second

My problem revolves around extending a working MySQL query so that it only considers the first two rows for each item_id.
The table evaluations looks like this:
id | item_id | in   | out
-------------------------
1  | 232     | NULL | 1
2  | 834     | NULL | 1
3  | 232     | 1    | NULL
4  | 56      | NULL | 1
Currently, there can be N entries for each item_id. The goal is to compare if there is any mismatch between in & out for one item_id.
I currently achieve this with the following SQL-statement:
SELECT COUNT(DISTINCT item_id), item_id
FROM evaluations x1
WHERE x1.`in` IS NOT NULL AND EXISTS (
SELECT 1
FROM evaluations x2
WHERE x1.item_id = x2.item_id AND x2.`out` IS NOT NULL AND
x1.id <> x2.id);
This reliably returns the cases where in and out are mismatched across all rows for an item_id.
I now want to limit the query so that the mismatch check only considers the first two rows for each item_id. Simply adding LIMIT to the subquery does not help, because the limit would only apply once a row with out set has been found.
Adding LIMIT elsewhere in the current query does not help either, and the MariaDB version I'm running does not support LIMIT in this kind of subquery:
This version of MariaDB doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'
I've tried using HAVING and JOINing the table to itself to no avail:
SELECT e1.in, e1.form_id, COUNT(e2.form_id), e2.out, e2.in AS res
FROM evaluations AS e1
INNER JOIN evaluations AS e2 USING(form_id)
GROUP BY e1.form_id
HAVING e1.in > 0 and e2.out > 0
LIMIT 2;
I'm really at a loss here.
Any guidance is very much appreciated. Thank you in advance.
Kind regards
Phil

mysql returns wrong results with random duplicate values

I need to return the best 5 scores in each category from a table. So far I have tried the query below, following an example from this site (selecting top n records per group).
Query:
select
subject_name,substring_index(substring_index
(group_concat(exams_scores.admission_no order by exams_scores.score desc),',',value),',',-1) as names,
substring_index(substring_index(group_concat(score order by score desc),',',value),',',-1)
as orderedscore
from exams_scores,students,subjects,tinyint_asc
where tinyint_asc.value >=1 and tinyint_asc.value <=5 and exam_id=2
and exams_scores.admission_no=students.admission_no and students.form_id=1 and
exams_scores.subject_code=subjects.subject_code group by exams_scores.subject_code,value;
I get the top n as I need, but my problem is that it returns duplicates at random, and I don't know where they are coming from.
As you can see, English and Mathematics have duplicates which should not be there:
+------------------+-------+--------------+
| subject_name     | names | orderedscore |
+------------------+-------+--------------+
| English          | 1500  | 100          |
| English          | 1500  | 100          |
| English          | 2491  | 100          |
| English          | 1501  | 99           |
| English          | 1111  | 99           |
| Mathematics      | 1004  | 100          |
| Mathematics      | 1004  | 100          |
| Mathematics      | 2722  | 99           |
| Mathematics      | 2734  | 99           |
| Mathematics      | 2712  | 99           |
+------------------+-------+--------------+
I have checked the table and no duplicates exist.
To confirm there are no duplicates in the table:
select * from exams_scores
having(exam_id=2) and (subject_code=121) and (admission_no=1004);
Result:
+------+--------------+---------+--------------+-------+
| id | admission_no | exam_id | subject_code | score |
+------+--------------+---------+--------------+-------+
| 4919 | 1004 | 2 | 121 | 100 |
+------+--------------+---------+--------------+-------+
1 row in set (0.00 sec)
I get the same result for English.
If I run the query several times, I sometimes end up with a different subject having duplicate values.
Can anyone tell me why my query is behaving this way? I tried adding DISTINCT inside the GROUP_CONCAT:
group_concat(distinct(exams_scores.admission_no))
but that didn't work.
You're grouping by exams_scores.subject_code, value. If you add them to your selected columns (...as orderedscore, exams_scores.subject_code, value from...), you should see that all rows are distinct with respect to these two columns you grouped by. Which is the correct semantics of GROUP BY.
Edit, to clarify:
First, the SQL server removes rows according to your WHERE clause.
Afterwards, it groups the remaining rows according to your GROUP BY clause.
Finally, it selects the columns you specified, either by returning a column's value directly or by performing a GROUP_CONCAT on some of the columns and returning the accumulated value.
If you select columns that are not included in the GROUP BY clause, the values returned for them are arbitrary. The server reduces all rows that are equal with respect to the GROUP BY columns to a single row, and for the remaining columns it has no rule for which value to keep, so it picks one from the reduced rows (hence the "randomness" you're experiencing).
In fact, some SQL servers refuse to run such a query and return an error, since the result for those columns would be undefined, which is something you generally don't want. With these servers (I believe MSSQL is one of them), every column in your SELECT clause must either be part of your GROUP BY clause or be wrapped in an aggregate function.
Edit 2: Which, finally, means that you have to refine your GROUP BY clause to obtain the grouping that you want.
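One way to avoid relying on arbitrary values altogether is to rank the rows explicitly and keep the first n per group. The sketch below is a different technique from the GROUP_CONCAT query in the question, not a fix of it: it assumes window functions are available (MySQL 8.0+, or SQLite 3.25+ as used here through Python's sqlite3 module), uses a simplified, made-up exams_scores table, and breaks score ties by admission_no so the result is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE exams_scores (admission_no INTEGER, subject_code INTEGER, score INTEGER)"
)
# Hypothetical sample rows; subject 101 has two students tied at 100.
conn.executemany(
    "INSERT INTO exams_scores VALUES (?, ?, ?)",
    [
        (1500, 101, 100), (2491, 101, 100), (1501, 101, 99),
        (1004, 121, 100), (2722, 121, 99), (2734, 121, 99),
    ],
)

TOP_N = 2  # use 5 for the "best 5 scores" case

top_rows = conn.execute(
    """
    SELECT subject_code, admission_no, score
    FROM (
        SELECT subject_code, admission_no, score,
               ROW_NUMBER() OVER (
                   PARTITION BY subject_code
                   ORDER BY score DESC, admission_no  -- tie-breaker keeps ranks stable
               ) AS rn
        FROM exams_scores
    ) AS ranked
    WHERE rn <= ?
    ORDER BY subject_code, rn
    """,
    (TOP_N,),
).fetchall()
print(top_rows)
```

Every selected column comes from a single concrete row, so the same student cannot appear twice within a subject.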

Select different results from same table

I have a table used to gather evaluations from users, its structure is really basic and defined below along with some example rows:
id_user | id_affirmation | isCorrect
------------------------------------
1 | 10 | true
1 | 13 | false
2 | 23 | true
2 | 45 | false
3 | 31 | false
3 | 90 | true
3 | 67 | true
In the application, the users basically evaluate if the affirmations are correct or wrong, marking them as true or false. Each affirmation is evaluated only once, so users are evaluating different affirmations.
What I'm trying to do is select a result set like the one below, where I can count the number of affirmations each user marked as correct and the number they marked as false.
user | correct_count | wrong_count
------------------------------------
1 | 35 | 12
2 | 76 | 22
3 | 23 | 41
I have a query that counts the number of correct answers for each user, and I can simply change the expected value of the 'isCorrect' field to false to count the number of wrong answers instead. My problem is how to get the correct count and the wrong count in the same result, since I can't simply use UNION.
SELECT id_user,
SUM(isCorrect),
SUM(NOT isCorrect)
FROM mytable
GROUP BY
id_user
Assuming isCorrect is stored as VARCHAR string values 'true' and 'false', one way to get the counts is to use a conditional boolean expression in the SELECT list and wrap an aggregate function around it.
For example:
SELECT t.id_user
, SUM(t.isCorrect='true') AS correct_count
, SUM(t.isCorrect='false') AS wrong_count
FROM mytable t
GROUP BY t.id_user
If there are values other than 'true' or 'false' (e.g. NULL, 'maybe', et al.), decide whether they should be included in either count. In that case, or if the datatype of isCorrect is something other than VARCHAR, the conditional expression needs to be modified so that each expression returns 1 if the row is to be counted and 0 if it is not.
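The conditional-aggregation approach can be sketched end to end as follows, using Python's sqlite3 module so it runs without a MySQL server. It assumes isCorrect is stored as text, writes the condition as an explicit CASE expression (which returns 1 or 0 per row, as described above), and adds one hypothetical row with a NULL answer to show that it lands in neither count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (id_user INTEGER, id_affirmation INTEGER, isCorrect TEXT)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        (1, 10, "true"), (1, 13, "false"),
        (2, 23, "true"), (2, 45, "false"),
        (3, 31, "false"), (3, 90, "true"), (3, 67, "true"),
        (3, 70, None),  # hypothetical unanswered row, not in the original sample
    ],
)

counts = conn.execute(
    """
    SELECT id_user,
           SUM(CASE WHEN isCorrect = 'true'  THEN 1 ELSE 0 END) AS correct_count,
           SUM(CASE WHEN isCorrect = 'false' THEN 1 ELSE 0 END) AS wrong_count
    FROM mytable
    GROUP BY id_user
    ORDER BY id_user
    """
).fetchall()
print(counts)
```

Both counts come from a single pass over the table, so no UNION or self-join is needed.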

How to group values from a table if they're close?

Let's say I define 10 as being a close enough difference between two values, what I want is the average of all the values that are close enough to each other (or in other words, grouped by their closeness). So, if I have a table with the following values:
+-------+
| value |
+-------+
| 1 |
| 1 |
| 2 |
| 4 |
| 2 |
| 1 |
| 4 |
| 3 |
| 22 |
| 23 |
| 24 |
| 22 |
| 20 |
| 19 |
| 89 |
| 88 |
| 86 |
+-------+
I want a query that would output the following result:
+---------+
| 2.2500 |
| 21.6667 |
| 87.6667 |
+---------+
Here 2.2500 is the average of all the values ranging from 1 to 4, since each of them is less than 10 away from its nearest neighbour. In the same way, 21.6667 is the average of all the values ranging from 19 to 24, and 87.6667 is the average of all the values ranging from 86 to 89.
The threshold, currently 10, needs to be configurable.
This isn't so bad. You want to implement the lag() function in MySQL to determine if a value is the start of a new set of rows. Then you want a cumulative sum of this value to identify a group.
The code looks painful, because in MySQL you need to do this with correlated subqueries and join/aggregation rather than with ANSI standard functions, but this is what it looks like:
select min(t.value) as value_min, max(t.value) as value_max, avg(t.value) as value_avg
from `table` t join
     (select v.value, count(*) as GroupId
      from (select distinct value from `table`) v join
           -- GroupStarts: distinct values that begin a new group
           (select s.value
            from (select distinct t1.value,
                         (select max(t2.value)
                          from `table` t2
                          where t2.value < t1.value
                         ) as prevValue
                  from `table` t1
                 ) s
            where s.prevValue is null or s.value - s.prevValue >= 10
           ) GroupStarts
           on v.value >= GroupStarts.value
      group by v.value
     ) g
     on g.value = t.value
group by g.GroupId;
The subquery GroupStarts finds the break points, that is, the set of values that differ by 10 or more from the previous value. The next level uses join/aggregation to count the number of such break points at or before any given value. The outermost query then aggregates using this GroupId.
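Where window functions are available (MySQL 8.0+, or SQLite 3.25+ as used below through Python's sqlite3 module), the same "lag, then cumulative sum" idea can be written more directly. This is a sketch under those assumptions rather than the query above: the table name readings and the parameter name threshold are made up for illustration, and the sample values are the ones from the question.

```python
import sqlite3

THRESHOLD = 10  # values closer than this to the previous value stay in the same group
values = [1, 1, 2, 4, 2, 1, 4, 3, 22, 23, 24, 22, 20, 19, 89, 88, 86]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (value INTEGER)")
conn.executemany("INSERT INTO readings VALUES (?)", [(v,) for v in values])

groups = conn.execute(
    """
    WITH flagged AS (
        -- is_start = 1 for the smallest value and after every gap >= threshold
        SELECT value,
               CASE
                   WHEN value - LAG(value) OVER (ORDER BY value) < :threshold THEN 0
                   ELSE 1
               END AS is_start
        FROM readings
    ),
    numbered AS (
        -- running count of group starts; equal values share the same group_id
        SELECT value,
               SUM(is_start) OVER (ORDER BY value) AS group_id
        FROM flagged
    )
    SELECT MIN(value), MAX(value), ROUND(AVG(value), 4)
    FROM numbered
    GROUP BY group_id
    ORDER BY group_id
    """,
    {"threshold": THRESHOLD},
).fetchall()
print(groups)
```

Passing the threshold as a bound parameter keeps the "close enough" distance variable, as the question asks.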
Create another column with a hash value for the field. This field will be used to test for equality. For example, with strings you may store a soundex. For numbers you may store the closest multiple of ten.
Otherwise, doing the calculation at query time will be much slower. You could also cross join the table to itself and group where the difference between the two fields is less than 10.
I like the other user's suggestion to create a hash column. Joining the table to itself makes the work grow quadratically with the number of rows, and should be avoided.
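One other possibility is to use /, for example select avg(val), val/10 from myTable group by val/10, which gives a group value of 0 for 0-9, 1 for 10-19, etc.
At least, it works that way in SQL Server, where dividing two integers yields an integer. In MySQL, / returns a decimal, so you would need val DIV 10 or FLOOR(val / 10) instead.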
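First, I would export the whole result to an array.
Then use a function like this (JavaScript):
// Average each consecutive chunk of `groupSize` elements in `values`.
function show(values, groupSize = 4)
{
  const averages = [];
  for (let i = 0; i < values.length; i += groupSize)
  {
    const chunk = values.slice(i, i + groupSize);
    // Sum the chunk, then divide by its actual length (the last chunk may be shorter).
    const sum = chunk.reduce((acc, v) => acc + v, 0);
    averages.push(sum / chunk.length);
  }
  return averages;
}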

How can I make two condition in having clause

I have a table similar to:
domain | file | Number
------------------------------------
aaa.com | aaa.com_1 | 111
bbb.com | bbb.com_1 | 222
ccc.com | ccc.com_2 | 111
ddd.com | ddd.com_1 | 222
eee.com | eee.com_1 | 333
I need to query the number of Domains that share the same Number and whose File name ends with _1. I tried the following:
select count(domain) as 'sum domains', file
from table
group by Number
having
count(Number) >1 and File like '%\_1';
It gives me:
sum domains | file
------------------------------
2 | aaa.com
2 | bbb.com
I expected to see the following:
sum domains | file
------------------------------
1 | aaa.com
2 | bbb.com
The Number 111 appears once with a File ending in _1 and once with a File ending in _2, so it should be counted only once. How can I apply the two conditions I stated earlier correctly?
As documented under SELECT Syntax:
The HAVING clause is applied nearly last, just before items are sent to the client, with no optimization.
In other words, it is applied after the grouping operation has been performed (in contrast with WHERE, which is performed before any grouping operation). See WHERE vs HAVING.
Therefore, your current query first forms the resultset from the following:
SELECT COUNT(domain) AS `sum domains`, file
FROM `table`
GROUP BY Number
See it on sqlfiddle:
| SUM DOMAINS | FILE |
---------------------------
| 2 | aaa.com_1 |
| 2 | bbb.com_1 |
| 1 | eee.com_1 |
As you can see, the values selected for the file column are merely one of the values from each group—as documented under MySQL Extensions to GROUP BY:
The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate.
Your current query then proceeds to filter these results according to your HAVING clause:
HAVING COUNT(Number) > 1 AND file LIKE '%\_1'
With the values of file selected above, every single group matches on the second criterion; and the first two groups match on the first criterion. Therefore the results of the complete query are:
| SUM DOMAINS | FILE |
---------------------------
| 2 | aaa.com_1 |
| 2 | bbb.com_1 |
Following your comments above, you want to filter the records on file before grouping and then filter the resulting groups for those containing more than one match. Therefore use WHERE and HAVING respectively (and select Number instead of file to identify each group):
SELECT Number, COUNT(*) AS `sum domains`
FROM `table`
WHERE file LIKE '%\_1'
GROUP BY Number
HAVING `sum domains` > 1
See it on sqlfiddle:
| NUMBER | SUM DOMAINS |
------------------------
| 222 | 2 |
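The difference between filtering before and after grouping can be reproduced with the sample rows from the question. The sketch below uses Python's sqlite3 module; the table is named domains here (an assumption, to avoid the reserved word table), and SQLite needs an explicit ESCAPE clause for the backslash that MySQL treats as the LIKE escape character by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE domains (domain TEXT, file TEXT, Number INTEGER)")
conn.executemany(
    "INSERT INTO domains VALUES (?, ?, ?)",
    [
        ("aaa.com", "aaa.com_1", 111),
        ("bbb.com", "bbb.com_1", 222),
        ("ccc.com", "ccc.com_2", 111),
        ("ddd.com", "ddd.com_1", 222),
        ("eee.com", "eee.com_1", 333),
    ],
)

# WHERE drops the '_2' row before grouping; HAVING then keeps only
# the groups that still contain more than one row.
result = conn.execute(
    r"""
    SELECT Number, COUNT(*) AS sum_domains
    FROM domains
    WHERE file LIKE '%\_1' ESCAPE '\'
    GROUP BY Number
    HAVING COUNT(*) > 1
    """
).fetchall()
print(result)
```

Number 111 drops out because only one of its two rows survives the WHERE filter, which matches the sqlfiddle output above.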
I am using the following:
having ( SUM(qty) > 4 AND SUM(qty) < 15 )
You cannot have the file name in the SELECT list if it is not also in the GROUP BY. You have to compute the GROUP BY result first, then JOIN it back to the original table and apply the filter, like so:
SELECT *
FROM
(
SELECT COUNT(domain) AS sum_domains, Number
FROM `table`
GROUP BY Number
HAVING COUNT(Number) > 1
) result
JOIN `table` t ON result.Number = t.Number
WHERE t.file LIKE '%\_1'
Try the nested query below:
select count(domain) as 'sum domains', domain as fileName
from
(select domain, file from tableName
group by Number
having count(Number) >1) as temp
WHERE file like '%\_1';