Comparing 2 Columns in same table - mysql

I need to compare 2 columns in a table and give 3 things:
Count of rows checked (Total Rows that were checked)
Count of rows matching (Rows in which the 2 columns matched)
Count of rows different (Rows in which the 2 columns differed)
I've been able to get just rows matching using a join on itself, but I'm unsure how to get the others all at once. The importance of getting all of the information at the same time is because this is a very active table and the data changes with great frequency.
I cannot post the table schema as there is a lot of data in it that is irrelevant to this issue. The columns in question are both int(11) unsigned NOT NULL DEFAULT '0'. For purposes of this, I'll call them mask and mask_alt.

select
count(*) as rows_checked,
sum(col = col2) as rows_matching,
sum(col != col2) as rows_different
from table
Note the elegant use of sum(condition).
This works because in MySQL true is 1 and false is 0, so summing a condition counts the rows where it is true. It's much more compact than case when condition then 1 else 0 end, which is the SQL equivalent of coding if (condition) return true else return false; instead of simply return condition;.
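As a rough illustration of the sum(condition) idea, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL (SQLite also evaluates comparisons to 1 or 0). The table name demo, the helper compare_columns and the sample rows are assumptions made up for the example; only the column names mask and mask_alt come from the question.

```python
import sqlite3

def compare_columns(rows):
    """Return (rows_checked, rows_matching, rows_different) for (mask, mask_alt) pairs."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table; the columns mirror the question's NOT NULL integers.
    conn.execute(
        "CREATE TABLE demo ("
        "mask INTEGER NOT NULL DEFAULT 0, "
        "mask_alt INTEGER NOT NULL DEFAULT 0)"
    )
    conn.executemany("INSERT INTO demo (mask, mask_alt) VALUES (?, ?)", rows)
    # Each comparison evaluates to 1 or 0, so SUM() counts the rows where it holds.
    result = conn.execute(
        "SELECT COUNT(*), SUM(mask = mask_alt), SUM(mask != mask_alt) FROM demo"
    ).fetchone()
    conn.close()
    return result

print(compare_columns([(1, 1), (2, 3), (4, 4), (0, 7)]))  # (4, 2, 2)
```

All three numbers come from one pass over the table in a single statement, which is what the question asks for on a busy table.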

Assuming you mean you want to count the rows where col is or is not equal to col2, you can use an aggregate SUM() coupled with CASE:
SELECT
COUNT(*) AS total,
SUM(CASE WHEN col = col2 THEN 1 ELSE 0 END) AS matching,
SUM(CASE WHEN col <> col2 THEN 1 ELSE 0 END) AS non_matching
FROM table
It may be more efficient to get the total COUNT(*) in a subquery though, and use that value to subtract the matching to get the non-matching, if the above is not performant enough.
SELECT
total,
matching,
total - matching AS non_matching
FROM
(
SELECT
COUNT(*) AS total,
SUM(CASE WHEN col = col2 THEN 1 ELSE 0 END) AS matching
FROM table
) sumtbl

Related

MySQL CASE expression behaves strangely and inconsistently

We are using MySQL 8 as our java application DB.
We have a query with the following format:
select
id,
group_concat(NAME ORDER BY ID separator ',,') AS Code,
CASE
WHEN MAX(p.VARIABLEfactor) = 1 THEN MAX(i.factor) ELSE MAX(p.factor) END AS factor
from MA_TABLE
join TABLE_P p on (...)
join TABLE_I i on (...)
group by id
The query worked fine in our development environments, but after deploying to the client the factor column comes back null.
When we run the same query in the client environment from MySQL Workbench, the factor column is populated correctly.
After some debugging, we changed:
CASE
WHEN MAX(p.VARIABLEfactor) = 1 THEN MAX(i.factor) ELSE MAX(p.factor) END AS factor
to
MAX(
CASE WHEN p.VARIABLEfactor = 1 THEN i.factor ELSE p.factor END ) AS factor,
and the query worked correctly.
Any help here please?
From your explanation I gather that you don't understand the difference of your two case expressions. But they are very different. Let's look at an example for one ID:
ID  | VARIABLEfactor | i.factor | p.factor
----+----------------+----------+---------
100 | 0              | null     | 10
100 | 1              | null     | 20
Your expression
CASE WHEN MAX(p.VARIABLEfactor) = 1 THEN MAX(i.factor) ELSE MAX(p.factor) END
looks at the maximum VARIABLEfactor, which is 1, so the THEN case applies and the maximum i.factor is returned. This is null, as all i.factor are null.
Your expression
MAX(CASE WHEN p.VARIABLEfactor = 1 THEN i.factor ELSE p.factor END)
looks at each row's VARIABLEfactor. For the first row this is 0, so the ELSE case applies and p.factor 10 is used. For the second row the VARIABLEfactor is 1, so its i.factor null gets used. Of these you take the maximum, which is 10.
To recap: The first expression is just a CASE expression on the aggregation results. It returns null here. The second expression is a conditional aggregation. It returns 10 for the sample data.
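The difference can be reproduced with the two sample rows above. This is only a sketch: it uses Python's sqlite3 module rather than MySQL, and it flattens the joins into one hypothetical table named joined with columns variablefactor, i_factor and p_factor, since the real join conditions are not shown in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical flattened result of the joins for a single id.
conn.execute(
    "CREATE TABLE joined ("
    "id INTEGER, variablefactor INTEGER, i_factor INTEGER, p_factor INTEGER)"
)
conn.executemany(
    "INSERT INTO joined VALUES (?, ?, ?, ?)",
    [(100, 0, None, 10), (100, 1, None, 20)],
)

case_on_aggregates, conditional_aggregation = conn.execute(
    """
    SELECT
        -- CASE applied once, to the aggregated values of the whole group
        CASE WHEN MAX(variablefactor) = 1 THEN MAX(i_factor) ELSE MAX(p_factor) END,
        -- CASE applied to every row, then MAX over the per-row results
        MAX(CASE WHEN variablefactor = 1 THEN i_factor ELSE p_factor END)
    FROM joined
    GROUP BY id
    """
).fetchone()
conn.close()

print(case_on_aggregates, conditional_aggregation)  # None 10
```

The first value is null because every i_factor in the group is null; the second is 10 because MAX() skips the null produced by the second row.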

MySQL - Match certain IDs, but only those IDs

I have a table like so:
id_type id_option
"1" "1"
"1" "5"
"2" "1"
"2" "5"
"2" "8"
I am trying to write a query that, given a list of option IDs, finds the "type" that matches the list, but only those IDs.
For example, if given 1 and 5 as options, it should return the type 1 but only the type 1 as the 8 required to match type 2 is not present.
I have tried the following:
SELECT *
FROM my_table
WHERE id_option IN (1, 5)
GROUP BY id_type
HAVING COUNT(DISTINCT id_option) = 2
This returns both "types" - I had hoped that the COUNT restriction of 2 would have helped but I now understand why it doesn't, but I can't think of a clever way to limit this.
I could just pull the first record as typically the types with less options are saved first but I don't think I can rely on this 100%
Thank you for your time
Here's a solution:
SELECT id_type
FROM my_table
GROUP BY id_type
HAVING SUM(id_option IN (1,5)) = COUNT(*)
It relies on a trick specific to MySQL: boolean true is literally the integer 1. So you can use SUM() to count the rows where a condition is true, by putting a boolean expression inside SUM().
For folks reading this who use other databases besides MySQL, you'd have to use an expression to convert the boolean condition to the integer 1:
HAVING SUM(CASE WHEN id_option IN (1,5) THEN 1 ELSE 0 END) = COUNT(*)
In this case, let all rows become part of the groups. That is, do not use a WHERE clause to restrict the query to rows where the id_option is 1 or 5. Then count the total rows in the group, and "count" (i.e. use the SUM() trick) the rows where the id_option is 1 or 5. The two counts are equal only if the group has no id_option values besides 1 or 5.
If you also want to make sure that both 1 and 5 are found, you need another condition:
SELECT id_type
FROM my_table
GROUP BY id_type
HAVING SUM(id_option IN (1,5)) = COUNT(*)
AND COUNT(DISTINCT CASE WHEN id_option IN (1,5) THEN id_option END) = 2
The CASE expression returns id_option when it is 1 or 5, and NULL for any other value. The COUNT() function ignores NULLs, so COUNT(DISTINCT ...) counts how many of the wanted options are actually present.
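To see both HAVING conditions at work, here is a small sketch against the question's sample rows. It runs on Python's sqlite3 module instead of MySQL (SQLite also evaluates IN to 1 or 0), and the helper name exact_types is made up for the example.

```python
import sqlite3

def exact_types(rows, wanted):
    """Return the id_type values whose options are exactly the wanted set."""
    wanted = sorted(set(wanted))
    placeholders = ", ".join("?" for _ in wanted)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE my_table (id_type INTEGER, id_option INTEGER)")
    conn.executemany("INSERT INTO my_table VALUES (?, ?)", rows)
    query = f"""
        SELECT id_type
        FROM my_table
        GROUP BY id_type
        -- no option outside the wanted list ...
        HAVING SUM(id_option IN ({placeholders})) = COUNT(*)
        -- ... and every wanted option is present
           AND COUNT(DISTINCT CASE WHEN id_option IN ({placeholders})
                                   THEN id_option END) = ?
        ORDER BY id_type
    """
    found = [row[0] for row in conn.execute(query, [*wanted, *wanted, len(wanted)])]
    conn.close()
    return found

sample = [(1, 1), (1, 5), (2, 1), (2, 5), (2, 8)]
print(exact_types(sample, [1, 5]))     # [1]
print(exact_types(sample, [1, 5, 8]))  # [2]
```

A type that only has option 1 would pass the first condition but fail the second, which is why both are needed.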
If you can pass the options as a sorted comma separated list string, then use GROUP_CONCAT():
SELECT id_type
FROM my_table
GROUP BY id_type
HAVING GROUP_CONCAT(id_option ORDER BY id_option) = '1,5'
If there are duplicate options for each type, use DISTINCT:
HAVING GROUP_CONCAT(DISTINCT id_option ORDER BY id_option) = '1,5'
While I can't comment yet, here's a tiny adjustment to Bill Karwin's last example (in the accepted solution):
SELECT id_type
FROM my_table
GROUP BY id_type
HAVING SUM(id_option IN (1,5)) = COUNT(*)
AND COUNT(DISTINCT id_option) = 2

Selecting a value based on how another field was generated

I'm selecting some data;
select c.*,
coalesce(s.column1, ...),
coalesce(s.column2, ...)
FROM
(SELECT ...)
Basically, if s.column1 or s.column2 is null then I am putting in some logic to take the average of that column and use it instead.
I want to have another field so I can know whether or not that value was computed using the average - perhaps a boolean? Let's say the average for column1 was 120; the table would look like:
column1 column2 avg
54 10 0
200 40 0
120 180 1
499 160 0
This allows me to see that the third row was generated using the avg of all rows as it was initially null.
How could the logic for the avg column work?
Your question seems fairly moot to me because:
The AVG function ignores NULL values by default, so filling the NULL slots with the overall average gives the same average as leaving those slots out entirely, and
If you just want to mark the rows which had a NULL value, you can use a CASE expression
So, to get what you want, just use this:
SELECT
column1,
column2,
CASE WHEN column1 IS NULL THEN 1 ELSE 0 END AS avg
FROM yourTable;
And know that SELECT AVG(column1) FROM yourTable would return the same value whether NULL rows were omitted, or the overall average were used.
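A quick way to check that claim, and to produce the flag column at the same time, is the sketch below. It uses Python's sqlite3 module rather than MySQL. The sample values are borrowed from the question's table with the third column1 replaced by NULL, so the resulting average is 251 rather than the 120 the question assumed. The alias names filled_column1 and was_averaged are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (column1 REAL, column2 REAL)")
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?)",
    [(54, 10), (200, 40), (None, 180), (499, 160)],
)

# AVG() skips the NULL row: (54 + 200 + 499) / 3
avg_ignoring_nulls = conn.execute("SELECT AVG(column1) FROM yourTable").fetchone()[0]

# Fill the NULL with the overall average and flag the rows that were filled.
filled = conn.execute(
    """
    SELECT
        COALESCE(column1, (SELECT AVG(column1) FROM yourTable)) AS filled_column1,
        CASE WHEN column1 IS NULL THEN 1 ELSE 0 END AS was_averaged
    FROM yourTable
    ORDER BY rowid
    """
).fetchall()
conn.close()

avg_after_filling = sum(value for value, _ in filled) / len(filled)
print(avg_ignoring_nulls, avg_after_filling)  # 251.0 251.0
print([flag for _, flag in filled])           # [0, 0, 1, 0]
```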

SQL - Order By "Custom Preference List"

Scenario:
I have a column in a MySQL table:
my_column - [INT] (Unsigned)
What I need:
I need a query to select ONE ENTRY with conditions as follows:
Given A=n
SELECT FIRST the one with my_column = n
ELSE (my_column = n null result)
SELECT the one with my_column = 0
ELSE
SELECT the one with my_column = whatever
ELSE
Return 0 entries
What I looked into:
I tried:
... WHERE my_column IN (n, 0) ORDER BY my_column DESC LIMIT 1
Which applies for the first two steps, but not for the third one.
Thanks for reading.
Given your description, just use case:
order by (case when my_column = n then 1
when my_column = 0 then 2
else 3
end)
Then, of course, you would add limit 1.
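Put together, the preference order might look like the sketch below. It runs on Python's sqlite3 module as a stand-in for MySQL, and the table name entries and the helper pick_entry are assumptions made for the example.

```python
import sqlite3

def pick_entry(values, n):
    """Return the preferred my_column value for n, or None if the table is empty."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE entries (my_column INTEGER)")
    conn.executemany("INSERT INTO entries VALUES (?)", [(v,) for v in values])
    row = conn.execute(
        """
        SELECT my_column
        FROM entries
        ORDER BY CASE
                     WHEN my_column = ? THEN 1  -- exact match first
                     WHEN my_column = 0 THEN 2  -- then the 0 fallback
                     ELSE 3                     -- then anything else
                 END
        LIMIT 1
        """,
        (n,),
    ).fetchone()
    conn.close()
    return None if row is None else row[0]

print(pick_entry([0, 7, 3], 7))  # 7
print(pick_entry([0, 3], 7))     # 0
print(pick_entry([3], 7))        # 3
print(pick_entry([], 7))         # None
```

When several rows fall into the "anything else" bucket, which of them is returned is not defined unless a second sort key is added.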

Counting total and true condition lines

How to count the number of lines in a table and the number of lines where a certain condition is true without resorting to subselects like this:
create table t (a integer);
insert into t (a) values (1), (2), (null);
select
(select count(*) from t) as total_lines,
(select count(*) from t where a = 1) as condition_true
;
total_lines | condition_true
-------------+----------------
3 | 1
select count(*) as total_lines, count(a = 1 or null) as condition_true
from t
;
total_lines | condition_true
-------------+----------------
3 | 1
It works because:
First while count(*) counts all lines regardless of anything, count(my_column) will count only those lines where my_column is not null:
select count(a) as total
from t
;
total
-------
2
Second, (false or null) returns null, so whenever the condition is not met the expression is null and is not counted by count(condition or null), which only counts non-null values.
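The two steps can be checked row by row. The sketch below uses Python's sqlite3 module as a stand-in for the database in the question; SQLite follows the same three-valued logic for OR, but shows true as 1 and null as None.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER)")
conn.executemany("INSERT INTO t (a) VALUES (?)", [(1,), (2,), (None,)])

# Per-row value of the expression that count() sees.
per_row = conn.execute("SELECT a, a = 1 OR NULL FROM t ORDER BY rowid").fetchall()

# count(*) counts every row; count(expr) counts only rows where expr is not null.
counts = conn.execute(
    "SELECT COUNT(*), COUNT(a), COUNT(a = 1 OR NULL) FROM t"
).fetchone()
conn.close()

print(per_row)  # [(1, 1), (2, None), (None, None)]
print(counts)   # (3, 2, 1)
```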
Use SUM(condition)!
select
count(*) as total_lines,
sum(a = 1) as condition_true
from t
This works because in MySQL, true is 1 and false is 0, so the sum() of a condition will add 1 when it's true and 0 when it's false, which effectively counts the number of times the condition is true.
Many people falsely believe you need a case expression, but you don't with MySQL (you do with some other databases).
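One caveat applies when the column is nullable, as a is in this question's table: a comparison with null is itself null, and sum() skips it, so the "true" and "false" sums need not add up to count(*). A minimal sketch, again using Python's sqlite3 module as a stand-in for MySQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER)")
conn.executemany("INSERT INTO t (a) VALUES (?)", [(1,), (2,), (None,)])

total, is_one, is_not_one, is_null = conn.execute(
    "SELECT COUNT(*), SUM(a = 1), SUM(a != 1), SUM(a IS NULL) FROM t"
).fetchone()
conn.close()

# The null row is neither "= 1" nor "!= 1", so it appears in neither sum.
print(total, is_one, is_not_one, is_null)  # 3 1 1 1
```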
This can easily be done using a condition inside COUNT. I don't know if it's the most optimized way of doing it, but it gets the work done.
You can do it as follows:
select count(*) as total_lines, COUNT(CASE WHEN a = 1 THEN 1 END) as condition_true from t