CASE query optimization - MySQL

SELECT
COUNT(CASE WHEN VALUE = 1 THEN 1 END) AS score_1,
COUNT(CASE WHEN VALUE = 2 THEN 1 END) AS score_2,
COUNT(CASE WHEN VALUE = 3 THEN 1 END) AS score_3,
COUNT(CASE WHEN VALUE = 4 THEN 1 END) AS score_4,
COUNT(CASE WHEN VALUE = 5 THEN 1 END) AS score_5,
COUNT(CASE WHEN VALUE = 6 THEN 1 END) AS score_6,
COUNT(CASE WHEN VALUE = 7 THEN 1 END) AS score_7,
COUNT(CASE WHEN VALUE = 8 THEN 1 END) AS score_8,
COUNT(CASE WHEN VALUE = 9 THEN 1 END) AS score_9,
COUNT(CASE WHEN VALUE = 10 THEN 1 END) AS score_10
FROM
`answers`
WHERE
`created_at` BETWEEN '2017-01-01 00:00:00' AND '2019-11-30 23:59:59'
Is there a way to optimize this query? I have 4 million answer records in my DB, and it runs very slowly.

Try running this one time to create an index:
CREATE INDEX ix_ca on answers(created_at)
That should speed your query up. If you are curious about why, see here:
What is an index in SQL?
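A quick way to confirm the index is actually used is to ask the query planner. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL (in MySQL you would run EXPLAIN on the query and check the key column). The answers table layout (id, created_at, value) is an assumption based on the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed layout: the question only shows created_at and value.
conn.execute(
    "CREATE TABLE answers (id INTEGER PRIMARY KEY,"
    " created_at TEXT NOT NULL, value INTEGER NOT NULL)"
)
conn.execute("CREATE INDEX ix_ca ON answers(created_at)")

query = """
SELECT COUNT(CASE WHEN value = 1 THEN 1 END) AS score_1
FROM answers
WHERE created_at BETWEEN '2017-01-01 00:00:00' AND '2019-11-30 23:59:59'
"""

# EXPLAIN QUERY PLAN rows are (id, parent, notused, detail); the detail text
# names the index when the planner uses it for the date range.
plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]
print(plan)
```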

You could try adding a redundant composite index:
CREATE INDEX idx1 ON answers(created_at, value);
With this redundant (covering) index, the query can be answered from the index content alone, without accessing the table data.
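To see the covering effect, here is a similar sketch, again using SQLite through Python's sqlite3 module as a stand-in and the same assumed answers(id, created_at, value) layout. SQLite reports COVERING INDEX in its plan when it can answer from the index alone; MySQL's EXPLAIN shows "Using index" in the Extra column in the same situation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE answers (id INTEGER PRIMARY KEY,"
    " created_at TEXT NOT NULL, value INTEGER NOT NULL)"
)
# Both columns the query touches are in the index, so no table lookups are needed.
conn.execute("CREATE INDEX idx1 ON answers(created_at, value)")

query = """
SELECT COUNT(CASE WHEN value = 1 THEN 1 END) AS score_1,
       COUNT(CASE WHEN value = 2 THEN 1 END) AS score_2
FROM answers
WHERE created_at BETWEEN '2017-01-01 00:00:00' AND '2019-11-30 23:59:59'
"""
plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]
print(plan)
```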

Want it to be 10 times as fast? Use the data warehousing technique of building and maintaining a "summary table". In this example the summary table might be:
CREATE TABLE subtotals (
dy DATE NOT NULL,
`value` ... NOT NULL, -- TINYINT UNSIGNED ?
ct SMALLINT UNSIGNED NOT NULL, -- this is 2 bytes, max 65K; change if might be bigger
PRIMARY KEY(value, dy) -- or perhaps the opposite order
) ENGINE=InnoDB
Each night you summarize the day's data and build 10 new rows in subtotals.
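The nightly step can be sketched as an INSERT ... SELECT that groups one day's rows by value. This runnable sketch uses SQLite through Python's sqlite3 module; summarize_day is a hypothetical helper name, and in MySQL the upper bound would be written as day + INTERVAL 1 DAY rather than SQLite's DATE(?, '+1 day'). Only values that actually occur on that day get a row, so a day may contribute fewer than 10 rows.

```python
import sqlite3


def create_tables(conn: sqlite3.Connection) -> None:
    # Assumed layout for answers; subtotals mirrors the table above (ISO text dates).
    conn.execute(
        "CREATE TABLE answers (id INTEGER PRIMARY KEY,"
        " created_at TEXT NOT NULL, value INTEGER NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE subtotals (dy TEXT NOT NULL, value INTEGER NOT NULL,"
        " ct INTEGER NOT NULL, PRIMARY KEY (value, dy))"
    )


def summarize_day(conn: sqlite3.Connection, day: str) -> None:
    """Hypothetical nightly job: add one row per value seen on `day` (YYYY-MM-DD)."""
    with conn:  # commit on success, roll back on error
        conn.execute(
            """
            INSERT INTO subtotals (dy, value, ct)
            SELECT DATE(created_at), value, COUNT(*)
            FROM answers
            WHERE created_at >= ? AND created_at < DATE(?, '+1 day')
            GROUP BY DATE(created_at), value
            """,
            (day, day),
        )


conn = sqlite3.connect(":memory:")
create_tables(conn)
conn.executemany(
    "INSERT INTO answers (created_at, value) VALUES (?, ?)",
    [
        ("2019-11-29 08:00:00", 3),
        ("2019-11-29 09:30:00", 3),
        ("2019-11-29 17:45:00", 7),
        ("2019-11-30 12:00:00", 3),  # next day: left for tomorrow night's run
    ],
)
summarize_day(conn, "2019-11-29")
rows = conn.execute("SELECT dy, value, ct FROM subtotals ORDER BY value").fetchall()
print(rows)  # [('2019-11-29', 3, 2), ('2019-11-29', 7, 1)]
```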
Then the "report" query becomes
SELECT
SUM(CASE WHEN VALUE = 1 THEN ct END) AS score_1,
SUM(CASE WHEN VALUE = 2 THEN ct END) AS score_2,
SUM(CASE WHEN VALUE = 3 THEN ct END) AS score_3,
SUM(CASE WHEN VALUE = 4 THEN ct END) AS score_4,
SUM(CASE WHEN VALUE = 5 THEN ct END) AS score_5,
SUM(CASE WHEN VALUE = 6 THEN ct END) AS score_6,
SUM(CASE WHEN VALUE = 7 THEN ct END) AS score_7,
SUM(CASE WHEN VALUE = 8 THEN ct END) AS score_8,
SUM(CASE WHEN VALUE = 9 THEN ct END) AS score_9,
SUM(CASE WHEN VALUE = 10 THEN ct END) AS score_10
FROM
`subtotals`
WHERE `dy` >= '2017-01-01'
AND `dy` < '2019-12-01'
Based on what you have provided, there will be about 10K rows in subtotals; that's a lot less to wade through than 4M rows. It might run more than 10 times as fast.
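A small end-to-end check that the report query over subtotals gives the same numbers as the original query over the raw rows. It uses SQLite via Python's sqlite3 module as a stand-in for MySQL, with synthetic, deterministic sample data (an assumption), and builds all subtotals in one statement instead of night by night.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE answers (id INTEGER PRIMARY KEY, created_at TEXT NOT NULL, value INTEGER NOT NULL)")
conn.execute("CREATE TABLE subtotals (dy TEXT NOT NULL, value INTEGER NOT NULL, ct INTEGER NOT NULL, PRIMARY KEY (value, dy))")

# Synthetic data: hourly answers for 2019-11-25..30 cycling through values 1..10,
# plus one row just past the end of the report range.
rows = [(f"2019-11-{d:02d} {h:02d}:00:00", (d * 24 + h) % 10 + 1) for d in range(25, 31) for h in range(24)]
rows.append(("2019-12-01 10:00:00", 5))
conn.executemany("INSERT INTO answers (created_at, value) VALUES (?, ?)", rows)

# Build all subtotals at once here; the nightly job would add one day at a time.
conn.execute("""
INSERT INTO subtotals (dy, value, ct)
SELECT DATE(created_at), value, COUNT(*) FROM answers GROUP BY DATE(created_at), value
""")

raw_cols = ", ".join(f"COUNT(CASE WHEN value = {v} THEN 1 END)" for v in range(1, 11))
sum_cols = ", ".join(f"SUM(CASE WHEN value = {v} THEN ct END)" for v in range(1, 11))

# Original query against the raw answers table...
raw = conn.execute(
    f"SELECT {raw_cols} FROM answers WHERE created_at >= '2017-01-01' AND created_at < '2019-12-01'"
).fetchone()
# ...and the report query against the much smaller summary table.
summary = conn.execute(
    f"SELECT {sum_cols} FROM subtotals WHERE dy >= '2017-01-01' AND dy < '2019-12-01'"
).fetchone()

print(raw == summary, sum(raw))  # True 144
```

One difference to keep in mind: if a value never occurs in the range, SUM() returns NULL where COUNT() returns 0; wrap it in COALESCE(..., 0) if that matters.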
More discussion: http://mysql.rjweb.org/doc.php/summarytables

Related

Why are my various CASE WHEN functions returning the same values?

I'm trying to write a query that returns counts depending on the value of a feedback field that ranges from 0 to 5 (0 means not rated).
I want:
the count of all rated rows (anything rated 1 or greater)
the count of all rows rated 1 (anything = 1)
the count of all rows rated 1 that are also the first iteration of a given task (rated = 1 and iteration = 0)
I have written this query, but I am getting the same value for all counts:
select
DATE_FORMAT(created_at,'%M') as Month,
COUNT(CASE WHEN rate > 0 THEN 1 ELSE 0 END) AS total,
COUNT(CASE WHEN rate = 1 THEN 1 ELSE 0 END) AS Rated_1,
COUNT(CASE WHEN client_feedback = 1 AND index = 0 THEN 1 ELSE 0 END) AS first_iteration_rated_1
from tablexxx
where created_at between date('2022-04-01') and date('2022-10-01')
GROUP BY Month
Use SUM() instead of COUNT(), or make the non-matching branch NULL.
COUNT() counts every non-NULL value, so it counts the 0s just as it counts the 1s.
You have two options:
Method 1: use NULL in the ELSE part of the CASE (leaving out ELSE has the same effect, because it defaults to NULL):
select
DATE_FORMAT(created_at,'%M') as Month,
COUNT(CASE WHEN rate > 0 THEN 1 ELSE null END) AS total,
COUNT(CASE WHEN rate = 1 THEN 1 ELSE null END) AS Rated_1,
COUNT(CASE WHEN client_feedback = 1 AND `index` = 0 THEN 1 ELSE null END) AS first_iteration_rated_1
from tablexxx
where created_at between date('2022-04-01') and date('2022-10-01')
GROUP BY Month
Method 2: use SUM() instead of COUNT():
select
DATE_FORMAT(created_at,'%M') as Month,
SUM(CASE WHEN rate > 0 THEN 1 ELSE 0 END) AS total,
SUM(CASE WHEN rate = 1 THEN 1 ELSE 0 END) AS Rated_1,
SUM(CASE WHEN client_feedback = 1 AND `index` = 0 THEN 1 ELSE 0 END) AS first_iteration_rated_1
from tablexxx
where created_at between date('2022-04-01') and date('2022-10-01')
GROUP BY Month
Note that INDEX is a reserved word in MySQL, so a column named index has to be quoted with backticks, as above.
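The difference between the three forms is easy to see on a handful of rows. This sketch runs in SQLite via Python's sqlite3 module (standing in for MySQL); the feedback table with only a rate column is a hypothetical simplification of tablexxx.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with just the rate column (0 = not rated).
conn.execute("CREATE TABLE feedback (rate INTEGER NOT NULL)")
conn.executemany("INSERT INTO feedback (rate) VALUES (?)", [(0,), (1,), (1,), (3,), (5,)])

wrong, method_1, method_2 = conn.execute("""
SELECT
  COUNT(CASE WHEN rate = 1 THEN 1 ELSE 0 END),    -- counts every row: 0 is not NULL
  COUNT(CASE WHEN rate = 1 THEN 1 ELSE NULL END), -- method 1: NULLs are skipped
  SUM(CASE WHEN rate = 1 THEN 1 ELSE 0 END)       -- method 2: adds up the 1s
FROM feedback
""").fetchone()
print(wrong, method_1, method_2)  # 5 2 2
```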

Mysql Query: How many sum() recommended in single query?

I have 70 different account types, and I fetch the data per account type.
The query looks like this:
$mainData = "SELECT
count(*) AS totalRows,
sum(pay) as totalPay,
sum(case when account_type = 1 then 1 else 0 end) AS account_1_Total,
sum(case when account_type = 1 then pay else 0 end) AS account_1_Pay,
sum(case when account_type = 2 then 1 else 0 end) AS account_2_Total,
sum(case when account_type = 2 then pay else 0 end) AS account_2_Pay,
{all_account_types_here}
FROM account_table";
In the end, there are more than 140 sum() calls.
So the question is: how many sum() calls are recommended in a single query?
Thanks!
EDIT:
GROUP BY is the solution.
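A sketch of the GROUP BY version, run in SQLite via Python's sqlite3 module as a stand-in for MySQL. The sample rows are hypothetical; the point is that each account type becomes a row instead of two SUM(CASE ...) columns, so the query no longer grows with the number of types.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample: a few rows over account types 1, 2 and 70.
conn.execute("CREATE TABLE account_table (account_type INTEGER NOT NULL, pay INTEGER NOT NULL)")
conn.executemany("INSERT INTO account_table VALUES (?, ?)", [(1, 100), (1, 50), (2, 70), (70, 5)])

# One row per account type instead of two SUM(CASE ...) columns per type.
per_type = conn.execute("""
SELECT account_type, COUNT(*) AS total, SUM(pay) AS pay
FROM account_table
GROUP BY account_type
ORDER BY account_type
""").fetchall()

# The grand totals (totalRows, totalPay) can be added up from the grouped rows.
total_rows = sum(total for _, total, _ in per_type)
total_pay = sum(pay for _, _, pay in per_type)
print(per_type, total_rows, total_pay)  # [(1, 2, 150), (2, 1, 70), (70, 1, 5)] 4 225
```

In MySQL, adding WITH ROLLUP to the GROUP BY makes the server return the grand-total row itself (with account_type as NULL); SQLite has no ROLLUP, which is why the sketch sums in Python.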

display column in horizontal format

I have a table as follows.
[Date] [Test_Item] [Result]
1/2/2014 A 1.1
2/2/2014 B 31.1
3/2/2014 C 20
5/2/2014 A 44
I would like to display it in the following format:
Test_Item|1/2/2014|2/2/2014|3/2/2014|5/2/2014
A|1.1|||
B||31.1||
C|||20|
A||||44
How can I achieve this? Please suggest a query for this case.
This is a basic use of the CASE expression:
select test_item,
(case when `date` = '1/2/2014' then result end) as `1/2/2014`,
(case when `date` = '2/2/2014' then result end) as `2/2/2014`,
(case when `date` = '3/2/2014' then result end) as `3/2/2014`,
(case when `date` = '5/2/2014' then result end) as `5/2/2014`
from `table` t;
You don't mention anything about types. If `date` is actually stored as a DATE or DATETIME (as it should be), then you should use ISO-standard date formats for the comparison. Assuming your format is d/m/yyyy, that would be:
select test_item,
(case when `date` = '2014-02-01' then result end) as `1/2/2014`,
(case when `date` = '2014-02-02' then result end) as `2/2/2014`,
(case when `date` = '2014-02-03' then result end) as `3/2/2014`,
(case when `date` = '2014-02-05' then result end) as `5/2/2014`
from `table` t;
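Running the ISO-date version on the sample data shows the resulting shape: one output row per input row, with NULL in every date column except the row's own. This sketch uses SQLite through Python's sqlite3 module; SQLite accepts MySQL-style backtick quoting, so only the table name differs (here t, an assumption).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Dates stored as ISO text, as the answer recommends.
conn.execute("CREATE TABLE t (`date` TEXT, test_item TEXT, result REAL)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    ("2014-02-01", "A", 1.1),
    ("2014-02-02", "B", 31.1),
    ("2014-02-03", "C", 20),
    ("2014-02-05", "A", 44),
])

cur = conn.execute("""
select test_item,
  (case when `date` = '2014-02-01' then result end) as `1/2/2014`,
  (case when `date` = '2014-02-02' then result end) as `2/2/2014`,
  (case when `date` = '2014-02-03' then result end) as `3/2/2014`,
  (case when `date` = '2014-02-05' then result end) as `5/2/2014`
from t
order by `date`
""")
header = [d[0] for d in cur.description]
rows = cur.fetchall()
print(header)
for row in rows:
    print(row)  # the other date columns come back as None (SQL NULL)
```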

mysql query split column into m

I have the following database structure:
FieldID|Year|Value
a|2011|sugar
a|2012|salt
a|2013|pepper
b|2011|pepper
b|2012|pepper
b|2013|pepper
c|2011|sugar
c|2012|salt
c|2013|salt
Now I would like to run a query that counts, for each value, the number of fields in each year, with output like this:
value|2011|2012|2013
sugar|2|0|0
salt|0|2|1
pepper|1|1|2
Previously I used a separate table for each year. However, the distinct values for 2011, 2012 and 2013 may differ (e.g. sugar appears only in 2011).
For individual years I used:
SELECT `Value`, COUNT( `FieldID` ) FROM `Table` WHERE `Year`=2011 GROUP BY `Value`
A1ex07's answer is fine. However, in MySQL, I prefer this formulation:
SELECT Value,
sum(`Year` = 2011) AS cnt2011,
sum(`Year` = 2012) AS cnt2012,
sum(`Year` = 2013) AS cnt2013
FROM t
GROUP BY value;
Using count(...) produces the correct answer, but only because the else clause is missing: the default value is NULL, and NULL doesn't get counted. To me, this construct is prone to error.
If you want the above in standard SQL, I would go with:
SELECT Value,
sum(case when `Year` = 2011 then 1 else 0 end) AS cnt2011,
sum(case when `Year` = 2012 then 1 else 0 end) AS cnt2012,
sum(case when `Year` = 2013 then 1 else 0 end) AS cnt2013
FROM t
GROUP BY value;
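Both formulations can be checked against the sample data from the question. This sketch uses SQLite via Python's sqlite3 module as a stand-in for MySQL; like MySQL, SQLite evaluates a comparison to 1 or 0, so the shortcut form runs there too.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (FieldID TEXT, `Year` INTEGER, `Value` TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    ("a", 2011, "sugar"), ("a", 2012, "salt"), ("a", 2013, "pepper"),
    ("b", 2011, "pepper"), ("b", 2012, "pepper"), ("b", 2013, "pepper"),
    ("c", 2011, "sugar"), ("c", 2012, "salt"), ("c", 2013, "salt"),
])

# Shortcut: a comparison evaluates to 1 or 0, so SUM() counts the matches.
boolean_sum = """
SELECT `Value`, SUM(`Year` = 2011), SUM(`Year` = 2012), SUM(`Year` = 2013)
FROM t GROUP BY `Value` ORDER BY `Value`
"""
# Portable form with an explicit CASE ... ELSE 0.
case_sum = """
SELECT `Value`,
  SUM(CASE WHEN `Year` = 2011 THEN 1 ELSE 0 END),
  SUM(CASE WHEN `Year` = 2012 THEN 1 ELSE 0 END),
  SUM(CASE WHEN `Year` = 2013 THEN 1 ELSE 0 END)
FROM t GROUP BY `Value` ORDER BY `Value`
"""
a = conn.execute(boolean_sum).fetchall()
b = conn.execute(case_sum).fetchall()
print(a)  # [('pepper', 1, 1, 2), ('salt', 0, 2, 1), ('sugar', 2, 0, 0)]
```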
You can pivot the data:
SELECT `Value`,
COUNT(CASE WHEN `Year` = 2011 THEN FieldID END) AS cnt2011,
COUNT(CASE WHEN `Year` = 2012 THEN FieldID END) AS cnt2012,
COUNT(CASE WHEN `Year` = 2013 THEN FieldID END) AS cnt2013
FROM `Table`
GROUP BY `Value`
This is called a pivot table. It is built from a chain of CASE expressions that yield 1 or 0 for each condition; SUM() then adds up the ones and zeros to produce a count. Aliases made only of digits must be quoted with backticks in MySQL.
SELECT
`Value`,
SUM(CASE WHEN `Year` = 2011 THEN 1 ELSE 0 END) AS `2011`,
SUM(CASE WHEN `Year` = 2012 THEN 1 ELSE 0 END) AS `2012`,
SUM(CASE WHEN `Year` = 2013 THEN 1 ELSE 0 END) AS `2013`
FROM `Table`
GROUP BY `Value`

Best way to index and query analytic table in MySQL

I have an analytics table (5M rows and growing) with the following structure:
Hits
id int NOT NULL AUTO_INCREMENT,
hit_date datetime NOT NULL,
hit_day int(11) DEFAULT NULL,
gender varchar(255) DEFAULT NULL,
age_range_id int(11) DEFAULT NULL,
klout_range_id int(11) DEFAULT NULL,
frequency int(11) DEFAULT NULL,
count int(11) DEFAULT NULL,
location_id int(11) DEFAULT NULL,
source_id int(11) DEFAULT NULL,
target_id int(11) DEFAULT NULL,
Most queries against the table select rows between two datetimes for a particular subset of columns and then sum the count column across all matching rows. For example:
SELECT target_id,
SUM(CASE gender WHEN 'm' THEN count END) AS 'gender_male',
SUM(CASE gender WHEN 'f' THEN count END) AS 'gender_female',
SUM(CASE age_range_id WHEN 1 THEN count END) AS 'age_18 - 20',
SUM(CASE target_id WHEN 1 then count END) AS 'target_test',
SUM(CASE location_id WHEN 1 then count END) AS 'location_NY'
FROM Hits
WHERE (location_id = 1 OR location_id = 2)
AND (target_id = 40 OR target_id = 22)
AND cast(hit_date AS date) BETWEEN '2012-5-4' AND '2012-5-10'
GROUP BY target_id
The interesting thing about queries to this table is that the WHERE clause can include any permutation of Hits column names and values, since those are what we're filtering against. So the particular query above gets the number of males and females between the ages of 18 and 20 (age_range_id 1) in NY that belong to a target called "test". However, there are over 8 age groups, 10 klout ranges, 45 locations, 10 sources, etc. (all foreign key references).
I currently have an index on hit_date and another one on target_id. What is the best way to index this table properly? Having a composite index on all column fields seems inherently wrong.
Is there any other way to run this query without using a sub-query to sum up all counts? I did some research, and this seems to be the best way to get the data set I need, but is there a more efficient way of handling this query?
Here's your optimized query. The idea is to get rid of the ORs and the CAST() function on hit_date so that MySQL can utilize a compound index that covers each of the subsets of data. You'll want a compound index on (location_id, target_id, hit_date) in that order.
SELECT id,
SUM(gender_male) AS gender_male,
SUM(gender_female) AS gender_female,
SUM(`age_18 - 20`) AS `age_18 - 20`,
SUM(target_test) AS target_test,
SUM(location_NY) AS location_NY
FROM
(
SELECT target_id AS id,
SUM(CASE gender WHEN 'm' THEN `count` END) AS gender_male,
SUM(CASE gender WHEN 'f' THEN `count` END) AS gender_female,
SUM(CASE age_range_id WHEN 1 THEN `count` END) AS `age_18 - 20`,
SUM(CASE target_id WHEN 1 THEN `count` END) AS target_test,
SUM(CASE location_id WHEN 1 THEN `count` END) AS location_NY
FROM Hits
WHERE (location_id = 1)
AND (target_id = 40)
AND hit_date BETWEEN '2012-05-04 00:00:00' AND '2012-05-10 23:59:59'
GROUP BY target_id
UNION ALL
SELECT target_id AS id,
SUM(CASE gender WHEN 'm' THEN `count` END) AS gender_male,
SUM(CASE gender WHEN 'f' THEN `count` END) AS gender_female,
SUM(CASE age_range_id WHEN 1 THEN `count` END) AS `age_18 - 20`,
SUM(CASE target_id WHEN 1 THEN `count` END) AS target_test,
SUM(CASE location_id WHEN 1 THEN `count` END) AS location_NY
FROM Hits
WHERE (location_id = 2)
AND (target_id = 22)
AND hit_date BETWEEN '2012-05-04 00:00:00' AND '2012-05-10 23:59:59'
GROUP BY target_id
UNION ALL
SELECT target_id AS id,
SUM(CASE gender WHEN 'm' THEN `count` END) AS gender_male,
SUM(CASE gender WHEN 'f' THEN `count` END) AS gender_female,
SUM(CASE age_range_id WHEN 1 THEN `count` END) AS `age_18 - 20`,
SUM(CASE target_id WHEN 1 THEN `count` END) AS target_test,
SUM(CASE location_id WHEN 1 THEN `count` END) AS location_NY
FROM Hits
WHERE (location_id = 1)
AND (target_id = 22)
AND hit_date BETWEEN '2012-05-04 00:00:00' AND '2012-05-10 23:59:59'
GROUP BY target_id
UNION ALL
SELECT target_id AS id,
SUM(CASE gender WHEN 'm' THEN `count` END) AS gender_male,
SUM(CASE gender WHEN 'f' THEN `count` END) AS gender_female,
SUM(CASE age_range_id WHEN 1 THEN `count` END) AS `age_18 - 20`,
SUM(CASE target_id WHEN 1 THEN `count` END) AS target_test,
SUM(CASE location_id WHEN 1 THEN `count` END) AS location_NY
FROM Hits
WHERE (location_id = 2)
AND (target_id = 40)
AND hit_date BETWEEN '2012-05-04 00:00:00' AND '2012-05-10 23:59:59'
GROUP BY target_id
) a
GROUP BY id
If your selection is so large that this brings no improvement, you may as well keep scanning all rows as you already do.
Note: surround aliases with backticks, not single quotes, which are deprecated. The CASE clauses keep `count`, because it is the column holding each row's hit total, and the outer query sums the branch results, because each target_id comes out of two branches (one per location).
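To sanity-check that the UNION ALL rewrite returns the same totals as the OR version, here is a reduced sketch in SQLite via Python's sqlite3 module (a stand-in for MySQL). The trimmed Hits columns, the index name ix_loc_tgt_date and the sample rows are assumptions. Each branch handles one (location_id, target_id) pair as an equality lookup, and the outer SUM merges branches that produce the same target.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced Hits table: only the columns this example needs.
conn.execute("""CREATE TABLE Hits (id INTEGER PRIMARY KEY, hit_date TEXT NOT NULL,
  gender TEXT, target_id INTEGER, location_id INTEGER, `count` INTEGER)""")
conn.execute("CREATE INDEX ix_loc_tgt_date ON Hits (location_id, target_id, hit_date)")
conn.executemany(
    "INSERT INTO Hits (hit_date, gender, target_id, location_id, `count`) VALUES (?, ?, ?, ?, ?)",
    [
        ("2012-05-05 10:00:00", "m", 40, 1, 3),
        ("2012-05-06 11:00:00", "f", 40, 2, 4),
        ("2012-05-07 12:00:00", "m", 22, 1, 5),
        ("2012-05-08 13:00:00", "f", 22, 2, 6),
        ("2012-05-09 14:00:00", "m", 22, 3, 7),  # location 3: filtered out
        ("2012-05-11 09:00:00", "m", 40, 1, 8),  # after the date range
    ],
)
START, END = "2012-05-04 00:00:00", "2012-05-10 23:59:59"
COLS = """SUM(CASE gender WHEN 'm' THEN `count` END) AS gender_male,
          SUM(CASE gender WHEN 'f' THEN `count` END) AS gender_female"""

# Reference: the OR-style filter (already without the CAST on hit_date).
or_rows = conn.execute(f"""
SELECT target_id AS id, {COLS} FROM Hits
WHERE location_id IN (1, 2) AND target_id IN (40, 22) AND hit_date BETWEEN ? AND ?
GROUP BY target_id ORDER BY id""", (START, END)).fetchall()

# One branch per (location_id, target_id) pair.
pairs = [(1, 40), (2, 22), (1, 22), (2, 40)]
branch = (f"SELECT target_id AS id, {COLS} FROM Hits "
          "WHERE location_id = ? AND target_id = ? AND hit_date BETWEEN ? AND ? "
          "GROUP BY target_id")
union_sql = " UNION ALL ".join([branch] * len(pairs))
params = [p for loc, tgt in pairs for p in (loc, tgt, START, END)]

# The outer SUM merges the partial results for each target.
union_rows = conn.execute(f"""
SELECT id, SUM(gender_male) AS gender_male, SUM(gender_female) AS gender_female
FROM ({union_sql}) AS a
GROUP BY id ORDER BY id""", params).fetchall()

print(or_rows)     # [(22, 5, 6), (40, 3, 4)]
print(union_rows)  # [(22, 5, 6), (40, 3, 4)]
```

If one of the four pairs were missing or listed twice, the totals for the affected target would silently drop or double, so it is worth generating the branches from the list of pairs rather than writing them out by hand.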