MYSQL get count of each column where it equals a specific value - mysql

I recently set up a MYSQL database connected to a form filled with checkboxes. If the checkbox was selected, it would insert into the associated column a value of '1'; otherwise, it would receive a value of '0'.
I'd like to eventually look at aggregate data from this form, and was wondering if there was any way I could use MYSQL to get a number for each column which would be equal to the number of rows that had a value of '1'.
I've tried variations of:
select count(*) from POLLDATA group by column_name
which was unsuccessful, and nothing else I can think of seems to make sense (admittedly, I'm not all too experienced in SQL).
I'd really like to avoid:
select count(*) from POLLDATA where column_1='1'
for each column (there are close to 100 of them).
Is there any way to do this besides typing out a select count(*) statement for each column?
EDIT:
If it helps, the columns are 'artist1', 'artist2', ....'artist88', 'gender', 'age', 'city', 'state'. As I tried to explain below, I was hoping that I'd be able to do something like:
select sum(EACH_COLUMN) from POLLDATA where gender='Male' and city='New York City';
(obviously EACH_COLUMN is bogus)

SELECT SUM(CASE
             WHEN t.your_column = '1' THEN 1
             ELSE 0
           END) AS OneCount,
       SUM(CASE
             WHEN t.your_column = '0' THEN 1
             ELSE 0
           END) AS ZeroCount
FROM YOUR_TABLE t
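Here is a minimal, runnable sketch of that conditional-aggregation pattern. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so the example is self-contained); the sample rows and the trimmed-down POLLDATA layout with two artist columns are made up for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE POLLDATA (artist1 TEXT, artist2 TEXT, gender TEXT, city TEXT)"
)
# Hypothetical sample rows: checkbox values stored as the strings '1' / '0'.
conn.executemany(
    "INSERT INTO POLLDATA VALUES (?, ?, ?, ?)",
    [
        ("1", "0", "Male", "New York City"),
        ("1", "1", "Female", "New York City"),
        ("0", "1", "Male", "Boston"),
        ("1", "0", "Male", "New York City"),
    ],
)

# One pass over the table: each SUM(CASE ...) counts the rows matching its test.
query = """
SELECT SUM(CASE WHEN artist1 = '1' THEN 1 ELSE 0 END) AS artist1_ones,
       SUM(CASE WHEN artist1 = '0' THEN 1 ELSE 0 END) AS artist1_zeros,
       SUM(CASE WHEN artist2 = '1' THEN 1 ELSE 0 END) AS artist2_ones
FROM POLLDATA
WHERE gender = ? AND city = ?
"""
counts = conn.execute(query, ("Male", "New York City")).fetchone()
print(counts)
```

The WHERE clause narrows the rows first, so every column count refers to the same filtered set of respondents.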

If you are just looking for the sheer number of 1's in the columns, you could try…
select sum(col1), sum(col2), sum(col3) from POLLDATA

A slightly more compact notation is SUM( IF( expression ) ).
For the asker's example, this could look something like:
select
count(*) as total,
sum(if(gender = 'MALE', 1, 0)) as males,
sum(if(gender = 'FEMALE', 1, 0)) as females,
sum(if(city = 'New York City', 1, 0)) as newYorkResidents
from POLLDATA;
Example result:
+-------+-------+---------+------------------+
| total | males | females | newYorkResidents |
+-------+-------+---------+------------------+
|    42 |    23 |      19 |               42 |
+-------+-------+---------+------------------+

select count(*) from POLLDATA group by column_name
I don't think you want to do a count, because this will also count the records with a 0.
Try
select column_name,sum(column_name) from POLLDATA group by column_name
or
select column_name,count(*) from POLLDATA
where column_name <> 0
group by column_name
only adds the 0

Instead of strings why not store actual numbers, 1 or 0.
Then you could use the sql SUM function.

When the query begins to be a little too complicated, maybe it's because you should think again about your database structure. But if you want to keep your table as it is, you could use a prepared statement that automatically calculates all the sums for you, without specifying every single column:
SELECT
  CONCAT(
    'SELECT ',
    GROUP_CONCAT(CONCAT('SUM(', `column_name`, ') AS sum_', `column_name`)),
    ' FROM POLLDATA WHERE gender=? AND city=?')
FROM `information_schema`.`columns`
WHERE `table_schema`=DATABASE()
  AND `table_name`='POLLDATA'
  AND `column_name` LIKE 'artist%'
INTO @sql;
SET @gender := 'male';
SET @city := 'New York';
PREPARE stmt FROM @sql;
EXECUTE stmt USING @gender, @city;
Please see fiddle here.
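The same idea (read the column names from the catalog, then generate one SUM per column) can also be done from application code. This is only a sketch under some assumptions: it uses Python's sqlite3 module, where PRAGMA table_info plays the role of information_schema.columns, and the three-artist table and its rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE POLLDATA ("
    "artist1 INTEGER, artist2 INTEGER, artist3 INTEGER, gender TEXT, city TEXT)"
)
conn.executemany(
    "INSERT INTO POLLDATA VALUES (?, ?, ?, ?, ?)",
    [
        (1, 0, 1, "male", "New York"),
        (1, 1, 0, "male", "New York"),
        (0, 1, 1, "female", "New York"),
        (1, 1, 1, "male", "Boston"),
    ],
)

# PRAGMA table_info yields one row per column; index 1 holds the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(POLLDATA)")]
artist_columns = [name for name in columns if name.startswith("artist")]

# Column names come from the catalog, not from user input, so they are
# interpolated as quoted identifiers; the filter values stay bound parameters.
select_list = ", ".join(f'SUM("{name}") AS "sum_{name}"' for name in artist_columns)
sql = f"SELECT {select_list} FROM POLLDATA WHERE gender = ? AND city = ?"

cursor = conn.execute(sql, ("male", "New York"))
sums = dict(zip([d[0] for d in cursor.description], cursor.fetchone()))
print(sums)
```

Adding an artist89 column later needs no change to this code, which is the point of generating the select list.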

Related

Is there a way to use aggregate COUNT() values within CASE?

I need to retrieve unique yet truncated part numbers, with their description values being conditionally determined.
DATA:
Here's some simplified sample data:
(the real table has half a million rows)
create table inventory(
partnumber VARCHAR(10),
description VARCHAR(10)
);
INSERT INTO inventory (partnumber,description) VALUES
('12345','ABCDE'),
('123456','ABCDEF'),
('1234567','ABCDEFG'),
('98765','ZYXWV'),
('987654','ZYXWVU'),
('9876543','ZYXWVUT'),
('abcde',''),
('abcdef','123'),
('abcdefg','321'),
('zyxwv',NULL),
('zyxwvu','987'),
('zyxwvut','789');
TRIED:
I've tried too many things to list here.
I've finally found a way to get past all the 'unknown field' errors and at least get SOME results, but:
it's SUPER kludgy!
my results are not limited to unique prods.
Here's my current query:
SELECT
LEFT(i.partnumber, 6) AS prod,
CASE
WHEN agg.cnt > 1
OR i.description IS NULL
OR i.description = ''
THEN LEFT(i.partnumber, 6)
ELSE i.description
END AS `descrip`
FROM inventory i
INNER JOIN (SELECT LEFT(ii.partnumber, 6) t, COUNT(*) cnt
FROM inventory ii GROUP BY ii.partnumber) AS agg
ON LEFT(i.partnumber, 6) = agg.t;
GOAL:
My goal is to retrieve:
prod    descrip
12345   ABCDE
123456  123456
98765   ZYXWV
987654  987654
abcde   abcde
abcdef  abcdef
zyxwv   zyxwv
zyxwvu  zyxwvu
QUESTION:
What are some cleaner ways to use the COUNT() aggregate data with a CASE type conditional?
How can I limit my results so that all prods are UNIQUE?
You can check if a left(partnumber, 6) is not unique in the result by checking if count(*) > 1. In such a case let descrip be left(partnumber, 6). Otherwise you can use max(description) (or min(description)) to get the single description but satisfy the needs to use an aggregation function on columns not in the GROUP BY. To replace empty or NULL descriptions, nullif() and coalesce() can be used.
That would lead to the following using just one level of aggregation and no joins:
SELECT left(partnumber, 6) AS prod,
CASE
WHEN count(*) > 1 THEN
left(partnumber, 6)
ELSE
coalesce(nullif(max(description), ''), left(partnumber, 6))
END AS descrip
FROM inventory
GROUP BY left(partnumber, 6)
ORDER BY left(partnumber, 6);
But there seems to be a bug in MySQL, and this query fails. The engine doesn't "see" that, in the list after SELECT, partnumber is only used in the expression left(partnumber, 6), which is also in the GROUP BY. Instead, the engine falsely complains about partnumber not being in the GROUP BY and not being subject to an aggregation function.
As a workaround, we can use a derived table that does the shortening of partnumber to its first six characters. We then use that column of the derived table instead of left(partnumber, 6).
SELECT l6pn AS prod,
CASE
WHEN count(*) > 1 THEN
l6pn
ELSE
coalesce(nullif(max(description), ''), l6pn)
END AS descrip
FROM (SELECT left(partnumber, 6) AS l6pn,
description
FROM inventory) AS x
GROUP BY l6pn
ORDER BY l6pn;
Or we slap some actually pointless max()es around the left(partnumber, 6) other than the first, to work around the bug.
SELECT left(partnumber, 6) AS prod,
CASE
WHEN count(*) > 1 THEN
max(left(partnumber, 6))
ELSE
coalesce(nullif(max(description), ''), max(left(partnumber, 6)))
END AS descrip
FROM inventory
GROUP BY left(partnumber, 6)
ORDER BY left(partnumber, 6);
db<>fiddle (Change the DBMS to some other like Postgres or MariaDB to see that they also accept the first query.)
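To check the derived-table variant against the sample data from the question, here is a small sketch that runs it through Python's sqlite3 module. SQLite is an assumption used only to make the example runnable; since it has no left() function, substr(partnumber, 1, 6) is swapped in for left(partnumber, 6).

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (partnumber VARCHAR(10), description VARCHAR(10))")
conn.executemany(
    "INSERT INTO inventory (partnumber, description) VALUES (?, ?)",
    [
        ("12345", "ABCDE"), ("123456", "ABCDEF"), ("1234567", "ABCDEFG"),
        ("98765", "ZYXWV"), ("987654", "ZYXWVU"), ("9876543", "ZYXWVUT"),
        ("abcde", ""), ("abcdef", "123"), ("abcdefg", "321"),
        ("zyxwv", None), ("zyxwvu", "987"), ("zyxwvut", "789"),
    ],
)

# substr(partnumber, 1, 6) replaces MySQL's left(partnumber, 6) here.
query = """
SELECT l6pn AS prod,
       CASE
         WHEN count(*) > 1 THEN l6pn
         ELSE coalesce(nullif(max(description), ''), l6pn)
       END AS descrip
FROM (SELECT substr(partnumber, 1, 6) AS l6pn,
             description
      FROM inventory) AS x
GROUP BY l6pn
ORDER BY l6pn
"""
rows = conn.execute(query).fetchall()
for prod, descrip in rows:
    print(prod, descrip)
```

Each prod appears exactly once, and the description only survives when the shortened part number is unique and the description is neither empty nor NULL.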

Select several max types for each datatype per distinct value in mysql

userid  data_type          timespentaday
1       League of Legends  500
1       Hearthstone        1500
1       Hearthstone        1400
2       World of Warcraft  1200
1       Dota 2             100
2       Final Fantasy      500
1       Dota 2             700
Given this data, I would like to query the most time each user has spent on each game.
Output desired:
User  League Of Legends  Hearthstone  World of Warcraft  Dota 2
1     500                1500         0                  700
2     0                  0            1200               0
Something along the lines of this is something I've tried
SELECT t1.* FROM user_info GROUP BY userid JOIN(
SELECT(
(SELECT max(timespentaday) where data_type='League of Legends'),
(SELECT max(timespentaday) where data_type='Hearhstone'),
(SELECT max(timespentaday) where data_type='Dota 2)'
FROM socialcount AS t2
) as t2
ON t1.userid = t2.userid
Basically, to do this you need the greatest n per group. There is a good article on it, but the gist is that in MySQL you have to use variables to even get close to this, especially when doing a pivot on the table (a fake pivot, since MySQL doesn't have native support for that).
SELECT userid,
    MAX(CASE WHEN data_type = "League of Legends" THEN timespentaday ELSE 0 END) as "League of Legends",
    MAX(CASE WHEN data_type = "Hearthstone" THEN timespentaday ELSE 0 END) as "Hearthstone",
    MAX(CASE WHEN data_type = "Dota 2" THEN timespentaday ELSE 0 END) as "Dota 2",
    MAX(CASE WHEN data_type = "World of Warcraft" THEN timespentaday ELSE 0 END) as "World of Warcraft",
    MAX(CASE WHEN data_type = "Final Fantasy" THEN timespentaday ELSE 0 END) as "Final Fantasy"
FROM
(   SELECT *, @A := if(@B = userid, if(@C = data_type, @A + 1, 1), 1) as count_to_use, @B := userid, @C := data_type
    FROM
    (   SELECT userid, timespentaday, data_type
        FROM gamers
        CROSS JOIN(SELECT @A := 0, @B := 0, @C := '') temp
        ORDER BY userid ASC, data_type ASC, timespentaday DESC
    ) t
    HAVING count_to_use = 1
)t1
GROUP BY userid
DEMO
NOTE:
The MySQL docs are quite clear in their warnings about using user-defined variables:
As a general rule, you should never assign a value to a user variable
and read the value within the same statement. You might get the
results you expect, but this is not guaranteed. The order of
evaluation for expressions involving user variables is undefined and
may change based on the elements contained within a given statement;
in addition, this order is not guaranteed to be the same between
releases of the MySQL Server. In SELECT @a, @a:=@a+1, ..., you might
think that MySQL will evaluate #a first and then do an assignment
second. However, changing the statement (for example, by adding a
GROUP BY, HAVING, or ORDER BY clause) may cause MySQL to select an
execution plan with a different order of evaluation.
I am not going to give you a query with the output format you desire, as implementing that pivot table is going to be a very ugly and poorly performing query, as well as something that is not scalable as the number of distinct games increases.
Instead, I will focus on how to query the data in the most straightforward manner and how to read it into a data structure that would be used by application logic to create the pivot view as desired.
First the query:
SELECT
userid,
data_type,
MAX(timespentaday) AS max_timespent
FROM social_count
GROUP BY userid, data_type
This would give results like
userid  data_type          max_timespent
------  -----------------  -------------
1       League of Legends  500
1       Hearthstone        1500
1       Dota 2             700
2       World of Warcraft  1200
2       Final Fantasy      500
Now when reading the results out of the database, you just read them into a structure that is useful. I will use PHP as the example language, but this should be pretty easily portable to any language.
// will hold distinct list of all available games
$games_array = array();
// will hold user data from DB, keyed by user id and then by game
$users = array();
while ($row = /* your database row fetch mechanism here */) {
    // update games array as necessary
    if (!in_array($row['data_type'], $games_array)) {
        // add this game to $games_array as it does not exist there yet
        $games_array[] = $row['data_type'];
    }
    // update users array
    $users[$row['userid']][$row['data_type']] = $row['max_timespent'];
}
// build pivot table
foreach ($users as $id => $game_times) {
    // echo table row start
    // echo out user id in first element
    // then iterate through available games
    foreach ($games_array as $game) {
        if (!empty($game_times[$game])) {
            // echo $game_times[$game] into table element
        } else {
            // echo 0 into table element
        }
    }
    // echo table row end
}
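For comparison, the same read-then-pivot approach can be sketched in Python. The assumptions here: sqlite3 stands in for the MySQL connection, the table is named social_count as in the query above, and the rows are the sample data from the question.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE social_count (userid INTEGER, data_type TEXT, timespentaday INTEGER)"
)
conn.executemany(
    "INSERT INTO social_count VALUES (?, ?, ?)",
    [
        (1, "League of Legends", 500),
        (1, "Hearthstone", 1500),
        (1, "Hearthstone", 1400),
        (2, "World of Warcraft", 1200),
        (1, "Dota 2", 100),
        (2, "Final Fantasy", 500),
        (1, "Dota 2", 700),
    ],
)

rows = conn.execute(
    """
    SELECT userid, data_type, MAX(timespentaday) AS max_timespent
    FROM social_count
    GROUP BY userid, data_type
    """
).fetchall()

games = []   # distinct list of all games seen in the result
users = {}   # userid -> {game: max time}
for userid, game, max_time in rows:
    if game not in games:
        games.append(game)
    users.setdefault(userid, {})[game] = max_time

# Build the pivot: one list per user, one slot per game, 0 where no row exists.
games.sort()
pivot = {
    userid: [times.get(game, 0) for game in games]
    for userid, times in users.items()
}
print(games)
print(pivot)
```

Because the game list is collected from the data, a new game shows up as a new column without touching the query.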
You will not be able to build a query with a dynamic number of columns. You can do this query if you already know the game list, which I guess is not what you need.
BUT you can always post-process your results with any programming language, so you only have to retrieve the data.
The SQL query would look like this:
SELECT
    userid AS User,
    data_type AS Game,
    max(timespentaday) AS TimeSpentADay
FROM
    my_table
GROUP BY
    userid,
    data_type
Then iterate over the results to fill any interface you want
OR
If and only if you can't afford any post-processing of any kind, you can retrieve the list of games first and THEN build a query like the one below. Please bear in mind that this query is a lot less maintainable than the previous one (besides being more difficult to build) and can and will cause you a lot of pain later in debugging.
SELECT
userid AS User,
max(CASE
WHEN data_type = 'Hearthstone' THEN timespentaday
ELSE NULL
END) AS Hearthstone,
max(CASE
WHEN data_type = 'League Of Legends' THEN timespentaday
ELSE NULL
END) AS `League Of Legends`,
...
FROM
my_table
GROUP BY
userid
The CASE construction is like an if in a procedural programming language. The following
CASE
WHEN data_type = 'League Of Legends' THEN timespentaday
ELSE NULL
END
evaluates to the value of timespentaday if the game is League Of Legends, and to NULL otherwise. The max aggregator simply ignores the NULL values.
Edit: added warning on the second query to explain the caveat of using a generated query thanks to Mike Brant's comment
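A rough sketch of the "retrieve the list of games first, then build the query" route, with the same caveats about maintainability. Assumptions: Python's sqlite3 stands in for MySQL, the table is called my_table as above, and quote_identifier is a hypothetical helper written for this example, not part of any library.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (userid INTEGER, data_type TEXT, timespentaday INTEGER)"
)
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [
        (1, "League of Legends", 500),
        (1, "Hearthstone", 1500),
        (1, "Hearthstone", 1400),
        (2, "World of Warcraft", 1200),
        (1, "Dota 2", 100),
        (2, "Final Fantasy", 500),
        (1, "Dota 2", 700),
    ],
)


def quote_identifier(name):
    """Hypothetical helper: quote a column alias, doubling embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


# Step 1: fetch the distinct game list.
games = [
    row[0]
    for row in conn.execute("SELECT DISTINCT data_type FROM my_table ORDER BY data_type")
]

# Step 2: generate one MAX(CASE ...) column per game. The game name is bound
# as a parameter in the comparison and only quoted when used as an alias.
select_list = ",\n       ".join(
    f"MAX(CASE WHEN data_type = ? THEN timespentaday ELSE NULL END) AS {quote_identifier(game)}"
    for game in games
)
sql = f"SELECT userid,\n       {select_list}\nFROM my_table\nGROUP BY userid\nORDER BY userid"

# Step 3: execute; MAX ignores the NULLs, so missing games come back as None.
rows = conn.execute(sql, games).fetchall()
for row in rows:
    print(row)
```

Wrapping each MAX in COALESCE(..., 0) would turn the missing entries into 0, matching the output the question asked for.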

SELECT COUNT(*) Performance

Lately I discovered that the most time-consuming requests on my website are the SELECT COUNT(*) queries.
A simple request can sometimes take more than a second:
SELECT COUNT(*) as count FROM post WHERE category regexp '[[:<:]](17|222)[[:>:]]' AND approve=1 AND date < '2014-01-25 19:08:17';
+-------+
| count |
+-------+
|  3585 |
+-------+
1 row in set (0.49 sec)
I'm not sure what the problem is; I have indexes on category, approve and date.
This is your query:
SELECT COUNT(*) as count
FROM post
WHERE category regexp '[[:<:]](17|222)[[:>:]]' AND approve=1 AND
date < '2014-01-25 19:08:17';
It is not a simple request because the regexp has to run on every row (or every row filtered by the other conditions).
An index on post(approve, date, category) might help. You want one index with the columns listed in that order.
EDIT:
If the values are being stored in a space separated list, you might try this to see if it is faster:
WHERE (concat(' ', category, ' ') like '% 17 %' or concat(' ', category, ' ') like '% 222 %') AND
approve = 1 AND date < '2014-01-25 19:08:17';
It is possible that these expressions are faster than the regular expression.
And, finally, if you really do need to search for "words" in a field, then consider a full text index. I think you might have to tinker with the options in this case so numbers are allowed in the index.
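To illustrate that the padded LIKE test matches whole "words" only, here is a small sketch. It assumes category really is a space-separated list, uses Python's sqlite3 as a stand-in for MySQL (so concat() becomes the || operator), and the six sample rows are invented.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE post (id INTEGER PRIMARY KEY, category TEXT, approve INTEGER, date TEXT)"
)
# Composite index in the column order suggested above.
conn.execute("CREATE INDEX idx_post_approve_date_category ON post (approve, date, category)")
conn.executemany(
    "INSERT INTO post VALUES (?, ?, ?, ?)",
    [
        (1, "17 5", 1, "2014-01-01 00:00:00"),     # matches 17
        (2, "117", 1, "2014-01-01 00:00:00"),      # 117 is not 17
        (3, "222", 1, "2014-01-01 00:00:00"),      # matches 222
        (4, "3 2222", 1, "2014-01-01 00:00:00"),   # 2222 is not 222
        (5, "1 222 9", 0, "2014-01-01 00:00:00"),  # not approved
        (6, "9 17", 1, "2014-02-01 00:00:00"),     # too recent
    ],
)

# Padding the list with spaces lets '% 17 %' match 17 only as a whole entry.
query = """
SELECT COUNT(*) AS count
FROM post
WHERE ((' ' || category || ' ') LIKE '% 17 %'
       OR (' ' || category || ' ') LIKE '% 222 %')
  AND approve = 1
  AND date < '2014-01-25 19:08:17'
"""
count = conn.execute(query).fetchone()[0]
print(count)
```

Whether this is actually faster than the regular expression on the real table has to be measured there; the sketch only shows that the two filters select the same kind of rows.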

SELECT CASE, COUNT(*)

I want to select the number of users that have marked some content as favorite and also return whether the current user has "voted" or not. My table looks like this:
CREATE TABLE IF NOT EXISTS `favorites` (
`user` int(11) NOT NULL DEFAULT '0',
`content` int(11) NOT NULL DEFAULT '0',
PRIMARY KEY (`user`,`content`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 ;
Say I have 3 rows containing
INSERT INTO `favorites` (`user`, `content`) VALUES
(11, 26977),
(22, 26977),
(33, 26977);
Using this
SELECT COUNT(*), CASE
WHEN user='22'
THEN 1
ELSE 0
END as has_voted
FROM favorites WHERE content = '26977'
I expect to get has_voted=1 and COUNT(*)=3 but
I get has_voted=0 and COUNT(*)=3. Why is that? How to fix it?
This is because you mixed aggregated and non-aggregated expressions in a single SELECT. Aggregated expressions work on many rows; non-aggregated expressions work on a single row. An aggregated expression (e.g. COUNT(*)) and a non-aggregated one (e.g. CASE) should appear in the same SELECT only when you have a GROUP BY, which does not make sense in your situation.
You can fix your query by aggregating the second expression - i.e. adding a SUM around it, like this:
SELECT
COUNT(*) AS FavoriteCount
, SUM(CASE WHEN user=22 THEN 1 ELSE 0 END) as has_voted
FROM favorites
WHERE content = 26977
Now both expressions are aggregated, so you should get the expected results.
Try this with SUM() and without CASE
SELECT
COUNT(*),
SUM(USER = '22') AS has_voted
FROM
favorites
WHERE content = '26977'
See Fiddle Demo
Try this:
SELECT COUNT(*), MAX(USER=22) AS has_voted
FROM favorites
WHERE content = 26977;
Check the SQL FIDDLE DEMO
OUTPUT
| COUNT(*) | HAS_VOTED |
|----------|-----------|
| 3 | 1 |
You need sum of votes.
SELECT COUNT(*), SUM(CASE
WHEN user='22'
THEN 1
ELSE 0
END) as has_voted
FROM favorites WHERE content = '26977'
You are inadvertently using a MySQL feature here: You aggregate your results to get only one result record showing the number of matches (aggregate function COUNT). But you also show the user (or rather an expression built on it) in your result line (without any aggregate function). So the question is: Which user? Another DBMS would have given you an error, asking you to either state the user in a GROUP BY or aggregate users. MySQL instead picks the user from an arbitrary row.
What you want to do here is aggregate users (or rather have your expression aggregated). Use SUM to sum all votes the user has given on the requested content:
SELECT
COUNT(*),
SUM(CASE WHEN user='22' THEN 1 ELSE 0 END) as sum_votes
FROM favorites
WHERE content = '26977';
You forgot to wrap the CASE statement inside an aggregate function. In this case has_voted will contain unexpected results since you are actually doing a "partial group by". Here is what you need to do:
SELECT COUNT(*), SUM(CASE WHEN USER = 22 THEN 1 ELSE 0 END) AS has_voted
FROM favorites
WHERE content = 26977
Or:
SELECT COUNT(*), COUNT(CASE WHEN USER = 22 THEN 1 ELSE NULL END) AS has_voted
FROM favorites
WHERE content = 26977
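A quick way to convince yourself that the aggregated versions behave as intended is to run them against the three sample rows. The sketch below does that with Python's sqlite3 module as a stand-in for MySQL (an assumption); the user id 99 is a made-up non-voter added for contrast.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE favorites ("
    "user INTEGER NOT NULL, content INTEGER NOT NULL, PRIMARY KEY (user, content))"
)
conn.executemany(
    "INSERT INTO favorites (user, content) VALUES (?, ?)",
    [(11, 26977), (22, 26977), (33, 26977)],
)

# Both has_voted variants are aggregates, so they look at every matching row
# instead of whichever single row the engine happens to pick.
query = """
SELECT COUNT(*) AS favorite_count,
       SUM(CASE WHEN user = :user THEN 1 ELSE 0 END) AS has_voted_sum,
       MAX(user = :user) AS has_voted_max
FROM favorites
WHERE content = :content
"""
voter = conn.execute(query, {"user": 22, "content": 26977}).fetchone()
non_voter = conn.execute(query, {"user": 99, "content": 26977}).fetchone()
print(voter)
print(non_voter)
```

Since (user, content) is the primary key, a user can match at most one row per content, so SUM and MAX give the same 0-or-1 answer here.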

Select multiple sums with MySQL query and display them in separate columns

Let's say I have a hypothetical table like so that records when some player in some game scores a point:
name   points
------------
bob    10
mike   03
mike   04
bob    06
How would I get the sum of each player's scores and display them side by side in one query?
Total Points Table
bob mike
16 07
My (pseudo)-query is:
SELECT sum(points) as "Bob" WHERE name="bob",
sum(points) as "Mike" WHERE name="mike"
FROM score_table
You can pivot your data 'manually':
SELECT SUM(CASE WHEN name='bob' THEN points END) as bob,
SUM(CASE WHEN name='mike' THEN points END) as mike
FROM score_table
but this will not work if the list of your players is dynamic.
In pure sql:
SELECT
sum( (name = 'bob') * points) as Bob,
sum( (name = 'mike') * points) as Mike,
-- etc
FROM score_table;
This neat solution works because of mysql's booleans evaluating as 1 for true and 0 for false, allowing you to multiply truth of a test with a numeric column. I've used it lots of times for "pivots" and I like the brevity.
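A small sketch comparing the boolean-multiplication form with the CASE form on the question's data. It assumes Python's sqlite3 as a stand-in for MySQL (SQLite also evaluates comparisons to 1 or 0); the player 'alice', who has no rows, is invented to show one difference between the two forms.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE score_table (name TEXT, points INTEGER)")
conn.executemany(
    "INSERT INTO score_table VALUES (?, ?)",
    [("bob", 10), ("mike", 3), ("mike", 4), ("bob", 6)],
)

# (name = 'bob') is 1 for bob's rows and 0 otherwise, so multiplying by points
# keeps bob's points and zeroes out everyone else's.
query = """
SELECT SUM((name = 'bob') * points)                      AS bob,
       SUM((name = 'mike') * points)                     AS mike,
       SUM((name = 'alice') * points)                    AS alice_product,
       SUM(CASE WHEN name = 'alice' THEN points END)     AS alice_case
FROM score_table
"""
totals = conn.execute(query).fetchone()
print(totals)
```

For a player without any rows the product form yields 0, while CASE without an ELSE yields NULL because every input to SUM is NULL; pick whichever your report expects.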
Are the player names all known up front? If so, you can do:
SELECT SUM(CASE WHEN name = 'bob' THEN points ELSE 0 END) AS bob,
SUM(CASE WHEN name = 'mike' THEN points ELSE 0 END) AS mike,
... so on for each player ...
FROM score_table
If you don't, you still might be able to use the same method, but you'd probably have to build the query dynamically. Basically, you'd SELECT DISTINCT name ..., then use that result set to build each of the CASE statements, then execute the result SQL.
This is called pivoting the table:
SELECT SUM(IF(name = "Bob", points, 0)) AS points_bob,
SUM(IF(name = "Mike", points, 0)) AS points_mike
FROM score_table
SELECT sum(points), name
FROM `table`
GROUP BY name
Or for the pivot
SELECT sum(if(name = 'mike',points,0)),
sum(if(name = 'bob',points,0))
FROM `table`
You can also use the PIVOT clause for the same thing. Performance-wise it is the better option for pivoting (I am talking about Oracle Database here).
You can use the following query for this as well.
-- (if your table has only these two columns the output will look right; with additional columns you will get NULL values)
select * from game_scores
pivot (sum(points) for name in ('bob' BOB, 'mike' MIKE));
With this query you get the data very quickly, and you only have to add or remove a player name in one place.
:)
If you have more than these two columns in your table, you can use the following query:
WITH pivot_data AS (
SELECT points,name
FROM game_scores
)
SELECT *
FROM pivot_data
pivot (sum(points) for name in ('bob' BOB, 'mike' MIKE));