How to bin arbitrarily in MySQL?

I have a table with a column that lists ages of users. I want to bin ages in arbitrary groupings (13-17,18-25, etc) and then be able to group by those bins and count users in each group. How can I accomplish this in a query?

SELECT CASE WHEN age BETWEEN 13 AND 17 THEN '13-17'
WHEN age BETWEEN 18 AND 25 THEN '18-25'
ELSE '26+' END AS AgeGroup,
COUNT(*) AS total
FROM MyTable
GROUP BY AgeGroup

SELECT
COUNT(CASE WHEN `age` BETWEEN 13 AND 17 THEN 1 END) `13-17`,
COUNT(CASE WHEN `age` BETWEEN 18 AND 25 THEN 2 END) `18-25`,
COUNT(CASE WHEN `age` > 25 THEN 3 END) `> 25`
FROM tableListOfAges;
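Both answers can be tried without a MySQL server. The following is a minimal sketch in Python that uses the standard-library sqlite3 module as a stand-in for MySQL (an assumption; the CASE and GROUP BY syntax used here is common to both). The table name MyTable comes from the first answer, and the sample ages are made up for illustration.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (age INTEGER)")
conn.executemany("INSERT INTO MyTable (age) VALUES (?)",
                 [(13,), (15,), (17,), (18,), (25,), (26,), (40,)])

# One row per bin: label each row with CASE, then group by the label.
rows_per_bin = conn.execute("""
    SELECT CASE WHEN age BETWEEN 13 AND 17 THEN '13-17'
                WHEN age BETWEEN 18 AND 25 THEN '18-25'
                ELSE '26+' END AS AgeGroup,
           COUNT(*) AS total
    FROM MyTable
    GROUP BY AgeGroup
    ORDER BY AgeGroup
""").fetchall()

# One column per bin: COUNT skips the NULL that CASE yields when no branch matches.
one_row = conn.execute("""
    SELECT COUNT(CASE WHEN age BETWEEN 13 AND 17 THEN 1 END),
           COUNT(CASE WHEN age BETWEEN 18 AND 25 THEN 1 END),
           COUNT(CASE WHEN age > 25 THEN 1 END)
    FROM MyTable
""").fetchone()

print(rows_per_bin)
print(one_row)
```

Note that ELSE '26+' in the first form also catches ages below 13 and NULL ages. If that matters, replace it with an explicit WHEN age > 25 branch.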

Related

How to replace a WITH clause in SQL 4.9.5? I am using it on version 8, but my server doesn't support it

Below is my SQL query. It runs perfectly on version 8, but I am not able to run it on database versions below 8, and my live server is on an older version. Please help me with this.
WITH ages AS
(
SELECT
ROUND(DATEDIFF(Cast(CURRENT_TIMESTAMP() as Date), Cast(dob as Date)) / 365, 0) as age
FROM artisan_bio
)
SELECT
count(case when age between 0 and 24 then 1 end) as age_00_24_cnt,
count(case when age between 25 and 34 then 1 end) as age_25_34_cnt,
count(case when age between 35 and 44 then 1 end) as age_35_44_cnt,
count(case when age between 45 and 54 then 1 end) as age_45_54_cnt,
count(case when age >= 55 then 1 end) as age_55_xx_cnt
FROM ages
This query should work:
SELECT
count(case when age between 0 and 24 then 1 end) as age_00_24_cnt,
count(case when age between 25 and 34 then 1 end) as age_25_34_cnt,
count(case when age between 35 and 44 then 1 end) as age_35_44_cnt,
count(case when age between 45 and 54 then 1 end) as age_45_54_cnt,
count(case when age >= 55 then 1 end) as age_55_xx_cnt
FROM (
SELECT
ROUND(DATEDIFF(Cast(CURRENT_TIMESTAMP() as Date), Cast(dob as Date)) / 365, 0) as age
FROM artisan_bio
) AS ages;
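The change moves the body of your WITH clause into a derived table in the FROM clause of the main query. Derived tables work on MySQL versions before 8.0, whereas the WITH clause was only added in 8.0.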
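To check that the two forms return the same result, here is a small sketch using Python's sqlite3 module as a stand-in for MySQL (an assumption). SQLite has no DATEDIFF, so julianday() is used instead, with a fixed reference date so the output is reproducible. The sample dates of birth are invented, and only three of the five buckets are kept for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE artisan_bio (dob TEXT)")
conn.executemany("INSERT INTO artisan_bio (dob) VALUES (?)",
                 [("2004-06-01",), ("1994-06-01",), ("1984-06-01",), ("1960-06-01",)])

# Stand-in for ROUND(DATEDIFF(...) / 365, 0); the reference date is hypothetical.
AGE_EXPR = "ROUND((julianday('2024-06-01') - julianday(dob)) / 365, 0)"

COUNTS = """
    COUNT(CASE WHEN age BETWEEN 0 AND 24 THEN 1 END),
    COUNT(CASE WHEN age BETWEEN 25 AND 34 THEN 1 END),
    COUNT(CASE WHEN age >= 35 THEN 1 END)
"""

# Form 1: common table expression (needs MySQL 8.0 or later).
with_cte = conn.execute(
    f"WITH ages AS (SELECT {AGE_EXPR} AS age FROM artisan_bio) "
    f"SELECT {COUNTS} FROM ages"
).fetchone()

# Form 2: the same subquery inlined as a derived table.
with_derived = conn.execute(
    f"SELECT {COUNTS} FROM (SELECT {AGE_EXPR} AS age FROM artisan_bio) AS ages"
).fetchone()

print(with_cte, with_derived)
```

The rewrite is mechanical as long as the CTE is referenced only once. A CTE that is referenced several times would have to be repeated as a derived table at each reference.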

SQL display age range

I'm writing a query for age ranges, in which I want to show the count of people in every age range, e.g.:
AGE PEOPLE
"0-10" 0
"11-20" 2
"21-30" 5
"31-40" 0
"41-50" 1
I've tried using
SELECT SUM(CASE WHEN age < 10 THEN 1 ELSE 0 END) AS [Under 10],
SUM(CASE WHEN age BETWEEN 11 AND 20 THEN 1 ELSE 0 END) AS [11-20],
SUM(CASE WHEN age BETWEEN 21 AND 30 THEN 1 ELSE 0 END) AS [21-30]
FROM people
But it shows ranges as column names
0-10 11-20 21-30 31-40 41-50
0 2 5 0 1
which I don't want.
I have also tried GROUP BY but it didn't show the ranges in which the count was 0.
You can use UNION ALL:
SELECT '[Under 10]' as Age, SUM(CASE WHEN age < 10 THEN 1 ELSE 0 END) as People
FROM people
UNION ALL
SELECT '[11-20]', SUM(CASE WHEN age BETWEEN 11 AND 20 THEN 1 ELSE 0 END)
FROM people
UNION ALL
SELECT '[21-30]', SUM(CASE WHEN age BETWEEN 21 AND 30 THEN 1 ELSE 0 END)
FROM people;
Your CASE expression should look like this:
SELECT CASE WHEN age <= 10 THEN '0-10'
WHEN age BETWEEN 11 AND 20 THEN '11-20'
WHEN age BETWEEN 21 AND 30 THEN '21-30'
-- ..... add more ranges here according to your need
END AS agegroup,
COUNT(*)
FROM people
GROUP BY agegroup
You need UNION ALL for this, because you want each group as a row of the result:
SELECT SUM(CASE WHEN age < 10 THEN 1 ELSE 0 END) AS PEOPLE, 'UNDER 10' AS AGE FROM people
UNION ALL
SELECT SUM(CASE WHEN age BETWEEN 11 AND 20 THEN 1 ELSE 0 END), '11-20' FROM people
UNION ALL
SELECT SUM(CASE WHEN age BETWEEN 21 AND 30 THEN 1 ELSE 0 END), '21-30' FROM people
See the MySQL documentation on UNION for more information.
If you are going to use UNION, use UNION ALL and move the conditions to the WHERE clause:
SELECT '[Under 10]' as Age, COUNT(*)
FROM people
WHERE age < 10
UNION ALL
SELECT '[11-20]', COUNT(*)
FROM people
WHERE age BETWEEN 11 AND 20
UNION ALL
SELECT '[21-30]', COUNT(*)
FROM people
WHERE age BETWEEN 21 AND 30;
Filtering and UNION ALL both improve performance. (UNION incurs overhead for removing duplicates).
There are other approaches. For instance, you can unpivot your table:
SELECT grps.age,
(CASE grps.grp
WHEN 1 THEN p.[Under 10]
WHEN 2 THEN p.[11-20]
WHEN 3 THEN p.[21-30]
END) AS People
FROM (SELECT SUM(CASE WHEN age < 10 THEN 1 ELSE 0 END) AS [Under 10],
SUM(CASE WHEN age BETWEEN 11 AND 20 THEN 1 ELSE 0 END) AS [11-20],
SUM(CASE WHEN age BETWEEN 21 AND 30 THEN 1 ELSE 0 END) AS [21-30]
FROM people
) p CROSS JOIN
(SELECT 1 AS grp, '[Under 10]' AS age UNION ALL
SELECT 2 AS grp, '[11-20]' AS age UNION ALL
SELECT 3 AS grp, '[21-30]' AS age
) grps;
Although this looks more complicated, it is much better from a performance perspective, because it only scans the original table once.
There are other variants as well that only touch the original table once.
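One way to get every range as a row, including ranges whose count is 0, is to LEFT JOIN a small inline table of bins to the data. Whether this is one of the variants the answer has in mind is an assumption. The sketch below runs the SQL through Python's sqlite3 module as a stand-in for MySQL; the bin columns lo, hi and label are hypothetical names, and the sample ages are invented to reproduce the output asked for in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (age INTEGER)")
conn.executemany("INSERT INTO people (age) VALUES (?)",
                 [(12,), (19,), (21,), (22,), (25,), (28,), (30,), (45,)])

# Each bin is a row; LEFT JOIN keeps bins that match no person,
# and COUNT(p.age) ignores the NULL produced for those bins.
rows = conn.execute("""
    SELECT b.label, COUNT(p.age) AS people
    FROM (SELECT 0 AS lo, 10 AS hi, '0-10' AS label UNION ALL
          SELECT 11, 20, '11-20' UNION ALL
          SELECT 21, 30, '21-30' UNION ALL
          SELECT 31, 40, '31-40' UNION ALL
          SELECT 41, 50, '41-50') AS b
    LEFT JOIN people AS p ON p.age BETWEEN b.lo AND b.hi
    GROUP BY b.lo, b.label
    ORDER BY b.lo
""").fetchall()

for label, people in rows:
    print(label, people)
```

Adding or changing a range then means editing one row of the bins table rather than a CASE branch, and the bins could equally live in a real lookup table.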

Mysql add an extra column at end instead of using a union

I am trying to combine two queries so the data shows up in one table. I am using a union to combine the two queries. However, everything is added to the same column. What do I change so that the results from the different queries each take up a new column?
Here is an image of the query result.
Here is my code
select * from(
SELECT
CASE
WHEN age BETWEEN 18 and 25 THEN 'Under 25'
WHEN age BETWEEN 25 and 40 THEN '25 - 40'
WHEN age >= 40 THEN 'Over 40'
WHEN age IS NULL THEN 'Not Filled In (NULL)'
END as age_range,
COUNT(*) AS count,
CASE
WHEN age between 18 and 25 THEN 1
WHEN age BETWEEN 25 and 40 THEN 2
WHEN age >= 40 THEN 8
WHEN age IS NULL THEN 9
END as ordinal
FROM (SELECT TIMESTAMPDIFF(YEAR, users.birthdate_on, CURDATE()) AS age FROM users
join subscriptions on users.id = subscriptions.user_id
where users.plan <> 'domain' and users.plan <> '' and users.plan <> 'domain_cpi' and users.birthdate_on is not null
) as derived
GROUP BY age_range
union
SELECT
CASE
WHEN age BETWEEN 18 and 25 THEN 'Under 25'
WHEN age BETWEEN 25 and 40 THEN '25 - 40'
WHEN age >= 40 THEN 'Over 40'
WHEN age IS NULL THEN 'Not Filled In (NULL)'
END as age_range2,
COUNT(*) AS count2,
CASE
WHEN age between 18 and 25 THEN 1
WHEN age BETWEEN 25 and 40 THEN 2
WHEN age >= 40 THEN 8
WHEN age IS NULL THEN 9
END as ordinal
FROM (SELECT TIMESTAMPDIFF(YEAR, users.birthdate_on, CURDATE()) AS age FROM users) as derived2
GROUP BY age_range2
) as test2
ORDER BY ordinal
I want the result to show only one 'Under 25' row, with the two results for under 25 (493 and 2046) in different columns. The same goes for all the other ranges.
Sounds like you want to JOIN the two derived tables on age instead of combining them with UNION, so that each count gets its own column. A LEFT JOIN from the all-users counts keeps ranges that have no subscribed users:
SELECT
CASE
WHEN derived2.age BETWEEN 18 and 25 THEN 'Under 25'
WHEN derived2.age BETWEEN 25 and 40 THEN '25 - 40'
WHEN derived2.age >= 40 THEN 'Over 40'
WHEN derived2.age IS NULL THEN 'Not Filled In (NULL)'
END as age_range,
CASE
WHEN derived2.age between 18 and 25 THEN 1
WHEN derived2.age BETWEEN 25 and 40 THEN 2
WHEN derived2.age >= 40 THEN 8
WHEN derived2.age IS NULL THEN 9
END as ordinal,
SUM(derived.count) AS count,
SUM(derived2.count2) AS count2
FROM (
SELECT TIMESTAMPDIFF(YEAR, users.birthdate_on, CURDATE()) AS age,
COUNT(*) AS count2
FROM users
GROUP BY age
) as derived2
LEFT JOIN (
SELECT TIMESTAMPDIFF(YEAR, users.birthdate_on, CURDATE()) AS age,
COUNT(*) AS count
FROM users
join subscriptions on users.id = subscriptions.user_id
where users.plan <> 'domain' and users.plan <> '' and users.plan <> 'domain_cpi' and users.birthdate_on is not null
GROUP BY age
) as derived
ON derived.age = derived2.age
GROUP BY age_range, ordinal
ORDER BY ordinal ASC;
I don't believe you need two queries, just a LEFT JOIN instead. COUNT(column) only increments for non-NULL values, so users are still counted even if they don't meet the subscription criteria.
SELECT
CASE
WHEN age BETWEEN 18 and 25 THEN 'Under 25'
WHEN age BETWEEN 25 and 40 THEN '25 - 40'
WHEN age >= 40 THEN 'Over 40'
WHEN age IS NULL THEN 'Not Filled In (NULL)'
END as age_range
, CASE
WHEN age between 18 and 25 THEN 1
WHEN age BETWEEN 25 and 40 THEN 2
WHEN age >= 40 THEN 8
WHEN age IS NULL THEN 9
END as ordinal
, COUNT(DISTINCT id) AS user_count # distinct might not be needed
, COUNT(subscriber_id) AS subscriber_count
FROM (
SELECT
users.id
, TIMESTAMPDIFF(YEAR, users.birthdate_on, CURDATE()) AS age
, subscriptions.user_id AS subscriber_id
FROM users
LEFT JOIN subscriptions ON users.id = subscriptions.user_id
AND users.plan <> 'domain'
AND users.plan <> ''
AND users.plan <> 'domain_cpi'
AND users.birthdate_on IS NOT NULL
) d
GROUP BY
CASE
WHEN age BETWEEN 18 and 25 THEN 'Under 25'
WHEN age BETWEEN 25 and 40 THEN '25 - 40'
WHEN age >= 40 THEN 'Over 40'
WHEN age IS NULL THEN 'Not Filled In (NULL)'
END
, CASE
WHEN age between 18 and 25 THEN 1
WHEN age BETWEEN 25 and 40 THEN 2
WHEN age >= 40 THEN 8
WHEN age IS NULL THEN 9
END
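The effect of the LEFT JOIN on the two counts can be seen on a tiny data set. This is a simplified sketch, not the answer's exact query: it uses Python's sqlite3 module as a stand-in for MySQL, stores age directly instead of computing it from birthdate_on, drops the plan filters, and uses simplified range boundaries. All sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, age INTEGER);
    CREATE TABLE subscriptions (user_id INTEGER);
    INSERT INTO users VALUES (1, 20), (2, 22), (3, 30), (4, 50);
    -- user 3 has two subscriptions, users 2 and 4 have none
    INSERT INTO subscriptions VALUES (1), (3), (3);
""")

rows = conn.execute("""
    SELECT CASE WHEN u.age <= 25 THEN 'Under 25'
                WHEN u.age <= 40 THEN '25 - 40'
                ELSE 'Over 40' END AS age_range,
           COUNT(DISTINCT u.id) AS user_count,      -- each user once
           COUNT(s.user_id) AS subscriber_count     -- non-NULL join rows only
    FROM users AS u
    LEFT JOIN subscriptions AS s ON s.user_id = u.id
    GROUP BY age_range
    ORDER BY MIN(u.age)
""").fetchall()

for row in rows:
    print(row)
```

In this sample the '25 - 40' range has one user but a subscriber_count of 2, because COUNT(s.user_id) counts subscription rows. If one row per subscribed user is wanted, COUNT(DISTINCT s.user_id) would be the variant to try.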

Calculate percentage and total after create categories mysql

I have this query:
SELECT
trage,
CASE trage
WHEN '<18' THEN SUM(CASE WHEN AGE <18 THEN 1 ELSE 0 END)
WHEN '18-24' THEN SUM(CASE WHEN AGE >= 18 AND AGE <= 24 THEN 1 ELSE 0 END)
WHEN '25-34' THEN SUM(CASE WHEN AGE >= 25 AND AGE <= 34 THEN 1 ELSE 0 END)
WHEN '35-44' THEN SUM(CASE WHEN AGE >= 35 AND AGE <= 44 THEN 1 ELSE 0 END)
WHEN '45-54' THEN SUM(CASE WHEN AGE >= 45 AND AGE <= 54 THEN 1 ELSE 0 END)
WHEN '>=55' THEN SUM(CASE WHEN AGE >= 55 THEN 1 ELSE 0 END)
END Total
FROM
( SELECT
t_personne.pers_date_naissance,
t_personne.pers_date_inscription,
TIMESTAMPDIFF(Year, t_personne.pers_date_naissance, t_personne.pers_date_inscription)
- CASE
WHEN MONTH(t_personne.pers_date_naissance) > MONTH(t_personne.pers_date_inscription)
OR (MONTH(t_personne.pers_date_naissance) = MONTH(t_personne.pers_date_inscription)
AND DAY(t_personne.pers_date_naissance) > DAY(t_personne.pers_date_inscription))
THEN 1 ELSE 0
END AS AGE
FROM t_personne
) AS Total
CROSS JOIN
( SELECT '<18' trage UNION ALL
SELECT '18-24' UNION ALL
SELECT '25-34' UNION ALL
SELECT '35-44' UNION ALL
SELECT '45-54' UNION ALL
SELECT '>=55'
)a
GROUP BY trage
ORDER BY FIELD(trage, '<18', '18-24', '25-34', '35-44', '45-54', '>=55')
It gives a table with two columns, trage and Total, for all categories.
How can I add a percentage column, plus a TOTAL line for both the Total and the % columns?
Thanks for your help.
On MySQL versions before 8.0 you can't do this with window functions, because those versions don't support them; window function support was added in MySQL 8.0. If you are on an older version and need functions like these, I would recommend switching to PostgreSQL.
Also take a look at this question: MySql using correct syntax for the over clause
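For versions without window functions, one possible workaround is a scalar subquery for the grand total plus a UNION ALL for the TOTAL line. The sketch below is an illustration under assumptions, not the asker's schema: it runs through Python's sqlite3 module as a stand-in for MySQL, uses a hypothetical table ages that already holds the computed AGE, and describes the ranges with hypothetical columns ord, lo and hi (hi is NULL for the open-ended last range).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ages (age INTEGER)")
conn.executemany("INSERT INTO ages (age) VALUES (?)",
                 [(16,), (20,), (22,), (30,), (31,), (33,), (40,), (60,)])

rows = conn.execute("""
    SELECT b.ord AS ord,
           b.trage AS trage,
           COUNT(t.age) AS total,
           -- share of the grand total, computed by a scalar subquery
           ROUND(100.0 * COUNT(t.age) / (SELECT COUNT(*) FROM ages), 1) AS pct
    FROM (SELECT 1 AS ord, 0 AS lo, 17 AS hi, '<18' AS trage UNION ALL
          SELECT 2, 18, 24, '18-24' UNION ALL
          SELECT 3, 25, 34, '25-34' UNION ALL
          SELECT 4, 35, 44, '35-44' UNION ALL
          SELECT 5, 45, 54, '45-54' UNION ALL
          SELECT 6, 55, NULL, '>=55') AS b
    LEFT JOIN ages AS t
           ON t.age >= b.lo AND (b.hi IS NULL OR t.age <= b.hi)
    GROUP BY b.ord, b.trage
    UNION ALL
    -- extra line with the overall total
    SELECT 99, 'TOTAL', COUNT(*), 100.0 FROM ages
    ORDER BY ord
""").fetchall()

for ord_, trage, total, pct in rows:
    print(trage, total, pct)
```

The ord column only exists to keep the TOTAL line last; it can be dropped from the output in the application or by wrapping the query in another SELECT.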

Grouping items between 2 numbers

I have a query that looks like this:
select
price,
item_id,
sum(price),
count(item_id)
from transactions
group by
(price <= 20),
(price between 21 and 30),
(price between 31 and 40),
(price between 41 and 50),
(price > 50)
I have never done a GROUP BY like this before. When I wrote it I was just guessing to see if the query was even valid, and it was. But my question is: is it really getting me what I want?
I want all transactions grouped by:
Items that cost less than or equal to $20
Items that cost between $21 and $30
Items that cost between $31 and $40
Items that cost between $41 and $50
Items that cost more than $50
So, is that query doing what I am asking?
The way to do this in standard SQL (and MySQL) is to use a CASE expression. Also, I put the definition in a subquery like this:
select pricegrp, sum(price), count(item_id)
from (select t.*,
(case when price <= 20 then '00-20'
when price between 21 and 30 then '21-30'
when price between 31 and 40 then '31-40'
when price between 41 and 50 then '41-50'
when price > 50 then '50+'
end) as pricegrp
from transactions t
) t
group by pricegrp
Also, do you want to group by item_id as well? Or are you just trying to return one arbitrary item? Based on what you want, I'm removing the item_id from the select clause. It doesn't seem necessary.
Your query actually does work in MySQL, in the sense that it runs. It is going to produce one row for each group that you want, so in that sense it "works". However, within each group, it is going to choose an arbitrary price and item_id. These are not explicitly mentioned in the group by clause, so you are using a MySQL (mis)feature called Hidden Columns. Different runs of the query or slight changes to the data or slight changes to the query can change the values of price and item_id returned for each group.
I strongly suggest that you actually name the group. This makes the query and the output much clearer.
Also, I recommend that you get in the habit of putting all non-aggregated columns in the select in the group by clause. There are a few cases where hidden columns are actually useful, but I think, in general, you should depend on them sparingly.
If the price is not stored as an integer, then the correct logic is:
select pricegrp, sum(price), count(item_id)
from (select t.*,
(case when price <= 20 then '00-20'
when price <= 30 then '21-30'
when price <= 40 then '31-40'
when price <= 50 then '41-50'
when price > 50 then '50+'
end) as pricegrp
from transactions t
) t
group by pricegrp
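The difference between the two CASE expressions only shows up with fractional prices. Here is a short sketch that runs both against the same rows, using Python's sqlite3 module as a stand-in for MySQL (an assumption). The sample prices are invented, and bin_counts is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (item_id INTEGER, price REAL)")
conn.executemany("INSERT INTO transactions VALUES (?, ?)",
                 [(1, 19.99), (2, 20.50), (3, 30.25), (4, 45.00), (5, 75.00)])

# Integer-style ranges: prices such as 20.50 fall between two ranges.
BETWEEN_BINS = """CASE WHEN price <= 20 THEN '00-20'
                       WHEN price BETWEEN 21 AND 30 THEN '21-30'
                       WHEN price BETWEEN 31 AND 40 THEN '31-40'
                       WHEN price BETWEEN 41 AND 50 THEN '41-50'
                       WHEN price > 50 THEN '50+' END"""

# Chained upper bounds: the first matching branch wins, so there are no gaps.
CHAINED_BINS = """CASE WHEN price <= 20 THEN '00-20'
                       WHEN price <= 30 THEN '21-30'
                       WHEN price <= 40 THEN '31-40'
                       WHEN price <= 50 THEN '41-50'
                       WHEN price > 50 THEN '50+' END"""


def bin_counts(case_expr):
    """Return {pricegrp: row count} for the given CASE expression."""
    sql = (f"SELECT {case_expr} AS pricegrp, COUNT(item_id) "
           "FROM transactions GROUP BY pricegrp")
    return dict(conn.execute(sql).fetchall())


print(bin_counts(BETWEEN_BINS))
print(bin_counts(CHAINED_BINS))
```

With the BETWEEN version, 20.50 and 30.25 match no branch and end up in a NULL group; with the chained version every row gets a label.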
SELECT
price,
item_id,
sum(price),
count(item_id),
IF(price<=20,0,IF(price<=30,1,IF(price<=40,2,IF(price<=50,3,4)))) AS pricegroup
FROM transactions
GROUP BY pricegroup
or even
SELECT
price,
item_id,
sum(price),
count(item_id)
FROM transactions
GROUP BY
IF(price<=20,0,IF(price<=30,1,IF(price<=40,2,IF(price<=50,3,4))))
SELECT price,
item_id,
SUM(CASE WHEN price <= 20 THEN price ELSE 0 END) `(price <= 20) SUM`,
SUM(CASE WHEN price <= 20 THEN 1 ELSE 0 END) `(price <= 20) COUNT`,
SUM(CASE WHEN price between 21 and 30 THEN price ELSE 0 END) `price between 21 and 30 SUM`,
SUM(CASE WHEN price between 21 and 30 THEN 1 ELSE 0 END) `price between 21 and 30 COUNT`,
SUM(CASE WHEN price between 31 and 40 THEN price ELSE 0 END) `price between 31 and 40 SUM`,
SUM(CASE WHEN price between 31 and 40 THEN 1 ELSE 0 END) `price between 31 and 40 COUNT`,
SUM(CASE WHEN price between 41 and 50 THEN price ELSE 0 END) `price between 41 and 50 SUM`,
SUM(CASE WHEN price between 41 and 50 THEN 1 ELSE 0 END) `price between 41 and 50 COUNT`,
SUM(CASE WHEN price > 50 THEN price ELSE 0 END) `price > 50 SUM`,
SUM(CASE WHEN price > 50 THEN 1 ELSE 0 END) `price > 50 COUNT`
FROM transactions
GROUP BY price, item_id