SQL — segment by groups [closed] - mysql

Closed 9 years ago.
In a MySQL DB, I have a purchases table that has these columns:
USERID PURCHASE_AMOUNT
3 20
9 30
3 5
4 5
1 10
1 5
I would like to generate a report like this
SUM_OF_PURCHASES_RANGE NUM_OF_USERS
0-1 0
1-5 1
5-20 1
20-30 2
This means: 0 users have a purchase total of up to 1 (inclusive), 1 user has a total between 1 and 5, and so on.
What query should I use to generate it?

You can create the ranges using a UNION and LEFT JOIN to them, so that every category appears even when it is empty (edited for your change in the desired result):
SELECT CONCAT(base.lower,'-',base.upper) PURCHASE_RANGE, COUNT(userid) NUM_OF_USERS
FROM (
SELECT 0 lower, 1 upper UNION SELECT 2, 5 UNION SELECT 6,20 UNION SELECT 21,30
) base
LEFT JOIN (
SELECT userid, SUM(purchase_amount) pa FROM purchases GROUP BY userid
) p
ON p.pa >= base.lower AND p.pa <= base.upper
GROUP BY base.lower, base.upper
ORDER BY base.lower
An SQLfiddle to test with.
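For readers without a MySQL server at hand, the same range-join idea can be tried with a minimal sketch in Python using the standard-library sqlite3 module. This is an illustration under assumptions, not the answerer's code: SQLite stands in for MySQL, `||` replaces CONCAT, and the column aliases `lo` and `hi` are hypothetical names.

```python
import sqlite3

# In-memory SQLite database loaded with the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE purchases (userid INTEGER, purchase_amount INTEGER);
INSERT INTO purchases VALUES (3, 20), (9, 30), (3, 5), (4, 5), (1, 10), (1, 5);
""")

# Ranges are built with UNION ALL; the LEFT JOIN keeps ranges with no users.
rows = conn.execute("""
SELECT base.lo || '-' || base.hi AS purchase_range,
       COUNT(p.userid)           AS num_of_users
FROM (
    SELECT 0 AS lo, 1 AS hi UNION ALL SELECT 2, 5
    UNION ALL SELECT 6, 20 UNION ALL SELECT 21, 30
) AS base
LEFT JOIN (
    SELECT userid, SUM(purchase_amount) AS pa
    FROM purchases
    GROUP BY userid
) AS p ON p.pa BETWEEN base.lo AND base.hi
GROUP BY base.lo, base.hi
ORDER BY base.lo
""").fetchall()

print(rows)  # [('0-1', 0), ('2-5', 1), ('6-20', 1), ('21-30', 2)]
```

Counting `p.userid` rather than `*` is what makes the empty 0-1 range report 0 instead of 1.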

A simpler syntax (note that this buckets individual purchase rows rather than per-user totals):
SELECT PURCHASE_RANGE , COUNT(*) as NUM_OF_USERS
FROM
(
SELECT
CASE
WHEN PURCHASE_AMOUNT <= 1 THEN 1
WHEN PURCHASE_AMOUNT > 1 AND PURCHASE_AMOUNT <= 5 THEN 5
WHEN PURCHASE_AMOUNT > 5 AND PURCHASE_AMOUNT <= 10 THEN 10
WHEN PURCHASE_AMOUNT > 10 AND PURCHASE_AMOUNT <= 20 THEN 20
WHEN PURCHASE_AMOUNT > 20 AND PURCHASE_AMOUNT <= 30 THEN 30 END AS PURCHASE_RANGE
FROM Table1
) AS A
GROUP BY PURCHASE_RANGE
ORDER BY PURCHASE_RANGE
SqlFiddle

Try this:
select PURCHASE_RANGE , NUM_OF_USERS
from (
select 1 as PURCHASE_RANGE ,count(*) as NUM_OF_USERS from table1 where PURCHASE_AMOUNT between 0 and 1
union all
select 5 ,count(*) from table1 where PURCHASE_AMOUNT between 2 and 5
union all
select 20 ,count(*) from table1 where PURCHASE_AMOUNT between 6 and 20
union all
select 30 ,count(*) from table1 where PURCHASE_AMOUNT between 21 and 30
)t
DEMO HERE

There are faster ways to do this if you need the performance (this will do a full table scan), but try this:
SELECT
SUM(CASE WHEN purchase_amount BETWEEN 0 AND 1 THEN 1 ELSE 0 END) bucket_0_to_1,
SUM(CASE WHEN purchase_amount > 1 AND purchase_amount <= 5 THEN 1 ELSE 0 END) bucket_1_to_5,
SUM(CASE WHEN purchase_amount > 5 AND purchase_amount <= 20 THEN 1 ELSE 0 END) bucket_5_to_20,
SUM(CASE WHEN purchase_amount > 20 AND purchase_amount <= 30 THEN 1 ELSE 0 END) bucket_20_to_30,
SUM(CASE WHEN purchase_amount > 30 THEN 1 ELSE 0 END) bucket_over_30
FROM my_table;

To get the values you want in rows, you need to start with a driver table that has all the values you are interested in, and then left outer join to the data:
select driver.mina, coalesce(sum(cnt), 0) as Num_Of_Users
from (select 1 as mina, 5 as maxa union all
select 5, 10 union all
select 10, 20 union all
select 20, 30 union all
select 30, NULL
) driver left outer join
(select purchase_amount, count(*) as cnt
from purchases
group by purchase_amount
) pa
on pa.purchase_amount >= driver.mina and
(pa.purchase_amount < driver.maxa or driver.maxa is null)
group by driver.mina
order by driver.mina
You can actually do this without the inner group by. However, the inner group by is likely to reduce the size of the data significantly (especially in your example) before the join.
I would encourage you to include both the lower and upper bounds of the range on each row.

This might be easier if the ranges will ever change.
with ranges(rstart, rfinish) as (
select 0, 1 union all
select 2, 5 union all
select 6, 20 union all
select 21, 30
), purchases(amount) as (
select sum(PURCHASE_AMOUNT)
from <purchases_basetable> -- <-- your tablename goes here
group by USERID
)
select
-- concat(case when r.rstart = 0 then 0 else r.rstart-1 end, '-', r.rfinish) as SUM_OF_PURCHASES_RANGE /* op's name for the group */,
concat(r.rstart, '-', r.rfinish) as SUM_OF_PURCHASES_RANGE /* better name for the group */,
count(*) as NUM_OF_USERS
from
purchases as p inner join
ranges as r
on p.amount between r.rstart and r.rfinish
group by r.rstart, r.rfinish
order by r.rstart, r.rfinish
I don't know what the mysql query plan will look like. It's trivial to change the query to use derived tables rather than table expressions. (But I include it below anyway.)
You might also find the UNPIVOT operation to be useful on a platform that supports it.
select
-- concat(case when r.rstart = 0 then 0 else r.rstart-1 end, '-', r.rfinish) as SUM_OF_PURCHASES_RANGE /* op's name for the group */,
concat(r.rstart, '-', r.rfinish) as SUM_OF_PURCHASES_RANGE /* better name for the group */,
count(*) as NUM_OF_USERS
from
(
select sum(PURCHASE_AMOUNT) as amount
from <purchases_basetable> -- <-- your tablename goes here
group by USERID
) as p inner join
(
select 0 as rstart, 1 as rfinish union all
select 2, 5 union all
select 6, 20 union all
select 21, 30
) as r
on p.amount between r.rstart and r.rfinish
group by r.rstart, r.rfinish
order by r.rstart, r.rfinish

Related

get rows grouped by just when state changed to opposite values from previous

I have table:
ID STATUS
----------
0 0
1 1
2 0
3 1
4 2
5 2
6 0
7 3
8 2
9 0
10 1 etc.
I want to get only the first occurrences of 1 and 2 when the state changes (0 and 3 are not important to me), so in this case I should get ids 1, 4, 10. I tried GROUP BY, but it groups all values by 1 or 2, not just the cases where the state has changed.
Any idea how to write the MySQL query, please?
So, it's a slow day...
SELECT MIN(id)
FROM
( SELECT x.*
, CASE WHEN @prev=status THEN @i:=@i ELSE @i:=@i+1 END i
, @prev:=status prev
FROM
( SELECT *
FROM my_table
WHERE status IN (1,2)
) x
JOIN
( SELECT @prev:=1,@i:=1 ) vars
ORDER
BY id
) a
GROUP
BY i;
SELECT id
, status
, if( status = @switch
and if( @switch = 1, @switch:=2, @switch:=1 ) in(1,2)
, 1
, 0
) sw
FROM status
JOIN( SELECT @switch :=1 ) t
WHERE status in (1,2)
HAVING sw=1
ORDER BY id
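As an alternative that does not rely on user variables, each row can be compared with the previous row whose status is 1 or 2. The following is a minimal sketch in Python with the standard-library sqlite3 module standing in for MySQL; the table name `my_table` is taken from the first query, and the sentinel value -1 (assumed never to occur as a real status) is an assumption.

```python
import sqlite3

# Sample data from the question: (id, status).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, status INTEGER)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [(0, 0), (1, 1), (2, 0), (3, 1), (4, 2), (5, 2),
     (6, 0), (7, 3), (8, 2), (9, 0), (10, 1)],
)

# Keep a row only when its status differs from the previous row
# (by id) whose status is 1 or 2; -1 marks "no previous row".
ids = [row[0] for row in conn.execute("""
SELECT t.id
FROM my_table AS t
WHERE t.status IN (1, 2)
  AND t.status <> COALESCE(
        (SELECT p.status
         FROM my_table AS p
         WHERE p.status IN (1, 2) AND p.id < t.id
         ORDER BY p.id DESC
         LIMIT 1),
        -1)
ORDER BY t.id
""")]

print(ids)  # [1, 4, 10]
```

The correlated subquery runs once per candidate row, so on a large table this is likely to be slower than the variable-based queries above.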

How to reduce SQL queries in only one in this case

I want to save the hassle of running many queries for the following:
I have a table like this:
name, age
{
Mike, 7
Peter, 2
Mario, 1
Tony, 4
Mary, 2
Tom, 7
Jerry, 3
Nick, 2
Albert, 22
Steven, 7
}
And I want the following result:
Results(custom_text, num)
{
1 Year, 1
2 Year, 3
3 Year, 1
4 Year, 1
5 Year, 0
6 Year, 0
7 Year, 3
8 Year, 0
9 Year, 0
10 Year, 0
More than 10 Year, 1
}
I know how to do this but in 11 queries :( But how to simplify it?
EDIT:
Doing the following, I can obtain the non zero values, but I need the zeroes in the right places.
SELECT COUNT(*) AS AgeCount
FROM mytable
GROUP BY Age
How can I achieve this?
Thanks for reading.
You can use the query below, but it will not show the gaps; if you want the gaps, use Linoff's answer:
select t.txt, count(t.age) from
(select
case
when age<11 then concat(age ,' year')
else 'more than 10'
end txt, age
from your_table)t
group by t.txt
order by 1
SQL FIDDLE DEMO
You can use left join and a subquery to get what you want:
select coalesce(concat(ages.n, ' year'), 'More than 10 year') as custom_text,
count(t.age) as num -- count(*) would report 1 for ages with no matching rows
from (select 1 as n union all select 2 union all select 3 union all select 4 union all
select 5 union all select 6 union all select 7 union all select 8 union all
select 9 union all select 10 union all select null
) ages left join
tabla t
on (t.age = ages.n or ages.n is null and t.age > 10)
group by ages.n;
EDIT:
I think the following is a better way to do this query:
select (case when least(age, 11) = 11 then 'More than 10 year'
else concat(age, ' year')
end) as agegroup, count(name)
from (select 1 as age, NULL as name union all
select 2, NULL union all
select 3, NULL union all
select 4, NULL union all
select 5, NULL union all
select 6, NULL union all
select 7, NULL union all
select 8, NULL union all
select 9, NULL union all
select 10, NULL union all
select 11, NULL
union all
select age, name
from tabla t
) t
group by least(age, 11);
Basically, the query needs a full outer join, and MySQL does not provide one. However, we can get the same result by adding an extra row for each age, so we know something is there. Then, because name is NULL in those rows, count(name) returns 0 for them.
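The padding trick described above can be checked with a minimal sketch in Python using the standard-library sqlite3 module. It is an illustration under assumptions: SQLite stands in for MySQL, its two-argument MIN plays the role of LEAST, `||` replaces CONCAT, and the column name `bucket` is hypothetical.

```python
import sqlite3

# Sample data from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tabla (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO tabla VALUES (?, ?)", [
    ("Mike", 7), ("Peter", 2), ("Mario", 1), ("Tony", 4), ("Mary", 2),
    ("Tom", 7), ("Jerry", 3), ("Nick", 2), ("Albert", 22), ("Steven", 7),
])

# One placeholder row per bucket: 1..10, plus 11 for "more than 10".
padding = " UNION ALL ".join(f"SELECT {n}, NULL" for n in range(2, 12))

rows = conn.execute(f"""
SELECT CASE WHEN bucket = 11 THEN 'More than 10 Year'
            ELSE bucket || ' Year' END AS custom_text,
       COUNT(name) AS num
FROM (
    SELECT 1 AS bucket, NULL AS name UNION ALL {padding}
    UNION ALL
    SELECT MIN(age, 11), name FROM tabla
) AS t
GROUP BY bucket
ORDER BY bucket
""").fetchall()

for custom_text, num in rows:
    print(custom_text, num)
```

Because COUNT(name) skips NULLs, the placeholder rows guarantee that every bucket appears without inflating any count.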
Please try using this query for required output.
SQL FIDDLE link http://www.sqlfiddle.com/#!9/4e52a/6
select coalesce(concat(ages.n, ' year'), 'More than 10 year') as custom_text,
count(t.age) from (select 1 as n union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9 union all select 10 union all select null ) ages left join tabla t
on (case when ages.n<11 then t.age = ages.n else t.age > 10 end)
group by ages.n;

MySQL nested select: am I able to use any data from outer select?

Let's imagine I have the following table users:
id name
1 John
2 Mike
3 Max
And table posts
id author_id date title
1 1 2014-12-12 Post 2
2 1 2014-12-10 Post 1
3 2 2014-10-01 Lorem ipsum
...and so on
And I'd like to have a query containing the following data:
user name
number of user's posts within last week
number of user's posts within last month
I can do it just for each individual user (with id 1 in the following example):
SELECT
`name`,
(SELECT COUNT(*) FROM `posts`
WHERE
`author_id` = 1 AND
UNIX_TIMESTAMP()-UNIX_TIMESTAMP(`date`) < 7*24*3600) AS `posts7`,
(SELECT COUNT(*) FROM `posts`
WHERE
`author_id` = 1 AND
UNIX_TIMESTAMP()-UNIX_TIMESTAMP(`date`) < 30*24*3600) AS `posts30`
FROM `users`
WHERE `id` = 1
I suspect that MySQL would allow this for all users within one query if I could pass data between the inner and outer SELECTs. I am probably using the wrong words, but I hope people here will understand what I need and can help.
This is untested, but it gives you the gist:
SELECT u.name,
SUM(CASE WHEN p.`date` > NOW() - INTERVAL 7 DAY THEN 1 ELSE 0 END) AS posts7,
SUM(CASE WHEN p.`date` > NOW() - INTERVAL 30 DAY THEN 1 ELSE 0 END) AS posts30
FROM users u
LEFT JOIN posts p ON u.id = p.author_id
GROUP BY u.name
Note that you need the GROUP BY; the date tests inside the CASE expressions are one way to fill in the details.
Try something like this
SELECT
`name`,
sum(IF(`date` between DATE(NOW()-INTERVAL 7 DAY) and now() , 1, 0)) as posts7,
sum(IF(`date` between DATE(NOW()-INTERVAL 30 DAY) and now() , 1, 0)) as posts30
FROM
`users` as u, posts as p
WHERE
u.id = p.author_id
GROUP BY
1
Certainly, aggregating a non-nested query is the way to solve the problem, although both Benni and xQbert have written unbounded queries, which, while satisfying the objective, are very inefficient. Consider (adapted from Benni's answer):
SELECT `name`
, SUM(IF(
`date` between DATE(NOW()-INTERVAL 7 DAY) and now()
, 1
, 0)) as posts7
, SUM(IF(
p.author_id IS NULL
, 0
, 1)) as posts30
FROM `users` as u
LEFT JOIN posts as p
ON u.id = p.author_id
AND p.date > NOW()-INTERVAL 30 DAY
GROUP BY name
Note that NOT using the conversion to a UNIX timestamp allows the database to use an index (if available) to resolve the query.
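To see the bounded LEFT JOIN with conditional aggregation produce one row per user, here is a minimal sketch in Python using the standard-library sqlite3 module. It rests on assumptions: SQLite stands in for MySQL, `date(:today, '-7 days')` replaces `NOW() - INTERVAL 7 DAY`, and the fixed reference date `2014-12-15` is a hypothetical "now" chosen so the result is reproducible.

```python
import sqlite3

# Sample tables from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, date TEXT, title TEXT);
INSERT INTO users VALUES (1, 'John'), (2, 'Mike'), (3, 'Max');
INSERT INTO posts VALUES
    (1, 1, '2014-12-12', 'Post 2'),
    (2, 1, '2014-12-10', 'Post 1'),
    (3, 2, '2014-10-01', 'Lorem ipsum');
""")

today = "2014-12-15"  # hypothetical reference date instead of NOW()

# The 30-day bound sits in the join condition, so users without
# recent posts are kept and counted as 0.
rows = conn.execute("""
SELECT u.name,
       SUM(CASE WHEN p.date > date(:today, '-7 days') THEN 1 ELSE 0 END) AS posts7,
       COUNT(p.id) AS posts30
FROM users AS u
LEFT JOIN posts AS p
       ON p.author_id = u.id
      AND p.date > date(:today, '-30 days')
GROUP BY u.id, u.name
ORDER BY u.id
""", {"today": today}).fetchall()

print(rows)  # [('John', 2, 2), ('Mike', 0, 0), ('Max', 0, 0)]
```

Comparing the stored date column directly against a computed bound, rather than wrapping the column in a function, is what leaves room for an index on the date column to be used.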
However, there are scenarios where it is more effective or appropriate to use a nested query. So, although it's not the best solution to this problem, here is the correlated-subquery form:
SELECT
`name`
, (SELECT COUNT(*)
FROM `posts` AS p7
WHERE p7.author_id = users.id
AND p7.`date` > NOW() - INTERVAL 7 DAY) AS `posts7`
, (SELECT COUNT(*)
FROM `posts` AS p30
WHERE p30.author_id = users.id
AND p30.`date` > NOW() - INTERVAL 30 DAY) AS `posts30`
FROM `users`
WHERE `id` = 1

count consecutive number of 10 days when number is = 0 or > 10

Here is an sqlfiddle that I made with the MySQL query:
http://sqlfiddle.com/#!2/f2794/4
It counts 10 consecutive days when present = 0, but I need to add a second condition so that it also counts days where present is > 10.
For example
11
22
0
0
0
0
0
0
0
0
0
0
0
0
1
should count 14
here is that query
select sum(count) total from (
SELECT COUNT(present) as count FROM (
SELECT
IF((q.present != 0), @rownum:=@rownum+1, @rownum:=@rownum) AS rownumber, @prevDate:=q.date, q.*
FROM (
SELECT
name
, date
, present
FROM
teacher, (SELECT @rownum:=0, @prevDate:='') vars
WHERE date BETWEEN '2013-07-01' AND '2013-07-31'
ORDER BY date, present
) q
) sq
GROUP BY present, rownumber
HAVING COUNT(*) >= 10
) d
So if U can help me, pls do it :)
best regards
m.
I don't really understand your query very well, but I think simply changing (q.present != 0) to incorporate the additional test should solve your problem:
SELECT sum(count) total from (
SELECT COUNT(present) as count FROM (
SELECT
IF((q.present != 0 AND q.present <= 10), @rownum:=@rownum+1, @rownum:=@rownum) AS rownumber, @prevDate:=q.date, q.*
FROM (
SELECT
name
, date
, present
FROM
teacher, (SELECT @rownum:=0, @prevDate:='') vars
WHERE date BETWEEN '2013-07-01' AND '2013-07-31'
ORDER BY date, present
) q
) sq
GROUP BY present, rownumber
HAVING COUNT(*) >= 10
) d

find missing dates from date range

I have a question about getting the dates that do not exist in a database table.
I have the dates below in the database.
2013-08-02
2013-08-02
2013-08-02
2013-08-03
2013-08-05
2013-08-08
2013-08-08
2013-08-09
2013-08-10
2013-08-13
2013-08-13
2013-08-13
and I want the result shown below:
2013-08-01
2013-08-04
2013-08-06
2013-08-07
2013-08-11
2013-08-12
As you can see, the result has six dates that are not present in the database.
I have tried the query below:
SELECT
DISTINCT DATE(w1.start_date) + INTERVAL 1 DAY AS missing_date
FROM
working w1
LEFT JOIN
(SELECT DISTINCT start_date FROM working ) w2 ON DATE(w1.start_date) = DATE(w2.start_date) - INTERVAL 1 DAY
WHERE
w1.start_date BETWEEN '2013-08-01' AND '2013-08-13'
AND
w2.start_date IS NULL;
but it returns the following result:
2013-08-04
2013-08-14
2013-08-11
2013-08-06
As you can see, it gives me back four dates, of which the 14th is not needed, and it still misses three dates; this is because of the left join.
Please look at my query and let me know the best way to do this.
Thanks for looking and for your time.
I guess you could always generate the date sequence and just use NOT IN to eliminate the dates that actually exist. This will max out at a 1024-day range, but is easy to shrink or extend. The date column is called "mydate" and is in the table "table1":
SELECT * FROM (
SELECT DATE_ADD('2013-08-01', INTERVAL t4+t16+t64+t256+t1024 DAY) day
FROM
(SELECT 0 t4 UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 ) t4,
(SELECT 0 t16 UNION ALL SELECT 4 UNION ALL SELECT 8 UNION ALL SELECT 12 ) t16,
(SELECT 0 t64 UNION ALL SELECT 16 UNION ALL SELECT 32 UNION ALL SELECT 48 ) t64,
(SELECT 0 t256 UNION ALL SELECT 64 UNION ALL SELECT 128 UNION ALL SELECT 192) t256,
(SELECT 0 t1024 UNION ALL SELECT 256 UNION ALL SELECT 512 UNION ALL SELECT 768) t1024
) b
WHERE day NOT IN (SELECT mydate FROM Table1) AND day<'2013-08-13';
From the "I would add an SQLfiddle if it wasn't down" dept.
Thanks for the help. Here is the query I ended up with, and it works:
SELECT * FROM
(
SELECT DATE_ADD('2013-08-01', INTERVAL t4+t16+t64+t256+t1024 DAY) missingDates
FROM
(SELECT 0 t4 UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 ) t4,
(SELECT 0 t16 UNION ALL SELECT 4 UNION ALL SELECT 8 UNION ALL SELECT 12 ) t16,
(SELECT 0 t64 UNION ALL SELECT 16 UNION ALL SELECT 32 UNION ALL SELECT 48 ) t64,
(SELECT 0 t256 UNION ALL SELECT 64 UNION ALL SELECT 128 UNION ALL SELECT 192) t256,
(SELECT 0 t1024 UNION ALL SELECT 256 UNION ALL SELECT 512 UNION ALL SELECT 768) t1024
) b
WHERE
missingDates NOT IN (SELECT DATE_FORMAT(start_date,'%Y-%m-%d')
FROM
working GROUP BY start_date)
AND
missingDates < '2013-08-13';
My bet would be to create a dedicated Calendar table, just to be able to use it in a LEFT JOIN.
You could create the table on a per-need basis, but as it does not represent a large amount of data, the simplest and probably most efficient approach is to create it once and for all, as I do below using a stored procedure:
--
-- Create a dedicated "Calendar" table
--
CREATE TABLE Calendar (day DATE PRIMARY KEY);
DELIMITER //
CREATE PROCEDURE init_calendar(IN pStart DATE, IN pEnd DATE)
BEGIN
SET @theDate := pStart;
REPEAT
-- Here I use *IGNORE* in order to be able
-- to call init_calendar again to extend the
-- "calendar range" without having to bother with
-- "overlapping" dates
INSERT IGNORE INTO Calendar VALUES (@theDate);
SET @theDate := @theDate + INTERVAL 1 DAY;
UNTIL @theDate > pEnd END REPEAT;
END; //
DELIMITER ;
CALL init_calendar('2010-01-01','2015-12-31');
In this example, the Calendar table holds 2191 consecutive days, which is, at a rough estimate, less than 15KB. And storing all the dates of the 21st century would take less than 300KB...
Now, this is your actual data table as described in the question:
--
-- *Your* actual data table
--
CREATE TABLE tbl (theDate DATE);
INSERT INTO tbl VALUES
('2013-08-02'),
('2013-08-02'),
('2013-08-02'),
('2013-08-03'),
('2013-08-05'),
('2013-08-08'),
('2013-08-08'),
('2013-08-09'),
('2013-08-10'),
('2013-08-13'),
('2013-08-13'),
('2013-08-13');
And finally the query:
--
-- Now the query to find date not "in range"
--
SET @start = '2013-08-01';
SET @end = '2013-08-13';
SELECT Calendar.day FROM Calendar LEFT JOIN tbl
ON Calendar.day = tbl.theDate
WHERE Calendar.day BETWEEN @start AND @end
AND tbl.theDate IS NULL;
Producing:
+------------+
| day |
+------------+
| 2013-08-01 |
| 2013-08-04 |
| 2013-08-06 |
| 2013-08-07 |
| 2013-08-11 |
| 2013-08-12 |
+------------+
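The same calendar anti-join can be tried without a permanent table or stored procedure by generating the dates on the fly. Below is a minimal sketch in Python using the standard-library sqlite3 module; SQLite stands in for MySQL here, and the recursive common table expression named `calendar` is an assumption of this sketch rather than part of the answer above (MySQL supports WITH RECURSIVE only from version 8.0).

```python
import sqlite3

# Sample dates from the question, duplicates included.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (theDate TEXT)")
conn.executemany("INSERT INTO tbl VALUES (?)", [(d,) for d in [
    "2013-08-02", "2013-08-02", "2013-08-02", "2013-08-03", "2013-08-05",
    "2013-08-08", "2013-08-08", "2013-08-09", "2013-08-10",
    "2013-08-13", "2013-08-13", "2013-08-13",
]])

# Build every day in [start, end], then keep the days with no match in tbl.
missing = [row[0] for row in conn.execute("""
WITH RECURSIVE calendar(day) AS (
    SELECT :start
    UNION ALL
    SELECT date(day, '+1 day') FROM calendar WHERE day < :end
)
SELECT calendar.day
FROM calendar
LEFT JOIN tbl ON tbl.theDate = calendar.day
WHERE tbl.theDate IS NULL
ORDER BY calendar.day
""", {"start": "2013-08-01", "end": "2013-08-13"})]

print(missing)
# ['2013-08-01', '2013-08-04', '2013-08-06', '2013-08-07', '2013-08-11', '2013-08-12']
```

A stored Calendar table remains the better fit when the range is queried often, since the generated sequence is rebuilt on every run.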
This is how I would do it:
$db_dates = array (
'2013-08-02',
'2013-08-03',
'2013-08-05',
'2013-08-08',
'2013-08-09',
'2013-08-10',
'2013-08-13'
);
$missing = array();
$month = "08";
$year = "2013";
$day_start = 1;
$day_end = 14;
for ($i=$day_start; $i<$day_end; $i++) {
$day = $i;
if ($i<10) {
$day = "0".$i;
}
$check_date = $year."-".$month."-".$day;
if (!in_array($check_date, $db_dates)) {
array_push($missing, $check_date);
}
}
print_r($missing);
I made it just for that interval, but you can define another interval or make it work for the whole year.
I'm adding this to the excellent answer by Dipesh in case anybody wants more than 1024 days (or hours). The query below generates 279936 hours, from 2015 to 2046:
SELECT
DATE_ADD('2015-01-01', INTERVAL
POWER(6,6)*t6 + POWER(6,5)*t5 + POWER(6,4)*t4 + POWER(6,3)*t3 + POWER(6,2)*t2 +
POWER(6,1)*t1 + t0
HOUR) AS period
FROM
(SELECT 0 t0 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5) t0,
(SELECT 0 t1 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5) t1,
(SELECT 0 t2 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5) t2,
(SELECT 0 t3 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5) t3,
(SELECT 0 t4 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5) t4,
(SELECT 0 t5 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5) t5,
(SELECT 0 t6 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5) t6
ORDER BY period
Just plug this into the answer's query.
The way I would solve in this in a datawarehouse-type situation is to populate a "static" table with dates over an appropriate period (there are example scripts for this type of thing which are easy to google) and then left outer join or right outer join your table to it: rows where there are no matches are the missing dates.
-- T-SQL (SQL Server) syntax
DECLARE @date date;
DECLARE @dt_cnt int = 0;
SET @date = '2014-11-1';
WHILE @date < '2014-12-31'
BEGIN
-- Count the rows recorded for this date
SELECT @dt_cnt = COUNT(att_id) FROM date_table WHERE att_date = @date;
IF (@dt_cnt = 0)
BEGIN
-- No rows for this date, so it is missing
PRINT @date;
END
SET @date = DATEADD(day, 1, @date);
END