sql group by with double conditions - mysql

I need to count the distinct parent_ids that satisfy one of the conditions below, grouped by day:
parent_ids that have both status = pending and status = processing
OR
parent_ids that have both status = canceled and status = processing.
I've tried something similar to:
SELECT count(parent_id) as pencan, created_at, DATE_FORMAT(a.created_at, '%Y') AS year_key, DATE_FORMAT(a.created_at, '%m-%d') as day_key
FROM sales_flat_order_status_history
where created_at BETWEEN '2010-01-01 00:00:00' AND '2013-04-30 23:59:59'
GROUP BY created_at ,parent_id
HAVING SUM(status = 'processing')
AND SUM(status IN ('pending', 'cancelling'))

I think you just need to fix the group by:
SELECT DATE(created_at), count(parent_id) as pencan
FROM sales_flat_order_status_history
where created_at >= '2010-01-01' AND
created_at < '2013-05-01'
GROUP BY DATE(created_at) , parent_id
HAVING SUM(status = 'processing') AND
SUM(status IN ('pending', 'cancelling'))
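Note that grouping by both the day and parent_id returns one row per parent_id per day, not one count per day. Below is a minimal sketch of wrapping that query in an outer per-day count. It runs against SQLite through Python's sqlite3 module as a stand-in for MySQL, and the sample rows are invented. The status value 'canceled' is taken from the question text (the queries use 'cancelling'), so adjust it to whatever the table really stores. Because the inner grouping is per day, both statuses have to occur on the same day for a parent_id to count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_flat_order_status_history "
    "(parent_id INTEGER, status TEXT, created_at TEXT)"
)
# Hypothetical sample rows, invented for illustration only.
rows = [
    (1, "pending",    "2012-03-01 09:00:00"),
    (1, "processing", "2012-03-01 10:00:00"),  # qualifies on 03-01
    (2, "pending",    "2012-03-01 11:00:00"),  # never processed
    (3, "canceled",   "2012-03-01 12:00:00"),
    (3, "processing", "2012-03-01 13:00:00"),  # qualifies on 03-01
    (4, "processing", "2012-03-02 08:00:00"),
    (4, "pending",    "2012-03-02 09:00:00"),  # qualifies on 03-02
]
conn.executemany(
    "INSERT INTO sales_flat_order_status_history VALUES (?, ?, ?)", rows
)

query = """
SELECT day, COUNT(*) AS pencan
FROM (
    -- inner query: one row per (day, parent_id) that meets the condition
    SELECT DATE(created_at) AS day, parent_id
    FROM sales_flat_order_status_history
    WHERE created_at >= '2010-01-01' AND created_at < '2013-05-01'
    GROUP BY DATE(created_at), parent_id
    HAVING SUM(status = 'processing') > 0
       AND SUM(status IN ('pending', 'canceled')) > 0
) AS per_parent
GROUP BY day      -- outer query: count the qualifying parent_ids per day
ORDER BY day
"""
result = conn.execute(query).fetchall()
print(result)
```

Each parent_id appears at most once per day in the inner result, so a plain COUNT(*) in the outer query already counts distinct parent_ids.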

Related

Include main queries results to sub queries condition

I have a table "products" and a table "links". Every product can have multiple links, and each link can have many sales, clicks and impressions, but doesn't necessarily have all of them. I want to get a list of links of a certain product that match some criteria, with the data grouped per day, campaign and link banner size.
The following query works correctly, but it's much slower than it could be. The problem is that the subqueries fetch the data for all link ids, and it is only filtered at the end. The overall query would be much faster if the subqueries included something like
where link_id IN (...)
but I only know the link_ids from the main query, not before. If I try to add
where link_id = l.id
it's an unknown column, because the subquery doesn't have access to the main query's results.
How can I make the subqueries look up data only for the link_ids that the main query found? I could split it into two completely separate queries, but is this possible in one query?
select p.id, p.name, l.id, l.banner_size,
coalesce(sum(case when t1.col = 'sales' then ct else 0 end), 0) as total_sales,
coalesce(sum(case when t1.col = 'clicks' then ct else 0 end), 0) as total_clicks,
coalesce(sum(case when t1.col = 'impressions' then ct else 0 end), 0) as total_impressions,
t1.dt
from links l
inner join products p
on p.id = l.product_id
left join
(
select count(1) as ct, link_id, date(clicked) dt, 'sales' as col
from sales
where clicked >= '2020-01-01 00:00:00' and clicked <= '2020-01-31 00:00:00'
group by date(clicked), link_id
union all
select count(1) as ct, link_id, date(created) dt, 'clicks'
from clicks_source1
where created >= '2020-01-01 00:00:00' and created <= '2020-01-31 00:00:00'
group by date(created), link_id
union all
select count(1) as ct, link_id, date(time) dt, 'clicks'
from clicks_source2
where time >= '2020-01-01 00:00:00' and time <= '2020-01-31 00:00:00'
group by date(time), link_id
union all
select count(1) as ct, link_id, date(created) dt, 'impressions'
from impression_source1
where created > '2020-01-01 00:00:00' and created <= '2020-01-31 00:00:00'
group by date(created), link_id
union all
select count(1) as ct, link_id, date(time) dt, 'impressions'
from impression_source2
where time > '2020-01-01 00:00:00' and time <= '2020-01-31 00:00:00'
group by date(time), link_id
) t1 on t1.link_id = l.id
where l.agent_id = 300
and p.id = 3454
and l.banner_size = 48
and p.company NOT IN (13592, 28189)
group by c.id, l.banner_size, t1.dt
having (total_sales + total_clicks + total_impressions) > 0
order by dt DESC, p.id ASC, l.banner_size ASC
You can just add an inner join to links in each of the subqueries:
select count(1) as ct, s.link_id, date(s.clicked) dt, 'sales' as col
from sales s
join links l1 on l1.id = s.link_id
where s.clicked >= '2020-01-01 00:00:00' and s.clicked <= '2020-01-31 00:00:00'
group by date(s.clicked), s.link_id
union all etc.
Then you'll only get the rows with a match, and the query should run faster.
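One caveat: the join by itself only drops rows whose link_id has no match in links. For the subquery to skip unrelated links, the filters from the outer query probably need to be repeated on the joined alias. Here is a small sketch of that idea, using SQLite via Python's sqlite3 as a stand-in for MySQL. The reduced schema and the rows are invented, and only the agent_id and product_id filters from the question are repeated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, cut-down versions of the tables from the question.
conn.executescript("""
CREATE TABLE links (id INTEGER PRIMARY KEY, product_id INTEGER, agent_id INTEGER);
CREATE TABLE sales (link_id INTEGER, clicked TEXT);
INSERT INTO links VALUES (1, 3454, 300), (2, 3454, 301), (3, 9999, 300);
INSERT INTO sales VALUES
    (1, '2020-01-05 10:00:00'),
    (1, '2020-01-05 11:00:00'),
    (2, '2020-01-05 12:00:00'),
    (3, '2020-01-06 09:00:00');
""")

# Same shape as the 'sales' branch of the union, with the outer filters
# repeated on l1 so that only the wanted links are aggregated.
filtered_subquery = """
SELECT COUNT(1) AS ct, s.link_id, DATE(s.clicked) AS dt, 'sales' AS col
FROM sales s
JOIN links l1 ON l1.id = s.link_id
WHERE s.clicked >= '2020-01-01 00:00:00' AND s.clicked <= '2020-01-31 00:00:00'
  AND l1.agent_id = 300
  AND l1.product_id = 3454
GROUP BY DATE(s.clicked), s.link_id
"""
sales_rows = conn.execute(filtered_subquery).fetchall()
print(sales_rows)
```

Only link 1 belongs to agent 300 and product 3454, so the sales of links 2 and 3 never reach the aggregation. Whether this is actually faster in MySQL depends on the indexes on link_id and the date columns, so it is worth checking with EXPLAIN.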

How to combine the results of two MySQL queries into one?

Suppose I have a table like this:
How can I count, for each person, the number of rows dated 2018-09-07 and the number of rows in the month 2018-09?
I mean I want to create a table like this:
I know that
SELECT
name,
COUNT(*) AS day_count_2018_09_07
FROM data_table
WHERE
arrive_time >= '2018-09-07 00:00:00'
AND
arrive_time <= '2018-09-07 23:59:59'
GROUP BY name;
gives the number of rows dated 2018-09-07 for each person, and
SELECT
name,
COUNT(*) AS month_count_2018_09
FROM data_table
WHERE
arrive_time >= '2018-09-01 00:00:00'
AND
arrive_time <= '2018-09-30 23:59:59'
GROUP BY name;
gives the number of rows in the month 2018-09 for each person.
But I don't know how to combine the above two queries so that day_count_2018_09_07 and month_count_2018_09 columns can be created in one query.
Here's the SQL fiddle where you can directly get the data in my question.
You can use conditional aggregation to get both results from the same query:
SELECT name,
SUM(CASE WHEN SUBSTR(DATE(arrive_time),1,7)='2018-09' THEN 1 ELSE 0 END) AS month_count_2018_09,
SUM(CASE WHEN DATE(arrive_time)='2018-09-07' THEN 1 ELSE 0 END) AS day_count_2018_09_07
FROM data_table
GROUP BY name
Output:
name  month_count_2018_09  day_count_2018_09_07
Ben   3                    0
Jane  1                    1
John  3                    2
Try combining them like this:
Select DayCounter.name, DayCounter.day_count_2018_09_07, MonthCounter.month_count_2018_09
from
(SELECT
name,
COUNT(*) AS day_count_2018_09_07
FROM data_table
WHERE
arrive_time >= '2018-09-07 00:00:00'
AND
arrive_time <= '2018-09-07 23:59:59'
GROUP BY name) as DayCounter
Inner Join
(SELECT
name,
COUNT(*) AS month_count_2018_09
FROM data_table
WHERE
arrive_time >= '2018-09-01 00:00:00'
AND
arrive_time <= '2018-09-30 23:59:59'
GROUP BY name) as MonthCounter
On DayCounter.name = MonthCounter.name
What about something like this:
SELECT
name,
SUM(CASE WHEN (arrive_time BETWEEN '2018-09-07 00:00:00' AND '2018-09-07 23:59:59') THEN 1 ELSE 0 END) AS day_count_2018_09_07,
SUM(CASE WHEN (arrive_time BETWEEN '2018-09-01 00:00:00' AND '2018-09-30 23:59:59') THEN 1 ELSE 0 END) AS month_count_2018_09
FROM
data_table
GROUP BY
name;
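To compare the approaches in this thread, here is a rough sketch run against SQLite through Python's sqlite3 (a stand-in for MySQL). The rows are invented so that they reproduce the counts in the output shown earlier. It uses half-open ranges (>= start, < next day or month) instead of '23:59:59' upper bounds, which is a choice made here and not something the answers use. It also runs the inner-join variant to show what happens to a person with no rows on the day.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data_table (name TEXT, arrive_time TEXT)")
# Hypothetical rows, chosen to give Ben 3/0, Jane 1/1, John 3/2.
conn.executemany(
    "INSERT INTO data_table VALUES (?, ?)",
    [
        ("Ben",  "2018-09-01 08:00:00"),
        ("Ben",  "2018-09-15 09:30:00"),
        ("Ben",  "2018-09-30 23:59:59"),
        ("Jane", "2018-09-07 12:00:00"),
        ("Jane", "2018-10-01 00:00:00"),  # outside September
        ("John", "2018-09-07 00:00:00"),
        ("John", "2018-09-07 18:45:00"),
        ("John", "2018-09-20 10:00:00"),
    ],
)

# Conditional aggregation: one pass over the table, every name is kept.
conditional = conn.execute("""
    SELECT name,
           SUM(CASE WHEN arrive_time >= '2018-09-07' AND arrive_time < '2018-09-08'
                    THEN 1 ELSE 0 END) AS day_count_2018_09_07,
           SUM(CASE WHEN arrive_time >= '2018-09-01' AND arrive_time < '2018-10-01'
                    THEN 1 ELSE 0 END) AS month_count_2018_09
    FROM data_table
    GROUP BY name
    ORDER BY name
""").fetchall()

# Inner join of the two separate counts: names missing from either side vanish.
inner_join = conn.execute("""
    SELECT d.name, d.day_count, m.month_count
    FROM (SELECT name, COUNT(*) AS day_count FROM data_table
          WHERE arrive_time >= '2018-09-07' AND arrive_time < '2018-09-08'
          GROUP BY name) AS d
    INNER JOIN
         (SELECT name, COUNT(*) AS month_count FROM data_table
          WHERE arrive_time >= '2018-09-01' AND arrive_time < '2018-10-01'
          GROUP BY name) AS m
      ON d.name = m.name
    ORDER BY d.name
""").fetchall()

print(conditional)
print(inner_join)
```

With this data the conditional aggregation keeps Ben with a day count of 0, while the inner join drops him because he has no row in the per-day subquery. If the join form is preferred, starting from the month counts and LEFT JOINing the day counts with COALESCE(..., 0) should keep such rows.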

Better performance in MySQL subqueries for timeline graph

I have a query with subqueries for a timeline widget of participants, leads and customers.
For example, with 15k rows in the table but only 2k in this date range (January 1st to January 28th), it takes about 40 seconds!
SELECT created_at as date,
(
SELECT COUNT(id)
FROM participant
WHERE created_at <= date
) as participants,
(
SELECT COUNT(DISTINCT id)
FROM participant
WHERE participant_type = "lead"
AND created_at <= date
) as leads,
(
SELECT COUNT(DISTINCT id)
FROM participant
WHERE participant_type = "customer"
AND created_at <= date
) as customer
FROM participant
WHERE created_at >= '2016-01-01 00:00:00'
AND created_at <= '2016-01-28 23:59:59'
GROUP BY date(date)
How can I improve the performance?
The table fields are declared as follows:
id => primary_key, INT 10, auto increment
participant_type => ENUM "lead,customer", NULLABLE, utf8_unicode_ci
created_at => TIMESTAMP, default '0000-00-00 00:00:00'
Possibly try using conditions within the counts (or sums) to get the values you want, after cross joining the table to itself:
SELECT a.created_at as date,
SUM(IF(b.created_at <= a.created_at, 1, 0)) AS participants,
COUNT(DISTINCT IF(b.participant_type = "lead" AND b.created_at <= a.created_at, b.id, NULL)) AS leads,
COUNT(DISTINCT IF(b.participant_type = "customer" AND b.created_at <= a.created_at, b.id, NULL)) AS customer
FROM participant a
CROSS JOIN participant b
WHERE a.created_at >= '2016-01-01 00:00:00'
AND a.created_at <= '2016-01-28 23:59:59'
GROUP BY date(date)
Or maybe move the date check into the join:
SELECT a.created_at as date,
COUNT(b.id) AS participants,
COUNT(DISTINCT IF(b.participant_type = "lead", b.id, NULL)) AS leads,
COUNT(DISTINCT IF(b.participant_type = "customer", b.id, NULL)) AS customer
FROM participant a
LEFT OUTER JOIN participant b
ON b.created_at <= a.created_at
WHERE a.created_at >= '2016-01-01 00:00:00'
AND a.created_at <= '2016-01-28 23:59:59'
GROUP BY date(date)
I don't fully understand what you want to do with this query, but maybe I can suggest a way to optimize it.
Try this one:
SELECT
participants.day as day,
participants.total_count,
leads.lead_count,
customer.customer_count
FROM
(
SELECT created_at as day, COUNT(id) as total_count
FROM participant
WHERE created_at BETWEEN '2016-01-01 00:00:00' AND '2016-01-28 23:59:59'
GROUP BY day
) as participants
LEFT JOIN
(
SELECT created_at as day, COUNT(DISTINCT id) as lead_count
FROM participant
WHERE participant_type = "lead"
AND created_at BETWEEN '2016-01-01 00:00:00' AND '2016-01-28 23:59:59'
GROUP BY day
) as leads ON (participants.day = leads.day)
LEFT JOIN
(
SELECT created_at as day, COUNT(DISTINCT id) as customer_count
FROM participant
WHERE participant_type = "customer"
AND created_at BETWEEN '2016-01-01 00:00:00' AND '2016-01-28 23:59:59'
GROUP BY day
) as customer ON (participants.day = customer.day)
Add an index to the table. You can run EXPLAIN on this query.
With the help of EXPLAIN, you can see where you should add indexes to tables so that the statement executes faster by using indexes to find rows.
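Another option, sketched below under a few assumptions, is to aggregate once per day and then build the running totals from that much smaller per-day result, so the self-join touches at most one row per day instead of every participant. The sketch runs on SQLite through Python's sqlite3 as a stand-in for MySQL, the rows are invented, and it assumes an end-of-day cumulative total is what the chart needs (the original query compares against individual timestamps). The derived table is written out twice so that no CTE or window-function support is required.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE participant "
    "(id INTEGER PRIMARY KEY, participant_type TEXT, created_at TEXT)"
)
# Hypothetical rows for illustration.
conn.executemany(
    "INSERT INTO participant (participant_type, created_at) VALUES (?, ?)",
    [
        ("lead",     "2015-12-30 10:00:00"),  # before the charted range
        ("customer", "2016-01-01 09:00:00"),
        ("lead",     "2016-01-01 15:00:00"),
        (None,       "2016-01-02 11:00:00"),  # participant_type is nullable
        ("lead",     "2016-01-03 08:00:00"),
    ],
)

# Per-day counts over everything up to the end of the range.
daily = """
    SELECT DATE(created_at) AS day,
           COUNT(*) AS participants,
           SUM(CASE WHEN participant_type = 'lead' THEN 1 ELSE 0 END) AS leads,
           SUM(CASE WHEN participant_type = 'customer' THEN 1 ELSE 0 END) AS customers
    FROM participant
    WHERE created_at < '2016-01-29'
    GROUP BY DATE(created_at)
"""

# Running totals: for each day, add up all per-day rows on or before it.
query = f"""
    SELECT cur.day,
           SUM(prev.participants) AS participants,
           SUM(prev.leads) AS leads,
           SUM(prev.customers) AS customers
    FROM ({daily}) AS cur
    JOIN ({daily}) AS prev ON prev.day <= cur.day
    WHERE cur.day >= '2016-01-01'
    GROUP BY cur.day
    ORDER BY cur.day
"""
timeline = conn.execute(query).fetchall()
for row in timeline:
    print(row)
```

Days without any new participant do not appear in the result; if the widget needs them, they would have to be filled in by the application or by joining against a calendar table.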

Query Help Between Date Range, but Not Recent

I know this can be written as a single SQL statement, but I just don't know how to do it. I have two separate queries. One pulls all orders from a specific period of last year:
SELECT * FROM `order` WHERE date_added BETWEEN '2014-10-01' AND '2014-11-01';
and one pulls orders from the last month:
SELECT * FROM `order` WHERE date_added BETWEEN DATE_SUB( now(), INTERVAL 1 MONTH) AND Now() ORDER BY date_added ASC
What I want to do now is combine the two so that I only get the customer_id of customers who placed orders inside the date range last year (query 1) but haven't placed an order in the last month (query 2).
I know there is a way to set this up as a join, but my knowledge of SQL joins is very limited. Thanks for any help.
http://sqlfiddle.com/#!9/35ed0/1
SELECT * FROM `order`
WHERE date_added BETWEEN '2014-10-01' AND '2014-11-01'
AND customer_id NOT IN (
SELECT DISTINCT customer_id FROM `order`
WHERE date_added BETWEEN DATE_SUB( now(), INTERVAL 1 MONTH) AND Now())
UPDATE: If you need only one record per customer_id, here is an example. It is not the best from a performance perspective, but it returns only the last order (according to the date_added column) per customer.
SELECT t.*,
if(@fltr=customer_id, 0, 1) fltr,
@fltr:=customer_id
FROM (SELECT *
FROM `order`
WHERE date_added BETWEEN '2014-10-01' AND '2014-11-01'
AND customer_id NOT IN (
SELECT DISTINCT customer_id FROM `order`
WHERE date_added BETWEEN DATE_SUB( now(), INTERVAL 1 MONTH) AND Now())
ORDER BY customer_id, date_added DESC
) t
HAVING (fltr=1);
I usually use a correlated not exists predicate for this as I feel that it corresponds well with the intent of the question:
SELECT *
FROM `order` o1
WHERE date_added BETWEEN '2014-10-01' AND '2014-11-01'
AND NOT EXISTS (
SELECT 1
FROM `order` o2
WHERE date_added BETWEEN DATE_SUB(NOW(), INTERVAL 1 MONTH) AND NOW()
AND o1.customer_id = o2.customer_id
);
I like to approach these questions using group by and having. You are looking for customer ids, so:
select o.customer_id
from `order` o
group by o.customer_id
having sum(date_added BETWEEN '2014-10-01' AND '2014-11-01') > 0 and
sum(date_added BETWEEN DATE_SUB( now(), INTERVAL 1 MONTH) AND Now()) = 0;
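As a quick sanity check that the NOT EXISTS form and the GROUP BY/HAVING form pick the same customers, here is a small sketch on SQLite via Python's sqlite3 (standing in for MySQL). The rows are invented, and because SQLite has no NOW() or DATE_SUB(), the "last month" window is passed in as two fixed, hypothetical dates. DISTINCT is added to the NOT EXISTS query so that it returns one row per customer_id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE `order` "
    "(order_id INTEGER PRIMARY KEY, customer_id INTEGER, date_added TEXT)"
)
# Hypothetical orders for illustration.
conn.executemany(
    "INSERT INTO `order` (customer_id, date_added) VALUES (?, ?)",
    [
        (1, "2014-10-05"),  # old order only            -> wanted
        (2, "2014-10-10"),
        (2, "2015-10-20"),  # also ordered recently     -> excluded
        (3, "2015-10-25"),  # recent order only         -> excluded
        (4, "2014-10-15"),
        (4, "2014-10-28"),  # two old orders, no recent -> wanted once
    ],
)

# Fixed stand-ins for DATE_SUB(NOW(), INTERVAL 1 MONTH) and NOW().
params = {"recent_from": "2015-10-01", "recent_to": "2015-11-01"}

not_exists = conn.execute("""
    SELECT DISTINCT o1.customer_id
    FROM `order` o1
    WHERE o1.date_added BETWEEN '2014-10-01' AND '2014-11-01'
      AND NOT EXISTS (
          SELECT 1
          FROM `order` o2
          WHERE o2.customer_id = o1.customer_id
            AND o2.date_added BETWEEN :recent_from AND :recent_to
      )
    ORDER BY o1.customer_id
""", params).fetchall()

group_having = conn.execute("""
    SELECT customer_id
    FROM `order`
    GROUP BY customer_id
    HAVING SUM(date_added BETWEEN '2014-10-01' AND '2014-11-01') > 0
       AND SUM(date_added BETWEEN :recent_from AND :recent_to) = 0
    ORDER BY customer_id
""", params).fetchall()

print(not_exists, group_having)
```

Both queries return customers 1 and 4 for this data. The GROUP BY version scans every order of every customer, while the NOT EXISTS version can stop at the first recent order it finds, so which one is faster in practice likely depends on the data and indexes.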

mysql Select from Select

I have this query:
SELECT DATE( a.created_at ) AS order_date, count( * ) as cnt_order
FROM `sales_order_item` AS a
WHERE MONTH( a.created_at ) = MONTH( now())-1
GROUP BY order_date
which returns something like this (excerpt only; the full result has one row per day of the month):
order_date cnt_order
2012-08-29 580
2012-08-30 839
2012-08-31 1075
and my full query selects from the result above:
SELECT order_date
, MAX(cnt_order) AS highest_order
FROM (
SELECT DATE (a.created_at) AS order_date
, count(*) AS cnt_order
FROM `sales_order_item` AS a
WHERE MONTH(a.created_at) = MONTH(now()) - 1
GROUP BY order_date
) AS tmax
But it returns:
order_date highest_order
2012-08-01 1075
The date is wrong: it always picks the first date, where it should be 2012-08-31. Maybe this is a simple error that I don't know about. How do I get the date to point to 2012-08-31? Any help would be great.
You could try ordering the sub query result set; something like:
SELECT
DATE (a.created_at) AS order_date,
COUNT(*) AS cnt_order
FROM
`sales_order_item` AS a
WHERE
MONTH(a.created_at) = MONTH(now()) - 1
GROUP BY
order_date
ORDER BY
cnt_order DESC
You can add ORDER BY order_date DESC in the subquery.
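Some background on why the date comes out wrong: in the outer query order_date is neither aggregated nor part of a GROUP BY, so MySQL (with ONLY_FULL_GROUP_BY disabled) is free to return the value from any row next to MAX(cnt_order). A way to get the matching date is to sort the per-day counts and keep only the first row. The sketch below shows that on SQLite through Python's sqlite3 as a stand-in for MySQL, with invented rows. It filters on an explicit date range, which is an assumption made here, because MONTH(now()) - 1 evaluates to 0 in January and also matches the same month in every year.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_order_item (created_at TEXT)")
# Hypothetical rows: 1 order on 08-29, 3 on 08-30, 2 on 08-31.
days = ["2012-08-29"] * 1 + ["2012-08-30"] * 3 + ["2012-08-31"] * 2
conn.executemany(
    "INSERT INTO sales_order_item VALUES (?)",
    [(day + " 12:00:00",) for day in days],
)

# Count per day, sort by the count, keep only the top row.
busiest = conn.execute("""
    SELECT DATE(created_at) AS order_date, COUNT(*) AS cnt_order
    FROM sales_order_item
    WHERE created_at >= '2012-08-01' AND created_at < '2012-09-01'
    GROUP BY DATE(created_at)
    ORDER BY cnt_order DESC
    LIMIT 1
""").fetchone()
print(busiest)
```

The busiest day in the sample data is deliberately not the last one, to show that the date follows the count and not the row order. If several days tie for the highest count, LIMIT 1 returns just one of them; adding order_date to the ORDER BY makes that choice deterministic.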