SQL select count from multiple tables - mysql

I'm a beginner at SQL and I have the following tables: ORDER_PRODUCTS, listing the products of an order, and EXCHANGE_PRODUCTS, listing products that will be exchanged.
Both have the same fields, and I need to make a selection counting the amount of products in both tables, distinguishing them by the order_id. Does anyone know how I can do this?
ORDER_PRODUCTS
+-----+------------+----------+---------+
| id | product_id | order_id | amount |
+-----+------------+----------+---------+
| 1 | 5 | 1 | 2 |
| 2 | 7 | 1 | 1 |
| 3 | 13 | 5 | 1 |
| 4 | 18 | 8 | 3 |
| 5 | 45 | 11 | 4 |
+-----+------------+----------+---------+
EXCHANGE_PRODUCTS
+-----+------------+----------+---------+
| id | product_id | order_id | amount |
+-----+------------+----------+---------+
| 1 | 5 | 1 | 1 |
| 2 | 7 | 1 | 2 |
| 3 | 13 | 5 | 1 |
| 4 | 3 | 8 | 2 |
| 5 | 2 | 11 | 1 |
+-----+------------+----------+---------+

You want to use union all to combine the tables and then aggregate them. I might recommend:
select order_id, sum(ordered) as ordered, sum(exchanged) as exchanged,
sum(exchanged + ordered) as total
from ((select order_id, amount as ordered, 0 as exchanged
from order_products
) union all
(select order_id, 0 as ordered, amount as exchanged
from exchange_products
)
) oe
group by order_id;
It is important to use union all rather than union, because union removes duplicates (which can result in bad numbers). Union also incurs overhead that is unnecessary.
And, by "count amount" I assume you really mean to take the sum.
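To see why union all matters here, the following is a minimal sketch that loads the question's sample rows into an in-memory SQLite database (an assumption made only so the example is runnable; the question is about MySQL) and compares the two set operators. The helper name `totals` is hypothetical.

```python
import sqlite3

# SQLite stands in for MySQL here; the table contents are the question's sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE order_products    (id INTEGER, product_id INTEGER, order_id INTEGER, amount INTEGER);
CREATE TABLE exchange_products (id INTEGER, product_id INTEGER, order_id INTEGER, amount INTEGER);
INSERT INTO order_products    VALUES (1,5,1,2),(2,7,1,1),(3,13,5,1),(4,18,8,3),(5,45,11,4);
INSERT INTO exchange_products VALUES (1,5,1,1),(2,7,1,2),(3,13,5,1),(4,3,8,2),(5,2,11,1);
""")

def totals(set_operator):
    """Sum amounts per order_id after combining both tables with the given operator."""
    sql = f"""
        SELECT order_id, SUM(amount)
        FROM (SELECT order_id, amount FROM order_products
              {set_operator}
              SELECT order_id, amount FROM exchange_products) AS oe
        GROUP BY order_id
        ORDER BY order_id
    """
    return dict(conn.execute(sql).fetchall())

with_union_all = totals("UNION ALL")  # keeps every row
with_union = totals("UNION")          # drops identical (order_id, amount) pairs

print(with_union_all)
print(with_union)
```

With this data, orders 1 and 5 have identical (order_id, amount) pairs in both tables, so the union version undercounts them.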

I think this query should do what you need:
select sum(amount), order_id from (
select amount, order_id from order_products
union all
select amount, order_id from exchange_products) t
group by order_id

Related

SQL query that finds an id that is not associated with another id

I'm currently learning the ropes of SQL and I have a tutorial from school that goes like this:
All stores (storeid) sell (productid, storeid) some products (productid).
A store is considered a monopoly if every product they sell is not sold by any other store.
How do I find the monopolies?
I was thinking of selecting the storeid from 2 of the same tables, but I'm not sure how to continue from there on.
Tables are below:
Store:
+-----------+
| storeid |
+-----------+
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
+-----------+
Products:
+-------------+
| productid |
+-------------+
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
| 6 |
+-------------+
Sells:
+--------------------------+
| productid | storeid |
+--------------------------+
| 1 | 1 |
| 2 | 1 |
| 2 | 2 |
| 3 | 2 |
| 1 | 2 |
| 3 | 3 |
| 2 | 4 |
| 4 | 4 |
| 5 | 5 |
| 6 | 5 |
+--------------------------+
So by my count, only store 5 is considered a monopoly, because they sell products that are not available in other stores.
We can try a self join approach combined with aggregation:
SELECT t1.storeid
FROM Sells t1
LEFT JOIN Sells t2
ON t2.productid = t1.productid AND
t2.storeid <> t1.storeid
GROUP BY t1.storeid
HAVING COUNT(t2.storeid) = 0;
The approach here is to try to match each row in Sells to some other row on the condition that it is the same product, but is being sold by some other store. A matching store is one for which none of its products are being sold by other stores, so the count of the second table column in the join should be zero.
Use window functions and aggregation:
select s.storeid
from (select s.*,
count(*) over (partition by productid) as num_stores
from sells s
) s
group by s.storeid
having max(num_stores) = 1;
This should be much faster than a self-join. It is also almost a direct translation of your question. The subquery counts the number of stores where each product is sold. The outer query selects stores where all products are sold in one store.
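Both approaches can be checked against the Sells data from the question. This is a rough sketch using an in-memory SQLite database as a stand-in for MySQL (an assumption; the window-function query needs SQLite 3.25 or later, or MySQL 8.0).

```python
import sqlite3

# SQLite stands in for MySQL; rows are copied from the question's Sells table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sells (productid INTEGER, storeid INTEGER);
INSERT INTO sells VALUES (1,1),(2,1),(2,2),(3,2),(1,2),(3,3),(2,4),(4,4),(5,5),(6,5);
""")

# Self join: look for the same product sold by a different store.
self_join_sql = """
    SELECT t1.storeid
    FROM sells t1
    LEFT JOIN sells t2
      ON t2.productid = t1.productid AND t2.storeid <> t1.storeid
    GROUP BY t1.storeid
    HAVING COUNT(t2.storeid) = 0
"""

# Window function: count stores per product, then keep stores whose maximum is 1.
window_sql = """
    SELECT s.storeid
    FROM (SELECT storeid,
                 COUNT(*) OVER (PARTITION BY productid) AS num_stores
          FROM sells) s
    GROUP BY s.storeid
    HAVING MAX(num_stores) = 1
"""

self_join_result = [row[0] for row in conn.execute(self_join_sql)]
window_result = [row[0] for row in conn.execute(window_sql)]
print(self_join_result, window_result)
```

Both queries agree with the asker's manual count for this data set.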

Query with dynamic date intervals

Given a statuses table that holds information about products availability, how do I select the date that corresponds to the 1st day in the latest 20 days that the product has been active?
Yes I know the question is hard to follow. I think another way to put it would be: I want to know how many times each product has been sold in the last 20 days that it was active, meaning the product could have been active for years, but I'd only want the sales count from the latest 20 days that it had a status of "active".
It's something easily doable in the server-side (i.e. getting any collection of products from the DB, iterating them, performing n+1 queries on the statuses table, etc), but I have hundreds of thousands of items so it's imperative to do it in SQL for performance reasons.
table : products
+-------+-----------+
| id | name |
+-------+-----------+
| 1 | Apple |
| 2 | Banana |
| 3 | Grape |
+-------+-----------+
table : statuses
+-------+-------------+---------------+---------------+
| id | name | product_id | created_at |
+-------+-------------+---------------+---------------+
| 1 | active | 1 | 2018-01-01 |
| 2 | inactive | 1 | 2018-02-01 |
| 3 | active | 1 | 2018-03-01 |
| 4 | inactive | 1 | 2018-03-15 |
| 6 | active | 1 | 2018-04-25 |
| 7 | active | 2 | 2018-03-01 |
| 8 | active | 3 | 2018-03-10 |
| 9 | inactive | 3 | 2018-03-15 |
+-------+-------------+---------------+---------------+
table : items (ordered products)
+-------+---------------+-------------+
| id | product_id | order_id |
+-------+---------------+-------------+
| 1 | 1 | 1 |
| 2 | 1 | 2 |
| 3 | 1 | 3 |
| 4 | 1 | 4 |
| 5 | 1 | 5 |
| 6 | 2 | 3 |
| 7 | 2 | 4 |
| 8 | 2 | 5 |
| 9 | 3 | 5 |
+-------+---------------+-------------+
table : orders
+-------+---------------+
| id | created_at |
+-------+---------------+
| 1 | 2018-01-02 |
| 2 | 2018-01-15 |
| 3 | 2018-03-02 |
| 4 | 2018-03-10 |
| 5 | 2018-03-13 |
+-------+---------------+
I want my final results to look like this:
+-------+-----------+----------------------+--------------------------------+
| id | name | recent_sales_count | date_to_start_counting_sales |
+-------+-----------+----------------------+--------------------------------+
| 1 | Apple | 3 | 2018-01-30 |
| 2 | Banana | 0 | 2018-04-09 |
| 3 | Grape | 1 | 2018-03-10 |
+-------+-----------+----------------------+--------------------------------+
So this is what I mean by latest 20 active days for e.g. Apple:
It was last activated at '2018-04-25'. That's 4 days ago.
Before that, it was inactive since '2018-03-15', so all these days until '2018-04-25' don't count.
Before that, it was active since '2018-03-01'. That's 14 more days until '2018-03-15'.
Before that, inactive since '2018-02-01'.
Finally, it was active since '2018-01-01', so it should only count the missing 2 days (4 + 14 + 2 = 20) backwards from '2018-02-01', resulting in date_to_start_counting_sales = '2018-01-30'.
With the '2018-01-30' date in hand, I'm then able to count Apple orders in the last 20 active days: 3.
Hope that makes sense.
Here is a fiddle with the data provided above.
I've got a standard SQL solution, that does not use any window function as you are on MySQL 5
My solution requires 3 stacked views.
It would have been better with a CTE but your version doesn't support it. Same goes for the stacked Views... I don't like to stack views and always try to avoid it, but sometimes you have no other choice, because MySQL doesn't accept subqueries in FROM clause for Views.
CREATE VIEW VIEW_product_dates AS
(
SELECT product_id, created_at AS active_date,
(
-- earliest deactivation that follows this activation
SELECT MIN(ti.created_at)
FROM statuses ti
WHERE ti.name = 'inactive' AND ta.created_at < ti.created_at AND ti.product_id = ta.product_id
) AS inactive_date
FROM statuses ta
WHERE name = 'active'
);
CREATE VIEW VIEW_product_dates_days AS
(
SELECT product_id, active_date, inactive_date, datediff(IFNULL(inactive_date, SYSDATE()),active_date) AS nb_days
FROM VIEW_product_dates
);
CREATE VIEW VIEW_product_dates_days_cumul AS
(
SELECT product_id, active_date, ifnull(inactive_date,sysdate()) AS inactive_date, nb_days,
IFNULL((SELECT SUM(V2.nb_days) + V1.nb_days
FROM VIEW_product_dates_days V2
WHERE V2.active_date >= IFNULL(V1.inactive_date, SYSDATE()) AND V1.product_id=V2.product_id
),V1.nb_days) AS cumul_days
FROM VIEW_product_dates_days V1
);
The final view produces this:
| product_id | active_date | inactive_date | nb_days | cumul_days |
|------------|----------------------|----------------------|---------|------------|
| 1 | 2018-01-01T00:00:00Z | 2018-02-01T00:00:00Z | 31 | 49 |
| 1 | 2018-03-01T00:00:00Z | 2018-03-15T00:00:00Z | 14 | 18 |
| 1 | 2018-04-25T00:00:00Z | 2018-04-29T11:28:39Z | 4 | 4 |
| 2 | 2018-03-01T00:00:00Z | 2018-04-29T11:28:39Z | 59 | 59 |
| 3 | 2018-03-10T00:00:00Z | 2018-03-15T00:00:00Z | 5 | 5 |
So it lists all active periods of all products, counts the number of days in each period, and accumulates the active days counted backwards from the current date.
Then we can query this final view to get the desired date for each product. I set a variable for your 20 days, so you can change that number easily if you want.
SET @cap_days = 20;
SELECT PD.id, PD.name,
SUM(CASE WHEN o.created_at > PD.date_to_start_counting_sales THEN 1 ELSE 0 END) AS recent_sales_count,
PD.date_to_start_counting_sales
FROM
(
SELECT P.*,
(CASE WHEN LowerCap.max_cumul_days IS NULL
THEN ADDDATE(IFNULL(HigherCap.min_inactive_date, SYSDATE()), (-@cap_days))
ELSE
CASE WHEN LowerCap.max_cumul_days < @cap_days AND HigherCap.min_inactive_date IS NULL
THEN ADDDATE(IFNULL(LowerCap.max_inactive_date, SYSDATE()), (-LowerCap.max_cumul_days))
ELSE ADDDATE(IFNULL(HigherCap.min_inactive_date, SYSDATE()), (LowerCap.max_cumul_days - @cap_days))
END
END) AS date_to_start_counting_sales
FROM products P
LEFT JOIN
(
SELECT product_id, MAX(cumul_days) AS max_cumul_days, MAX(inactive_date) AS max_inactive_date
FROM VIEW_product_dates_days_cumul
WHERE cumul_days <= @cap_days
GROUP BY product_id
) LowerCap ON P.id = LowerCap.product_id
LEFT JOIN
(
SELECT product_id, MIN(cumul_days) AS min_cumul_days, MIN(inactive_date) AS min_inactive_date
FROM VIEW_product_dates_days_cumul
WHERE cumul_days > @cap_days
GROUP BY product_id
) HigherCap ON P.id = HigherCap.product_id
) PD
LEFT JOIN items i ON PD.id = i.product_id
LEFT JOIN orders o ON o.id = i.order_id
GROUP BY PD.id, PD.name, PD.date_to_start_counting_sales;
Returns
| id | name | recent_sales_count | date_to_start_counting_sales |
|----|--------|--------------------|------------------------------|
| 1 | Apple | 3 | 2018-01-30T00:00:00Z |
| 2 | Banana | 0 | 2018-04-09T20:43:23Z |
| 3 | Grape | 1 | 2018-03-10T00:00:00Z |
FIDDLE : http://sqlfiddle.com/#!9/804f52/24
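For checking the SQL output, the walk-back described in the question can also be sketched procedurally. This is only an illustration under stated assumptions: "today" is fixed at 2018-04-29 (the date implied by the sample results), each active period is given as an (active_date, end_date) pair with the end date excluded, and the function name `start_of_last_active_days` is hypothetical.

```python
from datetime import date, timedelta

TODAY = date(2018, 4, 29)  # assumed "current date", matching the sample results

# Active periods per product (oldest first), derived from the statuses table.
periods = {
    "Apple":  [(date(2018, 1, 1), date(2018, 2, 1)),
               (date(2018, 3, 1), date(2018, 3, 15)),
               (date(2018, 4, 25), TODAY)],
    "Banana": [(date(2018, 3, 1), TODAY)],
    "Grape":  [(date(2018, 3, 10), date(2018, 3, 15))],
}

# Order dates per product, derived from the items and orders tables.
order_dates = {
    "Apple":  [date(2018, 1, 2), date(2018, 1, 15), date(2018, 3, 2),
               date(2018, 3, 10), date(2018, 3, 13)],
    "Banana": [date(2018, 3, 2), date(2018, 3, 10), date(2018, 3, 13)],
    "Grape":  [date(2018, 3, 13)],
}

def start_of_last_active_days(active_periods, cap_days):
    """Walk backwards through the periods until cap_days active days are used up."""
    remaining = cap_days
    start = None
    for active, end in reversed(active_periods):
        length = (end - active).days
        if length >= remaining:
            return end - timedelta(days=remaining)
        remaining -= length
        start = active  # the whole period counts; keep walking back
    return start  # fewer than cap_days active days exist in total

start_dates = {name: start_of_last_active_days(p, 20) for name, p in periods.items()}
sales_counts = {name: sum(1 for d in order_dates[name] if d > start_dates[name])
                for name in periods}
print(start_dates)
print(sales_counts)
```

The strict `>` comparison mirrors the `o.created_at > PD.date_to_start_counting_sales` condition in the query above.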
Not sure which version of MySQL you're working with, but if you can use 8.0, that version came out with a lot of functionality that makes things slightly more doable (CTEs, row_number(), partition, etc.).
My recommendation would be to create a view like in this DB-Fiddle Example, call the view on the server side and iterate programmatically. There are ways of doing it in SQL, but it'd be a bear to write and test, and would likely be less efficient.
Assumptions:
Products cannot be sold during inactive date ranges
Statuses table will always alternate status active/inactive/active for each product. I.e. no date ranges where a certain product is both active and inactive.
View Results:
+------------+-------------+------------+-------------+
| product_id | active_date | end_date | days_active |
+------------+-------------+------------+-------------+
| 1 | 2018-01-01 | 2018-02-01 | 31 |
+------------+-------------+------------+-------------+
| 1 | 2018-03-01 | 2018-03-15 | 14 |
+------------+-------------+------------+-------------+
| 1 | 2018-04-25 | 2018-04-29 | 4 |
+------------+-------------+------------+-------------+
| 2 | 2018-03-01 | 2018-04-29 | 59 |
+------------+-------------+------------+-------------+
| 3 | 2018-03-10 | 2018-03-15 | 5 |
+------------+-------------+------------+-------------+
View:
CREATE OR REPLACE VIEW days_active AS (
WITH active_rn
AS (SELECT *, Row_number()
OVER ( partition BY NAME, product_id
ORDER BY created_at) AS rownum
FROM statuses
WHERE name = 'active'),
inactive_rn
AS (SELECT *, Row_number()
OVER ( partition BY NAME, product_id
ORDER BY created_at) AS rownum
FROM statuses
WHERE name = 'inactive')
SELECT x1.product_id,
x1.created_at AS active_date,
CASE WHEN x2.created_at IS NULL
THEN Curdate()
ELSE x2.created_at
END AS end_date,
CASE WHEN x2.created_at IS NULL
THEN Datediff(Curdate(), x1.created_at)
ELSE Datediff(x2.created_at,x1.created_at)
END AS days_active
FROM active_rn x1
LEFT OUTER JOIN inactive_rn x2
ON x1.rownum = x2.rownum
AND x1.product_id = x2.product_id ORDER BY
x1.product_id);

MySQL query to select only entries where value = X and value <> Y

Raw MySQL queries are absolutely not my forte, so I'm struggling with this a bit, but: with a straightforward table layout like this:
+----+-----------+----------+---------------------+
| id | status_id | order_id | created_at |
+----+-----------+----------+---------------------+
| 1 | 1 | 1 | 2016-03-21 20:40:39 |
| 2 | 3 | 1 | 2016-03-21 20:40:45 |
| 3 | 5 | 1 | 2016-03-21 20:47:14 |
| 4 | 1 | 2 | 2016-03-25 12:14:44 |
| 6 | 3 | 2 | 2016-03-25 12:16:12 |
| 7 | 5 | 2 | 2016-03-25 12:47:43 |
| 8 | 1 | 3 | 2016-03-26 17:25:12 |
| 9 | 3 | 3 | 2016-03-26 17:25:48 |
+----+-----------+----------+---------------------+
I want to select only the order_id rows where the status_id equals 3, but not where that same order_id has a status_id of 5. As a result, my query should only return order ID 3, but my current query returns all 3 order IDs in the results:
$statusQueryString = 'SELECT DISTINCT order_id
FROM shop_order_status_log_records
WHERE status_id = 3 AND status_id <> 5 ORDER BY created_at';
Where am I going wrong with my query?
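Your WHERE clause is evaluated one row at a time, so every row with status_id = 3 passes (3 is never equal to 5), regardless of what other rows exist for that order. Use post-aggregate filtering when you need 2 or more conditions per group. A simple rule: WHERE filters rows, HAVING filters groups.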
SELECT order_id FROM shop_order_status_log_records
GROUP BY order_id
HAVING SUM(status_id = 3)>0
AND SUM(status_id = 5)=0
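As a quick check, here is a small sketch that runs both the original WHERE query and the HAVING query on the sample rows. It uses an in-memory SQLite database as a stand-in for MySQL (an assumption); both engines evaluate a comparison such as `status_id = 3` to 1 or 0, which is what makes the SUM trick work.

```python
import sqlite3

# SQLite stands in for MySQL; rows are copied from the question's table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE shop_order_status_log_records
    (id INTEGER, status_id INTEGER, order_id INTEGER, created_at TEXT);
INSERT INTO shop_order_status_log_records VALUES
    (1,1,1,'2016-03-21 20:40:39'),(2,3,1,'2016-03-21 20:40:45'),
    (3,5,1,'2016-03-21 20:47:14'),(4,1,2,'2016-03-25 12:14:44'),
    (6,3,2,'2016-03-25 12:16:12'),(7,5,2,'2016-03-25 12:47:43'),
    (8,1,3,'2016-03-26 17:25:12'),(9,3,3,'2016-03-26 17:25:48');
""")

# Row-level filter: each row is tested on its own, so every order with a status 3 row passes.
where_sql = """
    SELECT DISTINCT order_id
    FROM shop_order_status_log_records
    WHERE status_id = 3 AND status_id <> 5
    ORDER BY order_id
"""

# Group-level filter: at least one status 3 row and no status 5 row per order.
having_sql = """
    SELECT order_id
    FROM shop_order_status_log_records
    GROUP BY order_id
    HAVING SUM(status_id = 3) > 0
       AND SUM(status_id = 5) = 0
"""

where_result = [row[0] for row in conn.execute(where_sql)]
having_result = [row[0] for row in conn.execute(having_sql)]
print(where_result, having_result)
```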

How to find duplicate rows with SQL- GROUP BY

I've a table
+----+------------+
| id | day |
+----+------------+
| 1 | 2006-10-08 |
| 2 | 2006-10-08 |
| 3 | 2006-10-09 |
| 4 | 2006-10-09 |
| 5 | 2006-10-09 |
| 5 | 2006-10-09 |
| 6 | 2006-10-10 |
| 7 | 2006-10-10 |
| 8 | 2006-10-10 |
| 9 | 2006-10-10 |
+----+------------+
I want to group by the frequency and its count, e.g.:
The date 2006-10-08 appears twice, so it has frequency 2; it is the only date that appears twice, so the total number of dates with frequency 2 is 1.
Another example:
2006-10-10 and 2006-10-09 both appear 4 times, so they have frequency 4, and the total number of dates with frequency 4 is 2.
Following is the expected output.
+----------+--------------------------------+
| Frequency | Total Dates with frequency N |
+----------+--------------------------------+
| 1 | 0 |
| 2 | 1 |
| 3 | 0 |
| 4 | 2 |
+----------+--------------------------------+
and so on till the maximum frequency.
What I've tried is the following:-
select day, count(*) from test GROUP BY day;
It returns the frequency of each date, ie
+------------+----------+
| day | count(*) |
+------------+----------+
| 2006-10-08 | 2 |
| 2006-10-09 | 4 |
| 2006-10-10 | 4 |
+------------+----------+
Please help with the above problem.
Just use your query as a subquery:
select freq, count(*)
from (select day, count(*) as freq
from test
group by day
) d
group by freq;
If you want to get the 0 values, then you have to work harder. A numbers table is handy (if you have one) or you can do:
select n.freq, count(d.day)
from (select 1 as freq union all select 2 union all select 3 union all select 4
) n left join
(select day, count(*) as freq
from test
group by day
) d
on n.freq = d.freq
group by n.freq;
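If hard-coding the list 1 to 4 is not acceptable, one possible variant (not part of the answer above) is to generate the frequencies up to the observed maximum with a recursive CTE. The sketch below assumes an engine that supports recursive CTEs (MySQL 8.0 or later) and runs against an in-memory SQLite database purely so it can be executed.

```python
import sqlite3

# SQLite stands in for MySQL 8.0+; rows are copied from the question (id 5 appears twice there).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE test (id INTEGER, day TEXT);
INSERT INTO test VALUES
    (1,'2006-10-08'),(2,'2006-10-08'),
    (3,'2006-10-09'),(4,'2006-10-09'),(5,'2006-10-09'),(5,'2006-10-09'),
    (6,'2006-10-10'),(7,'2006-10-10'),(8,'2006-10-10'),(9,'2006-10-10');
""")

freq_sql = """
    WITH RECURSIVE
      d AS (SELECT day, COUNT(*) AS freq FROM test GROUP BY day),
      -- n generates 1, 2, ... up to the largest frequency found in d
      n(freq) AS (
          SELECT 1
          UNION ALL
          SELECT n.freq + 1 FROM n
          WHERE n.freq < (SELECT MAX(d.freq) FROM d)
      )
    SELECT n.freq, COUNT(d.day) AS total_dates
    FROM n
    LEFT JOIN d ON n.freq = d.freq
    GROUP BY n.freq
    ORDER BY n.freq
"""

freq_table = conn.execute(freq_sql).fetchall()
print(freq_table)
```

The left join keeps frequencies that no date has, and `COUNT(d.day)` counts only matched rows, so those frequencies show up with 0.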

how to approach this in MySql query?

I want to select data according to a condition: I have a table with physician_key and the corresponding quality score for a given month. I want to select the count of distinct physicians with quality score 1 or 2.
For a given month, there can be several entries for a physician_key, each with its own quality (on a scale of 1-7). I want to count only those physicians who have quality 1 or 2; if the same physician also has quality > 2 in that month, I don't want to count that physician. I want the information by product and month.
I created an example table, since you didn't provide one:
mysql> select * from sales_mkt_rep_qual;
+-------------------+---------+-------+-------------------+
| GEO_PHYSICIAN_KEY | product | month | SALES_REP_QUALITY |
+-------------------+---------+-------+-------------------+
| 1 | a | 8 | 1 |
| 1 | a | 8 | 2 |
| 1 | a | 8 | 3 |
| 2 | b | 8 | 2 |
| 2 | b | 8 | 1 |
| 2 | b | 9 | 2 |
| 1 | a | 9 | 2 |
| 2 | b | 9 | 3 |
| 3 | a | 9 | 2 |
+-------------------+---------+-------+-------------------+
The query from your comment indeed gives an error:
SELECT COUNT(DISTINCT GEO_PHYSICIAN_KEY) AS encount_1to2,
product,MONTH
FROM sales_mkt_rep_qual
WHERE MAX(SALES_REP_QUALITY) = 2 ;
ERROR 1111 (HY000): Invalid use of group function
If you change that to:
SELECT DISTINCT geo_physician_key AS encount_1to2, product, month
FROM sales_mkt_rep_qual
WHERE (geo_physician_key,month,product)
NOT IN (
SELECT geo_physician_key, month, product
FROM sales_mkt_rep_qual
WHERE sales_rep_quality >2 );
you see the detailed result:
+--------------+---------+-------+
| encount_1to2 | product | month |
+--------------+---------+-------+
| 2 | b | 8 |
| 1 | a | 9 |
| 3 | a | 9 |
+--------------+---------+-------+
Now, you can introduce the counting:
SELECT COUNT(distinct geo_physician_key ) AS no_of_physicians,product, month
FROM sales_mkt_rep_qual
WHERE (geo_physician_key,month,product)
NOT IN (
SELECT geo_physician_key, month, product
FROM sales_mkt_rep_qual WHERE sales_rep_quality >2 )
GROUP BY month, product;
+------------------+---------+-------+
| no_of_physicians | product | month |
+------------------+---------+-------+
| 1 | b | 8 |
| 2 | a | 9 |
+------------------+---------+-------+
If that still isn't what you are looking for, give more specific table structure and data example.
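The NOT IN query can be cross-checked against a grouped variant that keeps a physician for a given product and month only when MAX(sales_rep_quality) is at most 2. This is a sketch under assumptions: the grouped query is an alternative formulation, not the answer's own, and an in-memory SQLite database (3.15 or later, for row-value comparisons) stands in for MySQL.

```python
import sqlite3

# SQLite stands in for MySQL; rows are copied from the example table above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales_mkt_rep_qual
    (geo_physician_key INTEGER, product TEXT, month INTEGER, sales_rep_quality INTEGER);
INSERT INTO sales_mkt_rep_qual VALUES
    (1,'a',8,1),(1,'a',8,2),(1,'a',8,3),(2,'b',8,2),(2,'b',8,1),
    (2,'b',9,2),(1,'a',9,2),(2,'b',9,3),(3,'a',9,2);
""")

# The answer's approach: exclude any (physician, month, product) that has a quality above 2.
not_in_sql = """
    SELECT COUNT(DISTINCT geo_physician_key) AS no_of_physicians, product, month
    FROM sales_mkt_rep_qual
    WHERE (geo_physician_key, month, product) NOT IN (
        SELECT geo_physician_key, month, product
        FROM sales_mkt_rep_qual
        WHERE sales_rep_quality > 2)
    GROUP BY month, product
    ORDER BY month, product
"""

# Alternative: aggregate per physician first, keep groups whose worst score is <= 2.
max_sql = """
    SELECT COUNT(*) AS no_of_physicians, product, month
    FROM (SELECT geo_physician_key, product, month
          FROM sales_mkt_rep_qual
          GROUP BY geo_physician_key, product, month
          HAVING MAX(sales_rep_quality) <= 2) g
    GROUP BY month, product
    ORDER BY month, product
"""

not_in_result = conn.execute(not_in_sql).fetchall()
max_result = conn.execute(max_sql).fetchall()
print(not_in_result)
print(max_result)
```

The aggregate condition sits in HAVING rather than WHERE, which avoids the "Invalid use of group function" error shown earlier.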
Try this:
SELECT count(*)
FROM (SELECT physician_key
FROM my_table
WHERE month = desired_month
GROUP BY physician_key
HAVING max(quality) <= 2
) t
Actually I want the data to be like the output below:
+--------------+---------+-------+
| encount_1to2 | product | MONTH |
+--------------+---------+-------+
| 2 | b | 8 |
+--------------+---------+-------+
As for the criterion SALES_REP_QUALITY <= 2: isn't it possible that, when selecting the distinct geo physician key, it picks a physician based on the first two rows alone, since those match the criterion? That's the reason I used Thanix's approach of the max function with group by product and month, so that the aggregate function is applied to every product within a month.