I have the following table, which tells me how many rentals a given film had in a given month. Here are the first 10 rows:
+-------+-------------------+---------+
| month | title | rentals |
+-------+-------------------+---------+
| 2 | ACE GOLDFINGER | 1 |
| 2 | AFFAIR PREJUDICE | 1 |
| 2 | AFRICAN EGG | 1 |
| 2 | ALI FOREVER | 1 |
| 2 | ALONE TRIP | 1 |
| 2 | AMADEUS HOLY | 1 |
| 2 | AMERICAN CIRCUS | 1 |
| 2 | AMISTAD MIDSUMMER | 1 |
| 2 | ARMAGEDDON LOST | 1 |
| 2 | BAKED CLEOPATRA | 1 |
+-------+-------------------+---------+
My main objective here is to create a new table that, for each month, gives me the title of the film with the most rentals in that month.
So far, I've tried combinations of GROUP BY queries, but without much success. I did manage to create a view that gives me the number of rentals the top movie (or movies) had in each month. Here it is:
CREATE VIEW temp AS (SELECT month, MAX(rentals) rentals FROM film_per_month GROUP BY 1);
mysql> SELECT * FROM temp;
+-------+---------+
| month | rentals |
+-------+---------+
| 2 | 2 |
| 5 | 5 |
| 6 | 7 |
| 7 | 16 |
| 8 | 13 |
+-------+---------+
5 rows in set (0.05 sec)
The obstacle is that I can't extend this to show the titles of the movies that were rented that maximum number of times.
I've tried to fix that using inner joins and self-joins, but I just made a mess of it.
So my question is: what would be the best way to create a new table that, for each month, gives me the title of the film with the most rentals in that month?
You can use an EXISTS subquery to keep only the rows whose rentals equal the maximum rentals for their month.
SELECT t1.*
FROM film_per_month t1
WHERE EXISTS (
SELECT 1
FROM film_per_month tt
WHERE tt.month = t1.month
GROUP BY tt.month
HAVING MAX(tt.rentals) = t1.rentals
)
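An alternative that reuses the view from the question is to join it back to the base table on both month and rentals. Below is a minimal sketch of that idea, not a tested MySQL solution: it runs on SQLite through Python's sqlite3 module as a stand-in for MySQL, the sample rows are made up, and the view name max_per_month is hypothetical (it has the same definition as the asker's temp view).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE film_per_month (month INTEGER, title TEXT, rentals INTEGER);
-- Made-up sample rows, not the asker's real data.
INSERT INTO film_per_month VALUES
  (2, 'ACE GOLDFINGER', 1), (2, 'AFRICAN EGG', 2),
  (5, 'ALONE TRIP', 5),     (5, 'AMADEUS HOLY', 3),
  (6, 'ALI FOREVER', 7),    (6, 'BAKED CLEOPATRA', 7);
-- Same definition as the asker's temp view, under a hypothetical name.
CREATE VIEW max_per_month AS
  SELECT month, MAX(rentals) AS rentals
  FROM film_per_month
  GROUP BY month;
""")

# Keep only the rows whose rentals equal the monthly maximum.
# Ties are kept, so a month can return more than one title.
top_titles = conn.execute("""
    SELECT f.month, f.title, f.rentals
    FROM film_per_month f
    JOIN max_per_month m
      ON m.month = f.month AND m.rentals = f.rentals
    ORDER BY f.month, f.title
""").fetchall()

for row in top_titles:
    print(row)
```

Month 6 has two films tied at 7 rentals in this sample, and both come back, which matches the "top movie (or movies)" wording in the question.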
Given a statuses table that holds information about product availability, how do I select the date that corresponds to the first day of the latest 20 days in which the product was active?
Yes, I know the question is hard to follow. Another way to put it: I want to know how many times each product has been sold in the last 20 days that it was active. The product could have been active for years, but I only want the sales count from the latest 20 days in which it had a status of "active".
This is easily doable on the server side (i.e. getting any collection of products from the DB, iterating over them, performing n+1 queries on the statuses table, etc.), but I have hundreds of thousands of items, so it's imperative to do it in SQL for performance reasons.
table : products
+-------+-----------+
| id | name |
+-------+-----------+
| 1 | Apple |
| 2 | Banana |
| 3 | Grape |
+-------+-----------+
table : statuses
+-------+-------------+---------------+---------------+
| id | name | product_id | created_at |
+-------+-------------+---------------+---------------+
| 1 | active | 1 | 2018-01-01 |
| 2 | inactive | 1 | 2018-02-01 |
| 3 | active | 1 | 2018-03-01 |
| 4 | inactive | 1 | 2018-03-15 |
| 6 | active | 1 | 2018-04-25 |
| 7 | active | 2 | 2018-03-01 |
| 8 | active | 3 | 2018-03-10 |
| 9 | inactive | 3 | 2018-03-15 |
+-------+-------------+---------------+---------------+
table : items (ordered products)
+-------+---------------+-------------+
| id | product_id | order_id |
+-------+---------------+-------------+
| 1 | 1 | 1 |
| 2 | 1 | 2 |
| 3 | 1 | 3 |
| 4 | 1 | 4 |
| 5 | 1 | 5 |
| 6 | 2 | 3 |
| 7 | 2 | 4 |
| 8 | 2 | 5 |
| 9 | 3 | 5 |
+-------+---------------+-------------+
table : orders
+-------+---------------+
| id | created_at |
+-------+---------------+
| 1 | 2018-01-02 |
| 2 | 2018-01-15 |
| 3 | 2018-03-02 |
| 4 | 2018-03-10 |
| 5 | 2018-03-13 |
+-------+---------------+
I want my final results to look like this:
+-------+-----------+----------------------+--------------------------------+
| id | name | recent_sales_count | date_to_start_counting_sales |
+-------+-----------+----------------------+--------------------------------+
| 1 | Apple | 3 | 2018-01-30 |
| 2 | Banana | 0 | 2018-04-09 |
| 3 | Grape | 1 | 2018-03-10 |
+-------+-----------+----------------------+--------------------------------+
So this is what I mean by latest 20 active days for e.g. Apple:
It was last activated at '2018-04-25'. That's 4 days ago.
Before that, it was inactive since '2018-03-15', so all these days until '2018-04-25' don't count.
Before that, it was active since '2018-03-01'. That's 14 more days until '2018-03-15'.
Before that, inactive since '2018-02-01'.
Finally, it was active since '2018-01-01', so it should only count the missing 2 days (4 + 14 + 2 = 20) backwards from '2018-02-01', resulting in date_to_start_counting_sales = '2018-01-30'.
With the '2018-01-30' date in hand, I'm then able to count Apple orders in the last 20 active days: 3.
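The walk backwards described above can be sketched in a few lines of Python. This only illustrates the arithmetic and is not the SQL solution being asked for; the function name is hypothetical, and the "current date" is assumed to be 2018-04-29, which is what the "4 days ago" in the example implies.

```python
from datetime import date, timedelta

def start_of_last_active_days(periods, cap_days):
    """periods: list of (active_date, end_date) tuples, in any order.
    Returns the date from which sales should start being counted."""
    remaining = cap_days
    start = None
    # Walk from the most recent active period back to the oldest.
    for active, end in sorted(periods, reverse=True):
        length = (end - active).days
        if length >= remaining:
            # The cap is reached inside this period.
            return end - timedelta(days=remaining)
        remaining -= length
        start = active
    # Fewer than cap_days of activity in total: use the earliest activation.
    return start

today = date(2018, 4, 29)  # assumption: the current date implied by the example

apple = [(date(2018, 1, 1), date(2018, 2, 1)),
         (date(2018, 3, 1), date(2018, 3, 15)),
         (date(2018, 4, 25), today)]
banana = [(date(2018, 3, 1), today)]
grape = [(date(2018, 3, 10), date(2018, 3, 15))]

for name, periods in [("Apple", apple), ("Banana", banana), ("Grape", grape)]:
    print(name, start_of_last_active_days(periods, 20))
```

For Apple the loop consumes 4 days, then 14, and finds the last 2 inside the first period, giving 2018-01-30. Grape has only 5 active days in total, so its start date falls back to its first activation.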
Hope that makes sense.
Here is a fiddle with the data provided above.
I've got a standard SQL solution that does not use any window functions, since you are on MySQL 5.
My solution requires 3 stacked views.
It would have been better with a CTE, but your version doesn't support them. I don't like stacking views and always try to avoid it, but sometimes there is no other choice, because your MySQL version doesn't accept subqueries in the FROM clause of a view.
CREATE VIEW VIEW_product_dates AS
(
SELECT product_id, created_at AS active_date,
(
-- first deactivation that follows this activation
SELECT MIN(ti.created_at)
FROM statuses ti
WHERE ti.name = 'inactive' AND ta.created_at < ti.created_at AND ti.product_id=ta.product_id
) AS inactive_date
FROM statuses ta
WHERE name = 'active'
);
CREATE VIEW VIEW_product_dates_days AS
(
SELECT product_id, active_date, inactive_date, datediff(IFNULL(inactive_date, SYSDATE()),active_date) AS nb_days
FROM VIEW_product_dates
);
CREATE VIEW VIEW_product_dates_days_cumul AS
(
SELECT product_id, active_date, ifnull(inactive_date,sysdate()) AS inactive_date, nb_days,
IFNULL((SELECT SUM(V2.nb_days) + V1.nb_days
FROM VIEW_product_dates_days V2
WHERE V2.active_date >= IFNULL(V1.inactive_date, SYSDATE()) AND V1.product_id=V2.product_id
),V1.nb_days) AS cumul_days
FROM VIEW_product_dates_days V1
);
The final view produces this:
| product_id | active_date | inactive_date | nb_days | cumul_days |
|------------|----------------------|----------------------|---------|------------|
| 1 | 2018-01-01T00:00:00Z | 2018-02-01T00:00:00Z | 31 | 49 |
| 1 | 2018-03-01T00:00:00Z | 2018-03-15T00:00:00Z | 14 | 18 |
| 1 | 2018-04-25T00:00:00Z | 2018-04-29T11:28:39Z | 4 | 4 |
| 2 | 2018-03-01T00:00:00Z | 2018-04-29T11:28:39Z | 59 | 59 |
| 3 | 2018-03-10T00:00:00Z | 2018-03-15T00:00:00Z | 5 | 5 |
So it collects every active period of every product, counts the number of days in each period, and accumulates the days of that period plus all later active periods, counting back from the current date.
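To see where nb_days and cumul_days come from, the same three steps can be reproduced in a single query. The sketch below is only a check of the logic, under these assumptions: it runs on SQLite through Python's sqlite3 module rather than MySQL, it uses CTEs in place of the stacked views, julianday() in place of DATEDIFF, and a fixed "current date" of 2018-04-29 in place of SYSDATE().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE statuses (id INTEGER, name TEXT, product_id INTEGER, created_at TEXT);
INSERT INTO statuses VALUES
  (1, 'active', 1, '2018-01-01'), (2, 'inactive', 1, '2018-02-01'),
  (3, 'active', 1, '2018-03-01'), (4, 'inactive', 1, '2018-03-15'),
  (6, 'active', 1, '2018-04-25'), (7, 'active', 2, '2018-03-01'),
  (8, 'active', 3, '2018-03-10'), (9, 'inactive', 3, '2018-03-15');
""")

TODAY = "2018-04-29"  # assumption: fixed date so the numbers match the table above

query = """
WITH periods AS (
  -- Pair each activation with the first deactivation that follows it.
  SELECT ta.product_id,
         ta.created_at AS active_date,
         COALESCE((SELECT MIN(ti.created_at)
                   FROM statuses ti
                   WHERE ti.name = 'inactive'
                     AND ti.product_id = ta.product_id
                     AND ti.created_at > ta.created_at), :today) AS inactive_date
  FROM statuses ta
  WHERE ta.name = 'active'
),
days AS (
  -- Length of each active period in whole days.
  SELECT *, CAST(julianday(inactive_date) - julianday(active_date) AS INTEGER) AS nb_days
  FROM periods
)
-- Days of this period plus all later periods of the same product.
SELECT d1.product_id, d1.active_date, d1.inactive_date, d1.nb_days,
       (SELECT SUM(d2.nb_days)
        FROM days d2
        WHERE d2.product_id = d1.product_id
          AND d2.active_date >= d1.active_date) AS cumul_days
FROM days d1
ORDER BY d1.product_id, d1.active_date
"""
rows = conn.execute(query, {"today": TODAY}).fetchall()
for row in rows:
    print(row)
```

The first Apple period ends up with cumul_days = 31 + 14 + 4 = 49, which is why the cap of 20 days is crossed inside that period.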
Then we can query this final view to get the desired date for each product. I set a variable for your 20 days, so you can change that number easily if you want.
SET @cap_days = 20;
SELECT PD.id, PD.name,
SUM(CASE WHEN o.created_at > PD.date_to_start_counting_sales THEN 1 ELSE 0 END) AS recent_sales_count,
PD.date_to_start_counting_sales
FROM
(
SELECT P.*,
(CASE WHEN LowerCap.max_cumul_days IS NULL
THEN ADDDATE(IFNULL(HigherCap.min_inactive_date, SYSDATE()), (-@cap_days))
ELSE
CASE WHEN LowerCap.max_cumul_days < @cap_days AND HigherCap.min_inactive_date IS NULL
THEN ADDDATE(IFNULL(LowerCap.max_inactive_date, SYSDATE()), (-LowerCap.max_cumul_days))
ELSE ADDDATE(IFNULL(HigherCap.min_inactive_date, SYSDATE()), (LowerCap.max_cumul_days - @cap_days))
END
END) AS date_to_start_counting_sales
FROM products P
LEFT JOIN
(
SELECT product_id, MAX(cumul_days) AS max_cumul_days, MAX(inactive_date) AS max_inactive_date
FROM VIEW_product_dates_days_cumul
WHERE cumul_days <= @cap_days
GROUP BY product_id
) LowerCap ON P.id=LowerCap.product_id
LEFT JOIN
(
SELECT product_id, MIN(cumul_days) AS min_cumul_days, MIN(inactive_date) AS min_inactive_date
FROM VIEW_product_dates_days_cumul
WHERE cumul_days > @cap_days
GROUP BY product_id
) HigherCap ON P.id=HigherCap.product_id
) PD
LEFT JOIN items i ON PD.id = i.product_id
LEFT JOIN orders o ON o.id = i.order_id
GROUP BY PD.id, PD.name, PD.date_to_start_counting_sales;
Returns
| id | name | recent_sales_count | date_to_start_counting_sales |
|----|--------|--------------------|------------------------------|
| 1 | Apple | 3 | 2018-01-30T00:00:00Z |
| 2 | Banana | 0 | 2018-04-09T20:43:23Z |
| 3 | Grape | 1 | 2018-03-10T00:00:00Z |
FIDDLE : http://sqlfiddle.com/#!9/804f52/24
I'm not sure which version of MySQL you're working with, but if you can use 8.0, that version added a lot of functionality that makes this more doable (CTEs, ROW_NUMBER(), PARTITION BY, etc.).
My recommendation would be to create a view like the one in this DB-Fiddle example, query the view on the server side and iterate programmatically. There are ways of doing it all in SQL, but it would be a bear to write and test, and would likely be less efficient.
Assumptions:
Products cannot be sold during inactive date ranges
Statuses table will always alternate status active/inactive/active for each product. I.e. no date ranges where a certain product is both active and inactive.
View Results:
+------------+-------------+------------+-------------+
| product_id | active_date | end_date | days_active |
+------------+-------------+------------+-------------+
| 1 | 2018-01-01 | 2018-02-01 | 31 |
+------------+-------------+------------+-------------+
| 1 | 2018-03-01 | 2018-03-15 | 14 |
+------------+-------------+------------+-------------+
| 1 | 2018-04-25 | 2018-04-29 | 4 |
+------------+-------------+------------+-------------+
| 2 | 2018-03-01 | 2018-04-29 | 59 |
+------------+-------------+------------+-------------+
| 3 | 2018-03-10 | 2018-03-15 | 5 |
+------------+-------------+------------+-------------+
View:
CREATE OR REPLACE VIEW days_active AS (
WITH active_rn
AS (SELECT *, Row_number()
OVER ( partition BY NAME, product_id
ORDER BY created_at) AS rownum
FROM statuses
WHERE name = 'active'),
inactive_rn
AS (SELECT *, Row_number()
OVER ( partition BY NAME, product_id
ORDER BY created_at) AS rownum
FROM statuses
WHERE name = 'inactive')
SELECT x1.product_id,
x1.created_at AS active_date,
CASE WHEN x2.created_at IS NULL
THEN Curdate()
ELSE x2.created_at
END AS end_date,
CASE WHEN x2.created_at IS NULL
THEN Datediff(Curdate(), x1.created_at)
ELSE Datediff(x2.created_at,x1.created_at)
END AS days_active
FROM active_rn x1
LEFT OUTER JOIN inactive_rn x2
ON x1.rownum = x2.rownum
AND x1.product_id = x2.product_id ORDER BY
x1.product_id);
I've been looking around and trying to get this to work but I can't seem to get it. I have 2 tables:
TABLE: products
| id | name | some more values |
|----|-----------|------------------|
| 1 | Product 1 | Value 1 |
| 2 | Product 2 | Value 2 |
| 3 | Product 3 | Value 3 |
TABLE: value
| pid | value | stamp |
|-----|-----------|------------------|
| 1 | 7 | 2015-07-11 |
| 2 | 4 | 2015-07-11 |
| 3 | 8 | 2015-07-11 |
| 1 | 9 | 2015-07-21 |
| 2 | 4 | 2015-07-21 |
| 3 | 6 | 2015-07-21 |
The first table simply has a list of products; the second table has a value for each product (by pid) and the timestamp of the value. Note: timestamps are not recorded every day, nor are they evenly spaced.
What I would like, is a resulting table like this:
| id | name | some more values | value now | value last month |
|----|-----------|------------------|-----------|------------------|
| 1 | Product 1 | Value 1 | 9 | 7 |
| 2 | Product 2 | Value 2 | 4 | 4 |
| 3 | Product 3 | Value 3 | 6 | 8 |
where 'value now' is the value at the newest timestamp, and 'value last month' is the value at the timestamp closest to the newest timestamp minus 30 days. Keep in mind that there might be no timestamp exactly 30 days back, so the query will need to find the closest one. (Whether it looks earlier or later doesn't matter; it's an approximation.)
I have made some huge queries but I'm pretty sure there must be an easier way... Any help would be appreciated.
Assuming you get last month's month and year from PHP or from a MySQL function, here is an untested query that I hope works the first time:
SELECT p.*, v_now.v_now, v_last.v_lastmonth FROM products p
LEFT JOIN (SELECT pid, `value` AS v_now FROM value ORDER BY stamp DESC) AS v_now ON p.id=v_now.pid
LEFT JOIN (SELECT pid, `value` AS v_lastmonth FROM value
WHERE month(stamp)='$month' AND year(stamp)='$year'
ORDER BY stamp DESC) AS v_last ON p.id=v_last.pid
You can use group by to get one row for each product result.
I've a table
+----+------------+
| id | day |
+----+------------+
| 1 | 2006-10-08 |
| 2 | 2006-10-08 |
| 3 | 2006-10-09 |
| 4 | 2006-10-09 |
| 5 | 2006-10-09 |
| 5 | 2006-10-09 |
| 6 | 2006-10-10 |
| 7 | 2006-10-10 |
| 8 | 2006-10-10 |
| 9 | 2006-10-10 |
+----+------------+
I want to group by frequency and count the dates for each frequency. For example:
The date 2006-10-08 appears twice, so its frequency is 2, and it is the only date that appears twice, so the total number of dates with frequency 2 is 1.
Another example:
2006-10-10 and 2006-10-09 both appear 4 times, so the frequency is 4 and the total number of dates with frequency 4 is 2.
Following is the expected output.
+----------+--------------------------------+
| Frequency | Total Dates with frequency N |
+----------+--------------------------------+
| 1 | 0 |
| 2 | 1 |
| 3 | 0 |
| 4 | 2 |
+----------+--------------------------------+
and so on up to the maximum frequency.
What I've tried is the following:-
select day, count(*) from test GROUP BY day;
It returns the frequency of each date, ie
+------------+----------+
| day | count(*) |
+------------+----------+
| 2006-10-08 | 2 |
| 2006-10-09 | 4 |
| 2006-10-10 | 4 |
+------------+----------+
Please help with the above problem.
Just use your query as a subquery:
select freq, count(*)
from (select day, count(*) as freq
from test
group by day
) d
group by freq;
If you want to get the 0 values, then you have to work harder. A numbers table is handy (if you have one) or you can do:
select n.freq, count(d.day)
from (select 1 as freq union all select 2 union all select 3 union all select 4
) n left join
(select day, count(*) as freq
from test
group by day
) d
on n.freq = d.freq
group by n.freq;
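If no numbers table is available and the maximum frequency is not known in advance, the list of frequencies can be generated instead of hardcoded. Here is a minimal sketch using a recursive CTE, which is a swapped-in technique and not part of the answer above. It assumes a database that supports WITH RECURSIVE (SQLite here, through Python's sqlite3 module; MySQL only from 8.0).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE test (id INTEGER, day TEXT);
INSERT INTO test VALUES
  (1, '2006-10-08'), (2, '2006-10-08'),
  (3, '2006-10-09'), (4, '2006-10-09'), (5, '2006-10-09'), (5, '2006-10-09'),
  (6, '2006-10-10'), (7, '2006-10-10'), (8, '2006-10-10'), (9, '2006-10-10');
""")

freq_counts = conn.execute("""
    WITH RECURSIVE
    d AS (
      -- frequency of each date
      SELECT day, COUNT(*) AS freq FROM test GROUP BY day
    ),
    n(freq) AS (
      -- generated numbers 1 .. maximum frequency
      SELECT 1
      UNION ALL
      SELECT freq + 1 FROM n WHERE freq < (SELECT MAX(freq) FROM d)
    )
    SELECT n.freq, COUNT(d.day) AS total_dates
    FROM n
    LEFT JOIN d ON d.freq = n.freq
    GROUP BY n.freq
    ORDER BY n.freq
""").fetchall()

for row in freq_counts:
    print(row)
```

COUNT(d.day) counts only matched rows, so frequencies with no dates come back as 0 instead of disappearing.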
I have these 2 tables and I need to return the most used office. Note: 1 office can be used by more than 1 guy, and the column ido in TableB is populated from TableA.
It's probably a query with group by and desc limit 1.
TableA
| ido| office | guy |
---------------------
| 1 | office1| guy1|
| 2 | office2| guy2|
| 3 | office1| guy3|
| 4 | office1| guy4|
| 5 | office5| guy5|
| 6 | office2| guy6|
TableB
| idb| vizit | ido|
---------------------
| 1 | date | 4 |
| 2 | date | 2 |
| 3 | date | 5 |
| 4 | date | 6 |
| 5 | date | 1 |
| 6 | date | 6 |
Thanks!
You were correct that GROUP BY, LIMIT and DESC are useful here; they lead to a fairly straightforward query:
SELECT TableA.office
FROM TableA
JOIN TableB
ON TableA.ido = TableB.ido
GROUP BY TableA.office
ORDER BY COUNT(*) DESC
LIMIT 1
What it does is basically create rows with all valid combinations, counting the number of generated rows per office. A plain descending sort by that count will give you the most frequently used office.
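To check the query against the sample data from the question, here is a small sketch that runs it on SQLite through Python's sqlite3 module (an assumption; the original targets MySQL). The vizit column is filled with the placeholder 'date' as in the question, and the visits alias is added only to show the count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TableA (ido INTEGER, office TEXT, guy TEXT);
CREATE TABLE TableB (idb INTEGER, vizit TEXT, ido INTEGER);
INSERT INTO TableA VALUES
  (1, 'office1', 'guy1'), (2, 'office2', 'guy2'), (3, 'office1', 'guy3'),
  (4, 'office1', 'guy4'), (5, 'office5', 'guy5'), (6, 'office2', 'guy6');
INSERT INTO TableB VALUES
  (1, 'date', 4), (2, 'date', 2), (3, 'date', 5),
  (4, 'date', 6), (5, 'date', 1), (6, 'date', 6);
""")

# One joined row per visit; count them per office and keep the largest.
most_used = conn.execute("""
    SELECT TableA.office, COUNT(*) AS visits
    FROM TableA
    JOIN TableB ON TableA.ido = TableB.ido
    GROUP BY TableA.office
    ORDER BY COUNT(*) DESC
    LIMIT 1
""").fetchone()

print(most_used)
```

Visits 2, 4 and 6 all point at guys in office2, so it wins with 3 visits against 2 for office1. If two offices were tied, LIMIT 1 would return just one of them, in no guaranteed order.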
An SQLfiddle to test with.