Selecting the most recent result from one table joining to another - mysql

I have two tables.
One table contains customer data, like name and email address. The other table contains a log of the status changes.
The status log table looks like this:
+-------------+------------+------------+
| customer_id | status | date |
+-------------+------------+------------+
| 1 | Bought | 2018-07-01 |
| 1 | Bought | 2018-07-02 |
| 2 | Ongoing | 2018-07-03 |
| 3 | Ongoing | 2018-07-04 |
| 1 | Not Bought | 2018-07-05 |
| 4 | Bought | 2018-07-06 |
| 4 | Not Bought | 2018-07-07 |
| 4 | Bought | 2018-07-08 | *
| 3 | Cancelled | 2018-07-09 |
+-------------+------------+------------+
And the customer data:
+-------------+------------+------------+
| id | name | email |
+-------------+------------+------------+
| 1 | Alex | alex@home |
| 2 | John | john@home |
| 3 | Simon | si@home |
| 4 | Philip | phil@home |
+-------------+------------+------------+
I would like to select the customers who have a "Bought" status in July (07), but exclude customers whose most recent status is anything other than "Bought".
The result should be just one customer (Philip) - all the others have had their status change to something other than Bought most recently.
I have the following SQL:
SELECT
a.customer_id
FROM
statuslog a
WHERE
DATE(a.`date`) LIKE '2018-07-%'
AND a.status = 'Bought'
ORDER BY a.date DESC
LIMIT 1
But that is as far as I have got! The above query only returns one result, but essentially there could be more than one.
Any help is appreciated!

Here is an approach that uses a correlated subquery to get the most recent status record:
SELECT sl.customer_id
FROM statuslog sl
WHERE sl.date = (SELECT MAX(sl2.date)
                 FROM statuslog sl2
                 WHERE sl2.customer_id = sl.customer_id AND
                       sl2.date >= '2018-07-01' AND
                       sl2.date < '2018-08-01'
                ) AND
      sl.status = 'Bought'
ORDER BY sl.date DESC;
Notes:
Use meaningful table aliases! That is, abbreviations for the table names, rather than arbitrary letters such as a and b.
Use proper date arithmetic. LIKE is for strings. MySQL has lots of date functions that work.
In MySQL 8+, you would use ROW_NUMBER().
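A minimal sketch of the ROW_NUMBER() variant mentioned in the last note, using the sample data from the question. It assumes an in-memory SQLite database (3.25 or later, for window functions) as a stand-in for MySQL 8+, driven from Python only so that it runs without a server; the query text itself is meant to carry over unchanged.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL 8+; both support ROW_NUMBER() OVER (...).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE statuslog (customer_id INTEGER, status TEXT, date TEXT);
INSERT INTO customers VALUES (1,'Alex'),(2,'John'),(3,'Simon'),(4,'Philip');
INSERT INTO statuslog VALUES
  (1,'Bought','2018-07-01'),(1,'Bought','2018-07-02'),
  (2,'Ongoing','2018-07-03'),(3,'Ongoing','2018-07-04'),
  (1,'Not Bought','2018-07-05'),(4,'Bought','2018-07-06'),
  (4,'Not Bought','2018-07-07'),(4,'Bought','2018-07-08'),
  (3,'Cancelled','2018-07-09');
""")

# Rank each customer's July rows from newest to oldest, then keep the newest
# row (rn = 1) only when its status is 'Bought'.
latest_bought = conn.execute("""
WITH ranked AS (
  SELECT customer_id, status,
         ROW_NUMBER() OVER (PARTITION BY customer_id
                            ORDER BY date DESC) AS rn
  FROM statuslog
  WHERE date >= '2018-07-01' AND date < '2018-08-01'
)
SELECT c.name
FROM ranked r
JOIN customers c ON c.id = r.customer_id
WHERE r.rn = 1 AND r.status = 'Bought'
ORDER BY c.name
""").fetchall()

print(latest_bought)
```

With the sample rows, only Philip remains, and the query returns every qualifying customer rather than a single row.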

Related

Query with dynamic date intervals

Given a statuses table that holds information about product availability, how do I select the date that corresponds to the 1st day in the latest 20 days that the product has been active?
Yes I know the question is hard to follow. I think another way to put it would be: I want to know how many times each product has been sold in the last 20 days that it was active, meaning the product could have been active for years, but I'd only want the sales count from the latest 20 days that it had a status of "active".
It's something easily doable on the server side (i.e. getting any collection of products from the DB, iterating them, performing n+1 queries on the statuses table, etc), but I have hundreds of thousands of items so it's imperative to do it in SQL for performance reasons.
table : products
+-------+-----------+
| id | name |
+-------+-----------+
| 1 | Apple |
| 2 | Banana |
| 3 | Grape |
+-------+-----------+
table : statuses
+-------+-------------+---------------+---------------+
| id | name | product_id | created_at |
+-------+-------------+---------------+---------------+
| 1 | active | 1 | 2018-01-01 |
| 2 | inactive | 1 | 2018-02-01 |
| 3 | active | 1 | 2018-03-01 |
| 4 | inactive | 1 | 2018-03-15 |
| 6 | active | 1 | 2018-04-25 |
| 7 | active | 2 | 2018-03-01 |
| 8 | active | 3 | 2018-03-10 |
| 9 | inactive | 3 | 2018-03-15 |
+-------+-------------+---------------+---------------+
table : items (ordered products)
+-------+---------------+-------------+
| id | product_id | order_id |
+-------+---------------+-------------+
| 1 | 1 | 1 |
| 2 | 1 | 2 |
| 3 | 1 | 3 |
| 4 | 1 | 4 |
| 5 | 1 | 5 |
| 6 | 2 | 3 |
| 7 | 2 | 4 |
| 8 | 2 | 5 |
| 9 | 3 | 5 |
+-------+---------------+-------------+
table : orders
+-------+---------------+
| id | created_at |
+-------+---------------+
| 1 | 2018-01-02 |
| 2 | 2018-01-15 |
| 3 | 2018-03-02 |
| 4 | 2018-03-10 |
| 5 | 2018-03-13 |
+-------+---------------+
I want my final results to look like this:
+-------+-----------+----------------------+--------------------------------+
| id | name | recent_sales_count | date_to_start_counting_sales |
+-------+-----------+----------------------+--------------------------------+
| 1 | Apple | 3 | 2018-01-30 |
| 2 | Banana | 0 | 2018-04-09 |
| 3 | Grape | 1 | 2018-03-10 |
+-------+-----------+----------------------+--------------------------------+
So this is what I mean by latest 20 active days for e.g. Apple:
It was last activated at '2018-04-25'. That's 4 days ago.
Before that, it was inactive since '2018-03-15', so all these days until '2018-04-25' don't count.
Before that, it was active since '2018-03-01'. That's 14 more days, until '2018-03-15'.
Before that, inactive since '2018-02-01'.
Finally, it was active since '2018-01-01', so it should only count the missing 2 days (4 + 14 + 2 = 20) backwards from '2018-02-01', resulting in date_to_start_counting_sales = '2018-01-30'.
With the '2018-01-30' date in hand, I'm then able to count Apple orders in the last 20 active days: 3.
Hope that makes sense.
Here is a fiddle with the data provided above.
I've got a standard SQL solution that does not use any window functions, as you are on MySQL 5.
My solution requires 3 stacked views.
It would have been better with a CTE, but your version doesn't support them. The same goes for the stacked views: I don't like stacking views and always try to avoid it, but sometimes you have no other choice, because MySQL doesn't accept subqueries in the FROM clause of a view.
CREATE VIEW VIEW_product_dates AS
(
  SELECT ta.product_id, ta.created_at AS active_date,
    (
      -- first 'inactive' status recorded after this activation (NULL if still active)
      SELECT MIN(ti.created_at)
      FROM statuses ti
      WHERE ti.name = 'inactive' AND ta.created_at < ti.created_at AND ti.product_id = ta.product_id
    ) AS inactive_date
  FROM statuses ta
  WHERE ta.name = 'active'
);
CREATE VIEW VIEW_product_dates_days AS
(
SELECT product_id, active_date, inactive_date, datediff(IFNULL(inactive_date, SYSDATE()),active_date) AS nb_days
FROM VIEW_product_dates
);
CREATE VIEW VIEW_product_dates_days_cumul AS
(
SELECT product_id, active_date, ifnull(inactive_date,sysdate()) AS inactive_date, nb_days,
IFNULL((SELECT SUM(V2.nb_days) + V1.nb_days
FROM VIEW_product_dates_days V2
WHERE V2.active_date >= IFNULL(V1.inactive_date, SYSDATE()) AND V1.product_id=V2.product_id
),V1.nb_days) AS cumul_days
FROM VIEW_product_dates_days V1
);
The final view produces this:
| product_id | active_date | inactive_date | nb_days | cumul_days |
|------------|----------------------|----------------------|---------|------------|
| 1 | 2018-01-01T00:00:00Z | 2018-02-01T00:00:00Z | 31 | 49 |
| 1 | 2018-03-01T00:00:00Z | 2018-03-15T00:00:00Z | 14 | 18 |
| 1 | 2018-04-25T00:00:00Z | 2018-04-29T11:28:39Z | 4 | 4 |
| 2 | 2018-03-01T00:00:00Z | 2018-04-29T11:28:39Z | 59 | 59 |
| 3 | 2018-03-10T00:00:00Z | 2018-03-15T00:00:00Z | 5 | 5 |
So it collects every active period of every product, counts the number of days in each period, and computes the cumulative number of active days from the start of that period up to the current date (that period plus all later active periods).
Then we can query this final view to get the desired date for each product. I set a variable for your 20 days, so you can change that number easily if you want.
SET @cap_days = 20;
SELECT PD.id, PD.name,
       SUM(CASE WHEN o.created_at > PD.date_to_start_counting_sales THEN 1 ELSE 0 END) AS recent_sales_count,
       PD.date_to_start_counting_sales
FROM
(
  SELECT P.*,
         (CASE WHEN LowerCap.max_cumul_days IS NULL
               THEN ADDDATE(IFNULL(HigherCap.min_inactive_date, SYSDATE()), (-@cap_days))
               ELSE
                 CASE WHEN LowerCap.max_cumul_days < @cap_days AND HigherCap.min_inactive_date IS NULL
                      THEN ADDDATE(IFNULL(LowerCap.max_inactive_date, SYSDATE()), (-LowerCap.max_cumul_days))
                      ELSE ADDDATE(IFNULL(HigherCap.min_inactive_date, SYSDATE()), (LowerCap.max_cumul_days - @cap_days))
                 END
          END) AS date_to_start_counting_sales
  FROM products P
  LEFT JOIN
  (
    SELECT product_id, MAX(cumul_days) AS max_cumul_days, MAX(inactive_date) AS max_inactive_date
    FROM VIEW_product_dates_days_cumul
    WHERE cumul_days <= @cap_days
    GROUP BY product_id
  ) LowerCap ON P.id = LowerCap.product_id
  LEFT JOIN
  (
    SELECT product_id, MIN(cumul_days) AS min_cumul_days, MIN(inactive_date) AS min_inactive_date
    FROM VIEW_product_dates_days_cumul
    WHERE cumul_days > @cap_days
    GROUP BY product_id
  ) HigherCap ON P.id = HigherCap.product_id
) PD
LEFT JOIN items i ON PD.id = i.product_id
LEFT JOIN orders o ON o.id = i.order_id
GROUP BY PD.id, PD.name, PD.date_to_start_counting_sales;
Returns
| id | name | recent_sales_count | date_to_start_counting_sales |
|----|--------|--------------------|------------------------------|
| 1 | Apple | 3 | 2018-01-30T00:00:00Z |
| 2 | Banana | 0 | 2018-04-09T20:43:23Z |
| 3 | Grape | 1 | 2018-03-10T00:00:00Z |
FIDDLE : http://sqlfiddle.com/#!9/804f52/24
Not sure which version of MySQL you're working with, but if you can use 8.0, that version came out with a lot of functionality that makes things slightly more doable (CTEs, ROW_NUMBER(), PARTITION BY, etc.).
My recommendation would be to create a view like in this DB-Fiddle Example, call the view on the server side and iterate programmatically. There are ways of doing it in SQL, but it'd be a bear to write and test, and would likely be less efficient.
Assumptions:
Products cannot be sold during inactive date ranges
Statuses table will always alternate status active/inactive/active for each product. I.e. no date ranges where a certain product is both active and inactive.
View Results:
+------------+-------------+------------+-------------+
| product_id | active_date | end_date | days_active |
+------------+-------------+------------+-------------+
| 1 | 2018-01-01 | 2018-02-01 | 31 |
| 1 | 2018-03-01 | 2018-03-15 | 14 |
| 1 | 2018-04-25 | 2018-04-29 | 4 |
| 2 | 2018-03-01 | 2018-04-29 | 59 |
| 3 | 2018-03-10 | 2018-03-15 | 5 |
+------------+-------------+------------+-------------+
View:
CREATE OR REPLACE VIEW days_active AS (
WITH active_rn
AS (SELECT *, Row_number()
OVER ( partition BY NAME, product_id
ORDER BY created_at) AS rownum
FROM statuses
WHERE name = 'active'),
inactive_rn
AS (SELECT *, Row_number()
OVER ( partition BY NAME, product_id
ORDER BY created_at) AS rownum
FROM statuses
WHERE name = 'inactive')
SELECT x1.product_id,
x1.created_at AS active_date,
CASE WHEN x2.created_at IS NULL
THEN Curdate()
ELSE x2.created_at
END AS end_date,
CASE WHEN x2.created_at IS NULL
THEN Datediff(Curdate(), x1.created_at)
ELSE Datediff(x2.created_at,x1.created_at)
END AS days_active
FROM active_rn x1
LEFT OUTER JOIN inactive_rn x2
ON x1.rownum = x2.rownum
AND x1.product_id = x2.product_id ORDER BY
x1.product_id);
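The server-side iteration suggested in this answer could look roughly like the sketch below. It is plain Python, not part of the original answer: the function name, the hard-coded rows (copied from the view results above) and the fixed "today" of 2018-04-29 are all assumptions made for illustration. It walks each product's active periods from newest to oldest until 20 active days are covered.

```python
from datetime import date, timedelta

def start_of_last_active_days(periods, cap_days):
    """Return the date on which the latest `cap_days` active days begin.

    `periods` is a list of (active_date, end_date) pairs for one product,
    shaped like the rows of the days_active view. Hypothetical helper.
    """
    remaining = cap_days
    start = None
    # Newest period first, consuming active days until the cap is reached.
    for active_date, end_date in sorted(periods, reverse=True):
        length = (end_date - active_date).days
        if length >= remaining:
            # The cap is reached inside this period: step back from its end.
            return end_date - timedelta(days=remaining)
        remaining -= length
        start = active_date
    # Fewer than cap_days active days in total: start at the first activation.
    return start

today = date(2018, 4, 29)  # assumed current date, matching the view results
periods_by_product = {
    1: [(date(2018, 1, 1), date(2018, 2, 1)),
        (date(2018, 3, 1), date(2018, 3, 15)),
        (date(2018, 4, 25), today)],
    2: [(date(2018, 3, 1), today)],
    3: [(date(2018, 3, 10), date(2018, 3, 15))],
}

start_dates = {
    product_id: start_of_last_active_days(periods, 20)
    for product_id, periods in periods_by_product.items()
}
print(start_dates)
```

For the sample data this gives 2018-01-30 for Apple, 2018-04-09 for Banana and 2018-03-10 for Grape, the same dates as in the question's expected output; counting orders created after each date is then a simple filter.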

mysql return two minimum values

I have a table named: workers and a table named: schedule with the following format:
workers:
+----+-----------+------------+------------+-------------+
| id | name | vacationA | vacationB | workhistory |
+----+-----------+------------+------------+-------------+
| 1 | Florin | 2017-05-05 | 2017-05-25 | 2010-01-01 |
| 2 | Andrei | 2017-06-05 | 2017-06-25 | 2010-01-01 |
| 3 | Alexandra | 2017-07-05 | 2017-07-25 | 2010-01-01 |
| 4 | Emilia | 2017-08-05 | 2017-08-25 | 2010-01-01 |
| 5 | Nicoleta | 2017-09-05 | 2017-09-25 | 2010-01-01 |
+----+-----------+------------+------------+-------------+
schedule:
+-----+-------+-----------+--------+
| day | month | name | shifts |
+-----+-------+-----------+--------+
| 1 | 6 | Florin | 0 |
| 1 | 6 | Andrei | 1 |
| 1 | 6 | Alexandra | 2 |
| 1 | 6 | Emilia | 3 |
| 1 | 6 | Nicoleta | 4 |
+-----+-------+-----------+--------+
I need to query the "workers" table for 2 random names with the lowest number of shifts. The workers must not be on vacation, and their work history must be longer than 18 months.
In this case, the query I need should return Florin and Andrei.
This is what I've got so far, but it doesn't work as intended:
SELECT name FROM workers WHERE (CURDATE() NOT BETWEEN vacationA AND vacationB) AND workhistory > (DATE_SUB(CURDATE(), INTERVAL 18 MONTH)) AND name IN (SELECT name FROM schedule ORDER BY shifts LIMIT 2) ORDER BY RAND() LIMIT 2;
This query returns
1235 - This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'.
Thank you!
As you already have a name column in the schedule table (although it's not a good design), you don't need a join. You can just use ORDER BY with LIMIT, e.g.:
SELECT name
FROM schedule
WHERE day = ? AND month = ? -- remove this line if there are no criteria
ORDER BY shifts
LIMIT 2;
The obvious answer is just to sort the table by the number of shifts and grab the first two entries:
SELECT name FROM schedule ORDER BY shifts ASC LIMIT 2
I notice, however, that you already have an ORDER BY clause, so it seems you want the results in random order.
If you need the random order as well, then wrap the whole thing in a subquery like this:
SELECT name FROM (SELECT name FROM schedule ORDER BY shifts ASC LIMIT 2) AS lowest ORDER BY RAND()
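Neither query above applies the vacation and work-history conditions from the question. One way to combine them, sketched here as an assumption rather than as either answerer's method, is to join workers to schedule instead of using IN with LIMIT, which avoids error 1235. The sketch runs on an in-memory SQLite database from Python; the "today" and "cutoff" parameters are hypothetical stand-ins for CURDATE() and DATE_SUB(CURDATE(), INTERVAL 18 MONTH) in MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE workers (id INTEGER, name TEXT, vacationA TEXT,
                      vacationB TEXT, workhistory TEXT);
CREATE TABLE schedule (day INTEGER, month INTEGER, name TEXT, shifts INTEGER);
INSERT INTO workers VALUES
  (1,'Florin','2017-05-05','2017-05-25','2010-01-01'),
  (2,'Andrei','2017-06-05','2017-06-25','2010-01-01'),
  (3,'Alexandra','2017-07-05','2017-07-25','2010-01-01'),
  (4,'Emilia','2017-08-05','2017-08-25','2010-01-01'),
  (5,'Nicoleta','2017-09-05','2017-09-25','2010-01-01');
INSERT INTO schedule VALUES
  (1,6,'Florin',0),(1,6,'Andrei',1),(1,6,'Alexandra',2),
  (1,6,'Emilia',3),(1,6,'Nicoleta',4);
""")

# Assumed values: the schedule rows are for 1 June, so "today" is 2017-06-01
# and the 18-month cutoff is 2015-12-01.
params = {"today": "2017-06-01", "cutoff": "2015-12-01"}

# Filter workers first, then take the two with the fewest shifts.
picked = conn.execute("""
SELECT w.name
FROM workers w
JOIN schedule s ON s.name = w.name
WHERE :today NOT BETWEEN w.vacationA AND w.vacationB
  AND w.workhistory <= :cutoff
ORDER BY s.shifts ASC
LIMIT 2
""", params).fetchall()

print(picked)
```

The sample data has no ties, so the result is deterministic (Florin and Andrei). In MySQL, adding RAND() as a second sort key, ORDER BY s.shifts, RAND(), would break ties between workers with equal shift counts at random.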

In MySQL, set value in each row to a DATEDIFF computation on the same rows

It was very tricky to figure out what to title this question, so if anyone has any ideas for improvements feel free to edit :-).
Here's the deal. I have a MySQL table that includes a bunch of donations, and there's a date for each donation. I also have a years_active column. I need to run a query that will SET the years active for every row to the difference (in years) from the first date to the last date for each unique user.
So this is my starting table:
------------------------------------------------------------
| user_id | donation | date | years_active |
------------------------------------------------------------
| 1 | $10 | 2002-01-01 | null |
| 1 | $15 | 2005-01-01 | null |
| 1 | $20 | 2009-01-01 | null |
| 2 | $10 | 2003-01-01 | null |
| 2 | $5 | 2006-01-01 | null |
| 3 | $15 | 2001-01-01 | null |
------------------------------------------------------------
And this is the table I'd like to achieve:
------------------------------------------------------------
| user_id | donation | date | years_active |
------------------------------------------------------------
| 1 | $10 | 2002-01-01 | 8 |
| 1 | $15 | 2005-01-01 | 8 |
| 1 | $20 | 2009-01-01 | 8 |
| 2 | $10 | 2003-01-01 | 4 |
| 2 | $5 | 2006-01-01 | 4 |
| 3 | $15 | 2001-01-01 | 1 |
------------------------------------------------------------
I know that it's far from ideal to be storing the years_active redundantly in multiple rows like this. Unfortunately this table is for data visualizations and with my software I have absolutely no ability to restructure the data altogether; the years_active MUST be in every row.
In my research it seems like I would use subqueries to get the MIN value for each user id and the MAX value for each unique user id, and then do a DATEDIFF on those, and set the result to the column. But I don't really understand how I would run all these queries over and over again for every unique user.
Can someone point me in the right direction? Is this possible?
SELECT t1.user_id, t1.donation, t1.date, t2.years_active
FROM yourTable t1
INNER JOIN
(
SELECT user_id, MAX(YEAR(date)) - MIN(YEAR(date)) + 1 AS years_active
FROM yourTable
GROUP BY user_id
) t2
ON t1.user_id = t2.user_id
Follow the link below for a running demo:
SQLFiddle
Update:
Here is an UPDATE statement which will assign the years_active column the correct values:
UPDATE yourTable t1
INNER JOIN
(
SELECT user_id, MAX(YEAR(date)) - MIN(YEAR(date)) + 1 AS years_active
FROM yourTable
GROUP BY user_id
) t2
ON t1.user_id = t2.user_id
SET t1.years_active = t2.years_active
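To check the arithmetic of the derived table, the sketch below runs the GROUP BY query against the question's sample rows. It assumes an in-memory SQLite database driven from Python; because SQLite has no YEAR() function, a small stand-in is registered so the MySQL-style expression can run unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-in for MySQL's YEAR(): take the first four characters of an ISO date.
conn.create_function("YEAR", 1, lambda d: int(d[:4]))
conn.executescript("""
CREATE TABLE yourTable (user_id INTEGER, donation TEXT,
                        date TEXT, years_active INTEGER);
INSERT INTO yourTable (user_id, donation, date) VALUES
  (1,'$10','2002-01-01'),(1,'$15','2005-01-01'),(1,'$20','2009-01-01'),
  (2,'$10','2003-01-01'),(2,'$5','2006-01-01'),(3,'$15','2001-01-01');
""")

# One row per user: inclusive count of calendar years between first and last donation.
per_user = dict(conn.execute("""
SELECT user_id, MAX(YEAR(date)) - MIN(YEAR(date)) + 1 AS years_active
FROM yourTable
GROUP BY user_id
""").fetchall())

print(per_user)
```

The + 1 makes the count inclusive of both the first and the last calendar year, which is why user 3, with a single donation, gets 1 rather than 0, matching the target table in the question.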

MySQL query SELECT FROM 2 tables, COUNT the most used

I have these 2 tables and I need to return the most used office. Note: 1 office can be used by more than 1 guy, and the ido column in TableB is populated from TableA.
It is probably a query with GROUP BY and DESC LIMIT 1.
TableA
| ido| office | guy |
---------------------
| 1 | office1| guy1|
| 2 | office2| guy2|
| 3 | office1| guy3|
| 4 | office1| guy4|
| 5 | office5| guy5|
| 6 | office2| guy6|
TableB
| idb| vizit | ido|
---------------------
| 1 | date | 4 |
| 2 | date | 2 |
| 3 | date | 5 |
| 4 | date | 6 |
| 5 | date | 1 |
| 6 | date | 6 |
Thanks!
You were correct in that GROUP BY, LIMIT and DESC are useful here; it leads to a fairly straightforward query:
SELECT TableA.office
FROM TableA
JOIN TableB
ON TableA.ido = TableB.ido
GROUP BY TableA.office
ORDER BY COUNT(*) DESC
LIMIT 1
What it does is basically create rows with all valid combinations, counting the number of generated rows per office. A plain descending sort by that count will give you the most frequently used office.
An SQLfiddle to test with.
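To make the counting step visible, here is a small sketch that lists the visit count for every office instead of only the top one. It assumes an in-memory SQLite database driven from Python and the sample rows from the question; the visits alias and the short table aliases are introduced here for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TableA (ido INTEGER PRIMARY KEY, office TEXT, guy TEXT);
CREATE TABLE TableB (idb INTEGER PRIMARY KEY, vizit TEXT, ido INTEGER);
INSERT INTO TableA VALUES
  (1,'office1','guy1'),(2,'office2','guy2'),(3,'office1','guy3'),
  (4,'office1','guy4'),(5,'office5','guy5'),(6,'office2','guy6');
INSERT INTO TableB VALUES
  (1,'date',4),(2,'date',2),(3,'date',5),
  (4,'date',6),(5,'date',1),(6,'date',6);
""")

# Each TableB row is one visit; joining on ido attaches the office,
# and GROUP BY office counts the visits per office.
usage = conn.execute("""
SELECT a.office, COUNT(*) AS visits
FROM TableA a
JOIN TableB b ON a.ido = b.ido
GROUP BY a.office
ORDER BY visits DESC, a.office
""").fetchall()

print(usage)
```

The first row of this list is what the LIMIT 1 version returns: office2, with three visits.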

JOIN a products table to price change table using job date

I have a jobs table which looks something like this:
**jobs**
job_id | customer_id | date
------------------------------------
| 1 | 1 | 2012-01-03 |
| 2 | 2 | 2013-02-04 |
| 3 | 1 | 2013-03-05 |
| 4 | 3 | 2013-05-04 |
Then I have a products table which looks something like this:
**products**
product_id | description | price
-----------------------------------
| 1 | prod_1 | 25.50 |
| 2 | prod_2 | 34.95 |
And finally when prices are changed I have a product_price_changes table like this:
**product_price_changes**
price_change_id | product_id | price_change_date | old_price
---------------------------------------------------------------
| 1 | 1 | 2013-01-01 | 20.00 |
| 2 | 1 | 2013-02-05 | 23.00 |
with a UNIQUE INDEX on (product_id,price_change_date)
I want to create a VIEW which grabs the product pricing reflecting the prices from the date the job was done.
which, with the data above, should create a table like this:
**view_job_pricing**
job_id | product_id | price
------------------------------
| 1 | 1 | 20.00 |
| 1 | 2 | 34.95 |
| 2 | 1 | 23.00 |
| 2 | 2 | 34.95 |
| 3 | 1 | 25.50 |
| 3 | 2 | 34.95 |
| 4 | 1 | 25.50 |
| 4 | 2 | 34.95 |
So it should select the earliest price change dated after the job date, if one exists, and use its old_price; otherwise it should grab the current product price.
I have this which works:
CREATE VIEW view_job_pricing
AS
SELECT j.job_id, p.product_id, MAX(price_change_date),
IFNULL(ppc.old_price,p.price) AS price
FROM products p
JOIN jobs j
LEFT JOIN product_price_changes ppc
ON p.product_id = ppc.product_id AND date < price_change_date
GROUP BY job_id, product_id;
But it is pretty slow on the real database (far more jobs and products). Just wondering if there is a better way. Thanks!
Try doing this as a correlated subquery. You are allowed subqueries in the select clause of a view, and the correlated subquery should be able to use the unique index on (product_id, price_change_date).
CREATE VIEW view_job_pricing
AS
SELECT j.job_id, p.product_id,
       COALESCE((SELECT ppc.old_price
                 FROM product_price_changes ppc
                 WHERE ppc.product_id = p.product_id AND j.date < ppc.price_change_date
                 ORDER BY ppc.price_change_date ASC  -- earliest change after the job date
                 LIMIT 1
                ),
                p.price) AS price
FROM products p CROSS JOIN
     jobs j;
You can simplify the processing and improve the performance by changing the structure of the product_price_changes table. Instead of recording only the date a price changed, record both the date each price took effect and the date it stopped applying. With effdate and enddate columns, the query would be much faster and simpler.
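A rough sketch of what the effdate/enddate layout could look like, run on an in-memory SQLite database from Python. The product_prices table name, the half-open [effdate, enddate) convention and the sentinel dates 1000-01-01 and 9999-12-31 for open-ended ranges are assumptions made for this example, not something specified in the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE jobs (job_id INTEGER, customer_id INTEGER, date TEXT);
-- Hypothetical table: one row per product and price validity range.
CREATE TABLE product_prices (product_id INTEGER, price REAL,
                             effdate TEXT, enddate TEXT);
INSERT INTO jobs VALUES
  (1,1,'2012-01-03'),(2,2,'2013-02-04'),(3,1,'2013-03-05'),(4,3,'2013-05-04');
INSERT INTO product_prices VALUES
  (1,20.00,'1000-01-01','2013-01-01'),
  (1,23.00,'2013-01-01','2013-02-05'),
  (1,25.50,'2013-02-05','9999-12-31'),
  (2,34.95,'1000-01-01','9999-12-31');
""")

# Each job date falls into exactly one price range per product,
# so a plain range join replaces the subquery and the fallback price.
pricing = conn.execute("""
SELECT j.job_id, pp.product_id, pp.price
FROM jobs j
JOIN product_prices pp
  ON j.date >= pp.effdate AND j.date < pp.enddate
ORDER BY j.job_id, pp.product_id
""").fetchall()

for row in pricing:
    print(row)
```

With the sample data this reproduces the view_job_pricing rows from the question. The design relies on the ranges for a product never overlapping and leaving no gaps, which the application would have to maintain whenever a price changes.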
In addition to indexing all the involved columns, and checking your indexing results using EXPLAIN EXTENDED (which you can post here for review), you can try speeding up the view by defining it as CREATE ALGORITHM = MERGE VIEW. There are a number of restrictions on when this can be done, so YMMV
http://dev.mysql.com/doc/refman/5.0/en/create-view.html