MySQL calculate gain, loss and net gain over a period of time - mysql

I have a table something like this:
id | Customer | date
-----------------------------------------
1 | Customer2 | 2013-08-01 00:00:00
-----------------------------------------
2 | Customer1 | 2013-07-15 00:00:00
-----------------------------------------
3 | Customer1 | 2013-07-01 00:00:00
-----------------------------------------
. | ... | ...
-----------------------------------------
n | CustomerN | 2012-03-01 00:00:00
I want to calculate the "gained" customers for each month, the "lost" customers for each month and the Net Gain for each month, even if done in separate tables / views.
How can I do that?
EDIT
Ok, let me demonstrate what I've done so far.
To select the gained customers for a month, I tried selecting customers from the Bookings table for which the following does not exist:
select Customer
from Bookings
where not exists
(select Customer
from Bookings
where
(Bookings.date BETWEEN
DATE_FORMAT(DATE_SUB(Bookings.date, INTERVAL 1 MONTH), '%Y-%m-01 00:00:00')
AND DATE_FORMAT(Bookings.date, '%Y-%m-01 00:00:00'
)
) AND Bookings.date >= STR_TO_DATE('2010-11-01 00:00:00', '%Y-%m-%d 00:00:00'))
This supposedly gets the customers that existed in the "selected" month but not in the previous one. "2010-11-01" is the date of the start of bookings + 1 month.
To select the lost customers for a month, I tried selecting customers from the Bookings table for which the following does not exist:
select Customer
from Booking
where not exists
(select Customer
from Bookings
where
(Bookings.date BETWEEN
DATE_FORMAT(Bookings.date, '%Y-%m-01 00:00:00')
AND Bookings.date
)
AND Bookings.date >= STR_TO_DATE('2010-11-01 00:00:00', '%Y-%m-%d 00:00:00'
)
)
This supposedly gets the customers that existed in a previous month but not in the "selected" one.
For the "Loss" SQL query I got an empty result! For the "Gain" query I got thousands of rows, but I'm not sure whether that's accurate.

You can use COUNT(DISTINCT ...) to count your customers, and WHERE YEAR(date) = [year] AND MONTH(date) = [month] to select the month. The queries below use the Bookings table from your question.
The total number of customers in Sept 2013:
SELECT COUNT(DISTINCT Customer) AS MonthTotalCustomers FROM Bookings
WHERE YEAR(date) = 2013 AND MONTH(date) = 9
The customers gained in Sept 2013:
SELECT COUNT(DISTINCT Customer) AS MonthGainedCustomers FROM Bookings
WHERE YEAR(date) = 2013 AND MONTH(date) = 9
AND Customer NOT IN
(SELECT Customer FROM Bookings
WHERE date < '2013-09-01')
Figuring out the lost customers is more difficult. I would need to know by what criteria you consider them to be 'lost.' If you just mean that they were around in August 2013 but they were not around in September 2013:
SELECT COUNT(DISTINCT Customer) AS MonthLostCustomers FROM Bookings
WHERE YEAR(date) = 2013 AND MONTH(date) = 8
AND Customer NOT IN
(SELECT Customer FROM Bookings
WHERE YEAR(date) = 2013 AND MONTH(date) = 9)
I hope from these examples you can extrapolate what you're looking for.
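To make the definitions above concrete, here is a minimal Python sketch of the same logic applied to every month at once. It assumes "gained" means a customer with no booking in any earlier month and "lost" means a customer who booked in the previous calendar month but not in the current one. The function names monthly_gain_loss and month_range are made up for illustration and are not part of the original queries.

```python
from collections import defaultdict
from datetime import date


def month_range(first, last):
    """Yield (year, month) tuples from first to last, inclusive."""
    year, month = first
    while (year, month) <= last:
        yield (year, month)
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)


def monthly_gain_loss(bookings):
    """bookings: list of (customer, date). Returns {(year, month): (gained, lost, net)}."""
    active = defaultdict(set)  # (year, month) -> customers with a booking that month
    for customer, day in bookings:
        active[(day.year, day.month)].add(customer)
    if not active:
        return {}

    result = {}
    seen_before = set()  # customers with a booking in any earlier month
    previous = set()     # customers with a booking in the previous calendar month
    for month in month_range(min(active), max(active)):
        current = active.get(month, set())
        gained = len(current - seen_before)
        lost = len(previous - current)
        result[month] = (gained, lost, gained - lost)
        seen_before |= current
        previous = current
    return result
```

Months without any bookings are still visited, so every customer active in the month before counts as lost there.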

Related

Calculating aggregated number of days in each month in sql

I've got a table with multiple columns and two of the columns are start_date and end_date.
I need to calculate the number of days in each month. Let's assume I have following data in my table
id | start_date | end_date
1 04.01.2016 15.02.2016
2 07.01.2016 22.01.2016
3 16.05.2016 11.07.2016
I want an output as follows
Month | numberOfTravelDays
January 51
February 15
May 15
June 31
July 11
This output I am expecting is the number of total travel days each month has been utilized. I am having trouble constructing the sql query for this. Can someone assist me on this?
This is what I have for now. And it's not doing the job. The below query also filters only this year's records(but ignore that).
select MONTH(start_date) as month,
COUNT(DATEDIFF(start_date, end_date)) as numberOfTravelDays
from travel
where YEAR(start_date) = YEAR(CURDATE())
group by MONTH(start_date),
MONTH(end_date)
Use a derived table:
select monstart,
sum(datediff(least(m.monend, t.end_date) + interval 1 day,
greatest(m.monstart, t.start_date)
)
) as days_worked
from travel t join
(select date('2016-01-01') as monstart, date('2016-01-31') as monend union all
select date('2016-02-01') as monstart, date('2016-02-29') as monend union all
. . .
) m
on t.end_date >= m.monstart and t.start_date <= m.monend
group by monstart;
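The clamping that the derived-table query does with LEAST and GREATEST can be sketched in Python as follows. This is only an illustration, assuming both start_date and end_date count as travel days (which is what the + interval 1 day in the query implies) and that start_date is never after end_date. The function name travel_days_per_month is hypothetical.

```python
from calendar import monthrange
from collections import Counter
from datetime import date


def travel_days_per_month(trips):
    """trips: list of (start_date, end_date), both inclusive.
    Returns {(year, month): total travel days}."""
    totals = Counter()
    for start, end in trips:
        year, month = start.year, start.month
        while (year, month) <= (end.year, end.month):
            month_start = date(year, month, 1)
            month_end = date(year, month, monthrange(year, month)[1])
            # Same clamp as LEAST(monend, end_date) and GREATEST(monstart, start_date)
            overlap = (min(month_end, end) - max(month_start, start)).days + 1
            totals[(year, month)] += overlap
            year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return dict(totals)
```

Unlike the SQL version, the month boundaries are computed on the fly, so no list of months has to be written out by hand.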

Need to sum transaction totals from one table using customer information in another

I have spent the last hour looking for something I can use to implement here, but haven't found exactly what I need.
I have 2 tables: TRANSACTIONS & CUSTOMERS
CUSTOMER
internal_id | name | email
TRANSACTIONS
internal_id | customer_id | transaction_date | total_amount
I would like to cycle through all CUSTOMERS, then sum up the total TRANSACTIONS for each by month and year. I thought it would be as easy as just adding select statements as columns to the initial query, but that isn't working obviously:
NOT WORKING:
select customer.internal_id,
(sum(total_amount) as 'total' from TRANSACTIONS where transactions.customer_id = customer.internal_id and transaction_date >= DATE_SUB(NOW(),INTERVAL 1 month)),
(sum(total_amount) as 'total' from TRANSACTIONS where transactions.customer_id = customer.internal_id and transaction_date >= DATE_SUB(NOW(),INTERVAL 1 year))
from CUSTOMER join TRANSACTIONS on CUSTOMER.internal_id = TRANSACTIONS.customer_id
Basically I would like the output to look like this:
CUSTOMER.name | TRANSACTIONS.total_amount_month | TRANSACTIONS.total_amount_year
ABC Company | $335.00 | $8900.34
Is this possible with a single query? I have it implemented with multiple queries using PHP and would just prefer a single query if possible for performance sake.
Thanks!
SELECT c.name,
SUM(IF(transaction_date >= DATE_SUB(NOW(), INTERVAL 1 MONTH), total_amount, 0)) AS total_amount_month,
SUM(total_amount) AS total_amount_year
FROM transactions AS t
JOIN customer AS c ON c.internal_id = t.customer_id
WHERE transaction_date >= DATE_SUB(NOW(), INTERVAL 1 YEAR)
GROUP BY t.customer_id, c.name
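If you want to try the conditional-sum idea without a MySQL server, here is a rough sketch using Python's built-in sqlite3 module. Note the assumptions: it runs on SQLite rather than MySQL, so IF(...) becomes CASE WHEN and DATE_SUB becomes SQLite's datetime() modifier, and the reference time is passed in as a parameter instead of NOW() so the result is repeatable. Dates are assumed to be stored as 'YYYY-MM-DD HH:MM:SS' text, and totals_by_customer is a made-up helper name.

```python
import sqlite3


def totals_by_customer(conn, now):
    """Return {name: (month_total, year_total)} relative to the timestamp string `now`."""
    rows = conn.execute(
        """
        SELECT c.name,
               SUM(CASE WHEN t.transaction_date >= datetime(?, '-1 month')
                        THEN t.total_amount ELSE 0 END) AS total_amount_month,
               SUM(t.total_amount) AS total_amount_year
        FROM transactions AS t
        JOIN customer AS c ON c.internal_id = t.customer_id
        WHERE t.transaction_date >= datetime(?, '-1 year')
        GROUP BY c.internal_id, c.name
        """,
        (now, now),
    ).fetchall()
    return {name: (month_total, year_total) for name, month_total, year_total in rows}
```

As in the MySQL query, customers with no transactions in the last year do not appear in the result at all, because of the inner join and the WHERE filter.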

Insert blank rows in MySQL SELECT statement

I have a query like this:
SELECT COUNT(id), MONTH(date), YEAR(date)
FROM activity
GROUP BY YEAR(date) DESC, MONTH(date) DESC
ORDER BY YEAR(date) DESC, MONTH(date) DESC
Which orders and groups records by month/date. Is there any way I can insert a blank row if a certain month doesn't have a record?
So, instead of this return:
c | M% | Y%
4 | 01 | 2014 # 4 records for Jan 2014
3 | 11 | 2013 # 3 records for Nov 2013
7 | 10 | 2013 # 7 records for Oct 2013
I want to insert months for which no records could be found (Dec 2013 with count = 0), so I can have a neat visualisation of monthly activities.
c | M% | Y%
4 | 01 | 2014
0 | 12 | 2013 # <<<< no records for Dec 2013, but I still want it in array
3 | 11 | 2013
7 | 10 | 2013
Contrary to what others say, you can generate rows on the fly in a MySQL query by using user variables. My inner query sets a baseline date one month ahead of NOW(). It is joined to the activity table only to get rows to work with. Each GrpDate value is produced by setting the variable to one month before its previous value. In this example I use a LIMIT of 6, so it only goes back 6 months, but you can change that to however many months you care about, as long as the activity table has at least that many records (it could be any table with enough rows to create these placeholder records).
This creates a result set that I have aliased as "DynamicCalendar". I then use it as the basis for a LEFT JOIN to the activity table on month and year:
SELECT
MONTH( DynamicCalendar.GrpDate ) as Mth,
YEAR( DynamicCalendar.GrpDate ) as Yr,
COUNT(activity.id) as Entries
from
( select
@BaseDate := date(date_sub(@BaseDate, interval 1 month)) as GrpDate
from
( select @BaseDate := date_add(Now(), interval 1 month)) sqlvars,
activity
limit
6 ) DynamicCalendar
LEFT JOIN activity
ON MONTH( DynamicCalendar.GrpDate ) = MONTH( activity.date )
AND YEAR( DynamicCalendar.GrpDate ) = YEAR( activity.date )
group by
MONTH( DynamicCalendar.GrpDate ),
YEAR( DynamicCalendar.GrpDate )
order by
YEAR( DynamicCalendar.GrpDate ) DESC,
MONTH( DynamicCalendar.GrpDate ) DESC
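If the result is going to be post-processed in application code anyway, the zero rows can also be filled in there. The following Python sketch is an alternative to the SQL-variable approach, not a translation of it. It assumes the grouped counts have already been fetched into a dict keyed by (year, month); fill_missing_months is a hypothetical helper name.

```python
def fill_missing_months(counts, first, last):
    """counts: {(year, month): n}. first/last: (year, month) bounds, inclusive.
    Returns [(n, month, year)] from newest to oldest, with 0 for missing months."""
    rows = []
    year, month = last
    while (year, month) >= first:
        rows.append((counts.get((year, month), 0), month, year))
        # Step back one calendar month, wrapping from January to December
        year, month = (year - 1, 12) if month == 1 else (year, month - 1)
    return rows
```

The output rows have the same column order as the original SELECT (count, month, year).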

Find big enough gaps in booking table

A rental system uses a booking table to store all bookings and reservations:
booking | item | startdate | enddate
1 | 42 | 2013-10-25 16:00 | 2013-10-27 12:00
2 | 42 | 2013-10-27 14:00 | 2013-10-28 18:00
3 | 42 | 2013-10-30 09:00 | 2013-11-01 09:00
…
Let’s say a user wants to rent item 42 from 2013-10-27 12:00 until 2013-10-28 12:00, which is a period of one day. The system will tell him that the item is not available in the given time frame, since booking no. 2 collides.
Now I want to suggest the earliest rental date and time when the selected item is available again. Of course considering the user’s requested period (1 day) beginning with the user’s desired date and time.
So in the case above, I’m looking for an SQL query that returns 2013-10-28 18:00, since the earliest date since 2013-10-27 12:00 at which item 42 will be available for 1 day, is from 2013-10-28 18:00 until 2013-10-29 18:00.
So I need to find a gap between bookings that is big enough to hold the user’s reservation and that is as close as possible to the desired start date.
Or in other words: I need to find the first booking for a given item, after which there’s enough free time to place the user’s booking.
Is this possible in plain SQL without having to iterate over every booking and its successor?
If you can't redesign your database to use something more efficient, this will get the answer. You'll obviously want to parameterize it. It says find either the desired date, or the earliest end date where the hire interval doesn't overlap an existing booking:
Select
min(startdate)
From (
select
cast('2013-10-27 12:00' as datetime) startdate
from
dual
union all
select
enddate
from
booking
where
enddate > cast('2013-10-27 12:00' as datetime) and
item = 42
) b1
Where
not exists (
select
'x'
from
booking b2
where
item = 42 and
b1.startdate < b2.enddate and
b2.startdate < date_add(b1.startdate, interval 24 hour)
);
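For readers who want to check the logic of the query above outside the database, here is a small Python sketch of the same idea: the candidate start times are the desired time plus every later enddate, and a candidate is accepted if the hire interval starting there overlaps no existing booking. It assumes the bookings for a single item are already loaded as (startdate, enddate) pairs; earliest_start is a made-up name.

```python
from datetime import datetime, timedelta


def earliest_start(bookings, desired, duration):
    """bookings: list of (startdate, enddate) for one item.
    Returns the earliest start >= desired at which `duration` fits."""
    candidates = [desired] + [end for _, end in bookings if end > desired]
    for candidate in sorted(candidates):
        candidate_end = candidate + duration
        # Same overlap test as the NOT EXISTS clause:
        # b1.startdate < b2.enddate AND b2.startdate < b1.startdate + duration
        if not any(candidate < end and start < candidate_end for start, end in bookings):
            return candidate
    return None
```

With the three bookings from the question, a desired start of 2013-10-27 12:00 and a one-day duration, this returns 2013-10-28 18:00, the value the question asks for.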
SELECT startfree,secondsfree FROM (
SELECT
@lastenddate AS startfree,
UNIX_TIMESTAMP(startdate)-UNIX_TIMESTAMP(@lastenddate) AS secondsfree,
@lastenddate:=enddate AS ignoreme
FROM
(SELECT startdate,enddate FROM bookings WHERE item=42) AS schedule,
(SELECT @lastenddate:=NOW()) AS init
ORDER BY startdate
) AS baseview
WHERE startfree>='2013-10-27 12:00:00'
AND secondsfree>=86400
ORDER BY startfree
LIMIT 1
;
Some explanation: The inner query uses a variable to move the iteration into SQL, the outer query finds the needed row.
That said, I would not do this in SQL if the DB structure is as given. You could reduce the iteration count by using a smart WHERE in the inner query to restrict it to a sane timespan, but chances are this won't perform well.
EDIT
A caveat: I did not check, but I assume, this won't work, if there are no prior reservations in the list - this should not be a problem, as in this case your first reservation attempt (original time) will work.
Searching for overlapping date ranges generally yields poor performance in SQL. For that reason having a "Calendar" of available slots often makes things a lot more efficient.
For example, the booking 2013-10-25 16:00 => 2013-10-27 12:00 would actually be represented by 44 records, each one hour long.
The "gap" until the next booking at 2013-10-27 14:00 would then be represented by 2 records, each one hour long.
Then, each record could also have the duration (in time, or number of slots) until the next change.
slot_start_time | booking | item | remaining_duration
------------------+---------+------+--------------------
2013-10-27 10:00 | 1 | 42 | 2
2013-10-27 11:00 | 1 | 42 | 1
2013-10-27 12:00 | NULL | 42 | 2
2013-10-27 13:00 | NULL | 42 | 1
2013-10-27 14:00 | 2 | 42 | 28
2013-10-27 15:00 | 2 | 42 | 27
... | ... | ... | ...
2013-10-28 17:00 | 2 | 42 | 1
2013-10-28 18:00 | NULL | 42 | 39
2013-10-28 19:00 | NULL | 42 | 38
Then your query just becomes:
SELECT
*
FROM
slots
WHERE
slot_start_time >= '2013-10-27 12:00'
AND remaining_duration >= 24
AND booking IS NULL
ORDER BY
slot_start_time ASC
LIMIT
1
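How the slots table gets populated is not shown above. As a rough illustration only, the following Python sketch derives hourly slots and their remaining_duration from a list of bookings. It assumes bookings are aligned to whole hours and that slots are only generated inside a fixed horizon, so the run that touches the end of the horizon is cut short. build_slots and its parameters are hypothetical names.

```python
from datetime import datetime, timedelta


def build_slots(bookings, horizon_start, horizon_end):
    """bookings: list of (booking_id, start, end) aligned to whole hours.
    Returns [(slot_start_time, booking_id_or_None, remaining_duration)]."""
    times, owners = [], []
    t = horizon_start
    while t < horizon_end:
        times.append(t)
        # The booking covering this hour, or None if the slot is free
        owners.append(next((b for b, s, e in bookings if s <= t < e), None))
        t += timedelta(hours=1)

    # Walk backwards so each slot can reuse the run length of the slot after it
    remaining = [0] * len(times)
    for i in reversed(range(len(times))):
        same_as_next = i + 1 < len(times) and owners[i + 1] == owners[i]
        remaining[i] = remaining[i + 1] + 1 if same_as_next else 1
    return list(zip(times, owners, remaining))
```

In a real system these rows would have to be regenerated or updated whenever a booking is added or cancelled, which is the price for the simpler lookup query.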
OK this isn't pretty in MySQL. That's because we have to fake rownum values in subqueries.
The basic approach is to join the appropriate subset of the booking table to itself offset by one.
Here's the basic list of reservations for item 42, ordered by reservation time. We can't order by booking_id, because those aren't guaranteed to be in order of reservation time. (You're trying to insert a new reservation between two existing ones, eh?) http://sqlfiddle.com/#!2/62383/9/0
SELECT @aserial := @aserial+1 AS rownum,
booking.*
FROM booking,
(SELECT @aserial:= 0) AS q
WHERE item = 42
ORDER BY startdate, enddate
Here is that subset joined to itself. The trick is the a.rownum+1 = b.rownum, which joins each row to the one that comes right after it in the booking table subset. http://sqlfiddle.com/#!2/62383/8/0
SELECT a.booking_id, a.startdate asta, a.enddate aend,
b.startdate bsta, b.enddate bend
FROM (
SELECT @aserial := @aserial+1 AS rownum,
booking.*
FROM booking,
(SELECT @aserial:= 0) AS q
WHERE item = 42
ORDER BY startdate, enddate
) AS a
JOIN (
SELECT @bserial := @bserial+1 AS rownum,
booking.*
FROM booking,
(SELECT @bserial:= 0) AS q
WHERE item = 42
ORDER BY startdate, enddate
) AS b ON a.rownum+1 = b.rownum
Here it is again, showing each reservation (except the last one) and the number of hours following it. http://sqlfiddle.com/#!2/62383/15/0
SELECT a.booking_id, a.startdate, a.enddate,
TIMESTAMPDIFF(HOUR, a.enddate, b.startdate) gaphours
FROM (
SELECT @aserial := @aserial+1 AS rownum,
booking.*
FROM booking,
(SELECT @aserial:= 0) AS q
WHERE item = 42
ORDER BY startdate, enddate
) AS a
JOIN (
SELECT @bserial := @bserial+1 AS rownum,
booking.*
FROM booking,
(SELECT @bserial:= 0) AS q
WHERE item = 42
ORDER BY startdate, enddate
) AS b ON a.rownum+1 = b.rownum
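The "join each row to the one right after it" trick is easier to see outside SQL. Here is a minimal Python sketch of the same pairing, assuming the bookings for one item are available as (startdate, enddate) pairs. Unlike TIMESTAMPDIFF(HOUR, ...), which returns whole hours, this returns the gap as a float. gaps_after_bookings is a made-up name.

```python
from datetime import datetime


def gaps_after_bookings(bookings):
    """bookings: list of (startdate, enddate) for one item.
    Returns [(enddate, gap_hours)] for every booking except the last."""
    ordered = sorted(bookings)  # same ordering as ORDER BY startdate, enddate
    # zip(ordered, ordered[1:]) plays the role of a.rownum + 1 = b.rownum
    return [
        (end, (next_start - end).total_seconds() / 3600)
        for (_, end), (next_start, _) in zip(ordered, ordered[1:])
    ]
```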
So, if you're looking for the starting time and ending time of the earliest twelve-hour slot you can use that result set to do this: http://sqlfiddle.com/#!2/62383/18/0
SELECT MIN(enddate) startdate, MIN(enddate) + INTERVAL 12 HOUR as enddate
FROM (
SELECT a.booking_id, a.startdate, a.enddate,
TIMESTAMPDIFF(HOUR, a.enddate, b.startdate) gaphours
FROM (
SELECT @aserial := @aserial+1 AS rownum,
booking.*
FROM booking,
(SELECT @aserial:= 0) AS q
WHERE item = 42
ORDER BY startdate, enddate
) AS a
JOIN (
SELECT @bserial := @bserial+1 AS rownum,
booking.*
FROM booking,
(SELECT @bserial:= 0) AS q
WHERE item = 42
ORDER BY startdate, enddate
) AS b ON a.rownum+1 = b.rownum
) AS gaps
WHERE gaphours >= 12
Here is a query that returns the needed date. The obvious precondition is that there are some bookings in the table, but as I understand from the question, you already check for that:
SELECT min(enddate)
FROM
(
select a.enddate from table4 as a
where
a.item=42
and
DATE_ADD(a.enddate, INTERVAL 1 day) <= ifnull(
(select min(b.startdate)
from table4 as b where b.startdate>=a.enddate and a.item=b.item),
a.enddate)
and
a.enddate>=now()
union all
select greatest(ifnull(max(enddate), now()),now()) from table4
) as q
You can change INTERVAL 1 day to INTERVAL ### hour.
If I have understood your requirements correctly, you could try self-JOINing book with itself, to get the "empty" spaces, and then fit. This is MySQL only (I believe it can be adapted to others - certainly PostgreSQL):
SELECT book.*, TIMESTAMPDIFF(MINUTE, book.enddate, book.best) AS width FROM
(
SELECT book.*, MIN(book1.startdate) AS best
FROM book
JOIN book AS book1 USING (item)
WHERE item = 42 AND book1.startdate >= book.enddate
GROUP BY book.booking
) AS book HAVING width > 110 ORDER BY startdate LIMIT 1;
In the above example, "110" is the looked-for minimum width in minutes.
Same thing, a bit less readable (for me), a SELECT removed (very fast SELECT, so little advantage):
SELECT book.*, MIN(book1.startdate) AS best
FROM book
JOIN book AS book1 ON (book.item = book1.item AND book.item = 42)
WHERE book1.startdate >= book.enddate
GROUP BY book.booking
HAVING TIMESTAMPDIFF(MINUTE, book.enddate, best) > 110
ORDER BY startdate LIMIT 1;
In your case, one day is 1440 minutes and
SELECT book.*, MIN(book1.startdate) AS best FROM book JOIN book AS book1 ON (book.item = book1.item AND book.item = 42) WHERE book1.startdate >= book.enddate GROUP BY book.booking HAVING TIMESTAMPDIFF(MINUTE, book.enddate, best) >= 1440 ORDER BY startdate LIMIT 1;
+---------+------+---------------------+---------------------+---------------------+
| booking | item | startdate | enddate | best |
+---------+------+---------------------+---------------------+---------------------+
| 2 | 42 | 2013-10-27 14:00:00 | 2013-10-28 18:00:00 | 2013-10-30 09:00:00 |
+---------+------+---------------------+---------------------+---------------------+
1 row in set (0.00 sec)
...the period returned is 2, i.e., at the end of booking 2, and until "best" which is booking 3, a period of at least 1440 minutes is available.
An issue could be that if no periods are available, the query returns nothing -- then you need another query to fetch the farthest enddate. You can do this with an UNION and LIMIT 1 of course, but I think it would be best to only run the 'recovery' query on demand, programmatically (i.e. if empty(query) then new_query...).
Also, in the inner WHERE you should add a check for NOW() to avoid dates in the past. If expired bookings are moved to inactive storage, this could be unnecessary.

Compare the same row

I have a table - user_tracking - which stores the user_id, purchase sku, and event time_created. Each time a user returns to purchase the original user_id is referenced with a new timestamp:
User_ID Sku Time_Created
1 1234 2012-10-01 01:00:00
2 2345 2012-10-02 02:00:00
3 6789 2012-10-02 01:00:00
2 5432 2012-10-04 04:00:00
I want to measure the return customer usage, but only for customers that have returned within 7-60 days of initial purchase. Currently my query looks something like:
SELECT
total_purchases.user_id as user_1_id,
total_purchases.time_created as time_1_created,
total_purchases.total_purchases as total_original_purchases,
total_return.user_id as user_2_id,
total_return.time_created as time_2_created,
total_return.total_return_purchases as total_return_purchases
FROM (SELECT
user_tracking.user_id as user_id,
user_tracking.time_created as time_created,
COUNT(DISTINCT user_tracking.sku) as total_purchases
FROM user_tracking
WHERE user_tracking.time_created BETWEEN "2012-10-01 00:00:00"
AND "2012-10-15 00:00:00") AS total_purchases
LEFT JOIN (SELECT
user_tracking.user_id as user_id,
user_tracking.time_created as time_created,
COUNT(DISTINCT user_tracking.sku) as total_return_purchases
FROM user_tracking
WHERE user_tracking.time_created BETWEEN "2012-10-01 00:00:00"
and "2012-12-15 00:00:00") AS total_return
ON total_purchases.user_id = total_return.user_id
How can I ensure I'm only measuring purchases within 7-60 days with the original user?
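You can use INTERVAL with DATE_ADD. For example, assuming the return purchase should fall 7 to 60 days after the original one, add this to the ON clause of your join:
AND total_return.time_created BETWEEN DATE_ADD(total_purchases.time_created, INTERVAL 7 DAY) AND DATE_ADD(total_purchases.time_created, INTERVAL 60 DAY)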
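To sanity-check the 7-60 day window on a small data set, the following Python sketch counts return purchases per user. It assumes the "initial purchase" is simply each user's earliest time_created and that both ends of the window are inclusive, as with BETWEEN. returns_in_window is a hypothetical helper, not something from the question.

```python
from datetime import datetime, timedelta


def returns_in_window(events, min_days=7, max_days=60):
    """events: list of (user_id, sku, time_created).
    Returns {user_id: number of purchases made 7-60 days after the first one}."""
    first = {}
    for user_id, _, created in events:
        if user_id not in first or created < first[user_id]:
            first[user_id] = created

    counts = {}
    for user_id, _, created in events:
        delta = created - first[user_id]
        if timedelta(days=min_days) <= delta <= timedelta(days=max_days):
            counts[user_id] = counts.get(user_id, 0) + 1
    return counts
```

Users whose only repeat purchase falls outside the window, such as user 2 in the sample data (two days later), do not appear in the result.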