How to GROUP BY consecutive data (date in this case) - mysql

I have a products table and a sales table that keeps a record of how many items a given product sold on each date. Of course, not all products have sales every day.
I need to generate a report that tells me how many consecutive days a product has had sales (from the latest date to the past) and how many items it sold during those days only.
I'd like to tell you how many things I've tried so far, but the only successful (and slow, recursive) ones are solutions inside my application and not inside SQL, which is what I want.
I have also browsed several similar questions on SO, but I haven't found one that gives me a clear idea of what I really need.
I've set up a SQLFiddle here to show you what I'm talking about. There you will see the only query I can think of, which doesn't give me the result I need. I also added comments there showing what the result of the query should be.
I hope someone here knows how to accomplish that. Thanks in advance for any comments!
Francisco

http://sqlfiddle.com/#!2/20108/1
Here is a stored procedure that does the job:
CREATE PROCEDURE myProc()
BEGIN
-- Drop and create the temp table
DROP TABLE IF EXISTS reached;
CREATE TABLE reached (
sku CHAR(32) PRIMARY KEY,
record_date date,
nb int,
total int)
ENGINE=HEAP;
-- Initial insert, the starting point is the MAX sales record_date of each product
INSERT INTO reached
SELECT products.sku, max(sales.record_date), 0, 0
FROM products
join sales on sales.sku = products.sku
group by products.sku;
-- loop until there is no more updated rows
iterloop: LOOP
-- Update the temptable with the values of the date - 1 row if found
update reached
join sales on sales.sku=reached.sku and sales.record_date=reached.record_date
set reached.record_date = reached.record_date - INTERVAL 1 day,
reached.nb=reached.nb+1,
reached.total=reached.total + sales.items;
-- If no more rows are updated, even the longest streak has been fully walked
IF ROW_COUNT() = 0 THEN
LEAVE iterloop;
END IF;
END LOOP iterloop;
-- select the results of the temp table
SELECT products.sku, products.title, products.price, reached.total as sales, reached.nb as days_sold
from reached
join products on products.sku=reached.sku;
END//
Then you just have to do
call myProc()
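The same walk-back idea (start at each product's latest sale date, then keep stepping one day earlier while a row exists) can also be sketched as a recursive CTE. The snippet below is only an illustration: it uses Python's sqlite3 as a stand-in engine, so the date arithmetic is SQLite's `date(..., '-1 day')` rather than MySQL's `INTERVAL 1 DAY`. The sample rows are made up, and it assumes at most one sales row per sku and date.

```python
import sqlite3

# Hypothetical sample rows; the real data lives in the asker's SQLFiddle.
SALES = [
    ("A", "2013-04-26", 3),
    ("A", "2013-04-25", 2),
    ("A", "2013-04-23", 5),  # the gap on 04-24 ends A's streak
    ("B", "2013-04-26", 1),
    ("C", "2013-04-24", 4),
    ("C", "2013-04-23", 6),
    ("C", "2013-04-22", 1),
]

STREAK_SQL = """
WITH RECURSIVE streak(sku, record_date, items) AS (
    -- Anchor: each product's most recent sales row
    SELECT s.sku, s.record_date, s.items
    FROM sales AS s
    WHERE s.record_date = (SELECT MAX(record_date) FROM sales WHERE sku = s.sku)
    UNION ALL
    -- Step: follow the row exactly one day earlier, if there is one
    SELECT s.sku, s.record_date, s.items
    FROM streak AS k
    JOIN sales AS s
      ON s.sku = k.sku
     AND s.record_date = date(k.record_date, '-1 day')
)
SELECT sku, COUNT(*) AS days_sold, SUM(items) AS sales
FROM streak
GROUP BY sku
ORDER BY sku
"""


def latest_streaks(rows):
    """Return (sku, days_sold, sales) for each product's most recent streak."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (sku TEXT, record_date TEXT, items INTEGER, "
        "PRIMARY KEY (sku, record_date))"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    result = conn.execute(STREAK_SQL).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in latest_streaks(SALES):
        print(row)
```

Compared with the procedure, this needs no scratch table or loop, but it depends on recursive CTE support in the engine you run it on.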

A solution in pure SQL without a stored procedure: Fiddle
SELECT sku
, COUNT(1) AS consecutive_days
, SUM(items) AS items
FROM
(
SELECT sku
, items
-- generate a new guid for each group of consecutive date
-- ie : starting with day_before is null
, @guid := IF(@sku = sku and day_before IS NULL, UUID(), @guid) AS uuid
, @sku := sku AS dummy_sku
FROM
(
SELECT currents.sku
, befores.record_date as day_before
, currents.items
FROM sales currents
LEFT JOIN sales befores
ON currents.sku = befores.sku
AND currents.record_date = befores.record_date + INTERVAL 1 DAY
ORDER BY currents.sku, currents.record_date
) AS main_join
CROSS JOIN (SELECT @sku:=0) foo_sku
CROSS JOIN (SELECT @guid:=UUID()) foo_guid
) AS result_to_group
GROUP BY uuid, sku
The query is really not that hard. Declare variables via a cross join such as (SELECT @type:=0) type. Then, in the SELECT list, you can set the variables' values row by row. This is necessary to simulate a rank function.

select
p.*,
sum(s.items) sales,
count(s.record_date) days_sold
from
products p
join
sales s
on
s.sku = p.sku
where s.record_date between '2013-04-18 00:00:00' and '2013-04-26 00:00:00'
group by p.sku;

Related

Create a Stored Procedure Debug

Using the data and schema from this site: https://www.sqlservertutorial.net/sql-server-sample-database/
I am given the prompt:
Create a stored procedure called placeOrder() that can be called to insert a new order in the database. It will receive a customerId as an INT, a productId as an INT and a qty as an INT and return (as an output parameter) the order_id of the new row created in table orders.
This stored procedure will find the store with the largest stock of that particular product and assign that store to the order. The order_status should be set to 1 (i.e. Pending), the current system date (see function CURDATE) will be assigned to order_date, column required_date will be 7 days from the current system date (see function ADDDATE) and the column staff_id will be assigned for anyone that works in the selected store (per previous requirement). Since the order_id column is not an auto-incremented column you need to calculate the value for it. You can use max(order_id) to find out the highest order_id in the table.
The order_item row shall be set to the productId and qty passed to the stored procedure. The item_id shall be set to 1 (since this order will only have one item). The list price should be retrieved from the products table using the passed productId. The discount value should be set to 0.
If I'm understanding the prompt correctly, I first need to find the store id which has the most of a particular product. Once I have that store, I need to insert a new row of data into the table "orders", with the essential data being order_id, customer_id, order_status, order_date, required_date, and staff_id. I do NOT understand what the last part of the question is asking or how to go about solving it.
Here's my current code, but I'm almost positive it's chock-full of errors, notes, and missing pieces, so please help me out where you can:
DELIMITER //
CREATE procedure placeOrder (IN customerID INT, IN productID INT, IN QTY INT, OUT order_id INT)
BEGIN
DECLARE customerID INT;
DECLARE produtcID INT;
DECLARE quantity INT;
SELECT customer_id INTO customerID from customers where customer_id = customerID;
SELECT product_id INTO productID from order_items where product_id = productID;
SELECT quantity INTO qty from order_items where quantity = qty;
/**find store id with max stocks of product**/
select st.store_name, sk.store_id from stocks as sk
INNER JOIN
stores as st
ON sk.store_id = st.store_id
WHERE max(sk.quantity)
GROUP by sk.product_id;
select st.store_id from stores as st
INNER JOIN orders as o
ON st.store_id= o.store_id
Insert into orders (order_id, customer_id, order_status, order_date, required_date, staff_id)
WHERE order_status = '1',
AND order_date = select(curdate()),
AND required_date = adddate('order_date' +7),
AND staff_id = /**ANYONE from store listed in precious query (how do I relate these two queries)**
END
Don't re-declare function parameters.
You don't need to use a SELECT query to set variables that are already set in parameters.
You're not getting the store with the maximum quantity correctly. Use ORDER BY sk.quantity DESC LIMIT 1
You need to use INTO <variable> in a query to set variables from a query. If you don't do this, the result of the query will be turned into the result of the procedure, which isn't desired here.
You don't use WHERE in an INSERT statement. WHERE is used for finding existing rows that match a condition, but INSERT is for creating new rows. You use VALUES() to list all the values that should be assigned to the specified columns.
CREATE procedure placeOrder (IN customerID INT, IN productID INT, IN QTY INT, OUT orderId INT)
BEGIN
DECLARE topStoreId INT;
DECLARE staffId INT;
/**find store id with max stocks of product**/
select sk.store_id
from stocks as sk
INNER JOIN stores as st
ON sk.store_id = st.store_id
WHERE sk.product_id = productID
ORDER BY sk.quantity DESC
LIMIT 1
INTO topStoreId;
/* Pick an arbitrary staff from the selected store */
SELECT staff_id
FROM staffs
WHERE store_id = topStoreId
LIMIT 1
INTO staffId;
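/* order_id is not auto-incremented, so take the next value after the current maximum */
SELECT COALESCE(MAX(order_id), 0) + 1
FROM orders AS o
INTO orderId;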
Insert into orders (order_id, customer_id, order_status, order_date, required_date, staff_id, store_id)
VALUES (orderId, customerId, 1, CURDATE(), DATE_ADD(CURDATE(), INTERVAL 1 WEEK), staffId, topStoreId);
END
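The "last part" of the prompt, which the question says is unclear, asks for a matching row in order_items: item_id 1, the passed product and quantity, the list price looked up from products, and a discount of 0. As a rough sketch of the whole flow, here is the same sequence of steps in Python against an in-memory SQLite database. The table and column names follow the sample database referenced in the question, but the cut-down schema, the demo rows, and the helper names `make_demo_db` and `place_order` are invented for illustration. It is not a MySQL procedure.

```python
import sqlite3
from datetime import date, timedelta

# Cut-down, hypothetical version of the sample schema (only the columns used here).
SCHEMA = """
CREATE TABLE products (product_id INTEGER PRIMARY KEY, list_price REAL);
CREATE TABLE stocks (store_id INTEGER, product_id INTEGER, quantity INTEGER);
CREATE TABLE staffs (staff_id INTEGER PRIMARY KEY, store_id INTEGER);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER,
                     order_status INTEGER, order_date TEXT, required_date TEXT,
                     store_id INTEGER, staff_id INTEGER);
CREATE TABLE order_items (order_id INTEGER, item_id INTEGER, product_id INTEGER,
                          quantity INTEGER, list_price REAL, discount REAL);
"""


def make_demo_db():
    """Build a tiny in-memory database with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO products VALUES (10, 599.99)")
    conn.executemany("INSERT INTO stocks VALUES (?, ?, ?)", [(1, 10, 3), (2, 10, 9)])
    conn.executemany("INSERT INTO staffs VALUES (?, ?)", [(4, 1), (7, 2)])
    conn.execute("INSERT INTO orders VALUES (5, 1, 4, '2020-01-01', '2020-01-08', 1, 4)")
    conn.commit()
    return conn


def place_order(conn, customer_id, product_id, qty):
    """Insert one order plus its single order item and return the new order_id."""
    cur = conn.cursor()
    # Store holding the largest stock of this product
    store = cur.execute(
        "SELECT store_id FROM stocks WHERE product_id = ? "
        "ORDER BY quantity DESC LIMIT 1", (product_id,)).fetchone()
    if store is None:
        raise ValueError("no store stocks this product")
    store_id = store[0]
    # Any staff member of that store (assumes the store has at least one)
    staff_id = cur.execute(
        "SELECT staff_id FROM staffs WHERE store_id = ? LIMIT 1",
        (store_id,)).fetchone()[0]
    # order_id is not auto-incremented: next value after the current maximum
    order_id = cur.execute(
        "SELECT COALESCE(MAX(order_id), 0) + 1 FROM orders").fetchone()[0]
    list_price = cur.execute(
        "SELECT list_price FROM products WHERE product_id = ?",
        (product_id,)).fetchone()[0]
    today = date.today()
    cur.execute(
        "INSERT INTO orders (order_id, customer_id, order_status, order_date, "
        "required_date, store_id, staff_id) VALUES (?, ?, 1, ?, ?, ?, ?)",
        (order_id, customer_id, today.isoformat(),
         (today + timedelta(days=7)).isoformat(), store_id, staff_id))
    # The single order item: item_id 1, discount 0
    cur.execute(
        "INSERT INTO order_items (order_id, item_id, product_id, quantity, "
        "list_price, discount) VALUES (?, 1, ?, ?, ?, 0)",
        (order_id, product_id, qty, list_price))
    conn.commit()
    return order_id


if __name__ == "__main__":
    print(place_order(make_demo_db(), customer_id=3, product_id=10, qty=2))
```

In the MySQL procedure, the equivalent would be one more SELECT ... INTO for the list price followed by a second INSERT, into order_items.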

Appointments per month, December missing?

SELECT month(dateofappointment), COUNT(*) 'NumberOfAppointments'
FROM appointment
WHERE YEAR(dateofappointment) = '2016'
GROUP BY MONTH(dateofappointment)
This shows me all months, but December is not there because there weren't any appointments made in December that year. How do I show December as being 0?
To solve these types of queries it often helps to express them as a series of requirements, this can make it easier to resolve.
When the results don't come out as expected, update your requirements statements with new requirements as you identify them, then try again:
As I see it now you have 2 requirements:
Return a single row for each month of the year of 2016
For each row show a count of the appointments for the corresponding month
OK, so that was verbose, but you can see that what is missing from your query is a statement that defines the '1 row for each month of the year 2016'. So you need to build that recordset first, either manually or through recursion.
At the time of writing, MySQL did not support recursive Common Table Expressions (they arrived later, in MySQL 8.0), although this is a trivial concept in many other RDBMSs.
But if MySQL doesn't support recursion, what are our options? Here are some other attempts on SO:
MySQL Get a list of dates in month, year
How to generate a dynamic sequence table in MySQL?
generate an integer sequence in MySQL
This might sound a bit of a hack, but you can use any table in your database that has more than 12 rows and has an auto-incrementing field, oh and was seeded to start at 1 (or below). Forget about whether this is right or wrong, it will work:
SELECT Id
FROM LogEvent -- An arbitrary table that I know has records starting from 1
WHERE Id BETWEEN 1 AND 12
So that is hacky, but we can implement a row count function so that we can use any table with 12 or more rows, regardless of ids or seeding, stole this from: MySQL get row number on select - Answer by Mike Cialowicz
SET @rank=0;
SELECT @rank:=@rank+1 AS rank
FROM orders
WHERE @rank < 12
Now we can either union the missing rows from this result set to the original query or use a join operator. First solution using union.
It is common to use UNION ALL to inject missing rows to a recordset because it separates the expected result query from the exceptional or default results. Sometimes this syntax makes it easier to interpret the expected operation
SET @rank = 0;
SELECT month(dateofappointment) as Month, COUNT(*) 'NumberOfAppointments'
FROM appointment
WHERE YEAR(dateofappointment) = '2016'
GROUP BY MONTH(dateofappointment)
UNION ALL
SELECT rank, 0
FROM (
SELECT @rank:=@rank+1 AS rank
FROM rows
WHERE @rank < 12
) months
WHERE NOT EXISTS (SELECT dateofappointment
FROM appointment
WHERE YEAR(dateofappointment) = '2016' AND MONTH(dateofappointment) = months.rank)
ORDER BY Month
But it makes for an ugly query. You could also join on the months query with a left join on the count of appointments, but here the intention is harder to identify.
SET @rank = 0;
SELECT months.rank, COUNT(appointment.dateofappointment)
FROM (
SELECT @rank:=@rank+1 AS rank
FROM rows
WHERE @rank < 12
) months
LEFT OUTER JOIN appointment ON months.rank = Month(appointment.dateofappointment) AND YEAR(dateofappointment) = '2016'
GROUP BY months.rank
I have saved these queries into a SqlFiddle so you can see the results:
http://sqlfiddle.com/#!9/99d485/4
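On an engine that does support recursive CTEs, the "one row per month" recordset can be generated inline and left-joined to the appointments. A minimal sketch follows, using Python's sqlite3 as a stand-in (MySQL 8.0 and later also accept WITH RECURSIVE, though their month and year functions differ from SQLite's strftime). The sample dates and the function name `appointments_per_month` are made up for illustration.

```python
import sqlite3

MONTHLY_SQL = """
WITH RECURSIVE months(m) AS (
    SELECT 1
    UNION ALL
    SELECT m + 1 FROM months WHERE m < 12
)
SELECT months.m AS month,
       -- COUNT(column) ignores the NULLs produced by the LEFT JOIN,
       -- so months without appointments come out as 0
       COUNT(a.dateofappointment) AS number_of_appointments
FROM months
LEFT JOIN appointment AS a
  ON CAST(strftime('%m', a.dateofappointment) AS INTEGER) = months.m
 AND strftime('%Y', a.dateofappointment) = ?
GROUP BY months.m
ORDER BY months.m
"""


def appointments_per_month(dates, year):
    """Return 12 (month, count) rows for the given year, zero-filled."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE appointment (dateofappointment TEXT)")
    conn.executemany("INSERT INTO appointment VALUES (?)", [(d,) for d in dates])
    result = conn.execute(MONTHLY_SQL, (str(year),)).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    sample = ["2016-01-05", "2016-01-20", "2016-11-30", "2015-12-01"]
    for month, count in appointments_per_month(sample, 2016):
        print(month, count)
```

The year filter sits in the join condition rather than in a WHERE clause. Placed in WHERE, it would discard the NULL-extended rows and the empty months would disappear again.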
As I pointed out above, this is trivial in MS SQL and Oracle RDBMS, where we can generate sequences of values dynamically through recursive Common Table Expressions (CTEs). For the players at home, here is an implementation in MS SQL Server 2014. The example is a little more involved, using a from and to date to filter the results dynamically:
-- Dynamic MS SQL Example using recursive CTE
DECLARE @FromDate Date = '2016-01-01'
DECLARE @ToDate Date = '2016-12-31'
;
WITH Months(Year, Month, Date) AS
(
SELECT Year(@FromDate), Month(@FromDate), @FromDate
UNION ALL
SELECT Year(NextMonth.Date), Month(NextMonth.Date), NextMonth.Date
FROM Months
CROSS APPLY (SELECT DateAdd(m, 1, Date) Date) NextMonth
WHERE NextMonth.Date < @ToDate
)
-- Count the joined column, not *, so months with no appointments report 0 instead of 1
SELECT Months.Year, Months.Month, COUNT(appointment.dateofappointment) as 'NumberOfAppointments'
FROM Months
LEFT OUTER JOIN appointment ON Year(dateofappointment) = Months.Year AND Month(dateofappointment) = Months.Month
GROUP BY Months.Year, Months.Month

SQL - Keep only the first and last record of each day

I have a table that stores simple log data:
CREATE TABLE chronicle (
id INT auto_increment PRIMARY KEY,
data1 VARCHAR(256),
data2 VARCHAR(256),
time DATETIME
);
The table is approaching 1m records, so I'd like to start consolidating data.
I want to be able to take the first and last record of each DISTINCT(data1, data2) each day and delete all the rest.
I know how to just pull in the data, process it in whatever language I want, and then delete the records with a huge IN (...) query, but it seems like a better alternative would be to use SQL directly (am I wrong?)
I have tried several queries, but I'm not very good with SQL beyond JOINs.
Here is what I have so far:
SELECT id, Max(time), Min(time)
FROM (SELECT id, data1 ,data2, time, Cast(time AS DATE) AS day
FROM chronicle) AS initial
GROUP BY day;
This gets me the first and last time for each day, but it's not separated out by the data (i.e. I get the last record of each day, not the last record for each distinct set of data for each day.) Additionally, the id is just for the Min(time).
The information I've found on this particular problem is only for finding the last record of the day, not the last record for each set of data.
IMPORTANT: I want the first/last record for each DISTINCT(data1, data2) for each day, not just the first/last record for each day in the table. There will be more than 2 records for each day.
Solution:
My solution thanks to Jonathan Dahan and Gordon Linoff:
SELECT o.data1, o.data2, o.time FROM chronicle AS o JOIN (
SELECT Min(id) as id FROM chronicle GROUP BY DATE(time), data1, data2
UNION SELECT Max(id) as id FROM chronicle GROUP BY DATE(time), data1, data2
) AS n ON o.id = n.id;
From here it's a simple matter of referencing the same table to delete rows.
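As a rough illustration of that last step, the sketch below runs the delete against an in-memory SQLite database from Python. SQLite lets the DELETE's subquery read the table being deleted from. MySQL typically rejects that form, so there the id list would have to go through a temporary or derived table first. Like the query above, it treats the lowest and highest id in each group as the first and last record, which assumes ids increase with time. The sample rows and the function name `consolidate` are made up.

```python
import sqlite3

# Hypothetical log rows: (id, data1, data2, time)
ROWS = [
    (1, "a", "x", "2015-01-01 08:00:00"),
    (2, "a", "x", "2015-01-01 09:00:00"),
    (3, "a", "x", "2015-01-01 10:00:00"),
    (4, "a", "x", "2015-01-01 11:00:00"),
    (5, "a", "x", "2015-01-01 12:00:00"),
    (6, "b", "y", "2015-01-01 08:30:00"),
    (7, "a", "x", "2015-01-02 08:00:00"),
    (8, "a", "x", "2015-01-02 09:00:00"),
]

DELETE_SQL = """
DELETE FROM chronicle
WHERE id NOT IN (
    -- first record of each (day, data1, data2) group
    SELECT MIN(id) FROM chronicle GROUP BY date(time), data1, data2
    UNION
    -- last record of each (day, data1, data2) group
    SELECT MAX(id) FROM chronicle GROUP BY date(time), data1, data2
)
"""


def consolidate(rows):
    """Delete the middle rows and return (deleted_count, remaining_ids)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE chronicle (id INTEGER PRIMARY KEY, data1 TEXT, "
        "data2 TEXT, time TEXT)"
    )
    conn.executemany("INSERT INTO chronicle VALUES (?, ?, ?, ?)", rows)
    deleted = conn.execute(DELETE_SQL).rowcount
    conn.commit()
    remaining = [r[0] for r in conn.execute("SELECT id FROM chronicle ORDER BY id")]
    conn.close()
    return deleted, remaining


if __name__ == "__main__":
    print(consolidate(ROWS))
```

A group with a single row keeps that row, because its MIN and MAX ids coincide.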
This will improve performance when searching on dates:
ALTER TABLE chronicle
ADD INDEX `ix_chronicle_time` (`time` ASC);
This will delete the records:
CREATE TEMPORARY TABLE tmp_ids (
`id` INT NOT NULL,
PRIMARY KEY (`id`)
);
-- Collect the first and last id of each (day, data1, data2) group
INSERT INTO tmp_ids (id)
SELECT
min(id)
FROM
chronicle
GROUP BY
CAST(`time` as DATE),
data1,
data2
UNION
SELECT
Max(id)
FROM
chronicle
GROUP BY
CAST(`time` as DATE),
data1,
data2;
DELETE FROM
chronicle
WHERE
id not in (select id FROM tmp_ids)
AND `time` <= '2015-01-01'; -- if you want to consider all dates, then remove this condition
You have the right idea. You just need to join back to get the original information.
SELECT c.*
FROM chronicle c JOIN
(SELECT date(time) as day, min(time) as mint, max(time) as maxt
FROM chronicle
GROUP BY date(time)
) cc
ON c.time IN (cc.mint, cc.maxt);
Note that the join condition doesn't need to include day explicitly because it is part of the time. Of course, you could add date(c.time) = cc.day if you wanted to.
Instead of deleting rows in your original table, I would suggest that you make a new table. Something like this:
create table ChronicleByDay like chronicle;
insert into ChronicleByDay
SELECT c.*
FROM chronicle c JOIN
(SELECT date(time) as day, min(time) as mint, max(time) as maxt
FROM chronicle
GROUP BY date(time)
) cc
ON c.time IN (cc.mint, cc.maxt);
That way, you can have the more detailed information if you ever need it.

MySQL using count in query to search for availability of multiple rows

I am using one table, mrp to store multi room properties and a second table booking to store the dates the property was booked on.
I thus have the following tables:
mrp(property_id, property_name, num_rooms)
booking(property_id, booking_id, date)
Whenever a property is booked, an entry is made in the booking table, and because each property has multiple rooms, it can have multiple bookings on the same day.
I am using the following query:
SELECT * FROM mrp
WHERE property_id
NOT IN (SELECT property_id FROM booking WHERE `date` >= {$checkin_date} AND `date` <= {$checkout_date}
)
But although this query works fine for a property with a single room (that is, it only lists properties that have not been booked at all between the dates you provide), it does not display properties that have been booked but still have vacant rooms. How can I use COUNT and the num_rooms column to include properties that still have vacant rooms, even if they already have a booking between the selected dates, and to display the number of rooms that are free?
You need 3 levels of query. The innermost query will list properties and dates where all rooms are fully booked (or overbooked) on any day within your date range. The middle query narrows that down to just a list of property_id's. The outermost query lists all properties that are NOT in that list.
SELECT *
FROM mrp
WHERE property_id NOT IN (
-- List all properties sold-out on any day in range
SELECT DISTINCT Z.property_id
FROM (
-- List sold-out properties by date
SELECT MM.property_id, MM.num_rooms, BB.`date`
, COUNT(*) as rooms_booked
FROM mrp MM
INNER JOIN booking BB on MM.property_id = BB.property_id
WHERE BB.`date` >= @checkin AND BB.`date` <= @checkout
GROUP BY MM.property_id, MM.num_rooms, BB.`date`
HAVING MM.num_rooms - COUNT(*) <= 0
) as Z
)
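The question also asks for the number of rooms that are still free. One way to sketch that is to count bookings per property per day within the range, take the busiest day, and subtract it from num_rooms. The example below does this with Python's sqlite3 as a stand-in engine. The table layout follows the mrp and booking tables from the question, while the sample rows and the function name `available` are made up. It assumes one booking row per room per night.

```python
import sqlite3

FREE_ROOMS_SQL = """
SELECT m.property_id,
       m.property_name,
       -- rooms left on the busiest day of the requested range
       m.num_rooms - COALESCE(MAX(d.rooms_booked), 0) AS rooms_free
FROM mrp AS m
LEFT JOIN (
    SELECT property_id, "date", COUNT(*) AS rooms_booked
    FROM booking
    WHERE "date" BETWEEN ? AND ?
    GROUP BY property_id, "date"
) AS d ON d.property_id = m.property_id
GROUP BY m.property_id, m.property_name, m.num_rooms
HAVING m.num_rooms - COALESCE(MAX(d.rooms_booked), 0) > 0
ORDER BY m.property_id
"""

# Hypothetical sample data
MRP = [(1, "Alpha", 2), (2, "Beta", 1), (3, "Gamma", 3)]
BOOKING = [
    (1, 1, "2020-01-02"), (1, 2, "2020-01-02"),  # Alpha sold out on 01-02
    (3, 3, "2020-01-01"),
    (3, 4, "2020-01-03"), (3, 5, "2020-01-03"),
]


def available(mrp, booking, checkin, checkout):
    """Return (property_id, property_name, rooms_free) for bookable properties."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mrp (property_id INTEGER, property_name TEXT, num_rooms INTEGER)")
    conn.execute('CREATE TABLE booking (property_id INTEGER, booking_id INTEGER, "date" TEXT)')
    conn.executemany("INSERT INTO mrp VALUES (?, ?, ?)", mrp)
    conn.executemany("INSERT INTO booking VALUES (?, ?, ?)", booking)
    result = conn.execute(FREE_ROOMS_SQL, (checkin, checkout)).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(available(MRP, BOOKING, "2020-01-01", "2020-01-03"))
```

Properties with no bookings in the range survive the LEFT JOIN with a NULL count, which COALESCE turns into 0, so they show all their rooms as free.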
You are close but you need to change the dates condition and add a condition to match the records from the outer and inner queries (all in the inner query's WHERE clause):
SELECT * FROM srp
WHERE NOT EXISTS
(SELECT * FROM bookings_srp
WHERE srp.booking_id = bookings_srp.booking_id
AND `date` >= {$check-in_date} AND `date` <= {$check-out_date})
You have to exclude the properties which are booked between the checkin date and checkout date. This query should do:
SELECT * FROM srp WHERE property_id NOT IN (
SELECT property_id FROM booking WHERE `date` >= {$checkin_date} AND `date` <= {$checkout_date}
)

Aggregating data to get a running total month on month

I have a table which holds the below data
The issue I'm having is that I need a running total for each month. I've managed to create this in an Excel sheet pretty easily, but when I try anything in SQL the result varies.
The image below shows the sum of each paid amount by month, then a total of each one added onto it. I've edited the Excel sheet to show the formula and the result of the formula. I also have the result I get from SQL 2008 when using (example only)
***UPDATE - The result set I'm trying to achieve, which is in the Excel document, is for example: Month 117 + Month 118 gives Month 118 TotalToDate, then Month 118 + Month 119 gives Month 119 TotalToDate.
Not sure how else to explain this?
( select sum(paid) from #tmp005 t2 where t2.[monthid] <=
t5.[monthid] ) as paid
Really feel that this is less complicated than what I think!
As I understand this you are trying to get a running total month by month, the below CTE should do what you want.
--create table #temp (M_ID Int, Paid Float)
--Insert Into #temp VALUES (116, '50.00'), (117, '50.00'),(117, '5.00'),(117, '20.00'),(117, '10.00'),(117, '75.40'),(118, '125.00'),(118, '200.00'),(118, '5.00')
;WITH y AS
(
SELECT M_ID, paid, rn = ROW_NUMBER() OVER (ORDER BY M_ID)
FROM #temp
), x AS
(
SELECT M_ID, rn, paid, rt = paid
FROM y
WHERE rn = 1
UNION ALL
SELECT y.M_ID, y.rn, y.paid, x.rt + y.paid
FROM x INNER JOIN y
ON y.rn = x.rn + 1
)
SELECT M_ID, MAX(rt) as RunningTotal
FROM x
Group By M_ID
OPTION (MAXRECURSION 10000);
It is based on the first 3 M_IDs of your sample data; just change #temp to your specific table. I didn't know whether you had another unique identifier in the table, which is why I had to use ROW_NUMBER(), but this should order it correctly based on the M_ID field.
I guess that you are storing the month in a separate table and using M_ID to reference it. So, to get the sum of each month, do this:
SELECT [M_ID]
,sum([Paid])
FROM #tmp005
GROUP BY [M_ID]
I think I'd use a correlated sub query:-
select r.m_id,
(
select sum(csq.paid)
from #tmp005 csq
where csq.m_id<=r.m_id
)
from (
select distinct m_id
from #tmp005
) r
Hopefully you can figure out how to apply it to your circumstance/schema.
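On engines with window functions, the same running total can be written without recursion or a correlated subquery: sum each month first, then apply SUM(...) OVER (ORDER BY m_id) to the monthly totals. Note that SQL Server 2008, mentioned in the question, does not accept ORDER BY inside an aggregate's OVER clause (that arrived in SQL Server 2012), so treat this purely as a sketch for newer engines. It runs on Python's sqlite3, which needs SQLite 3.25 or later, with the sample values from the CTE answer above. The table name `payments` and the function name `running_totals` are made up.

```python
import sqlite3

# Sample values taken from the commented-out INSERT in the CTE answer
PAYMENTS = [
    (116, 50.00),
    (117, 50.00), (117, 5.00), (117, 20.00), (117, 10.00), (117, 75.40),
    (118, 125.00), (118, 200.00), (118, 5.00),
]

RUNNING_SQL = """
SELECT m_id,
       month_total,
       -- default window frame: every row up to and including the current month
       SUM(month_total) OVER (ORDER BY m_id) AS running_total
FROM (
    SELECT m_id, SUM(paid) AS month_total
    FROM payments
    GROUP BY m_id
) AS monthly
ORDER BY m_id
"""


def running_totals(rows):
    """Return (m_id, month_total, running_total) per month."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE payments (m_id INTEGER, paid REAL)")
    conn.executemany("INSERT INTO payments VALUES (?, ?)", rows)
    result = conn.execute(RUNNING_SQL).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for m_id, month_total, running_total in running_totals(PAYMENTS):
        print(m_id, round(month_total, 2), round(running_total, 2))
```

The amounts are stored as floating-point values here, as in the original sample, so the demo rounds them before printing.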