I'm currently trying to optimize a database by combining queries, but I keep hitting dead ends while optimizing a room availability query.
I have a room availability table where each record states the number of rooms available per date. It's formatted like so:
room_availability_id (PK)
room_availability_rid (fk_room_id)
room_availability_date (2011-02-11)
room_availability_number (number of rooms available)
The trouble is getting a list of rooms that are available for EACH of the provided days. When I use IN() like so:
WHERE room_availability_date IN('2011-02-13','2011-02-14','2011-02-15')
AND room_availability_number > 0
If the 14th has availability 0 it still gives me the other 2 dates. But I only want that room_id when it is available on ALL three dates.
Please tell me there is a way to do this in MySQL other than querying each date/room/availability combination separately (that is what is done now :-( )
I tried all sorts of combinations, tried to use room_availability_date = ALL (...), tried some dirty repeating subqueries but to no avail.
Thank you in advance for any thoughts!
You would need to construct a query to group on the room ID and then check that there is availability on each date, which can be done using the having clause. Leaving the where clause predicate in for room_availability_date will help to keep the query efficient (as indexes etc. can't be used with a having clause easily).
SELECT
room_availability_rid
FROM room_availability -- table name assumed; the question does not give it
WHERE room_availability_date IN ('2011-02-13','2011-02-14','2011-02-15')
AND room_availability_number > 0
GROUP BY room_availability_rid
HAVING count(case room_availability_date when '2011-02-13' THEN 1 END) > 0
AND count(case room_availability_date when '2011-02-14' THEN 1 END) > 0
AND count(case room_availability_date when '2011-02-15' THEN 1 END) > 0
I think I can improve on a'r's answer:
SELECT
room_availability_rid, count(*) n
FROM room_availability -- table name assumed; the question does not give it
WHERE room_availability_date IN ('2011-02-13','2011-02-14','2011-02-15')
AND room_availability_number > 0
GROUP BY room_availability_rid
HAVING n=3
Edit: This of course assumes that there is only one table entry per room per day. Is this a valid assumption?
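If that assumption is in doubt, counting distinct dates instead of rows is one way around it. Below is a minimal sketch, run against an in-memory SQLite database through Python's sqlite3 module rather than MySQL. The table name room_availability and the sample rows are made up for illustration. Room 3 has a duplicate entry for the 13th and nothing for the 14th, so a plain count(*) = 3 would wrongly accept it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE room_availability (
        room_availability_id     INTEGER PRIMARY KEY,
        room_availability_rid    INTEGER,
        room_availability_date   TEXT,
        room_availability_number INTEGER
    )
""")
# Hypothetical sample data: (room id, date, rooms available)
rows = [
    (1, "2011-02-13", 2), (1, "2011-02-14", 1), (1, "2011-02-15", 3),
    (2, "2011-02-13", 4), (2, "2011-02-14", 0), (2, "2011-02-15", 1),
    (3, "2011-02-13", 1), (3, "2011-02-13", 1), (3, "2011-02-15", 1),
]
conn.executemany(
    "INSERT INTO room_availability "
    "(room_availability_rid, room_availability_date, room_availability_number) "
    "VALUES (?, ?, ?)",
    rows,
)

wanted = ["2011-02-13", "2011-02-14", "2011-02-15"]
placeholders = ",".join("?" * len(wanted))
query = f"""
    SELECT room_availability_rid
    FROM room_availability
    WHERE room_availability_date IN ({placeholders})
      AND room_availability_number > 0
    GROUP BY room_availability_rid
    HAVING COUNT(DISTINCT room_availability_date) = ?
"""
# The last parameter is the number of dates every room must cover.
available_rooms = [r[0] for r in conn.execute(query, wanted + [len(wanted)])]
print(available_rooms)
```

Only room 1 comes back: room 2 has zero availability on the 14th and room 3 covers just two distinct dates.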
You can group by room ID, generate a list of dates available, and then see if all the dates you need are included.
This will give you a list of dates each room is available:
select `room_availability_rid`,group_concat(`room_availability_date`) as `datelist`
from `table` where room_availability_number>0
group by `room_availability_rid`
Then we can add a having clause to get the rooms that are available on all of the dates we need:
select `room_availability_rid`,group_concat(`room_availability_date`) as `datelist`
from `table` where room_availability_number>0
group by `room_availability_rid`
having find_in_set('2011-02-13',`datelist`) and
find_in_set('2011-02-14',`datelist`) and
find_in_set('2011-02-15',`datelist`)
This should work. Test it for me will ya? :)
I have users and orders tables with this structure (simplified for question):
USERS
userid
registered(date)
ORDERS
id
date (order placed date)
user_id
I need to get an array of users (an array of userid values) who placed their 25th order during a specified period (for example, May 2019), together with the date of each user's 25th order and the number of days it took to reach it (the difference between the user's registration date and the date of the 25th order).
For example, if a user registered in April 2018, placed 20 orders in 2018, and then placed orders 21 through 30 in January to May 2019, that user should be in the array if the 25th order overall for the account was placed in May 2019.
How can I do this with a MySQL query?
Sample data and structure: http://www.sqlfiddle.com/#!9/998358 (for testing you can get 3rd order as ex., not 25th, to not add a lot of sample data records).
A single query is not required: if this can't be done in one query, a few queries are fine.
You can use a correlated subquery to get the count of orders placed before the current one by a user. If that's 24 the current order is the 25th. Then check if the date is in the desired range.
SELECT o1.user_id,
o1.date,
datediff(o1.date, u1.registered)
FROM orders o1
INNER JOIN users u1
ON u1.userid = o1.user_id
WHERE (SELECT count(*)
FROM orders o2
WHERE o2.user_id = o1.user_id
AND (o2.date < o1.date
OR o2.date = o1.date
AND o2.id < o1.id)) = 24
AND o1.date >= '2019-01-01'
AND o1.date < '2019-06-01';
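To try the idea on a small data set, here is a rough sketch using an in-memory SQLite database via Python's sqlite3 module instead of MySQL. The sample rows are invented, it looks for the 3rd order rather than the 25th, and julianday() stands in for MySQL's datediff(). Note the parentheses around the date/id tie-break, which keep the user_id condition applied to both branches of the OR.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (userid INTEGER PRIMARY KEY, registered TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, date TEXT, user_id INTEGER);
    -- Hypothetical sample data
    INSERT INTO users VALUES (1, '2018-04-10'), (2, '2019-01-05');
    INSERT INTO orders VALUES
        (1, '2018-05-01', 1), (2, '2018-06-01', 1), (3, '2019-05-20', 1),
        (4, '2019-05-21', 1),
        (5, '2019-02-01', 2), (6, '2019-03-01', 2);
""")

query = """
    SELECT o1.user_id,
           o1.date,
           CAST(julianday(o1.date) - julianday(u1.registered) AS INTEGER) AS days
    FROM orders o1
    JOIN users u1 ON u1.userid = o1.user_id
    WHERE (SELECT COUNT(*)
           FROM orders o2
           WHERE o2.user_id = o1.user_id
             AND (o2.date < o1.date
                  OR (o2.date = o1.date AND o2.id < o1.id))) = :nth - 1
      AND o1.date >= :date_from
      AND o1.date <  :date_to
"""
params = {"nth": 3, "date_from": "2019-05-01", "date_to": "2019-06-01"}
result = conn.execute(query, params).fetchall()
print(result)
```

User 1 placed a third order on 2019-05-20, 405 days after registering; user 2 only has two orders and is left out.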
The basic inefficient way of doing this would be to get the user_id for every row in ORDERS where the date is in your target range AND the count of rows in ORDERS with the same user_id and a lower date is exactly 24.
This can get very ugly, very quickly, though.
If you're calling this from code you control, can't you do it from the code?
If not, there should be a way to assign to each row an index describing its rank among orders for its specific user_id, and select from this all user_id from rows with an index of 25 and a correct date. This will give you a select from select from select, but it should be much faster. The difficulty here is to control the order of the rows, so here are the selects I envision:
Select all rows, order by user_id asc, date asc, union-ed to nothing from a table made of two vars you'll initialize at 0.
from this, select all while updating a var to know if a row's user_id is the same as the last, and adding a field that will report so (so for each user_id the first line in order will have a specific value like 0 while the other rows for the same user_id will have a 1)
from this, select all plus a field that equals itself plus one in case the first added field is 1, else 0
from this, select the user_id from the rows where the second added field is 25 and the date is in range.
The union thingy is only necessary if you need to do it all in one request (you have to initialize them in a lower select than the one they're used in).
Edit: Well if you need the date too you can just select it along with the user_id, but calculating the number of days in sql will be a pain. Just join the result table to the users table and get both the date of 25th order and their date of registration, you'll surely be able to do the difference in code.
I'll try building an actual request, however if you want to truly understand what you need to make this you gotta read up on mysql variables, unions, and conditional statements.
"Looks too complicated. I am sure that this can be done with current DB structure and 1-2 requests." Well, yeah. Use the COUNT request, it will be easy, and slow as hell.
For the complex answer, see http://www.sqlfiddle.com/#!9/998358/21
Since you can use multiple requests, you can just initialize the vars first.
It isn't actually THAT complicated, you just have to understand how to concretely express what you mean by "an user's 25th command" to a SQL engine.
See http://www.sqlfiddle.com/#!9/998358/24 for the difference in days, turns out there's a method for that.
Edit 5: seems you're going with the COUNT method. I'll pray your DB is small.
Edit 6: For posterity:
The count method will take years on very large databases. Since OP didn't come back, I'm assuming his is small enough to overlook query speed. If that's not your case and let's say it's 10 years from now and the sqlfiddle links are dead; here's the two-queries solution:
SET @PREV_USR:=0;
SELECT user_id, date_ FROM (
SELECT user_id, date_, SAME_USR AS IGNORE_SMUSR,
@RANK_USR:=(CASE SAME_USR WHEN 0 THEN 1 ELSE @RANK_USR+1 END) AS RANK FROM (
SELECT orders.*, CASE WHEN @PREV_USR = user_id THEN 1 ELSE 0 END AS SAME_USR,
@PREV_USR:=user_id AS IGNORE_USR FROM
orders
ORDER BY user_id ASC, date_ ASC, id ASC
) AS DERIVED_1
) AS DERIVED_2
WHERE RANK = 25 AND YEAR(date_) = 2019 AND MONTH(date_) = 4 ;
Just change RANK = ? and the conditions to fit your needs. If you want to fully understand it, start with the innermost SELECT and work your way outward; this version fuses points 1 & 2 of my explanation.
Now sometimes you will have to use an API or something and it wont let you keep variable values in memory unless you commit it or some other restriction, and you'll need to do it in one query. To do that, you put the initialization one step lower and make it so it does not affect the higher statements. IMO the best way to do this is in a UNION with a fake table where the only row is excluded. You'll avoid the hassle of a JOIN and it's just better overall.
SELECT user_id, date_ FROM (
SELECT user_id, date_, SAME_USR AS IGNORE_SMUSR,
@RANK_USR:=(CASE SAME_USR WHEN 0 THEN 1 ELSE @RANK_USR+1 END) AS RANK FROM (
SELECT DERIVED_4.*, CASE WHEN @PREV_USR = user_id THEN 1 ELSE 0 END AS SAME_USR,
@PREV_USR:=user_id AS IGNORE_USR FROM
(SELECT * FROM orders
UNION
SELECT * FROM (
SELECT (@PREV_USR:=0) AS INIT_PREV_USR, 0 AS COL_2, 0 AS COL_3
) AS DERIVED_3
WHERE INIT_PREV_USR <> 0
) AS DERIVED_4
ORDER BY user_id ASC, date_ ASC, id ASC
) AS DERIVED_1
) AS DERIVED_2
WHERE RANK = 25 AND YEAR(date_) = 2019 AND MONTH(date_) = 4 ;
With that method, the thing to watch for is the amount and the type of columns in your basic table. Here orders' first field is an int, so I put INIT_PREV_USR in first then there are two more fields so I just add two zeroes with names and call it a day. Most types work, since the union doesn't actually do anything, but I wouldn't try this when your first field is a blob (worst comes to worst you can use a JOIN).
You'll note this is derived from a method of pagination in mysql. If you want to apply this to other engines, just check out their best pagination calls and you should be able to work things out.
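On engines that support window functions (MySQL 8.0 and later, SQLite 3.25 and later), the per-user rank can be computed with ROW_NUMBER() instead of user variables. This is a different technique from the variable-based queries above, shown here only as a sketch: it runs on an in-memory SQLite database through Python's sqlite3 module, the sample rows are invented, and it looks for the 3rd order in April 2019 to keep the data small.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, date_ TEXT, user_id INTEGER);
    -- Hypothetical sample data
    INSERT INTO orders VALUES
        (1, '2019-01-10', 1), (2, '2019-02-10', 1), (3, '2019-04-05', 1),
        (4, '2019-04-06', 1),
        (5, '2019-04-01', 2), (6, '2019-04-02', 2);
""")

query = """
    SELECT user_id, date_
    FROM (
        SELECT user_id,
               date_,
               -- Number each user's orders from oldest to newest;
               -- id breaks ties between orders on the same date.
               ROW_NUMBER() OVER (PARTITION BY user_id
                                  ORDER BY date_, id) AS order_rank
        FROM orders
    ) AS ranked
    WHERE order_rank = ?
      AND date_ >= ? AND date_ < ?
"""
result = conn.execute(query, (3, "2019-04-01", "2019-05-01")).fetchall()
print(result)
```

The date filter is written as a half-open range rather than YEAR()/MONTH() calls, so an index on the date column stays usable.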
I have a table named invoices. It has columns named user and late_fee. I am trying to find the percentage of late invoices out of the total number of invoices.
For example, a user has 16 invoices, 2 of which are late. I feel like this should be an easy query but I can't figure it out for the life of me.
You could use something like this. It counts the invoices whose late_fee is 1 and divides by the total number of invoices per user.
select sum( case
when late_fee = 1
then 1
else 0
end
)
/ count(*)
from invoices
group
by user
As @Ravinder pointed out, in MySQL this is also valid (does not work on other platforms though):
select sum( late_fee = 1
)
/ count(*)
from invoices
group
by user
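For a quick check of the arithmetic, here is a small sketch on an in-memory SQLite database via Python's sqlite3 module (not MySQL). The user names and rows are invented, and late_fee is assumed to be 1 for a late invoice and 0 otherwise. Multiplying by 100.0 turns the ratio into a percentage and also avoids integer division, which SQLite would otherwise apply.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoices (id INTEGER PRIMARY KEY, user TEXT, late_fee INTEGER)"
)
# Hypothetical data: alice has 16 invoices (2 late), bob has 4 (1 late).
rows = (
    [("alice", 1)] * 2 + [("alice", 0)] * 14
    + [("bob", 1)] + [("bob", 0)] * 3
)
conn.executemany("INSERT INTO invoices (user, late_fee) VALUES (?, ?)", rows)

query = """
    SELECT user,
           100.0 * SUM(CASE WHEN late_fee = 1 THEN 1 ELSE 0 END) / COUNT(*)
               AS late_pct
    FROM invoices
    GROUP BY user
    ORDER BY user
"""
late_pct = dict(conn.execute(query))
print(late_pct)
```

Selecting the user column alongside the ratio makes it clear which percentage belongs to whom.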
I have a query that shows me the number of calls per day for the last 14 days within my app.
The query:
SELECT count(id) as count, DATE(FROM_UNIXTIME(timestamp)) as date FROM calls GROUP BY DATE(FROM_UNIXTIME(timestamp)) DESC LIMIT 14
On days where there were 0 calls, this query does not show those days. Rather than skip those days, I'd like to have a 0 or NULL in that spot.
Any ideas for how I can achieve this? If you have any questions as to what I'm asking please let me know.
Thanks
I don't believe your query is "skipping over NULL values", as your title suggests. Rather, your data probably looks something like this:
id | timestamp
----+------------
1 | 2014-01-01
2 | 2014-01-02
3 | 2014-01-04
As a result, there are no rows that contain the missing date, so there are no rows to be counted. The answer is that you need to generate a list of all the dates you want and then do a LEFT or RIGHT JOIN to it.
Unfortunately, MySQL doesn't make this as easy as other databases. There doesn't seem to be an effective way of generating a list of anything inline. So you'll need some sort of table.
I think I would create a static table containing a set of integers to be subtracted from the current date. Then you can use this table to generate your list of dates inline and JOIN to it.
CREATE TABLE days_ago_list (days_ago INTEGER);
INSERT INTO days_ago_list VALUES
(0),(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12),(13)
;
Then:
SELECT COUNT(id), list_date
FROM (SELECT SUBDATE(CURDATE(), days_ago) AS list_date FROM days_ago_list) dates_to_list
LEFT JOIN (SELECT id, DATE(FROM_UNIXTIME(timestamp)) call_date FROM calls) calls_with_date
ON calls_with_date.call_date = dates_to_list.list_date
GROUP BY list_date
It is very important that you group by list_date; call_date will be NULL for any days without calls. It is also important to COUNT on id since NULL ids will not be counted. (That ensures you get a correct count of 0 for days with no calls.) If you need to change the dates listed, you simply update the table containing the integer list.
Here is a SQL Fiddle demonstrating this.
Alternatively, if this is for a web application, you could generate the list of dates code side and match up the counts with the dates after the query is done. This would make your web app logic somewhat more complicated, but it would also simplify the query and eliminate the need for the extra table.
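A rough sketch of that code-side approach, using Python with an in-memory SQLite database in place of the web app and MySQL. The sample calls are invented, SQLite's date(timestamp, 'unixepoch') stands in for DATE(FROM_UNIXTIME(timestamp)), and "today" is pinned to a fixed date and a 4-day window so the result is reproducible.

```python
import sqlite3
from datetime import date, datetime, timedelta, timezone

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calls (id INTEGER PRIMARY KEY, timestamp INTEGER)")

def ts(day):
    """Unix timestamp for 10:00 UTC on the given day of January 2014."""
    return int(datetime(2014, 1, day, 10, tzinfo=timezone.utc).timestamp())

# Hypothetical data: one call on the 1st, two on the 2nd, none on the 3rd,
# one on the 4th.
conn.executemany(
    "INSERT INTO calls (timestamp) VALUES (?)",
    [(ts(1),), (ts(2),), (ts(2),), (ts(4),)],
)

# The query only returns days that actually have calls.
counts = dict(conn.execute("""
    SELECT date(timestamp, 'unixepoch') AS day, COUNT(id)
    FROM calls
    GROUP BY day
"""))

# Build the full list of days in code and fill the gaps with 0.
today = date(2014, 1, 4)   # fixed for the example; date.today() in practice
window = 4                 # 14 in the original question
days = [(today - timedelta(days=n)).isoformat() for n in range(window)]
filled = [(d, counts.get(d, 0)) for d in days]
print(filled)
```

The 3rd shows up with a count of 0 even though the query itself returned no row for it.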
Create a table that contains a row for each date you want in the results, then LEFT OUTER JOIN it with the results of your current query. Use the date from that table and the count from your query, substituting 0 where the count is NULL.
I have a table with four fields:
trip_paramid, creation_time, fuel_content,vehicle_id
I want to find the difference between consecutive rows. The table has a fuel_content field; I receive a packet every two minutes and insert it into the database. From this I want to find the total refuel quantity. If the fuel content increases by more than 2 between two packets, I treat the difference as a refuel. Multiple refuels may happen on the same day, so I want the total refuel quantity per day for a vehicle. Can anyone help me find a solution? Table schema and sample data: http://www.sqlfiddle.com/#!2/4cf36
Here is a good query.
The parameters (vehicle_id=13) and (date='2012-11-08') are hard-coded in the query; change them as needed.
Note that I have chosen an expression using creation_time>=... and creation_time<... instead of DATE(creation_time)='...'. This is because the first expression can use an index on creation_time, while the second cannot.
SELECT
SUM(fuel_content-prev_content) AS refuel_tot
, COUNT(*) AS refuel_nbr
FROM (
SELECT
p.trip_paramid
, fuel_content
, creation_time
, (
SELECT ps.fuel_content
FROM trip_parameters AS ps
WHERE (ps.vehicle_id=p.vehicle_id)
AND (ps.trip_paramid<p.trip_paramid)
ORDER BY trip_paramid DESC
LIMIT 1
) AS prev_content
FROM trip_parameters AS p
WHERE (p.vehicle_id=13)
AND (creation_time>='2012-11-08')
AND (creation_time<DATE_ADD('2012-11-08', INTERVAL 1 DAY))
ORDER BY p.trip_paramid
) AS log
WHERE (fuel_content-prev_content)>2
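The same previous-row lookup can be tried on a handful of rows. This is only a sketch: it uses an in-memory SQLite database through Python's sqlite3 module instead of MySQL, the packets are invented, and SQLite's date(:day, '+1 day') replaces DATE_ADD. Vehicle 13 refuels twice (+12 and +11); the +1 reading is below the threshold and is ignored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE trip_parameters (
        trip_paramid  INTEGER PRIMARY KEY,
        creation_time TEXT,
        fuel_content  INTEGER,
        vehicle_id    INTEGER
    )
""")
# Hypothetical packets, one every two minutes
conn.executemany("INSERT INTO trip_parameters VALUES (?, ?, ?, ?)", [
    (1, "2012-11-08 08:00:00", 50, 13),
    (2, "2012-11-08 08:02:00", 48, 13),
    (3, "2012-11-08 08:04:00", 60, 13),   # +12: refuel
    (4, "2012-11-08 08:06:00", 58, 13),
    (5, "2012-11-08 08:08:00", 59, 13),   # +1: below threshold
    (6, "2012-11-08 08:10:00", 70, 13),   # +11: refuel
    (7, "2012-11-08 08:10:00", 10, 14),   # other vehicle, ignored
])

query = """
    SELECT COALESCE(SUM(fuel_content - prev_content), 0) AS refuel_tot,
           COUNT(*) AS refuel_nbr
    FROM (
        SELECT p.fuel_content,
               -- fuel level of the same vehicle's previous packet
               (SELECT ps.fuel_content
                FROM trip_parameters AS ps
                WHERE ps.vehicle_id = p.vehicle_id
                  AND ps.trip_paramid < p.trip_paramid
                ORDER BY ps.trip_paramid DESC
                LIMIT 1) AS prev_content
        FROM trip_parameters AS p
        WHERE p.vehicle_id = :vehicle
          AND p.creation_time >= :day
          AND p.creation_time <  date(:day, '+1 day')
    ) AS log
    WHERE fuel_content - prev_content > 2
"""
refuel_tot, refuel_nbr = conn.execute(
    query, {"vehicle": 13, "day": "2012-11-08"}
).fetchone()
print(refuel_tot, refuel_nbr)
```

The first packet has no predecessor, so its prev_content is NULL and the outer WHERE drops it.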
Test it:
select sum(t2.fuel_content-t1.fuel_content) TotalFuel,t1.vehicle_id,t1.trip_paramid as rowIdA,
t2.trip_paramid as rowIdB,
t1.creation_time as timeA,
t2.creation_time as timeB,
t2.fuel_content fuel2,
t1.fuel_content fuel1,
(t2.fuel_content-t1.fuel_content) diffFuel
from trip_parameters t1, trip_parameters t2
where t1.trip_paramid<t2.trip_paramid
and t1.vehicle_id=t2.vehicle_id
and t1.vehicle_id=13
and t2.fuel_content-t1.fuel_content>2
order by rowIdA,rowIdB
where (rowIdA,rowIdB) are all possible pairs without repetition, diffFuel is the difference in fuel quantity, and TotalFuel is the sum of all refuel quantities.
The query compares all fuel content differences for the same vehicle (in this example, the vehicle with id=13) and only sums the fuel quantity when the difference is >2.
Regards.
I'm struggling to select records from a table of locations (e.g. hotels) based on their availability, which is stored in a separate table. To avoid having lots of availability records for every possible day/location combo, the availability table only holds records for limited or no availability on a given date, so the absence of a matching record means there is FULL availability.
The tables are a bit like...
locations: id, name, maxRooms etc.
availability: locationID, date, roomsAvailable( an integer from zero meaning no availability to maxRooms)
...and what I need to do is select all locations that have some availability for a given date period. That means they either have no matching availability records (i.e. fully available) or the sum of their matching availability.roomsAvailable records is greater than zero.
I'm getting a headache just trying to explain this :-( any ideas gratefully received...
SELECT locations.id FROM locations LEFT JOIN availability
ON( locations.id = availability.locationID AND
( availability.roomsAvailable > 0 OR ISNULL( availability.locationID ) ) )
You need to use a LEFT JOIN
I think you can query like this, get all locations whose sum of rooms available are greater than 0 for a specific date, then get all locations that are not listed in availability.
SELECT * FROM locations
WHERE id in
(
SELECT locationId
FROM availability
WHERE date >= '2011-05-01' and date <= '2011-07-01'
GROUP by locationId
HAVING SUM(roomsAvailable) > 0
)
OR id not in (SELECT DISTINCT locationId FROM availability)
Thanks for your suggestions everyone. After considering them I came up with this combined PHP/MySQL solution - which only works on MySQL 4 or later ...
SELECT * FROM locations
WHERE locations.id NOT IN (
SELECT distinct availability.locationID
FROM availability
WHERE availability.date >= '$periodStartYMD'
AND availability.date <= '$periodEndYMD'
GROUP BY locationID
HAVING sum(availability.roomsAvailable) = 0
AND count(availability.locationID) = $daysInPeriod)
the PHP vars should be self explanatory.
To summarise: if there are fewer availability records than days in the period, the days without a record have full availability. If there IS a record, it indicates no or limited availability.
The sub-select statement gets all availability records matching a location which have a total availability of zero and ( the bit that was eluding me) a count of matching records totalling the number of days in the period.
So if there is an availability record for every day AND the total availability count is zero there must be no rooms available for the whole period.
Phew - hope that makes some sort of sense and thanks again...
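For anyone who wants to see the exclusion logic run, here is a small sketch in Python against an in-memory SQLite database rather than PHP and MySQL. The locations and availability rows are invented, and the values are passed as bound parameters instead of being interpolated into the SQL string. Only a location with a zero-availability record on every day of the period is filtered out.

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE locations (id INTEGER PRIMARY KEY, name TEXT, maxRooms INTEGER);
    CREATE TABLE availability (locationID INTEGER, date TEXT, roomsAvailable INTEGER);
    -- Hypothetical sample data
    INSERT INTO locations VALUES
        (1, 'No records', 10), (2, 'Fully booked', 10),
        (3, 'One day missing', 10), (4, 'Some rooms left', 10);
    INSERT INTO availability VALUES
        (2, '2011-05-01', 0), (2, '2011-05-02', 0), (2, '2011-05-03', 0),
        (3, '2011-05-01', 0), (3, '2011-05-02', 0),
        (4, '2011-05-01', 0), (4, '2011-05-02', 0), (4, '2011-05-03', 2);
""")

period_start = date(2011, 5, 1)
period_end = date(2011, 5, 3)
days_in_period = (period_end - period_start).days + 1

query = """
    SELECT id
    FROM locations
    WHERE id NOT IN (
        SELECT locationID
        FROM availability
        WHERE date >= :period_start AND date <= :period_end
        GROUP BY locationID
        -- a record for every day AND no rooms on any of them
        HAVING SUM(roomsAvailable) = 0
           AND COUNT(*) = :days_in_period
    )
    ORDER BY id
"""
available = [row[0] for row in conn.execute(query, {
    "period_start": period_start.isoformat(),
    "period_end": period_end.isoformat(),
    "days_in_period": days_in_period,
})]
print(available)
```

Location 3 stays in the result because its missing record for the 3rd counts as full availability.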