I have a restaurants table with columns id, name, table_count, and an orders table with columns restaurant_id, start_date, end_date, status. I want to find the restaurants that are available for some date range. A restaurant is considered available if either it has no orders at all, or the number of confirmed reservations (status = 1) overlapping the given date range is less than that restaurant's table count. So I use this query:
SELECT r.id, r.name, r.table_count
FROM restaurants r
LEFT JOIN orders o
ON r.id = o.restaurant_id
WHERE o.restaurant_id IS NULL
OR (r.table_count > (SELECT COUNT(*)
FROM orders o2
WHERE o2.restaurant_id = r.id AND o2.status = 1 AND
NOT(o2.start_date >= '2013-09-10') AND NOT (o2.end_date <= '2013-09-05')
)
)
So, can I write the same query some other way that will be faster? I'm asking because over time there could be thousands or more rows in the orders table, and the search has to compare the date columns against a date range. Would it give faster results if I added an is_search column (with a MySQL index) holding 1 or 0, updated periodically by a cron job that marks past reservations as 0, so that the search only has to compare the date range against present or future orders (date comparisons being, I assume, much more expensive than comparing a tinyint column against 1 or 0)? Or would adding that condition just be one more thing to check, and have the opposite effect?
Thanks
It is not a good idea to index on 0 and 1. In your case, with thousands of records, such an index will have poor selectivity and may not be used at all.
You can speed up your query by building a composite index on start_date and end_date.
Also rewrite the query to avoid the NOT clauses on the date columns, because NOT prevents the index from being used and forces a scan of the table instead.
Your query
...NOT(o2.start_date >= '2013-09-10') AND NOT (o2.end_date <= '2013-09-05')...
could be rewritten as
...(o2.start_date < '2013-09-10') AND (o2.end_date > '2013-09-05')...
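As a rough illustration of both suggestions, here is the availability check rewritten without NOT, run against SQLite (the schema and date range come from the question; the restaurant names and sample rows are invented). Note that the table_count > (SELECT COUNT(*) ...) test already covers the no-orders case, since the count is then 0:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE restaurants (id INTEGER PRIMARY KEY, name TEXT, table_count INTEGER);
CREATE TABLE orders (restaurant_id INTEGER, start_date TEXT, end_date TEXT, status INTEGER);
-- Composite index on the two date columns, as suggested above.
CREATE INDEX idx_orders_dates ON orders (start_date, end_date);

INSERT INTO restaurants VALUES (1, 'Full House', 1), (2, 'Free Table', 2);
-- Restaurant 1: one confirmed reservation overlapping the range -> fully booked.
INSERT INTO orders VALUES (1, '2013-09-06', '2013-09-08', 1);
-- Restaurant 2: same overlap, but table_count = 2 -> still available.
INSERT INTO orders VALUES (2, '2013-09-06', '2013-09-08', 1);
""")

rows = conn.execute("""
SELECT r.id, r.name, r.table_count
FROM restaurants r
WHERE r.table_count > (
    SELECT COUNT(*)
    FROM orders o
    WHERE o.restaurant_id = r.id
      AND o.status = 1
      AND o.start_date < '2013-09-10'   -- overlap test, rewritten without NOT
      AND o.end_date   > '2013-09-05'
)
""").fetchall()
print(rows)  # [(2, 'Free Table', 2)]
```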
I have a query that uses a subquery to detect if an item in a joined table has a duplicate record, and if so the data is not pulled into the parent query:
select
(f.listing_datetime) as datetime,
round(avg(f.listing_price), 0) as price,
round(avg(f.listing_sqft), 0) as sqft,
round(avg(f.listing_p_per_sqft), 2) as p_per_ft,
f.listing_neighborhood, count(*) as points
from (
select
a.listing_datetime, a.listing_price, a.listing_sqft, a.listing_p_per_sqft,
a.listing_neighborhood, i.listing_tokens, count(i.listing_tokens) as c
from
agg_cl_data as a
left join incoming_cl_data_desc as i
on a.listing_url = i.listing_url
where a.listing_datetime between curdate() - interval 30 day and curdate()
group by i.listing_tokens
having c < 2
) as f
group by day(f.listing_datetime), f.listing_neighborhood
order by f.listing_datetime;
As you can see, by using this simple way of dealing with dupes in the HAVING clause, I'm actually losing the original record that was stored, because any group with a count of 2 or greater is thrown out entirely. Is there a better way to do this so that I don't lose some of the data, WITHOUT creating a new table that would be queried against?
If you want to remove duplicate rows, use the DISTINCT clause. If you want to find duplicates based on partitioning by a particular column, use the ROW_NUMBER window function.
At first glance, your subquery is invalid, since you are grouping by one column without applying an aggregate function to the other columns.
select distinct
a.listing_datetime, a.listing_price, a.listing_sqft, a.listing_p_per_sqft,
a.listing_neighborhood, i.listing_tokens
from
agg_cl_data as a
left join incoming_cl_data_desc as i
on a.listing_url = i.listing_url
where a.listing_datetime between curdate() - interval 30 day and curdate()
Try using 'distinct' instead of 'having' in the subquery. You will get each url only once, without losing it, even if there were two entries for it.
So your code should be:
select DISTINCT a.listing_datetime, ...
and then no 'having' at the end.
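A tiny SQLite session (table and data invented for illustration) showing the difference the answers describe: HAVING COUNT(*) < 2 throws away both copies of a duplicated description, while DISTINCT keeps exactly one row per value:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE listings (listing_url TEXT, listing_tokens TEXT);
INSERT INTO listings VALUES
  ('url1', 'two bed near park'),
  ('url2', 'two bed near park'),   -- duplicate description, different url
  ('url3', 'studio downtown');
""")

# HAVING COUNT(*) < 2 discards *both* copies of the duplicate...
kept_having = conn.execute("""
    SELECT listing_tokens FROM listings
    GROUP BY listing_tokens HAVING COUNT(*) < 2
""").fetchall()

# ...while DISTINCT keeps exactly one row per distinct value.
kept_distinct = conn.execute("""
    SELECT DISTINCT listing_tokens FROM listings
""").fetchall()

print(len(kept_having), len(kept_distinct))  # 1 2
```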
I have an orders table that contains orders_id, customers_email_address, and date_purchased columns. I want to write a SQL query that will, for each row in the table, add a new field called 'repeat_order_count' showing how many times this customer has ordered, up to and including this order.
For example, if John ordered once before this order, the repeat_order_count would be 2 for this order, or in other words, this is the second time John has ordered. The next order row I encounter for John will have a 3, and so on. This will allow me to create a line graph that shows the number of orders placed by repeat customers over time. I can now go to a specific time in the past and figure out how many orders were placed by repeat customers during that time period:
SELECT
*
FROM orders
WHERE repeat_order_count > 1
AND date_purchased = January 2014 --(simplifying things here)
I'm also able to determine now WHEN a customer became a repeat customer.
I can't figure out the query to solve this. Or perhaps there may be an easier way to do this?
One approach to retrieving the specified result would be to use a correlated subquery in the SELECT list. This assumes that the customer identifier is customers_email_address, and that date_purchased is a DATETIME or TIMESTAMP (or other canonical format), and that there are no duplicated values for the same customer (that is, the customer doesn't have two or more orders with the same date_purchased value.)
SELECT s.orders_id
, s.customers_email_address
, s.date_purchased
, ( SELECT COUNT(1)
FROM orders p
WHERE p.customers_email_address = s.customers_email_address
AND p.date_purchased < s.date_purchased
) AS previous_order_count
FROM orders s
ORDER
BY s.customers_email_address
, s.date_purchased
The correlated subquery will return 0 for the "first" order for a customer, and 1 for the "second" order. If you want to include the current order in the count, replace the < comparison operator with <= operator.
FOLLOWUP
For performance of that query, we need to be particularly concerned with the performance of the correlated subquery, since it is going to be executed for every row in the table. (A million rows in the table means a million executions of that subquery.) Having a suitable index available is going to be crucial.
For the query in my answer, I'd recommend trying an index like this:
ON orders (customers_email_address, date_purchased, orders_id)
With that index in place, we'd expect EXPLAIN to show the index being used by both the outer query, to satisfy the ORDER BY (No "Using filesort" in the Extra column), and as a covering index (no lookups to the pages in the underlying table, "Using index" shown in the Extra column.)
The answer I gave demonstrated just one approach. It's also possible to return an equivalent result using a join pattern, for example:
SELECT s.orders_id
, s.customers_email_address
, s.date_purchased
, COUNT(p.orders_id)
FROM orders s
JOIN orders p
ON p.customers_email_address = s.customers_email_address
AND p.date_purchased <= s.date_purchased
GROUP
BY s.customers_email_address
, s.date_purchased
, s.orders_id
ORDER
BY s.customers_email_address
, s.date_purchased
, s.orders_id
(This query is based on some additional information provided in a comment, which wasn't available before: orders_id is UNIQUE in the orders table.)
If we are guaranteed that the orders_id of a "previous" order is always less than the orders_id of any later order, then it would be possible to use that column in place of the date_purchased column. We'd want a suitable index available:
... ON orders (customers_email_address, orders_id, date_purchased)
NOTE: The order of the columns in the index is important. With that index, we could do:
SELECT s.orders_id
, s.customers_email_address
, s.date_purchased
, COUNT(p.orders_id)
FROM orders s
JOIN orders p
ON p.customers_email_address = s.customers_email_address
AND p.orders_id <= s.orders_id
GROUP
BY s.customers_email_address
, s.orders_id
ORDER
BY s.customers_email_address
, s.orders_id
Again, we'd want to review the output from EXPLAIN to verify that the index is being used for both the join operation and the GROUP BY operation.
NOTE: With the inner join, we need to use a <= comparison, so we get at least one matching row back. We could either subtract 1 from that result, if we wanted a count of only "previous" orders (not counting the current order), or we could use an outer join operation with a < comparison, so we could get a row back with a count of 0.
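To see the correlated-subquery pattern from this answer run end to end, here is a small self-contained sketch against SQLite (the schema follows the question; the sample rows and email addresses are invented, and the <= comparison is used so the current order is included in the count, as described above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (orders_id INTEGER PRIMARY KEY,
                     customers_email_address TEXT,
                     date_purchased TEXT);
-- The covering index recommended above.
CREATE INDEX idx_orders_cust_date
    ON orders (customers_email_address, date_purchased, orders_id);
INSERT INTO orders VALUES
  (1, 'john@example.com', '2014-01-03'),
  (2, 'mary@example.com', '2014-01-05'),
  (3, 'john@example.com', '2014-01-10');
""")

rows = conn.execute("""
SELECT s.orders_id
     , s.customers_email_address
     , ( SELECT COUNT(1)
         FROM orders p
         WHERE p.customers_email_address = s.customers_email_address
           AND p.date_purchased <= s.date_purchased   -- <= includes this order
       ) AS repeat_order_count
FROM orders s
ORDER BY s.customers_email_address, s.date_purchased
""").fetchall()
print(rows)
# John's second order gets repeat_order_count = 2:
# [(1, 'john@example.com', 1), (3, 'john@example.com', 2), (2, 'mary@example.com', 1)]
```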
When you are inserting into your orders table, for your OrderCount column you can use a correlated subquery.
eg:
select
col1,
col2,
(isnull((select count(*) from orders where custID = #currentCustomer),0) + 1),
col4
Note that you wouldn't be adding the field when the 2nd order is processed, the field would already exist and you would just be populating it.
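A minimal sketch of that insert-time approach, run against SQLite, with COALESCE standing in for the SQL Server-style isnull used above (table and column names here are illustrative, not from the question):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE orders (orders_id INTEGER PRIMARY KEY,
                                     custID TEXT,
                                     order_count INTEGER)""")

def insert_order(cust_id):
    # Correlated count of this customer's existing orders, plus one for the
    # new row being inserted; COALESCE plays the role of isnull above.
    conn.execute("""
        INSERT INTO orders (custID, order_count)
        VALUES (?, COALESCE((SELECT COUNT(*) FROM orders WHERE custID = ?), 0) + 1)
    """, (cust_id, cust_id))

insert_order('john')
insert_order('mary')
insert_order('john')
print(conn.execute("SELECT custID, order_count FROM orders ORDER BY orders_id").fetchall())
# [('john', 1), ('mary', 1), ('john', 2)]
```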
I have a database with units that report in at regular intervals. I want a query that shows the units that have not logged after a certain date.
select distinct FSS_LIVE_50.dbo.AgencyVehicle.AgencyVehicleName from FSS_LIVE_50.dbo.AgencyVehicle
inner join FSS_LIVE_50.dbo.VehicleLocation
on FSS_LIVE_50.dbo.AgencyVehicle.AgencyVehicleKey = FSS_LIVE_50.dbo.VehicleLocation.AgencyVehicleKey
where FSS_LIVE_50.dbo.VehicleLocation.GPSLocationDate < '2013-01-01'
and FSS_LIVE_50.dbo.AgencyVehicle.TermDate is NULL
order by AgencyVehicleName
Right now it shows me vehicles that also have logs after '2013-01-01', because they have logs both before and after that date.
How can I exclude names from being shown that also have date logs after that date?
Change the distinct to group by. Then add a having clause. You are looking for the largest date being before the cutoff:
select FSS_LIVE_50.dbo.AgencyVehicle.AgencyVehicleName
from FSS_LIVE_50.dbo.AgencyVehicle inner join
FSS_LIVE_50.dbo.VehicleLocation
on FSS_LIVE_50.dbo.AgencyVehicle.AgencyVehicleKey = FSS_LIVE_50.dbo.VehicleLocation.AgencyVehicleKey
where FSS_LIVE_50.dbo.AgencyVehicle.TermDate is NULL
group by FSS_LIVE_50.dbo.AgencyVehicle.AgencyVehicleName
having MAX(FSS_LIVE_50.dbo.VehicleLocation.GPSLocationDate) < '2013-01-01'
order by AgencyVehicleName;
Judicious use of aliases would also make your query much more readable.
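Here is the GROUP BY / HAVING MAX pattern run end to end against SQLite, with the database-qualified names shortened to plain table names and sample data invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE AgencyVehicle (AgencyVehicleKey INTEGER, AgencyVehicleName TEXT, TermDate TEXT);
CREATE TABLE VehicleLocation (AgencyVehicleKey INTEGER, GPSLocationDate TEXT);
INSERT INTO AgencyVehicle VALUES (1, 'Truck A', NULL), (2, 'Truck B', NULL);
-- Truck A last reported in 2012; Truck B also has logs after the cutoff.
INSERT INTO VehicleLocation VALUES
  (1, '2012-11-30'), (2, '2012-12-15'), (2, '2013-02-01');
""")

rows = conn.execute("""
SELECT v.AgencyVehicleName
FROM AgencyVehicle v
INNER JOIN VehicleLocation l
    ON v.AgencyVehicleKey = l.AgencyVehicleKey
WHERE v.TermDate IS NULL
GROUP BY v.AgencyVehicleName
-- Keep only vehicles whose most recent log is before the cutoff.
HAVING MAX(l.GPSLocationDate) < '2013-01-01'
ORDER BY v.AgencyVehicleName
""").fetchall()
print(rows)  # [('Truck A',)]
```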
My attempt was to join the customer and order tables, and the lineitem and order tables. I have also indexed the c_mktsegment field. My resulting query is this. Is there anything I can do to improve it?
select
o_shippriority,
l_orderkey,
o_orderdate,
sum(l_extendedprice * (1 - l_discount)) as revenue
from
cust As c
join ord As o on c.c_custkey = o.o_custkey
join line As l on o.o_orderkey = l.l_orderkey
where
c_mktsegment = ':1'
and o_orderdate < date ':2'
and l_shipdate > date ':2'
group by
l_orderkey,
o_orderdate,
o_shippriority
order by
revenue desc,
o_orderdate;
I don't see anything obviously wrong with this query. For good performance, you probably should have indexes on orders.o_custkey and lineitem.l_orderkey. The index on c_mktsegment will let the DB find customer records quickly, but from there you need to be able to find order and lineitem records.
You should run EXPLAIN to see how the database is processing the query. The plan depends on many factors, including the number of records in each table and the distribution of keys, so I can't say what it will be just by looking at the query. But if you run EXPLAIN and see that it is doing a full table scan, you should add an index to prevent that. That's pretty much rule #1 of query optimization.
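To check that rule in practice, you can inspect the plan directly. A sketch using SQLite's EXPLAIN QUERY PLAN (MySQL's EXPLAIN output looks different, but the idea is the same): with the join-key indexes suggested above in place, the plan should show index searches rather than full scans on the joined tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cust (c_custkey INTEGER PRIMARY KEY, c_mktsegment TEXT);
CREATE TABLE ord  (o_orderkey INTEGER PRIMARY KEY, o_custkey INTEGER, o_orderdate TEXT);
CREATE TABLE line (l_orderkey INTEGER, l_shipdate TEXT,
                   l_extendedprice REAL, l_discount REAL);
-- The join-key indexes recommended above.
CREATE INDEX idx_ord_custkey   ON ord  (o_custkey);
CREATE INDEX idx_line_orderkey ON line (l_orderkey);
""")

plan = conn.execute("""
EXPLAIN QUERY PLAN
SELECT o.o_orderkey, SUM(l.l_extendedprice * (1 - l.l_discount)) AS revenue
FROM cust c
JOIN ord  o ON c.c_custkey  = o.o_custkey
JOIN line l ON o.o_orderkey = l.l_orderkey
WHERE c.c_mktsegment = 'BUILDING'
GROUP BY o.o_orderkey
""").fetchall()

for row in plan:
    print(row[-1])  # e.g. "SEARCH ord USING INDEX idx_ord_custkey (o_custkey=?)"
```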
I am trying to speed up a database select for reporting over more than 3 million rows. Is it good to use dayofweek?
Query is something like this:
SELECT
COUNT(tblOrderDetail.Qty) AS `BoxCount`,
`tblBoxProducts`.`ProductId` AS `BoxProducts`,
`tblOrder`.`OrderDate`,
`tblFranchise`.`FranchiseId` AS `Franchise`
FROM `tblOrder`
INNER JOIN `tblOrderDetail` ON tblOrderDetail.OrderId=tblOrder.OrderId
INNER JOIN `tblFranchise` ON tblFranchise.FranchiseeId=tblOrderDetail.FranchiseeId
INNER JOIN `tblBoxProducts` ON tblOrderDetail.ProductId=tblBoxProducts.ProductId
WHERE (tblOrderDetail.Delivered = 1) AND
(tblOrder.OrderDate >= '2004-05-17') AND
(tblOrder.OrderDate < '2004-05-24')
GROUP BY `tblBoxProducts`.`ProductId`,`tblFranchise`.`FranchiseId`
ORDER BY `tblOrder`.`OrderDate` DESC
But what I really want is to show report for everyday in a week. Like On Sunday, Monday....
So Would it be a good idea to use dayofweek in query or render the result from the view?
No, using dayofweek as one of the columns you're selecting is not going to hurt your performance significantly, nor will it blow out your server memory. Your query shows that you're displaying seven distinct order_date days' worth of orders. Maybe there are plenty of orders, but not many days.
But you may be better off using DATE_FORMAT (see http://dev.mysql.com/doc/refman/5.5/en/date-and-time-functions.html#function_date-format) to display the day of the week as part of the order_date column. Then you don't have to muck around in your client turning (0..6) into (Monday..Sunday) or is it (Sunday..Saturday)? You get my point.
A good bet is to wrap your existing query in an extra one just for formatting. This doesn't cost much and lets you control your presentation without making your data-retrieval query more complex.
Note also that you omitted order_date from your GROUP BY expression. I think this is going to yield unpredictable results in MySQL. In Oracle, it yields an error message. Also, I don't know what you're doing with this result set, but don't you want it ordered by franchise and box products as well as date?
I presume your OrderDate columns contain only days -- that is, all the times in those column values are midnight. Your GROUP BY won't do what you hope for it to do if there are actual order timestamps in your OrderDate columns. Use the DATE() function to make sure of this, if you aren't sure already. Notice that the way you're doing the date range in your WHERE clause is already correct for dates with timestamps. http://dev.mysql.com/doc/refman/5.5/en/date-and-time-functions.html#function_date
So, here's a suggested revision to your query. I didn't fix the ordering, but I did fix the GROUP BY expression and used the DATE() function.
SELECT BoxCount, BoxProducts,
DATE_FORMAT(OrderDate, '%W') AS DayOfWeek,
OrderDate,
Franchise
FROM (
SELECT
COUNT(tblOrderDetail.Qty) AS `BoxCount`,
`tblBoxProducts`.`ProductId` AS `BoxProducts`,
DATE(`tblOrder`.`OrderDate`) AS OrderDate,
`tblFranchise`.`FranchiseId` AS `Franchise`
FROM `tblOrder`
INNER JOIN `tblOrderDetail` ON tblOrderDetail.OrderId=tblOrder.OrderId
INNER JOIN `tblFranchise` ON tblFranchise.FranchiseeId=tblOrderDetail.FranchiseeId
INNER JOIN `tblBoxProducts` ON tblOrderDetail.ProductId=tblBoxProducts.ProductId
WHERE (tblOrderDetail.Delivered = 1)
AND (tblOrder.OrderDate >= '2004-05-17')
AND (tblOrder.OrderDate < '2004-05-24')
GROUP BY `tblBoxProducts`.`ProductId`,`tblFranchise`.`FranchiseId`, DATE(`tblOrder`.`OrderDate`)
ORDER BY DATE(`tblOrder`.`OrderDate`) DESC
) Q
You have lots of inner join operations; this query may still take a while. Make sure tblOrder has some kind of index on OrderDate for best performance.
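A small runnable sketch of the wrapper idea, using SQLite, whose strftime('%w', ...) stands in for MySQL's DATE_FORMAT(..., '%W') (it returns the weekday number 0-6 with Sunday = 0, rather than a name); the table here is cut down to just the columns needed for the demonstration, with invented rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblOrder (OrderDate TEXT, Qty INTEGER)")
conn.executemany("INSERT INTO tblOrder VALUES (?, ?)",
                 [('2004-05-17', 3), ('2004-05-17', 1), ('2004-05-18', 2)])

# Inner query does the aggregation; the outer wrapper only formats.
rows = conn.execute("""
SELECT strftime('%w', OrderDate) AS DayOfWeek, OrderDate, BoxCount
FROM (
    SELECT OrderDate, COUNT(Qty) AS BoxCount
    FROM tblOrder
    WHERE OrderDate >= '2004-05-17' AND OrderDate < '2004-05-24'
    GROUP BY OrderDate
) Q
ORDER BY OrderDate
""").fetchall()
print(rows)  # 2004-05-17 was a Monday -> DayOfWeek '1'
```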