I have a database consisting of a Customer, Product, and Transaction table.
I'm trying to write an SQL statement that lists the names and SSNs of all customers who made no transactions in the year 2000.
The TransactionDate column in the Transaction table is a Date/Time data type (e.g. 2000-12-18 00:00:00).
This is the SQL code I've written:
SELECT DISTINCT CustomerName, Customer.CustomerSSN
FROM Customer, Transaction
WHERE Customer.CustomerSSN=Transaction.CustomerSSN
AND YEAR(TransactionDate)<>2000;
The not equal to symbol (<>) seems to not be working for some reason. When I change it to an equal sign, it does return the correct result...
Any advice is appreciated.
I'd change the approach.
The following query doesn't need DISTINCT or GROUP BY, because no customer record is ever joined to multiple transaction records.
It also works for customers who have never made any transactions.
Finally, it uses >= AND < rather than YEAR() = 2000. This enables an index seek rather than a full scan (assuming that you have an appropriate index on the Transaction table).
SELECT
CustomerName,
CustomerSSN
FROM
Customer
WHERE
NOT EXISTS (
SELECT *
FROM Transaction
WHERE CustomerSSN = Customer.CustomerSSN
AND TransactionDate >= '20000101'
AND TransactionDate < '20010101'
)
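As a rough sketch of what an "appropriate index" could look like here: the index name below is hypothetical, the table and column names are taken from the question, and you may need to quote the table name if your database treats Transaction as a reserved word.

```sql
-- Hypothetical index name; columns are the ones used by the NOT EXISTS subquery.
CREATE INDEX IX_Transaction_Customer_Date
    ON Transaction (CustomerSSN, TransactionDate);

-- Not index-friendly: YEAR() has to be evaluated against every row's value.
--   WHERE YEAR(TransactionDate) = 2000
-- Index-friendly: the bare column is compared to constants, so a range seek is possible.
--   WHERE TransactionDate >= '20000101' AND TransactionDate < '20010101'
```

Putting CustomerSSN first lets the engine jump to one customer's rows and then scan only the requested date range within them.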
SELECT DISTINCT
Customer.CustomerName,
Customer.CustomerSSN
FROM Customer
LEFT JOIN Transaction
ON Customer.CustomerSSN=Transaction.CustomerSSN
AND YEAR(TransactionDate) = 2000
WHERE Transaction.TransactionDate IS NULL
This query joins transactions onto customers, but only transactions from the year 2000. Any customer with no matching record from Transaction therefore had no transaction in that year, which is why you look for Transaction.TransactionDate IS NULL.
Your own query simply finds every customer who had a transaction in some year other than 2000; some of those customers may also have had transactions in 2000.
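A small worked example may make the difference easier to see. The demo_customer and demo_transaction tables and their rows are invented purely for illustration (they are not from the question), and the sketch assumes a database with a YEAR() function, such as MySQL or SQL Server.

```sql
-- Hypothetical demo tables and data, for illustration only.
CREATE TABLE demo_customer (
    CustomerSSN  CHAR(9) PRIMARY KEY,
    CustomerName VARCHAR(50) NOT NULL
);
CREATE TABLE demo_transaction (
    CustomerSSN     CHAR(9)  NOT NULL,
    TransactionDate DATETIME NOT NULL
);

INSERT INTO demo_customer VALUES
    ('111111111', 'Ann'),  -- transactions in 1999 and 2000
    ('222222222', 'Bob'),  -- one transaction, in 1999
    ('333333333', 'Cy');   -- no transactions at all

INSERT INTO demo_transaction VALUES
    ('111111111', '1999-03-01 00:00:00'),
    ('111111111', '2000-12-18 00:00:00'),
    ('222222222', '1999-07-04 00:00:00');

-- Original approach: keeps every customer with at least one non-2000 row.
-- Returns Ann and Bob. Ann should not be listed, and Cy is missing.
SELECT DISTINCT c.CustomerName
FROM demo_customer AS c
JOIN demo_transaction AS t ON t.CustomerSSN = c.CustomerSSN
WHERE YEAR(t.TransactionDate) <> 2000;

-- LEFT JOIN approach: keeps customers with no matching 2000 row.
-- Returns Bob and Cy.
SELECT c.CustomerName
FROM demo_customer AS c
LEFT JOIN demo_transaction AS t
       ON t.CustomerSSN = c.CustomerSSN
      AND YEAR(t.TransactionDate) = 2000
WHERE t.TransactionDate IS NULL;
```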
SELECT CustomerName, CustomerSSN
FROM Customer
WHERE CustomerSSN NOT IN (
SELECT CustomerSSN
FROM Transaction
WHERE Year(TransactionDate)=2000);
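I know it's solved, but I still wanted to post this as an additional answer (it may be helpful to others).
An alternative that changes the original query as little as possible is to use the NULLIF function, which can stand in for <> if that operator doesn't work in your environment. Note that it is logically equivalent to the <> comparison, so it still returns customers who have at least one transaction outside 2000, not customers who have none in 2000.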
SELECT DISTINCT CustomerName, Customer.CustomerSSN
FROM Customer, Transaction
WHERE Customer.CustomerSSN=Transaction.CustomerSSN
AND (NULLIF(YEAR(TransactionDate), 2000) IS NOT NULL)
Related
I have this query, but apparently the WITH statement has not yet been implemented in some database systems. How can I rewrite this query to achieve the same result?
Basically, this query is supposed to return the names of all branches in the database whose deposit total is less than the average of the totals across all branches.
WITH branch_total (branch_name, value) AS (
SELECT branch_name, SUM(balance) FROM account
GROUP BY branch_name
),
branch_total_avg (value) AS (
SELECT AVG(value) FROM branch_total
)
SELECT branch_name
FROM branch_total, branch_total_avg
WHERE branch_total.value < branch_total_avg.value;
Can this be written any other way without the WITH? Please help.
WITH syntax was introduced as a new feature of MySQL 8.0. You have noticed that it is not supported in earlier versions of MySQL. If you can't upgrade to MySQL 8.0, you'll have to rewrite the query using subqueries like the following:
SELECT branch_total.branch_name
FROM (
SELECT branch_name, SUM(balance) AS value FROM account
GROUP BY branch_name
) AS branch_total
CROSS JOIN (
SELECT AVG(value) AS value FROM (
SELECT SUM(balance) AS value FROM account GROUP BY branch_name
) AS sums
) AS branch_total_avg
WHERE branch_total.value < branch_total_avg.value;
In this case, the WITH syntax doesn't provide any advantage, so you might as well write it this way.
Another approach, which may be more efficient because it can probably avoid the use of temporary tables in the query, is to split it into two queries:
SELECT AVG(value) INTO @avg FROM (
SELECT SUM(balance) AS value FROM account GROUP BY branch_name
) AS sums;
SELECT branch_name, SUM(balance) AS value FROM account
GROUP BY branch_name
HAVING value < @avg;
This approach is certainly easier to read and debug, and there's some advantage to writing more straightforward code, to allow more developers to maintain it without having to post on Stack Overflow for help.
Another way to rewrite this query:
SELECT branch_name
FROM account
GROUP BY branch_name
HAVING SUM(balance) < (SELECT AVG(value)
FROM (SELECT branch_name, SUM(balance) AS value
FROM account
GROUP BY branch_name) t1)
As you can see from this code, nearly the same aggregate query is run against the account table twice: once at the outer level and again nested two levels deep.
The benefit of the WITH clause is that you can write that aggregate query once, give it a name, and use it as many times as needed. Additionally, a smart DB engine will run that factored-out subquery only once and reuse the results as often as needed.
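For comparison, a version that names the aggregate once could look like the sketch below. It assumes a database that supports common table expressions (for example MySQL 8.0 or later); total_balance is an illustrative column name, not one from the question.

```sql
-- The per-branch totals are defined once and referenced twice.
WITH branch_total AS (
    SELECT branch_name, SUM(balance) AS total_balance
    FROM account
    GROUP BY branch_name
)
SELECT branch_name
FROM branch_total
WHERE total_balance < (SELECT AVG(total_balance) FROM branch_total);
```

Whether the engine actually evaluates branch_total only once depends on the database and its optimizer.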
I have a table with user transactions. I need to select the users whose transactions total more than 100 000 in a single day. Currently I gather all the user ids and execute
SELECT sum ( amt ) as amt from users where date = date("Y-m-d") AND user_id=id;
for each id, then check whether the amt > 100k or not.
Since it's a large table, this takes a lot of time to execute. Can someone suggest an optimised query?
This will do:
SELECT sum ( amt ) as amt, user_id from users
where date = date("Y-m-d")
GROUP BY user_id
HAVING sum ( amt ) > 100000;
What about filtering the record 1st and then applying sum like below
select SUM(amt),user_id from (
SELECT amt,user_id from users where user_id=id AND date = date("Y-m-d")
)tmp
group by user_id having sum(amt)>100000
What datatype is amt? If it's anything but a basic integral type (e.g. int, long, number, etc.) you should consider converting it. Decimal types are faster than they used to be, but integral types are faster still.
Consider adding indexes on the date and user_id field, if you haven't already.
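One possible shape for such an index is sketched below. The index name is hypothetical, and the column order assumes the query filters on a single day and then groups by user.

```sql
-- Hypothetical index name; table and column names are the ones from the question.
-- date comes first because the WHERE clause filters on it with equality;
-- user_id second lets the GROUP BY read each user's rows together.
CREATE INDEX idx_users_date_user ON users (date, user_id);
```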
You can combine the aggregation and filtering in a single query...
SELECT SUM(Amt) as amt
FROM users
WHERE date=date(...)
AND user_id=id
GROUP BY user_id
HAVING amt > 100000
The only optimization that can be made to your query is to add a primary key on the user_id column to speed up filtering.
As for the other answers that suggest filtering the records first and then applying GROUP BY, that won't have any effect, because the WHERE clause is already evaluated before GROUP BY in SQL's logical query processing phases.
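To make that ordering concrete, here is a sketch of a single grouped query annotated with the logical evaluation order of its clauses. It assumes MySQL's CURDATE() in place of the PHP-style date("Y-m-d") placeholder and the 100 000 threshold from the question.

```sql
SELECT user_id, SUM(amt) AS total_amt   -- 5. SELECT: build the output columns
FROM users                              -- 1. FROM: pick the source rows
WHERE date = CURDATE()                  -- 2. WHERE: discard rows from other days
GROUP BY user_id                        -- 3. GROUP BY: one group per remaining user
HAVING SUM(amt) > 100000;               -- 4. HAVING: keep only groups over the threshold
```

Because the WHERE filter already runs before grouping, wrapping the filter in a derived table does not change which rows get aggregated.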
You could use MySQL subqueries to let MySQL handle all the iterations. For example, you could structure your query like this:
select user_data.user_id, user_data.total_amt from
(
select sum(amt) as total_amt, user_id from users where date = date("Y-m-d") group by user_id
) as user_data
where user_data.total_amt > 100000;
I have a users table, and an appointments table. For any given day, I would like a query that selects
1) the user_id the appointment is scheduled with
2) the number of appointments for that user for the specified day.
It seems I can do one or the other, but I'm unsure of how to do both with one query. For instance, I can do:
SELECT user_id FROM appt_tbl WHERE DATE(appt_date_time) = '2012-10-14'
group by user_id;
This gives me the users that have an appointment that day, but how can I add another column to this query that gives me how many appointments each user has? I assume I need some kind of subquery, but I'm unsure of how to structure that.
SQL uses the notion of "aggregate functions" to get you this information. You can use them with any aggregating query (i.e. it has "group by" in it).
SELECT user_id, count(*) as num_apts ...
Try adding COUNT(*) to your query:
SELECT user_id, COUNT(*) FROM appt_tbl WHERE DATE(appt_date_time) = '2012-10-14'
group by user_id;
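A variant of the same query is sketched below, with a named count column and a date range instead of DATE(). It assumes appt_date_time is a DATETIME column; num_appts is an illustrative alias.

```sql
-- Same result as filtering on DATE(appt_date_time) = '2012-10-14',
-- but the bare column is compared to constants, so an index on
-- appt_date_time (if one exists) can be used for the range.
SELECT user_id, COUNT(*) AS num_appts
FROM appt_tbl
WHERE appt_date_time >= '2012-10-14'
  AND appt_date_time <  '2012-10-15'
GROUP BY user_id;
```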
I have two tables
Invoice(
Id,
Status,
VendorId,
CustomerId,
OrderDate,
InvoiceFor,
)
InvoiceItem(
Id,
Status,
InvoiceId,
ProductId,
PackageQty,
PackagePrice,
)
Here Invoice.Id = InvoiceItem.InvoiceId (foreign key),
and the Id fields are primary keys (BIGINT).
The tables contain 100,000 (Invoice) and 450,000 (InvoiceItem) rows.
Now I have to write a query that returns the ledger of invoices where InvoiceFor is 55 or 66 and OrderDate falls in a certain date range.
I also have to return a LastTaken date: the previous date on which that particular customer took that product.
The output should be
OrderDate, InvoiceId, CustomerId, ProductId, LastTaken, PackageQty, PackagePrice
So I wrote the following query:
SELECT a.*, (
SELECT MAX(ivv.orderdate)
FROM invoice AS ivv , invoiceItem AS iiv
WHERE ivv.id=iiv.invoiceid
AND iiv.ProductId=a.ProductId AND ivv.CustomerId=a.CustomerId AND ivv.orderDate<a.orderdate
) AS lastTaken FROM (
SELECT iv.Id, iv.OrderDate, iv.CustomerId, iv.InvoiceFor, ii.ProductId,
ii.PackageQty, ii.PackagePrice
FROM invoice AS iv, invoiceitem AS ii
WHERE iv.id=ii.InvoiceId
AND iv.InvoiceFor IN (55,66)
AND iv.Status=0 AND ii.Status=0
AND OrderDate BETWEEN '2011-01-01' AND '2011-12-31'
ORDER BY iv.orderdate, iv.Id ASC
) AS a
But the query always times out. How can I solve this problem?
the Explain for this query is as follows:
Create an index on the OrderDate and InvoiceFor columns. The query should be much faster.
Two points about the query itself:
Learn to use proper JOIN syntax. Doing the joins in the WHERE clause is like writing questions in Shakespearean English.
The ORDER BY in the subquery should be outside at the highest level.
However, neither of these is killing performance. The problem is the subquery in the SELECT clause. I think the problem is that your subquery in the SELECT clause is not joining the two tables directly. Try including iiv.InvoiceId = ivv.Id, preferably in an ON clause.
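The two syntax points can be sketched as follows: explicit JOIN ... ON clauses instead of joins in the WHERE clause, and the ORDER BY moved to the outermost level. This is a restructuring of the question's query under the table and column names given there, not a tested fix for the timeout.

```sql
SELECT a.*,
       (SELECT MAX(ivv.OrderDate)
        FROM invoice AS ivv
        INNER JOIN invoiceitem AS iiv
                ON iiv.InvoiceId = ivv.Id          -- join condition lives in ON
        WHERE iiv.ProductId  = a.ProductId
          AND ivv.CustomerId = a.CustomerId
          AND ivv.OrderDate  < a.OrderDate) AS LastTaken
FROM (
    SELECT iv.Id, iv.OrderDate, iv.CustomerId, iv.InvoiceFor,
           ii.ProductId, ii.PackageQty, ii.PackagePrice
    FROM invoice AS iv
    INNER JOIN invoiceitem AS ii
            ON ii.InvoiceId = iv.Id
    WHERE iv.InvoiceFor IN (55, 66)
      AND iv.Status = 0
      AND ii.Status = 0
      AND iv.OrderDate BETWEEN '2011-01-01' AND '2011-12-31'
) AS a
ORDER BY a.OrderDate, a.Id;                        -- ordering applied once, at the top level
```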
If that doesn't work, try an indexing strategy. The following indexes should improve the performance of that subquery:
An index on InvoiceItem(ProductId)
An index on Invoice (CustomerId, OrderDate)
This should allow MySQL to run the subquery from indexes, rather than full table scans, which should be a big performance improvement.
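The two indexes could be created roughly like this. The index names are hypothetical; the table and column names are the ones from the question.

```sql
-- Lets the subquery find all items for a given product.
CREATE INDEX idx_invoiceitem_product
    ON InvoiceItem (ProductId);

-- Lets the subquery find a customer's invoices and read OrderDate
-- in order, which helps MAX(OrderDate) with the "< a.OrderDate" filter.
CREATE INDEX idx_invoice_customer_orderdate
    ON Invoice (CustomerId, OrderDate);
```

Running EXPLAIN on the query before and after adding them should show whether MySQL actually picks the new indexes.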
I am trying to speed up a database SELECT for reporting on more than 3 million rows. Is it a good idea to use dayofweek?
Query is something like this:
SELECT
COUNT(tblOrderDetail.Qty) AS `BoxCount`,
`tblBoxProducts`.`ProductId` AS `BoxProducts`,
`tblOrder`.`OrderDate`,
`tblFranchise`.`FranchiseId` AS `Franchise`
FROM `tblOrder`
INNER JOIN `tblOrderDetail` ON tblOrderDetail.OrderId=tblOrder.OrderId
INNER JOIN `tblFranchise` ON tblFranchise.FranchiseeId=tblOrderDetail.FranchiseeId
INNER JOIN `tblBoxProducts` ON tblOrderDetail.ProductId=tblBoxProducts.ProductId
WHERE (tblOrderDetail.Delivered = 1) AND
(tblOrder.OrderDate >= '2004-05-17') AND
(tblOrder.OrderDate < '2004-05-24')
GROUP BY `tblBoxProducts`.`ProductId`,`tblFranchise`.`FranchiseId`
ORDER BY `tblOrder`.`OrderDate` DESC
But what I really want is to show the report for every day of the week: Sunday, Monday, and so on.
So would it be a good idea to use dayofweek in the query, or to render the result from the view?
No, using dayofweek as one of the columns you're selecting is not going to hurt your performance significantly, nor will it blow out your server memory. Your query shows that you're displaying seven distinct order_date days' worth of orders. Maybe there are plenty of orders, but not many days.
But you may be better off using DATE_FORMAT (see http://dev.mysql.com/doc/refman/5.5/en/date-and-time-functions.html#function_date-format) to display the day of the week as part of the order_date column. Then you don't have to muck around in your client turning (0..6) into (Monday..Sunday) or is it (Sunday..Saturday)? You get my point.
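For reference, here is how the numeric and formatted variants compare in MySQL for the first date in the question's range. The expected values in the comments assume the default English locale setting (lc_time_names = en_US).

```sql
SELECT DAYOFWEEK('2004-05-17')         AS dayofweek_value, -- 2 (1 = Sunday ... 7 = Saturday)
       WEEKDAY('2004-05-17')           AS weekday_value,   -- 0 (0 = Monday ... 6 = Sunday)
       DATE_FORMAT('2004-05-17', '%W') AS day_name;        -- 'Monday'
```

The two numeric functions use different numbering schemes, which is exactly the ambiguity that formatting the name on the server avoids.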
A good bet is to wrap your existing query in an extra one just for formatting. This doesn't cost much and lets you control your presentation without making your data-retrieval query more complex.
Note also that you omitted OrderDate from your GROUP BY expression. I think this is going to yield unpredictable results in MySQL. In Oracle, it yields an error message. Also, I don't know what you're doing with this result set, but don't you want it ordered by franchise and box products as well as date?
I presume your OrderDate columns contain only days -- that is, all the times in those column values are midnight. Your GROUP BY won't do what you hope for it to do if there are actual order timestamps in your OrderDate columns. Use the DATE() function to make sure of this, if you aren't sure already. Notice that the way you're doing the date range in your WHERE clause is already correct for dates with timestamps. http://dev.mysql.com/doc/refman/5.5/en/date-and-time-functions.html#function_date
So, here's a suggested revision to your query. I didn't fix the ordering, but I did fix the GROUP BY expression and used the DATE() function.
SELECT BoxCount, BoxProducts,
DATE_FORMAT(OrderDate, '%W') AS DayOfWeek,
OrderDate,
Franchise
FROM (
SELECT
COUNT(tblOrderDetail.Qty) AS `BoxCount`,
`tblBoxProducts`.`ProductId` AS `BoxProducts`,
DATE(`tblOrder`.`OrderDate`) AS OrderDate,
`tblFranchise`.`FranchiseId` AS `Franchise`
FROM `tblOrder`
INNER JOIN `tblOrderDetail` ON tblOrderDetail.OrderId=tblOrder.OrderId
INNER JOIN `tblFranchise` ON tblFranchise.FranchiseeId=tblOrderDetail.FranchiseeId
INNER JOIN `tblBoxProducts` ON tblOrderDetail.ProductId=tblBoxProducts.ProductId
WHERE (tblOrderDetail.Delivered = 1)
AND (tblOrder.OrderDate >= '2004-05-17')
AND (tblOrder.OrderDate < '2004-05-24')
GROUP BY `tblBoxProducts`.`ProductId`,`tblFranchise`.`FranchiseId`, DATE(`tblOrder`.`OrderDate`)
ORDER BY DATE(`tblOrder`.`OrderDate`) DESC
) Q
You have lots of inner join operations; this query may still take a while. Make sure tblOrder has some kind of index on OrderDate for best performance.
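A minimal sketch of that index, with a hypothetical index name:

```sql
-- Supports the OrderDate >= ... AND OrderDate < ... range in the WHERE clause.
CREATE INDEX idx_tblorder_orderdate ON tblOrder (OrderDate);
```

Prefixing the report query with EXPLAIN should show whether MySQL uses this index for the date range.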