How can I convert/fix this WITH statement in SQL? - mysql

I have this query, but apparently the WITH statement has not yet been implemented in some database systems. How can I rewrite this query to achieve the same result?
Basically, this query is supposed to return the names of all branches in a database whose deposit total is less than the average across all branches.
WITH branch_total (branch_name, value) AS
SELECT branch_name, sum (balance) FROM account
GROUP BY branch_name
WITH branch_total_avg (value) AS SELECT avg(value)
FROM branch_total SELECT branch_name
FROM branch_total, branch_total_avg
WHERE branch_total.value < branch_total_avg.value;
Can this be written any other way without the WITH? Please help.

WITH syntax was introduced as a new feature of MySQL 8.0. You have noticed that it is not supported in earlier versions of MySQL. If you can't upgrade to MySQL 8.0, you'll have to rewrite the query using subqueries like the following:
SELECT branch_total.branch_name
FROM (
SELECT branch_name, SUM(balance) AS value FROM account
GROUP BY branch_name
) AS branch_total
CROSS JOIN (
SELECT AVG(value) AS value FROM (
SELECT SUM(balance) AS value FROM account GROUP BY branch_name
) AS sums
) AS branch_total_avg
WHERE branch_total.value < branch_total_avg.value;
In this case, the WITH syntax doesn't provide any advantage, so you might as well write it this way.
Another approach, which may be more efficient because it can probably avoid the use of temporary tables in the query, is to split it into two queries:
SELECT AVG(value) INTO @avg FROM (
SELECT SUM(balance) AS value FROM account GROUP BY branch_name
) AS sums;
SELECT branch_name, SUM(balance) AS value FROM account
GROUP BY branch_name
HAVING value < @avg;
This approach is certainly easier to read and debug, and there's some advantage to writing more straightforward code, to allow more developers to maintain it without having to post on Stack Overflow for help.

Another way to rewrite this query:
SELECT branch_name
FROM account
GROUP BY branch_name
HAVING SUM(balance) < (SELECT AVG(value)
FROM (SELECT branch_name, SUM(balance) AS value
FROM account
GROUP BY branch_name) t1)
As you can see from this code the account table has nearly the same aggregate query run against it twice, once at the outer level and again nested two levels deep.
The benefit of the WITH clause is that you can write that aggregate query once, give it a name, and use it as many times as needed. Additionally, a smart DB engine will run that factored-out subquery only once and reuse the results as often as needed.
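To check that the WITH version and the nested-subquery version really return the same branches, here is a minimal sketch using Python's built-in sqlite3 module rather than MySQL. The sample rows are made up for illustration, and the WITH query is a cleaned-up reading of the one in the question (parentheses added, both named subqueries in a single WITH), so treat it as an assumption about what the asker intended.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (branch_name TEXT, balance REAL)")
conn.executemany(
    "INSERT INTO account VALUES (?, ?)",
    [("North", 100), ("North", 200), ("South", 50), ("East", 900), ("West", 150)],
)

# Version with WITH: the per-branch totals are named once and reused.
with_cte = """
WITH branch_total (branch_name, value) AS (
    SELECT branch_name, SUM(balance) FROM account GROUP BY branch_name
),
branch_total_avg (value) AS (
    SELECT AVG(value) FROM branch_total
)
SELECT branch_total.branch_name
FROM branch_total, branch_total_avg
WHERE branch_total.value < branch_total_avg.value
ORDER BY branch_total.branch_name
"""

# Version without WITH: the aggregate is repeated inside a scalar subquery.
without_cte = """
SELECT branch_name
FROM account
GROUP BY branch_name
HAVING SUM(balance) < (
    SELECT AVG(value)
    FROM (SELECT SUM(balance) AS value FROM account GROUP BY branch_name) t1
)
ORDER BY branch_name
"""

cte_rows = conn.execute(with_cte).fetchall()
subquery_rows = conn.execute(without_cte).fetchall()
print(cte_rows)
print(subquery_rows)
```

With these rows the branch totals are 300, 50, 900 and 150, so the average is 350 and both queries should list North, South and West.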

Related

what is the main difference between mysql where vs having?

Please give me a simple answer, because I can't follow long explanations.
I want to know the main difference between what HAVING and WHERE do.
Example:
select custname,salary,(salary*0.04) as EPF
from customer where EPF >1200;
select custname,salary,(salary*0.04) as EPF
from customer having EPF >1200;
select custname
from customer where salary >= 50000;
select custname
from customer having salary >= 50000;
select custname,salary
from customer where salary >= 50000;
In SQL, having is reserved exclusively for aggregation queries. It is used to filter after aggregation.
MySQL extends the use of having so it can be used in non-aggregation queries. The purpose is so column aliases defined in the SELECT can be used in the having.
Hence, both these queries are valid in MySQL:
select custname
from customer
where salary >= 50000;
select custname
from customer
having salary >= 50000;
The latter is not valid in other databases -- because most (all?) other databases conform more closely to the standard in the definition of having.
I would strongly recommend using WHERE in this case, because it is the SQL standard. It is also possible that using HAVING in MySQL -- under some circumstances -- would not make use of indexes or partitions as efficiently as WHERE.
Gordon's answer is precisely correct.
I'd only like to give an example (that could probably be found in any SQL manual) where using HAVING makes sense:
SELECT
manager.id
, COUNT(*) AS workers_count
FROM
user AS manager
INNER JOIN user AS worker ON (
worker.manager_id = manager.id
)
WHERE
manager.type = 'manager'
GROUP BY
manager.id
HAVING
workers_count > 3
The query will return managers who have more than 3 workers.
You cannot use workers_count in the WHERE clause, because it is not yet defined at the WHERE processing stage.
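To make the row-filter versus group-filter distinction concrete, here is a small sketch run against SQLite through Python's sqlite3 module. The `city` column and all the sample rows are hypothetical additions for illustration; only `custname` and `salary` come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# `city` is a made-up column so that there is something to group by.
conn.execute("CREATE TABLE customer (custname TEXT, city TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [
        ("Ann", "Oslo", 60000),
        ("Bob", "Oslo", 40000),
        ("Cid", "Rome", 70000),
        ("Dee", "Rome", 80000),
        ("Eve", "Bern", 30000),
    ],
)

# WHERE filters individual rows BEFORE they are grouped and counted.
where_rows = conn.execute("""
SELECT city, COUNT(*) AS n
FROM customer
WHERE salary >= 50000
GROUP BY city
ORDER BY city
""").fetchall()

# HAVING filters whole groups AFTER the aggregate has been computed.
having_rows = conn.execute("""
SELECT city, COUNT(*) AS n
FROM customer
GROUP BY city
HAVING COUNT(*) >= 2
ORDER BY city
""").fetchall()

print(where_rows)   # per-city counts of customers earning at least 50000
print(having_rows)  # cities that have at least two customers in total
```

The first query counts only the high earners in each city, while the second counts everyone and then drops the cities with fewer than two customers.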

not a group group function in mysql

I have a table (invoice) like
InvoiceID InvoiceAmount
I want to select the invoicenumber, the average invoiceamount, and the difference between actual amount and average invoiceamount for each row. However, when I try to do this,
select invoiceID,
avg(invoiceamount) as Average,
Average - invoiceamount
from invoice
This shows an error that sql command is not complete.
Why is this happening?
PS: I even tried this,
SELECT invoiceid,
(SELECT AVG(invoiceamount) FROM Invoice) AS avg_invamt,
(SELECT AVG(invoiceamount) FROM Invoice) - invoiceamount AS diff
FROM Invoice
But still it shows error.
I am using oracle database express edition.
What you are trying to do gives the "not a single-group group function" error because you are using a group function without specifying a GROUP BY clause. You can achieve what you want with a subquery or even with the WITH clause.
In my examples I changed the column names slightly.
with a as (
select avg(amount) average
from invoice
)
select id, a.average, a.average - amount as diff
from invoice, a;
OR
select id, a.average, a.average - amount as diff
from invoice,
(select avg(amount) average
from invoice) a;
See it here on sqlfiddle: http://sqlfiddle.com/#!4/0c33e/8
This seems like a query that would benefit from window functions.
However, MySQL versions before 8.0 don't support these kinds of functions. The query below should work.
SELECT invoiceId, avg_inv.amt AS "Average",
avg_inv.amt - invoice.invoiceamount AS "Difference"
FROM invoice,
(SELECT avg(invoiceamount) as "amt" FROM invoice) avg_inv
Update: I just noticed the question tag changed from mysql to oracle, so now you can use window functions.
There are descriptions of their usage in Oracle, or you can search the countless SO questions regarding window functions.
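For readers who have not seen a window function, here is a rough sketch of what that version could look like next to the derived-table version from this answer. It runs against SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or newer), not Oracle, and the three invoice rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoice (invoiceid INTEGER, invoiceamount REAL)")
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO invoice VALUES (?, ?)",
    [(1, 100.0), (2, 200.0), (3, 300.0)],
)

# Window function: AVG(...) OVER () computes the average over all rows
# while still returning one output row per invoice.
window_rows = conn.execute("""
SELECT invoiceid,
       AVG(invoiceamount) OVER () AS average,
       AVG(invoiceamount) OVER () - invoiceamount AS diff
FROM invoice
ORDER BY invoiceid
""").fetchall()

# Derived-table version, as in the answer: compute the average once
# and cross join it onto every invoice row.
subquery_rows = conn.execute("""
SELECT invoiceid,
       avg_inv.amt AS average,
       avg_inv.amt - invoice.invoiceamount AS diff
FROM invoice,
     (SELECT AVG(invoiceamount) AS amt FROM invoice) avg_inv
ORDER BY invoiceid
""").fetchall()

print(window_rows)
print(subquery_rows)
```

Both forms avoid the original error because the aggregate is never mixed with plain columns in a query that lacks GROUP BY.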

SQL Datetime Not Equal To Is Not Working

I have a database consisting of a Customer, Product, and Transaction table.
I'm trying to write an SQL statement to list all customers' names and SSN's for those customers who have made no transaction in the year 2000.
The TransactionDate column in the Transaction table is a Date/Time data type (e.g. 2000-12-18 00:00:00).
This is the SQL code I've written:
SELECT DISTINCT CustomerName, Customer.CustomerSSN
FROM Customer, Transaction
WHERE Customer.CustomerSSN=Transaction.CustomerSSN
AND YEAR(TransactionDate)<>2000;
The not equal to symbol (<>) seems to not be working for some reason. When I change it to an equal sign, it does return the correct result...
Any advice is appreciated.
I'd change the approach.
The following query doesn't need distinct or GROUP BY because none of the customer records are joined to multiple transaction records.
It also works for customers who have never made any transactions.
Finally, it uses >= AND < rather than YEAR()=2000. This enables an index seek rather than a full scan (assuming that you have an appropriate index on the transactions table).
SELECT
CustomerName,
CustomerSSN
FROM
Customer
WHERE
NOT EXISTS (
SELECT *
FROM Transaction
WHERE CustomerSSN = Customer.CustomerSSN
AND TransactionDate >= '20000101'
AND TransactionDate < '20010101'
)
SELECT DISTINCT
Customer.CustomerName,
Customer.CustomerSSN
FROM Customer
LEFT JOIN Transaction
ON Customer.CustomerSSN=Transaction.CustomerSSN
AND YEAR(TransactionDate) = 2000
WHERE Transaction.TransactionDate IS NULL
This query joins transactions onto customers, but only joins transactions from the year 2000. Any customer with no matching record from Transaction therefore had no transaction in that year, which is why you look for Transaction.TransactionDate IS NULL.
In your own query, you are simply finding any customers who had transactions in a year that was not 2000, however some may have had transactions within the year 2000 also.
SELECT CustomerName, CustomerSSN
FROM Customer
WHERE CustomerSSN NOT IN (
SELECT CustomerSSN
FROM Transaction
WHERE Year(TransactionDate)=2000);
I know it's solved, but I still wanted to post this as an additional answer here (it may be helpful to others).
An alternative way of fixing this is to use the NULLIF function, which requires the least modification to the original query, and I presume it to be a better replacement if <> doesn't work.
SELECT DISTINCT CustomerName, Customer.CustomerSSN
FROM Customer, Transaction
WHERE Customer.CustomerSSN=Transaction.CustomerSSN
AND (NULLIF(YEAR(TransactionDate), 2000) IS NOT NULL)
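The difference between "has a transaction outside 2000" and "has no transaction in 2000" is easy to see on a few rows. The sketch below uses SQLite through Python's sqlite3 module, so `strftime('%Y', ...)` stands in for `YEAR()` and the table name is quoted because TRANSACTION is a keyword in SQLite. The four customers and their dates are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer (CustomerSSN TEXT, CustomerName TEXT)")
conn.execute('CREATE TABLE "Transaction" (CustomerSSN TEXT, TransactionDate TEXT)')

# Hypothetical data: Ann bought in 1999 and 2000, Bob only in 1999,
# Cy never bought anything, Dee bought only in 2000.
conn.executemany(
    "INSERT INTO Customer VALUES (?, ?)",
    [("1", "Ann"), ("2", "Bob"), ("3", "Cy"), ("4", "Dee")],
)
conn.executemany(
    'INSERT INTO "Transaction" VALUES (?, ?)',
    [
        ("1", "1999-05-01 00:00:00"),
        ("1", "2000-12-18 00:00:00"),
        ("2", "1999-07-04 00:00:00"),
        ("4", "2000-03-03 00:00:00"),
    ],
)

# The question's approach: keeps any customer with at least one
# transaction in a year other than 2000.
not_equal_rows = conn.execute("""
SELECT DISTINCT c.CustomerName
FROM Customer c
JOIN "Transaction" t ON c.CustomerSSN = t.CustomerSSN
WHERE strftime('%Y', t.TransactionDate) <> '2000'
ORDER BY c.CustomerName
""").fetchall()

# NOT EXISTS with a date range: keeps customers with no transaction in 2000.
not_exists_rows = conn.execute("""
SELECT c.CustomerName
FROM Customer c
WHERE NOT EXISTS (
    SELECT 1
    FROM "Transaction" t
    WHERE t.CustomerSSN = c.CustomerSSN
      AND t.TransactionDate >= '2000-01-01'
      AND t.TransactionDate <  '2001-01-01'
)
ORDER BY c.CustomerName
""").fetchall()

print(not_equal_rows)
print(not_exists_rows)
```

Ann appears in the first result even though she did buy something in 2000, and Cy, who has no transactions at all, only appears in the second.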

MySQL Query in a View

I have the following query I use and it works great:
SELECT * FROM
(
SELECT * FROM `Transactions` ORDER BY DATE DESC
) AS tmpTable
GROUP BY Machine
ORDER BY Machine ASC
What's not great, is when I try to create a view from it. It says that subqueries cannot be used in a view, which is fine - I've searched here and on Google and most people say to break this down into multiple views. Ok.
I created a view that orders by date, and then a view that just uses that view to group by and order by machines - the results however, are not the same. It seems to have taken the date ordering and thrown it out the window.
Any and all help will be appreciated, thanks.
This ended up being the solution after hours of trying. Apparently you can use a subquery in a WHERE clause but not in FROM?
CREATE VIEW something AS
SELECT * FROM Transactions AS t
WHERE Date =
(
SELECT MAX(Date)
FROM Transactions
WHERE Machine = t.Machine
)
You don't need a subquery here. You want to have the latest date in the group of machines, right?
So just do
SELECT
t.*, MAX(date)
FROM Transactions t
GROUP BY Machine
ORDER BY Machine ASC /*this line is obsolete by the way, since in MySQL a group by automatically does sort, when you don't specify another sort column or direction*/
A GROUP BY is used together with an aggregate function (in your case MAX()) anyway.
Alternatively you can also specify multiple columns in the ORDER BY clause.
SELECT
*
FROM
Transactions
GROUP BY Machine
ORDER BY Date DESC, Machine ASC
should give you also what you want to achieve. But using the MAX() function is definitely the better way to go here.
Actually I have never used a GROUP BY without an aggregate function.
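As a quick check of the correlated-subquery view from the accepted solution, here is a sketch against SQLite using Python's sqlite3 module. The `Amount` column, the view name and the sample rows are hypothetical, and MySQL may behave differently from SQLite in other respects, so this only illustrates the "latest row per machine" logic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# `Amount` is a made-up extra column so the whole row is visibly returned.
conn.execute("CREATE TABLE Transactions (Machine TEXT, Date TEXT, Amount INTEGER)")
conn.executemany(
    "INSERT INTO Transactions VALUES (?, ?, ?)",
    [
        ("A", "2020-01-01", 10),
        ("A", "2020-03-01", 30),
        ("B", "2020-02-01", 20),
        ("B", "2020-01-15", 5),
    ],
)

# The view keeps a row only if its Date equals the newest Date
# recorded for the same Machine.
conn.execute("""
CREATE VIEW latest_per_machine AS
SELECT * FROM Transactions AS t
WHERE Date = (
    SELECT MAX(Date)
    FROM Transactions
    WHERE Machine = t.Machine
)
""")

latest = conn.execute(
    "SELECT Machine, Date, Amount FROM latest_per_machine ORDER BY Machine"
).fetchall()
print(latest)
```

Unlike a plain GROUP BY Machine, this returns the other columns from the same row that holds the latest date. If two rows for a machine share that latest date, both are returned.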

What's faster, SELECT DISTINCT or GROUP BY in MySQL?

If I have a table
CREATE TABLE users (
id int(10) unsigned NOT NULL auto_increment,
name varchar(255) NOT NULL,
profession varchar(255) NOT NULL,
employer varchar(255) NOT NULL,
PRIMARY KEY (id)
)
and I want to get all unique values of profession field, what would be faster (or recommended):
SELECT DISTINCT u.profession FROM users u
or
SELECT u.profession FROM users u GROUP BY u.profession
?
They are essentially equivalent to each other (in fact this is how some databases implement DISTINCT under the hood).
If one of them is faster, it's going to be DISTINCT. This is because, although the two are the same, a query optimizer would have to catch the fact that your GROUP BY is not taking advantage of any group members, just their keys. DISTINCT makes this explicit, so you can get away with a slightly dumber optimizer.
When in doubt, test!
If you have an index on profession, these two are synonyms.
If you don't, then use DISTINCT.
GROUP BY in MySQL sorts results. You can even do:
SELECT u.profession FROM users u GROUP BY u.profession DESC
and get your professions sorted in DESC order.
DISTINCT creates a temporary table and uses it for storing duplicates. GROUP BY does the same, but sorts the distinct results afterwards.
So
SELECT DISTINCT u.profession FROM users u
is faster, if you don't have an index on profession.
All of the answers above are correct, for the case of DISTINCT on a single column vs GROUP BY on a single column.
Every db engine has its own implementation and optimizations, and if you care about the very little difference (in most cases) then you have to test against specific server AND specific version! As implementations may change...
BUT, if you select more than one column in the query, then the DISTINCT is essentially different! Because in this case it will compare ALL columns of all rows, instead of just one column.
So if you have something like:
-- This will NOT return unique by [id], but unique by (id,name)
SELECT DISTINCT id, name FROM some_query_with_joins
-- This will select unique by [id].
SELECT id, name FROM some_query_with_joins GROUP BY id
It is a common mistake to think that the DISTINCT keyword distinguishes rows by the first column you specified, but DISTINCT applies to the whole select list.
So be careful not to take the answers above as correct for all cases. You might get confused and get the wrong results when all you wanted was to optimize!
Go for the simplest and shortest if you can -- DISTINCT seems to be more what you are looking for, only because it will give you EXACTLY the answer you need and only that!
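Here is a small sketch of the multi-column point, run against SQLite through Python's sqlite3 module. The table `joined` is a hypothetical stand-in for `some_query_with_joins`, and the GROUP BY query selects COUNT(*) instead of `name` so that the result does not depend on which name the engine happens to pick for each id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the result of a query with joins.
conn.execute("CREATE TABLE joined (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO joined VALUES (?, ?)",
    [(1, "a"), (1, "a"), (1, "b"), (2, "c")],
)

# DISTINCT compares the full (id, name) pair, so id 1 appears twice.
distinct_rows = conn.execute(
    "SELECT DISTINCT id, name FROM joined ORDER BY id, name"
).fetchall()

# GROUP BY id collapses to one row per id, whatever the names are.
group_rows = conn.execute(
    "SELECT id, COUNT(*) FROM joined GROUP BY id ORDER BY id"
).fetchall()

print(distinct_rows)
print(group_rows)
```

DISTINCT only removes the exact duplicate (1, 'a'), so three rows remain, whereas grouping by id yields two rows.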
Well, DISTINCT can be slower than GROUP BY on some occasions in Postgres (I don't know about other DBs).
tested example:
postgres=# select count(*) from (select distinct i from g) a;
count
10001
(1 row)
Time: 1563,109 ms
postgres=# select count(*) from (select i from g group by i) a;
count
10001
(1 row)
Time: 594,481 ms
http://www.pgsql.cz/index.php/PostgreSQL_SQL_Tricks_I
so be careful ... :)
GROUP BY is more expensive than DISTINCT, since GROUP BY sorts the result while DISTINCT avoids the sort. But if you want GROUP BY to behave the same as DISTINCT, add ORDER BY NULL.
SELECT DISTINCT u.profession FROM users u
is equal to
SELECT u.profession FROM users u GROUP BY u.profession order by null
It seems that the queries are not exactly the same. At least for MySQL.
Compare:
describe select distinct productname from northwind.products
describe select productname from northwind.products group by productname
The second query gives additionally "Using filesort" in Extra.
In MySQL, "Group By" uses an extra step: filesort. I realize DISTINCT is faster than GROUP BY, and that was a surprise.
After heavy testing we came to the conclusion that GROUP BY is faster
SELECT sql_no_cache
opnamegroep_intern
FROM telwerken
WHERE opnemergroep IN (7,8,9,10,11,12,13) group by opnamegroep_intern
635 total 0.0944 seconds
Showing records 0 - 29 (635 total, query took 0.0484 sec)
SELECT sql_no_cache
distinct (opnamegroep_intern)
FROM telwerken
WHERE opnemergroep IN (7,8,9,10,11,12,13)
635 total 0.2117 seconds ( almost 100% slower )
Showing records 0 - 29 (635 total, query took 0.3468 sec)
(more of a functional note)
There are cases when you have to use GROUP BY, for example if you wanted to get the number of employees per employer:
SELECT u.employer, COUNT(u.id) AS "total employees" FROM users u GROUP BY u.employer
In such a scenario DISTINCT u.employer doesn't work right. Perhaps there is a way, but I just do not know it. (If someone knows how to make such a query with DISTINCT please add a note!)
Here is a simple approach which will print the elapsed time for each of the two queries.
DECLARE @t1 DATETIME;
DECLARE @t2 DATETIME;
SET @t1 = GETDATE();
SELECT DISTINCT u.profession FROM users u; --Query with DISTINCT
SET @t2 = GETDATE();
PRINT 'Elapsed time (ms): ' + CAST(DATEDIFF(millisecond, @t1, @t2) AS varchar);
SET @t1 = GETDATE();
SELECT u.profession FROM users u GROUP BY u.profession; --Query with GROUP BY
SET @t2 = GETDATE();
PRINT 'Elapsed time (ms): ' + CAST(DATEDIFF(millisecond, @t1, @t2) AS varchar);
OR try SET STATISTICS TIME (Transact-SQL)
SET STATISTICS TIME ON;
SELECT DISTINCT u.profession FROM users u; --Query with DISTINCT
SELECT u.profession FROM users u GROUP BY u.profession; --Query with GROUP BY
SET STATISTICS TIME OFF;
It simply displays the number of milliseconds required to parse, compile, and execute each statement as below:
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 2 ms.
SELECT DISTINCT will always be the same as, or faster than, a GROUP BY. On some systems (e.g. Oracle), GROUP BY might be optimized to be the same as DISTINCT for most queries. On others (such as SQL Server), DISTINCT can be considerably faster.
This is not a hard rule.
For each query, try DISTINCT and GROUP BY separately, compare the time each takes to complete, and use the faster one.
In my project I sometimes use GROUP BY and sometimes DISTINCT.
If you don't have to do any group functions (sum, average etc. in case you want to add numeric data to the table), use SELECT DISTINCT. I suspect it's faster, but I have nothing to show for it.
In any case, if you're worried about speed, create an index on the column.
If the problem allows it, try EXISTS, since it is optimized to stop as soon as a result is found (and doesn't buffer any response). So, if you are just trying to normalize data for a WHERE clause like this:
SELECT * FROM SOMETHING S WHERE S.ID IN ( SELECT DISTINCT DCR.SOMETHING_ID FROM DIFF_CARDINALITY_RELATIONSHIP DCR ) -- to keep same cardinality
A faster response would be:
SELECT * FROM SOMETHING S WHERE EXISTS ( SELECT 1 FROM DIFF_CARDINALITY_RELATIONSHIP DCR WHERE DCR.SOMETHING_ID = S.ID )
This isn't always possible but when available you will see a faster response.
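The sketch below only checks that the IN (SELECT DISTINCT ...) form and the EXISTS form select the same rows; it says nothing about speed, which depends on the engine and the data. It uses SQLite through Python's sqlite3 module, and the shortened table name `rel` and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE something (id INTEGER PRIMARY KEY, label TEXT)")
# `rel` is a shortened, made-up name for DIFF_CARDINALITY_RELATIONSHIP.
conn.execute("CREATE TABLE rel (something_id INTEGER)")
conn.executemany(
    "INSERT INTO something VALUES (?, ?)", [(1, "x"), (2, "y"), (3, "z")]
)
conn.executemany(
    "INSERT INTO rel VALUES (?)", [(1,), (1,), (3,), (3,), (3,)]
)

# IN with a DISTINCT subquery.
in_rows = conn.execute("""
SELECT s.id FROM something s
WHERE s.id IN (SELECT DISTINCT r.something_id FROM rel r)
ORDER BY s.id
""").fetchall()

# EXISTS with a correlated subquery; each outer row is kept at most once,
# so no DISTINCT is needed.
exists_rows = conn.execute("""
SELECT s.id FROM something s
WHERE EXISTS (SELECT 1 FROM rel r WHERE r.something_id = s.id)
ORDER BY s.id
""").fetchall()

print(in_rows)
print(exists_rows)
```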
In MySQL I have found that GROUP BY will treat NULL as distinct, while DISTINCT does not.
I took the exact same DISTINCT query, removed the DISTINCT, and added the selected fields as the GROUP BY, and I got many more rows due to one of the fields being NULL.
So I tend to believe that there is more to DISTINCT in MySQL.