Hi, I want to understand how to structure a query as a subquery versus a common table expression (CTE). See the example below.
Write a query to count the deduplicated records in the health.user_logs table.
Approach 1: use a subquery; the SELECT COUNT statement comes at the beginning.
SELECT COUNT(*)
FROM (
SELECT DISTINCT *
FROM health.user_logs
) AS subquery;
Approach 2: use a common table expression; the SELECT COUNT statement comes at the end?
WITH deduped_logs AS (
SELECT DISTINCT *
FROM health.user_logs
)
SELECT COUNT(*)
FROM deduped_logs;
How do I decide whether the SELECT COUNT statement should come at the beginning or at the end?
Most often this is a matter of personal preference, i.e. what you like better and consider more readable.
SELECT ...
FROM
(
SELECT ...
FROM some_table
WHERE ...
) AS subquery
JOIN another_table ON ...
and
WITH subquery AS
(
SELECT ...
FROM some_table
WHERE ...
)
SELECT ...
FROM subquery
JOIN another_table ON ...
are equivalent and one is as good as the other. One advantage with the WITH clause is that you can access the same subquery more than once:
WITH subquery AS
(
SELECT ...
FROM some_table
WHERE ...
)
SELECT ...
FROM subquery s1
JOIN subquery s2 ON s2.type = s1.type AND s2.id < s1.id
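Here is a minimal sketch of both points, using Python's built-in sqlite3 module as a stand-in database. The table user_logs and its rows are made up for illustration, and the health. schema prefix is dropped because SQLite has no such schema here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_logs (id INTEGER, type TEXT);  -- hypothetical sample table
    INSERT INTO user_logs VALUES (1, 'a'), (1, 'a'), (2, 'a'), (3, 'b');
""")

# Approach 1: derived table (subquery in FROM), COUNT comes first.
count_subquery = conn.execute("""
    SELECT COUNT(*)
    FROM (SELECT DISTINCT * FROM user_logs) AS subquery
""").fetchone()[0]

# Approach 2: the same logic as a CTE, COUNT comes last.
count_cte = conn.execute("""
    WITH deduped_logs AS (SELECT DISTINCT * FROM user_logs)
    SELECT COUNT(*) FROM deduped_logs
""").fetchone()[0]

# A CTE can be referenced more than once, here in a self-join.
pairs = conn.execute("""
    WITH deduped_logs AS (SELECT DISTINCT * FROM user_logs)
    SELECT s1.id, s2.id
    FROM deduped_logs s1
    JOIN deduped_logs s2 ON s2.type = s1.type AND s2.id < s1.id
    ORDER BY s1.id, s2.id
""").fetchall()

print(count_subquery, count_cte, pairs)
```

Both counts come out the same, which is the point: the position of SELECT COUNT is only a consequence of the syntax you pick.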
Another advantage is that you can build your query step by step without nesting subquery in subquery (at least not visibly), so the query may be considered more readable:
WITH all_jobs AS (...)
, technical_jobs AS (... FROM all_jobs ...)
, well_paid_technical_jobs AS (... FROM technical_jobs ...)
SELECT *
FROM well_paid_technical_jobs
WHERE ...
vs.
SELECT *
FROM
(
SELECT ...
FROM
(
SELECT ...
FROM
(
...
) all_jobs
WHERE ...
) technical_jobs
WHERE ...
) well_paid_technical_jobs
WHERE ...
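To make the comparison concrete, here is a small sketch run through Python's sqlite3 module. The jobs table, its columns and the salary threshold are assumptions invented for the example; only the CTE names come from the outline above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE jobs (title TEXT, category TEXT, salary INTEGER);  -- hypothetical data
    INSERT INTO jobs VALUES
        ('developer', 'technical', 90000),
        ('tester',    'technical', 40000),
        ('clerk',     'office',    95000);
""")

# Step-by-step version: each CTE builds on the previous one.
chained = conn.execute("""
    WITH all_jobs AS (SELECT * FROM jobs)
       , technical_jobs AS (SELECT * FROM all_jobs WHERE category = 'technical')
       , well_paid_technical_jobs AS (SELECT * FROM technical_jobs WHERE salary > 50000)
    SELECT title FROM well_paid_technical_jobs
""").fetchall()

# Same logic with visibly nested derived tables.
nested = conn.execute("""
    SELECT title
    FROM (
        SELECT *
        FROM (
            SELECT * FROM (SELECT * FROM jobs) all_jobs
            WHERE category = 'technical'
        ) technical_jobs
        WHERE salary > 50000
    ) well_paid_technical_jobs
""").fetchall()

print(chained, nested)
```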
I have a complicated aggregate-functions query that produces a result-set, and which has to be amended with a single row that contains the totals and averages of that result-set.
My idea is to assign an alias to the result-set, and then use that alias in a second query, after a UNION ALL statement.
But, I can't successfully use the alias, in the subsequent SELECT statement, after the UNION ALL statement.
For the sake of simplicity, I won't post the original query here, just a simplified list of the variants I've tried:
SELECT * FROM fees AS Test1 WHERE Percentage = 15
UNION ALL
(SELECT * FROM fees AS Test2 WHERE Percentage > 15)
UNION ALL
(SELECT * FROM (SELECT * FROM fees AS Test3 WHERE Percentage < 10) AS Test4)
UNION ALL
SELECT * FROM Test3
The result is:
MySQL said: Documentation
#1146 - Table 'xxxxxx.Test3' doesn't exist
The result is the same if the last query refers to the table Test1, Test2, or Test4.
So, how should I assign an alias to a result-set/derived table in earlier queries and use that same alias in latter queries, all within a UNION query?
Amendment:
My primary query is:
SELECT
COALESCE(referrers.name,order_items.ReferrerID),
SUM(order_items.quantity) as QtySold,
ROUND(SUM((order_items.quantity*order_items.price+order_items.shippingcosts)/((100+order_items.vat)/100)), 2) as TotalRevenueNetto,
ROUND(100*SUM(order_items.quantity*order_items.purchasepricenet)/SUM((order_items.quantity*order_items.price+order_items.shippingcosts)/((100+order_items.vat)/100)), 1) as PurchasePrice,
ROUND(100*SUM(order_items.quantity*COALESCE(order_items.calculatedfee,0)+order_items.quantity*COALESCE(order_items.calculatedcost,0))/SUM((order_items.quantity*order_items.price+order_items.shippingcosts)/((100+order_items.vat)/100)), 1) as Costs,
ROUND(100*SUM(order_items.calculatedprofit) / SUM( (order_items.quantity*order_items.price + order_items.shippingcosts)/((100+order_items.vat)/100) ) , 1) as Profit,
COALESCE(round(100*Returns.TotalReturns_Qty/SUM(order_items.quantity),2),0) as TotalReturns
FROM order_items LEFT JOIN (SELECT order_items.ReferrerID as ReferrerID, sum(order_items.quantity) as TotalReturns_Qty FROM order_items WHERE OrderType='returns' and OrderTimeStamp>='2017-12-1 00:00:00' GROUP BY order_items.ReferrerID) as Returns ON Returns.ReferrerID = order_items.ReferrerID LEFT JOIN `referrers` on `referrers`.`referrerId` = `order_items`.`ReferrerID`
WHERE ( ( order_items.BundleItemID in ('-1', '0') and order_items.OrderType in ('order', '') ) or ( order_items.BundleItemID is NULL and order_items.OrderType = 'returns' ) ) and order_items.OrderTimestamp >= '2017-12-1 00:00:00'
GROUP BY order_items.ReferrerID
ORDER BY referrers.name ASC
I want to make a grand-total of all the rows resulting from query above with:
SELECT 'All marketplaces', SUM(QtySold), SUM(TotalRevenueNetto), AVG(PurchasePrice), AVG(Costs), AVG(Profit), AVG(TotalReturns) FROM PrimaryQuery
I want to do this with a single query.
Your query is well-written. You may be able to get a total line by using a surrounding query with a dummy GROUP BY clause and WITH ROLLUP:
SELECT
COALESCE(Referrer, 'All marketplaces'),
SUM(QtySold) AS QtySold,
SUM(TotalRevenueNetto) AS TotalRevenueNetto,
AVG(PurchasePrice) AS PurchasePrice,
AVG(Costs) AS Costs,
AVG(Profit) AS Profit,
AVG(TotalReturns) AS TotalReturns
FROM
(
SELECT
COALESCE(referrers.name,order_items.ReferrerID) AS Referrer,
SUM(order_items.quantity) AS QtySold,
...
) PrimaryQuery
GROUP BY Referrer ASC WITH ROLLUP;
I'm not entirely sure what you are attempting to solve, but I guess something like the following:
Hypothetical 'main' query:
SELECT T1.ID
, Sum(total_grade)/COUNT(subjects) as AverageGrade
FROM A_Table T1
JOIN AnotherTable T2
ON T2.id = T1.id
GROUP BY T1.ID
You want sub resultsets, without having to keep querying the same data.
Edit: I mistakenly thought the linked documentation and the method mentioned below were for the current version of MySQL. They are, however, from a draft for a future version, and CTEs are not currently supported.
In the absence of CTE support, I would probably just insert the resultset into a temporary table. Something like:
CREATE TABLE TEMP_TABLE(ID INT, AverageGrade DECIMAL(15, 3));
INSERT INTO TEMP_TABLE
SELECT T1.ID
, Sum(total_grade)/COUNT(subjects) as AverageGrade
FROM A_Table T1
JOIN AnotherTable T2
ON T2.id = T1.id
GROUP BY T1.ID;
SELECT ID, AverageGrade FROM TEMP_TABLE WHERE AverageGrade > 5
UNION ALL
SELECT COUNT(ID) AS TotalCount, SUM(AverageGrade) AS Total_AVGGrade FROM TEMP_TABLE;
DROP TABLE TEMP_TABLE;
(Disclaimer: I'm not too familiar with MySQL, so there may be some syntax errors here. The general idea should be clear, though.)
That is, of course, if I had to do it like this; there are probably better ways to achieve the same. See Thorsten Kettner's comments on the matter.
(Previous answer, assuming a CTE is a possibility:)
A CTE approach looks like:
WITH CTE AS
(
SELECT T1.ID
, Sum(total_grade)/COUNT(subjects) as AverageGrade
FROM A_Table T1
JOIN AnotherTable T2
ON T2.id = T1.id
GROUP BY T1.ID
)
SELECT ID, AverageGrade FROM CTE WHERE AverageGrade > 5
UNION ALL
SELECT COUNT(ID) AS TotalCount, SUM(AverageGrade) AS Total_AVGGrade FROM CTE
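Applied to the original problem (detail rows plus one grand-total row), the CTE idea might look like the sketch below. It runs on SQLite through Python's sqlite3 module; on MySQL it would need a version with CTE support (8.0 or later). The two-column order_items table, the per_referrer name and the is_total helper column are simplifications assumed for the example, not the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE order_items (referrer TEXT, quantity INTEGER);  -- simplified stand-in
    INSERT INTO order_items VALUES ('shop_a', 2), ('shop_a', 3), ('shop_b', 5);
""")

rows = conn.execute("""
    WITH per_referrer AS (            -- the "primary query", named once
        SELECT referrer, SUM(quantity) AS qty_sold
        FROM order_items
        GROUP BY referrer
    )
    SELECT referrer, qty_sold, 0 AS is_total FROM per_referrer
    UNION ALL
    SELECT 'All marketplaces', SUM(qty_sold), 1 FROM per_referrer
    ORDER BY 3, 1                     -- detail rows first, total row last
""").fetchall()

for row in rows:
    print(row)
```

The aggregate query is written once and read twice, so the total row cannot drift out of sync with the detail rows.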
You get the error because each query in the UNION doesn't know the aliases used in the others.
In your case, the DB engine executes 4 queries and then combines their results with the UNION operation.
Your real table is fees. Test3 is an alias that exists only in the third query.
If you want to process the results of the UNION operation, you must wrap your queries in an outer (main) query.
It looks like you need something like below. Please try
SELECT * FROM fees AS Test2 WHERE Percentage >= 15
UNION ALL
SELECT * FROM fees AS Test3 WHERE Percentage < 10
You can't use a table alias defined in a subquery, because it is not in the scope of the other SELECTs in the UNION; you must repeat the code, e.g.:
SELECT * FROM fees AS Test1 WHERE Percentage = 15
UNION ALL
SELECT * FROM fees AS Test2 WHERE Percentage > 15
UNION ALL
SELECT * FROM (
SELECT * FROM fees AS Test3 WHERE Percentage < 10
) AS Test4
UNION ALL
SELECT * FROM fees AS Test3 WHERE Percentage < 10
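The scoping rule is easy to check for yourself. The sketch below uses Python's sqlite3 module rather than MySQL, so the error text differs from #1146, and the fees rows are invented. It also shows a CTE (low_fees is a made-up name) as one way to avoid repeating the code, on engines that support WITH.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fees (Percentage INTEGER);  -- hypothetical sample data
    INSERT INTO fees VALUES (5), (15), (20);
""")

# An alias from one UNION branch is not visible in another branch.
try:
    conn.execute("""
        SELECT * FROM fees AS Test1 WHERE Percentage = 15
        UNION ALL
        SELECT * FROM Test1
    """)
    alias_error = None
except sqlite3.OperationalError as exc:
    alias_error = str(exc)
print(alias_error)

# A CTE is visible to every branch of the statement it belongs to.
rows = conn.execute("""
    WITH low_fees AS (SELECT * FROM fees WHERE Percentage < 10)
    SELECT Percentage FROM fees WHERE Percentage >= 15
    UNION ALL
    SELECT Percentage FROM low_fees
    UNION ALL
    SELECT Percentage FROM low_fees
""").fetchall()
print(sorted(rows))
```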
I have the following SQL query:
SELECT Store.*
FROM Store
WHERE EXISTS (
SELECT Contest.StoreID .....)
OR EXISTS (
SELECT Discount.StoreID .....)
My problem is that I want to include in the results some columns from the Contest and Discount tables. If I join them in the FROM clause it works, but is there a way to get the values from the EXISTS? Something like this:
SELECT Store.*, t1.something, t2.somethingElse
FROM Store
WHERE EXISTS (
SELECT Contest.StoreID .....) t1
OR EXISTS (
SELECT Discount.StoreID .....) t2
No, it's not possible to select from the WHERE clause. Think about it: this clause is for filtering.
There are two ways to select data from different tables together: a subquery or a join.
Here is a JOIN example:
SELECT s.*, t1.something, t2.somethingElse
FROM Store s
LEFT OUTER JOIN Contest t1 ON(...)
LEFT OUTER JOIN Discount t2 ON(...)
WHERE t1.<column> is not null OR t2.<column> is not null
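This will select the same stores as your query with the EXISTS(), and will probably have similar performance. Note that, unlike EXISTS, the join returns one row per matching combination, so a store with several matching Contest or Discount rows can appear more than once.
It can also be done with a correlated subquery: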
SELECT * FROM (
SELECT s.*,
(SELECT t1.something FROM Contest t1 WHERE t1.<col> = s.<col>) as col1,
(SELECT t2.somethingElse FROM Discount t2 WHERE t2.<col> = s.<col>) as col2
FROM Store s) t
WHERE t.col1 is not null or t.col2 is not null
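A quick way to compare the EXISTS filter with the JOIN version is to run both on a few rows. This sketch uses Python's sqlite3 module; the StoreID join column is taken from the question, while the name column and all sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Store    (StoreID INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Contest  (StoreID INTEGER, something TEXT);
    CREATE TABLE Discount (StoreID INTEGER, somethingElse TEXT);
    INSERT INTO Store VALUES (1, 'north'), (2, 'south'), (3, 'east');
    INSERT INTO Contest VALUES (1, 'raffle'), (1, 'quiz');
    INSERT INTO Discount VALUES (2, 'ten percent');
""")

# EXISTS only filters: one row per store, no Contest/Discount columns.
exists_rows = conn.execute("""
    SELECT s.StoreID FROM Store s
    WHERE EXISTS (SELECT 1 FROM Contest c WHERE c.StoreID = s.StoreID)
       OR EXISTS (SELECT 1 FROM Discount d WHERE d.StoreID = s.StoreID)
    ORDER BY s.StoreID
""").fetchall()

# The JOIN exposes the extra columns, one row per matching combination.
join_rows = conn.execute("""
    SELECT s.StoreID, t1.something, t2.somethingElse
    FROM Store s
    LEFT OUTER JOIN Contest  t1 ON t1.StoreID = s.StoreID
    LEFT OUTER JOIN Discount t2 ON t2.StoreID = s.StoreID
    WHERE t1.StoreID IS NOT NULL OR t2.StoreID IS NOT NULL
    ORDER BY s.StoreID, t1.something
""").fetchall()

print(exists_rows)
print(join_rows)
```

Store 1 has two contests here, so it shows up twice in the JOIN result but only once in the EXISTS result.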
Is it possible to combine the results of two CTEs into another CTE? I wrote a query combining two CTEs. The result has three columns; I want to group by the third column and average the second column. The second column comes from a CASE/SUM expression.
If you are asking whether you can re-use CTEs after they have been used in a query, the answer is no. You can't do this:
with A
as (
-- query
)
select A.*
from A;
-- this is a separate query
select id
, count(*)
from A
group by
id
You can, however, combine CTEs in all kinds of ways, as long as you do it in a single statement. You can do this, which uses the hypothetical CTE A in two CTEs and the final query:
with A
as (
-- some query
)
, ACustomers
as (
select *
from Customers
join A
on ....
)
, AVendors
as (
select *
from Vendors
join A
on ....
)
select A.StateId
, ACount = COUNT(*)
, CustomerCount = (select count(*) from ACustomers ac where ac.StateId = A.StateId )
, VendorCount = (select count(*) from AVendors av where av.StateId = A.StateId )
from A
group by
A.StateId
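Both statements can be tried out with a small sketch. It runs on SQLite via Python's sqlite3 module, so the T-SQL style alias = expression form is written as expression AS alias. The body of A and the sample rows are placeholders I made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (StateId INTEGER);  -- hypothetical sample data
    CREATE TABLE Vendors   (StateId INTEGER);
    INSERT INTO Customers VALUES (1), (1), (2);
    INSERT INTO Vendors   VALUES (1), (3);
""")

# A is used by two later CTEs and by the final query, all in one statement.
combined = conn.execute("""
    WITH A AS (SELECT 1 AS StateId UNION ALL SELECT 2)
       , ACustomers AS (SELECT c.StateId FROM Customers c JOIN A ON A.StateId = c.StateId)
       , AVendors   AS (SELECT v.StateId FROM Vendors v JOIN A ON A.StateId = v.StateId)
    SELECT A.StateId
         , (SELECT COUNT(*) FROM ACustomers ac WHERE ac.StateId = A.StateId) AS CustomerCount
         , (SELECT COUNT(*) FROM AVendors av WHERE av.StateId = A.StateId) AS VendorCount
    FROM A
    ORDER BY A.StateId
""").fetchall()
print(combined)

# In a separate statement the CTE no longer exists.
try:
    conn.execute("SELECT * FROM A")
    reuse_error = None
except sqlite3.OperationalError as exc:
    reuse_error = str(exc)
print(reuse_error)
```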
I've got a couple of duplicates in a database that I want to inspect, so what I did to see which are duplicates, I did this:
SELECT relevant_field
FROM some_table
GROUP BY relevant_field
HAVING COUNT(*) > 1
This way, I will get all rows with relevant_field occurring more than once. This query takes milliseconds to execute.
Now, I wanted to inspect each of the duplicates, so I thought I could SELECT each row in some_table with a relevant_field in the above query, so I did like this:
SELECT *
FROM some_table
WHERE relevant_field IN
(
SELECT relevant_field
FROM some_table
GROUP BY relevant_field
HAVING COUNT(*) > 1
)
This turns out to be extreeeemely slow for some reason (it takes minutes). What exactly is going on here to make it that slow? relevant_field is indexed.
Eventually I tried creating a view "temp_view" from the first query (SELECT relevant_field FROM some_table GROUP BY relevant_field HAVING COUNT(*) > 1), and then making my second query like this instead:
SELECT *
FROM some_table
WHERE relevant_field IN
(
SELECT relevant_field
FROM temp_view
)
And that works just fine. MySQL does this in some milliseconds.
Any SQL experts here who can explain what's going on?
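The subquery is being run for each row because the optimizer (at least in older MySQL versions) executes it as a correlated (dependent) subquery, even though it does not reference the outer query. One can force it to be treated as a non-correlated query by selecting everything from the subquery, like so: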
SELECT * FROM
(
SELECT relevant_field
FROM some_table
GROUP BY relevant_field
HAVING COUNT(*) > 1
) AS subquery
The final query would look like this:
SELECT *
FROM some_table
WHERE relevant_field IN
(
SELECT * FROM
(
SELECT relevant_field
FROM some_table
GROUP BY relevant_field
HAVING COUNT(*) > 1
) AS subquery
)
Rewrite the query into this
SELECT st1.*, st2.relevant_field FROM some_table st1
INNER JOIN some_table st2 ON (st1.relevant_field = st2.relevant_field)
GROUP BY st1.id /* list a unique sometable field here*/
HAVING COUNT(*) > 1
I think st2.relevant_field must be in the select, because otherwise the having clause will give an error, but I'm not 100% sure
Avoid IN with a subquery on older MySQL versions; it is notoriously slow there.
Prefer IN with a fixed list of values.
More tips
If you want to make queries faster, don't do a SELECT *; select only the fields that you really need.
Make sure you have an index on relevant_field to speed up the equi-join.
Make sure to group by the primary key.
If you are on InnoDB and you only select indexed fields (and things are not too complex), then MySQL will resolve your query using only the indexes, speeding things way up.
General solution for 90% of your IN (SELECT ...) queries
Use this code
SELECT * FROM some_table a WHERE EXISTS (
SELECT 1 FROM some_table b
WHERE a.relevant_field = b.relevant_field
GROUP BY b.relevant_field
HAVING count(*) > 1)
SELECT st1.*
FROM some_table st1
inner join
(
SELECT relevant_field
FROM some_table
GROUP BY relevant_field
HAVING COUNT(*) > 1
)st2 on st2.relevant_field = st1.relevant_field;
I've tried your query on one of my databases, and also tried it rewritten as a join to a sub-query.
This worked a lot faster, try it!
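If you want to convince yourself that the rewrite returns the same rows, here is a small check using Python's sqlite3 module. The sample rows are invented, and SQLite's planner is not MySQL's, so this only demonstrates equivalence of the results, not the speed difference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE some_table (id INTEGER PRIMARY KEY, relevant_field TEXT);
    CREATE INDEX idx_relevant ON some_table (relevant_field);
    INSERT INTO some_table (relevant_field) VALUES ('x'), ('y'), ('x'), ('z'), ('y');
""")

# The fast "which values are duplicated" query, reused in both variants.
DUPLICATE_KEYS = """
    SELECT relevant_field FROM some_table
    GROUP BY relevant_field HAVING COUNT(*) > 1
"""

# Original form: IN with a subquery.
in_rows = conn.execute(f"""
    SELECT id, relevant_field FROM some_table
    WHERE relevant_field IN ({DUPLICATE_KEYS})
    ORDER BY id
""").fetchall()

# Rewritten form: join against the subquery as a derived table.
join_rows = conn.execute(f"""
    SELECT st1.id, st1.relevant_field
    FROM some_table st1
    INNER JOIN ({DUPLICATE_KEYS}) st2 ON st2.relevant_field = st1.relevant_field
    ORDER BY st1.id
""").fetchall()

print(in_rows == join_rows, join_rows)
```

On MySQL itself, running EXPLAIN on both forms should show whether the subquery is being executed per row.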
I have reformatted your slow sql query with www.prettysql.net
SELECT *
FROM some_table
WHERE
relevant_field in
(
SELECT relevant_field
FROM some_table
GROUP BY relevant_field
HAVING COUNT(*) > 1
);
When using a table in both the query and the subquery, you should always alias both, like this:
SELECT *
FROM some_table as t1
WHERE
t1.relevant_field in
(
SELECT t2.relevant_field
FROM some_table as t2
GROUP BY t2.relevant_field
HAVING COUNT(t2.relevant_field) > 1
);
Does that help?
Subqueries vs joins
http://www.scribd.com/doc/2546837/New-Subquery-Optimizations-In-MySQL-6
Try this
SELECT t1.*
FROM
some_table t1,
(SELECT relevant_field
FROM some_table
GROUP BY relevant_field
HAVING COUNT(*) > 1) t2
WHERE
t1.relevant_field = t2.relevant_field;
First, you can find the duplicate rows, count how many times each value is used, and number the rows within each group, like this:
SELECT q.id,q.name,q.password,q.NID,(select count(*) from UserInfo k where k.NID= q.NID) as Count,
(
CASE q.NID
WHEN @curCode THEN
@curRow := @curRow + 1
ELSE
@curRow := 1
AND @curCode := q.NID
END
) AS No
FROM UserInfo q,
(
SELECT
@curRow := 1,
@curCode := ''
) rt
WHERE q.NID IN
(
SELECT NID
FROM UserInfo
GROUP BY NID
HAVING COUNT(*) > 1
)
After that, create a table and insert the result into it.
create table CopyTable
SELECT q.id,q.name,q.password,q.NID,(select count(*) from UserInfo k where k.NID= q.NID) as Count,
(
CASE q.NID
WHEN @curCode THEN
@curRow := @curRow + 1
ELSE
@curRow := 1
AND @curCode := q.NID
END
) AS No
FROM UserInfo q,
(
SELECT
@curRow := 1,
@curCode := ''
) rt
WHERE q.NID IN
(
SELECT NID
FROM UserInfo
GROUP BY NID
HAVING COUNT(*) > 1
)
Finally, delete the duplicate rows. No starts at 0; within each group, delete every row except the first one.
delete from CopyTable where No!= 0;
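For comparison, here is a different technique for the same goal (keep the first row of each NID group, delete the rest) that needs no user variables: keep the smallest id per group. This is a sketch on SQLite via Python's sqlite3 module with made-up rows, not the method above. On MySQL a DELETE that selects from its own target table will likely need the subquery wrapped in a derived table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE UserInfo (id INTEGER PRIMARY KEY, name TEXT, NID TEXT);
    INSERT INTO UserInfo (name, NID) VALUES      -- hypothetical sample rows
        ('ann', 'N1'), ('bob', 'N2'), ('ann2', 'N1'), ('cy', 'N3'), ('bob2', 'N2');
""")

# Keep the row with the smallest id in every NID group, delete the others.
conn.execute("""
    DELETE FROM UserInfo
    WHERE id NOT IN (SELECT MIN(id) FROM UserInfo GROUP BY NID)
""")

remaining = conn.execute("SELECT id, NID FROM UserInfo ORDER BY id").fetchall()
print(remaining)
```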
Sometimes, when the data grows bigger, MySQL WHERE IN queries can become pretty slow because of how the query is optimized. Try using STRAIGHT_JOIN to tell MySQL to execute the query as is, e.g.
SELECT STRAIGHT_JOIN table.field FROM table WHERE table.id IN (...)
But beware: in most cases the MySQL optimizer works pretty well, so I would recommend using it only when you have this kind of problem.
This is similar to my case, where I have a table named tabel_buku_besar. What I need is:
1. Look for records in tabel_buku_besar that have account_code='101.100', companyarea='20000', and IDR as currency.
2. Get all records from tabel_buku_besar which have account_code same as step 1 but have transaction_number in the step 1 result.
When using select ... from ... where ... transaction_number in (select transaction_number from ...), my query runs extremely slowly and sometimes causes a request timeout or makes my application stop responding.
I tried this combination and the result is not bad:
select DATE_FORMAT(L.TANGGAL_INPUT,'%d-%m-%y') AS TANGGAL,
L.TRANSACTION_NUMBER AS VOUCHER,
L.ACCOUNT_CODE,
C.DESCRIPTION,
L.DEBET,
L.KREDIT
from (select * from tabel_buku_besar A
where A.COMPANYAREA='$COMPANYAREA'
AND A.CURRENCY='$Currency'
AND A.ACCOUNT_CODE!='$ACCOUNT'
AND (A.TANGGAL_INPUT BETWEEN STR_TO_DATE('$StartDate','%d/%m/%Y') AND STR_TO_DATE('$EndDate','%d/%m/%Y'))) L
INNER JOIN (select * from tabel_buku_besar A
where A.COMPANYAREA='$COMPANYAREA'
AND A.CURRENCY='$Currency'
AND A.ACCOUNT_CODE='$ACCOUNT'
AND (A.TANGGAL_INPUT BETWEEN STR_TO_DATE('$StartDate','%d/%m/%Y') AND STR_TO_DATE('$EndDate','%d/%m/%Y'))) R ON R.TRANSACTION_NUMBER=L.TRANSACTION_NUMBER AND R.COMPANYAREA=L.COMPANYAREA
LEFT OUTER JOIN master_account C ON C.ACCOUNT_CODE=L.ACCOUNT_CODE AND C.COMPANYAREA=L.COMPANYAREA
ORDER BY L.TANGGAL_INPUT,L.TRANSACTION_NUMBER
I find this to be the most efficient way of finding whether a value exists; the logic can easily be inverted to find whether a value doesn't exist (i.e. IS NULL):
SELECT * FROM primary_table st1
LEFT JOIN comparison_table st2 ON (st1.relevant_field = st2.relevant_field)
WHERE st2.primaryKey IS NOT NULL
*Replace relevant_field with the name of the column whose value you want to check for in your table.
*Replace primaryKey with the name of the primary key column on the comparison table.
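Both directions of the check can be sketched as follows, using Python's sqlite3 module with invented rows. The table and column names follow the placeholders above; the {test} slot is just a formatting hook for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE primary_table    (relevant_field TEXT);
    CREATE TABLE comparison_table (primaryKey INTEGER PRIMARY KEY, relevant_field TEXT);
    INSERT INTO primary_table VALUES ('a'), ('b'), ('c');
    INSERT INTO comparison_table (relevant_field) VALUES ('a'), ('c');
""")

# One query template; only the NULL test on the joined key changes.
LEFT_JOIN = """
    SELECT st1.relevant_field
    FROM primary_table st1
    LEFT JOIN comparison_table st2 ON st1.relevant_field = st2.relevant_field
    WHERE st2.primaryKey IS {test}
    ORDER BY st1.relevant_field
"""

present = conn.execute(LEFT_JOIN.format(test="NOT NULL")).fetchall()  # value exists
missing = conn.execute(LEFT_JOIN.format(test="NULL")).fetchall()      # value does not exist

print(present, missing)
```

Testing the primary key rather than relevant_field matters for the inverted case, because a primary key can only be NULL when the LEFT JOIN found no match.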
It's slow because your sub-query is executed once for every comparison between relevant_field and your IN clause's sub-query. You can avoid that like so:
SELECT *
FROM some_table T1 INNER JOIN
(
SELECT relevant_field
FROM some_table
GROUP BY relevant_field
HAVING COUNT(*) > 1
) T2
USING(relevant_field)
This creates a derived table (in memory unless it's too large to fit) as T2, then INNER JOINs it with T1. The JOIN happens one time, so the query is executed one time.
I find this particularly handy for optimising cases where a pivot is used to associate a bulk data table with a more specific data table and you want to produce counts of the bulk table based on a subset of the more specific one's related rows. If you can narrow down the bulk rows to <5% then the resulting sparse accesses will generally be faster than a full table scan.
For example, you have a Users table (condition), an Orders table (pivot) and a LineItems table (bulk) which references counts of Products. You want the sum of Products grouped by User in PostCode '90210'. In this case the JOIN will be orders of magnitude smaller than when using WHERE relevant_field IN( SELECT * FROM (...) T2 ), and therefore much faster, especially if that JOIN is spilling to disk!
I have a MySQL query where I have a nested SELECT that returns an array to the parent:
SELECT ...
FROM ...
WHERE ... IN (SELECT .... etc)
I would like to store the number of returned results (row count) from the nested SELECT, but doing something like IN (SELECT count(...), columnA) does not work, as the IN expects just one result.
Is there a way to store the returned result count for later use within the parent statement?
You're probably going to have to select the results of your nested statement into a temporary table. Then you can do an IN and a count on it later. I'm more familiar with MS-SQL, but I think you should be able to do it like this:
CREATE TEMPORARY TABLE tmp_table AS
SELECT something
FROM your_table;
SELECT ...
FROM ...
WHERE ... IN (SELECT * FROM tmp_table);
SELECT count(*) FROM tmp_table;
If that doesn't work, you may have to provide full details to the temporary table creation statement as you would with a normal "CREATE TABLE" (see CREATE TEMPORARY TABLE in the MySQL manual):
CREATE TEMPORARY TABLE tmp_table
(
tableid INT,
somedata VARCHAR(50)
);
INSERT INTO tmp_table
SELECT ...
FROM ...;
SELECT ...
FROM ...
WHERE ... IN (SELECT * FROM tmp_table);
SELECT count(*) FROM tmp_table;
Rich
You mentioned in your comment that your query looks like this:
SELECT
tabA.colA,
tabA.colB
FROM tabA
WHERE tabA.colA IN ( SELECT tabA.colA FROM tabA WHERE tabA.colB = 1 )
I might be missing something, but you don't need a subquery for this. Why don't you do it in a regular where condition:
SELECT
tabA.colA,
tabA.colB
FROM tabA
WHERE tabA.colB = 1
You can use IN predicate for multiple columns like this:
SELECT *
FROM table
WHERE (col1, col2) IN
(
SELECT col3, col4
FROM othertable
)
If you want to select COUNT(*) along with each value, use this:
SELECT colA, colB, cnt
FROM (
SELECT COUNT(*) AS cnt
FROM tabA
WHERE colB = 1
) q,
tabA
WHERE colB = 1
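A runnable version of that last query, as a sketch on SQLite through Python's sqlite3 module; the tabA rows are made up. The one-row derived table q is cross-joined to tabA, so every returned row carries the same count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tabA (colA TEXT, colB INTEGER);  -- hypothetical sample data
    INSERT INTO tabA VALUES ('p', 1), ('q', 1), ('r', 2);
""")

rows = conn.execute("""
    SELECT colA, colB, cnt
    FROM (
        SELECT COUNT(*) AS cnt      -- single-row result: number of matching rows
        FROM tabA
        WHERE colB = 1
    ) q,
    tabA
    WHERE colB = 1
    ORDER BY colA
""").fetchall()

print(rows)
```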