I am trying to write a subquery in SQL on an Exasol database. The problem is similar to this thread (SQL Query - join on less than or equal date), and the code works well in MySQL and PostgreSQL. However, when I move the code to Exasol, it fails with SQL Error 42000: correlation in on clause. Is there an alternative solution to this problem, or a way to fix it in Exasol?
SELECT a.ID,
a.join_date,
a.country,
a.email,
b.start_date,
b.joined_from
FROM a
LEFT JOIN b
ON a.country = b.country
AND b.start_date = (
SELECT MAX(start_date)
FROM b b2
WHERE b2.country = a.country
AND b2.start_date <= a.join_date
);
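Exasol does not accept the correlated subquery in the ON clause, but the requirement can be met with the DENSE_RANK() window function instead. Move the date condition into the join and rank the matching b rows per a.ID (assuming a.ID is unique), latest start_date first:
with cte as (
select
a.ID, a.join_date, a.country, a.email, b.start_date, b.joined_from,
dense_rank() over (partition by a.ID order by b.start_date desc) r1
from a
left join b
on a.country = b.country
and b.start_date <= a.join_date
)
select ID, join_date, country, email, start_date, joined_from from cte where r1 = 1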
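A small check of the ranking idea, with the date condition in the join and the partition on a.ID. This is only a sketch: SQLite (via Python's sqlite3, which needs SQLite 3.25 or later for window functions) stands in for Exasol, and the sample rows are made up for illustration. It compares the ranked query against the original correlated form, which SQLite does accept.

```python
import sqlite3

# Hypothetical sample data; SQLite is a stand-in here, not Exasol.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (ID INTEGER PRIMARY KEY, join_date TEXT, country TEXT, email TEXT);
CREATE TABLE b (country TEXT, start_date TEXT, joined_from TEXT);
INSERT INTO a VALUES
  (1, '2020-03-15', 'US', 'u1@example.com'),
  (2, '2020-01-10', 'US', 'u2@example.com'),
  (3, '2020-05-01', 'DE', 'u3@example.com');
INSERT INTO b VALUES
  ('US', '2020-01-01', 'web'),
  ('US', '2020-03-01', 'app'),
  ('US', '2020-06-01', 'store');
""")

# Original form: correlated subquery inside the ON clause.
correlated = """
SELECT a.ID, b.start_date, b.joined_from
FROM a
LEFT JOIN b
  ON a.country = b.country
 AND b.start_date = (SELECT MAX(start_date) FROM b b2
                     WHERE b2.country = a.country
                       AND b2.start_date <= a.join_date)
ORDER BY a.ID
"""

# Ranked form: no correlation, latest qualifying b row per a.ID gets rank 1.
ranked = """
WITH cte AS (
  SELECT a.ID, b.start_date, b.joined_from,
         DENSE_RANK() OVER (PARTITION BY a.ID ORDER BY b.start_date DESC) AS r1
  FROM a
  LEFT JOIN b
    ON a.country = b.country
   AND b.start_date <= a.join_date
)
SELECT ID, start_date, joined_from FROM cte WHERE r1 = 1 ORDER BY ID
"""

correlated_rows = conn.execute(correlated).fetchall()
ranked_rows = conn.execute(ranked).fetchall()
print(ranked_rows)
```

On this data both forms pick the same b row for each a row, and the a row with no matching country is kept with NULLs.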
Related
I have been querying the database to fetch the latest record for each item using PARTITION BY and ROW_NUMBER(), which works on MariaDB version 10.4*, but I want to run the same query on a MySQL version 5.7* database and it doesn't work there. I would like to find an alternative that works on MySQL. Kindly help me out.
The query is as follows.
SELECT A_id, B_id, Created_at
FROM
(
SELECT a.id as A_id, b.id as B_id, b.Created_at,
ROW_NUMBER() OVER (PARTITION BY a.id ORDER BY b.Created_at DESC) AS rn
FROM beta b
JOIN alpha a ON b.a_id = a.id
) q
WHERE rn = 1
You may use a join to a subquery which finds the latest record for each id:
SELECT a.id AS A_id, b.id AS B_id, b.Created_at
FROM alpha a
INNER JOIN beta b
ON a.id = b.a_id
INNER JOIN
(
SELECT a.id AS max_id, MAX(b.Created_at) AS max_created_at
FROM alpha a
INNER JOIN beta b ON a.id = b.a_id
GROUP BY a.id
) t
ON t.max_id = a.id AND t.max_created_at = b.Created_at;
The idea here is that the additional join to the subquery aliased as t retains, for each a.id, only the record having the latest Created_at value from the beta table. This has the same effect as your current approach using ROW_NUMBER, without actually needing analytic functions.
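To see the join-to-subquery pattern at work, here is a minimal sketch run against SQLite through Python's sqlite3 (a stand-in for MySQL 5.7; the rows in alpha and beta are invented for the example).

```python
import sqlite3

# Hypothetical sample data: alpha 3 has no beta rows at all.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE alpha (id INTEGER PRIMARY KEY);
CREATE TABLE beta (id INTEGER PRIMARY KEY, a_id INTEGER, Created_at TEXT);
INSERT INTO alpha VALUES (1), (2), (3);
INSERT INTO beta VALUES
  (10, 1, '2021-01-01'),
  (11, 1, '2021-02-01'),
  (12, 2, '2021-01-15');
""")

latest = """
SELECT a.id AS A_id, b.id AS B_id, b.Created_at
FROM alpha a
INNER JOIN beta b ON a.id = b.a_id
INNER JOIN (
    -- one row per alpha id, carrying its latest Created_at
    SELECT a.id AS max_id, MAX(b.Created_at) AS max_created_at
    FROM alpha a
    INNER JOIN beta b ON a.id = b.a_id
    GROUP BY a.id
) t ON t.max_id = a.id AND t.max_created_at = b.Created_at
ORDER BY a.id
"""

latest_rows = conn.execute(latest).fetchall()
print(latest_rows)
```

One difference worth keeping in mind: if two beta rows for the same a_id share the latest Created_at, this pattern returns both, whereas ROW_NUMBER picks exactly one.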
This is a clarification on this post:
SQL select only rows with max value on a column
In the accepted answer, the nested query is the one used for the max computation and the outer query joins to that. I tried to reverse the order but ran into a syntax error.
Query:
(SELECT id, MAX(rev) mrev
FROM docs
GROUP BY id) b
join (select id, rev, content from docs) d
on b.id = d.id and d.rev = b.rev
The error I run into is this:
You have an error in your SQL syntax; check the manual that
corresponds to your MySQL server version for the right syntax to use
near 'b join (select id, rev, content from docs) d on b.id = d.id and
d.rev = b.rev' at line 3
Does the order matter here?
Here is the link:
http://sqlfiddle.com/#!9/a6c585/64570
You can write that query like this.
SELECT d.*
FROM
(
SELECT id, MAX(rev) AS maxrev
FROM docs
GROUP BY id
) b
JOIN docs AS d
ON (b.id = d.id AND d.rev = b.maxrev)
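Notice that it selects from the sub-query that computes the max rev, and the docs table is simply joined to it. The order of the two sides does not matter; your version fails because a parenthesized sub-query cannot start a statement by itself, it has to follow SELECT ... FROM.
Another way to write it: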
select d.*
from docs d
join (
select id, max(rev) maxrev
from docs
group by id
) b
on b.id = d.id and b.maxrev = d.rev
Or, if you dare to use an EXISTS:
SELECT *
FROM docs AS d
WHERE EXISTS (
SELECT 1
FROM docs AS b
WHERE b.id = d.id
GROUP BY b.id
HAVING d.rev = MAX(b.rev)
);
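A quick way to convince yourself that the join order is not the issue: the sketch below runs both join orders against SQLite via Python's sqlite3 (standing in for MySQL, with made-up docs rows) and also tries a statement that starts with the bare parenthesized sub-query.

```python
import sqlite3

# Hypothetical sample data for the docs table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE docs (id INTEGER, rev INTEGER, content TEXT);
INSERT INTO docs VALUES (1, 1, 'a'), (1, 2, 'b'), (1, 3, 'c'), (2, 1, 'd');
""")

# Sub-query on the left, table on the right.
subquery_first = """
SELECT d.id, d.rev, d.content
FROM (SELECT id, MAX(rev) AS maxrev FROM docs GROUP BY id) b
JOIN docs AS d ON b.id = d.id AND d.rev = b.maxrev
ORDER BY d.id
"""

# Table on the left, sub-query on the right.
table_first = """
SELECT d.id, d.rev, d.content
FROM docs d
JOIN (SELECT id, MAX(rev) AS maxrev FROM docs GROUP BY id) b
  ON b.id = d.id AND b.maxrev = d.rev
ORDER BY d.id
"""

# No SELECT ... FROM in front of the derived table: not a valid statement.
bare = """
(SELECT id, MAX(rev) AS maxrev FROM docs GROUP BY id) b
JOIN docs AS d ON b.id = d.id AND d.rev = b.maxrev
"""

rows_subquery_first = conn.execute(subquery_first).fetchall()
rows_table_first = conn.execute(table_first).fetchall()

try:
    conn.execute(bare)
    bare_failed = False
except sqlite3.Error:
    bare_failed = True

print(rows_subquery_first, rows_table_first, bare_failed)
```

Both orders return the same rows; only the form without a leading SELECT ... FROM is rejected.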
I have the following simplified query:
Select
(select sum(f1) from tableA a where a.id = t.id) sum1,
(select sum(f2) from tableB b where b.id = t.id) sum2,
t.*
from Table t;
My wish is to have sum1 and sum2 re-used without calculating them again:
Select
(select sum(f1) from tableA a where a.id = t.id) sum1,
(select sum(f2) from tableB b where b.id = t.id) sum2,
sum1 + sum2 `sum3`,
t.*
from Table t;
Of course I can do the following query, but this will unnecessarily double the run time:
Select
(select sum(f1) from tableA a where a.id = t.id) sum1,
(select sum(f2) from tableB b where b.id = t.id) sum2,
(select sum(f1) from tableA a where a.id = t.id) +
(select sum(f2) from tableB b where b.id = t.id) `sum3`,
t.*
from Table t;
I could even insert the sum1 and sum2 results into a temporary table, but I suspect I'm overlooking a way to have MySQL query the summed fields efficiently.
Is there a better, more efficient way to re-use summed fields?
Try running this query:
select Result.*, Result.sum1 + Result.sum2 as sum3 from (
SELECT (SELECT SUM(f1) FROM tableA a WHERE a.id = t.id) sum1,
(SELECT SUM(f2) FROM tableB b WHERE b.id = t.id) sum2,
t.*
FROM Table t) Result
Hope it helps.
My wish is to have sum1 and sum2 re-used without calculating them again
Whoa, steady on. The original query is not as efficient as it might be. Without knowing what the data distribution looks like it's hard to advise what the most appropriate query is, but assuming that the tuples in tableA and tableB have a foreign key constraint (implicit or explicit) on tableT but no implicit foreign key constraint on each other, that t.id is unique, and that most of the rows in tableT have corresponding rows in tableA and tableB, then...
SELECT t.*, sum_f1, sum_f2
FROM tableT t
LEFT JOIN (SELECT a.id, SUM(f1) AS sum_f1
FROM tableA a
GROUP BY a.id) AS a_agg
ON t.id=a_agg.id
LEFT JOIN (SELECT b.id, SUM(f2) AS sum_f2
FROM tableB b
GROUP BY b.id) as b_agg
ON t.id=b_agg.id
GROUP BY t.id
Will be much more efficient.
(this also assumes that ONLY_FULL_GROUP_BY is disabled - otherwise you'll need to replace 't.*' in the SELECT clause and 't.id' in the group by clause with each of the attributes you need from the table).
Note that in practice you're rarely going to be looking at all your data in a single query. Since MySQL doesn't handle predicate pushdown into derived tables very well, simply adding a filter in the outer SELECT probably won't be the optimal solution.
Once you've structured your query as above, it's trivial to add the sums:
SELECT t.*, sum_f1, sum_f2,
IFNULL(sum_f1,0)+IFNULL(sum_f2,0) AS sum3
FROM tableT t
LEFT JOIN (SELECT a.id, SUM(f1) AS sum_f1
FROM tableA a
GROUP BY a.id) AS a_agg
ON t.id=a_agg.id
LEFT JOIN (SELECT b.id, SUM(f2) AS sum_f2
FROM tableB b
GROUP BY b.id) as b_agg
ON t.id=b_agg.id
GROUP BY t.id
(but note that I've made a lot of assumptions about your data and schema)
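The two ways of reusing the sums can be compared on a toy data set. The sketch below uses SQLite through Python's sqlite3 as a stand-in for MySQL; the table contents are invented, and tableT is used as the table name because a table literally called Table would need quoting. It also shows why the IFNULL wrapping matters when one side has no rows.

```python
import sqlite3

# Hypothetical sample data: id 2 has no tableB rows, id 3 has no tableA rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tableT (id INTEGER PRIMARY KEY);
CREATE TABLE tableA (id INTEGER, f1 INTEGER);
CREATE TABLE tableB (id INTEGER, f2 INTEGER);
INSERT INTO tableT VALUES (1), (2), (3);
INSERT INTO tableA VALUES (1, 10), (1, 5), (2, 7);
INSERT INTO tableB VALUES (1, 1), (3, 4);
""")

# Derived table: compute sum1 and sum2 once, reuse them by name outside.
derived = """
SELECT r.id, r.sum1, r.sum2, r.sum1 + r.sum2 AS sum3
FROM (
    SELECT t.id,
           (SELECT SUM(f1) FROM tableA a WHERE a.id = t.id) AS sum1,
           (SELECT SUM(f2) FROM tableB b WHERE b.id = t.id) AS sum2
    FROM tableT t
) r
ORDER BY r.id
"""

# Pre-aggregated joins; t.id is unique here, so no outer GROUP BY is needed.
joined = """
SELECT t.id, sum_f1, sum_f2,
       IFNULL(sum_f1, 0) + IFNULL(sum_f2, 0) AS sum3
FROM tableT t
LEFT JOIN (SELECT a.id, SUM(f1) AS sum_f1 FROM tableA a GROUP BY a.id) AS a_agg
       ON t.id = a_agg.id
LEFT JOIN (SELECT b.id, SUM(f2) AS sum_f2 FROM tableB b GROUP BY b.id) AS b_agg
       ON t.id = b_agg.id
ORDER BY t.id
"""

derived_rows = conn.execute(derived).fetchall()
joined_rows = conn.execute(joined).fetchall()
print(derived_rows)
print(joined_rows)
```

Without IFNULL, sum3 is NULL whenever either sum is NULL; with it, a missing side counts as 0. Which behaviour you want depends on your data.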
I have the 2 queries below. Which is more efficient, and how would I tell? I've looked at the execution plan but don't know what I'm looking for. Any help would be appreciated.
;WITH TEST AS
(SELECT DISTINCT *, DENSE_RANK () OVER (PARTITION BY T.id
ORDER BY T.ID DESC) AS seq_LatestUpdate
FROM test t
)
SELECT *
FROM t
INNER JOIN TEST ON T.Id = TEST.Id AND TEST.seq_LatestUpdate = 1
WHERE 1=1
Or
SELECT *
FROM [MCS].[JXM1563].[test] t
INNER JOIN (SELECT DISTINCT T.Id, MAX(ID_CREATEDATE) AS MAXuPDATE
FROM test t
GROUP BY T.Id
) Z ON T.Id = Z.Id
WHERE 1=1
I have the following query:
SELECT
a.name, a.address, n.date, n.note
FROM a
LEFT JOIN n ON a.id = n.id
The a.id has a one-to-many relationship with n.id, so that many notes can be associated with one name.
How do I return just the latest note for each name instead of all the notes?
I'm using SQL Server 2008.
Thanks.
Here's one way using ROW_NUMBER()
SELECT t.name, t.address, t.date, t.note
FROM (
SELECT
a.name, a.address, n.date, n.note,
ROW_NUMBER() OVER (PARTITION BY a.name ORDER BY n.date DESC) rn
FROM a
LEFT JOIN n ON a.id = n.id
) t
WHERE t.rn = 1
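Alternatively, you can use a correlated subquery to get the max date, something like this. Note that because the filter sits in the WHERE clause, names without any notes are dropped; put the condition in the ON clause if you need to keep them.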
SELECT
a.name, a.address, n.date, n.note
FROM a
LEFT JOIN n ON a.id = n.id
WHERE n.date = (SELECT MAX(nn.date)
FROM n AS nn
WHERE a.id = nn.id)
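One thing to watch with the correlated-subquery form is where the max-date condition goes. The sketch below, run on SQLite via Python's sqlite3 as a stand-in for SQL Server 2008 and with made-up rows, compares the condition in the WHERE clause against the same condition in the ON clause of the LEFT JOIN.

```python
import sqlite3

# Hypothetical sample data: Bob has no notes.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (id INTEGER PRIMARY KEY, name TEXT, address TEXT);
CREATE TABLE n (id INTEGER, date TEXT, note TEXT);
INSERT INTO a VALUES (1, 'Ann', '1 Main St'), (2, 'Bob', '2 Oak Ave');
INSERT INTO n VALUES (1, '2020-01-01', 'first'), (1, '2020-02-01', 'second');
""")

# Condition in WHERE: rows where n.date is NULL never satisfy the comparison.
in_where = """
SELECT a.name, n.date, n.note
FROM a
LEFT JOIN n ON a.id = n.id
WHERE n.date = (SELECT MAX(nn.date) FROM n AS nn WHERE a.id = nn.id)
ORDER BY a.name
"""

# Condition in ON: the LEFT JOIN still emits a row for names without notes.
in_on = """
SELECT a.name, n.date, n.note
FROM a
LEFT JOIN n
  ON a.id = n.id
 AND n.date = (SELECT MAX(nn.date) FROM n AS nn WHERE a.id = nn.id)
ORDER BY a.name
"""

where_rows = conn.execute(in_where).fetchall()
on_rows = conn.execute(in_on).fetchall()
print(where_rows)
print(on_rows)
```

With the condition in WHERE the query behaves like an inner join; with it in ON, every name is returned and those without notes carry NULLs.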