To use JOIN or UNION - entity-framework-4.1

To use JOIN or UNION - entity-framework-4.1

I am supporting a legacy application which has a LINQ code which goes like this:
query1 = (from s in DBContext.Table_A JOIN DBCOntext.Table_B ON A.ID=B.ID
where a.SUPERVISOR = '9999' Select new ClassA {....}
query2 = (from s in DBContext.Table_A JOIN DBCOntext.Table_B ON A.ID=B/ID
where a.APPROVER = '1111' Select new ClassA {....}
query1.join(query2).ToList()
My question is can the above 2 queries be combined into 1 query using JOIN. Notice that the only difference between the 2 queries is the WHERE clause.
Will this result in any performance improvements?
A point to be noted that all the entities used here are Views.
Thanks
Parameswaran

The only way to know for sure is to test as it's hard to know what optimizations LINQ might use.
In my personal opinion, combining them would be more efficient (unless LINQ does that as an optimization under the hood) because you'd only have to go through the table once (instead of twice).
Your where clause could be rewritten as:
where a.SUPERVISOR='9999' or a.APPROVER='1111'

Related

Optimizing Queries in MySQL

Is Query 1 more optimized say for example for a larger database than Query 2 even by slight or am I just doubling the work with an additional WHERE clause?
Query 1:
SELECT sample_data
FROM table1 INNER JOIN table2 ON table1.key = table2.key
WHERE table1.key = table2.key;
Query 2:
SELECT sample_data
FROM table1 INNER JOIN table2 ON table1.key = table2.key;
Because I read this article saying that using filters in JOIN clauses improve the performance..:

Is Query 1 more optimized say for example for a larger database than Query 2?
No, it is not more optimized. Query 2 is the correct way to handle the JOIN. Query 1 does the same thing, but with extra verbiage for the MySQL server software to scrub out as it figures out how to satisfy your query.
The advice at the Adobe documentation about filtering both tables in a join does not relate to the join's ON-condition. Their example says to do this...
SELECT whatever, whatever
FROM table1
JOIN table2 ON table2.table1_id = table1.table1_id
WHERE table1.date >= '2021-01-01'
AND table2.date >= '2021-01-01' /* THIS LINE IS WHAT THEY SUGGEST */
Their suggestion, from 2015, has to do with filtering non-join attributes from both tables. It's a suggestion to use to optimize a query if it just isn't fast enough for you. And, in my experience, it's not a very good suggestion. Ignore it, at least for now. More recent MySQL versions have gotten more efficient.
Let me add to this. SQL is a so-called "declarative" language. You declare what you want and the MySQL server figures out how to get it for you. SQL software is getting really good at doing that; keep in mind that MySQL is now a quarter century old. In that time its programmers have been continuously making it smarter at figuring out how to get stuff. You probably can't outsmart it. But you may need to add indexes when your tables get really big. https://use-the-index-luke.com/
Other languages are "procedural": you, as a programmer, spell out a procedure for getting what you want. You don't need to do that for SQL.

I like to put it this way:
ON is where you specify how the tables are related.
WHERE is for filtering.
That makes it easy for a human reading the query to understand it.
In reality (for MySQL), JOIN (aka INNER JOIN) treats ON and WHERE identically. That is, there is no performance difference. Your Query 1 unnecessarily specifies the "relation" twice.
Also, MySQL's Optimizer is smart enough to realize when two columns have the same value. For example,
SELECT ...
FROM a
JOIN bb ON a.foo = bb.foo
WHERE a.foo = 123
If the Optimizer decides that starting with the filter bb.foo = 123 is more optimal, it will do so. Note: This is not the same as the example you showed; it joins on one thing (id) but filters on another (date). The two queries there are not equivalent!
LEFT JOIN, necessarily treats ON and WHERE differently. (But that is another topic.)

Is there an efficient alternative to "Not In" in MS Access Query?

It appears that doing a Not In query is expensive in MS Access, as these queries generally run very slowly. Is there an alternative method to conduct such a query as to avoid this overhead?

you can use Joins!!!
Using either the Left Join or Right Join should do what you want! (Include the WHERE Clause IS NULL
(So that's either the middle left or middle right diagrams.)
So it will be something like:
SELECT *
FROM Table a
LEFT JOIN Table b
ON a.Value = b.Value
WHERE b.AnotherOrSameValue IS NULL
Please Note
As HansUp has informed me, The Full Outer Join is not available for MS Access SQL (that's the lower right diagram)

Using in and not in translate to very long and or or statements by the pre parser.
So ABC IN(1,2,3,4) translates to (ABC = 1 OR ABC = 2 OR ABC = 3 OR ABC = 4)
Similarly for NOT IN, although probably using ANDs in the actual query processing.
This can result in poor performance of queries, if there is a large (-ish) number of values in the IN or NOT IN.
It might be best to redesign your query to use joins instead, as the database engines are highly optimized for fast Joins.
EDIT: Explained the pitfalls of using IN or NOT IN a bit better

MySQL Query Efficiency Suggestion

Although this question is specific to MySQL, I wouldn't mind knowing if this answer applies to SQL engines in general.
Also, since this isn't a syntax query, I'm using psuedo-SQL for brevity/clarity.
Let's say C[1]..C[M] are a set of criteria (separated by AND or OR) and Q[1]..Q[N] are another set (separated by OR). I want to use C[1]...C[M] to filter a table and from this filtered table, I want all the rows matching Q[1]...Q[N].
If I were to do:
SELECT ... FROM ... WHERE (C[1]...C[M]) AND (Q[1]...Q[N])
Would this be automatically optimized so that C[1]...C[M] is found only once and each Q[i] is run against this cache'ed result? If not, should I then split the query into two like so:
INSERT INTO TEMP ... SELECT ... FROM ... WHERE C[1]...C[N]
SELECT ... FROM TEMP WHERE Q[1]...Q[N]

This is the job of the internal query optimizer to calculate the best order for compiling the joins according to filters.
For instance in:
SELECT *
FROM
table1
INNER JOIN table2 ON table1.id = table2.id AND table2.column = Y
INNER JOIN table3 ON table3.id2 = table2.id2 AND table3.column = Z
WHERE
table1.column = X
Mysql (/oracle/sqlserver etc...) would try to compute beforehand each intermediary resultset to provide the best performances, and actually here the engine is doing a pretty good job.
However, everything relies on the statistics it actually has about the tables and the indexes you've setup in your architecture. These 2 points (apart from filling up the tables with datas...) are the only ones you can influence to help the optimizer to make good decisions by giving it the right and accurate information.
I hope it helped.
ps: have a look at this. This is about operators and order of precedence in queries compilation under oracle yet it is probably a good thing to know anyway:
http://ezinearticles.com/?Oracle-SQL---The-Importance-of-Order-of-Precedence&id=1597846

Performans of nested queries

I want to ask a question about database queries. In case of query such like where clause of the query is coming from the another query. For example
select ? from ? where ? = select ? from ?
This is the simple example so it is easy to write this. But for the more complex case, i want to know what is the best way in case of performance. Join? seperate queries? nested or another?
Thank you for answers.
Best Regards.

You should test it. These things depend a lot on the details of the query and of the indices it can use.
In my experience JOINs tend to be faster than nested queries in MySQL. In some cases MySQL isn't very smart and appears to run the subquery for every row produced by the outer query.
You can read more about these things in the official documentation:
Optimizing subqueries: http://dev.mysql.com/doc/refman/5.6/en/optimizing-subqueries.html
Rewriting subqueries as joins: http://dev.mysql.com/doc/refman/5.6/en/rewriting-subqueries.html

This is case dependent. In case you have a very less result in the inner query you should go for it. The flow works in the manner where in the inner query is executed first and the result set is being used in the outer query.
Meanwhile joins give you a Cartesian product which is again a heavy operation.

As Mitch and Joni stated, it depends. But generally a join will offer the best performance. You're trying to avoid running the nested query for each row of the outer query. A good query optimizer may do this for you anyway, by interpreting what you're trying to do and essentially "fixing" your mistake. But with the vast majority of queries, you should be writing it as a join in the first place. That way you're being explicit about what you're trying to do and you're fully understanding yourself what is being done, and what the most efficient way to do the work is.

I EXPECT the joins to be quicker, mainly because you have an equivalence and an explicit JOIN. Still use explain to see the differences in how the SQl engine will interpret them.
I would not expect these to be so different, where you can get real, large performance gains in using joins instead of subqueries is when you use correlated subqueries.

Since almost everyone is saying that joins will give the optimal performance I just logged in to say the exact opposite experience I had.
So some days back I was writing a query for 3-4 tables which had huge amount of data. I wrote a big sql query with joins and it was taking around 2-3 hours to execute it. Then I restructured it, created a nested select query, put as many where constraints as I can inside the nested one & made it as stricter as possible and then the performance improved by >90%, it now takes less than 4 mins to run.
This is just my experience and may be theoretically joins are better. I just felt to share my experience. Its better to try out different things, getting additional knowledge about the tables, it's indexes etc would help a lot.
Update:
And I just found out what I did is actually suggested in this optimization reference page of MySQL. http://dev.mysql.com/doc/refman/5.6/en/optimizing-subqueries.html
Pasting it here for quick reference:
Replace a join with a subquery. For example, try this:
SELECT DISTINCT column1 FROM t1 WHERE t1.column1 IN ( SELECT column1
FROM t2);
Instead of this:
SELECT DISTINCT t1.column1 FROM t1, t2 WHERE t1.column1 =
t2.column1;
Move clauses from outside to inside the subquery. For example, use
this query:
SELECT * FROM t1 WHERE s1 IN (SELECT s1 FROM t1 UNION ALL SELECT s1
FROM t2); Instead of this query:
SELECT * FROM t1 WHERE s1 IN (SELECT s1 FROM t1) OR s1 IN (SELECT s1
FROM t2); For another example, use this query:
SELECT (SELECT column1 + 5 FROM t1) FROM t2; Instead of this query:
SELECT (SELECT column1 FROM t1) + 5 FROM t2;

which query is better and efficient - mysql

I came across writing the query in differnt ways like shown below
Type-I
SELECT JS.JobseekerID
, JS.FirstName
, JS.LastName
, JS.Currency
, JS.AccountRegDate
, JS.LastUpdated
, JS.NoticePeriod
, JS.Availability
, C.CountryName
, S.SalaryAmount
, DD.DisciplineName
, DT.DegreeLevel
FROM Jobseekers JS
INNER
JOIN Countries C
ON JS.CountryID = C.CountryID
INNER
JOIN SalaryBracket S
ON JS.MinSalaryID = S.SalaryID
INNER
JOIN DegreeDisciplines DD
ON JS.DegreeDisciplineID = DD.DisciplineID
INNER
JOIN DegreeType DT
ON JS.DegreeTypeID = DT.DegreeTypeID
WHERE
JS.ShowCV = 'Yes'
Type-II
SELECT JS.JobseekerID
, JS.FirstName
, JS.LastName
, JS.Currency
, JS.AccountRegDate
, JS.LastUpdated
, JS.NoticePeriod
, JS.Availability
, C.CountryName
, S.SalaryAmount
, DD.DisciplineName
, DT.DegreeLevel
FROM Jobseekers JS, Countries C, SalaryBracket S, DegreeDisciplines DD
, DegreeType DT
WHERE
JS.CountryID = C.CountryID
AND JS.MinSalaryID = S.SalaryID
AND JS.DegreeDisciplineID = DD.DisciplineID
AND JS.DegreeTypeID = DT.DegreeTypeID
AND JS.ShowCV = 'Yes'
I am using Mysql database
Both works really well, But I am wondering
which is best practice to use all time for any situation?
Performance wise which is better one?(Say the database as a millions records)
Any advantages of one over the other?
Is there any tool where I can check which is better query?
Thanks in advance

1- It's a no brainer, use the Type I
2- The type II join are also called 'implicit join', whereas the type I are called 'explicit join'. With modern DBMS, you will not have any performance problem with normal query. But I think with some big complex multi join query, the DBMS could have issue with the implicit join. Using explicit join only could improve your explain plan, so faster result !
3- So performance could be an issue, but most important maybe, the readability is improve for further maintenance. Explicit join explain exactly what you want to join on what field, whereas implicit join doesn't show if you make a join or a filter. The Where clause is for filter, not for join !
And a big big point for explicit join : outer join are really annoying with implicit join. It is so hard to read when you want multiple join with outer join that explicit join are THE solution.
4- Execution plan are what you need (See the doc)
Some duplicates :
Explicit vs implicit SQL joins
SQL join: where clause vs. on clause
INNER JOIN ON vs WHERE clause

in the most code i've seen, those querys are done like your Type-II - but i think Type-I is better because of readability (and more logic - a join is a join, so you should write it as a join (althoug the second one is just another writing style for inner joins)).
in performance, there shouldn't be a difference (if there is one, i think the Type-I would be a bit faster).

Look at "Explain"-syntax
http://dev.mysql.com/doc/refman/5.1/en/explain.html

My suggestion.
Update all your tables with some amount of records. Access the MySQL console and run SQL both command one by one. You can see the time execution time in the console.

For the two queries you mentioned (each with only inner joins) any modern database's query optimizer should produce exactly the same query plan, and thus the same performance.
For MySQL, if you prefix the query with EXPLAIN, it will spit out information about the query plan (instead of running the query). If the information from both queries is the same, them the query plan is the same, and the performance will be identical. From the MySQL Reference Manual:
EXPLAIN returns a row of information
for each table used in the SELECT
statement. The tables are listed in
the output in the order that MySQL
would read them while processing the
query. MySQL resolves all joins using
a nested-loop join method. This means
that MySQL reads a row from the first
table, and then finds a matching row
in the second table, the third table,
and so on. When all tables are
processed, MySQL outputs the selected
columns and backtracks through the
table list until a table is found for
which there are more matching rows.
The next row is read from this table
and the process continues with the
next table.
When the EXTENDED keyword is used,
EXPLAIN produces extra information
that can be viewed by issuing a SHOW
WARNINGS statement following the
EXPLAIN statement. This information
displays how the optimizer qualifies
table and column names in the SELECT
statement, what the SELECT looks like
after the application of rewriting and
optimization rules, and possibly other
notes about the optimization process.
As to which syntax is better? That's up to you, but once you move beyond inner joins to outer joins, you'll need to use the newer syntax, since there's no standard for describing outer joins using the older implicit join syntax.

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008