I am trying to query data from two tables into one table using an OUTER JOIN. The catch is that three fields are needed to uniquely identify the rows, which brings me to a query containing this expression:
FROM Data1 DB
RIGHT OUTER JOIN Data2 FT ON (DB.field1 = FT.Value1
    AND DB.field2 = FT.field2
    AND DB.field3 = FT.field3)
However, the query runs more or less forever. To test it, I rewrote the join as plain WHERE conditions and also tried a FULL OUTER JOIN: with WHERE conditions it finishes almost instantly, whereas the FULL OUTER JOIN had the same trouble and I usually ended up cancelling it after five minutes or so.
Can anyone see what I am doing wrong with my query? Thanks for any help!
Do you really need all the records back from the query? Some WHERE criteria could cut execution time down considerably.
Yes, and indexes. Check the plan and create the recommended indexes.
Your best bet is to view the execution plan (and if you are comfortable with it, post a screenshot of it in your question). That'll tell you where the most expensive portion of the query is happening.
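If indexes turn out to be the issue, a composite index covering all three join columns on each side is the usual first step. A minimal sketch using the column names from the query above (CREATE INDEX syntax varies slightly by engine, so treat this as an illustration only):
-- Sketch: composite indexes covering the three-column join condition.
-- Index and column names are taken from the query above; adjust to your schema.
CREATE INDEX IX_Data1_join ON Data1 (field1, field2, field3);
CREATE INDEX IX_Data2_join ON Data2 (Value1, field2, field3);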
Related
I have a view and a table-valued function, and make an inner join between them. There are a few million records on each side to be joined.
I have read a lot about how to optimize joins (the most common answer is to use indexes, but my view cannot be an indexed view), yet I still have not found how a join between a view and a function should be written and optimized correctly.
EDIT:
To show that the problem really is the inner join, I ran "SELECT COUNT(*)" for each query:
View - 0 seconds
Function - 18 seconds
Function inner join View - 42 seconds
Let's look at it from another point of view: do you really need this join on such a huge amount of data?
I had a similar issue; here are my situation and solution:
In my case there was no need for all of the millions of records, just specific filtered data
I created insert/update/delete triggers on that huge table that copy the data I need into another, smaller table
Now I can make fast joins on that specific (let's say, filtered) data
This approach has its own pros and cons; the main inconvenience is that you have to rework/review your DB structure, so it may not be suitable in all cases
Anyway, this is my solution to my particular issue, and it still works fine; I improved performance at least tenfold
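A minimal sketch of that trigger idea, assuming SQL Server and entirely hypothetical object names (HugeTable, FilteredCopy, and the filter condition are all placeholders):
-- Keep a small, filtered copy of the big table in sync and join against it.
CREATE TABLE FilteredCopy (id INT PRIMARY KEY, value NVARCHAR(100));
GO
CREATE TRIGGER trg_HugeTable_insert ON HugeTable
AFTER INSERT
AS
BEGIN
    INSERT INTO FilteredCopy (id, value)
    SELECT i.id, i.value
    FROM inserted AS i
    WHERE i.value LIKE 'needed%';  -- whatever filter defines "the data I need"
END;
GO
-- Similar triggers would handle UPDATE and DELETE; the fast join then targets FilteredCopy.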
I have a problem with a relatively simple query and the execution plan Access chose for it.
The query is of this form
SELECT somethings
FROM A INNER JOIN (B INNER JOIN (C INNER JOIN D ON ...) ON ...) ON ...
WHERE A.primaryKey = 1 AND D.d = 2;
C and D have relatively few rows. A and B have a few thousand rows.
The query, which returns 2 rows (not sure if this is pertinent), is really slow: it runs in 17 seconds. If I remove the AND D.d = 2 part of the WHERE clause, the query returns 4 rows and runs instantly.
So my understanding is that the JET engine could run the query without the filter on D.d instantly, and then apply that filter instantly (only 4 rows to filter). Therefore it should not take much longer to run the query with the D.d = 2 filter.
I tried to create a sub-query without the filter and include it in another query that would just filter the result, but it's still slow. My guess is that the JET engine is smart enough to "flatten" the sub-queries, so the result is the same.
Since I was unable to make the query run as I wished, I used the JETSHOWPLAN option so that Access would output its execution plan. Here is what I found:
For the fast query (the one without D.d = 2), the first step of the plan is to apply the A.primaryKey = 1 filter on table A. This results in a data set of 1 row out of more than 30,000. The joins then seem to be executed from A to D using indexes, with a data set that never exceeds 4 rows.
The slow query seems to be executed in the reverse order. D and C are joined first, then D.d = 2 is tested. After that, the joins from C to A are executed. Done this way, the data that needs to be joined from D to C, from C to B, and from B to A is much larger: by the time all the JOINs have been executed and before A.primaryKey = 1 is applied, the data set has 120K rows.
Is there a way I could force the right query plan on Access?
I hope I was clear. Let me know if I should post the query plans. I did not because they are quite large.
Thanks in advance,
mp
Do it in VBA code? The idea would be to take out the part that's slow, execute the fast-returning query, then append the slow part in SQL.
db.Execute "select * from qryFast inner join d on qryFast.dkey = d.d where d.d = 2"
No, VBA code in a module is different from a sub-query. #HansUp has clarified for us that executing the code in one step, as I've shown above, won't improve the performance. You should be able to get the results in memory quickly, if you're familiar with writing code in modules, but then getting the output where you need it to go might slow you down more.
In other words, you should be able to pull the results of qryFast into an in-memory recordset quickly, apply a filter on qryFast.dkey = d, and also quickly get a recordset from 'select * from tableD where d=2' to look up the related info you want from tableD; but getting all of that out of memory and to a place where your front-end can access it might take longer than the 17 seconds they're waiting now.
In fact, it might kick it in the pants enough if you change qryFast to include a condition where dkey = 2 (or whatever the pk is on tableD)
Another idea: have 3 queries, qryFast, qryD, and qryFastWithD joining the two. I'm just tossing out ideas here.
Or, as you say in your comments, try containing different parts of the query in sub-queries, though I would think the optimizer wouldn't be fooled by such a trick if moving one piece into a sub-query didn't already work.
By all means, whatever works, take it.
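To make the three-query idea concrete, here's an untested sketch (qryD and qryFastWithD are the hypothetical saved queries; tableD comes from the earlier comment):
-- qryD: apply the filter on tableD first
SELECT * FROM tableD WHERE d = 2;
-- qryFastWithD: join the two saved queries
SELECT qryFast.*, qryD.*
FROM qryFast INNER JOIN qryD ON qryFast.dkey = qryD.d;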
I finally got it to work by mixing things up until the query planner agreed with me. I isolated the A.primaryKey = 1 filter in a sub-query to ensure it is executed before A is joined to B. It looks something like this:
SELECT ...
FROM (SELECT ... FROM A WHERE a.primaryKey=1) AS qryA
INNER JOIN B ...
WHERE D.d = 2;
I used to write a SELECT clause inside another SELECT clause to avoid joins in the FROM clause, but I am not sure whether that is good coding practice or whether it will degrade database performance. Below is a query that touches multiple tables, written using nested SELECT clauses without any JOIN statement. Please let me know if I am making a mistake or if it is OK. At the moment, I am getting accurate results.
SELECT *,
    (SELECT POrderNo FROM PurchaseOrderMST POM
     WHERE POM.POrderID = CET.POrderID) AS POrderNo,
    (SELECT SiteName FROM SiteTRS ST
     WHERE ST.SiteID = CET.SiteID) AS SiteName,
    (SELECT ParticularName FROM ParticularMST PM
     WHERE PM.ParticularID = CET.ParticularID) AS ParticularName
FROM ClaimExpenseTRS CET
WHERE ClaimID = #ClaimID
I'd use joins for this because it is best practice to do so and will be better for the query optimizer.
But for the learning, just execute the script with the join and without, and see what happens in the query plan and the execution time. Usually this answers the question right away.
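For comparison, a join version of the query above might look like this (an untested sketch; LEFT JOINs preserve the scalar-subquery behaviour of returning NULL when no lookup row matches, assuming each ID matches at most one row):
SELECT CET.*,
    POM.POrderNo,
    ST.SiteName,
    PM.ParticularName
FROM ClaimExpenseTRS CET
LEFT JOIN PurchaseOrderMST POM ON POM.POrderID = CET.POrderID
LEFT JOIN SiteTRS ST ON ST.SiteID = CET.SiteID
LEFT JOIN ParticularMST PM ON PM.ParticularID = CET.ParticularID
WHERE CET.ClaimID = #ClaimID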
Your solution is just fine.
As long as you are only using one column from each "joined" table, and no subquery can return multiple matching rows, it is fine. In some cases it is even better than joining.
(The DB engine can change the direction of a join at any time if you are not using tricks to force a given direction, which can cause performance surprises. This is called query optimization, but if you really know your database, you should be the one to decide how the query runs.)
I think you should JOIN indeed.
Right now you are creating your own JOIN with WHERE and SELECT statements.
Performance-wise, what is better?
Having 3 or 4 JOIN statements in my query, or using embedded SELECT statements to pull the same information from the database as part of one query?
I would say joins are better because:
They are easier to read.
You have more control over whether you want an inner, left/right outer, or full outer join.
join statements cannot be so easily abused to create query abominations.
with joins it is easier for the query optimizer to create a fast query (if the inner select is simple, it might work out the same, but with more complicated stuff joins will work better).
embedded selects can only simulate a left/right outer join.
Sometimes you cannot do stuff using joins, in that case (and only then) you'll have to fall back on an inner select.
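An example of that last case: comparing each row against an aggregate over the whole table is natural with a sub-select but awkward with a plain join (table and column names here are made up for illustration):
-- Products priced above the overall average; the aggregate sub-select
-- has no simple join equivalent.
SELECT name, price
FROM products
WHERE price > (SELECT AVG(price) FROM products);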
It rather depends on your database: sizes of tables particularly, but also the memory parameters and sometimes even how the tables are indexed.
On older versions of MySQL, there was a real possibility of a query with a sub-select being considerably slower than a query returning the same results structured with a join. (In the MySQL 4.1 days, I saw differences greater than an order of magnitude.) As a result, I prefer to build queries with joins.
That said, there are some types of queries that are extremely difficult to build with a join and a sub-select is the only way to really do it.
Assuming the database engine does absolutely no optimization, I would say it depends on how consistent you need your data to be. If you're doing multiple SELECT statements on a busy database, where the data you are looking at may change rapidly, you may run into issues where your data does not match up between queries.
Assuming your data contains no inter-dependencies, then multiple queries will work fine. However, if your data requires consistency, use a single query.
This viewpoint boils down to keeping your data transactionally safe. Consider the situation where you have to pull a total of all accounts receivable, which is kept in a separate table from the monetary transaction amounts. If someone were to add another transaction in between your two queries, the accounts receivable total would not match the sum of the transaction amounts.
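A sketch of the single-query version of that example (accounts_receivable and transactions are hypothetical tables; the FROM-less SELECT works on most engines):
-- Both totals are read in one statement, so they reflect the same moment in time.
SELECT
    (SELECT SUM(balance) FROM accounts_receivable) AS ar_total,
    (SELECT SUM(amount) FROM transactions) AS txn_total;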
Most databases will optimize both queries below into the same plan, so whether you do:
select A.a1, B.b1 from A left outer join B on A.id = B.a_id
or
select A.a1, (select B.b1 from B where B.a_id = A.id) as b1 from A
It ends up being the same. However, for non-trivial queries you are usually better off sticking with joins whenever you can, especially since some types of joins (such as an inner join) cannot be achieved using sub-selects.
As a more general case of this question, because I think it may be of interest to more people: what's the best way to perform a fulltext search across two tables? Assume there are three tables: one for programs (with submitter_id), and one each for tags and descriptions, whose object_id columns are foreign keys referring to records in programs. We want the submitter_id of programs with certain text in their tags OR descriptions. We have to use MATCH AGAINST for reasons I won't go into here; don't get hung up on that aspect.
programs
    id
    submitter_id
tags_programs
    object_id
    text
descriptions_programs
    object_id
    text
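MATCH ... AGAINST only works against FULLTEXT indexes, so the setup here is assumed to include something like:
-- Assumed setup: FULLTEXT indexes on the searched columns.
ALTER TABLE tags_programs ADD FULLTEXT (text);
ALTER TABLE descriptions_programs ADD FULLTEXT (text);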
The following works and executes in 20 ms or so:
SELECT p.submitter_id
FROM programs p
WHERE p.id IN
(SELECT t.object_id
FROM tags_programs t
WHERE MATCH (t.text) AGAINST ('china')
UNION ALL
SELECT d.object_id
FROM descriptions_programs d
WHERE MATCH (d.text) AGAINST ('china'))
but when I tried to rewrite it as a JOIN as follows, it ran for a very long time; I had to kill it after 60 seconds.
SELECT p.id
FROM descriptions_programs d, tags_programs t, programs p
WHERE (d.object_id=p.id AND MATCH (d.text) AGAINST ('china'))
OR (t.object_id=p.id AND MATCH (t.text) AGAINST ('china'))
Just out of curiosity I replaced the OR with AND. That also runs in a few milliseconds, but it's not what I need. What's wrong with the second query above? I can live with the UNION and subselects, but I'd like to understand.
Join after the filters (e.g. join the results), don't try to join and then filter.
The reason is that you lose use of your fulltext index.
Clarification in response to the comment: I'm using the word join generically here, not as JOIN but as a synonym for merge or combine.
I'm essentially saying you should use the first (faster) query, or something like it. The reason it's faster is that each of the subqueries is sufficiently uncluttered that the db can use that table's full text index to do the select very quickly. Joining the two (presumably much smaller) result sets (with UNION) is also fast. This means the whole thing is fast.
The slow version winds up walking through lots of data testing it to see if it's what you want, rather than quickly winnowing the data down and only searching through rows you are likely to actually want.
Just in case you don't know: MySQL has a built-in statement called EXPLAIN that can be used to see what's going on under the surface. There are a lot of articles about this, so I won't go into detail, but for each table it provides an estimate of the number of rows it will need to process. If you look at the "rows" column in the EXPLAIN output for the second query, you'll probably see that the number of rows is quite large, and certainly much larger than for the first one.
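For example, prefixing the slow query from above with EXPLAIN shows the estimated "rows" per table:
EXPLAIN
SELECT p.id
FROM descriptions_programs d, tags_programs t, programs p
WHERE (d.object_id = p.id AND MATCH (d.text) AGAINST ('china'))
   OR (t.object_id = p.id AND MATCH (t.text) AGAINST ('china'));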
The net is full of warnings about using subqueries in MySQL, but it turns out that many times the developer is smarter than the MySQL optimizer. Filtering results in some manner before joining can cause major performance boosts in many cases.
If you join both tables, you end up with lots of records to inspect. As an example, if both tables have 100,000 records, fully joining them gives you 10,000,000,000 records (10 billion!).
If you change the OR to AND, you allow the engine to filter out all records from descriptions_programs that don't match 'china', and only then join with tags_programs.
Anyway, that's not what you need, so I'd recommend sticking to the UNION way.
The union is the proper way to go. The join will pull in both fulltext indexes at once and can multiply the number of checks actually performed.