How to improve performance on large left outer join? - sql-server-2008

These are my tables:
Source_Artikelen - columns: article - description (1.438.171 records)
Source_LevArt - columns: article - manufacturer part number (1.751.801 records)
... and this is the query I'm performing
SELECT a.Artikel,a.Omschrijving, l.Artikel_Leverancier
FROM Source_Artikelen AS a
LEFT OUTER JOIN Source_LevArt AS l
ON a.Artikel Like l.Artikel
This query was running tonight for more than 20 hours before I cancelled it manually.
So what am I trying to do?
I want to list down all articles from my table Source_Artikelen. Then I would like to see if there are manufacturer part numbers available in Source_LevArt.
not every article from Source_Artikelen is present in Source_LevArt
sometimes there are multiple manufacturer part numbers in Source_LevArt for one article
That's why I need to use a LEFT OUTER JOIN.
I've tried some things with indexes, but it's not really helping. Possibly I'm doing something wrong.
I can really use some help, as this is only the beginning of the query I'm writing.
I will have to add 2 other (large) tables as left outer joins later...
UPDATE 19/12/2016 16:24:
Hi piet.t
SELECT TOP(20) a.Artikel,a.Omschrijving, l.Artikel_Leverancier
FROM Source_Artikelen AS a
LEFT JOIN Source_LevArt AS l
ON a.Artikel LIKE l.Artikel
this takes 9 seconds
SELECT TOP(20) a.Artikel,a.Omschrijving, l.Artikel_Leverancier
FROM Source_Artikelen AS a
LEFT JOIN Source_LevArt AS l
ON a.Artikel = l.Artikel
this takes 1 second!
I really didn't know there was a difference as I'm not using wildcards.

This is covered by Paul White here: Dynamic Seeks and Hidden Implicit Conversions.
Using LIKE, even when every value is an exact match, tends to produce a dynamic seek. That means the range to seek on is worked out at execution time, not at compilation time.
Below is how the seek range is derived for the tables in my example:
[Expr1005] = Scalar Operator(CONVERT_IMPLICIT(varchar(12),[Aegon_X].[Sales].[Orders].[custid] as [o].[custid],0)),
[Expr1006] = Scalar Operator(LikeRangeStart(CONVERT_IMPLICIT(varchar(12),[Aegon_X].[Sales].[Orders].[custid] as [o].[custid],0))),
[Expr1007] = Scalar Operator(LikeRangeEnd(CONVERT_IMPLICIT(varchar(12),[Aegon_X].[Sales].[Orders].[custid] as [o].[custid],0))),
[Expr1008] = Scalar Operator(LikeRangeInfo(CONVERT_IMPLICIT(varchar(12),[Aegon_X].[Sales].[Orders].[custid] as [o].[custid],0)))
Below is Paul's description of how those are derived:
The upper tooltip shows that the Compute Scalar uses three internal functions, LikeRangeStart, LikeRangeEnd, and LikeRangeInfo.
The first two functions describe the range as an open interval. The third function returns a set of flags encoded in an integer, that are used internally to define certain seek properties for the Storage Engine. The lower tooltip shows the seek on the open interval described by the result of LikeRangeStart and LikeRangeEnd, and the application of the residual predicate ‘LIKE @Like’.
So in summary, with LIKE, SQL Server uses a dynamic seek and derives the seek properties at execution time, not at compile time.
The examples below show the different plans.
Using LIKE:
select top 10* from sales.orders o
join
sales.customers c
on c.custid like o.custid
plan:
Now when using exact match..
select top 10* from sales.orders o
join
sales.customers c
on c.custid =o.custid
You can see a merge join plan.
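Apart from the plan shape, LIKE and = are also not interchangeable in what they match: LIKE treats the right-hand value as a pattern, so any % or _ stored in the joined column acts as a wildcard. A minimal sketch of that difference follows. It uses Python's sqlite3 with an in-memory database as a stand-in for SQL Server, and the tiny tables a and l and their rows are made up for illustration.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; tables and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (Artikel TEXT);
CREATE TABLE l (Artikel TEXT);
INSERT INTO a VALUES ('A1'), ('A2');
-- The second row contains '_' which LIKE interprets as "any single character".
INSERT INTO l VALUES ('A1'), ('A_');
""")

# Join with LIKE: the right-hand column is used as a pattern.
like_rows = conn.execute("""
SELECT a.Artikel, l.Artikel
FROM a JOIN l ON a.Artikel LIKE l.Artikel
ORDER BY 1, 2
""").fetchall()

# Join with =: plain value comparison, no pattern matching.
eq_rows = conn.execute("""
SELECT a.Artikel, l.Artikel
FROM a JOIN l ON a.Artikel = l.Artikel
ORDER BY 1, 2
""").fetchall()

print(like_rows)
print(eq_rows)
```

With this data the LIKE join returns extra pairs through the 'A_' row, while the = join returns only the exact match.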

Use = instead of like.
These 2 indexes should give you the best performance for a Select.
CREATE INDEX idx ON Source_Artikelen(Artikel) INCLUDE(Omschrijving);
CREATE INDEX idx ON Source_LevArt(Artikel) INCLUDE(Artikel_Leverancier);
If you implement them and try your SELECT again, can you please upload a copy of your execution plan?
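To check that the = version still gives the result the question asks for (every article listed, NULL where there is no manufacturer part number, several rows where there are several), here is a small sketch. It runs on SQLite through Python's sqlite3 rather than SQL Server, so this is an assumption-laden stand-in: SQLite has no INCLUDE clause, so the included column is added as a trailing key column, the index names idx_artikelen and idx_levart are hypothetical, and the sample rows are invented.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server 2008; sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Source_Artikelen (Artikel TEXT, Omschrijving TEXT);
CREATE TABLE Source_LevArt (Artikel TEXT, Artikel_Leverancier TEXT);
-- SQLite has no INCLUDE, so the extra column becomes a trailing key column.
CREATE INDEX idx_artikelen ON Source_Artikelen (Artikel, Omschrijving);
CREATE INDEX idx_levart ON Source_LevArt (Artikel, Artikel_Leverancier);
INSERT INTO Source_Artikelen VALUES ('A1', 'bolt'), ('A2', 'nut'), ('A3', 'washer');
INSERT INTO Source_LevArt VALUES ('A1', 'MPN-100'), ('A1', 'MPN-101'), ('A3', 'MPN-300');
""")

# Equality join: A1 has two part numbers, A2 has none, A3 has one.
rows = conn.execute("""
SELECT a.Artikel, a.Omschrijving, l.Artikel_Leverancier
FROM Source_Artikelen AS a
LEFT JOIN Source_LevArt AS l ON a.Artikel = l.Artikel
ORDER BY a.Artikel, l.Artikel_Leverancier
""").fetchall()

for row in rows:
    print(row)
```

A2 comes back once with None (NULL) for the part number, and A1 comes back twice, which is the behaviour described in the question.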

Related

SQL transform id and add where statement before join

I am pretty new to SQL. Here is an operation I am sure is simple for a lot of you. I am trying to join two tables across databases on the same server: TableA (with idA) in dbA and TableB (with idB) in dbB. But before doing that I want to transform column idA into a number by removing the ":XYZ" characters from its values, and add a WHERE condition on another column in TableA too. Below is my code for the join, but I am not sure how to convert the values of the column so that I can match idA with idB in the join. Thanks a ton in advance.
Select replace(idA, “:XYZ”, "")
from dbA.TableA guid
where event like “%2015”
left join dbB.TableB own
on guid.idA = own.idB
A few things:
The syntactic order FROM, JOINs, WHERE (unless you use subqueries) is also the logical order of execution. Notice SELECT isn't listed: it comes first syntactically but is evaluated near the end.
Alias or fully qualify columns when multiple tables are involved, so we know which field comes from which table.
Because the FROM and JOINs are processed first, what you do in the SELECT isn't in scope yet. This is why you can't use SELECT column aliases in the FROM, WHERE, or even GROUP BY.
I don't usually like SELECT *, but as I don't know which columns you really need, I used it here.
As for applying the WHERE before the join: most SQL engines now use cost-based optimization and work out the best execution plan for the tables and data involved. So just put the limiting criteria in the WHERE in this case, since it limits the left table of the left join. If you needed to limit data on the right table of a left join, you'd put the limit in the join criteria, allowing it to filter as it joins.
You probably need to cast idA as an integer (or to the same type as idB). I used trim to eliminate spaces, but if there are other non-display characters, you'd have issues with the left join matching.
SELECT guid.*, own.*
FROM dbA.TableA guid
LEFT JOIN dbB.TableB own
on cast(trim(replace(guid.idA, ':XYZ', '')) as int) = own.idB
WHERE guid.event like '%2015'
Or materialize the transformation first by using a subquery, so that idA is already in its transformed state before the join (as with parentheses in algebra, subqueries are processed inside out):
SELECT *
FROM (SELECT cast(trim(replace(guid.idA, ':XYZ', '')) as int) as idA
FROM dbA.TableA guid
WHERE guid.event like '%2015') B
LEFT JOIN dbB.TableB own
on B.IDA = own.idB
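The subquery variant can be tried end to end with a few rows. This is only a sketch under assumptions: it uses Python's sqlite3 with a single in-memory database instead of the two databases dbA and dbB, the owner column and all sample values are invented, and SQLite's CAST(... AS INTEGER) stands in for whatever integer type idB really has.

```python
import sqlite3

# Assumption: one in-memory SQLite database replaces dbA and dbB; data is invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TableA (idA TEXT, event TEXT);
CREATE TABLE TableB (idB INTEGER, owner TEXT);
INSERT INTO TableA VALUES
    ('101:XYZ', 'launch 2015'),
    (' 102:XYZ ', 'review 2015'),   -- stray spaces, handled by TRIM
    ('103:XYZ', 'launch 2016');     -- filtered out by the WHERE
INSERT INTO TableB VALUES (101, 'alice'), (103, 'carol');
""")

# Clean and cast idA inside the subquery, then join on the cleaned value.
rows = conn.execute("""
SELECT B.idA, own.owner
FROM (SELECT CAST(TRIM(REPLACE(guid.idA, ':XYZ', '')) AS INTEGER) AS idA
      FROM TableA AS guid
      WHERE guid.event LIKE '%2015') AS B
LEFT JOIN TableB AS own ON B.idA = own.idB
ORDER BY B.idA
""").fetchall()

print(rows)
```

Only the two 2015 rows survive the filter; 101 finds its match in TableB and 102 is kept with a NULL owner because of the LEFT JOIN.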

Display default record when query dont have any rows

I am trying to display a default record in a simple query but my attempt doesn't work:
SELECT
COALESCE(suppliers.supplier_name, 'No records') AS supplier_name
FROM suppliers
LEFT JOIN suppliers_purchases USING(supplier_id)
LEFT JOIN suppliers_purchases_articles USING(supplierpurchase_id)
WHERE suppliers_purchases_articles.article_id = 150
ORDER BY suppliers_purchases.supplierpurchase_id DESC
LIMIT 1
As the query returns no rows the coalesce never kicks in - there's no value to act on, let alone NULL.
While technically it is possible to solve your problem in SQL, it would become an awfully large, ugly, unmaintainable piece of SQL. This is because you are trying to solve an issue in SQL that it was never meant to do - a display problem. SQL is meant to control absolute and strict data sets, not default to informational messages based on the lack of a result set. No records is not the name of any supplier in your database, so don't list it as one.
Long story short: don't solve presentational issues in your data layer. Your front end code should handle the lack of results and fall back to properly displaying No records instead, where it's localizable, controllable, and expected by the developer after you.
While I agree this is a presentation logic issue, I have come across times where I had to control it from the database as I couldn't alter the UI.
If that is the case, you have a couple of different options. One of them is to introduce a one-row derived table and outer join everything else to it:
SELECT
COALESCE(suppliers.supplier_name, 'No records') AS supplier_name
FROM (SELECT 1 as FakeCol) t
LEFT JOIN (suppliers
JOIN suppliers_purchases USING(supplier_id)
JOIN suppliers_purchases_articles USING(supplierpurchase_id))
ON suppliers_purchases_articles.article_id = 150
ORDER BY suppliers_purchases.supplierpurchase_id DESC
LIMIT 1
Note I've moved the where criteria to the join. This isn't completely necessary, I just prefer the way it reads as such. If you have to leave where criteria, you don't want to negate your outer join, so you'll need to add corresponding is null checks as well.
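The one-row trick can be exercised on a toy schema to see both outcomes. The sketch below is an assumption-based illustration, not the original fiddle: it runs on SQLite through Python's sqlite3 instead of MySQL, wraps the three joined tables in a derived table aliased m (a hypothetical name), and uses a helper latest_supplier that is likewise invented for this demo.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; schema is reduced to the joined columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE suppliers (supplier_id INTEGER PRIMARY KEY, supplier_name TEXT);
CREATE TABLE suppliers_purchases (supplierpurchase_id INTEGER PRIMARY KEY, supplier_id INTEGER);
CREATE TABLE suppliers_purchases_articles (supplierpurchase_id INTEGER, article_id INTEGER);
INSERT INTO suppliers VALUES (1, 'Acme');
INSERT INTO suppliers_purchases VALUES (10, 1);
INSERT INTO suppliers_purchases_articles VALUES (10, 150);
""")

# The one-row table t guarantees a result row; the filter sits in the ON clause
# so that a missing match yields NULLs instead of removing the row.
QUERY = """
SELECT COALESCE(m.supplier_name, 'No records') AS supplier_name
FROM (SELECT 1 AS FakeCol) AS t
LEFT JOIN (
    SELECT s.supplier_name, sp.supplierpurchase_id, spa.article_id
    FROM suppliers AS s
    JOIN suppliers_purchases AS sp ON sp.supplier_id = s.supplier_id
    JOIN suppliers_purchases_articles AS spa
      ON spa.supplierpurchase_id = sp.supplierpurchase_id
) AS m ON m.article_id = ?
ORDER BY m.supplierpurchase_id DESC
LIMIT 1
"""

def latest_supplier(article_id):
    """Hypothetical helper: supplier name for an article, or the default text."""
    return conn.execute(QUERY, (article_id,)).fetchone()[0]

found = latest_supplier(150)
missing = latest_supplier(999)
print(found, "/", missing)
```

Because the query always starts from exactly one row, COALESCE has a NULL to act on when nothing matches, which is what the original attempt was missing.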

Conditionals in WHEREs or JOINs?

Let's say I have the following query:
SELECT occurs.*, events.*
FROM occurs
INNER JOIN events ON (events.event_id = occurs.event_id)
WHERE events.event_state = 'visible'
Another way to do the same query and get the same results would be:
SELECT occurs.*, events.*
FROM occurs
INNER JOIN events ON (events.event_id = occurs.event_id
AND events.event_state = 'visible')
My question. Is there a real difference? Is one way faster than the other? Why would I choose one way over the other?
For an INNER JOIN, there's no conceptual difference between putting a condition in ON and in WHERE. It's a common practice to use ON for conditions that connect a key in one table to a foreign key in another table, such as your event_id, so that other people maintaining your code can see how the tables relate.
If you suspect that your database engine is mis-optimizing a query plan, you can try it both ways. Make sure to time the query several times to isolate the effect of caching, and make sure to run ANALYZE TABLE occurs and ANALYZE TABLE events to provide more info to the optimizer about the distribution of keys. If you do find a difference, have the database engine EXPLAIN the query plans it generates. If there's a gross mis-optimization, you can create an Oracle account and file a feature request against MySQL to optimize a particular query better.
But for a LEFT JOIN, there's a big difference. A LEFT JOIN is often used to add details from a separate table if the details exist or return the rows without details if they do not. This query will return result rows with NULL values for b.* if no row of b matches both conditions:
SELECT a.*, b.*
FROM a
LEFT JOIN b
ON (condition_one
AND condition_two)
WHERE condition_three
Whereas this one will completely omit results that do not match condition_two:
SELECT a.*, b.*
FROM a
LEFT JOIN b ON some_condition
WHERE condition_two
AND condition_three
Code in this answer is dual licensed: CC BY-SA 3.0 or the MIT License as published by OSI.
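The LEFT JOIN difference described above is easy to see with three rows. The following sketch reuses the occurs and events tables from the question on SQLite through Python's sqlite3 (MySQL is the engine discussed in the answer, so this is a stand-in); the occur_id column and the sample rows are assumptions made for the demo.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; occur_id and the rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE occurs (occur_id INTEGER, event_id INTEGER);
CREATE TABLE events (event_id INTEGER, event_state TEXT);
INSERT INTO occurs VALUES (1, 10), (2, 20), (3, 30);
INSERT INTO events VALUES (10, 'visible'), (20, 'hidden');
""")

# Condition in ON: every occurs row is kept, non-matching ones get NULL details.
in_on = conn.execute("""
SELECT occurs.occur_id, events.event_state
FROM occurs
LEFT JOIN events ON events.event_id = occurs.event_id
                AND events.event_state = 'visible'
ORDER BY occurs.occur_id
""").fetchall()

# Condition in WHERE: rows whose event is hidden or missing are dropped entirely.
in_where = conn.execute("""
SELECT occurs.occur_id, events.event_state
FROM occurs
LEFT JOIN events ON events.event_id = occurs.event_id
WHERE events.event_state = 'visible'
ORDER BY occurs.occur_id
""").fetchall()

print(in_on)
print(in_where)
```

The first query returns all three occurrences, the second only the one whose event is visible, so the WHERE version behaves like an INNER JOIN.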

Select taking too long. Need advice for a better performance

Ok, here we go. There's this messy SELECT crossing other tables and ordering to get the one desired row. Basically I do the "math" inside the ORDER BY.
1 base table.
7 JOINs pointing to local tables.
WHERE with 2 clauses and a NOT IN crossing another table.
You'll see in the code the ORDER BY is pretty damn big/ugly, it sums the result of 5 different calculations. I need that result to order by those calculations in order to get the worst row-case.
The problem is once I execute the Stored Procedure it takes up to 8 seconds to run. That's kind of non-acceptable. So, I'm starting to check Indexes.
So, I'm looking for advice on how to make this query run faster.
I'm indexing the columns in the WHERE clause and the field LINEA. Should I index something else, like the columns I'm using in the JOINs? Or should I approach the query differently?
Query:
SET @LINEA = (
SELECT TOP 1
BOA.LIN
FROM
BAND_BA BOA
LEFT JOIN
TEL PAR
ON REPLACE(BOA.Lin,'-','') = SUBSTRING(PAR.Te,2,10)
LEFT JOIN
TELP CLP
ON REPLACE(BOA.Lin,'-','') = SUBSTRING(CLP.Numtel,2,10)
LEFT JOIN
CA C
ON REPLACE(BOA.Lin,'-','') = C.An
LEFT JOIN
RE R
ON REPLACE(BOA.Lin,'-','') = R.Lin
LEFT JOIN
PRODUCTOS2 P2
ON BOA.PRODUCTO = P2.codigo
LEFT JOIN
EN
ON REPLACE(BOA.Lin,'-','') = EN.G
LEFT JOIN
TIP ID
ON TIPID = ID.ID
WHERE
BOA.EST = 'C' AND
ID.SE = 'boA' AND
BOA.LIN NOT IN (
SELECT
LIN
FROM
BAN
)
ORDER BY (EN.VALUE + ANT.VALUE + REIT.VAL + C.VALUE + TEL.VALUE
) DESC,
I'll be frank, this is some pretty terrible SQL. Without seeing all your table structures, advice here will be incomplete. That being said, please don't post all your table structures because you are already very close to "hire a consultant" territory with this.
All the REPLACE logic should be done away with. If you need to JOIN on these fields, then add comparable fields to the tables so you don't need to manipulate the data. Every single JOIN that uses a REPLACE or SUBSTRING is a table or index scan - those are non-SARGable and a definite anti-pattern.
The ORDER BY is probably the most convoluted ORDER BY I have ever seen. Some major issues there:
Subqueries should all be eliminated and materialized either in the outer query or as variables
String manipulation should be eliminated (see item 1 above)
The entire query is basically a code smell. If you need to write code like this to meet business requirements then you either have a terribly inappropriate design or some other much larger issue in the organization or data.
One thing that can kill performance is using a lot of LEFT JOINs. To improve performance of LEFT JOIN, you might want to make sure that the column(s) to which you join have an index - that can have a huge impact on performance.
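One way to read the "add comparable fields" and "index the join columns" advice together is to store the cleaned value once and join on that. The sketch below shows the idea for a single join from the query. It is only an illustration under assumptions: it runs on SQLite via Python's sqlite3 rather than SQL Server, the column LinClean and both index names are hypothetical, and the sample phone-style values are invented.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; LinClean and the index names
# are hypothetical additions, not part of the original schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE BAND_BA (Lin TEXT, LinClean TEXT);
CREATE TABLE CA (An TEXT, Value INTEGER);
INSERT INTO BAND_BA (Lin) VALUES ('555-1234'), ('555-9999');
INSERT INTO CA VALUES ('5551234', 7);

-- Do the string manipulation once, at write time, instead of in every join.
UPDATE BAND_BA SET LinClean = REPLACE(Lin, '-', '');

-- Plain columns on both sides of the join can now be indexed directly.
CREATE INDEX ix_band_ba_linclean ON BAND_BA (LinClean);
CREATE INDEX ix_ca_an ON CA (An);
""")

# The join predicate compares bare columns, with no function wrapped around either.
rows = conn.execute("""
SELECT BOA.Lin, C.Value
FROM BAND_BA AS BOA
LEFT JOIN CA AS C ON BOA.LinClean = C.An
ORDER BY BOA.Lin
""").fetchall()

print(rows)
```

In a real system the cleaned column would have to be kept in sync on insert and update (for example with a computed column or trigger), which this sketch does not cover.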

MySQL -- joining then joining then joining again

MySQL setup: step by step.
programs -> linked to --> speakers (by program_id)
At this point, it's easy for me to query all the data:
SELECT *
FROM programs
JOIN speakers on programs.program_id = speakers.program_id
Nice and easy.
The trick for me is this. My speakers table is also linked to a third table, "books." So in the "speakers" table, I have "book_id" and in the "books" table, the book_id is linked to a name.
I've tried this (including a WHERE you'll notice):
SELECT *
FROM programs
JOIN speakers on programs.program_id = speakers.program_id
JOIN books on speakers.book_id = books.book_id
WHERE programs.category_id = 1
LIMIT 5
No results.
My questions:
What am I doing wrong?
What's the most efficient way to make this query?
Basically, I want to get back all the programs data and the books data, but instead of the book_id, I need it to come back as the book name (from the 3rd table).
Thanks in advance for your help.
UPDATE:
(rather than opening a brand new question)
The left join worked for me. However, I have a new problem. Multiple books can be assigned to a single speaker.
Using the left join returns two rows! What do I need to add to return only a single row, but still list both books?
Is there any chance that the books table doesn't have any matching rows for speakers.book_id?
Try using a left join which will still return the program/speaker combinations, even if there are no matches in books.
SELECT *
FROM programs
JOIN speakers on programs.program_id = speakers.program_id
LEFT JOIN books on speakers.book_id = books.book_id
WHERE programs.category_id = 1
LIMIT 5
Btw, could you post the table schemas for all tables involved, and exactly what output (or reasonable representation) you'd expect to get?
Edit: Response to op author comment
You can use GROUP BY and GROUP_CONCAT to put all the books on one row.
e.g.
SELECT speakers.speaker_id,
speakers.speaker_name,
programs.program_id,
programs.program_name,
group_concat(books.book_name)
FROM programs
JOIN speakers on programs.program_id = speakers.program_id
LEFT JOIN books on speakers.book_id = books.book_id
WHERE programs.category_id = 1
GROUP BY speakers.speaker_id, programs.program_id
LIMIT 5
Note: since I don't know the exact column names, these may be off
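Here is a small runnable version of the GROUP_CONCAT idea. Treat it as a sketch under assumptions: it uses SQLite's group_concat through Python's sqlite3 in place of MySQL, the column names follow the guesses in the answer above, the rows are invented, and "multiple books per speaker" is modelled as several speakers rows with the same speaker_id, which may not match the real schema.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; column names and rows are guesses.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE programs (program_id INTEGER, program_name TEXT, category_id INTEGER);
CREATE TABLE speakers (speaker_id INTEGER, speaker_name TEXT, program_id INTEGER, book_id INTEGER);
CREATE TABLE books (book_id INTEGER, book_name TEXT);
INSERT INTO programs VALUES (1, 'Keynotes', 1);
INSERT INTO speakers VALUES (1, 'Ann', 1, 100), (1, 'Ann', 1, 101), (2, 'Bob', 1, NULL);
INSERT INTO books VALUES (100, 'SQL Basics'), (101, 'Joins in Depth');
""")

# One row per speaker and program; all of the speaker's books are folded
# into a single comma-separated column.
rows = conn.execute("""
SELECT s.speaker_id, s.speaker_name, p.program_name,
       group_concat(b.book_name) AS books
FROM programs AS p
JOIN speakers AS s ON p.program_id = s.program_id
LEFT JOIN books AS b ON s.book_id = b.book_id
WHERE p.category_id = 1
GROUP BY s.speaker_id, s.speaker_name, p.program_name
ORDER BY s.speaker_id
""").fetchall()

for row in rows:
    print(row)
```

Ann appears once with both titles in the books column, and Bob is still returned (with NULL books) because of the LEFT JOIN. The order of titles inside the concatenated string is not guaranteed here.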
That's typically efficient. There is some kind of assumption you are making that isn't true. Do your speakers have books assigned? If they don't that last JOIN should be a LEFT JOIN.
This kind of query is typically pretty efficient, since you almost certainly have primary keys as indexes. The main issue would be whether your indexes are covering (which is more likely to occur if you don't use SELECT *, but instead select only the columns you need).