Informatica: How to make a condition for two tables joining execution - data-analysis

Actually, there are 4 tables in total involved in this mapping: Market, Cost, A, B.
Read_sourceTB_B-----FIL1------->---------JNR4 \
| | |
| Read_sourceTB_Market--\ | |
| Read_sourceTB_Cost------JNR1--\ | |
| Read_sourceTB_A-----------------JNR2 JNR5--->EXP... -->TGT
| | | |
| | | |
| | | |
---------------------FIL2->---------JNR3 /
SQ_TABLEB --FIL1-> -- JNR1 \
| | |
| SQ_TABLEA --| JNR3-->EXP.... -->TGT
| | |
|--FIL2-> -- JNR2 /
**First** joining condition
A LEFT JOIN B
ON A.MEMBERSHIPID = B.MEMBERSHIPID
Where B.System_Code='University'
IF <First joining condition> fails, then execute
**Second** joining condition
A LEFT JOIN B ON
A.address = B.address and A.phonenumber = B.phonenumber
Where B.System_Code='Policy'
Which transformation should I use? I don't know how to use Informatica; my version is Informatica Developer 10.5. Please help me. Thanks!
I only know how to
A left join B on `condition` `System_Code='University'`
left join B on `condition` `System_Code='Policy'`
but I don't know how to make a decision for
if A join B System_Code='University' failed,
then A join B System_Code='Policy'

You need to join A with B twice, once for each condition, and then merge the two results back into a single pipeline where the if-else decision is made.
Also note that all your left joins actually behave as inner joins, because the B.xxx='something' condition sits in the WHERE clause and discards the rows where B is NULL.
So, for the problem above:
After the Source Qualifier of B, add two filters in parallel: FIL1 (System_Code='University') and FIL2 (System_Code='Policy').
Then use a Joiner (JNR1) to join A and B(FIL1) on A.MEMBERSHIPID = B_F1.MEMBERSHIPID. Use A as the detail table and use 'inner join'.
Then use a second Joiner (JNR2) to join A and B(FIL2) on A.address = B_F2.address and A.phonenumber = B_F2.phonenumber. Use A as the detail table and use 'inner join'.
Then merge the two pipelines into a single pipeline using another Joiner (JNR3). It should be a normal join on the primary key of table A. Pass through all required columns.
(EXP) Then use an Expression transformation, with logic similar to the following:
out_col1 = IIF( isnull(col_tableB_F1_jnr1),col_tableB_F2_jnr2, col_tableB_F1_jnr1)
The whole mapping should look like this:
SQ_TABLEB --FIL1-> -- JNR1 \
| | |
| SQ_TABLEA --| JNR3-->EXP.... -->TGT
| | |
|--FIL2-> -- JNR2 /
But I think your requirement may actually be this:
A LEFT JOIN B
ON A.MEMBERSHIPID = B.MEMBERSHIPID AND B.System_Code='University'
If so, change the inner join to a master outer join in JNR1 and JNR2.
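For comparison, the same fallback logic can be sketched in plain SQL, run here through Python's built-in sqlite3 module. This is only an illustration of the idea behind the mapping, not Informatica code: the `id` and `info` columns and all sample rows are assumptions made up for the example. Each System_Code filter is placed in the ON clause so that A's rows survive, and COALESCE plays the role of the IIF(ISNULL(...)) expression.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical minimal versions of tables A and B.
CREATE TABLE A (id INTEGER PRIMARY KEY, MEMBERSHIPID TEXT, address TEXT, phonenumber TEXT);
CREATE TABLE B (MEMBERSHIPID TEXT, address TEXT, phonenumber TEXT, System_Code TEXT, info TEXT);
INSERT INTO A VALUES (1, 'M1', 'addr1', '111'),
                     (2, 'M2', 'addr2', '222'),
                     (3, 'M3', 'addr3', '333');
INSERT INTO B VALUES ('M1', 'x', 'x', 'University', 'univ-row'),
                     ('zz', 'addr2', '222', 'Policy', 'policy-row');
""")

# B1 corresponds to the FIL1/JNR1 branch, B2 to the FIL2/JNR2 branch.
# COALESCE takes the University match first and falls back to the Policy match.
rows = conn.execute("""
SELECT A.id, COALESCE(B1.info, B2.info) AS info
FROM A
LEFT JOIN B AS B1
       ON A.MEMBERSHIPID = B1.MEMBERSHIPID
      AND B1.System_Code = 'University'
LEFT JOIN B AS B2
       ON A.address = B2.address
      AND A.phonenumber = B2.phonenumber
      AND B2.System_Code = 'Policy'
ORDER BY A.id
""").fetchall()

print(rows)
```

Row 1 is resolved by the first condition, row 2 only by the second, and row 3 by neither, so it keeps a NULL.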

Related

LEFT JOIN to INNER JOIN multiple tables [duplicate]

After a day of digging around trying to get this query to work, I've had to resort to asking for help. This is my first venture into JOINs so please treat me gently ;)
I've got a query producing a timetable based on data across 6 tables.
Database relationship diagram
My query is:
SELECT
Course.CourseName,
Course.CourseID,
TimetablePeriods.PeriodName,
Subject.SubjectName,
Subject.SubjectColour,
Room.RoomName
FROM
TimetablePeriods
LEFT JOIN Timetable ON
TimetablePeriods.PeriodID = Timetable.Period_ID
INNER JOIN Course ON
Timetable.Course_ID = Course.CourseID
INNER JOIN Subject ON
Course.Subject_ID = Subject.SubjectID
INNER JOIN CourseMembership ON
CourseMembership.Course_ID = Course.CourseID
INNER JOIN Room ON
Timetable.Room_ID = Room.RoomID
WHERE CourseMembership.Student_ID = 123
ORDER BY TimetablePeriods.SortOrder ASC
This is returning all of the results that match but not the rows where there is a value in TimetablePeriods but nothing else.
CourseName | CourseID | PeriodName | SubjectName | etc . . .
-----------|----------|------------|-------------|
y7Ma3 | 19 | MonP1 | Maths |
y7Hist4 | 16 | MonP2 | History |
y7Geog1 | 30 | MonP3 | Geography |
y7Eng3 | 28 | MonP5 | English |
I was expecting to get a row with blank values for MonP4. This exists in the database and if I run the same query against a student who has a blank against MonP5 it skips that instead.
As I said at the top, this is my first attempt at using the JOIN statement, so if there's a better way of approaching this I'd love to hear it.
Thanks in advance for any help.
As explained by @Madhur Bhaiya, the WHERE clause in my original query was effectively turning everything into an INNER JOIN.
My solution:
SELECT
r.CourseName,
r.CourseID,
r.SubjectName,
r.SubjectColour,
r.RoomName,
TimetablePeriods.PeriodName
FROM
(SELECT
Course.CourseName,
Course.CourseID,
Subject.SubjectName,
Subject.SubjectColour,
Room.RoomName,
Timetable.Period_ID
FROM
Course,
Timetable,
Subject,
CourseMembership,
Room
WHERE
Course.CourseID = Timetable.Course_ID AND
Course.Subject_ID = Subject.SubjectID AND
Timetable.Room_ID = Room.RoomID AND
CourseMembership.Course_ID = Course.CourseID AND
CourseMembership.Student_ID = 123) r
RIGHT JOIN TimetablePeriods ON
TimetablePeriods.PeriodID = r.Period_ID
ORDER BY TimetablePeriods.SortOrder ASC
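The difference between the two query shapes can be reproduced on a cut-down schema. The sketch below uses Python's sqlite3 module with only three of the six tables and invented sample rows, so it is an assumption-laden illustration rather than the real database. It also writes the outer join as a LEFT JOIN from TimetablePeriods, which is equivalent to the RIGHT JOIN above and works on SQLite versions that lack RIGHT JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Reduced, hypothetical schema: only the columns needed for the demo.
CREATE TABLE TimetablePeriods (PeriodID INTEGER, PeriodName TEXT, SortOrder INTEGER);
CREATE TABLE Timetable (Period_ID INTEGER, Course_ID INTEGER);
CREATE TABLE CourseMembership (Course_ID INTEGER, Student_ID INTEGER);
INSERT INTO TimetablePeriods VALUES (1, 'MonP1', 1), (2, 'MonP2', 2), (3, 'MonP3', 3);
INSERT INTO Timetable VALUES (1, 19), (2, 16), (2, 50), (3, 51);
INSERT INTO CourseMembership VALUES (19, 123), (16, 123), (50, 999), (51, 999);
""")

# Original shape: the INNER JOIN and the WHERE filter discard the empty period.
original = conn.execute("""
SELECT p.PeriodName, t.Course_ID
FROM TimetablePeriods p
LEFT JOIN Timetable t ON p.PeriodID = t.Period_ID
INNER JOIN CourseMembership m ON m.Course_ID = t.Course_ID
WHERE m.Student_ID = 123
ORDER BY p.SortOrder
""").fetchall()

# Fixed shape: filter inside a derived table, then outer join the periods to it.
fixed = conn.execute("""
SELECT p.PeriodName, r.Course_ID
FROM TimetablePeriods p
LEFT JOIN (SELECT t.Period_ID, t.Course_ID
           FROM Timetable t
           JOIN CourseMembership m ON m.Course_ID = t.Course_ID
           WHERE m.Student_ID = 123) r
       ON p.PeriodID = r.Period_ID
ORDER BY p.SortOrder
""").fetchall()

print(original)
print(fixed)
```

Only the second query returns a row for MonP3, with a NULL course.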

How to optimize query with many joins?

I have a simple but long query that counts the rows of the result, and it takes about 14 seconds. The count on the main table alone takes less than a second, but after the multiple joins the delay is too high, as follows:
Select Count(Distinct visits.id) As Count_id
From visits
Left Join clients_locations ON visits.client_location_id = clients_locations.id
Left Join clients ON clients_locations.client_id = clients.id
Left Join locations ON clients_locations.location_id = locations.id
Left Join users ON visits.user_id = users.id
Left Join potentialities ON clients_locations.potentiality = potentialities.id
Left Join classes ON clients_locations.class = classes.id
Left Join professions ON clients.profession_id = professions.id
Inner Join specialties ON clients.specialty_id = specialties.id
Left Join districts ON locations.district_id = districts.id
Left Join provinces ON districts.province_id = provinces.id
Left Join locations_types ON locations.location_type_id = locations_types.id
Left Join areas ON clients_locations.area_id = areas.id
Left Join calls ON calls.visit_id = visits.id
The output of explain is
+---+---+---+---+---+---+---+---+---+---+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+---+---+---+---+---+---+---+---+---+---+
| 1 | SIMPLE | specialties | index | PRIMARY | specialty_name | 52 | NULL | 53 | Using index |
| 1 | SIMPLE | clients | ref | PRIMARY,specialty | specialty | 4 | crm_db.specialties.id | 143 | |
| 1 | SIMPLE | clients_locations | ref | PRIMARY,client_id | client_id | 4 | crm_db.clients.id | 1 | |
| 1 | SIMPLE | locations | eq_ref | PRIMARY | PRIMARY | 4 | crm_db.clients_locations.location_id | 1 | |
| 1 | SIMPLE | districts | eq_ref | PRIMARY | PRIMARY | 4 | crm_db.locations.district_id | 1 | Using where |
| 1 | SIMPLE | visits | ref | unique_visit,client_location_id | unique_visit | 4 | crm_db.clients_locations.id | 4 | Using index |
| 1 | SIMPLE | calls | ref | call_unique,visit_id | call_unique | 4 | crm_db.visits.id | 1 | Using index |
+---+---+---+---+---+---+---+---+---+---+
Update 1
The above query is used with a dynamic WHERE clause ($sql = $sql . "Where ". $whereFilter), but I submitted it in its simple form. So please do not consider answers that just eliminate the joins :)
Update 2
Here is example of dynamic filtering
$temp = $this->province_id;
if ($temp != null) {
$whereFilter = $whereFilter . " and provinces.id In ($temp) ";
}
But at startup, which is our case, there is no WHERE clause.
Left joins always return a row from the first table, but may return multiple rows if there are multiple matching rows. But because you are counting distinct visit rows, left joining to another table while counting distinct visits is the same as just counting the rows of visits. Thus the only joins that affect the result are inner joins, so you can remove all "completely" left joined tables without affecting the result.
What I mean by "completely" is that some left joined tables are effectively inner joined: the inner join to specialties requires the join to clients to succeed and thus also be an inner join, which in turn requires the join to clients_locations to succeed and thus also be an inner join.
Your query (as posted) can be reduced to:
Select Count(Distinct visits.id) As Count_id
From visits
Join clients_locations ON visits.client_location_id = clients_locations.id
Join clients ON clients_locations.client_id = clients.id
Join specialties ON clients.specialty_id = specialties.id
Removing all those unnecessary joins will greatly improve the runtime of your query, not only because there are fewer joins to make but also because the intermediate rowset could be enormous when you consider that its size is the product of the matches in all the tables (not the sum).
For maximum performance, create covering indexes on all id-and-fk columns:
create index visits_id_client_location_id on visits(id, client_location_id);
create index clients_locations_id_client_id on clients_locations(id, client_id);
create index clients_id_specialty_id on clients(id, specialty_id);
so index-only scans can be used where possible. I assume there are indexes on the PK columns.
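The claim that purely left-joined tables do not change COUNT(DISTINCT visits.id) can be checked on a toy dataset. The sketch below uses Python's sqlite3 module with three simplified tables and made-up rows (the column lists are assumptions, not the real crm_db schema). It also shows how the joined rowset grows even though the distinct count stays the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Simplified, hypothetical versions of three of the tables.
CREATE TABLE visits (id INTEGER PRIMARY KEY, user_id INTEGER);
CREATE TABLE users (id INTEGER PRIMARY KEY);
CREATE TABLE calls (id INTEGER PRIMARY KEY, visit_id INTEGER);
INSERT INTO visits VALUES (1, 10), (2, 10), (3, 99);   -- visit 3 has no matching user
INSERT INTO users VALUES (10);
INSERT INTO calls VALUES (1, 1), (2, 1), (3, 1), (4, 2);  -- visit 1 has three calls
""")

joins = """
FROM visits
LEFT JOIN users ON visits.user_id = users.id
LEFT JOIN calls ON calls.visit_id = visits.id
"""

plain_count = conn.execute("SELECT COUNT(*) FROM visits").fetchone()[0]
distinct_count = conn.execute("SELECT COUNT(DISTINCT visits.id) " + joins).fetchone()[0]
joined_rows = conn.execute("SELECT COUNT(*) " + joins).fetchone()[0]

print(plain_count, distinct_count, joined_rows)
```

The distinct count over the left joins equals the plain count on visits, while the joined rowset is larger because visit 1 is repeated once per call.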
You don't seem to have any (or much) intentional filtering. If you want to know the number of visits referred to in calls, I would propose:
select count(distinct c.visit_id)
from calls c;
In order to optimize the whole process, you can dynamically construct the pre-WHERE SQL according to the filters you are going to apply. For example:
// base select and left join
$preSQL = "Select Count(Distinct visits.id) As Count_id From visits ";
$preSQL .= "Left Join clients_locations ON visits.client_location_id = clients_locations.id ";
// filtering by province_id
$temp = $this->province_id;
if ($temp != null) {
$preSQL .= "Left Join locations ON clients_locations.location_id = locations.id ";
$preSQL .= "Left Join districts ON locations.district_id = districts.id ";
$preSQL .= "Left Join provinces ON districts.province_id = provinces.id ";
$whereFilter = "provinces.id In ($temp) ";
}
$sql = $preSQL . "Where ". $whereFilter;
// ...
If you are using multiple filters you can put all inner/left-join strings in an array and then after analysing the request, you can construct your $preSQL using the minimum of joins.
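The array-of-joins idea could look roughly like the sketch below. It is written in Python rather than PHP, and the `JOINS` and `FILTERS` registries and the `build_count_query` helper are hypothetical names invented for this illustration. Each filter declares which joins it depends on, and only those joins are emitted. The sketch also uses `?` placeholders instead of interpolating values into the SQL string, which is a deliberate change from the snippets above.

```python
# Hypothetical registry of join clauses, keyed by table name.
JOINS = {
    "clients_locations": "LEFT JOIN clients_locations ON visits.client_location_id = clients_locations.id",
    "locations": "LEFT JOIN locations ON clients_locations.location_id = locations.id",
    "districts": "LEFT JOIN districts ON locations.district_id = districts.id",
    "provinces": "LEFT JOIN provinces ON districts.province_id = provinces.id",
    "users": "LEFT JOIN users ON visits.user_id = users.id",
}

# Hypothetical filters: (joins required, in dependency order; column to filter on).
FILTERS = {
    "province_id": (["clients_locations", "locations", "districts", "provinces"], "provinces.id"),
    "user_id": (["users"], "users.id"),
}


def build_count_query(filters):
    """Return (sql, params) using only the joins the active filters need."""
    needed, where, params = [], [], []
    for name, values in filters.items():
        if not values:
            continue  # inactive filter: contributes no joins
        joins, column = FILTERS[name]
        for table in joins:
            if table not in needed:
                needed.append(table)
        placeholders = ", ".join("?" for _ in values)
        where.append(f"{column} IN ({placeholders})")
        params.extend(values)

    sql = "SELECT COUNT(DISTINCT visits.id) AS Count_id FROM visits"
    for table in needed:
        sql += " " + JOINS[table]
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql, params


startup_sql, startup_params = build_count_query({})
filtered_sql, filtered_params = build_count_query({"province_id": [1, 2]})
print(startup_sql)
print(filtered_sql, filtered_params)
```

With no active filters (the startup case) the query touches only visits; with a province filter it adds just the four joins needed to reach provinces.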
Use COUNT(CASE WHEN visit_id!="" THEN 1 END) as visit.
Hope this will help
Isn't it just:
SELECT COUNT(id)
FROM visits
because all the left outer joins also return a visits.id when there's no matching clients, ..., calls, and ids ought to be unique?
A different hint: the one inner join only takes effect when a client exists. In general, inner joins should be placed as close as possible to the source table, so in your example it would have been best on the line after "left join clients".
I didn't fully understand your idea, especially the INNER JOIN that turns some of the LEFT JOINs into INNER JOINs, which seems strange, but let's try a solution:
LEFT JOINs often perform poorly, and I think you only need these tables when they are used in the WHERE clause, so you can add them with INNER JOIN only when you actually filter on them.
For example:
$query = "Select Count(Distinct visits.id) As Count_id From visits ";
if ($temp != null) {
// Join only the tables needed to reach provinces, each exactly once
$query .= " INNER JOIN clients_locations ON visits.client_location_id = clients_locations.id ";
$query .= " INNER JOIN locations ON clients_locations.location_id = locations.id ";
$query .= " INNER JOIN districts ON locations.district_id = districts.id ";
$query .= " INNER JOIN provinces ON districts.province_id = provinces.id ";
$whereFilter .= " and provinces.id In ($temp) ";
}
I think it'll help your performance and it'll work as you need.

MySQL left join with default values

I have a couple of tables, one with source data which I'll call SourceData and another which defines overridden values for a given user if they exist called OverriddenSourceData.
The basic table format looks something like this:
SourceData
| source_id | payload          |
--------------------------------
| 1         | 'some json'      |
| 2         | 'some more json' |
--------------------------------
OverriddenSourceData
| id | source_id | user_id | overrides  |
------------------------------------------
| 1  | 2         | 4       | 'a change' |
------------------------------------------
For a given user, I'd like to return all the Source data rows with the overrides column included. If the user has overridden the source then the column is populated, else it is null.
I started by executing a left join and then including a condition for checking the user like so:
SELECT A.source_id, A.payload, B.overrides from SourceData A
LEFT JOIN OverriddenSourceData B
ON A.source_id = B.source_id
WHERE user_id = 4
but then source rows that weren't overridden weren't included (e.g. source id 1); it was acting like an inner join.
I then relaxed the query and used a strict left join on source_id.
SELECT A.source_id, A.payload, B.overrides from SourceData A
LEFT JOIN OverriddenSourceData B
ON A.source_id = B.source_id
# WHERE user_id = 4
This can return more data than I need though (e.g. other users who have overridden the same source data), and then I have to filter programmatically.
It seems like I should be able to craft a query that does all this at the DB level and gives me what I need. Any help?
You should put your condition in the LEFT JOIN's ON clause. If you put it in WHERE, the rows where B is NULL are filtered out and the query effectively becomes an INNER JOIN, so try this ;)
SELECT A.source_id, A.payload, B.overrides from SourceData A
LEFT JOIN OverriddenSourceData B
ON A.source_id = B.source_id
AND B.user_id = 4
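The difference between the two placements of the user filter can be seen by running both queries on the sample rows. The sketch below uses Python's sqlite3 module; the second override row (user 7) is an assumption added to show that other users' overrides are excluded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SourceData (source_id INTEGER, payload TEXT);
CREATE TABLE OverriddenSourceData (id INTEGER, source_id INTEGER, user_id INTEGER, overrides TEXT);
INSERT INTO SourceData VALUES (1, 'some json'), (2, 'some more json');
INSERT INTO OverriddenSourceData VALUES (1, 2, 4, 'a change'),
                                        (2, 2, 7, 'other change');  -- hypothetical extra user
""")

# Filter in WHERE: rows with no override for user 4 are dropped.
where_rows = conn.execute("""
SELECT A.source_id, B.overrides
FROM SourceData A
LEFT JOIN OverriddenSourceData B ON A.source_id = B.source_id
WHERE B.user_id = 4
ORDER BY A.source_id
""").fetchall()

# Filter in ON: every source row is kept, overrides is NULL when user 4 has none.
on_rows = conn.execute("""
SELECT A.source_id, B.overrides
FROM SourceData A
LEFT JOIN OverriddenSourceData B ON A.source_id = B.source_id AND B.user_id = 4
ORDER BY A.source_id
""").fetchall()

print(where_rows)
print(on_rows)
```

With the filter in ON, source 1 comes back with a NULL override and user 7's change never appears.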

two inner join in one sql statement

What's wrong here? I just want to display all the items in item_tb with 2 different groups, vicma and branch, but it returns nothing. It works with one inner join, but when I add the other one it displays nothing.
|-------------|-------------------------|---------------|
|item_tb | vicma_tb | branch_tb |
| | vID - PK | id-PK |
|branchID-FK | | |
|vicma - FK | | |
|-------------|-------------------------|---------------|
$sql = "
SELECT item_tb.*
, branch_tb.*
, vicma_tb.*
from item_tb
JOIN branch_tb
on item_tb.branchID = branch_tb.id
JOIN vicma_tb
on item_tb.vicma = vicma_tb.vID ";
Seems like you need to do a LEFT JOIN instead of INNER JOIN. LEFT JOIN will return all values from your original table and NULL if there is no match. Try:
SELECT item_tb.*, branch_tb.* , vicma_tb.* from item_tb
LEFT JOIN branch_tb on item_tb.branchID = branch_tb.id
LEFT JOIN vicma_tb on item_tb.vicma = vicma_tb.vID
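One plausible cause is that the vicma values in item_tb have no matching row in vicma_tb. The sketch below reproduces that situation with Python's sqlite3 module. The `id` and `name` columns and all rows are assumptions invented for the demo, so it shows the mechanism rather than the asker's actual data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical columns beyond the keys shown in the question.
CREATE TABLE branch_tb (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE vicma_tb (vID INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE item_tb (id INTEGER PRIMARY KEY, branchID INTEGER, vicma INTEGER);
INSERT INTO branch_tb VALUES (1, 'main');
INSERT INTO vicma_tb VALUES (7, 'group-7');
INSERT INTO item_tb VALUES (1, 1, 5),      -- vicma 5 does not exist in vicma_tb
                           (2, 1, NULL);   -- no vicma at all
""")

inner_rows = conn.execute("""
SELECT item_tb.id, branch_tb.name, vicma_tb.name
FROM item_tb
JOIN branch_tb ON item_tb.branchID = branch_tb.id
JOIN vicma_tb ON item_tb.vicma = vicma_tb.vID
ORDER BY item_tb.id
""").fetchall()

left_rows = conn.execute("""
SELECT item_tb.id, branch_tb.name, vicma_tb.name
FROM item_tb
LEFT JOIN branch_tb ON item_tb.branchID = branch_tb.id
LEFT JOIN vicma_tb ON item_tb.vicma = vicma_tb.vID
ORDER BY item_tb.id
""").fetchall()

print(inner_rows)
print(left_rows)
```

The inner joins return nothing because no item satisfies both joins, while the left joins keep every item and fill the missing side with NULL.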

How would I execute this complex conditional multi-table MySQL join (queries provided)?

OK, I have a few tables. I am only showing the relevant fields:
items:
----------------------------------------------------------------
name | owner_id | location_id | cab_id | description |
----------------------------------------------------------------
itm_A | 11 | 23 | 100 | Blah |
----------------------------------------------------------------
.
.
.
users:
-------------------------
id | name |
-------------------------
11 | John |
-------------------------
.
.
.
locations
-------------------------
id | name |
-------------------------
23 | Seattle |
-------------------------
.
.
.
cabs
id | location_id | name
-----------------------------------
100 | 23 | Cool |
-----------------------------------
101 | 24 | Cool |
-----------------------------------
102 | 24 |thecab |
-----------------------------------
I am trying to SELECT all items (and their owner info) that are from Seattle OR Denver, but if they are in Seattle they can only be in the cab NAMED Cool and if they are in Denver they can only be in the cab named 'thecab' (not Denver AND cool).
This query doesn't work but I hope it explains what I am trying to accomplish:
SELECT DISTINCT
`item`.`name`,
`item`.`owner_id`,
`item`.`description`,
`user`.`name`,
IF(`loc`.`name` = 'Seattle' AND `cab`.`name` = 'Cool',1,0) AS `cab_test_1`,
IF(`loc`.`name` = 'Denver' AND `cab`.`name` = 'thecab',1,0) AS `cab_test_2`,
FROM `items` AS `item`
LEFT JOIN `users` AS `user` ON `item`.`owner_id` = `user`.`id`
LEFT JOIN `locations` AS `loc` ON `item`.`location_id` = `loc`.`location_id`
LEFT JOIN `cabs` AS `cab` ON `item`.`cab_id` = `cabs`.`id`
WHERE (`loc`.`name` IN ("Seattle","Denver")) AND `cab_test_1` = 1 AND `cab_test_2` = 1
I'd rather get rid of the IFs if possible. It seems inefficient, looks clunky, and is not scalable if I have a lot of location/name pairs.
Try this:
SELECT DISTINCT
item.name,
item.owner_id,
item.description,
user.name
FROM items AS item
LEFT JOIN users AS user ON item.owner_id = user.id
LEFT JOIN locations AS loc ON item.location_id = loc.id
LEFT JOIN cabs AS cab ON item.cab_id = cab.id
WHERE ((loc.name = 'Seattle' AND cab.name = 'Cool')
OR (loc.name = 'Denver' AND cab.name = 'thecab'))
My first thought is to store the pairs of locations and cab names in a separate table. Well not quite a table, but a derived table generated by a subquery.
You still have the problem of pivoting the test results into separate columns. The code can be simplified by making use of mysql boolean expressions, which get rid of the need for a case or if.
So, the approach is to use the same joins you have (although left join is not needed, because the comparison on cab.name turns them into inner joins). Then add a table of the pairs you are looking for, along with the "test name" for each pair. The final step is an explicit group by and a check of whether the conditions are met for each test:
SELECT i.`name`, i.`owner_id`, i.`description`, u.`name`,
max(pairs.test_name = 'test_1') as cab_test_1,
max(pairs.test_name = 'test_2') as cab_test_2
FROM `items` i LEFT JOIN
`users` u
ON i.`owner_id` = u.`id` LEFT JOIN
`locations` l
ON i.`location_id` = l.`id` LEFT JOIN
`cabs` c
ON i.`cab_id` = c.`id` JOIN
(select 'test_1' as test_name, 'Seattle' as loc, 'Cool' as cabname union all
select 'test_2', 'Denver', 'thecab'
) pairs
ON l.`name` = pairs.loc and
c.`name` = pairs.cabname
group by i.`name`, i.`owner_id`, i.`description`, u.`name`;
To add additional pairs, add them to the pairs table and add an appropriate line in the select for the test flag.
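The pairs-table approach can be tried on a small dataset. The sketch below runs it through Python's sqlite3 module rather than MySQL. The Denver location row and items itm_B and itm_C are assumptions added so that each branch of the condition is exercised, and the description column is omitted for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER, name TEXT);
CREATE TABLE locations (id INTEGER, name TEXT);
CREATE TABLE cabs (id INTEGER, location_id INTEGER, name TEXT);
CREATE TABLE items (name TEXT, owner_id INTEGER, location_id INTEGER, cab_id INTEGER);
INSERT INTO users VALUES (11, 'John');
INSERT INTO locations VALUES (23, 'Seattle'), (24, 'Denver');   -- Denver id is assumed
INSERT INTO cabs VALUES (100, 23, 'Cool'), (101, 24, 'Cool'), (102, 24, 'thecab');
INSERT INTO items VALUES ('itm_A', 11, 23, 100),   -- Seattle + Cool: wanted
                         ('itm_B', 11, 24, 101),   -- Denver + Cool: not wanted
                         ('itm_C', 11, 24, 102);   -- Denver + thecab: wanted
""")

rows = conn.execute("""
SELECT i.name, u.name,
       MAX(pairs.test_name = 'test_1') AS cab_test_1,
       MAX(pairs.test_name = 'test_2') AS cab_test_2
FROM items i
LEFT JOIN users u ON i.owner_id = u.id
JOIN locations l ON i.location_id = l.id
JOIN cabs c ON i.cab_id = c.id
JOIN (SELECT 'test_1' AS test_name, 'Seattle' AS loc, 'Cool' AS cabname
      UNION ALL
      SELECT 'test_2', 'Denver', 'thecab') pairs
  ON l.name = pairs.loc AND c.name = pairs.cabname
GROUP BY i.name, u.name
ORDER BY i.name
""").fetchall()

print(rows)
```

The item in Denver's 'Cool' cab is excluded, and each remaining item is flagged with the pair it matched.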