SSIS Lookup data update - ssis

I have created an SSIS package that reads data from a CSV file and loads it into Table 1. A second data flow task then does a lookup on Table 1. Table 1 has columns x, y, z, a, b; Table 2 has columns a, b, y, z. The lookup matches on columns y and z, picks up a and b from Table 1, and updates Table 2. The problem is that the data does get updated, but I end up with multiple rows: one without the update and one with it.
I can provide a clearer explanation if needed.

Fleshing out Nick's suggestion, I would get rid of your second data flow (the one from Table 2 to Table 2).
After the first data flow, which populates Table 1, add an Execute SQL Task that performs an UPDATE on Table 2 and joins to Table 1 to get the new data.
EDIT in response to comment:
You need to use a WHERE clause that will match rows uniquely. Apparently Model_Cd is not a UNIQUE column in JLRMODEL_DIMS. If you cannot make the WHERE clause unique because of the relationship between the two tables, then you need to select either an aggregate [Length (cm)] like MIN(), MAX() etc, or you need to use TOP 1, so that you only get one row from the subquery.
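A minimal sketch of that UPDATE, run through Python's sqlite3 module so it can be tried without a SQL Server instance. The table and column names follow the question; the sample rows are made up, and the T-SQL you would put in the Execute SQL Task will differ in syntax (for example UPDATE ... FROM ... JOIN). MIN() is used so that each subquery returns exactly one value even when (y, z) is not unique in Table 1. Note that taking MIN() of a and b separately can combine values from different Table 1 rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (x INTEGER, y INTEGER, z INTEGER, a TEXT, b TEXT);
    CREATE TABLE table2 (a TEXT, b TEXT, y INTEGER, z INTEGER);

    -- Made-up sample data: two table1 rows share (y, z) = (1, 1),
    -- so the match on y and z is not unique.
    INSERT INTO table1 VALUES (10, 1, 1, 'a-new',   'b-new'),
                              (11, 1, 1, 'a-other', 'b-other'),
                              (12, 2, 2, 'a2',      'b2');
    INSERT INTO table2 VALUES ('old',  'old',  1, 1),
                              ('old',  'old',  2, 2),
                              ('keep', 'keep', 9, 9);
""")

# Update in place: no rows are inserted, so table2 keeps its row count.
# MIN() guarantees one value per subquery; WHERE EXISTS leaves rows
# without any match in table1 untouched.
conn.execute("""
    UPDATE table2
    SET a = (SELECT MIN(t1.a) FROM table1 AS t1
             WHERE t1.y = table2.y AND t1.z = table2.z),
        b = (SELECT MIN(t1.b) FROM table1 AS t1
             WHERE t1.y = table2.y AND t1.z = table2.z)
    WHERE EXISTS (SELECT 1 FROM table1 AS t1
                  WHERE t1.y = table2.y AND t1.z = table2.z)
""")

rows = conn.execute("SELECT a, b, y, z FROM table2 ORDER BY y").fetchall()
print(rows)
```

Because the statement updates existing rows rather than writing lookup output to a destination, the "one row before, one row after" duplication from the second data flow cannot occur.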

Related

"Filtering" huge MariaDB/Mysql table based on different table

Struggling with a large dataset in my mariaDB database. I have two tables, where table A contains 57 million rows and table B contains around 500. Table B is a subset of ids related to a column in table A. I want to delete all rows from A which do not have a corresponding ID in table B.
Example table A:
classification_id | Name
20 | Mercedes
30 | Kawasaki
80 | Leitz
70 | HP
Example table B:
classification_id | Type
20 | car
30 | bike
40 | bus
50 | boat
So in this example the last two rows of table A would be deleted (or a mirror table would be made containing only the first two rows; that's also fine).
I tried to do the second one using an inner join but this query took a few minutes before giving an out of memory exception.
Any suggestions on how to tackle this?
Try this:
DELETE FROM A WHERE classification_id NOT IN (SELECT classification_id FROM B);
Since you say that the filter table contains a relatively small number of rows, your best bet would be creating a separate table that contains the same columns as the original table A and the rows that match your criteria, then replace the original table and drop it. Also, with this number of IDs you probably want to use WHERE IN () instead of joins - as long as the field you're using there is indexed, it will usually be way faster. Bringing it all together:
CREATE TABLE new_A AS
SELECT A.* FROM A
WHERE classification_id IN (SELECT classification_id FROM B);
RENAME TABLE A TO old_A, new_A to A;
DROP TABLE old_A;
Things to be aware of:
Backup your data! And test the queries thoroughly before running that DROP TABLE. You don't want to lose 57M rows of data because of a random answer at StackOverflow.
If A has any indexes or foreign keys, these won't be copied over, so you'll have to recreate them all manually. I'd recommend running SHOW CREATE TABLE A first and making a note of its structure. Alternatively, you may consider creating the table new_A explicitly, using the output of SHOW CREATE TABLE A as a template, and then running INSERT INTO new_A SELECT ... with the same query instead of CREATE TABLE new_A AS SELECT ....
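Here is a small sketch of the "create explicitly, then INSERT ... SELECT" variant on the example data from the question. It runs against an in-memory SQLite database from Python purely so it can be tried safely. The index name idx_a_class is made up for illustration, and SQLite uses ALTER TABLE ... RENAME TO where MariaDB has RENAME TABLE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Example data from the question; idx_a_class is a hypothetical index.
    CREATE TABLE A (classification_id INTEGER, Name TEXT);
    CREATE INDEX idx_a_class ON A (classification_id);
    CREATE TABLE B (classification_id INTEGER PRIMARY KEY, Type TEXT);
    INSERT INTO A VALUES (20, 'Mercedes'), (30, 'Kawasaki'),
                         (80, 'Leitz'),    (70, 'HP');
    INSERT INTO B VALUES (20, 'car'), (30, 'bike'), (40, 'bus'), (50, 'boat');

    -- Create the replacement table explicitly, then copy only matching rows.
    CREATE TABLE new_A (classification_id INTEGER, Name TEXT);
    INSERT INTO new_A
    SELECT A.* FROM A
    WHERE classification_id IN (SELECT classification_id FROM B);

    -- Swap the tables, drop the old one, then recreate the index by hand.
    ALTER TABLE A RENAME TO old_A;
    ALTER TABLE new_A RENAME TO A;
    DROP TABLE old_A;
    CREATE INDEX idx_a_class ON A (classification_id);
""")

kept = conn.execute(
    "SELECT classification_id, Name FROM A ORDER BY classification_id"
).fetchall()
print(kept)
```

On the real 57M-row table you would run the equivalent MariaDB statements directly; the point of the sketch is only the order of operations.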

SSIS Lookup Transformation No Match Output Only Populates Null

I am trying to use the Lookup transformation but cannot seem to get the functionality I need out of it. I have two tables with exactly the same structure:
Temp Table (input): Smaller table but may have entries that do not exist in other table
Reference Lookup Table: Larger table that may not have identical entries to Temp Table.
I am trying to compare the entries of the Temp Table to the entries of the Reference Lookup Table. Anything that exists in the Temp Table, but not the Lookup should be output to a separate table (No match output).
It is a very simple data flow, but it does not seem to perform the lookup properly. It finds the "No Match" rows, but the "no match" table is populated with NULL values for every column. I am trying to figure out why the data is losing its values.
How the Lookup is setup:
The data in temp table is what drives your data flow. 151 rows flowed out of it.
Your lookup is going to match based on whatever criteria you specify, and you've indicated that if there is no match, you want to push the no-match data into a table.
Since the lookup task cannot add columns to the no-match output path, this would imply your source (temp table) started NULL across the board.
Drop a data viewer/data tap onto the data flow between the lookup and the destination and then compare that data to your source. I suspect you're going to discover that the process that populated Temp table is at fault.
In the Lookup Transformation, on the Columns tab, you have specified that you want to use the value from the reference table to replace the value from the source.
That works great until you get a no-match. In that case, the component does the non-intuitive thing (even to me, with 15+ years of working with it) of updating that column whether it matches or not.
Source query
SELECT 21 AS tipID, NULL AS tipYear
UNION ALL SELECT 22, 2020
UNION ALL SELECT 64263810, 2020
This adds three rows to my data flow: the first with no tipYear and the next two with a year of 2020 (stamp 1 in the image below).
Lookup query
SELECT
    *
FROM
(
    VALUES (20, 1111), (21, 2021), (22, 2022)
) D(tipID, tipYear)
This reference data will supply a year for all the matches (21 and 22). In the matched path, we'll see 21 supplied with a value, and 22 will have its year updated (stamp 2 in the image).
For id 64263810, however, no match is found, and the initial value of 2020 is replaced with the value from the nonexistent matching row, i.e. NULL (stamp 3).
Lesson learned: if you need to use data from the reference table but also have a no-match output path, do not replace the column in the Lookup transformation (unless your intention is to wipe out data).
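To make the behaviour concrete, here is a toy model in plain Python. It is not the SSIS API, just a simulation of the semantics described above, using the same source and reference rows. The function names and the refTipYear column are made up for illustration.

```python
def lookup_replace(rows, reference):
    """Toy model: the lookup REPLACES tipYear with the reference value."""
    matched, no_match = [], []
    for row in rows:
        found = row["tipID"] in reference
        out = dict(row)
        # The column is overwritten whether or not a match exists;
        # with no match there is no reference value, so it becomes None (NULL).
        out["tipYear"] = reference.get(row["tipID"])
        (matched if found else no_match).append(out)
    return matched, no_match


def lookup_add_column(rows, reference):
    """Toy model: the lookup ADDS the reference value as a new column."""
    matched, no_match = [], []
    for row in rows:
        found = row["tipID"] in reference
        out = dict(row)
        # refTipYear is a hypothetical new column; tipYear is left untouched.
        out["refTipYear"] = reference.get(row["tipID"])
        (matched if found else no_match).append(out)
    return matched, no_match


source = [
    {"tipID": 21, "tipYear": None},
    {"tipID": 22, "tipYear": 2020},
    {"tipID": 64263810, "tipYear": 2020},
]
reference = {20: 1111, 21: 2021, 22: 2022}

replaced_matched, replaced_no_match = lookup_replace(source, reference)
added_matched, added_no_match = lookup_add_column(source, reference)

print(replaced_no_match)  # the source value 2020 has been wiped out
print(added_no_match)     # the source value 2020 survives
```

In the "replace" variant the no-match row loses its original year, which is the symptom from the question. In the "add as new column" variant the source column reaches the no-match output intact.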

Add columns to result

I use Pentaho Data Integration.
I have three columns, A, B and C, and I want to use C in an "Input Table" step to run a select against another database.
So I added a "Select Value" step before "Input Table"; my SQL works fine and returns only one column, D.
But now I have two streams: one with A, B, C and another with D.
I don't have a primary key in my second stream, so how can I merge all the columns?
My final result should be A, B, C, D.
I have tried "Merge join", but it does not work because I don't have a primary key.
PS: the two streams return the same number of rows.
Try the "Join Rows" step without selecting any fields; this will add column D to the main flow.

Replace one column of a table with another column of another table in SQL

I have a table with several columns: Table1 (Col A, Col B).
I also have a second table with one column: Table2 (Col C).
What I want to do is:
Replace Col B of Table1 with Col C of Table2.
Is this possible in SQL? I am using phpMyAdmin to execute queries.
Why do I need to do this?
- I was playing around with the database structure and changed the column type from text to integer, which messed up the entries in the column.
- Good thing: I have a backup Excel file, so I am now planning to replace the affected column with the original values from that backup.
No can do.
You seem to be making an incorrect assumption, namely that the order of rows in a table is significant. Else what's confusing some of the commenters would be clear to you: there's no information in table2 to relate it to table1.
Since you still have the data in Excel, drop table2 and re-create it with rows having the key to table1. Then write a view to join them. Easiest is probably to insert that join result into a third table, and then drop the first two and rename the third.
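A minimal sketch of that rebuild, again using an in-memory SQLite database from Python so it is easy to try. It assumes Table1 has a key column, here called id (hypothetical, as are the sample values), and that Table2 has been re-imported from Excel with that same key next to Col C. The sketch joins directly into a third table rather than going through a view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- table1 with the damaged col_b; id is an assumed key column.
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, col_a TEXT, col_b TEXT);
    INSERT INTO table1 VALUES (1, 'first', '0'), (2, 'second', '0'),
                              (3, 'third', '0');

    -- table2 re-created from the Excel backup WITH the key to table1.
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, col_c TEXT);
    INSERT INTO table2 VALUES (1, 'alpha'), (2, 'beta'), (3, 'gamma');

    -- Insert the join result into a third table ...
    CREATE TABLE table3 AS
    SELECT t1.id, t1.col_a, t2.col_c AS col_b
    FROM table1 AS t1
    JOIN table2 AS t2 ON t2.id = t1.id;

    -- ... then drop the first two and rename the third.
    DROP TABLE table1;
    DROP TABLE table2;
    ALTER TABLE table3 RENAME TO table1;
""")

restored = conn.execute(
    "SELECT id, col_a, col_b FROM table1 ORDER BY id"
).fetchall()
print(restored)
```

As with any CREATE TABLE ... AS SELECT, the new table does not inherit keys or indexes, so those would need to be recreated afterwards.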

Index counter shared by multiple tables in mysql

I have two tables, each one has a primary ID column as key. I want the two tables to share one increasing key counter.
For example, suppose the two tables are empty and counter = 1. When record A is about to be inserted into table 1, its ID will be 1 and the counter will be increased to 2. When record B is about to be inserted into table 2, its ID will be 2 and the counter will be increased to 3. When record C is about to be inserted into table 1 again, its ID will be 3, and so on.
I am using PHP as the outside language. Now I have two options:
1. Keep the counter in the database as a single-row, single-column table. But every time I add something to table 1 or table 2, I need to update this counter table.
2. Keep the counter as a global variable in PHP. But then I need to initialize the counter from the maximum key of the two tables when Apache starts, which I have no idea how to do.
Any suggestion for this?
The background is, I want to display a mix of records from the two tables in either ASC or DESC order of the creation time of the records. Furthermore, the records will be displayed in page-style, say, 50 records per page. Records are only added to the database rather than being removed. Following my above implementation, I can just perform a "select ... where key between 1 and 50" from two tables and merge the select datasets together, sort the 50 records according to IDs and display them.
Is there any other idea of implementing this requirement?
Thank you very much
Well, you will gain next to nothing with this setup; if you just keep the datetime of the insert you can easily do
SELECT * FROM
(
    SELECT columnA, columnB, inserttime
    FROM table1
    UNION ALL
    SELECT columnA, columnB, inserttime
    FROM table2
) AS combined          -- MySQL requires an alias on a derived table
ORDER BY inserttime
LIMIT 0, 50            -- offset 0: first page of 50 rows
And it will perform decently.
Alternatively (if you are chasing the last drop of performance), the fact that you are merging the results can be an indicator that you should merge the tables (why have two tables anyway if you are merging the results?).
Or model it as an SQL subclass: one table maintains the IDs and other common attributes, and the other two reference the common ID sequence as a foreign key.
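A rough sketch of the subclass layout, using SQLite from Python so it runs anywhere. The parent table name items and the helper insert_record are made up for illustration. In MySQL the parent would have an AUTO_INCREMENT key, and you would read the new ID with LAST_INSERT_ID() or the equivalent call in your PHP driver.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Parent table owns the shared ID sequence and common attributes.
    CREATE TABLE items (
        id         INTEGER PRIMARY KEY AUTOINCREMENT,
        inserttime TEXT DEFAULT CURRENT_TIMESTAMP
    );
    -- Child tables reuse the parent's ID as their own primary key.
    CREATE TABLE table1 (id INTEGER PRIMARY KEY REFERENCES items(id), columnA TEXT);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY REFERENCES items(id), columnA TEXT);
""")


def insert_record(conn, table, value):
    """Hypothetical helper: allocate a shared ID, then insert the child row."""
    if table not in ("table1", "table2"):
        raise ValueError("unknown table: " + table)
    with conn:  # both inserts commit together, or roll back together
        new_id = conn.execute("INSERT INTO items DEFAULT VALUES").lastrowid
        conn.execute(
            "INSERT INTO " + table + " (id, columnA) VALUES (?, ?)",
            (new_id, value),
        )
    return new_id


ids = [
    insert_record(conn, "table1", "A"),
    insert_record(conn, "table2", "B"),
    insert_record(conn, "table1", "C"),
]

# One page of mixed records, ordered by the shared ID.
page = conn.execute("""
    SELECT id, columnA FROM (
        SELECT id, columnA FROM table1
        UNION ALL
        SELECT id, columnA FROM table2
    ) AS combined
    ORDER BY id
    LIMIT 50 OFFSET 0
""").fetchall()
print(ids, page)
```

The database hands out the IDs, so there is no counter to initialize in PHP and no separate counter row to keep in sync by hand.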
If you need the creation time, won't it be easier to add a timestamp field to your tables and sort by that field?
I believe using IDs as a reference for creation order is bad practice.
If you really must do this, there is a way. Create a one-row, one-column table to hold the last-used row number, and set it to zero. On each of your two data tables, create a BEFORE INSERT trigger that reads that table, increments it, and sets the new row's number to that value (in MySQL, an AFTER INSERT trigger cannot change the row being inserted). I can't remember the exact syntax because I haven't created a trigger for years; see here http://dev.mysql.com/doc/refman/5.0/en/triggers.html