This is a classic "does the record exist? then update, otherwise insert" scenario. The dataflow process I am following is:
OLE DB Source: return a list of fields, one of which is a calculated hash value over a number of the fields in the dataset.
Lookup: see if the row has changed by comparing the hash value and another key field (which allows duplicates) against the latest record in the target table for that key field.
If it matches, update the record; if there is no match, insert it.
The problem I am having is that one of the records I would match against is in the dataset I am importing and not in the target table, so I always get a NO MATCH result. If I delete the row causing the error from the target table and then re-run the import, I do get a MATCH.
I have tried turning off the cache for the lookup with no success. Suffice to say I have searched for the answer with no luck. Help!
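For reference, the lookup described above roughly corresponds to a query like the sketch below, which returns the stored hash of the latest target row per key so it can be compared against the incoming row's calculated hash. The table and column names (dbo.TargetTable, KeyField, HashValue, LoadDate) are placeholders, not the actual schema.
-- Hedged sketch only; dbo.TargetTable, KeyField, HashValue and LoadDate are assumed names.
SELECT t.KeyField, t.HashValue
FROM dbo.TargetTable AS t
JOIN (
    SELECT KeyField, MAX(LoadDate) AS LatestLoadDate
    FROM dbo.TargetTable
    GROUP BY KeyField
) AS latest
    ON latest.KeyField = t.KeyField
   AND latest.LatestLoadDate = t.LoadDate;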
I want to load data into Snowflake with Talend. I used tSnowflakeOutput with the Upsert option because I want to insert rows if they do not exist in Snowflake, or update them if they do. I used the primary key to identify the rows that already exist.
When I run my job, I have the following error:
Duplicate row detected during DML action
I am aware that the problem is due to a row that already exists in Snowflake; I want to update that row, but all I get is this error.
Do you have an idea why?
Please help :)
The Talend connector might be internally using Snowflake's MERGE operation. As mentioned by #mike-walton, the error is reported because MERGE does not accept duplicates in the source data. Since it is an "insert, or update if exists" operation, if multiple source rows join to a single target record, the system cannot decide which source row to use for the operation.
From the docs
When a merge joins a row in the target table against multiple rows in the source, the following join conditions produce nondeterministic results (i.e. the system is unable to determine the source value to use to update or delete the target row)
A target row is selected to be updated with multiple values (e.g. WHEN MATCHED ... THEN UPDATE)
Solution 1
One option, as mentioned in the documentation, is to set the ERROR_ON_NONDETERMINISTIC_MERGE parameter to FALSE. This will just pick an arbitrary source row to update from.
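If you go that route, the parameter can be changed at the session level, for example:
-- Lets MERGE pick an arbitrary matching source row instead of raising the duplicate-row error.
ALTER SESSION SET ERROR_ON_NONDETERMINISTIC_MERGE = FALSE;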
Solution 2
Another option is to make the merge deterministic by using a MERGE query of the following form. This essentially de-duplicates the source table and lets you pick one of the duplicates as the preferred row for the update.
merge into target_table t
using (
    select *
    from source_table
    qualify
        row_number() over (
            partition by the_join_key
            order by some_ordering_column asc
        ) = 1
) s
on s.the_join_key = t.the_join_key
when matched then update set
    ...
when not matched then insert
    ...
;
Doing this same thing in Talend may just require one to do a dedup operation upstream in the ETL mapping.
I am using AWS data pipeline to copy data from RedShift to MySql in RDS. The data is copied to MySQL. In the pipeline the insert query is specified as below:
insert into test_Employee(firstname,lastname,email,salary) values(?,?,?,?);
Is there any way for me to specify the column name for the source table in place of ? in the above query? I tried adding the column names of the source table, but that does not seem to work. Currently the column names in both the source and destination tables are the same.
Thanks for your time. Let me know if any other information is required.
Specifying columns instead of ? wouldn't work, because the insert SQL query knows nothing about your source data source. The AWS Copy activity just passes the parameters to this query in the same order you selected them from the source dataset.
However, the column names of the destination table (test_Employee) in the insert query don't have to follow the order of columns in the table's DDL, so you can change this query to match the order of columns in the source table.
E.g. if your source dataset has the following columns:
email,first_name,last_name,salary
Insert query:
insert into test_Employee(email,firstname,lastname,salary) values(?,?,?,?);
Note: as you can see, the column names of the source and destination tables don't have to match; only the order matters.
I have a data flow from source tables to a destination table. To simplify the question, I'll say there are two merge-joined source tables and one destination table. There are also primary keys that help me identify each record.
The package runs every day, and if a record is deleted from a source table, how can I know which one was deleted so that I can delete it in the destination table?
(FYI: I already check whether a record exists in the destination table and update it if so, otherwise insert it, but I don't know how to find deleted data.)
Another possible approach:
Assuming you receive all records from the source, not just inserts and updates:
Amend the package to stamp records that have been inserted or updated with a unique run id or run datetime.
Following the package run, process the destination table rows that weren't inserted or updated in the last package run. By a process of elimination, any records that weren't provided in the source file should be deleted (see the sketch below).
Again, this assumes that all records are sent, not just inserts and updates. But then again, if you don't receive all records, it's physically impossible to detect that a record has been deleted.
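As a rough sketch of that post-load cleanup (dbo.DestinationTable, its LastRunDate column and the hard-coded run datetime are assumptions, not your actual schema):
-- Hedged sketch of the stamp-and-eliminate approach; all names are assumed.
-- The package stamps LastRunDate with this run's datetime on every row it inserts or updates.
DECLARE @CurrentRunDate datetime2 = '2024-01-01T00:00:00';  -- in practice, this run's datetime

-- Anything not stamped in this run was missing from the source, so remove (or soft-delete) it.
DELETE FROM dbo.DestinationTable
WHERE LastRunDate IS NULL
   OR LastRunDate < @CurrentRunDate;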
The problem with comparing source to destination is that you have to compare every source row to the destination in every load, and as the number of rows increases that takes up more and more time.
As a result, the best way to handle this is probably on the source side. Two common approaches are a 'soft delete' where you set a flag column to mark the row as deleted; or a trigger that records the PK of the deleted row in a log table (or moves the entire row to an archive log table). Your ETL process then looks at the flags or the log/archive table to determine which rows were deleted since the last load.
Another possibility is that the source platform offers some built-in feature you can use to track deleted rows, e.g. CDC in SQL Server. But if you have no control at all over the source database (if it even is a database) then there may be no alternative to comparing the full data set.
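To illustrate the trigger option (all names here, dbo.SourceTable, SourceID and the log table, are assumptions rather than a known schema), a delete trigger on the source could log the removed keys like this:
-- Hedged sketch of a delete-logging trigger on the source table.
CREATE TABLE dbo.SourceTable_DeleteLog (
    SourceID  int       NOT NULL,
    DeletedAt datetime2 NOT NULL DEFAULT SYSUTCDATETIME()
);
GO
CREATE TRIGGER dbo.trg_SourceTable_Delete
ON dbo.SourceTable
AFTER DELETE
AS
BEGIN
    -- "deleted" is the pseudo-table holding the rows removed by the triggering statement
    INSERT INTO dbo.SourceTable_DeleteLog (SourceID)
    SELECT SourceID FROM deleted;
END;
The ETL process would then delete (or soft-delete) any destination rows whose keys appear in the log since the previous load.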
One possible approach:
Prior to running the package, delete the destination table's records (e.g. using a stored procedure)
Then just import all records into the destination table
Pros:
Your destination table will always mirror the incoming data, no need to check for deletions
Cons:
You won't have any historical information (if that is required)
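A minimal sketch of the pre-load step, assuming a destination table named dbo.DestinationTable (the name, and the use of TRUNCATE rather than DELETE, are assumptions):
-- Hedged sketch: clear the destination so the package can reload everything.
CREATE PROCEDURE dbo.usp_ClearDestination
AS
BEGIN
    TRUNCATE TABLE dbo.DestinationTable;
END;
The package's data flow then simply inserts every source record into the empty table.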
I had the same problem, as in how to mark my old/archive records as being "deleted" because they no longer exist in the original data source.
Basically, I built two tables: a main table containing all the records that came in from the original data source, and a temporary table that I refill with the original data source every time I run my scripts.
MAIN TABLE
ID, NAME, SURNAME, DATE_MODIFIED, ORDERS_COUNT, etc
plus a STATUS column (1 for Active, 0 for Deleted)
TEMP TABLE: same structure, but without the STATUS column
ID, NAME, SURNAME, DATE_MODIFIED, ORDERS_COUNT, etc
The key was to update the MAIN TABLE with STATUS = 0 if the ID in the MAIN table was no longer in the TEMP table, i.e. the source record has been deleted.
I did it like this:
UPDATE m
SET m.Status = 0
FROM tblMAIN AS m
LEFT JOIN tblTEMP AS t
ON t.ID = m.ID
WHERE t.ID IS NULL
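For completeness, the temp table refresh that precedes this update might look something like the sketch below; the source database and query are assumptions, since they depend on where the original data actually comes from.
-- Hedged sketch: reload tblTEMP with the current source snapshot before running the update above.
TRUNCATE TABLE tblTEMP;

INSERT INTO tblTEMP (ID, NAME, SURNAME, DATE_MODIFIED, ORDERS_COUNT)
SELECT ID, NAME, SURNAME, DATE_MODIFIED, ORDERS_COUNT
FROM SourceDatabase.dbo.SourceTable;  -- assumed source; replace with the actual origin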
I have a FileMaker script that inserts a new entry into several imported MySQL tables. I need to get the unique id of these entries as they are created (before or after, either way) so I can refer to them later in the script. This is easy to do in SQL with LAST_INSERT_ID(), but I can't seem to find out how in FileMaker.
What I have tried that didn't work:
GetNextSerialValue ( Get ( FileName )&".fp7" ; "jos_users::id" ) #returns nothing, does not work properly on ODBC tables
Get(RecordID) #Local value does not correspond to the mySQL unique ID
Thanks in advance! Any suggestions are appreciated.
How are you creating the records in your MySQL table? Are you using FileMaker's ESS and native "New Record" command to insert the new record or are you using the Execute SQL[] script step? Or something different?
The next serial value and recordID values are FMP-specific and don't apply here (unless your MySQL primary key is a serial value managed by FMP).
With "New Record" FileMaker sets your record context to the new record. So if your "jos_users" table occurrence is an alias to the MySQL table, calling "New Record" will place you in the context of that record. You can then set a variable to the contents in the "id" column.
Go To Layout ["jos_users" (jos_users)]
New Record/Request
Commit Records/Requests[No dialog]
Set Variable [$lastInsertID; jos_users::id]
This presumes that your MySQL record is given a proper ID at record insertion time and that the record can be committed passing any validation checks you've applied.
If you're creating the records using some other process, let us know and we can advise you on how to get the value.
Is an empty lookup table the same as a non-matching lookup table in the Lookup transform?
What would be the result if no row redirection is configured?
an empty result set or
package failure at the lookup transform
You would get #2: package failure. The Lookup would not be able to find the row in the lookup table (since it's empty).
Edit: I should say that if you set the Error Configuration to Ignore Failure, you will get an empty rowset.