Pentaho Dimension lookup/update

Pentaho Dimension lookup/update - csv

I have read the Dimension Lookup/Update documentation here and a few other blogs, but I still cannot get a clear idea of how the step works.
I have a table with the following structure:
Key Name Code Status IN Out Active
The Key, Name, Code, Status and Active values come from a CSV file.
I need to use the Dimension lookup/update step for SCD Type 2 and populate IN and Out.
After setting up the connection details,
I set the Keys to KEY and the Fields to all the other fields, with the option Date of last insert (without stream field as source). I need to create a new row in the database if any of the other fields changes. That row would have the same key and the updated details, with Out set to infinity and IN set to the current system date.
Date range start field is set to IN and Table date range end is set to the OUT column of the database table.
I don't understand the concept of the technical key, as the key also comes from the CSV file.
When I click on preview there is an error:
DB2 SQL error: SQLCODE: -407, SQLSTATE: 23502, SQLERRMC:
Please let me know if you need more details, or which step or setting I have missed.

The key points of the Dimension Lookup / Update step (used in Update mode) when building an SCD II table are:
Keys - Key fields: Here you define an id column from your source data (I guess it's Key from your CSV file). It is used to look up previously stored rows with the same Key, so the step can compare the incoming row with the rows already stored in the SCD II table and evaluate whether the row has changed.
Technical key field: The technical key is an extra column you need to add to your table (e.g. technical_key). It is also new within your PDI record stream (name it the same as in your table: technical_key). Set it to auto increment. It then increments whenever a new row is inserted into the table, and its value is unique within the table (so it can be used as the table's primary key).
Stream Datefield: Usually you put here a last_updated_date column from your source data, whose value changes to the current date every time the record is updated in the source. Alternatively, you can use the time at which the transformation is executed (the system date obtained from a Get System Info step).
Date range start field, Table date range end: Every row in an SCD II table needs a validity period (the span within which the row's data is valid). This period is defined by two dates: start (Date range start field) and end (Table date range end). Set these two fields to IN and Out (the names of the table columns). The step determines their values automatically (set Min. year = 1900 and Max. year = 2199) from the range start and end and the Stream Datefield value:
When the row (identified by Key) is new:
Key = 1; technical_key = 123; In = 1900-01-01; Out = 2199-12-31; Name = X
E.g. the next day, the same row is updated (Stream Datefield's value = '2015-03-13'):
Key = 1; technical_key = 123; In = 1900-01-01; Out = 2015-03-13; Name = X
Key = 1; technical_key = 158; In = 2015-03-13; Out = 2199-12-31; Name = A
Fields - Update fields: Here you define all the data fields you want to store: Name, Code, Status, Active. As the Type of dimension update, set Insert (used for SCD II attributes).
The Date of last insert (without stream field as source) option cannot be used for this, because it simply writes the current date-time to the given dimension field, and you cannot define a stream field along with this option.
The Punch through option can be used when you are not interested in the attribute's history (SCD I). It overwrites the value in all rows with the same Key. E.g. Punch through on Name (new value 'A'):
Key = 1; technical_key = 123; In = 1900-01-01; Out = 2015-03-13; Name = A
Key = 1; technical_key = 158; In = 2015-03-13; Out = 2199-12-31; Name = A
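
To make the Insert behaviour described above concrete, here is a minimal in-memory sketch in Python. It is not how PDI implements the step; the function name scd2_upsert and the list-of-dicts table are assumptions made purely for illustration, and the technical keys it generates are simply 1, 2, ... rather than the 123 and 158 used in the example rows.

```python
from datetime import date

MIN_DATE = date(1900, 1, 1)    # corresponds to Min. year = 1900
MAX_DATE = date(2199, 12, 31)  # corresponds to Max. year = 2199


def scd2_upsert(dimension, incoming, stream_date, tracked=("Name",)):
    """Apply one incoming row to an in-memory SCD II table (a list of dicts).

    Hypothetical helper: it only mimics the Insert update type.
    """
    # Look up the version of this natural key that is valid at stream_date.
    current = next(
        (r for r in dimension
         if r["Key"] == incoming["Key"] and r["In"] <= stream_date < r["Out"]),
        None,
    )
    # Emulate an auto-increment technical key.
    next_tk = max((r["technical_key"] for r in dimension), default=0) + 1
    if current is None:
        # Unknown key: the first version covers the whole date range.
        dimension.append({**incoming, "technical_key": next_tk,
                          "In": MIN_DATE, "Out": MAX_DATE})
    elif any(current[f] != incoming[f] for f in tracked):
        # A tracked field changed: close the old version, open a new one.
        current["Out"] = stream_date
        dimension.append({**incoming, "technical_key": next_tk,
                          "In": stream_date, "Out": MAX_DATE})
    # Unchanged rows leave the table as it is.
    return dimension


dim = []
scd2_upsert(dim, {"Key": 1, "Name": "X"}, date(2015, 3, 12))
scd2_upsert(dim, {"Key": 1, "Name": "A"}, date(2015, 3, 13))
for row in dim:
    print(row)
```

Running it leaves two versions for Key = 1: the old one closed at 2015-03-13 and the new one open until 2199-12-31, matching the pair of rows shown for the Insert case.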

Related

MySQL 8.0.31: Foreign Key Value Overwritten with 'Null' when another Value is Assigned

I have two tables associated through a foreign key: 'character_id' in my 'grid_location' table references the primary key 'character_id' in my 'character' table. I'm trying to set the 'character_id' foreign key for the entry whose "grid_number" primary key is two in the 'grid_location' table to the "character.character_id" value of the character with first name "Andrew" and last name "Bernard". When I run the following query with all "grid_location.character_id" values initially "Null", the value of "grid_location.character_id" for grid location two is updated successfully, as shown in the first image below:
UPDATE grid_location
SET grid_location.character_id = (
SELECT `character`.character_id
FROM `character`
WHERE `character`.first_name = 'Andrew' AND `character`.last_name = 'Bernard' AND grid_location.grid_number = 2
);
However, when I run a similar query to set the 'character_id' foreign key for the entry whose "grid_number" is three to the "character.character_id" of the character with first name "Angela" and last name "Martin", the value is set for grid number three, but the value previously set for grid number two is overwritten with "Null", as if only one foreign key could hold a non-null value at a time. Must I give the "grid_location.character_id" entries a default other than NULL, or does the issue perhaps lie in the different number of entries in the "character" and "grid_location" tables, namely 42 and 9 respectively? The query, the "character" table, and the resulting "grid_location" table are shown below. As an aside, I cannot write the table contents to a file because of unresolved permission restrictions, so I had no choice but to insert screenshots.
UPDATE grid_location
SET grid_location.character_id = (
SELECT `character`.character_id
FROM `character`
WHERE `character`.first_name = 'Angela' AND `character`.last_name = 'Martin' AND grid_location.grid_number = 3
);

Since your query does not have a WHERE clause, it always updates all of the rows in the grid_location table.
For each row, it executes the subselect, and the subselect only returns a non-null value if the grid_number of the "current" row matches the number specified after grid_location.grid_number = . It returns null for all of the other rows.
I think you're looking for this instead:
UPDATE grid_location
SET grid_location.character_id = (
SELECT `character`.character_id
FROM `character`
WHERE `character`.first_name = 'Andrew' AND `character`.last_name = 'Bernard'
)
WHERE grid_location.grid_number = 2;
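
The difference between the two forms can be reproduced with a small script. This is a sketch using Python's built-in sqlite3 module rather than MySQL 8.0.31, with a made-up three-row grid and two characters; the helper names assign_unfiltered, assign_filtered and snapshot are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "character" (
        character_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
    CREATE TABLE grid_location (
        grid_number INTEGER PRIMARY KEY, character_id INTEGER);
    INSERT INTO "character" VALUES (1, 'Andrew', 'Bernard'), (2, 'Angela', 'Martin');
    INSERT INTO grid_location VALUES (1, NULL), (2, NULL), (3, NULL);
""")


def assign_unfiltered(first, last, grid):
    # No outer WHERE: every row is updated, and the subquery yields NULL
    # for each row whose grid_number is not the requested one.
    conn.execute(
        """UPDATE grid_location
           SET character_id = (
               SELECT c.character_id FROM "character" AS c
               WHERE c.first_name = ? AND c.last_name = ?
                 AND grid_location.grid_number = ?)""",
        (first, last, grid))


def assign_filtered(first, last, grid):
    # Outer WHERE: only the requested row is touched.
    conn.execute(
        """UPDATE grid_location
           SET character_id = (
               SELECT c.character_id FROM "character" AS c
               WHERE c.first_name = ? AND c.last_name = ?)
           WHERE grid_number = ?""",
        (first, last, grid))


def snapshot():
    return conn.execute(
        "SELECT grid_number, character_id FROM grid_location "
        "ORDER BY grid_number").fetchall()


assign_unfiltered("Andrew", "Bernard", 2)
assign_unfiltered("Angela", "Martin", 3)
broken = snapshot()   # grid 2 has been reset to NULL by the second update

conn.execute("UPDATE grid_location SET character_id = NULL")
assign_filtered("Andrew", "Bernard", 2)
assign_filtered("Angela", "Martin", 3)
fixed = snapshot()    # both assignments survive

print(broken)
print(fixed)
```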

Copying records from mysql to pgsql with serial id field

I have a web-based database application that currently uses a mysql database. We're moving from mysql to pgsql so we're moving data from the mysql db to the new pgsql db. In the data there are header records in one table, detail records in another table. The header records have a serial field for a record id and the detail records have a numeric field that holds the header record id to tie the detail record to the header record. Since the application that uses this data relies on the database to generate record ids when records are created, that same structure exists in the target pgsql database. How can data be copied from the mysql database to the pgsql database and maintain the header/detail id relationship? It would seem the header records will get whatever serialized value is next for their id, and the detail records will get added still holding the old mysql header record id.

Postgres uses sequences to track auto-increment primary keys. You can insert your primary and foreign keys as-is from MySQL. You will then need to set each sequence to the max(id) of the imported data.
see: https://www.postgresql.org/docs/current/functions-sequence.html
for example:
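SELECT setval('your_table_name_seq', (SELECT max(id) FROM your_table_name));
Note the "_seq" suffix in the setval argument. For a serial column the default sequence name is <table>_<column>_seq, so adjust the name to match your schema.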

from typing import NamedTuple

class Employee(NamedTuple):  # inherit from typing.NamedTuple
    name: str
    id: int = 3  # default value

employee = Employee('Guido')
assert employee.id == 3

Insert multiple records into a table based on AVG values from another table in mySQL

I have a mysql database with 2 tables. The first, 'spec', is a specification table; the second, 'log', is a table containing logged entries of previous measurements. Each part being logged is identified by a part number and a test measurement. There may be many log entries for any given part number, but only 1 entry per part number in the 'spec' table giving the actual specification. What I need to do is obtain an average of the test measurement for every different part in the 'log' table, and insert this into the 'spec' table as a new specification. The log table will already have been corrected to remove outliers.
I have been able to update existing records in the 'spec' table, but have been unable to insert records that do not already exist.
This works
update no_flow.spec s join
(select part, round(avg(cc),0) as avgcc
from no_flow.log l
group by part) l
on s.part = l.part and l.avgcc > 0
set s.cc = l.avgcc;
This does not work
INSERT INTO no_flow.spec set (part, cc) s join
SELECT part, avg(cc)
FROM no_flow.log l
WHERE id != 0
values (l.part, l.avgcc);
Suggestions?

If there is a unique index on part in spec you could use INSERT ... ON DUPLICATE KEY UPDATE Syntax
It would look something like this:
INSERT INTO no_flow.spec (part, cc)
SELECT
    part AS logPart,
    round(avg(cc), 0) AS avgcc
FROM no_flow.log
GROUP BY logPart
ON DUPLICATE KEY UPDATE cc = VALUES(cc);
This inserts all the records from the inner SELECT into the spec table. When a given inserted record encounters a duplicate key error (i.e. there is already a record for the current part number) the ON DUPLICATE KEY clause updates the existing record by setting its cc column equal to the cc column on the record it was trying to insert.
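
The same insert-or-update behaviour can be tried out without a MySQL server. The sketch below uses Python's built-in sqlite3 module, so it relies on SQLite's ON CONFLICT ... DO UPDATE upsert (SQLite 3.24 or later) as a stand-in for MySQL's ON DUPLICATE KEY UPDATE; the sample parts and values are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- part is the primary key, so a second insert for the same part conflicts.
    CREATE TABLE spec (part TEXT PRIMARY KEY, cc REAL);
    CREATE TABLE log (id INTEGER PRIMARY KEY, part TEXT, cc REAL);
    INSERT INTO spec VALUES ('A', 5);
    INSERT INTO log (part, cc) VALUES ('A', 10), ('A', 20), ('B', 30), ('B', 50);
""")

# Insert one averaged row per part; on a key conflict, overwrite cc instead.
# "excluded" refers to the row that failed to insert (MySQL: VALUES(cc)).
# SQLite's documentation recommends a WHERE clause in an upsert's SELECT
# (even "WHERE true") to avoid a parsing ambiguity.
conn.execute("""
    INSERT INTO spec (part, cc)
    SELECT part, round(avg(cc), 0)
    FROM log
    WHERE true
    GROUP BY part
    ON CONFLICT(part) DO UPDATE SET cc = excluded.cc
""")

rows = conn.execute("SELECT part, cc FROM spec ORDER BY part").fetchall()
print(rows)
```

Part 'A' already existed in spec and is updated in place, while part 'B' is inserted as a new row.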

Appending data to an indexed table

I'm attempting to append records to a table that has data indexed by an ID column. I'd like the appended data to continue that indexing and attach the next number to the first appended record and so on. However, that ID column is not an AutoNumber column: the first number needed to be 5001, so its current Data Type is "Number". The data already in that table is entered via a form with this VBA to format the ID column:
If Nz(Me.ID, "") = "" Then
NewID = Int(DMax("ID", "tComplianceAll") + 1)
Else
NewID = Me.ID
End If
I currently have an append query to try to append the new data to the table with this SQL for that ID column : Int(DMax("ID","tComplianceAll")+1) AS Expr1
That, however, only works for the first record. The rest do not get appended due to key violations since it's trying to assign the same ID number for all appended records. Is there a way to change that SQL so that it properly indexes the newly appended data?

You surely can use a standard Autonumber ID value here!
Edit table tComplianceAll and change the ID column type to AutoNumber
Then, set the next value using this SQL:
ALTER TABLE tComplianceAll ALTER COLUMN ID AUTOINCREMENT(5001,1)
In practice, replace 5001 with the value currently returned by DMax("ID", "tComplianceAll") + 1.
That should do it, provided that tComplianceAll is not in any relationships with other tables that use the ID field as a foreign key.

Refactor Foreign Keys with Update after Primary Key change

We ran into a problem with our primary key. It was set to a meaningful value for ease of data entry, since all data was originally added directly. However, the meaningful value is now not always present in all entries, so we are moving to an auto-generated, non-meaningful key. But I have to update the database to reflect this.
So my products table has the columns serial (the original key) and Id (the new PK). My parts table has the 2 columns FK_serial (the old FK) and FK_product (the new FK, currently set to 0 for all entries).
Is there an UPDATE statement that will walk through the parts table and set FK_product to the value of Id in the products table where serial = FK_serial?

UPDATE parts
JOIN products
ON parts.FK_serial = products.serial
SET parts.FK_product = products.Id;
