I'm using Pentaho Data Integration for my ETL process...
I have multiple Excel files that I need to merge and load into one database. However, I cannot distribute the fields into their corresponding tables in the database; I can only send them to one table at a time. Is there any other way to do this? How can I have multiple target tables?
P.S. I'm using MySQL Workbench for the database.
Thank you for your help!
You can connect multiple Table output steps to your last processing step and set it to copy all rows to every target step. Connect the Table output steps (or Insert/update, etc.) as in the image, then right-click the step where the stream splits and select Copy Data to Next Steps. In each Table output you then specify only the columns that apply to that table.
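PDI does all of this in the GUI, but if it helps to see the data flow spelled out, here is a minimal Python sketch of the same idea: every row from the merged Excel files is copied to each target, and each target only receives its own columns. The file paths, connection string, table names, and column names are made up for illustration, not taken from the question.

```python
# Illustrative sketch only: the equivalent of copying rows to several
# Table output steps, done with pandas + SQLAlchemy.
# File paths, connection string, table and column names are hypothetical.
import glob

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("mysql+pymysql://user:password@localhost/mydb")

# Merge all Excel files into one stream of rows (the "merge" part of the ETL).
frames = [pd.read_excel(path) for path in glob.glob("input/*.xlsx")]
merged = pd.concat(frames, ignore_index=True)

# Each "Table output" only gets the columns that belong to its table.
targets = {
    "customers": ["customer_id", "customer_name", "email"],
    "orders":    ["order_id", "customer_id", "order_date", "amount"],
}

for table, columns in targets.items():
    # The same rows go to every target, but only the relevant fields are written.
    merged[columns].to_sql(table, engine, if_exists="append", index=False)
```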
I would like to do the below operations using Pentaho Spoon/PDI:
1) Copy data from the MySQL source database to the MySQL target database.
2) After that, truncate/delete the data from the source database.
Can someone help with this?
Just run two transformations and one job:
First transformation: a Table input step to read the table data from the source database and a Table output step to write that data to the corresponding table in the target database.
Second transformation: truncate the table in the source database. I don't remember if there's a dedicated step for this, but an Execute SQL script step (or a SQL job entry running a TRUNCATE TABLE statement instead of a second transformation) should cover it; check what you have available in Pentaho, as I don't have it open at the moment to verify.
Job: after the first transformation finishes successfully, run the second transformation (or the job entry). A rough sketch of the same flow is shown below.
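Outside PDI, that two-step flow (copy first, truncate only after the copy has been committed) looks roughly like this in Python. The hosts, credentials, and table name are placeholders, not anything from the question.

```python
# Rough sketch of the same job: copy a table from the source database to the
# target database, then truncate the source only if the copy succeeded.
# Connection settings and the table name are placeholders.
import pymysql

TABLE = "my_table"

source = pymysql.connect(host="source-host", user="user", password="pw", database="src_db")
target = pymysql.connect(host="target-host", user="user", password="pw", database="tgt_db")

try:
    with source.cursor() as src_cur, target.cursor() as tgt_cur:
        # "First transformation": Table input -> Table output.
        src_cur.execute(f"SELECT * FROM {TABLE}")
        rows = src_cur.fetchall()
        if rows:
            placeholders = ", ".join(["%s"] * len(rows[0]))
            tgt_cur.executemany(f"INSERT INTO {TABLE} VALUES ({placeholders})", rows)
        target.commit()

        # "Second transformation": truncate the source, but only after the
        # insert into the target has been committed.
        src_cur.execute(f"TRUNCATE TABLE {TABLE}")
        source.commit()
finally:
    source.close()
    target.close()
```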
Now, things can get more complicated depending on your situation:
Do you need to replicate a whole database instead of just one table? You'll need to add steps that work with Metadata Injection, so you can run the first and second transformations for every table in the source database. You can read the dictionary tables in the source database and inject the table names and columns into the transformations, so you don't have to create one pair of transformations per table (see the sketch after this list).
Do you need to create the tables in the target database before populating them? You'll need additional transformations that read the metadata of each table and build the script that creates that table in the target database before populating it.
And so on.
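For the whole-database case, the dictionary tables mentioned above are MySQL's information_schema; a hedged Python sketch of reading it to drive a per-table copy could look like the following, where copy_table() is just a stand-in for the injected transformation and the connection details are made up.

```python
# Sketch of using the data dictionary (information_schema) to enumerate every
# table, the way Metadata Injection would feed table names into the
# transformations. copy_table() is a stand-in for the injected transformation.
import pymysql

def copy_table(table_name):
    # Placeholder for the "copy one table, then truncate it" logic above.
    print(f"copying {table_name}")

conn = pymysql.connect(host="source-host", user="user", password="pw",
                       database="information_schema")
with conn.cursor() as cur:
    cur.execute(
        "SELECT table_name FROM tables "
        "WHERE table_schema = %s AND table_type = 'BASE TABLE'",
        ("src_db",),
    )
    for (table_name,) in cur.fetchall():
        copy_table(table_name)
conn.close()
```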
I want to update three tables from CSV File in a single SSIS package.
I am done with updating a single table by comparing the CSV file and the table; I have attached a screenshot and it is working fine. But when I try to update more than three tables in a single package, I run into problems updating the records.
So please share the detailed steps to update multiple (more than two) tables.
Please be clearer about how you want to update more than three tables.
Or do you want to insert and update data based on conditions?
I have a list of tables that need to be copied from one database to another. The list of tables resides in a table in a DB. I am trying to write a Standard Macro to perform this. This is what I have so far:
The macro reads in the list of tables and then reads each one using the Dynamic Input tool. But I am unsure how to write the tables to the second DB. Please advise.
You can do this with a batch macro where each batch will bring in a different table. The Dynamic Input tool will not work because the tables are of different schemas.
The workflow will have a Control Parameter (set to the table name) updating an Input Data tool. This then goes to an Output Data tool that has the "Take File/Table Name from Field" option selected.
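If it helps to see the control flow the batch macro implements, here is a plain Python sketch: the list of table names is read from the control table, and each "batch" copies one table to the second database, with the output table name taken from that same field. The connection URLs, control table, and column names are assumptions, not anything from the question.

```python
# Plain-Python sketch of what the batch macro does: read the list of table
# names from a control table, then for each name (one "batch") read that table
# from the first database and write it, under the same name, to the second.
# Connection URLs and the control table/column names are made up.
import pandas as pd
from sqlalchemy import create_engine

source = create_engine("mssql+pyodbc://user:pw@source_dsn")  # first DB (placeholder)
target = create_engine("mssql+pyodbc://user:pw@target_dsn")  # second DB (placeholder)

# The "Control Parameter" values: one table name per row.
table_list = pd.read_sql("SELECT table_name FROM tables_to_copy", source)

for table_name in table_list["table_name"]:
    # One batch: Input Data tool pointed at this table...
    df = pd.read_sql_table(table_name, source)
    # ...Output Data tool with the "Take File/Table Name from Field" behaviour.
    df.to_sql(table_name, target, if_exists="replace", index=False)
```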
I'm new to PDI and I'm working with PDI Kettle. I have 40 .csv files with different numbers of columns, and I want to create tables out of those files in a single transformation. I have used a CSV File Input step to select a file and a Table Output step to create the table, but to create 40 tables out of those 40 files I would need to add these two steps again and again. Is there any way to create all 40 tables in one go, in a single transformation? Please help me with this.
Thanks in advance.
Doing this in Pentaho with the standard steps is a bit involved. To read the CSV and get the headings, and then read the data, you need to use ETL Metadata Injection.
First read the header row to get the column names, then use ETL Metadata Injection to read the data in another transformation.
Auto-creating the tables is not straightforward, as this is something the main developer of Pentaho discourages.
Here is the answer and an example of how to auto create a table: Perform an auto CREATE TABLE to store the output of a transformation.
So you would run a job that passes the filename and table name to a transformation. The transformation uses ETL Metadata Injection to read the CSV with the correct fields and metadata, and meta.getSQLStatementsString(); to get the DDL that creates the table in the database to store the data.
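Outside of Spoon, the same loop (read each CSV's header, derive the DDL, create the table, load the rows) can be sketched in Python like this. Here pandas infers the column types and issues the CREATE TABLE, standing in for ETL Metadata Injection plus the generated DDL; the directory and connection string are assumptions.

```python
# Sketch of the "40 CSV files -> 40 tables" loop outside PDI: for each file,
# read the header and data, then create and populate a table named after the
# file. pandas' to_sql() stands in for Metadata Injection plus the generated
# CREATE TABLE DDL. The directory and connection string are placeholders.
import glob
import os

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("mysql+pymysql://user:password@localhost/mydb")

for path in glob.glob("csv_files/*.csv"):
    table_name = os.path.splitext(os.path.basename(path))[0]
    df = pd.read_csv(path)            # the header row gives the column names
    # Creates the table (DDL inferred from the DataFrame) and loads the rows.
    df.to_sql(table_name, engine, if_exists="replace", index=False)
```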
I have 2 separate SQL databases; they both have the same fields but are not attached and are completely separate files. One of the database files has a few hundred rows of data, and I want to copy a few of those rows into the other database file. Some people have said to use SQL statements to copy the data, but the databases are not linked in any way, so I am not sure how these statements would work. Is there no software where I can just select the correct rows and copy them over, or create a new database with the selected ones?
I hope this makes sense, thanks.
Regardless of the database platform you are using, there should be commands/tools that will allow you to perform bulk data imports and exports from/to a file (e.g. a CSV file). Try exporting the rows you wish to copy from the database on the first server into an intermediate file, copying that file to the second server, and then importing it into that database.
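As an illustration of that export/import route, here is a small Python sketch. SQLite is assumed purely because the question describes the databases as separate files, and the table name, columns, and selection criteria are made up.

```python
# Sketch of the export/import approach: dump the selected rows from the first
# database file to CSV, then load that CSV into the second database file.
# SQLite is assumed only because the databases are described as separate files;
# the table name, columns, and WHERE clause are made up.
import csv
import sqlite3

# Export the rows you want from the first database to an intermediate CSV.
with sqlite3.connect("first.db") as src, open("rows.csv", "w", newline="") as f:
    writer = csv.writer(f)
    for row in src.execute("SELECT id, name, value FROM my_table WHERE id IN (1, 2, 3)"):
        writer.writerow(row)

# Import that CSV into the same table in the second database.
with sqlite3.connect("second.db") as dst, open("rows.csv", newline="") as f:
    dst.executemany(
        "INSERT INTO my_table (id, name, value) VALUES (?, ?, ?)",
        csv.reader(f),
    )
    dst.commit()
```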