Creating a table from a .csv file using PDI Kettle - MySQL

I'm new to PDI and I'm working with PDI Kettle. I have 40 .csv files with different numbers of columns, and I want to create tables out of those files in a single transformation. I have used a "CSV File Input" step to select a file and a "Table Output" step to create the table, but to create 40 tables out of those 40 files I would need to add these two steps again for each file. So is there any way to create all 40 tables in one go, in a single transformation? Please help me with the same.
Thanks in advance

Doing this in Pentaho with the standard steps is a bit involved. To read the CSV, get the headings, and then read the data, you need to use ETL Metadata Injection.
First read the header row to get the column names, then use ETL Metadata Injection to read the data in another transformation.
Auto-creating the tables is not straightforward, as this is something the main developer of Pentaho discourages.
Here is an answer with an example of how to auto-create a table: Perform an auto CREATE TABLE to store the output of a transformation.
So you would run a job that passes the filename and table name to a transformation. The transformation would use ETL Metadata Injection to read the CSV into the correct fields and metadata, and meta.getSQLStatementsString(); to get the DDL of the table that will store the data.
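Outside of PDI, the core idea of deriving a CREATE TABLE statement from a CSV header can be sketched in a few lines of Python. This is only an illustrative sketch, not what PDI generates internally; the file name and table name are hypothetical, and typing every column as TEXT is a simplifying assumption:

import csv
import re

def create_table_ddl(csv_path, table_name):
    """Build a CREATE TABLE statement from a CSV header row."""
    with open(csv_path, newline="") as f:
        header = next(csv.reader(f))
    # Sanitize header names into safe column identifiers
    cols = [re.sub(r"\W+", "_", h).strip("_") for h in header]
    # Every column is typed TEXT for simplicity; in practice you would
    # sample the data (as Metadata Injection does) to pick better types.
    col_defs = ",\n  ".join(f"`{c}` TEXT" for c in cols)
    return f"CREATE TABLE `{table_name}` (\n  {col_defs}\n);"

# Hypothetical file and table names
print(create_table_ddl("customers.csv", "customers"))

A job could then loop over all 40 files, running a script like this (or the meta.getSQLStatementsString() approach above) once per file before loading the rows.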

Related

Delete records from a MySQL source database table, after copying to a MySQL target database using Pentaho Spoon

I would like to do the below operations using Pentaho Spoon/PDI:
1) Copy data from the MySQL source database to the MySQL target database.
2) After that, I would like to truncate/delete the data from the source database.
Can someone help with this?
Just run two transformations and one job:
First transformation: a Table input step to read the table data from the source database and a Table output step to write said data to the table in the target database.
Second transformation: truncate the table in the source database. I don't remember if there's a specific step for this, or if you could use a job entry running a SQL script to truncate the table instead of a transformation. Just check what you have available in Pentaho; I don't have it open at the moment to verify.
Job: after the first transformation runs successfully, run the second transformation (or the SQL job entry).
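The transformations themselves are configured in Spoon rather than written as code, but the overall logic of the job, copy first and truncate only on success, can be sketched like this in Python. The connection settings, table, and columns are placeholders, and mysql-connector-python is assumed to be installed:

import mysql.connector

# Placeholder connection settings
source = mysql.connector.connect(host="source-host", user="user", password="pass", database="src_db")
target = mysql.connector.connect(host="target-host", user="user", password="pass", database="tgt_db")

try:
    # "First transformation": read from source, write to target
    src_cur = source.cursor()
    src_cur.execute("SELECT id, name FROM customers")  # hypothetical table/columns
    rows = src_cur.fetchall()

    tgt_cur = target.cursor()
    tgt_cur.executemany("INSERT INTO customers (id, name) VALUES (%s, %s)", rows)
    target.commit()

    # "Second transformation": truncate the source only after the copy succeeded
    src_cur.execute("TRUNCATE TABLE customers")
finally:
    source.close()
    target.close()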
Now, you can complicate matters from this depending on your situation:
Do you need to replicate a whole database instead of just one table? You'll need to add steps to work with Metadata Injection, so that you run the first and second transformations for all the tables in the source database. You can read the dictionary tables in the source database to inject the table names and columns into the transformations, so you don't have to create one pair of transformations for each table.
Do you need to create the tables in the target database before populating them? You'll need additional transformations that read the metadata of each table to build the script that creates it in the target database before populating it.
And so on.

Excel to Multiple Tables in One Database Output - PDI

I'm using Pentaho Data Integration for my ETL process...
I have multiple Excel files that I need to merge and upload into one database. However, I cannot distribute the fields into their corresponding tables in the database; I can only send them to one table at a time. Is there any other way to do this? How can I have multiple target tables?
P.S. I'm using MySQL Workbench for the database.
Thank you for your help!
You can connect multiple Table output steps to your last processing step and set it to copy all rows to all target steps. Connect the Table outputs (or Insert/update, etc.) as in the image, then right-click the step where the stream splits and select "Copy Data to Next Steps". In each Table output you obviously only specify the columns that apply to that table.
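The same split-and-copy idea, expressed outside PDI, looks roughly like this in Python; the tables and columns are made up purely for illustration:

# Hypothetical merged rows carrying fields destined for two different tables
rows = [
    {"customer_id": 1, "customer_name": "Ann", "order_id": 10, "amount": 99.5},
    {"customer_id": 2, "customer_name": "Bob", "order_id": 11, "amount": 12.0},
]

# "Copy Data to Next Steps": every row reaches every output,
# but each output keeps only the columns that apply to its table.
for row in rows:
    print(f"INSERT INTO customers (id, name) VALUES ({row['customer_id']}, '{row['customer_name']}');")
    print(f"INSERT INTO orders (id, customer_id, amount) VALUES ({row['order_id']}, {row['customer_id']}, {row['amount']});")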

Reading excel from DB (varbinary) in SSIS

First I'd like to say that I'm brand new to SSIS so bear with me if this is a very basic question. I've searched and cannot find an answer.
I need to read data from SQL Server that is stored in a varbinary column containing an Excel document. I then need to store this data in another table with the appropriate columns (a pre-defined format).
My question is essentially... How do I read this varbinary data into something I can work with and then insert into another table?
You could use the Export Column transformation, available within the Data Flow Task, to read the varbinary data and save it as a file on the local disk where the SSIS package is running.
MSDN documentation about Export Column transformation.
Sample: The Export Column Transformation on BI Monkey
Using another data flow task, you can read the saved file and import the data into the table of your choice.
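If you need the same logic outside SSIS, it can be sketched in Python. The connection string, table, and column names below are assumptions, and pyodbc plus openpyxl are assumed to be installed:

import pyodbc
from openpyxl import load_workbook

# Placeholder connection string and query
conn = pyodbc.connect("DRIVER={ODBC Driver 17 for SQL Server};SERVER=myserver;DATABASE=mydb;Trusted_Connection=yes")
cur = conn.cursor()
cur.execute("SELECT doc_blob FROM documents WHERE doc_id = ?", 1)  # hypothetical table
blob = cur.fetchone()[0]

# Step 1: save the varbinary column to a local .xlsx file
with open("extracted.xlsx", "wb") as f:
    f.write(blob)

# Step 2: read the saved workbook and load its rows into another table
ws = load_workbook("extracted.xlsx").active
for row in ws.iter_rows(min_row=2, values_only=True):  # skip the header row
    cur.execute("INSERT INTO target_table (col1, col2) VALUES (?, ?)", row[0], row[1])
conn.commit()
conn.close()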

Updating a SQL table with CSV data?

I am trying to update one of my SQL tables with new columns in my source CSV file. The CSV records in this file are already in this SQL table, but this SQL table is lacking some of the new columns from this CSV file.
I already added the new columns to my SQL table structure via ALTER TABLE. But now I just need to import the data from this CSV file into the new columns. How can I do this? I am trying to use SSIS and SQL Server to accomplish this, but am pretty new to Excel.
This is probably too late to solve salvationishere's problem, though I'm posting it for future readers!
You could just generate the SQL INSERT/UPDATE/etc. commands by parsing the CSV file (a simple Python script will do; see the sketch below).
You could alternatively use this online parser:
http://www.convertcsv.com/csv-to-sql.htm
(Hoping that it'd still be available when you click!)
to generate your SQL commands. The interface is extremely straightforward and it does the entire job in an awesome way.
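As a sketch of the "simple Python script" route, assuming the CSV's first row is a header, its key column already exists in the table, and every other column is one of the newly added columns (all names here are hypothetical):

import csv

def csv_to_updates(csv_path, table, key_col):
    """Generate one UPDATE statement per CSV row."""
    statements = []
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            sets = ", ".join(
                f"{col} = '{val}'" for col, val in row.items() if col != key_col
            )
            statements.append(
                f"UPDATE {table} SET {sets} WHERE {key_col} = '{row[key_col]}';"
            )
    return statements

# Hypothetical file, table, and key column. Note the naive quoting:
# fine for a one-off script, but use parameterized queries for real data.
for stmt in csv_to_updates("new_columns.csv", "my_table", "id"):
    print(stmt)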
You have several options:
If you are loading the data into a non-production system where you can edit the target tables, you could load the data into a new table, rename the old table to an obsolete name, and rename the new table to the old table name.
You can load the data into a staging table and then write a SQL statement to update the target table from the staging table (see the sketch after this list).
You can open the CSV file in Excel and write a formula to generate an update script, drag the formula down across all rows so that you get a separate update statement for each row, and then run those update statements in Management Studio.
You can truncate the target table and update your existing SSIS package that imports the file to use the new columns, if you have the full history in your CSV file.
There are more options, but any of the above would probably be more than adequate solutions.
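For the staging-table option, the statement is typically an UPDATE with a JOIN. A minimal sketch, assuming hypothetical table and column names and a SQL Server connection through pyodbc:

import pyodbc

# Placeholder connection string
conn = pyodbc.connect("DRIVER={ODBC Driver 17 for SQL Server};SERVER=myserver;DATABASE=mydb;Trusted_Connection=yes")
cur = conn.cursor()

# Copy the new columns from the staging table into the target table,
# matching rows on a shared key (all names hypothetical).
cur.execute("""
    UPDATE t
    SET    t.new_col_1 = s.new_col_1,
           t.new_col_2 = s.new_col_2
    FROM   target_table AS t
    JOIN   staging_table AS s
      ON   s.id = t.id
""")
conn.commit()
conn.close()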

Can I import tab-separated files into MySQL without creating database tables first?

As the title says: I've got a bunch of tab-separated text files containing data.
I know that if I use 'CREATE TABLE' statements to set up all the tables manually, I can then import them into the waiting tables, using 'load data' or 'mysqlimport'.
But is there any way in MySQL to create tables automatically based on the tab files? Seems like there ought to be. (I know that MySQL might have to guess the data type of each column, but you could specify that in the first row of the tab files.)
No, there isn't. You need to CREATE a TABLE first in any case.
Automatically creating tables and guessing field types is not part of the DBMS's job. That is a task best left to an external tool or application, which then creates the necessary CREATE statements.
If you're willing to type the data types in the first row, why not type a proper CREATE TABLE statement?
Then you can export the Excel data as a .txt file and use
LOAD DATA INFILE 'path/file.txt' INTO TABLE your_table;
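If you did want to automate it along the lines the question suggests, an external script can turn a header row plus a type row into the CREATE TABLE statement. A minimal sketch, assuming the first line of the tab file holds column names and the second line holds MySQL types:

def ddl_from_tab_file(path, table_name):
    """Build a CREATE TABLE statement from a tab-separated file whose
    first line is column names and second line is MySQL column types."""
    with open(path) as f:
        names = f.readline().rstrip("\n").split("\t")
        types = f.readline().rstrip("\n").split("\t")
    col_defs = ",\n  ".join(f"`{n}` {t}" for n, t in zip(names, types))
    return f"CREATE TABLE `{table_name}` (\n  {col_defs}\n);"

# Hypothetical file and table name; afterwards, load the remaining rows with
# LOAD DATA INFILE ... IGNORE 2 LINES to skip the name and type rows.
print(ddl_from_tab_file("data.tsv", "my_table"))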