I have two Excel sources: the first gives me a date value and the second gives me a price value from an Excel sheet.
Now I need to insert these two values into one table. Please tell me how I can do this?
I have used a Merge Join, but it gives me the error that the input must be sorted, which I can't do as it is an Excel file.
Well personally, I would put each Excel file into its own staging table. Then I would use a SQL query that joins the two tables as the source for my insert into the production table.
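A minimal sketch of that approach, assuming both staging tables share a key column to join on (here RowID); all table and column names are hypothetical:

-- Land each Excel file in its own staging table first, then join them
-- for the insert. RowID is an assumed shared key; adjust to your data.
INSERT INTO dbo.Production (SaleDate, Price)
SELECT d.SaleDate, p.Price
FROM dbo.StagingDates AS d
JOIN dbo.StagingPrices AS p
    ON p.RowID = d.RowID;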
After you get the input from each source, you have to sort it prior to merging it.
You can sort the input from an Excel source, or from any source, because the sort is performed with the data in memory. It's an element in the Toolbox.
Check this:
http://msdn.microsoft.com/en-us/library/ms137653.aspx
I'm pretty sure you can define a sort on an Excel source.
I am quite new to Pentaho Spoon and I would like to import records from a CSV file into a database table. However, only unique records should be imported. That is why I need to compare EACH record with all records of the database table in order to determine whether the record should be imported or not.
So far, I tried out the suggested CRUD-pattern which looks like this:
As you can see in the picture, I merge the Excel input and the table input (ignore the cast steps; I needed to cast a value because the float formats differed: the database format was #.000000 and the CSV float format was #.0).
After the merge join, I compare the flag (which is given by the Merge rows (diff) step): if the compared records are new, I import them into the database table; if they are changed, I update the record; and if they are deleted or identical, I simply do nothing. So far, so good.
But here is the problem: if I shuffle the records of the CSV input file and run the transformation anew, all the records are imported again and consequently there are duplicates in my database table (which I wanted to avoid). To emphasize again: the right way to solve this is that each row of the CSV input file is compared with ALL entries in the database table.
How can I realize this? Any suggestions? Thank you so much in advance!!
The Merge Rows (diff) step expects the input to be sorted. Normally, you have been warned of this by a pop-up.
Put a Sort rows step on the output flow of the Excel Input, before it reaches the Merge Rows (diff).
You should do the same between the Table Input and the Merge Rows (diff). Of course, you may think you could do it in the SQL statement of the Table Input.
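For reference, that would look something like this (a sketch only; the table and key columns are hypothetical, and the ORDER BY must list the keys in exactly the order the Merge Rows (diff) step expects):

-- Table Input query with the sort pushed into SQL.
SELECT id, name, price
FROM target_table
ORDER BY id, name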
However, there is a beginner trap here. You have three other steps, Output Rows, Update and Delete, which operate on the same table, and these steps may lock it. As in Kettle all the steps run concurrently, you do not know which step will fire first, and the table may be locked and never able to return even the first record. This is known in jargon as an auto-lock, and the way to solve it is to put a Sort Rows step in as a buffer.
You can use the 'Dimension lookup/update' step, which provides the functionality you are trying to achieve.
Thanks,
Nilesh
I want to know if it's possible to load an Excel validation list directly from an external query. I'm expecting to do this in VBA somehow. I have a query which works fine, and I can load the resulting list into a table in a worksheet via VBA without a problem, but I want to know whether I can use this result set directly as a validation list without having to first load the query results into a workbook and then refer to that table as the source for the validation list.
I'm working with MySQL and Excel 2016 and the combination works well for everything so far, but I'm stuck on this. Any ideas please?
Yes, you can, and it is easy if you use Excel tables.
You always need to save the records returned from your SQL query into an Excel table. You can create it each time, or you can just create one, rename it, and always keep it there in a visible or hidden worksheet; that is up to you.
Let's say that you have created an Excel table that holds the options for your drop-down list, and it is called Table1. The trick is that when you are defining the validation in the cell (or cells), in the Source section you write this:
=INDIRECT("Table1")
(Data validation will not accept a structured table reference directly, which is why the INDIRECT wrapper is needed.)
I have multiple archive tables storing similar kinds of data, archived in a month-wise format. Now the requirement is to get all the archived data into one table instead of multiple tables.
I am doing this with the help of Union All in SSIS; however, it seems that the rows are inserted into the destination table in a random order.
Attached is the route taken for the transformation.
I want to prioritize the inserts, please suggest!
You can add an extra column "Priority" to each of the OLE DB sources with the corresponding priority for each source, and then after the union you can add a Sort component that sorts the data by Priority. But if you have a lot of data, that would be really inefficient, because the Sort component will wait until all the source data is read.
I would suggest writing a proper source SQL statement that does the union/prioritization/sort for you, and then inserting into the target.
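A minimal sketch of such a source statement, with hypothetical table and column names (one SELECT per archive table):

-- Used as the query of a single OLE DB Source; the rows arrive already
-- ordered by priority, so no Sort component is needed downstream.
SELECT ArchiveMonth, Amount, 1 AS Priority FROM dbo.Archive_2019_01
UNION ALL
SELECT ArchiveMonth, Amount, 2 AS Priority FROM dbo.Archive_2019_02
ORDER BY Priority;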
Also, if the sources are on different servers, you can create a Foreach Loop container that iterates through the source tables and inserts all of them into the target table. You can use this article for reference.
Is there a way to write data to an Excel spreadsheet after skipping x number of rows? Excel is my destination and a SQL query would be my source.
My scenario is one where I have a lot of header rows that I need to skip before data insertion. I would like to do this in an SSIS package. I am using SQL 2008 and Excel 2010.
Thanks
If you right-click on the Excel connection manager at the bottom of the page and then click Options, there is a setting called FirstRowHasColumnName; set it to FALSE. Let me know if it helps. I didn't really understand whether you just want to skip the first row that holds the column names from the SQL query, or more; there are other ways.
The easiest way would be to modify your SQL query to exclude the header rows (see the sketch after the steps below). If you can't do that, then you need some logic to determine whether a row is a header row (like checking if a certain field is a number). In that case you can do this:
Read all columns in as text.
Put in a Derived Column where you generate a new column IsHeader using your logic.
Use a Conditional Split to filter out the rows where IsHeader is true.
Use Data Conversion or a Derived Column to convert the columns to the correct data types.
Output to Excel as usual.
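And here is the sketch of the first option, filtering in the source query. It assumes the header rows can be recognized because a numeric field comes through as non-numeric text in them; all names are hypothetical:

-- Amount arrives as text; header rows fail the numeric test and are dropped.
SELECT OrderID, OrderDate, Amount
FROM dbo.SourceTable
WHERE ISNUMERIC(Amount) = 1;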
I am writing an SSIS package to import data from *.csv files to a SQL 2008 DB. The problem is that one of the files contains duplicate records, and I want to extract only the distinct values from that source. Please see the image below.
Unfortunately, the generated files are not under my control; they are owned by a third party, and I cannot change the way they are generated.
I did use the Lookup component, but it only checks the existing data against the incoming data. It does not check for duplicate records within the incoming data.
I believe the sort component gives an option to remove duplicate rows.
Depends on how serious you want to get about the duplicates. Do you need a record of what was duplicated, or is it enough to just get rid of them? The Sort component will get rid of dups on the sort field. However, the dups may have different data in the other fields, and then you want a different strategy. Usually I load everything into staging tables and clean up from there. I send the removed dupes to an exception table (we have to answer a lot of questions from our customers about why things don't match what they sent), and I often use a set of business rules (enforced with either an Execute SQL task or data flow tasks) to determine which one to pick if there are duplicates in one area but not another (say, two business addresses when we can only store one). I also make sure the client is aware of how we determine which of the two to pick.
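A minimal sketch of that staging-table cleanup, assuming the business key is (QualificationID, UnitID) as elsewhere in this thread and that a LoadDate column decides which copy wins; all other names are hypothetical:

-- Rank each copy within its business key; keep the newest, and move the
-- rest to an exception table in one statement via the OUTPUT clause.
WITH Ranked AS (
    SELECT QualificationID, UnitID, Score, LoadDate,
           ROW_NUMBER() OVER (
               PARTITION BY QualificationID, UnitID
               ORDER BY LoadDate DESC) AS rn
    FROM dbo.Staging
)
DELETE FROM Ranked
OUTPUT DELETED.QualificationID, DELETED.UnitID, DELETED.Score, DELETED.LoadDate
    INTO dbo.StagingExceptions (QualificationID, UnitID, Score, LoadDate)
WHERE rn > 1;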
Use the Sort tool from the Toolbox for that, then click on it. You will see all available input columns.
Check the column, change the sort type direction, and then check "Remove rows with duplicate sort values".
Bring in the data from the CSV file the way it is, then dedup it after it's loaded.
It'll be easier to debug, too.
I used the Aggregate component and grouped by both QualificationID and UnitID. If you want, you can also use the Sort component. Perhaps my information might help others.