I have an input CSV file with columns Position_Id, AsOfDate, etc. which has to be loaded into a staging table. In my table, the columns Position_Id and AsOfDate form the primary key. We receive this input file every 2 hours. For example, we received a file at 10 AM today and it was loaded into the table. Two hours later we received another file containing the same data as the earlier one, and that data was loaded into the table as well.
Now my table contains the data from the files we received at 10 AM and 12 PM. At 12:10 PM we received a modified input file with different data inside it. My actual requirement is that, before the data from the latest (12:10 PM) input file is loaded into the table, only the new and updated rows should be loaded.
Have you ever heard of the term Upsert? Here are examples of how to upsert (insert new records and update existing records).
This blog post walks you through Upserting using a lookup in a dataflow.
This Stack Overflow answer provides links that explain how to set up a MERGE.
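For concreteness, with the Position_Id + AsOfDate key from the question, the upsert could be done with a T-SQL MERGE from the staging table into the final table. This is only a minimal sketch: dbo.PositionStaging, dbo.PositionTarget and the SomeValue column are placeholder names, not names from the question.

MERGE dbo.PositionTarget AS tgt
USING dbo.PositionStaging AS src
    ON tgt.Position_Id = src.Position_Id
   AND tgt.AsOfDate = src.AsOfDate
WHEN MATCHED AND tgt.SomeValue <> src.SomeValue THEN
    UPDATE SET tgt.SomeValue = src.SomeValue      -- update changed rows only
WHEN NOT MATCHED BY TARGET THEN
    INSERT (Position_Id, AsOfDate, SomeValue)     -- insert brand-new keys
    VALUES (src.Position_Id, src.AsOfDate, src.SomeValue);

With this pattern, re-loading the 12 PM file (identical to the 10 AM file) changes nothing, while the 12:10 PM file only inserts new keys and updates rows whose values actually changed.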
I'm quite new to ADF, so here's my challenge.
I have a pipeline that consists of a Lookup activity and a ForEach activity, with a Copy activity inside the ForEach.
When I run this pipeline, the first output of the Lookup activity looks like this:
The output contains 11 different values. From my perspective, I only see 11 records that need to be copied to my sink, which is an Azure SQL DB.
The input of the ForEach activity looks like this
While running, the pipeline performs the copy 11 times, and my SQL database now has 121 records: 11 rows multiplied by 11 iterations. This is not the output I expected.
I only expect 11 rows in my sink table. How can I change this pipeline to achieve the expected outcome of only 11 rows?
Many thanks!
The Lookup activity and the Copy activity's source should not be given the same configuration. If they are, duplicate rows will be copied: the Lookup returns every row, the ForEach iterates once per row, and the Copy activity inside it copies the full source on every iteration.
I tried to reproduce the same in my environment.
With 3 records in the source data, 3 × 3 = 9 records end up being copied.
To avoid the duplicates, use a single Copy activity (without the Lookup and ForEach) to copy the data from source to sink.
Now only 3 records are in the target table.
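If duplicate rows have already landed in the sink, a cleanup option (my own addition, not part of the original answer) is a ROW_NUMBER() delete in Azure SQL DB. This is a hedged sketch: dbo.SinkTable and the Id key column are placeholder names.

;WITH numbered AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY Id ORDER BY (SELECT NULL)) AS rn
    FROM dbo.SinkTable
)
DELETE FROM numbered
WHERE rn > 1;   -- keep one row per key, delete the extra copies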
I am working on a workflow in KNIME, and I have an Excel Writer node as the final node of my workflow. I need to read this file and get and store the last value of one specific column (time). With this value, I need to feed another time node so I can update my API link and get a new request.
To summarize, I need to extract specific information from the last line of my Excel file in KNIME.
My question is: how can I read this file and get this value from my sheet? And then, how can I update a time loop to refresh the data so the current day is inserted into my API link?
UPDATE -> My question is how I can always filter to the last 90 days in my concatenated database. I have two columns in this file with dates, and I need to keep just the last 90 days counted from the current day.
To read an Excel file, use the Excel Reader node.
The simplest way to get the last row of a table (assuming your date column has a value for every row of this table?) is probably to use a Rule-based Row Filter with the expression
$$ROWINDEX$$ = $$ROWCOUNT$$ => TRUE
Now you have a one-row table with the values from the last line of the Excel sheet. To help further, we need to understand what you mean by "update a time loop to refresh the data for inserting the current day in my API link". Can you update your question with a screenshot of your current KNIME workflow?
I've been working on one of my projects, and I am feeling stuck because my software takes 1 or 2 hours to load those 10,000 records (basically it goes through a lengthy process that I want to skip).
Luckily, I know that the software uses a .db file to save fetched records, and I am sure that if I can put my custom list into a table of that .db file, it will show up in the software without going through the lengthy process. Using a .db viewer and editor, I was able to see my desired table (which contains the records), and there was an "Insert record" option too, but I want to put in 10,000 records, and inserting them one by one would be hugely time consuming.
Therefore, I want to edit one table of that .db file and insert multiple custom records in one or a few clicks rather than 10,000 clicks.
.db file is located in AppData > Roaming ... etc
Is there any way to achieve this? For your information, I use the Windows 7 operating system.
Thanks for your interest. I was able to insert multiple records through an INSERT SQL query and now everything is working perfectly.
Thanks!
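For anyone hitting the same problem, a minimal sketch of such a multi-row INSERT, assuming a hypothetical table named Records (the real table and column names in the .db file will differ):

-- Hedged sketch: insert many rows in a single statement.
INSERT INTO Records (Id, Name)
VALUES
    (1, 'First record'),
    (2, 'Second record'),
    (3, 'Third record');

Thousands of VALUES rows can be generated in a spreadsheet or text editor and then run in one go from the editor's SQL window (recent SQLite versions accept multi-row VALUES lists).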
I am importing a FoxPro table into SQL Server 2008 using SSIS. The source data is a proprietary database that I have no control over. Let's call the table I am importing Customers.
Sometimes, the structure for Customers looks like this:
ID (int)
NAME (char(30))
ADDRESS (char(30))
CITY (char(20))
STATE (char(2))
ZIP (char(10))
CCNUM (char(16))
Other times, it looks like this:
ID (int)
NAME (char(30))
ADDRESS (char(30))
CITY (char(20))
STATE (char(2))
ZIP (char(10))
CCPTR (char(100))
This proprietary product basically has 2 different versions of the database. The older version had a field called CCNUM (credit card #), a basic 16-character field. The newer version replaced that field with one called CCPTR, a 100-character field that holds a card pointer (an encrypted value standing in for the actual credit card number).
The problem is that every time I have to switch back and forth between 2 datasets that have these different table structures, SSIS blows up and I have to go in and manually refresh the metadata.
My question is: is there any way I can have SSIS dynamically look for one of these fields at runtime and, based on which one is there, load the correct data into the correct table structure in SQL?
Forgive me if this has been asked before. I am still fairly new to SSIS and I tried searching for this answer but to no avail.
Thanks,
Mark
The short answer is no. SSIS expects that there are no significant changes to the metadata of its source and destination components. There are ways to programmatically influence this with .NET, but that kind of misses the point.
A well-designed solution to this problem is to create 2 separate data flows that copy the data into a shared staging table. Use this staging table as the source to transform your data and push it into its final data structure.
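A rough sketch of what that shared staging table could look like for the Customers example, wide enough for both source layouts (the table name is an assumption):

CREATE TABLE dbo.Customers_Staging (
    ID      int       NOT NULL,
    NAME    char(30)  NULL,
    ADDRESS char(30)  NULL,
    CITY    char(20)  NULL,
    STATE   char(2)   NULL,
    ZIP     char(10)  NULL,
    CCNUM   char(16)  NULL,  -- filled only by the old-schema data flow
    CCPTR   char(100) NULL   -- filled only by the new-schema data flow
);

A downstream step (a stored procedure or an Execute SQL Task) can then move rows from the staging table into whichever final structure applies, keeping each data flow's metadata stable.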
If you build your package based on the length (100) and run it on the (16), you should get only a warning. Are you getting an error?
I am currently experiencing difficulties when trying to append data to existing tables.
I have about 100 CSV files that I would like to create a single table from; the files have different column structures, but this isn't really an issue as the associated field names are in the first row of each file.
First, I create a new table from one of the files indicating that my field names are in the first row. I change the particular fields that have more than 256 characters to memo fields and import the data.
I then add to the table the fields that are missing.
Now, when I try to append more data, I again select that my field names are in the first row, but now I receive a truncation error for data that is destined for the memo fields.
Why is this error occurring? Is there a workaround for this?
Edit: here is an update on what I've attempted in order to solve the problem:
Importing and appending tables will not work unless they have the exact same structure. Moreover, you cannot create a Master table with all fields and properties set, then append all tables to the master. You still receive truncation errors.
I took CodeSlave's advice and attempted to upload the table, set the fields that I needed to be Memo fields, and then append the table. This worked, but again, the memo fields are not necessarily in the same order in every data file, and I have 1200 data files to import into 24 tables. Importing the data table by table is just NOT an option for this many tables.
I expect what you are experiencing is a mismatch between the source file (CSV) and the destination table (MS Access).
MS Access will make some guesses about the field types in your CSV file when you are doing the import. However, it's not perfect. Maybe it's seeing a string as a memo, or a float as a real. It's impossible for me to know without seeing the data.
What I would normally do is:
Import the second CSV into its own (temporary) table
Clean up the second table
Then use an SQL query to append those records from the second table to the first table (a sketch of this append query follows below)
Delete the second table
(repeat for each CSV file you are loading).
If I knew ahead of time that every CSV file was already identical in structure, I'd be inclined to instead concatenate them all together into one, and only have to do the import/clean-up once.
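The append step itself is a plain Access append query. A minimal sketch, with placeholder names (MasterTable, TempImportTable, and the field names are assumptions; the memo fields keep their full text because both tables already declare them as Memo):

INSERT INTO MasterTable (Field1, Field2, LongTextField)
SELECT Field1, Field2, LongTextField
FROM TempImportTable;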
I had a very similar problem: trying to import a CSV file with large text fields (>255 characters) into an existing table. The fields were declared as Memo but were still being truncated.
Solution: start an import to link a table and then click on the Advanced button. Create a link specification which defines the relevant fields as Memo fields, then save the link specification. Cancel the import. Do another import, this time the one you want, which appends to the existing table. Click on the Advanced button again and select the link specification you just created. Click Finish and the data should be imported correctly without truncation.
I was having this problem but noticed it always happened only to the first row. By inserting a blank row in the CSV, it would import perfectly; you then need to remove the blank row from the Access table.
Cheers,
Grae Hunter
Note: I'm using Office 2010