A way to update data in Oracle - MySQL

I have a table that I need to update each day. The data comes in a text file every time. I wrote a program that extracts the data from the text file and writes it to the table, but now I want to modify it to just update the existing data. The data is mostly the same; only a few things might differ.
I was thinking about MERGE but I don't know very well how I could use this in my program. All the examples that I saw used a second table.
So it would mean creating a second table into which I extract the new data, and then merging it into the old table to update the records. I want to avoid creating a second table, so I was wondering if there is any way to do this?
Thanks!
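For what it's worth, Oracle's MERGE does not require a physical second table: the USING clause can be any query, including a single row built from the values just parsed out of the text file. A minimal sketch, assuming a hypothetical customers table keyed on id, with bind variables supplied by the loading program:

    -- hypothetical table and columns; :id, :name, :balance come from the parsed file line
    MERGE INTO customers t
    USING (SELECT :id AS id, :name AS name, :balance AS balance FROM dual) s
    ON (t.id = s.id)
    WHEN MATCHED THEN
        UPDATE SET t.name = s.name, t.balance = s.balance
    WHEN NOT MATCHED THEN
        INSERT (id, name, balance) VALUES (s.id, s.name, s.balance);

Run once per parsed line (or batched), this updates existing records and inserts new ones without a staging table.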


Pentaho Import unique records into database

I am quite new to Pentaho Spoon and I would like to import records of a csv file into a database table. However, only unique records should be imported into the database table. That is why I need to compare EACH record with all records of the database table in order to determine if the record should be imported or not.
So far, I tried out the suggested CRUD-pattern which looks like this:
As you can see in the picture, I merge the Excel input and the table input (ignore the cast steps; I needed to cast a value because they differed in the float format: the database format was #.000000 and the csv format of the float was #.0).
After the merge join, I compare the flag (which is given by the Merge Rows (diff) step): if the compared records are new, I import them into the database table; if they are changed, I update the record; and if they are deleted or identical, I simply do nothing. So far, so good.
But here is the problem: if I shuffle the records of the csv input file and run the transformation anew, all the records are imported again and consequently there are duplicates in my database table (which I wanted to avoid). To emphasize again: the right way to solve this is that each row of the csv input file is compared with ALL entries in the database table.
How can I realize this? Any suggestions? Thank you so much in advance!!
The Merge Rows (diff) step expects the input to be sorted. Normally, you would have been warned of this by a pop-up.
Put a Sort Rows step on the output flow of the Excel Input, before it reaches the Merge Rows (diff).
You should do the same between the Table Input and the Merge Rows (diff). Of course, you may think you could do it in the SQL statement of the Table Input.
However, there is a beginner trap here. You have 3 other steps, Output Rows, Update and Delete, which operate on the same table, and these steps may lock it. As all the steps in Kettle run concurrently, you do not know which step will fire first, and the table may be locked and never able to serve even the first record. This is known in jargon as an auto-lock, and the way to solve it is to put a Sort Rows step in as a buffer.
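For completeness, "doing it in the SQL statement of the Table Input" just means an ORDER BY on the same key fields that Merge Rows (diff) compares on; the buffering caveat above still applies. A sketch with made-up table and column names:

    -- hypothetical table; the sort key must match the Merge Rows (diff) key fields exactly
    SELECT id, amount, status
    FROM   target_table
    ORDER  BY id;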
You can use the 'Dimension lookup/update' step, which provides the same functionality you are trying to achieve.
Thanks,
Nilesh

Prioritize bulk insert into a table using Union All in SSIS

I have multiple archive tables storing similar kinds of data, archived in a month-wise format. Now, the requirement is to get all the archived data into one table instead of multiple tables.
I am doing this activity with the help of Union All in SSIS; however, it seems that the inserts into the destination table happen in a random order.
Attached is the route taken for the transformation.
I want to prioritize the inserts, please suggest!
You can add an extra column "Priority" to each of the OLE DB sources, with the corresponding priority for each source, and then after the union you can add a Sort component that sorts the data by Priority. But if you have a lot of data, that would be really inefficient, because the Sort component will wait until all the source data is read.
I would suggest writing a proper source SQL statement that does the union/prioritization/sorting for you and then inserting into the target.
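If all the archive tables sit in the same database, that source statement could look something like the sketch below (the table names and priority values are made up for illustration):

    -- hypothetical month-wise archive tables; Priority drives the insert order
    SELECT 1 AS Priority, a.* FROM dbo.Archive_Jan AS a
    UNION ALL
    SELECT 2 AS Priority, a.* FROM dbo.Archive_Feb AS a
    UNION ALL
    SELECT 3 AS Priority, a.* FROM dbo.Archive_Mar AS a
    ORDER BY Priority;

Used as the query of a single OLE DB Source, the rows already arrive in priority order and no Sort component is needed.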
Also, if the sources are on different servers, you can create a Foreach Loop container that iterates through the source tables and inserts all of them into the target table. You can use this article for reference.

How do I populate a field with static header row information on import?

As it stands, I am currently looking to import data from an Excel spreadsheet into a table on a monthly basis. The header row in the spreadsheet contains the date that the original query was run.
I have one master table in Access consisting of multiple files. I would like to set up an automated process to capture the date in the header upon import, and then record it in a field for every new record that was imported.
There are two caveats here:
Spreadsheet sizes will vary depending on where data exists.
I have no control over how the data is provided. Fields that contain no data for the month will not populate to the spreadsheet.
Less frequently, fields will be added that do not already exist.
So far I have been identifying these new additions manually and creating a new field for them at the end of the field list. I realize that this is very inefficient and I would like to automate it, if I can.
Does anyone have any insight? Any assistance would be greatly appreciated.
OK, here are the steps you'll want to take.
Create a link from Access to your Excel spreadsheet. Access will now see this as a table.
Create a make-table query using the Excel table as the source and adding the date (derived from a sub-query) as an additional field; a sketch of such a query is shown below.
Then run the query. This will automatically create all the fields.
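In Access SQL, the make-table query might look roughly like this sketch (ExcelData and SheetHeader are assumed linked ranges, the first covering the data rows and the second just the header row holding the run date; MasterImport is the table being created):

    -- hypothetical linked ranges and target table
    SELECT x.*,
           (SELECT TOP 1 h.RunDate FROM SheetHeader AS h) AS ImportDate
    INTO   MasterImport
    FROM   ExcelData AS x;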
If, however, you need to create new fields in an existing table, then you'll have to use VBA: read each header in the Excel table, compare it to the schema of the existing table, and execute an ALTER TABLE query to add the field.
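The statement the VBA would build and execute (for instance via CurrentDb.Execute) for each missing header is just an ALTER TABLE; the field name and type below are placeholders:

    -- hypothetical new field discovered in this month's spreadsheet
    ALTER TABLE MasterImport ADD COLUMN NewMetric DOUBLE;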
Good luck

How to save changes from an HTML table full of text fields to a db

I have an HTML table full of text fields. I have the ability to add new rows to the table on the fly (I am just doing this with a JavaScript function call and inserting them directly). I need to be able to save the table values right into a (MySQL through PHP) database. I have got the database all hooked up, but I have a question about saving the changes to my table.
How can I know which fields of the potentially dozens of fields are 'dirty'? I am thinking it would be easiest to just clear out the database when I need to save and insert all of the data from the HTML table into the database. Is that an acceptable approach?
In general it's better to clear out all relevant records and re-add them. It saves a lot of messing around, BUT do be sure the table you're saving contains all the relevant data. If you have state data that wouldn't normally be editable (like creationDate), you should be sure it's available to re-save.
You should also wrap the whole process in a transaction so you can roll back any deletions if saving the new data fails for any reason.
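A sketch of that delete-and-reinsert inside a transaction on the MySQL side (the grid_rows table, its columns and the literal values are invented; in PHP the values would come from the posted form via parameterised queries):

    -- requires a transactional engine such as InnoDB
    START TRANSACTION;
    DELETE FROM grid_rows WHERE grid_id = 42;
    INSERT INTO grid_rows (grid_id, row_no, col_a, col_b)
    VALUES (42, 1, 'foo', 'bar'),
           (42, 2, 'baz', 'qux');
    COMMIT;
    -- if any statement fails, issue ROLLBACK instead of COMMIT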

Reading hierarchical flat file into SSIS

I have a flat file structured in a hierarchical format that looks something like this:
Area|AreaCode|AreaDescription
Region|RegionCode|RegionDescription
Zone|ZoneCode|ZoneDescription
District|DistrictCode|DistrictDescription
Route|RouteCode|RouteDescription
Record|Name|Address|Etc
RouteFooter
Route|RouteCode|RouteDescription
Record|Name|Address|Etc
RouteFooter
DistrictFooter
District|DistrictCode|DistrictDescription
Route|RouteCode|RouteDescription
Record|Name|Address|Etc
Record|Name|Address|Etc
RouteFooter
Route|RouteCode|RouteDescription
Record|Name|Address|Etc
RouteFooter
DistrictFooter
ZoneFooter
RegionFooter
AreaFooter
I have to bring this into SSIS and consume information about the Record row and also about the headers for the current Record row, as well as information from several other sources, and output a simpler flat file as a result.
I would like to read the flat file above into a structure where each row contains a record with the appropriate header information included.
My question is, what is the best way to do this if it is even possible?
First, how do you tell what type of line you are on if you are on, say, line 3,987,986? How do you tell what is related to what? Is there a possibility you could get this in a better format? Before spending lots of time (and don't kid yourself, this will take lots of time to set up and test properly), I would kick the file back to the provider and request it in a different format. You won't always get it, but you should at least try.
When I have done this in the past in DTS, the first characters of each line told me which structure the line referred to. I imported everything into a staging table with two columns, one for the record-type data and one for the rest. Then I parsed the rest into staging tables for each type of record, with the correct column structure for that type of record (plus any fields you might need for the relationships), did the clean-up, and then imported into the prod tables. As you also have a different number of columns per record type, I would try that approach (only you may have to manually populate some columns instead of figuring them out directly from the file). Also give each record an identity field in the staging tables; this will help you figure out the relationships, I think.
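A rough sketch of that staging approach in SQL (all table names are assumptions; the record type is taken from the text before the first pipe, and the rest of the line is kept as-is for later parsing):

    -- raw staging table: one row per file line, with an identity to preserve file order
    CREATE TABLE StagingRaw (
        RowId      INT IDENTITY(1,1) PRIMARY KEY,
        RecordType VARCHAR(20),
        RawData    VARCHAR(MAX)
    );

    -- peel off one record type at a time into its own typed staging table for parsing
    INSERT INTO StagingRecord (RowId, RawData)
    SELECT RowId, RawData
    FROM   StagingRaw
    WHERE  RecordType = 'Record';

The RowId makes it possible to tie each Record row back to the nearest preceding Route/District/Zone header row when rebuilding the relationships.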