How to update SQL Server 2008 table periodically - sql-server-2008

I have a question which I'm sure has been asked before, but I don't know the terminology for it. (Hence, I have tried searching for an answer to this question here, but no luck.)
I am on SQL Server Management Studio 2008. I have a table that was created by an import of a flat file. At the beginning of every month, I want to update this table with a new version of the given flat file. The headers on the flat file / table will always stay the same, and no previous records will be lost. Data on previous records may change, and new records will be included.
What is the best way to do this each month? Right now, my solution is to drop the current table and re-create it with an import of the new flat file. (Or, I could run a truncate and then re-import.)

One of the faster methods would be to drop all indexes, truncate, re-import, and re-create all indexes. Note that with a flat file you could automate this with SSIS, or you could use a BULK INSERT from a scheduled job. For instance, if the file is in the same location every month and all the delimiters and details are the same, a stored procedure or T-SQL script that BULK INSERTs the file would work when called by a job once a month on a schedule.
BULK INSERT MonthlyTable
FROM 'C:\MonthlyFileDrop\MonthlyFile.txt'
WITH (
FIELDTERMINATOR = ','
,ROWTERMINATOR = '0x0a'
,FIRSTROW=2
)
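If indexes are involved, a rough sketch of the full monthly job step might look like this (the index name, column name, and file path are placeholders to adjust for your table):
-- Hypothetical job step: drop the index, reload the table, rebuild the index.
DROP INDEX IX_MonthlyTable_SomeColumn ON MonthlyTable;
TRUNCATE TABLE MonthlyTable;
BULK INSERT MonthlyTable
FROM 'C:\MonthlyFileDrop\MonthlyFile.txt'
WITH (
FIELDTERMINATOR = ','
,ROWTERMINATOR = '0x0a'
,FIRSTROW = 2   -- skip the header row
);
-- Re-create the index once the reload is done.
CREATE NONCLUSTERED INDEX IX_MonthlyTable_SomeColumn ON MonthlyTable (SomeColumn);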
Another approach (one that I'm not partial to) would be to load the data into a staging table, work out which data from the staging table is missing or changed in the existing table, apply those changes, then re-index the existing table and drop the staging table.
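A minimal T-SQL sketch of that staging comparison, assuming the file has been loaded into a staging table and the two tables share a key column (MonthlyStage, RecordID, and SomeColumn are made-up names):
-- Hypothetical MERGE from the staging table into the existing table.
MERGE MonthlyTable AS target
USING MonthlyStage AS source
ON target.RecordID = source.RecordID
WHEN MATCHED THEN
UPDATE SET target.SomeColumn = source.SomeColumn
WHEN NOT MATCHED BY TARGET THEN
INSERT (RecordID, SomeColumn) VALUES (source.RecordID, source.SomeColumn);
-- Re-index the existing table and drop the staging table.
ALTER INDEX ALL ON MonthlyTable REBUILD;
DROP TABLE MonthlyStage;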

Related

Check if a record from the database exists in a CSV file

Today I come to you for inspiration, or maybe ideas on how to solve a task without killing my laptop with massive and repetitive code.
I have a CSV file with around 10k records. I also have a database with the respective records in it. I have four fields inside both of these structures: destination, countryCode, prefix and cost.
Every time I update the database with this .csv file I have to check if a record with the given destination, countryCode and prefix exists and, if so, update the cost. That is pretty easy and it works fine.
But here comes the tricky part: there is a possibility that the destination may be deleted from one .csv file to another and I need to be aware of that and delete that unused record from the database. What is the most efficient way of handling that kind of situation?
I really wouldn't want to check every record from the database with every row in a .csv file: that sounds like a very bad idea.
I was thinking about some time_stamp or just a bool variable which would tell me if the record was modified during the last update of the DB, BUT there is also a chance that none of the params within the record change, and thus there would be no need to touch that record or mark it as modified.
For that task, I use Python 3 and mysql.connector lib.
Any ideas and advice will be appreciated :)
If you're keeping a time stamp, why do you care if it's updated even if nothing was changed in the record? If the reason is that you want to save the date of the latest update, you can add another column saving a time stamp of the last time the record appeared in the CSV, and afterwards delete all the records whose value in this column is smaller than the date of the last CSV.
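A rough sketch of that idea in MySQL (the table name rates, the last_seen column, and the example values are all made up; the four data fields come from the question):
-- One-time schema change: add a column recording when each row last appeared in a CSV.
ALTER TABLE rates ADD COLUMN last_seen DATETIME NULL;
-- At the start of each import, remember the batch time.
SET @batch_time = NOW();
-- For every row found in the CSV (changed or not), stamp it while updating:
UPDATE rates
SET cost = 0.012, last_seen = @batch_time
WHERE destination = 'Denmark' AND countryCode = 'DK' AND prefix = '45';
-- After the whole file is processed, anything without the new stamp was dropped from the CSV:
DELETE FROM rates WHERE last_seen IS NULL OR last_seen < @batch_time;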
If the .CSV is a replacement for the existing table:
CREATE TABLE `new` LIKE `real`;
load the .csv into `new` (Probably use LOAD DATA...)
RENAME TABLE `real` TO `old`, `new` TO `real`;
DROP TABLE `old`;
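For the LOAD DATA step, a hedged sketch (the file path, delimiters, header skip, and column list are assumptions based on the question):
-- Hypothetical load of the new CSV into the freshly created table, before the rename.
LOAD DATA INFILE '/path/to/monthly.csv'
INTO TABLE `new`
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(destination, countryCode, prefix, cost);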
If you have good reason to keep the old table and patch it, then...
load the .csv into a table
add suitable indexes
do one SQL statement to do the deletes (no loop needed); it is probably a multi-table DELETE.
do one SQL statement to update the prices (no loop needed); it is probably a multi-table UPDATE. (Both are sketched below.)
You can probably do the entire task (either way) without touching Python.
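A possible shape for those two statements, assuming the CSV has been loaded into a staging table called csv_stage (all names are placeholders):
-- Multi-table DELETE: remove rows that no longer appear in the CSV.
DELETE r
FROM rates AS r
LEFT JOIN csv_stage AS s
ON s.destination = r.destination
AND s.countryCode = r.countryCode
AND s.prefix = r.prefix
WHERE s.destination IS NULL;
-- Multi-table UPDATE: refresh the prices from the CSV.
UPDATE rates AS r
JOIN csv_stage AS s
ON s.destination = r.destination
AND s.countryCode = r.countryCode
AND s.prefix = r.prefix
SET r.cost = s.cost;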

SSIS package design, where 3rd party data is replacing existing data

I have created many SSIS packages in the past, though the need for this one is a bit different than the others which I have written.
Here's the quick description of the business need:
We have a small database on our end sourced from a 3rd party vendor, and this needs to be overwritten nightly.
The source of this data is a bunch of flat files (CSV) from the 3rd party vendor.
Current setup: we truncate the tables of this database, and we then insert the new data from the files, all via SSIS.
Problem: There are times when the files fail to arrive, and what happens is that we truncate the old data even though we don't have the fresh data set. This leaves us with empty tables, whereas we would prefer to have yesterday's data over no data at all.
Desired Solution: I would like some sort of mechanism to see if the new data truly exists (these files) prior to truncating our current data.
What I have tried: I tried to capture the data from the files, add it to an ADO recordset, and only proceed if this part was successful. This doesn't seem to work for me, as I have all the data-capture activities in one data flow and I don't see a way to reuse that data. It would also seem wasteful of resources to do that and let the in-memory tables just sit there.
What have you done in a similar situation?
If the files are not present, update flags like IsFile1Found to false and pass these flags to a stored procedure which truncates on a conditional basis.
If a file might be empty, then using PowerShell through an Execute Process Task you can extract the first two rows; if there are two rows (header + data row), it means the data file is not empty. Then you can truncate the table and import the data.
Another approach could be:
You can load the data into staging tables, insert the data from those staging tables into the destination tables using a SQL stored procedure, and truncate the staging tables after the data has been moved to all the destination tables. That way, before truncating a destination table you can check whether its staging table is empty or not.
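A minimal sketch of that guard inside the stored procedure (StagingTable, DestinationTable, and the column list are placeholders):
-- Only replace the destination data if the staging load actually produced rows.
IF EXISTS (SELECT 1 FROM dbo.StagingTable)
BEGIN
TRUNCATE TABLE dbo.DestinationTable;
INSERT INTO dbo.DestinationTable (Col1, Col2)
SELECT Col1, Col2 FROM dbo.StagingTable;
TRUNCATE TABLE dbo.StagingTable;
END
ELSE
BEGIN
-- The files were missing or empty: keep yesterday's data and just log it.
RAISERROR('Staging table is empty; destination left untouched.', 10, 1);
END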
I looked around and found that some others were struggling with the same issue, though none of them had a very elegant solution, nor do I.
What I ended up doing was to create a flat file connection to each file of interest and have a task count records and save to a variable. If a file isn't there, the package fails and you can stop execution at that point. There are some of these files whose actual count is interesting to me, though for the most part, I don't care. If you don't care what the counts are, you can keep recycling the same variable; this will reduce the creation of variables on your end (I needed 31). In order to preserve resources (read: reduce package execution time), I excluded all but one of the columns in each data source; it made a tremendous difference.

Bulk CSV File Import in MySQL Removing duplicates while Importing dynamic columns from CSV

I have to import CSV files for different clients into my system, some separated with [,], some with [|], etc. They are always very big files.
While importing, I need to filter out duplicate records; duplicate records should not be inserted into the DB.
The problem is that the columns can be different for different clients. I have a database for every client for the CSV data, which I import every day, week, or month depending on the client's needs. I keep the data for every import so we can generate reports on what data we receive in the CSV files; our system does processing after the import.
Data structure example:
Client 1 database:
First_Name | Last_Name | Email | Phone | Ect…
95% of the data is the same in every new CSV file. Some new records come in and some records are deleted from the CSV, so our system should only process the newly imported records.
Currently what happens is that we import the data into a new table every time. We keep the table name with a time stamp so we can keep track of the imports. It is an expensive process, and it duplicates records and tables.
I'm thinking about a solution and I need your suggestions on it.
Keep one table: every time I import CSV file data into the table, I'll alter the existing table to add a new column whose name is the current date (byte or Boolean) and set it to true/false on import??
My other question is about the first time I import a CSV file …I need to write a script:
While importing CSV data, if the table already exists then my date logic will work; else, if the table does not exist, it should create the table using the given or provided "client name" as the table name. The challenge is the columns: I don't know them, so it should create the columns from the CSV file.
If the table already exists and some new records came in, it should insert them; otherwise it should update.
Is it doable in MySQL??
Although I have to do something for MSSQL as well, right now I need a solution for MySQL.
Please help me... I'm not good at MySQL :(
You can certainly do an insert-or-update statement when importing each record.
See here:
https://dev.mysql.com/doc/refman/5.0/en/insert-on-duplicate.html
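A short hypothetical example of that pattern, using the field names from the question (the table name client1_data is made up, and this assumes a UNIQUE key on whichever columns identify a record):
INSERT INTO client1_data (First_Name, Last_Name, Email, Phone)
VALUES ('Jane', 'Doe', 'jane@example.com', '555-0100')   -- example values only
ON DUPLICATE KEY UPDATE Email = VALUES(Email), Phone = VALUES(Phone);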
I propose you create a script to dynamically create your table if it doesn't exist.
What is the language that you would use to insert your csv?

ssis want to update a sql table based on a flat file

Rather new to SSIS so not sure how to handle this.
I have a flat file which I managed to successfully read from. So right now my data flow consists of just a flat file source.
What i want to do is something like this:
Update SqlTable S
set s.columnA = f.columna
from FlatFile f
where s.columnID = f.columnID
Right now the only way I can see of doing this would be to insert the contents of the flat file into a SQL table, then do my update. This seems wasteful considering I don't need to save the data from the flat file. I just need to update an existing SQL table based on the data in the flat file. So is there some way to run the query directly in the SSIS package instead of having to insert a bunch of data into a SQL table that I will just wind up dropping?
thanks
Update SqlTable S set s.columnA = f.columna from FlatFile f where s.columnID = f.columnID
That statement above is a SQL statement. You cannot connect a SQL table to a flat file. You need to work in SQL to do an update, since that is where the table lives.
You have 2 choices:
Use an OLEDB Command component within the data flow. The downside is this calls the statement for each record, so if you have 1,000s of records it is very inefficient.
Push the records to a table using an OLE DB Destination and then you can call your update using an Execute SQL Task (see the sketch below). You can then truncate the table if you like.
A possible 3rd option is to roll your own OLE DB destination to do an update on record sets vs records.
While it might sound wasteful to create a table in the database just to store update records, it is done very often. You just drop or truncate the worktable when complete.
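For the second option, the Execute SQL Task could run a set-based statement along these lines (FlatFileStaging is a made-up name for the work table loaded by the OLE DB Destination; the rest mirrors the pseudocode above):
-- Set-based update from the work table, then clear the work table.
UPDATE s
SET s.columnA = f.columnA
FROM SqlTable AS s
INNER JOIN FlatFileStaging AS f ON s.columnID = f.columnID;
TRUNCATE TABLE FlatFileStaging;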
You could add an OLE DB Command component to the Data Flow that retrieves data from the flat file. The OLE DB Command would do a single row update for each record retrieved from the flat file. This might be okay if there are few rows in the flat file; but, you can imagine how bad performance will be if there are many rows in the flat file.
I think you'll find that sending the flat file rows to a database table and running a single UPDATE is going to be the best performer for lots of data.
I haven't tried this but have you tried sending to a recordset destination and then running the update using that?
The bulk load into a temporary table is the way to go, and then do your updates from the temp table. As a previous poster says, it is quite a common approach to stuff data into a staging area prior to doing some more work with the data and then dropping or truncating the table.

Updating a SQL table with CSV data?

I am trying to update one of my SQL tables with new columns in my source CSV file. The CSV records in this file are already in this SQL table, but this SQL table is lacking some of the new columns from this CSV file.
I already added the new columns to my SQL table structure via ALTER TABLE. But now I just need to import the data from this CSV file into the new columns. How can I do this? I am trying to use SSIS and SQL Server to accomplish this, but am pretty new to Excel.
This is probably too late to solve salvationishere's problem; though I'm posting this for future readers!
You could just generate the SQL INSERT/UPDATE/etc command by parsing the csv file (a simple python script will do).
You could alternatively use this online parser:
http://www.convertcsv.com/csv-to-sql.htm
(Hoping that it'd still be available when you click!)
to generate your SQL command. The interface is extremely straightforward and it does the entire job in an awesome way.
You have several options:
If you are loading the data into a non-production system where you can edit the target tables, you could load the data into a new table, rename the old table to obsolete, and rename the new table to the old table name.
You can load the data into a staging table and then write a SQL statement to update the target table from the staging table.
You can open the CSV file in Excel and write a formula to generate an update script, drag the formula down across all rows so that you get a separate update statement for each row, and then run the separate update statements in management studio.
You can truncate the target table and update your existing SSIS package that imports the file to use the new columns, if you have the full history in your CSV file.
There are more options, but any of the above would probably be more than adequate solutions.