Running very low on ideas here. I've got a case where I'm using SqlBulkCopy to pump data into a DB, and about halfway through I run into different exceptions (primary key violations, index violations, etc).
I've confirmed that the violations are in fact real and need to be corrected in the data. What's infuriating, though, is that if I were writing to the DB with a DataAdapter (which would be far slower), the bad rows in the DataSet would have HasErrors turned on so I could easily find them and take care of things. With SqlBulkCopy? Zilch. Nada. Good luck finding the row that caused your problem, because all you'll get is an error message (like "primary key violation in yada yada yada, blah blah blah") and that's it.
Any suggestions? I can't believe there's no way to get these errors. With the standard BCP I think you can even pump these things to a log file. Can't we do something like this with SqlBulkCopy?
Thx,
Have a look at the source code for SqlBulkCopy - you'll get your answer there - there is no real error handling there.
One thing you could do, lame as it is that you'd have to, is to re-run a failing batch with the batch size set to 1, so the error surfaces on the exact row that causes it.
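Something along these lines - a minimal sketch, assuming a DataTable named table and a destination table dbo.Target (both placeholders):

using System;
using System.Data;
using System.Data.SqlClient;

static void FindBadRows(DataTable table, string connectionString)
{
    using (var bulk = new SqlBulkCopy(connectionString))
    {
        bulk.DestinationTableName = "dbo.Target";
        foreach (DataRow row in table.Rows)
        {
            try
            {
                // WriteToServer has an overload taking DataRow[],
                // so each row becomes its own one-row "batch".
                bulk.WriteToServer(new[] { row });
            }
            catch (SqlException ex)
            {
                // Now you know exactly which row failed, and why.
                Console.WriteLine("Row failed: {0} ({1})", row[0], ex.Message);
            }
        }
    }
}

It's painfully slow, but it pinpoints every offending row instead of just giving you the first error.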
When I'm doing data imports requiring validation, I usually will dump the data into a table that will take the data as is, then run a stored proc or some other sql that can validate my data in a set based manner, do the transformations, and put it into the final destination in a way that I can control.
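For example, with a permissive staging table (the table and column names here are made up), you can flag the would-be violations in one set-based pass and load only the clean rows:

-- rows that would violate the primary key on the destination
SELECT s.*
FROM Staging_T s
WHERE EXISTS (SELECT 1 FROM Final_T f WHERE f.Id = s.Id);

-- load everything else
INSERT INTO Final_T (Id, Name)
SELECT s.Id, s.Name
FROM Staging_T s
WHERE NOT EXISTS (SELECT 1 FROM Final_T f WHERE f.Id = s.Id);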
My server is using a MySQL DB, connecting to it via the C++ connector. I'm nearing production and I've been spending some time trying to break things as part of hardening the server.
One action item I had was to see what would happen if I execute a statement with a string that is longer than the column's VARCHAR length. For example, if I have a column defined as VARCHAR(4) and then set it to the string "hello".
This of course throws an exception with the error code 1406 (Data too long for column).
What I was wondering was if there is a good or standard way to defend against this? Obviously one thing is to check the string length and truncate manually. I can do this; however, there are many tables and several columns with VARCHAR. So my worry is updating server code if one of the VARCHAR columns has its length increased (i.e., code maintainability).
Note that the server does do some validation up front. I'm just trying to defend against a subtle bug or corner case that lets something slip through.
A couple of other options on the table are to disable strict mode, so it will give a warning and truncate, or to convert VARCHAR to TEXT.
I was wondering a few things.
Is there a recommended method to handle this situation?
What are the disadvantages of disabling strict?
Is it worth it (and is it possible) to query the DB at runtime for the VARCHAR lengths - something like the sketch below? Note that I'm using the C++ connector. I suppose I could also write a tool that runs before compiling and extracts the VARCHAR lengths from the SQL code used to generate the tables. But that then makes me wonder if I'm over-engineering this.
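For question 3, this is roughly what I had in mind (the schema name is a placeholder):

SELECT TABLE_NAME, COLUMN_NAME, CHARACTER_MAXIMUM_LENGTH
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'my_database'
  AND DATA_TYPE = 'varchar';

The results could be fetched once at startup and cached for length checks.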
I'm just sorting through the possible approaches now and thought I'd seek advice from those with more experience with MySQL.
As an experienced database engineer, I would recommend a combination of the following two strategies:
1) If you know that there is a chance, however small, that the data for your varchar(4) could go higher than 4, then make the varchar field larger than 4. For example, if you expect that the field can go as high as 8, then set it to varchar(10). The beauty of using a varchar field instead of a char is that a varchar will only use whatever storage it needs.
2) If there is a real issue with data constantly being larger than the varchar field length, then you should write your own exception handler to trap the 1406 error. For the exception handling to work properly you will need to come up with a strategy for exactly how you want to handle the exception. For example, you could send an error to the user and ask them to fix the problem, you could accept the data but truncate it so it fits into the field, or you could send the error to a log file to be fixed at a later time.
I've been searching for a quick way to do this after my first few thoughts have failed me, but I haven't found anything.
My Issue
I'm importing raw client data into an Access database, where the flat file they provide is parsed and converted into a standardized format for our organization. I do this for all of our clients, but this particular client's software gives us a file that puts "(NULL)" in every field that should be NULL. As a result, I have a ton of literal "(NULL)" strings rather than actual null fields!
My goal is to do a data cleanse of the entire TABLE, rather than perform the cleanse at the FIELD level (as I do in my temporary solution below).
Data Cleanse
Temporary Solution:
I can't add those strings to our data warehouse, so for now I just have a query with an IIf expression for each field that replaces "(NULL)" with "" (which took a while to set up, since the client file has roughly 96 fields). This works. However, we work with hundreds of clients, so I'd like a scalable solution that doesn't require many changes if another client has a similar file; not to mention that if this client changes something in their file, I might have to redo my field-specific expressions.
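Each field in that query looks something like this (the field name here is just an example):

IIf([Field1] = "(NULL)", "", [Field1]) AS Field1Clean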
Long-term Solution:
My first thought was an UPDATE query. I was hoping I could do something like:
UPDATE [ImportedRaw_T]
SET [ImportedRaw_T].* = ""
WHERE ((([ImportedRaw_T].* = "(NULL)"));
This would be easily scalable, since for other clients I'd only need to change the table name and replace "(NULL)" with their particular default. Unfortunately, you can't use SELECT * with an update query.
Can anyone think of a work-around to the SELECT * issue for the update query, or does anyone have a better solution for cleansing an entire table, rather than doing the cleanse at the field level?
SIDE NOTES
This conversion is 100% automated currently (Access is called via a watch folder batch), so anything requiring manual data manipulation / human intervention is out.
I've tried using a batch script to just cleanse the data in the .txt file before importing to Access - however, this caused an issue with the fixed-width format of the .txt, which has caused even larger issues with the automatic import of the file to Access. So I'd prefer to do this in Access if possible.
Any thoughts and suggestions are greatly appreciated. Thanks!
Unfortunately, it's impossible to implement this in SQL using wildcards instead of column names; there is no such syntax.
I would suggest a VBA solution: cycle through all the table's fields and, for each field whose data type is text, generate and execute a SQL UPDATE command for that field, along the lines of the statement below.
Also, use Null instead of "" if you really need Nulls in the field rather than empty strings - they can behave differently in calculations.
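The generated statement for each text field would look something like this (using the table name from the question; the field name is an example):

UPDATE [ImportedRaw_T]
SET [Field1] = Null
WHERE [Field1] = "(NULL)";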
So I just noticed that one of my tables has been dropped/deleted. I am using MySQL - is there any way I can trace this or see logs of it?
I have no idea where to start looking.
If you have binary logging enabled, you could probably see the DROP TABLE command in the binlog. However, that won't help you get the table back, of course. You would need to restore your data from a backup.
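If it is enabled, something like this will show the statement (the log file name below is an example; SHOW BINARY LOGS lists the real names):

SHOW BINARY LOGS;
SHOW BINLOG EVENTS IN 'binlog.000001';  -- scan the output for the DROP TABLE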
I hope you are keeping regular backups.
The next question is, why has this happened? Is your site vulnerable to SQL injection? Building SQL queries/statements with 'literals' from an external source is inherently vulnerable.
For example: "select * from CUSTOMER where name='".cust_name."'", and all queries/statements like it, are vulnerable to attack.
Calling it from the web with cust_name=test';drop table CUSTOMER; supplies the closing apostrophe, a semicolon to complete the statement, and then a malicious command, all of which will be executed.
You should use parameterized queries and bind parameters explicitly.
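The pattern is the same in every client library, including Connector/C++ (sql::PreparedStatement); here is a sketch with MySQL Connector/NET, where the connection string and variables are placeholders:

using MySql.Data.MySqlClient;

using (var conn = new MySqlConnection(connectionString))
using (var cmd = new MySqlCommand(
    "SELECT * FROM CUSTOMER WHERE name = @name", conn))
{
    // The value is sent as data, never spliced into the SQL text,
    // so a crafted cust_name cannot change the statement.
    cmd.Parameters.AddWithValue("@name", custName);
    conn.Open();
    using (var reader = cmd.ExecuteReader())
    {
        // ... read rows ...
    }
}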
See: http://en.wikipedia.org/wiki/SQL_injection
The Slowly Changing Dimension task is not working in my data flow task, but if I remove it and use the OLE DB destination instead, the data is inserted. I have no idea why it is not inserting data into the table. The SCD and the leaf tasks also never turn yellow or green.
I have to do insert & update on the table based on two columns.
This is what I have tried:
When I use only the OLE DB destination - everything works fine.
When I do a Lookup and send the matched output to a command task (where I run the UPDATE command) and the no-match output to the OLE DB destination - nothing works.
When I use the SCD with two columns as the business key and another as a changing attribute - nothing works.
By "nothing works" I mean that these tasks don't even turn yellow or green.
Any idea what could be the reason for this ?
Even when you get it to work, you will realise that it's slow as hell and missing lots of configuration options.
Therefore I recommend this handy component initiated and recommended by the godfather of ETL & data warehousing, Ralph Kimball.
I personally will never return to the SSIS standard component. Try it, it's awesome.
I would like to implement a custom database initialization strategy so that I can:
generate the database if it does not exist
if the model changes, create only the new tables
if the model changes, create only the new fields, without dropping the table and losing the data.
Thanks in advance
You need to implement the IDatabaseInitializer interface.
E.g.
public class MyInitializer : IDatabaseInitializer<MyDbContext>
{
    public void InitializeDatabase(MyDbContext context)
    {
        // your logic here
    }
}
And then set your initializer at your application startup
Database.SetInitializer(new MyInitializer());
Here's an example
You will have to manually execute commands to alter the database.
context.Database.ExecuteSqlCommand("ALTER TABLE dbo.MyTable ADD NewColumn VARCHAR(20) NULL");
You can use a tool like SQL Compare to script changes.
There is a reason why this doesn't exist yet. It is very complex, and moreover the IDatabaseInitializer interface is not really prepared for it (there is no way to make such initialization database agnostic). Your question is "too broad" to be answered to your satisfaction. Judging by your reaction to @Eranga's correct answer, you simply expect that somebody will tell you step by step how to do it, but we will not - that would mean writing the initializer for you.
What do you need to do to achieve what you want?
You must have very good knowledge of SQL Server. You must know how SQL Server stores information about databases, tables, columns and relations - that is, you must understand the sys catalog views and know how to query them to get data about the current database structure.
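For instance, a query along these lines (plain T-SQL, nothing EF-specific) describes the current tables and columns:

SELECT t.name AS table_name,
       c.name AS column_name,
       ty.name AS type_name,
       c.max_length,
       c.is_nullable
FROM sys.tables t
JOIN sys.columns c ON c.object_id = t.object_id
JOIN sys.types ty ON ty.user_type_id = c.user_type_id
ORDER BY t.name, c.column_id;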
You must have very good knowledge of EF. You must know how EF stores mapping information, and you must be able to explore its metadata to get information about the expected tables, columns and relations.
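A rough sketch of that exploration with the EF 4.x metadata API (the context variable is assumed to be your DbContext instance):

using System;
using System.Data.Entity.Infrastructure;
using System.Data.Metadata.Edm;

var workspace = ((IObjectContextAdapter)context).ObjectContext.MetadataWorkspace;

// SSpace holds the store model - the tables and columns EF expects.
foreach (var entity in workspace.GetItems<EntityType>(DataSpace.SSpace))
{
    Console.WriteLine(entity.Name);
    foreach (var property in entity.Properties)
    {
        Console.WriteLine("  {0} : {1}", property.Name, property.TypeUsage.EdmType.Name);
    }
}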
Once you have the old database description and the new database description, you must be able to write code which will correctly work out the changes and create the SQL DDL commands to alter your database. Even though this looks like the simplest part of the whole process, it is actually the hardest one, because there are many internal rules in SQL Server which your commands cannot violate. Sometimes you really need to drop a table to make your changes, and if you don't want to lose the data you must first push it to a temporary table and, after recreating the table, push it back. Sometimes your changes to constraints require temporarily turning constraints off, etc. There is a good reason why the tools which do this at the SQL level (comparing two databases) are probably all commercial.
Even the ADO.NET team hasn't implemented this, and they will not implement it in the future. Instead they are working on something called migrations.
Edit:
It is true that ObjectContext can return a script for database creation - that is exactly what the default initializers use. But how would that help you? Are you going to parse the script to see what changed? Are you going to execute it on another connection and use the same code as for the current database to inspect its structure?
Yes, you can create a new database, move the data from the old database to the new one, delete the old one and rename the new one, but that is the most stupid solution you can imagine and no database administrator will ever allow it. Even that solution still requires an analysis of the changes to create correct data-transfer scripts.
Automatic upgrade is the wrong way to go. You should always prepare the upgrade script manually, with the help of some tools, test it, and after that execute it manually or as part of an installation script/package. You must also back up your database before making any changes.
The best way to achieve this is probably with migrations:
http://nuget.org/List/Packages/EntityFramework.SqlMigrations
Good blog posts here and here.