How can I speed up updating lots of rows - sql-server-2008

I have a table with 1,400,000 entries. It is a simple list of documents:
Table - Document
ID int
DocumentPath nvarchar
DocumentValid bit
I scan a directory and set any document found in the directory as valid.
public void SetReportsToValidated(List<int> validatedReports)
{
    SqlConnection myCon = null;
    try
    {
        myCon = new SqlConnection(_conn);
        myCon.Open();
        foreach (int id in validatedReports)
        {
            SqlDataAdapter myAdap = new SqlDataAdapter("update_DocumentValidated", myCon);
            myAdap.SelectCommand.CommandType = CommandType.StoredProcedure;
            SqlParameter pId = new SqlParameter("@Id", SqlDbType.Int);
            pId.Value = id;
            myAdap.SelectCommand.Parameters.Add(pId);
            myAdap.SelectCommand.ExecuteNonQuery();
        }
    }
    catch (SystemException ex)
    {
        _log.Error(ex);
        throw;
    }
    finally
    {
        if (myCon != null)
        {
            myCon.Close();
        }
    }
}
The update performance is OK, but I want more: it takes over an hour to mark 1,000,000 of the documents as valid. Is there a good way to speed up the updates? I am thinking of using some kind of batching (such as table-valued parameters).
Each individual update takes some 5-10 ms when profiled on SQL Server.

Read the reports in and append them together into a single DataTable (since they have the same dimensions), then use the SqlBulkCopy object to upload the entire thing. It will probably work better for you. I don't think you will have memory issues, given the small number of columns and rows.

At the moment you are calling the database for each record individually. Instead, you can use a SqlDataAdapter to do bulk updates by (in a very brief nutshell; a sketch follows the links below):
1) define one SqlDataAdapter
2) set the .UpdateCommand on the adapter to your update sproc
3) call the .Update method on the adapter, passing it a DataTable containing the IDs of the documents to be updated. This batches the updated rows from the DataTable into the database, calling the sproc for each record in a batched manner. You can control the batch size via the .UpdateBatchSize property.
4) what you're doing is removing the manual, row-by-row looping, which is inefficient for batched updates.
See examples:
http://support.microsoft.com/kb/308055
http://www.c-sharpcorner.com/UploadFile/61b832/4430/
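A minimal sketch of that adapter-based approach, under the assumption that the sproc is the question's update_DocumentValidated taking a single @Id parameter (untested):

// Sketch: batched updates via SqlDataAdapter.UpdateCommand.
var idTable = new DataTable();
idTable.Columns.Add("Id", typeof(int));
foreach (int id in validatedReports)
{
    DataRow row = idTable.Rows.Add(id);
    row.AcceptChanges();
    row.SetModified();      // mark as changed so Update() sends it via UpdateCommand
}

using (var con = new SqlConnection(_conn))
using (var adapter = new SqlDataAdapter())
{
    var update = new SqlCommand("update_DocumentValidated", con);
    update.CommandType = CommandType.StoredProcedure;
    update.Parameters.Add("@Id", SqlDbType.Int, 4, "Id");   // bound to the DataTable column
    update.UpdatedRowSource = UpdateRowSource.None;         // required for batching
    adapter.UpdateCommand = update;
    adapter.UpdateBatchSize = 500;                          // 500 sproc calls per round trip
    adapter.Update(idTable);                                // opens/closes the connection itself
}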
Alternatively, you could (sketched after the links below):
1) use SqlBulkCopy to bulk insert all the IDs into a new staging table in the database (highly efficient)
2) once they are loaded into that staging table, run a single SQL statement to update your main table from the staging table and mark the documents as valid.
See examples:
http://www.adathedev.co.uk/2010/02/sqlbulkcopy-bulk-load-to-sql-server.html
http://www.adathedev.co.uk/2011/01/sqlbulkcopy-to-sql-server-in-parallel.html
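And a rough sketch of that staging-table option; the staging table name ValidatedReportIds is a placeholder, not something from the question:

// Sketch: bulk load the IDs into a staging table, then do one set-based UPDATE.
var idTable = new DataTable();
idTable.Columns.Add("Id", typeof(int));
foreach (int id in validatedReports)
    idTable.Rows.Add(id);

using (var con = new SqlConnection(_conn))
{
    con.Open();

    // 1) Push all IDs to the staging table in one shot.
    using (var bulk = new SqlBulkCopy(con))
    {
        bulk.DestinationTableName = "dbo.ValidatedReportIds";
        bulk.WriteToServer(idTable);
    }

    // 2) One statement validates every matching document.
    var update = new SqlCommand(@"
        UPDATE d
        SET    d.DocumentValid = 1
        FROM   dbo.Document d
        JOIN   dbo.ValidatedReportIds v ON v.Id = d.ID;
        TRUNCATE TABLE dbo.ValidatedReportIds;", con);
    update.ExecuteNonQuery();
}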

Instead of creating the adapter and the parameter on every iteration of the loop, create them once and just assign a different value to the parameter:
SqlDataAdapter myAdap = new SqlDataAdapter("update_DocumentValidated", myCon);
myAdap.SelectCommand.CommandType = CommandType.StoredProcedure;
SqlParameter pId = new SqlParameter("@Id", SqlDbType.Int);
myAdap.SelectCommand.Parameters.Add(pId);
foreach (int id in validatedReports)
{
    myAdap.SelectCommand.Parameters[0].Value = id;
    myAdap.SelectCommand.ExecuteNonQuery();
}
This might not be a dramatic improvement, but it is better than the original code. Also, since you are executing the command manually, you do not need the adapter at all; just use a SqlCommand directly.
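For example, a minimal sketch using SqlCommand directly; wrapping the loop in one explicit transaction is my addition (it cuts the per-statement commit overhead), not part of the answer above:

using (var con = new SqlConnection(_conn))
{
    con.Open();
    using (SqlTransaction tran = con.BeginTransaction())
    using (var cmd = new SqlCommand("update_DocumentValidated", con, tran))
    {
        cmd.CommandType = CommandType.StoredProcedure;
        SqlParameter pId = cmd.Parameters.Add("@Id", SqlDbType.Int);

        foreach (int id in validatedReports)
        {
            pId.Value = id;          // reuse the same command and parameter
            cmd.ExecuteNonQuery();
        }
        tran.Commit();               // one commit instead of one per row
    }
}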

How to execute multiple statements with variables in a C# OdbcCommand object

I want to execute the MySQL queries below in a single call through an OdbcCommand object in C#, as a dynamic query, but it always fails:
SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
set @row=0;
select * from
(
  select @row:=@row+1 as my____row_num,
         cities.`cityid`,
         cities.`cityname`,
         cities.`countryid`,
         cities.`countryname`,
         '1' as my____data_row_created,
         '1' as my____data_row_updated
  from `cities`
) p
where my____row_num>=101 and my____row_num<=200;
SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;
I'm using the method below to execute the queries above:
DataTable ExcuteCommand(string Sql)
{
    DataTable dt = new DataTable();
    OdbcCommand SQLCommand = new OdbcCommand(Sql);
    OdbcConnection Con = new OdbcConnection(ConnectionString);
    try
    {
        Con.Open();
        SQLCommand.Connection = Con;
        OdbcDataAdapter da = new OdbcDataAdapter(SQLCommand);
        da.Fill(dt);
        Con.Close();
        Con.Dispose();
    }
    catch
    {
        try
        {
            Con.Close();
        }
        catch { }
        throw;
    }
    return dt;
}
I found the solution from here. When executing multiple dynamic MySQL statements through ODBC in C#, we have two options:
Execute every command separately
Use stored procedures
In my case I'm bound to using dynamic queries because I only have read access on the database.
Solution:
Rather than declaring the variable and setting it in a separate statement, I used another technique: initialize the session variable in a derived table and cross join it with the main table. In my scenario I changed to the MySQL query below and removed both SET SESSION statements, and it worked properly:
select * from
(
  select @row:=@row+1 as my____row_num,
         cities.`cityid`,
         cities.`cityname`,
         cities.`countryid`,
         cities.`countryname`,
         '1' as my____data_row_created,
         '1' as my____data_row_updated
  from `cities`, (select @row:=0) as t
) p
where my____row_num>=101 and my____row_num<=200;
I'm not going to attempt to solve your MySQL problem, but your C# code can and should be written better, and since comments are not suited for code, I thought I'd better write this as an answer.
So here is an improvement to your C# part:
DataTable FillDataTable(string sql)
{
    var dataTable = new DataTable();
    using (var con = new OdbcConnection(ConnectionString))
    {
        using (var command = new OdbcCommand(sql, con))
        {
            using (var dataAdapter = new OdbcDataAdapter(command))
            {
                dataAdapter.Fill(dataTable);
            }
        }
    }
    return dataTable;
}
Points of interest:
I've renamed your method to a more descriptive name. ExcuteCommand doesn't say anything about what the method does; FillDataTable is self-explanatory.
The using statement ensures the disposal of instances implementing the IDisposable interface, and almost all ADO.NET classes implement it.
Disposing an OdbcConnection also closes it, so you don't need to close it explicitly yourself.
There is no point in catching exceptions if you are not doing anything with them. The rule of thumb is to throw early and catch late (in practice, catch as soon as you can do something about the exception, such as write to a log, show a message to the user, retry, etc.).
Data adapters implicitly open the connection, so there is no need to open it explicitly.
Two other improvements you can make are:
Have the method also accept parameters.
Have the method also accept the CommandType as a parameter (currently, calling a stored procedure through it will not work, since the default value of CommandType is Text).
So, an even better version would be this:
DataTable FillDataTable(string sql, CommandType commandType, params OdbcParameter[] parameters)
{
    var dataTable = new DataTable();
    using (var con = new OdbcConnection(ConnectionString))
    {
        using (var command = new OdbcCommand(sql, con))
        {
            command.CommandType = commandType;
            command.Parameters.AddRange(parameters);
            using (var dataAdapter = new OdbcDataAdapter(command))
            {
                dataAdapter.Fill(dataTable);
            }
        }
    }
    return dataTable;
}
If you want to improve this even further, you can have a look at my ADONETHelper project on GitHub. There I have a single private method for execution, and the methods for filling data tables, filling data sets, executing non-queries, etc. all use that single method.
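For illustration only, a minimal sketch of what such a single-execute-method pattern could look like - this is not the actual ADONETHelper code, just the general idea:

// Sketch: one private method owns the connection/command lifecycle; public
// methods only differ in what they do with the prepared OdbcCommand.
T Execute<T>(string sql, CommandType commandType, OdbcParameter[] parameters,
             Func<OdbcCommand, T> action)
{
    using (var con = new OdbcConnection(ConnectionString))
    using (var command = new OdbcCommand(sql, con))
    {
        command.CommandType = commandType;
        if (parameters != null)
        {
            command.Parameters.AddRange(parameters);
        }
        return action(command);
    }
}

DataTable FillDataTable(string sql, CommandType commandType, params OdbcParameter[] parameters)
{
    return Execute(sql, commandType, parameters, command =>
    {
        var dataTable = new DataTable();
        using (var dataAdapter = new OdbcDataAdapter(command))
        {
            dataAdapter.Fill(dataTable);   // Fill opens the connection itself
        }
        return dataTable;
    });
}

int ExecuteNonQuery(string sql, CommandType commandType, params OdbcParameter[] parameters)
{
    return Execute(sql, commandType, parameters, command =>
    {
        command.Connection.Open();         // unlike Fill, ExecuteNonQuery needs an open connection
        return command.ExecuteNonQuery();
    });
}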
Would you please try this instead:
declare @row int
set @row=0;
select * from
(
  select SUM(@row,1) as my____row_num,
         cities.cityid as CityID,
         cities.cityname as CityName,
         cities.countryid as CountryID,
         cities.countryname as CountryName,
         '1' as my____data_row_created,
         '1' as my____data_row_updated   -- I did not understand the meaning of this
  from cities
)
where (my____row_num BETWEEN 100 AND 200)
Back end:
DataTable ExcuteCommand(string Sql)
{
    // Added: take the connection string from configuration (or from the connection
    // created via the project properties).
    ConnectionString = ConfigurationManager.ConnectionStrings["YourDataBaseLocation_OR_theConnectionCreatedViaProperties"].ConnectionString;
    DataTable dt = new DataTable();
    // Removed: OdbcCommand SQLCommand = new OdbcCommand(Sql);
    // You need to pass the connection you are using to the command, and if you call a
    // stored procedure also set SQLCommand.CommandType = CommandType.StoredProcedure.
    OdbcConnection Con = new OdbcConnection(ConnectionString);
    OdbcCommand SQLCommand = new OdbcCommand(Sql, Con);   // added: the command now gets the connection
    try
    {
        Con.Open();
        SQLCommand.Connection = Con;
        OdbcDataAdapter da = new OdbcDataAdapter(SQLCommand);
        da.Fill(dt);
        SQLCommand.ExecuteNonQuery();                     // added
        Con.Close();
        // Removed: Con.Dispose();
    }
    catch
    {
        try
        {
            Con.Close();
        }
        catch { }
        throw;
    }
    return dt;
}

SqlDependency and table update do not refresh DataContext

I'm having trouble with the implementation of SqlDependency in my project.
I'm using SqlDependency in a WCF service. The WCF service holds all results from all tables in an in-memory cache in order to get a big speed gain. Everything seems to work fine except when I update a table row. If I add or delete a row in a table, the DataContext is refreshed and the cache is invalidated without problems. But when it comes to a row update, nothing happens: the cache is not invalidated, and when I look at the contents of the DataContext in debug mode, no changes appear there.
Here's the code I'm using (note that I'm using the System.Runtime.Caching objects):
public static List<T> LinqCache<T>(this Table<T> query) where T : class
{
    ObjectCache cache = MemoryCache.Default;
    string tableName = query.Context.Mapping.GetTable(typeof(T)).TableName;
    List<T> result = cache[tableName] as List<T>;
    if (result == null)
    {
        using (SqlConnection conn =
            new SqlConnection(query.Context.Connection.ConnectionString))
        {
            conn.Open();
            SqlCommand cmd = new SqlCommand(
                query.Context.GetCommand(query).CommandText, conn);
            cmd.Notification = null;
            cmd.NotificationAutoEnlist = true;
            SqlDependency dependency = new SqlDependency(cmd);
            SqlChangeMonitor sqlMonitor = new SqlChangeMonitor(dependency);
            CacheItemPolicy policy = new CacheItemPolicy();
            policy.ChangeMonitors.Add(sqlMonitor);
            cmd.ExecuteNonQuery();
            result = query.ToList();
            cache.Set(tableName, result, policy);
        }
    }
    return result;
}
I created an extension method, so all I have to do to query any table is:
List<MyTable> list = context.MyTable.LinqCache();
My DataContext is opened in Global.asax Application_OnStart and stored in the cache, so I can use it whenever I want in my WCF service. At that point I also start SqlDependency with:
SqlDependency.Start(
ConfigurationManager.ConnectionStrings[myConnectionString].ConnectionString);
So, is that a limitation of SqlDependency, or am I doing something wrong / missing something in the process?
I think the problem is that although you do all the work of setting up the command object, you then do:
cmd.ExecuteNonQuery();
result = query.ToList();
This uses your SqlCommand and throws away the results, and then LINQ to SQL generates its own command internally via query.ToList(). Thankfully, you can ask LINQ to SQL to execute your own command and translate the results for you, so try replacing those two lines with:
result = query.Context.Translate<T>(cmd.ExecuteReader()).ToList();
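In the context of the LinqCache<T> extension method above, the end of the using block would then look roughly like this (a sketch; the added .ToList() is needed because Translate returns an IEnumerable<T>):

SqlDependency dependency = new SqlDependency(cmd);
SqlChangeMonitor sqlMonitor = new SqlChangeMonitor(dependency);
CacheItemPolicy policy = new CacheItemPolicy();
policy.ChangeMonitors.Add(sqlMonitor);

// Execute the command the dependency is attached to and let LINQ to SQL
// materialize the entities from its reader, instead of throwing the reader
// away and running a second, unsubscribed query.
result = query.Context.Translate<T>(cmd.ExecuteReader()).ToList();
cache.Set(tableName, result, policy);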

How do I get the next identity ID before Submit in Linq-to-Sql?

I want to get the next identity ID and then log it somewhere. Only after this do I want to call SubmitChanges().
Wrap the DataContext in a database transaction and call SubmitChanges to write the changes to the database within that transaction. That way you can get the auto-generated ID while keeping the operation transactional:
using (var con = new SqlConnection(conStr))
{
con.Open();
using (var tran = con.BeginTransaction())
{
using (var db = new YourDataContext(con))
{
// Setting the transaction is needed in .NET 3.5.
// It's a bug in L2S and was fixed in .NET 4.0.
db.Transaction = tran;
var entity = new MyEntity();
db.MyEntities.InsertOnSubmit(entity);
db.SubmitChanges();
var id = entity.Id;
// Do something useful with this id
}
tran.Commit();
}
}
Wrap the whole thing in a transaction, do a SELECT IDENT_CURRENT('table_name'), then submit your changes and commit the transaction. If you lock the table, that should prevent someone else from inserting a record between your SELECT IDENT_CURRENT and your insert, which should give you the correct identity value.
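A rough sketch of that approach; the table name and connection string are placeholders, and the explicit table lock is what makes reading IDENT_CURRENT ahead of the insert safe, at the cost of blocking other writers:

using (var con = new SqlConnection(conStr))
{
    con.Open();
    using (SqlTransaction tran = con.BeginTransaction())
    {
        // Take an exclusive table lock for the duration of the transaction so no
        // one else can insert between reading the identity and our own insert.
        using (var lockCmd = new SqlCommand(
            "SELECT TOP (1) 1 FROM dbo.MyEntities WITH (TABLOCKX, HOLDLOCK);", con, tran))
        {
            lockCmd.ExecuteNonQuery();
        }

        // Next identity value = last issued value + the increment.
        long nextId;
        using (var idCmd = new SqlCommand(
            "SELECT IDENT_CURRENT('dbo.MyEntities') + IDENT_INCR('dbo.MyEntities');", con, tran))
        {
            nextId = Convert.ToInt64(idCmd.ExecuteScalar());
        }

        // ... log nextId, then do the insert (e.g. via a DataContext enlisted in
        // this transaction, as in the answer above) and finally:
        tran.Commit();
    }
}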
You could also add a LastModified date column to the table and read that back, rather than relying on the incremented identity column. Otherwise it is better to call SubmitChanges() first and then get the newly generated ID.

Do MERGE using Linq to SQL

SQL Server 2008 Ent
ASP.NET MVC 2.0
Linq-to-SQL
I am building a gaming site that tracks when a particular player (toon) has downed a particular monster (boss). The table looks something like this:
int ToonId
int BossId
datetime LastKillTime
I use a 3rd-party service that gives me back the latest information (toon, boss, time).
Now I want to update my database with that new information.
The brute-force approach is to do a line-by-line upsert, but that looks ugly (code-wise) and is probably slow too.
I think a better solution would be to insert the new data (using a temp table?) and then run a MERGE statement.
Is that a good idea? I know temp tables are "better to avoid". Should I create a permanent "temp" table just for this operation?
Or should I just read the entire current set (100 rows at most) into the application, do the merge there and write it back?
Any pointers/suggestions are always appreciated.
An ORM is the wrong tool for performing batch operations, and Linq-to-SQL is no exception. In this case I think you have picked the right solution: store all entries in a temporary table quickly, then do the upsert using MERGE.
The fastest way to get the data into the temporary table is to use SqlBulkCopy to write all of it to a table of your choice; a sketch follows below.
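A minimal sketch of that approach, assuming a staging table dbo.KillStaging with the same three columns as the main table (all object and variable names here are placeholders):

// Build a DataTable matching the staging table's shape.
var staging = new DataTable();
staging.Columns.Add("ToonId", typeof(int));
staging.Columns.Add("BossId", typeof(int));
staging.Columns.Add("LastKillTime", typeof(DateTime));
foreach (var line in linesFromService)
    staging.Rows.Add(line.ToonId, line.BossId, line.LastKillTime);

using (var con = new SqlConnection(connectionString))
{
    con.Open();

    // 1) Bulk load the latest data into the staging table.
    using (var bulk = new SqlBulkCopy(con) { DestinationTableName = "dbo.KillStaging" })
    {
        bulk.WriteToServer(staging);
    }

    // 2) One MERGE upserts everything into the main table.
    var merge = new SqlCommand(@"
        MERGE dbo.Kills AS target
        USING dbo.KillStaging AS source
            ON target.ToonId = source.ToonId AND target.BossId = source.BossId
        WHEN MATCHED THEN
            UPDATE SET target.LastKillTime = source.LastKillTime
        WHEN NOT MATCHED THEN
            INSERT (ToonId, BossId, LastKillTime)
            VALUES (source.ToonId, source.BossId, source.LastKillTime);
        TRUNCATE TABLE dbo.KillStaging;", con);
    merge.ExecuteNonQuery();
}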
If you're using Linq-to-SQL, upserts aren't that ugly:
foreach (var line in linesFromService) {
    var kill = db.Kills.FirstOrDefault(t => t.ToonId == line.ToonId && t.BossId == line.BossId);
    if (kill == null) {
        kill = new Kills() { ToonId = line.ToonId, BossId = line.BossId };
        db.Kills.InsertOnSubmit(kill);
    }
    kill.LastKillTime = line.LastKillTime;
}
db.SubmitChanges();
Not a work of art, but nicer than in SQL. Also, with only 100 rows, I wouldn't be too concerned about performance.
Looks like a straightforward insert.
private ToonModel _db = new ToonModel();

Toon t = new Toon();
t.ToonId = 1;
t.BossId = 2;
t.LastKillTime = DateTime.Now;
_db.Toons.InsertOnSubmit(t);
_db.SubmitChanges();
To update without querying the records first, you can do the following. It will still hit the database once to check whether the record exists, but it will not pull the record:
var blob = new Blob { Id = "some id", Value = "some value" }; // Id is the primary key (PK)
if (dbContext.Blobs.Contains(blob)) // if the blob exists by PK, update it
{
    // This will update all columns that are not set in the 'original' object. For
    // this to work, Blob has to have UpdateCheck=Never on all properties except
    // the primary keys. This updates the record without querying it first.
    dbContext.Blobs.Attach(blob, original: new Blob { Id = blob.Id });
}
else // insert
{
    dbContext.Blobs.InsertOnSubmit(blob);
}
dbContext.SubmitChanges();
See here for an extension method for this.

speed up sql INSERTs

I have the following method for inserting millions of rows of data into a table (I use SQL Server 2008) and it seems slow. Is there any way to speed up the INSERTs?
Here is the code snippet - I use the Microsoft Enterprise Library:
public void InsertHistoricData(List<DataRow> dataRowList)
{
    string sql = string.Format(@"INSERT INTO [MyTable] ([Date],[Open],[High],[Low],[Close],[Volumn])
        VALUES( @DateVal, @OpenVal, @High, @Low, @CloseVal, @Volumn )");
    DbCommand dbCommand = VictoriaDB.GetSqlStringCommand(sql);
    DB.AddInParameter(dbCommand, "DateVal", DbType.Date);
    DB.AddInParameter(dbCommand, "OpenVal", DbType.Currency);
    DB.AddInParameter(dbCommand, "High", DbType.Currency);
    DB.AddInParameter(dbCommand, "Low", DbType.Currency);
    DB.AddInParameter(dbCommand, "CloseVal", DbType.Currency);
    DB.AddInParameter(dbCommand, "Volumn", DbType.Int32);
    foreach (NasdaqHistoricDataRow dataRow in dataRowList)
    {
        DB.SetParameterValue(dbCommand, "DateVal", dataRow.Date);
        DB.SetParameterValue(dbCommand, "OpenVal", dataRow.Open);
        DB.SetParameterValue(dbCommand, "High", dataRow.High);
        DB.SetParameterValue(dbCommand, "Low", dataRow.Low);
        DB.SetParameterValue(dbCommand, "CloseVal", dataRow.Close);
        DB.SetParameterValue(dbCommand, "Volumn", dataRow.Volumn);
        DB.ExecuteNonQuery(dbCommand);
    }
}
Consider using bulk insert instead.
SqlBulkCopy lets you efficiently bulk load a SQL Server table with data from another source. The SqlBulkCopy class can be used to write data only to SQL Server tables. However, the data source is not limited to SQL Server; any data source can be used, as long as the data can be loaded to a DataTable instance or read with an IDataReader instance. For this example the file will contain roughly 1000 records, but this code can handle large amounts of data.
This example first creates a DataTable and fills it with the data. This is kept in memory.
DataTable dt = new DataTable();
string line = null;
bool firstRow = true;

using (StreamReader sr = File.OpenText(@"c:\temp\table1.csv"))
{
    while ((line = sr.ReadLine()) != null)
    {
        string[] data = line.Split(',');
        if (data.Length > 0)
        {
            if (firstRow)
            {
                foreach (var item in data)
                {
                    dt.Columns.Add(new DataColumn());
                }
                firstRow = false;
            }
            DataRow row = dt.NewRow();
            row.ItemArray = data;
            dt.Rows.Add(row);
        }
    }
}
Then we push the DataTable to the server in one go.
using (SqlConnection cn = new SqlConnection(ConfigurationManager.ConnectionStrings["ConsoleApplication3.Properties.Settings.daasConnectionString"].ConnectionString))
{
    cn.Open();
    using (SqlBulkCopy copy = new SqlBulkCopy(cn))
    {
        copy.ColumnMappings.Add(0, 0);
        copy.ColumnMappings.Add(1, 1);
        copy.ColumnMappings.Add(2, 2);
        copy.ColumnMappings.Add(3, 3);
        copy.ColumnMappings.Add(4, 4);
        copy.DestinationTableName = "Censis";
        copy.WriteToServer(dt);
    }
}
One general tip for any relational database when doing a large number of inserts, or indeed any data change, is to drop all your secondary indexes first and then recreate them afterwards.
Why does this work? With secondary indexes, the index data lives elsewhere on disk from the table data, so each record written to the table forces, at best, an additional read/write to update each index. In fact it may be much worse than this, because from time to time the database will decide it needs to carry out a more serious reorganisation of the index.
When you recreate the index at the end of the insert run, the database performs just one full table scan to read and process the data. Not only do you end up with a better organised index on disk, but the total amount of work required is less.
When is this worthwhile? That depends on your database, your index structure and other factors (such as whether your indexes are on a separate disk from your data), but my rule of thumb is to consider it if I am processing more than 10% of the records in a table of a million records or more - and then check with test inserts to see whether it is worthwhile.
Of course on any particular database there will be specialist bulk insert routines, and you should also look at those.
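As an illustration of the drop-and-recreate pattern around a bulk load - the index and table names are made up for the example, dt is the DataTable built earlier, and in practice you would script out the real index definitions first:

using (var con = new SqlConnection(connectionString))
{
    con.Open();

    // Drop the secondary (non-clustered) index before the heavy insert.
    var drop = new SqlCommand("DROP INDEX IX_Price_Date ON dbo.MyTable;", con);
    drop.ExecuteNonQuery();

    // Load the data in bulk.
    using (var bulk = new SqlBulkCopy(con) { DestinationTableName = "dbo.MyTable" })
    {
        bulk.WriteToServer(dt);
    }

    // Rebuild the index in a single pass over the table.
    var create = new SqlCommand(
        "CREATE NONCLUSTERED INDEX IX_Price_Date ON dbo.MyTable ([Date]);", con);
    create.CommandTimeout = 0;   // index builds on big tables can take a while
    create.ExecuteNonQuery();
}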
FYI - looping through a record set and doing a million-plus single-row inserts into a relational database is the worst-case scenario for loading a table. Some languages now offer record-set objects. For the fastest performance SMINK is right: use BULK INSERT. Millions of rows loaded in minutes rather than hours, orders of magnitude faster than any other method.
As an example, I worked on an eCommerce project that required a product list refresh each night. 100,000 rows inserted into a high-end Oracle DB took 10 hours. If I remember correctly, the top speed when doing row-by-row inserts is approximately 10 records/sec - painfully slow and completely unnecessary. With bulk insert, 100K rows should take less than a minute.
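For reference, a minimal sketch of issuing a T-SQL BULK INSERT from C#; the file path, table name and terminators are placeholders, and the file must be readable by the SQL Server service account:

using (var con = new SqlConnection(connectionString))
{
    con.Open();
    var cmd = new SqlCommand(@"
        BULK INSERT dbo.MyTable
        FROM 'C:\temp\table1.csv'
        WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', TABLOCK);", con);
    cmd.CommandTimeout = 0;   // large loads can exceed the default 30 seconds
    cmd.ExecuteNonQuery();
}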
Hope this helps.
Where does the data come from? Could you run a bulk insert? If so, that is the best option you could take.