SQL Server 2008 Enterprise
ASP.NET MVC 2.0
Linq-to-SQL
I am building a gaming site that tracks when a particular player (toon) has downed a particular monster (boss). The table looks something like:
int ToonId
int BossId
datetime LastKillTime
I use a third-party service that gives me back the latest information (toon, boss, time).
Now I want to update my database with that new information.
The brute-force approach is a row-by-row upsert, but it looks ugly (code-wise) and is probably slow too.
I think a better solution would be to insert the new data (into a temp table?) and then run a MERGE statement.
Is that a good idea? I know temp tables are considered "better to avoid". Should I create a permanent "temp" (staging) table just for this operation?
Or should I just read the entire current set (100 rows at most), do the merge, and write it back from within the application?
Any pointers/suggestions are always appreciated.
An ORM is the wrong tool for performing batch operations, and Linq-to-SQL is no exception. In this case I think you have picked the right solution: store all the entries in a temporary (staging) table quickly, then do the UPSERT using MERGE.
The fastest way to get the data into that table is SqlBulkCopy, which bulk-loads all the rows into a table of your choice in one go.
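A minimal sketch of that approach, assuming a target table dbo.Kills and a permanent staging table dbo.KillsStaging with the same three columns (both names are illustrative), and a DataTable already filled from the third-party feed:

using System.Data;
using System.Data.SqlClient;

// Bulk-copy the latest (toon, boss, time) rows into a staging table, then MERGE.
public void UpsertKills(DataTable latestKills, string connectionString)
{
    using (var cn = new SqlConnection(connectionString))
    {
        cn.Open();

        using (var copy = new SqlBulkCopy(cn))
        {
            copy.DestinationTableName = "dbo.KillsStaging"; // assumed staging table
            copy.WriteToServer(latestKills);
        }

        const string merge = @"
            MERGE dbo.Kills AS target
            USING dbo.KillsStaging AS source
                ON target.ToonId = source.ToonId AND target.BossId = source.BossId
            WHEN MATCHED THEN
                UPDATE SET target.LastKillTime = source.LastKillTime
            WHEN NOT MATCHED THEN
                INSERT (ToonId, BossId, LastKillTime)
                VALUES (source.ToonId, source.BossId, source.LastKillTime);

            TRUNCATE TABLE dbo.KillsStaging;";

        using (var cmd = new SqlCommand(merge, cn))
        {
            cmd.ExecuteNonQuery();
        }
    }
}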
If you're using Linq-to-SQL, upserts aren't that ugly:
foreach (var line in linesFromService)
{
    var kill = db.Kills.FirstOrDefault(t => t.ToonId == line.ToonId && t.BossId == line.BossId);
    if (kill == null)
    {
        kill = new Kills() { ToonId = line.ToonId, BossId = line.BossId };
        db.Kills.InsertOnSubmit(kill);
    }
    kill.LastKillTime = line.LastKillTime;
}
db.SubmitChanges();
Not a work of art, but nicer than in SQL. Also, with only 100 rows, I wouldn't be too concerned about performance.
Looks like a straightforward insert.
private ToonModel _db = new ToonModel();
Toon t = new Toon();
t.ToonId = 1;
t.BossId = 2;
t.LastKillTime = DateTime.Now;
_db.Toons.InsertOnSubmit(t);
_db.SubmitChanges();
To update without querying the record first, you can do the following. It will still hit the DB once to check whether the record exists, but it will not pull the record:
var blob = new Blob { Id = "some id", Value = "some value" }; // Id is primary key (PK)
if (dbContext.Blobs.Contains(blob)) // if blob exists by PK then update
{
// This will update all columns that are not set in 'original' object. For
// this to work, Blob has to have UpdateCheck=Never for all properties except
// for primary keys. This will update the record without querying it first.
dbContext.Blobs.Attach(blob, original: new Blob { Id = blob.Id });
}
else // insert
{
dbContext.Blobs.InsertOnSubmit(blob);
}
dbContext.SubmitChanges();
See here for an extension method for this.
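The linked extension method isn't reproduced here, but a minimal sketch of that kind of helper might look like this (the name InsertOrUpdateOnSubmit and the originalFactory parameter are illustrative, not from the original post):

using System;
using System.Linq;
using System.Data.Linq;

public static class TableExtensions
{
    // Upsert helper: if the entity already exists (checked by primary key), attach it
    // for an update without pulling the row; otherwise insert it. As noted above, this
    // requires UpdateCheck=Never on all non-key members.
    public static void InsertOrUpdateOnSubmit<TEntity>(
        this Table<TEntity> table, TEntity entity, Func<TEntity> originalFactory)
        where TEntity : class
    {
        if (table.Contains(entity))
            table.Attach(entity, originalFactory());
        else
            table.InsertOnSubmit(entity);
    }
}

Usage would then be dbContext.Blobs.InsertOrUpdateOnSubmit(blob, () => new Blob { Id = blob.Id }); followed by dbContext.SubmitChanges();.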
I want to join two tables in an HTML5 IndexedDB database.
I found lots of samples for adding, updating, deleting and listing records, but I can't find any samples for joining multiple tables.
Sample URL:
http://users.telenet.be/kristofdegrave/
I was wondering the same thing when just generally looking at these technologies. I did not check your particular example, but here is an example from Mozilla on how to do a join if you need it:
https://hacks.mozilla.org/2010/06/comparing-indexeddb-and-webdatabase/
It is quite a lot of code compared to SQL, but I think the idea is that abstracting it away in JavaScript later is quite easy.
There is no join, but you can open multiple object stores and join during cursor iterations.
Using my own ydn-db library, you can run the query SELECT * FROM Supplier, Part WHERE Supplier.CITY = Part.CITY like this:
var iter_supplier = new ydn.db.IndexValueIterator('Supplier', 'CITY');
var iter_part = new ydn.db.IndexValueIterator('Part', 'CITY');
var req = db.scan(function(keys, values) {
  var SID = keys[0];
  var PID = keys[1];
  console.log(SID, PID);
  if (!SID || !PID) {
    return []; // done
  }
  var cmp = ydn.db.cmp(SID, PID); // compare keys
  if (cmp == 0) {
    console.log(values[0], values[1]);
    return [true, true]; // advance both cursors
  } else if (cmp == 1) {
    return [undefined, SID]; // jump the PID (Part) cursor to match SID
  } else {
    return [PID, undefined]; // jump the SID (Supplier) cursor to match PID
  }
}, [iter_supplier, iter_part]);
See the Join query article for more detail.
As far as I know, IndexedDB does not have an API for doing JOINs, yet. The solutions I adopted involved opening a cursor, looping through the results and doing the join manually. It's the dreadful RBAR approach, but I couldn't find a better solution.
Instead of trying to join, redesign how you store the data so that a join is not required. You do not need to follow the same normalization constraints as SQL when you are using a no-sql approach. No-sql supports storing data redundantly when it is appropriate.
IndexedDB is in fact an object store, and normally in an object store there is less need for joins because you can just save the nested structure.
In the case you show, data from a code table gets joined with real data. The way to solve this without a join is to fetch your code table into memory (normally this data never changes) and do the join in code.
I have the following method to insert millions of rows of data into a table (I use SQL Server 2008) and it seems slow. Is there any way to speed up the INSERTs?
Here is the code snippet; I use the Microsoft Enterprise Library:
public void InsertHistoricData(List<DataRow> dataRowList)
{
    string sql = @"INSERT INTO [MyTable] ([Date],[Open],[High],[Low],[Close],[Volumn])
                   VALUES( @DateVal, @OpenVal, @High, @Low, @CloseVal, @Volumn )";

    DbCommand dbCommand = VictoriaDB.GetSqlStringCommand( sql );
    DB.AddInParameter(dbCommand, "DateVal", DbType.Date);
    DB.AddInParameter(dbCommand, "OpenVal", DbType.Currency);
    DB.AddInParameter(dbCommand, "High", DbType.Currency);
    DB.AddInParameter(dbCommand, "Low", DbType.Currency);
    DB.AddInParameter(dbCommand, "CloseVal", DbType.Currency);
    DB.AddInParameter(dbCommand, "Volumn", DbType.Int32);

    foreach (NasdaqHistoricDataRow dataRow in dataRowList)
    {
        DB.SetParameterValue( dbCommand, "DateVal", dataRow.Date );
        DB.SetParameterValue( dbCommand, "OpenVal", dataRow.Open );
        DB.SetParameterValue( dbCommand, "High", dataRow.High );
        DB.SetParameterValue( dbCommand, "Low", dataRow.Low );
        DB.SetParameterValue( dbCommand, "CloseVal", dataRow.Close );
        DB.SetParameterValue( dbCommand, "Volumn", dataRow.Volumn );
        DB.ExecuteNonQuery( dbCommand );
    }
}
Consider using bulk insert instead.
SqlBulkCopy lets you efficiently bulk load a SQL Server table with data from another source. The SqlBulkCopy class can be used to write data only to SQL Server tables. However, the data source is not limited to SQL Server; any data source can be used, as long as the data can be loaded to a DataTable instance or read with an IDataReader instance. For this example the file will contain roughly 1000 records, but this code can handle large amounts of data.
This example first creates a DataTable and fills it with the data. This is kept in memory.
DataTable dt = new DataTable();
string line = null;
bool firstRow = true;

using (StreamReader sr = File.OpenText(@"c:\temp\table1.csv"))
{
    while ((line = sr.ReadLine()) != null)
    {
        string[] data = line.Split(',');
        if (data.Length > 0)
        {
            if (firstRow)
            {
                foreach (var item in data)
                {
                    dt.Columns.Add(new DataColumn());
                }
                firstRow = false;
            }

            DataRow row = dt.NewRow();
            row.ItemArray = data;
            dt.Rows.Add(row);
        }
    }
}
Then we push the DataTable to the server in one go.
using (SqlConnection cn = new SqlConnection(ConfigurationManager.ConnectionStrings["ConsoleApplication3.Properties.Settings.daasConnectionString"].ConnectionString))
{
    cn.Open();
    using (SqlBulkCopy copy = new SqlBulkCopy(cn))
    {
        copy.ColumnMappings.Add(0, 0);
        copy.ColumnMappings.Add(1, 1);
        copy.ColumnMappings.Add(2, 2);
        copy.ColumnMappings.Add(3, 3);
        copy.ColumnMappings.Add(4, 4);
        copy.DestinationTableName = "Censis";
        copy.WriteToServer(dt);
    }
}
One general tip for any relational database when doing a large number of inserts, or indeed any data change, is to drop all your secondary indexes first and then recreate them afterwards.
Why does this work? With secondary indexes, the index data lives elsewhere on disk from the table data, so every record written to the table forces, at best, an additional read/write per index. In fact it may be much worse than this, as from time to time the database will decide it needs to carry out a more serious reorganisation of the index.
When you recreate the index at the end of the insert run, the database performs just one full table scan to read and process the data. Not only do you end up with a better-organised index on disk, but the total amount of work required will be less.
When is this worth doing? That depends on your database, index structure and other factors (such as whether you have your indexes on a separate disk from your data), but my rule of thumb is to consider it if I am processing more than 10% of the records in a table of a million records or more - and then check with test inserts to see if it is worthwhile.
Of course, on any particular database there will also be specialist bulk insert routines, and you should look at those too.
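As a rough illustration of the drop-and-recreate idea on SQL Server (the index and table names below are placeholders, and disabling applies to nonclustered indexes only):

using (var cn = new SqlConnection(connectionString))
{
    cn.Open();

    // Disable a secondary (nonclustered) index before the bulk load.
    using (var cmd = new SqlCommand("ALTER INDEX IX_Censis_Col1 ON dbo.Censis DISABLE;", cn))
    {
        cmd.ExecuteNonQuery();
    }

    // Load the data in one go.
    using (var copy = new SqlBulkCopy(cn))
    {
        copy.DestinationTableName = "dbo.Censis";
        copy.BatchSize = 10000;       // commit in batches
        copy.BulkCopyTimeout = 0;     // no timeout for large loads
        copy.WriteToServer(dt);       // dt is the DataTable built earlier
    }

    // Rebuild the index afterwards in a single pass over the table.
    using (var cmd = new SqlCommand("ALTER INDEX IX_Censis_Col1 ON dbo.Censis REBUILD;", cn))
    {
        cmd.CommandTimeout = 0;       // rebuilding can take a while
        cmd.ExecuteNonQuery();
    }
}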
FYI: looping through a record set and doing a million-plus inserts on a relational DB is the worst-case scenario when loading a table. Some languages now offer record-set objects. For the fastest performance SMINK is right: use BULK INSERT. Millions of rows loaded in minutes rather than hours, orders of magnitude faster than any other method.
As an example, I worked on an eCommerce project that required a product list refresh each night. Inserting 100,000 rows into a high-end Oracle DB took 10 hours. If I remember correctly, the top speed when doing row-by-row inserts was approximately 10 recs/sec - painfully slow and completely unnecessary. With bulk insert, 100K rows should take less than a minute.
Hope this helps.
Where does the data come from? Could you run a bulk insert? If so, that is the best option you could take.
I am trying to get the records from the 'many' table of a one-to-many relationship and add them as a list to the relevant record from the 'one' table.
I am also trying to do this in a single database request.
Code derived from Linq to Sql - Populate JOIN result into a List almost achieves the intended result, but it makes one database request per entry in the 'one' table, which is unacceptable. That failing code is here:
var res = from variable in _dc.GetTable<VARIABLE>()
select new { x = variable, y = variable.VARIABLE_VALUEs };
However, if I do a similar query but loop through all the results, only a single database request is made. This code achieves all the goals:
var res = from variable in _dc.GetTable<VARIABLE>()
          select variable;

List<GDO.Variable> output = new List<GDO.Variable>();
foreach (var v2 in res)
{
    List<GDO.VariableValue> values = new List<GDO.VariableValue>();
    foreach (var vv in v2.VARIABLE_VALUEs)
    {
        values.Add(VariableValue.EntityToGDO(vv));
    }
    output.Add(EntityToGDO(v2));
    output[output.Count - 1].VariableValues = values;
}
However, the latter code is ugly as hell, and it really feels like something that should be doable in a single LINQ query.
So, how can this be done in a single LINQ query that makes only a single database query?
In both cases the table is set to preload using the following code:
_dc = _db.CreateLinqDataContext();
var loadOptions = new DataLoadOptions();
loadOptions.LoadWith<VARIABLE>(v => v.VARIABLE_VALUEs);
_dc.LoadOptions = loadOptions;
I am using .NET 3.5, and the database back-end was generated using SqlMetal.
This link may help:
http://msdn.microsoft.com/en-us/vcsharp/aa336746.aspx
Look under the join operators. You'll probably have to change from extension-method syntax to query syntax too. Like this:
var result = from obj in dc.Table
             from obj2 in dc.Table2
             where /* join condition */
             select new { obj, obj2 };
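For your case, a group join in query syntax might look something like the following. This is only a sketch; the VARIABLE_VALUE table and the VariableId key column are assumed names based on the question:

var res = from v in _dc.GetTable<VARIABLE>()
          join vv in _dc.GetTable<VARIABLE_VALUE>()
              on v.VariableId equals vv.VariableId into valueGroup
          select new { Variable = v, Values = valueGroup };

The grouped results can then be mapped to your GDO objects in memory in a single pass.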
My workplace doesn't use identity columns or GUIDs for primary keys. Instead, we retrieve "next IDs" from a table as needed, and increment the value for each insert.
Unfortunately for me, LINQ to SQL appears to be optimized around using identity columns. So I need to query and update the "NextId" table whenever I perform an insert. For simplicity, I do this during the creation of the new object:
var db = new DataContext( "...connection string..." );

var car = new Car
{
    Id = GetNextId<Car>( db ),
    TopSpeed = 88.0
};

db.GetTable<Car>().InsertOnSubmit( car );
db.SubmitChanges();
The GetNextId method is something like this:
public int GetNextId<T>( DataContext db )
{
    using ( var transaction = new TransactionScope( TransactionScopeOption.RequiresNew ) )
    {
        var nextId = ( from n in db.GetTable<NextId>()
                       where n.TableName == typeof(T).Name
                       select n ).Single();
        nextId.Value += 1;
        db.SubmitChanges();
        transaction.Complete();
        return nextId.Value - 1;
    }
}
Since all operations between creation of the data context and the call to SubmitChanges are part of one transaction, do I need to create a separate data context for retrieving next IDs? Each time I need an ID, I need to query and update a table inside a transaction to prevent multiple apps from grabbing the same value. The call to SubmitChanges() in the GetNextId() method would submit all previous operations, which I don't want to do.
Is a separate data context the only way, or is there something better I could try?
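For illustration, the separate-context variant being asked about might look something like this sketch (the connectionString parameter is illustrative); its SubmitChanges() would then only flush the NextId increment:

// Sketch: GetNextId opens its own DataContext so submitting the increment does not
// flush any pending changes tracked by the caller's context.
public int GetNextId<T>( string connectionString )
{
    using ( var transaction = new TransactionScope( TransactionScopeOption.RequiresNew ) )
    using ( var idContext = new DataContext( connectionString ) )
    {
        var nextId = ( from n in idContext.GetTable<NextId>()
                       where n.TableName == typeof(T).Name
                       select n ).Single();
        nextId.Value += 1;
        idContext.SubmitChanges();   // only the NextId row is affected
        transaction.Complete();
        return nextId.Value - 1;
    }
}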
Follow up to this question. I have the following code:
string[] names = new[] { "Bob", "bob", "BoB" };

using (MyDataContext dataContext = new MyDataContext())
{
    foreach (var name in names)
    {
        string s = name;
        if (dataContext.Users.SingleOrDefault(u => u.Name.ToUpper() == s.ToUpper()) == null)
            dataContext.Users.InsertOnSubmit(new User { Name = name });
    }
    dataContext.SubmitChanges();
}
...and it inserts all three names ("Bob", "bob" and "BoB"). If this were Linq-to-Objects, it wouldn't.
Can I make it look at the pending changes as well as what's already in the table?
I don't think that would be possible in general. Imagine you made a query like this:
dataContext.Users.InsertOnSubmit(new User { GroupId = 1 });
var groups = dataContext.Groups.Where(grp => grp.Users.Any());
The database knows nothing about the new user (yet) because the insert hasn't been committed yet, so the generated SQL query might not return the Group with Id = 1. The only way the DataContext could take the not-yet-submitted insert into account in cases like this would be to fetch the whole Groups table (and possibly more tables, if they are affected by the query) and perform the query on the client, which is of course undesirable. I guess the L2S designers decided it would be counterintuitive if some queries took not-yet-committed inserts into account while others didn't, so they chose to never take them into account.
Why don't you use something like
foreach (var name in names.Distinct(StringComparer.InvariantCultureIgnoreCase))
to filter out duplicate names before hitting the database?
Why don't you try something like this:
foreach (var name in names)
{
    string s = name;
    if (dataContext.Users.SingleOrDefault(u => u.Name.ToUpper() == s.ToUpper()) == null)
    {
        dataContext.Users.InsertOnSubmit(new User { Name = name });
        break;
    }
}
I am sorry, I don't understand LINQ to SQL all that well.
But when I look at the code, it seems you are telling it to insert all the records at once (similar to a transaction) using SubmitChanges, while you are checking for their existence in the DB before the records have been inserted at all.
EDIT: Try putting SubmitChanges inside the loop and the code should run as you expect.
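Applied to the code from the question, that suggestion would look roughly like this:

foreach (var name in names)
{
    string s = name;
    if (dataContext.Users.SingleOrDefault(u => u.Name.ToUpper() == s.ToUpper()) == null)
        dataContext.Users.InsertOnSubmit(new User { Name = name });

    dataContext.SubmitChanges(); // flush after each name so later lookups can see it
}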
You can query the appropriate ChangeSet collection, such as
if (dataContext.Users
        .Union(dataContext.GetChangeSet().Inserts)
        .Except(dataContext.GetChangeSet().Deletes)
        .SingleOrDefault(u => u.Name.ToUpper() == s.ToUpper()) == null)
This will create a union of the values in the Users table and the pending Inserts, and will exclude pending deletes.
Of course, you might want to create a changeSet variable to prevent multiple calls to the GetChangeSet function, and you may need to cast the objects in those collections to the appropriate type. In the Inserts and Deletes collections, you can filter with something like
...GetChangeSet().Inserts.Where(o => o.GetType() == typeof(User)).OfType<User>()...
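Putting those pieces together inside the original loop, a sketch of the check might look like this (note that the Union/Except part is evaluated in memory here, which is fine for a table of this size):

var changeSet = dataContext.GetChangeSet();
var pendingInserts = changeSet.Inserts.OfType<User>();
var pendingDeletes = changeSet.Deletes.OfType<User>();

bool exists = dataContext.Users
    .AsEnumerable()                 // pull current rows and finish the check in memory
    .Union(pendingInserts)
    .Except(pendingDeletes)
    .Any(u => u.Name.ToUpper() == s.ToUpper());

if (!exists)
    dataContext.Users.InsertOnSubmit(new User { Name = name });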