SQL Server 2008 Tuning with large transactions (700k+ rows/transaction) - sql-server-2008

So, I'm working on a database that I will be adding to my future projects as sort of a supporting db, but I'm having a bit of an issue with it, especially the logs.
The database basically needs to be updated once a month. The main table has to be purged and then refilled from a CSV file. The problem is that SQL Server will generate a log for it which is MEGA big. I was successful in filling it up once, but wanted to test the whole process by purging it and then refilling it.
That's when I get an error that the log file is full. The log jumps from 88MB (after shrinking via a maintenance plan) to 248MB, and then the process stops altogether and never completes.
I've capped its growth at 256MB, incrementing by 16MB, which is why it failed, but in reality I don't need it to log anything at all. Is there a way to just completely bypass logging on any query being run against the database?
Thanks for any responses in advance!
EDIT: Per the suggestions of @mattmc3, I've implemented SqlBulkCopy for the whole procedure. It works AMAZING, except my loop is somehow crashing on the very last remaining chunk that needs to be inserted. I'm not too sure where I'm going wrong, heck, I don't even know if this is a proper loop, so I'd appreciate some help on it.
I do know that it's an issue with the very last GetDataTable or SetSqlBulkCopy calls. I'm trying to insert 788189 rows; 788000 get in and the remaining 189 are crashing...
string[] Rows;
using (StreamReader Reader = new StreamReader("C:/?.csv")) {
Rows = Reader.ReadToEnd().TrimEnd().Split(new char[1] {
'\n'
}, StringSplitOptions.RemoveEmptyEntries);
};
int RowsInserted = 0;
using (SqlConnection Connection = new SqlConnection("")) {
Connection.Open();
DataTable Table = null;
while ((RowsInserted < Rows.Length) && ((Rows.Length - RowsInserted) >= 1000)) {
Table = GetDataTable(Rows.Skip(RowsInserted).Take(1000).ToArray());
SetSqlBulkCopy(Table, Connection);
RowsInserted += 1000;
};
Table = GetDataTable(Rows.Skip(RowsInserted).ToArray());
SetSqlBulkCopy(Table, Connection);
Connection.Close();
};
static DataTable GetDataTable(
string[] Rows) {
using (DataTable Table = new DataTable()) {
Table.Columns.Add(new DataColumn("A"));
Table.Columns.Add(new DataColumn("B"));
Table.Columns.Add(new DataColumn("C"));
Table.Columns.Add(new DataColumn("D"));
for (short a = 0, b = (short)Rows.Length; a < b; a++) {
string[] Columns = Rows[a].Split(new char[1] {
','
}, StringSplitOptions.RemoveEmptyEntries);
DataRow Row = Table.NewRow();
Row["A"] = Columns[0];
Row["B"] = Columns[1];
Row["C"] = Columns[2];
Row["D"] = Columns[3];
Table.Rows.Add(Row);
};
return (Table);
};
}
static void SetSqlBulkCopy(
DataTable Table,
SqlConnection Connection) {
using (SqlBulkCopy SqlBulkCopy = new SqlBulkCopy(Connection)) {
SqlBulkCopy.ColumnMappings.Add(new SqlBulkCopyColumnMapping("A", "A"));
SqlBulkCopy.ColumnMappings.Add(new SqlBulkCopyColumnMapping("B", "B"));
SqlBulkCopy.ColumnMappings.Add(new SqlBulkCopyColumnMapping("C", "C"));
SqlBulkCopy.ColumnMappings.Add(new SqlBulkCopyColumnMapping("D", "D"));
SqlBulkCopy.BatchSize = Table.Rows.Count;
SqlBulkCopy.DestinationTableName = "E";
SqlBulkCopy.WriteToServer(Table);
};
}
EDIT/FINAL CODE: So the app is now finished and works AMAZING, and quite speedy! @mattmc3, thanks for all the help! Here is the final code for anyone who may find it useful:
List<string> Rows = new List<string>();
using (StreamReader Reader = new StreamReader(@"?.csv")) {
string Line = string.Empty;
while (!String.IsNullOrWhiteSpace(Line = Reader.ReadLine())) {
Rows.Add(Line);
};
};
if (Rows.Count > 0) {
int RowsInserted = 0;
DataTable Table = new DataTable();
Table.Columns.Add(new DataColumn("Id"));
Table.Columns.Add(new DataColumn("A"));
while ((RowsInserted < Rows.Count) && ((Rows.Count - RowsInserted) >= 1000)) {
Table = GetDataTable(Rows.Skip(RowsInserted).Take(1000).ToList(), Table);
PerformSqlBulkCopy(Table);
RowsInserted += 1000;
Table.Clear();
};
Table = GetDataTable(Rows.Skip(RowsInserted).ToList(), Table);
PerformSqlBulkCopy(Table);
};
static DataTable GetDataTable(
List<string> Rows,
DataTable Table) {
for (short a = 0, b = (short)Rows.Count; a < b; a++) {
string[] Columns = Rows[a].Split(new char[1] {
','
}, StringSplitOptions.RemoveEmptyEntries);
DataRow Row = Table.NewRow();
Row["A"] = "";
Table.Rows.Add(Row);
};
return (Table);
}
static void PerformSqlBulkCopy(
DataTable Table) {
using (SqlBulkCopy SqlBulkCopy = new SqlBulkCopy(@"", SqlBulkCopyOptions.TableLock)) {
SqlBulkCopy.BatchSize = Table.Rows.Count;
SqlBulkCopy.DestinationTableName = "";
SqlBulkCopy.WriteToServer(Table);
};
}

If you are doing a bulk insert into the table in SQL Server, which is how you should be doing this (BCP, BULK INSERT, INSERT INTO...SELECT, or in .NET, the SqlBulkCopy class), you can use the "Bulk Logged" recovery model. I highly recommend reading the MSDN articles on recovery models: http://msdn.microsoft.com/en-us/library/ms189275.aspx
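If it helps, here is a minimal sketch of switching the recovery model around the monthly reload from .NET; the database name MyDatabase and the connection string are placeholders, and switching back to FULL (plus a log backup) only applies if the database normally runs under the full model:
using System.Data.SqlClient;

// Sketch only: toggle BULK_LOGGED around the purge + SqlBulkCopy load.
// "MyDatabase" and the connection string are placeholders, not your actual names.
using (SqlConnection connection = new SqlConnection("<your connection string>"))
{
    connection.Open();

    using (SqlCommand command = new SqlCommand(
        "ALTER DATABASE MyDatabase SET RECOVERY BULK_LOGGED;", connection))
    {
        command.ExecuteNonQuery();
    }

    // ... purge the table and run the SqlBulkCopy load here ...

    using (SqlCommand command = new SqlCommand(
        "ALTER DATABASE MyDatabase SET RECOVERY FULL;", connection))
    {
        // Take a log backup afterwards so the log chain stays usable.
        command.ExecuteNonQuery();
    }
}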

You can set the recovery model for each database separately. Maybe the simple recovery model will work for you. The simple model:
Automatically reclaims log space to keep space requirements small, essentially eliminating the need to manage the transaction log space.
Read up on it here.

There is no way to bypass using the transaction log in SQL Server.

Related

firebase update by batches does not work with large dataset

I want to populate a feed for almost one million users when content is posted by a user with a high number of followers, using GCP Cloud Functions.
To do this, I am designing the feed update to be split into a number of small batches. That's because I think if I don't split the update, I might face the following issues:
i) keeping one million users' feeds in memory will exceed the allocated maximum 2GB memory.
ii) updating one million entries in one go will not work (how long would it take to update one million entries?)
However, the batch update only works for me when each batch inserts around 100 entries per update invocation. When I tried 1000 per batch, only the 1st batch was inserted. I wonder if this is due to:
i) a time-out? However, I don't see this error in the log.
ii) the array variable, userFeeds{}, keeping the batch being destroyed when the function goes out of scope?
Below is my code:
var admin = require('firebase-admin');
var spark = require('./spark');
var user = require('./user');
var Promise = require('promise');
var sparkRecord;
exports.newSpark = function (sparkID) {
var getSparkPromise = spark.getSpark(sparkID);
Promise.all([getSparkPromise]).then(function(result) {
var userSpark = result[0];
sparkRecord = userSpark;
sparkRecord.sparkID = sparkID;
// the batch update only works if the entries per batch is around 100 instead of 1000
populateFeedsToFollowers(sparkRecord.uidFrom, 100, null, myCallback);
});
};
var populateFeedsToFollowers = function(uid, fetchSize, startKey, callBack){
var fetchCount = 0;
//retrieving only follower list by batch
user.setFetchLimit(fetchSize);
user.setStartKey(startKey);
//I use this array variable to keep the entries by batch
var userFeeds = {};
user.getFollowersByBatch(uid).then(function(users){
if(users == null){
callBack(null, null, null);
return;
}
//looping thru the followers by batch size
Object.keys(users).forEach(function(userKey) {
fetchCount += 1;
if(fetchCount > fetchSize){
// updating users feed by batch
admin.database().ref().update(userFeeds);
callBack(null, userKey);
fetchCount = 0;
return;
}else{
userFeeds['/userFeed/' + userKey + '/' + sparkRecord.sparkID] = {
phase:sparkRecord.phase,
postTimeIntervalSince1970:sparkRecord.postTimeIntervalSince1970
}
}
});//Object.keys(users).forEach
if(fetchCount > 0){
admin.database().ref().update(userFeeds);
}
});//user.getFollowersByBatch
};
var myCallback = function(err, nextKey) {
if (err) throw err; // Check for the error and throw if it exists.
if(nextKey != null){ //if having remaining followers, keep populating
populateFeedsToFollowers(sparkRecord.uidFrom, 100, nextKey, myCallback);
}
};

DuplicateKeyException on the submitchanges method- Windows Phone

I am learning how to insert data into the database using LINQ and also into an ObservableCollection.
The data gets inserted into the ObservableCollection but not into the database. Could somebody explain what's going on with the code below? The system throws an unhandled exception on the SubmitChanges method. Please advise.
public void populateDates(DateTime theWeek)
{
ObservableCollection<theSchedule> theDatesList = new ObservableCollection<theSchedule>();
for (int i = 0; i < 7; i++)
{
theSchedule theShift = new theSchedule
{
theDay = (theWeek.AddDays(i).ToString("dd/MM/yyyy")),
theTime = (theWeek.AddHours(i).ToString("HH:mm")) + " - " + (theWeek.AddHours(i+8).ToString("HH:mm"))
};
theDatesList.Add(theShift);
//MessageBox.Show(theWeek.AddDays(i).ToString("HH:mm"));
shiftsDb.theSchedules.InsertOnSubmit(theShift);
}
shiftsDb.SubmitChanges();
mylistbox.ItemsSource = theDatesList;
}

Bulk Insert into SQL Server 2008

for (int i = 0; i < myClass.Length; i++)
{
string upSql = "UPDATE CumulativeTable SET EngPosFT = #EngPosFT,EngFTAv=#EngFTAv WHERE RegNumber =#RegNumber AND Session=#Session AND Form=#Form AND Class=#Class";
SqlCommand cmdB = new SqlCommand(upSql, connection);
cmdB.CommandTimeout = 980000;
cmdB.Parameters.AddWithValue("#EngPosFT", Convert.ToInt32(Pos.GetValue(i)));
cmdB.Parameters.AddWithValue("#RegNumber", myClass.GetValue(i));
cmdB.Parameters.AddWithValue("#EngFTAv", Math.Round((engtot / arrayCount), 2));
cmdB.Parameters.AddWithValue("#Session", drpSess.SelectedValue);
cmdB.Parameters.AddWithValue("#Form", drpForm.SelectedValue);
cmdB.Parameters.AddWithValue("#Class", drpClass.SelectedValue);
int idd = Convert.ToInt32(cmdB.ExecuteScalar());
}
Assuming myClass.Length is 60, this executes 60 UPDATE statements. How can I limit it to a single UPDATE statement? A code example using the above code would be appreciated. Thanks.
Tried using this
StringBuilder command = new StringBuilder();
SqlCommand cmdB = null;
for (int i = 0; i < myClass.Length; i++)
{
command.Append("UPDATE CumulativeTable SET" + " EngPosFT = " + Convert.ToInt32(Pos.GetValue(i)) + "," + " EngFTAv = " + Math.Round((engtot / arrayCount), 2) +
" WHERE RegNumber = " + myClass.GetValue(i) + " AND Session= " + drpSess.SelectedValue + " AND Form= " + drpForm.SelectedValue + " AND Class= " + drpClass.SelectedValue + ";");
//or command.AppendFormat("UPDATE CumulativeTable SET EngPosFT = {0},EngFTAv={1} WHERE RegNumber ={2} AND Session={3} AND Form={4} AND Class={5};", Convert.ToInt32(Pos.GetValue(i)), Math.Round((engtot / arrayCount), 2), myClass.GetValue(i), drpSess.SelectedValue, drpForm.SelectedValue, drpClass.SelectedValue);
}//max length is 128 error is encountered
Look at the BULK INSERT T-SQL command. But since I don't have a lot of personal experience with that command, I'll point out an immediate opportunity to improve this code using the same SQL: create the command and parameters outside the loop, and only make the necessary changes inside the loop:
string upSql = "UPDATE CumulativeTable SET EngPosFT = #EngPosFT,EngFTAv=#EngFTAv WHERE RegNumber =#RegNumber AND Session=#Session AND Form=#Form AND Class=#Class";
SqlCommand cmdB = new SqlCommand(upSql, connection);
cmdB.CommandTimeout = 980000;
//I had to guess at the sql types you used here.
//Adjust this to match your actual column data types
cmdB.Parameters.Add("#EngPosFT", SqlDbType.Int);
cmdB.Parameters.Add("#RegNumber", SqlDbType.Int);
//It's really better to use explicit types here, too.
//I'll just update the first parameter as an example of how it looks:
cmdB.Parameters.Add("#EngFTAv", SqlDbType.Decimal).Value = Math.Round((engtot / arrayCount), 2));
cmdB.Parameters.AddWithValue("#Session", drpSess.SelectedValue);
cmdB.Parameters.AddWithValue("#Form", drpForm.SelectedValue);
cmdB.Parameters.AddWithValue("#Class", SqlDbTypedrpClass.SelectedValue);
for (int i = 0; i < myClass.Length; i++)
{
cmdB.Parameters[0].Value = Convert.ToInt32(Pos.GetValue(i)));
cmdB.Parameters[1].Value = myClass.GetValue(i));
int idd = Convert.ToInt32(cmdB.ExecuteScalar());
}
It would be better in this case to create a stored procedure that accepts a Table Valued Parameter. On the .NET side of things, you create a DataTable object containing a row for each set of values you want to use.
On the SQL Server side of things, you can treat the parameter as another table in a query. So inside the stored proc, you'd have:
UPDATE a
SET
EngPosFT = b.EngPosFT,
EngFTAv=b.EngFTAv
FROM
CumulativeTable a
inner join
@MyParm b
on
a.RegNumber =b.RegNumber AND
a.Session=b.Session AND
a.Form=b.Form AND
a.Class=b.Class
Where @MyParm is your table-valued parameter.
This will then be processed as a single round-trip to the server.
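For completeness, here is a rough sketch of the .NET side of that call. The stored procedure name dbo.UpdateCumulative and the user-defined table type dbo.CumulativeRowType are made-up names you would replace with your own, and the column list must match the table type definition:
using System.Data;
using System.Data.SqlClient;

// Sketch only: assumes a stored procedure
// dbo.UpdateCumulative(@MyParm dbo.CumulativeRowType READONLY)
// containing the UPDATE ... FROM join shown above.
DataTable rows = new DataTable();
rows.Columns.Add("EngPosFT", typeof(int));
rows.Columns.Add("EngFTAv", typeof(decimal));
rows.Columns.Add("RegNumber", typeof(string));
rows.Columns.Add("Session", typeof(string));
rows.Columns.Add("Form", typeof(string));
rows.Columns.Add("Class", typeof(string));
// ... add one DataRow per student here ...

using (SqlConnection connection = new SqlConnection("<your connection string>"))
using (SqlCommand command = new SqlCommand("dbo.UpdateCumulative", connection))
{
    command.CommandType = CommandType.StoredProcedure;
    SqlParameter tvp = command.Parameters.AddWithValue("@MyParm", rows);
    tvp.SqlDbType = SqlDbType.Structured;   // marks the parameter as a table-valued parameter
    tvp.TypeName = "dbo.CumulativeRowType";  // must match the user-defined table type name
    connection.Open();
    command.ExecuteNonQuery();               // all rows travel in a single round-trip
}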
In such scenarios it is always best to write a Stored Procedure and call that stored proc in the for loop, passing the necessary arguments at each call.
using System;
using System.Data;
using System.Data.SqlClient;
namespace DataTableExample
{
class Program
{
static void Main(string[] args)
{
DataTable prodSalesData = new DataTable("ProductSalesData");
// Create Column 1: SaleDate
DataColumn dateColumn = new DataColumn();
dateColumn.DataType = Type.GetType("System.DateTime");
dateColumn.ColumnName = "SaleDate";
// Create Column 2: ProductName
DataColumn productNameColumn = new DataColumn();
productNameColumn.ColumnName = "ProductName";
// Create Column 3: TotalSales
DataColumn totalSalesColumn = new DataColumn();
totalSalesColumn.DataType = Type.GetType("System.Int32");
totalSalesColumn.ColumnName = "TotalSales";
// Add the columns to the ProductSalesData DataTable
prodSalesData.Columns.Add(dateColumn);
prodSalesData.Columns.Add(productNameColumn);
prodSalesData.Columns.Add(totalSalesColumn);
// Let's populate the datatable with our stats.
// You can add as many rows as you want here!
// Create a new row
DataRow dailyProductSalesRow = prodSalesData.NewRow();
dailyProductSalesRow["SaleDate"] = DateTime.Now.Date;
dailyProductSalesRow["ProductName"] = "Nike";
dailyProductSalesRow["TotalSales"] = 10;
// Add the row to the ProductSalesData DataTable
prodSalesData.Rows.Add(dailyProductSalesRow);
// Copy the DataTable to SQL Server using SqlBulkCopy
using (SqlConnection dbConnection = new SqlConnection("Data Source=ProductHost;Initial Catalog=dbProduct;Integrated Security=SSPI;Connection Timeout=60;Min Pool Size=2;Max Pool Size=20;"))
{
dbConnection.Open();
using (SqlBulkCopy s = new SqlBulkCopy(dbConnection))
{
s.DestinationTableName = prodSalesData.TableName;
foreach (var column in prodSalesData.Columns)
s.ColumnMappings.Add(column.ToString(), column.ToString());
s.WriteToServer(prodSalesData);
}
}
}
}
}

What's the risk of multiple SubmitChanges?

Is there a risk with the code below that if two people submit at the same time the wrong s.saleID will be retrieved?
protected void submitSale(int paymentTypeID)
{
tadbDataContext tadb = new tadbDataContext();
ta_sale s = new ta_sale();
decimal total = decimal.Parse(lblTotal.Value);
s.paymentTypeID = paymentTypeID;
s.time = DateTime.Now;
s.totalSale = total;
tadb.ta_sales.InsertOnSubmit(s);
tadb.SubmitChanges();
char[] drinksSeparator = new char[] {'|'};
char[] rowSeparator = new char[] { ':' };
string drinkString = lblSummaryQty.Value.Substring(0, lblSummaryQty.Value.Length - 1);
string[] arrDrinks = drinkString.Split(drinksSeparator);
foreach (string row in arrDrinks)
{
string[] arrDrink = row.Split(rowSeparator);
int rowID = Convert.ToInt16(arrDrink[0]);
int rowQty = Convert.ToInt16(arrDrink[1]);
ta_saleDetail sd = new ta_saleDetail();
sd.drinkID = rowID;
sd.quantity = rowQty;
sd.saleID = s.saleID;
tadb.ta_saleDetails.InsertOnSubmit(sd);
}
tadb.SubmitChanges();
}
}
If so, what should I do to make sure it is atomic? (I think it should be OK, but want to double check!)
To be sure that it's OK, just write a test that calls submitSale 2 or even more times and makes the changes submit at almost the same time. If the test fails, use a lock statement, but be careful: it can cause deadlocks. After you change submitSale, test it again under high load (a lot of simultaneous calls), and so on until the test passes.
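As a rough illustration of the lock approach mentioned above (the field name is made up, and a lock only serializes callers within a single process, so it won't protect you across multiple web servers):
// Sketch: serialize concurrent submissions within this process only.
private static readonly object _saleSubmitLock = new object();

protected void submitSaleSerialized(int paymentTypeID)
{
    lock (_saleSubmitLock)
    {
        submitSale(paymentTypeID);
    }
}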

Linq-2-Sql code: Does this scale?

I'm just starting to use LINQ to SQL. I'm hoping that someone can verify that LINQ to SQL defers execution until the foreach loop is executed. Overall, can someone tell me if this code scales? It's a simple get method with a few search parameters. Thanks!
Code:
public static IList<Content> GetContent(int contentTypeID, int feedID, DateTime? date, string text)
{
List<Content> contentList = new List<Content>();
using (DataContext db = new DataContext())
{
var contentTypes = db.ytv_ContentTypes.Where(p => contentTypeID == -1 || p.ContentTypeID == contentTypeID);
var feeds = db.ytv_Feeds.Where(p => p.FeedID == -1 || p.FeedID == feedID);
var targetFeeds = from f in feeds
join c in contentTypes on f.ContentTypeID equals c.ContentTypeID
select new { FeedID = f.FeedID, ContentType = f.ContentTypeID };
var content = from t in targetFeeds
join c in db.ytv_Contents on t.FeedID equals c.FeedID
select new { Content = c, ContentTypeID = t.ContentType };
if (String.IsNullOrEmpty(text))
{
content = content.Where(p => p.Content.Name.Contains(text) || p.Content.Description.Contains(text));
}
if (date != null)
{
DateTime dateTemp = Convert.ToDateTime(date);
content = content.Where(p => p.Content.StartDate <= dateTemp && p.Content.EndDate >= dateTemp);
}
//Execution has been deferred to this point, correct?
foreach (var c in content)
{
Content item = new Content()
{
ContentID = c.Content.ContentID,
Name = c.Content.Name,
Description = c.Content.Description,
StartDate = c.Content.StartDate,
EndDate = c.Content.EndDate,
ContentTypeID = c.ContentTypeID,
FeedID = c.Content.FeedID,
PreviewHtml = c.Content.PreviewHTML,
SerializedCustomXMLProperties = c.Content.CustomProperties
};
contentList.Add(item);
}
}
//TODO
return contentList;
}
Depends on what you mean by 'scales'. DB-side, this code has the potential to cause trouble if you are dealing with large tables; SQL Server's optimizer is really poor at handling the "or" operator in WHERE clause predicates and tends to fall back to table scans if there are multiple of them. I'd go for a couple of .Union calls instead, to avoid the possibility that SQL Server falls back to table scans just because of the ||'s.
If you can share more details about the underlying tables and the data in them, it will be easier to give a more detailed answer...
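As an illustration of the .Union suggestion, the text filter could be expressed as two single-predicate queries combined with Union instead of one predicate containing ||. This is only a sketch, and it assumes the filter should apply when text is non-empty:
// Sketch: LINQ to SQL translates Union into a SQL UNION, avoiding an OR in the WHERE clause.
if (!String.IsNullOrEmpty(text))
{
    var byName = content.Where(p => p.Content.Name.Contains(text));
    var byDescription = content.Where(p => p.Content.Description.Contains(text));
    content = byName.Union(byDescription);
}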