Converting a DataTable column to double in C# - csv

I read some columns from a CSV file and then display them in a DataGridView. The column "Value" contains 3-digit integer values. I want these integer values shown in the DataGridView as doubles with one decimal place. The conversion doesn't seem to work. Also, a large CSV file (around 30k rows) loads immediately without the conversion, but with the conversion it takes far too long.
using (StreamReader str = new StreamReader(openFileDialog1.FileName)) {
    CsvReader csvReadFile = new CsvReader(str);
    dt = new DataTable();
    dt.Columns.Add("Value", typeof(double));
    dt.Columns.Add("Time Stamp", typeof(DateTime));
    while (csvReadFile.Read()) {
        var row = dt.NewRow();
        foreach (DataColumn column in dt.Columns) {
            row[column.ColumnName] = csvReadFile.GetField(column.DataType, column.ColumnName);
        }
        dt.Rows.Add(row);
        foreach (DataRow row1 in dt.Rows)
        {
            row1["Value"] = (Convert.ToDouble(row1["Value"]) / 10);
        }
    }
}
dataGridView1.DataSource = dt;

Sounds like you have two questions:
How to format the value to a single decimal place of scale.
Why does the convert section of code take so long?
Here are some possibilities.
See this answer, which suggests formatting the value with something like:
String.Format("{0:0.##}", (Decimal) myTable.Rows[rowIndex][columnIndex]);
(Use "{0:0.0}" if you want exactly one decimal place.)
You are iterating over every row of the DataTable every time you read a line. That means when you read line 10 of the CSV, you iterate over rows 1-9 of the DataTable again, and so on for each line you read! Refactor to pull that loop out of the read loop... something like this:
using (StreamReader str = new StreamReader(openFileDialog1.FileName)) {
    CsvReader csvReadFile = new CsvReader(str);
    dt = new DataTable();
    dt.Columns.Add("Value", typeof(double));
    dt.Columns.Add("Time Stamp", typeof(DateTime));
    while (csvReadFile.Read()) {
        var row = dt.NewRow();
        foreach (DataColumn column in dt.Columns) {
            row[column.ColumnName] = csvReadFile.GetField(column.DataType, column.ColumnName);
        }
        dt.Rows.Add(row);
    }
    foreach (DataRow row1 in dt.Rows)
    {
        row1["Value"] = (Convert.ToDouble(row1["Value"]) / 10);
    }
}
dataGridView1.DataSource = dt;
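If the one-decimal-place requirement is purely about display, a complementary option (a minimal sketch, assuming the grid auto-generates its columns from the DataTable above) is to let the DataGridView format the column instead of reformatting the data:

// Set this after the DataSource is assigned so the auto-generated "Value" column exists.
// The "0.0" format string renders the double with exactly one decimal place.
dataGridView1.Columns["Value"].DefaultCellStyle.Format = "0.0";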

Related

Getting error - "Stream was not readable" while reading CSVs and merging them into one

When I run the following code, it fails on the second iteration of the loop with the error "Stream was not readable". I don't understand why the stream is already closed on the second iteration when it is supposedly created fresh each time through the loop.
string resultCsvContents = "col1,col2,col3\n1,2,3";
using (var streamWriter = new StreamWriter("MergedCsvFile.csv", true))
{
    int fileCount = 0;
    foreach (var outputFilePath in outputFiles)
    {
        using (var fileStream = mockService.GetFileStream(outputFilePath))
        {
            using (var streamReader = new StreamReader(fileStream))
            {
                if (fileCount != 0)
                {
                    var header = streamReader.ReadLine();
                }
                streamWriter.Write(streamReader.ReadToEnd());
                fileCount++;
            }
        }
    }
}
The output of this should be a CSV with the following data:
col1,col2,col3
1,2,3
1,2,3
1,2,3
Here the mock service is set up to return
string resultCsvContents = "col1,col2,col3\n1,2,3";
mockService.Setup(x => x.GetFileStream(It.IsAny<string>()))
    .ReturnsAsync(new MemoryStream(Encoding.ASCII.GetBytes(this.resultCsvContents1)));
Is there an issue with the way I am mocking this function?
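One thing worth checking (an assumption based on how Moq evaluates setups, not something stated in the question): ReturnsAsync(new MemoryStream(...)) evaluates its argument once, so every call to GetFileStream hands back the same MemoryStream instance, and the first using block disposes it. A sketch of a per-call setup, assuming a reasonably recent Moq that has the lambda overload of ReturnsAsync:

// Build a fresh stream on every call so an earlier Dispose() does not
// invalidate later reads.
mockService.Setup(x => x.GetFileStream(It.IsAny<string>()))
    .ReturnsAsync(() => new MemoryStream(Encoding.ASCII.GetBytes(this.resultCsvContents1)));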

Read CSV first line using CsvBeanReader

I have a CSV file with 3 columns. The file does not have a header row, but it has a fixed pattern: the first column holds a URL, and the second and third columns hold checksums. To process the individual column values, I am using CsvBeanReader.
With the code below, CsvBeanReader starts reading values from the 2nd line:
ICsvBeanReader beanReader = new CsvBeanReader(new FileReader(path),
        CsvPreference.EXCEL_NORTH_EUROPE_PREFERENCE);
String[] header = beanReader.getHeader(true);
header = new String[] { "docURL", "shaCheckSum", null };
CellProcessor[] processors = new CellProcessor[3];
processors = getChecksumProcessors();
ValueObj docRecord;
while ((docRecord = beanReader.read(ValueObj.class, header, processors)) != null) {
    docRecordList.add(docRecord);
}

private static CellProcessor[] getChecksumProcessors() {
    return new CellProcessor[] { new NotNull(), new NotNull(), null };
}
How can I make CsvBeanReader read the first line of the CSV file, which already contains data?
The CSV file contains data from the first line, like below:
ftp://folder_struc/filename.pdf;checksum1;checksum2
Please let me know.
I guess you should omit the String[] header = beanReader.getHeader(true); line, since getHeader(true) reads (and so consumes) the first row of the file as a header. Try just String[] header = new String[] { "docURL", "shaCheckSum", null };

Sampling a datetimestamp and voltage from the 1st line only of multiple .csv files

I wish to take selected data from a collection of CSV files. I have written code, but I am confused by its behaviour: it reads every line of each file instead of just the first one. What am I doing wrong, please?
string[] array1 = Directory.GetFiles(WorkingDirectory, "00 DEV1 2????????????????????.csv"); // excludes "repaired" files from array, and "Averaged" logs, if found; note: does not exclude duplicate files if they exist (yet)
Console.WriteLine(" Number of Files found with the filter applied = {0,6}", (array1.Length));
int i = 1;
foreach (string name in array1)
{
    // sampling engine loop here, take first line only, first column DateTimeStamp and second is Voltage
    Console.Write("\r Number of File currently being processed = {0,6}", i);
    i++;
    var reader = new StreamReader(File.OpenRead(name)); // Static for testing only, to be replaced by file filter code
    reader.ReadLine();
    reader.ReadLine(); // skip headers, read and do nothing
    while (!reader.EndOfStream)
    {
        var line = reader.ReadLine();
        var values = line.Split(',');
        using (StreamWriter outfile = new StreamWriter(@"C:\SampledFileResults.txt", true))
        {
            string content = "";
            {
                content = content + values[0] + ",";
                content = content + values[9] + ",";
            }
            outfile.WriteLine(content);
            Console.WriteLine(content);
        }
    }
}
Console.WriteLine("SAMPLING COMPLETED");
Console.ReadLine();
Console.WriteLine("Test ended on {0}", (DateTime.Now));
Console.ReadLine();
}
}
You are using a while loop to read through all the lines of the file. If you only want a single line per file, you can remove this loop.
Just delete the line:
while (!reader.EndOfStream)
{
and the accompanying closing brace:
}
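For completeness, here is a minimal sketch of the per-file body with that loop removed. It keeps the same column indices and output path as the question's code (values[0] and values[9]); adjust those if your DateTimeStamp and Voltage columns sit elsewhere.

// Sketch: read only the first data line of each file (two header lines are
// skipped, matching the original code), then append the two sampled fields.
using (var reader = new StreamReader(File.OpenRead(name)))
using (var outfile = new StreamWriter(@"C:\SampledFileResults.txt", true))
{
    reader.ReadLine();
    reader.ReadLine();              // skip the two header lines
    var line = reader.ReadLine();   // first data line only
    if (line != null)
    {
        var values = line.Split(',');
        string content = values[0] + "," + values[9];
        outfile.WriteLine(content);
        Console.WriteLine(content);
    }
}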

Import a CSV file without a header row into a DataTable

I have a CSV file, uploaded via an UploadFile control, which has no header row. When I try to read it into a DataTable I get an error because the first row has the same value in different columns. How can I insert a header row into this file, or read the data from the CSV into a DataTable with predefined columns?
Data in csv:
IO23968 2012 11 AB WI 100162804410W500 0 516.78 0 0 0 N 0
IO24190 2012 11 AB WI 100140604510W500 302 516.78 15617.9 0 15617 N 0
IO24107 2012 11 AB WI 100033104410W500 337 516.78 17456.3 0 17456 N 0
Control:
HtmlInputFile fileOilFile = fileOilSubmission as HtmlInputFile;
if (fileOilFile != null)
    strOilFileName = fileOilFile.Value;
DataTable:
DataTable csvData = new DataTable();
csvData.Columns.Add("RoyaltyEntityID", typeof(string));
csvData.Columns.Add("ProductionYear", typeof(int));
csvData.Columns.Add("ProductionMonth", typeof(int));
csvData.Columns.Add("ProductionEntityID", typeof(string));
csvData.Columns.Add("ProductionVolume", typeof(double));
csvData.Columns.Add("SalePrice", typeof(double));
csvData.Columns.Add("GrossRoyaltyAmount", typeof(double));
csvData.Columns.Add("TruckingRate", typeof(double));
csvData.Columns.Add("TotalNetRoyalty", typeof(double));
csvData.Columns.Add("ConfidentialWell", typeof(bool));
csvData.Columns.Add("HoursProductionAmount", typeof(double));
using (StreamReader sr = File.OpenText(strOilFileName))
{
    string s = String.Empty;
    while ((s = sr.ReadLine()) != null)
    { // we're just testing read speeds
        foreach (var line in strOilFileName)
        {
            csvData.Rows.Add(line.split(',')[0]);
            csvData.Rows.Add(line.split(',')[1]);
            csvData.Rows.Add(line.split(',')[2]);
            csvData.Rows.Add(line.split(',')[3]);
        }
    }
}
Can you just add the header (column definitions) manually in code and then add the rows as you parse the file?
ex:
DataTable table = new DataTable();
table.Columns.Add("Dosage", typeof(int));
table.Columns.Add("Drug", typeof(string));
table.Columns.Add("Patient", typeof(string));
table.Columns.Add("Date", typeof(DateTime));
foreach (var line in csv)
{
    var fields = line.Split(',');
    table.Rows.Add(fields[0], fields[1], fields[2], fields[3]);
}
return table;
The code below shows how you might read a CSV into a DataTable.
This code assumes your CSV can be found at strOilFileName and the DataTable's schema is what you show in your question. I'm also assuming that your CSV is actually comma-delimited (doesn't look that way from the sample data in your question).
DataTable csvData = new DataTable();
// ... add columns as you show.
using (StreamReader sr = File.OpenText(strOilFileName)) {
    string line = string.Empty;
    while ((line = sr.ReadLine()) != null) {
        string[] fields = line.Split(',');
        if (fields.Length == 13) {
            // Create a new empty row based on your DataTable's schema.
            var row = csvData.NewRow();
            // Start populating the new row with data from the CSV line.
            row[0] = fields[0];
            // You can't be sure that the data in your CSV can be converted to the
            // DataTable column's data type, so always use the TryParse methods when you can.
            int prodYear = 0;
            if (int.TryParse(fields[1], out prodYear)) {
                row[1] = prodYear;
            } else {
                // What do you do when the field does not contain a value that can be converted to an int?
                // Here I'm setting the field to 2000, but you'll want to throw an Exception, set a different default, etc.
                row[1] = 2000;
            }
            //
            // Repeat the above steps for filling the rest of the columns in your DataRow.
            //
            // Add your new row to your DataTable.
            csvData.Rows.Add(row);
        } else {
            // Do something because Split returned an unexpected number of fields.
        }
    }
}
The code to read the CSV is fairly simplistic. You might want to look into other CSV parsers that can handle a lot of the parsing details for you. There are a bunch out there.
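As one example of leaning on an existing parser, here is a minimal sketch using TextFieldParser from Microsoft.VisualBasic.FileIO (reference the Microsoft.VisualBasic assembly); it takes care of delimiters and quoted fields, while the per-column conversion still needs the same care described above. Column positions follow the schema in the question.

using Microsoft.VisualBasic.FileIO;

// Sketch only: let TextFieldParser handle the CSV splitting, then convert each
// field before assigning it, as in the answer above.
using (var parser = new TextFieldParser(strOilFileName))
{
    parser.TextFieldType = FieldType.Delimited;
    parser.SetDelimiters(",");
    parser.HasFieldsEnclosedInQuotes = true;
    while (!parser.EndOfData)
    {
        string[] fields = parser.ReadFields();   // one parsed record per call
        var row = csvData.NewRow();
        row[0] = fields[0];                      // RoyaltyEntityID (string)
        int prodYear;
        row[1] = int.TryParse(fields[1], out prodYear) ? prodYear : 2000;
        // ... convert and assign the remaining fields the same way ...
        csvData.Rows.Add(row);
    }
}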
private static DataTable GetDataTabletFromCSVFile(string csv_file_path)
{
    DataTable csvData = new DataTable();
    csvData.Columns.Add("RoyaltyEntityID", typeof(string));
    csvData.Columns.Add("ProductionYear", typeof(int));
    csvData.Columns.Add("ProductionMonth", typeof(int));
    csvData.Columns.Add("ProductionEntityID", typeof(string));
    csvData.Columns.Add("ProductionVolume", typeof(double));
    csvData.Columns.Add("SalePrice", typeof(double));
    csvData.Columns.Add("GrossRoyaltyAmount", typeof(double));
    csvData.Columns.Add("TruckingRate", typeof(double));
    csvData.Columns.Add("TotalNetRoyalty", typeof(double));
    csvData.Columns.Add("ConfidentialWell", typeof(string));
    csvData.Columns.Add("HoursProductionAmount", typeof(double));
    using (StreamReader sr = new StreamReader(csv_file_path))
    {
        string line = string.Empty;
        while ((line = sr.ReadLine()) != null)
        {
            string[] strRow = line.Split(',');
            DataRow dr = csvData.NewRow();
            dr["RoyaltyEntityID"] = strRow[0];
            dr["ProductionYear"] = strRow[1];
            dr["ProductionMonth"] = strRow[2];
            dr["ProductionEntityID"] = strRow[3];
            dr["ProductionVolume"] = strRow[4];
            dr["SalePrice"] = strRow[5];
            dr["GrossRoyaltyAmount"] = strRow[6];
            dr["TruckingRate"] = strRow[7];
            dr["TotalNetRoyalty"] = strRow[8];
            dr["ConfidentialWell"] = strRow[9];
            if (strRow[9] == "Y")
            {
                dr["HoursProductionAmount"] = strRow[10];
            }
            else
            {
                dr["HoursProductionAmount"] = "0";
            }
            csvData.Rows.Add(dr);
        }
    }
    return csvData;
}

Sql Server 2008 Tuning with large transactions (700k+ rows/transaction)

So, I'm working on a database that I will be adding to my future projects as sort of a supporting db, but I'm having a bit of an issue with it, especially the logs.
The database basically needs to be updated once a month. The main table has to be purged and then refilled off of a CSV file. The problem is that Sql Server will generate a log for it which is MEGA big. I was successful in filling it up once, but wanted to test the whole process by purging it and then refilling it.
That's when I get an error that the log file is filled up. It jumps from 88MB (after shrinking via maintenance plan) to 248MB and then stops the process altogether and never completes.
I've capped its growth at 256MB, incrementing by 16MB, which is why it failed, but in reality I don't need it to log anything at all. Is there a way to just completely bypass logging for any query being run against the database?
Thanks for any responses in advance!
EDIT: Per the suggestions of @mattmc3, I've implemented SqlBulkCopy for the whole procedure. It works AMAZING, except my loop is somehow crashing on the very last remaining chunk that needs to be inserted. I'm not too sure where I'm going wrong, heck I don't even know if this is a proper loop, so I'd appreciate some help on it.
I do know that it's an issue with the very last GetDataTable or SetSqlBulkCopy calls. I'm trying to insert 788189 rows; 788000 get in and the remaining 189 are crashing...
string[] Rows;
using (StreamReader Reader = new StreamReader("C:/?.csv")) {
    Rows = Reader.ReadToEnd().TrimEnd().Split(new char[1] { '\n' }, StringSplitOptions.RemoveEmptyEntries);
};
int RowsInserted = 0;
using (SqlConnection Connection = new SqlConnection("")) {
    Connection.Open();
    DataTable Table = null;
    while ((RowsInserted < Rows.Length) && ((Rows.Length - RowsInserted) >= 1000)) {
        Table = GetDataTable(Rows.Skip(RowsInserted).Take(1000).ToArray());
        SetSqlBulkCopy(Table, Connection);
        RowsInserted += 1000;
    };
    Table = GetDataTable(Rows.Skip(RowsInserted).ToArray());
    SetSqlBulkCopy(Table, Connection);
    Connection.Close();
};
static DataTable GetDataTable(string[] Rows) {
    using (DataTable Table = new DataTable()) {
        Table.Columns.Add(new DataColumn("A"));
        Table.Columns.Add(new DataColumn("B"));
        Table.Columns.Add(new DataColumn("C"));
        Table.Columns.Add(new DataColumn("D"));
        for (short a = 0, b = (short)Rows.Length; a < b; a++) {
            string[] Columns = Rows[a].Split(new char[1] { ',' }, StringSplitOptions.RemoveEmptyEntries);
            DataRow Row = Table.NewRow();
            Row["A"] = Columns[0];
            Row["B"] = Columns[1];
            Row["C"] = Columns[2];
            Row["D"] = Columns[3];
            Table.Rows.Add(Row);
        };
        return (Table);
    };
}

static void SetSqlBulkCopy(DataTable Table, SqlConnection Connection) {
    using (SqlBulkCopy SqlBulkCopy = new SqlBulkCopy(Connection)) {
        SqlBulkCopy.ColumnMappings.Add(new SqlBulkCopyColumnMapping("A", "A"));
        SqlBulkCopy.ColumnMappings.Add(new SqlBulkCopyColumnMapping("B", "B"));
        SqlBulkCopy.ColumnMappings.Add(new SqlBulkCopyColumnMapping("C", "C"));
        SqlBulkCopy.ColumnMappings.Add(new SqlBulkCopyColumnMapping("D", "D"));
        SqlBulkCopy.BatchSize = Table.Rows.Count;
        SqlBulkCopy.DestinationTableName = "E";
        SqlBulkCopy.WriteToServer(Table);
    };
}
EDIT/FINAL CODE: So the app is now finished and works AMAZING, and quite speedy! @mattmc3, thanks for all the help! Here is the final code for anyone who may find it useful:
List<string> Rows = new List<string>();
using (StreamReader Reader = new StreamReader(@"?.csv")) {
    string Line = string.Empty;
    while (!String.IsNullOrWhiteSpace(Line = Reader.ReadLine())) {
        Rows.Add(Line);
    };
};
if (Rows.Count > 0) {
    int RowsInserted = 0;
    DataTable Table = new DataTable();
    Table.Columns.Add(new DataColumn("Id"));
    Table.Columns.Add(new DataColumn("A"));
    while ((RowsInserted < Rows.Count) && ((Rows.Count - RowsInserted) >= 1000)) {
        Table = GetDataTable(Rows.Skip(RowsInserted).Take(1000).ToList(), Table);
        PerformSqlBulkCopy(Table);
        RowsInserted += 1000;
        Table.Clear();
    };
    Table = GetDataTable(Rows.Skip(RowsInserted).ToList(), Table);
    PerformSqlBulkCopy(Table);
};

static DataTable GetDataTable(List<string> Rows, DataTable Table) {
    for (short a = 0, b = (short)Rows.Count; a < b; a++) {
        string[] Columns = Rows[a].Split(new char[1] { ',' }, StringSplitOptions.RemoveEmptyEntries);
        DataRow Row = Table.NewRow();
        Row["A"] = "";
        Table.Rows.Add(Row);
    };
    return (Table);
}

static void PerformSqlBulkCopy(DataTable Table) {
    using (SqlBulkCopy SqlBulkCopy = new SqlBulkCopy(@"", SqlBulkCopyOptions.TableLock)) {
        SqlBulkCopy.BatchSize = Table.Rows.Count;
        SqlBulkCopy.DestinationTableName = "";
        SqlBulkCopy.WriteToServer(Table);
    };
}
If you are doing a Bulk Insert into the table in SQL Server, which is how you should be doing this (BCP, Bulk Insert, Insert Into...Select, or in .NET, the SqlBulkCopy class) you can use the "Bulk Logged" recovery model. I highly recommend reading the MSDN articles on recovery models: http://msdn.microsoft.com/en-us/library/ms189275.aspx
You can set the recovery model for each database separately. Maybe the simple recovery model will work for you. The simple model:
Automatically reclaims log space to keep space requirements small, essentially eliminating the need to manage the transaction log space.
Read up on it here.
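For reference, switching the recovery model is a one-line T-SQL change. A minimal sketch of running it from C# (the database name MonthlyImportDb and the connection string are placeholders; you would normally do this once from SSMS rather than from application code):

// Switch the database to the SIMPLE recovery model before the bulk load
// (use BULK_LOGGED instead if you need point-in-time restore elsewhere).
using (var connection = new SqlConnection("Server=.;Database=master;Integrated Security=true"))
{
    connection.Open();
    using (var command = new SqlCommand(
        "ALTER DATABASE [MonthlyImportDb] SET RECOVERY SIMPLE;", connection))
    {
        command.ExecuteNonQuery();
    }
}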
There is no way to bypass using the transaction log in SQL Server.