I have an Excel file that loosely resembles the following format:
I'll explain the next step in the SSIS package first: the column names themselves are not important, because I unpivot the data in a data flow to make it usable:
The issue is that the file will be updated: historical years and quarters will be removed and new ones added in their place. As we all know, that breaks the metadata on the data flow.
The cell range and position etc. will always remain the same.
Is there a way it can be handled in a data flow with the column names (2016q1) being fluid?
Thanks
You're going to like this, as it also does the unpivot:
Using C# Script component source:
Add namespace:
using System.Data.OleDb;
Add your 4 output columns and select data types:
Add code to new row section.
public override void CreateNewOutputRows()
{
/*
Add rows by calling the AddRow method on the member variable named "<Output Name>Buffer".
For example, call MyOutputBuffer.AddRow() if your output was named "MyOutput".
*/
string fileName = @"C:\test.xlsx";
string SheetName = "Sheet1";
string cstr = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + fileName + ";Extended Properties=\"Excel 12.0;HDR=YES;IMEX=1\"";
OleDbConnection xlConn = new OleDbConnection(cstr);
xlConn.Open();
OleDbCommand xlCmd = xlConn.CreateCommand();
xlCmd.CommandText = "Select * from [" + SheetName + "$]";
xlCmd.CommandType = CommandType.Text;
OleDbDataReader rdr = xlCmd.ExecuteReader();
//int rowCt = 0; //Counter
while (rdr.Read())
{
for (int i = 2; i < rdr.FieldCount; i++) // loop from the 3rd column to the last one
{
Output0Buffer.AddRow();
Output0Buffer.ColA = rdr[0].ToString();
Output0Buffer.ColB = rdr[1].ToString();
Output0Buffer.FactName = rdr.GetName(i);
Output0Buffer.FactValue = rdr.GetDouble(i);
}
//rowCt++; //increment counter
}
xlConn.Close();
}
If the columns remain in the same order, another option is to skip the header row and specify that the first row does not contain column names, so the column metadata stays fixed.
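To show the unpivot loop on its own, here is a minimal sketch that runs against any IDataReader, using an in-memory DataTable in place of the Excel file. The names UnpivotSketch, FactRow and Unpivot are made up for illustration. It assumes the first two columns are the fixed ones, skips empty cells, and converts with Convert.ToDouble under the invariant culture instead of GetDouble, in case the driver hands a value back as text.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Globalization;

// Hypothetical container for one unpivoted row.
public sealed class FactRow
{
    public string ColA, ColB, FactName;
    public double FactValue;
}

public static class UnpivotSketch
{
    // Emits one FactRow per non-empty fact column of every input row.
    // Column names are read at run time, so they can change between files.
    public static List<FactRow> Unpivot(IDataReader rdr, int fixedColumns = 2)
    {
        var result = new List<FactRow>();
        while (rdr.Read())
        {
            for (int i = fixedColumns; i < rdr.FieldCount; i++)
            {
                if (rdr.IsDBNull(i)) continue; // skip empty cells
                result.Add(new FactRow
                {
                    ColA = rdr[0].ToString(),
                    ColB = rdr[1].ToString(),
                    FactName = rdr.GetName(i), // e.g. "2016q1"
                    // Assumption: numeric text uses invariant-culture formatting.
                    FactValue = Convert.ToDouble(rdr[i], CultureInfo.InvariantCulture)
                });
            }
        }
        return result;
    }

    public static void Main()
    {
        // Stand-in for the worksheet: two fixed columns plus two quarter columns.
        var wide = new DataTable();
        wide.Columns.Add("ColA");
        wide.Columns.Add("ColB");
        wide.Columns.Add("2016q1", typeof(double));
        wide.Columns.Add("2016q2", typeof(double));
        wide.Rows.Add("a", "b", 1.5, 2.5);

        using (IDataReader rdr = wide.CreateDataReader())
        {
            foreach (FactRow r in Unpivot(rdr))
                Console.WriteLine(r.ColA + " " + r.ColB + " " + r.FactName + " " + r.FactValue);
        }
    }
}
```

Inside the script component, the same loop body would assign to the Output0Buffer columns instead of building a list.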
I currently have a flat file with around 1 million rows.
I need to add a text string to the end of each row in the file.
I've been trying to adapt the following code but not having any success :-
public void Main()
{
// TODO: Add your code here
var lines = System.IO.File.ReadAllLines(@"E:\SSISSource\Source\Source.txt");
foreach (string item in lines)
{
var str = item.Replace("\n", "~20221214\n");
var subitems = str.Split('\n');
foreach (var subitem in subitems)
{
// write the data back to the file
}
}
Dts.TaskResult = (int)ScriptResults.Success;
}
I can't seem to get the code to recognise the newline "\n", and I'm not sure how to write the row back to the file so that it replaces the existing row rather than adding a new one. Or is the above code sending me down a rabbit hole, and is there an easier method?
Many thanks for any pointers &/or assistance.
File.ReadAllLines strips the line terminator from each record, so your Replace never finds a "\n" to replace.
Simply append your string to each line; otherwise use @billinKC's solution.
BONUS:
I think DateTime.Now.ToString("yyyyMMdd"); is what you are trying to append to each line
Thanks @billinKC & @KeithL
KeithL, you were correct that the \n was stripped off. So I used a slightly amended version of @billinKC's code to get what I wanted:
string origFile = @"E:\SSISSource\Source\Source.txt";
string fixedFile = @"E:\SSISSource\Source\Source.fixed.txt";
// Make a blank file
System.IO.File.WriteAllText(fixedFile, "");
var lines = System.IO.File.ReadAllLines(origFile);
foreach (string item in lines)
{
    var str = item + "~20221214\n";
    System.IO.File.AppendAllText(fixedFile, str);
}
As an aside, KeithL: thanks for the DateTime code. However, the text that I am appending is obtained from a header row in the source file, which is read into a variable in an earlier step.
I read your code as
For each line in the file, replace the existing newline character with ~20221214 newline
At that point, the value of str is what you need, just write that! Instead, you split based on the new line which gets you an array of values which could be fine but why do the extra operations?
string origFile = @"E:\SSISSource\Source\Source.txt";
string fixedFile = @"E:\SSISSource\Source\Source.fixed.txt";
// Make a blank file
System.IO.File.WriteAllText(fixedFile, "");
var lines = System.IO.File.ReadAllLines(origFile);
foreach (string item in lines)
{
    // ReadAllLines has already stripped the line terminator,
    // so append the suffix followed by a fresh "\n"
    var str = item + "~20221214\n";
    System.IO.File.AppendAllText(fixedFile, str);
}
Something like this ought to be what you're looking for.
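With around a million rows, calling AppendAllText once per line reopens the output file every time, and ReadAllLines holds the whole file in memory. A streaming variant is sketched below, assuming the suffix is passed in as a plain string. AppendSuffixSketch and AppendToEachLine are names invented for this example. WriteLine ends each line with Environment.NewLine, and StreamWriter writes UTF-8 by default, so adjust both if the target system expects something else.

```csharp
using System;
using System.IO;

public static class AppendSuffixSketch
{
    // Copies sourcePath to targetPath, adding suffix to the end of every line.
    public static void AppendToEachLine(string sourcePath, string targetPath, string suffix)
    {
        // One writer for the whole run; 'false' overwrites any existing target file.
        using (var writer = new StreamWriter(targetPath, false))
        {
            // File.ReadLines yields lines lazily, without their terminators.
            foreach (string line in File.ReadLines(sourcePath))
            {
                writer.WriteLine(line + suffix);
            }
        }
    }

    public static void Main()
    {
        // Temporary files stand in for the real source and target paths.
        string source = Path.GetTempFileName();
        string target = Path.GetTempFileName();
        File.WriteAllLines(source, new[] { "row1", "row2" });

        AppendToEachLine(source, target, "~20221214");
        Console.Write(File.ReadAllText(target));

        File.Delete(source);
        File.Delete(target);
    }
}
```

In a Script Task the suffix could come from the package variable that holds the header value, and the fixed file can be moved over the original once the copy completes.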
I'm having a problem with an autocomplete textbox. I already have it working with a MySQL database as a custom source, but the default text filter of autocomplete is "starts with", not "contains". I want to change the filter to "contains", so that when I type any part of a string, every name containing the typed text appears in the autocomplete suggestions.
Can anyone help me fix my code?
This is the code I've written so far:
txtSearch.AutoCompleteMode = AutoCompleteMode.SuggestAppend
txtSearch.AutoCompleteSource = AutoCompleteSource.CustomSource
Dim DataCollection As New AutoCompleteStringCollection()
Dim query As String
sqlcon = New MySqlConnection
sqlcon.ConnectionString =
"server=localhost;userid=root;password=root;database=svfmemberlistdb"
Try
sqlcon.Open()
query = " SELECT Name FROM svfmemberlistdb.svfmemberlist "
sqlcmd = New MySqlCommand(query, sqlcon)
sqladr.SelectCommand = sqlcmd
sqladr.Fill(ds)
sqladr.Dispose()
sqlcon.Close()
For Each row As DataRow In ds.Tables(0).Rows
If row.ToString.Contains(txtSearch.Text) Then
DataCollection.Add(row(0).ToString())
End If
Next
Catch ex As Exception
End Try
txtSearch.AutoCompleteCustomSource = DataCollection
I quote here Mitja Bonca's answer on MSDN.
In this case, autocompletemode will just not do. Its code is not meant
for something like it.
You will have to do your own code, to do the filtering on each letter
press.
So I would suggest not to use autocompletemode, and get all the data
(names) into dataTable. When user presses some button ("1" for
example), you start with your filtering, by creating new Datatable
(leave the main one untouched - so you can return back to all data when
clearing comboBox by backspace), with Copy() method - to create a full
copy of original one, and use Select method to do the filtering.
This should look something like by using % symbol on both sides of a
string - to filter in between - this is what you want!
DataTable AllNames = new DataTable();
//fill it up and leave it untouched!
// to filter the comboBox down to names that contain the typed characters, handle KeyPress:
private void comboBox1_KeyPress(object sender, KeyPressEventArgs e)
{
string name = string.Format("{0}{1}", comboBox1.Text, e.KeyChar.ToString()); //join previous text and new pressed char
DataRow[] rows = AllNames.Select(string.Format("FieldName LIKE '%{0}%'", name)); // query the untouched master table
DataTable filteredTable = AllNames.Clone();
foreach(DataRow r in rows)
filteredTable.ImportRow(r);
comboBox1.DataSource = null;
comboBox1.DataSource = filteredTable.DefaultView;
comboBox1.DisplayMember = "FieldName";
}
Reference
EDIT: This is of course a C# answer, not VB.NET, but it might be helpful for getting the concept.
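One thing to watch with the LIKE expression above: the typed text goes straight into the filter string, so an apostrophe in a name breaks the expression. If the names are already in memory, the "contains" test can be done without a filter expression at all. The following is only a sketch of that idea; ContainsFilterSketch and Filter are hypothetical names, and the case-insensitive ordinal comparison is an assumption about what feels right for a name search.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ContainsFilterSketch
{
    // Returns every name that contains the typed text anywhere, ignoring case.
    public static List<string> Filter(IEnumerable<string> names, string typed)
    {
        if (string.IsNullOrEmpty(typed))
            return names.ToList(); // nothing typed yet: show everything

        return names
            .Where(n => n != null &&
                        n.IndexOf(typed, StringComparison.OrdinalIgnoreCase) >= 0)
            .ToList();
    }

    public static void Main()
    {
        var allNames = new[] { "Anna O'Brien", "Brian Smith", "Joanna Lee" };

        // "ann" matches in the middle of "Joanna" as well as at the start of "Anna".
        foreach (string match in Filter(allNames, "ann"))
            Console.WriteLine(match);
    }
}
```

The resulting list can be assigned to the comboBox DataSource in the same place the filtered DataTable is assigned in the quoted answer.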
I have an SSIS package that reads data from a text file. The issue I am facing is that the data in the text file is not very straightforward: it contains special characters that cause trouble.
For Example, right after the header row, there's a row full of hyphens, something like -----------------------------------------------------------------------------------------
SSIS reads this as the first value of the first column, because of which it fails. How do I get rid of this without actually removing the row from the file itself?
Also, later in the file there are some unwanted rows that I would like to ignore. The format of the file is something like this:
Header
Data
Random Rows
Same header row as above
Data
and so on.....
I would like to know if there's a way to handle this with script task or any other way before or while the 'Flat File source' task gets executed, without actually making changes in the original file.
I don't know of any way to filter these rows on input using the Flat File Source component, but you can definitely do some filtering if you read the file in with a Script Component.
If you add a reference to Microsoft.VisualBasic (and a using directive for Microsoft.VisualBasic.FileIO), you can use the function below to read your CSV into a DataTable:
public static DataTable ReadInDataFromCSV(string fileName, string delimiter)
{
DataTable dtOutput = new DataTable();
//How many lines to read in. 0 for unlimited
int numberOfLines = 0;
using (TextFieldParser parser = new TextFieldParser(fileName))
{
parser.TextFieldType = FieldType.Delimited;
parser.SetDelimiters(delimiter);
//Are column names in first row?
bool columnNamesInFirstRow = true;
int rowCounter = 0;
string[] currentRow;
while (!parser.EndOfData && rowCounter <= numberOfLines)
{
try
{
currentRow = parser.ReadFields();
/*****************************
Add some kind of logic here to skip over rows you don't
want to read in
*****************************/
if (columnNamesInFirstRow == true)
{
foreach (string column in currentRow)
{
dtOutput.Columns.Add(column);
}
columnNamesInFirstRow = false;
}
else
{
DataRow dr;
dr = dtOutput.NewRow();
dr.ItemArray = currentRow;
dtOutput.Rows.Add(dr);
columnNamesInFirstRow = false;
}
}
catch (Exception e)
{
Console.WriteLine(e.Message);
}
rowCounter += (numberOfLines == 0) ? 0 : 1;
}
}
return dtOutput;
}
By default, the above code will read a flat file into a DataTable by calling something like:
DataTable myInputData = ReadInDataFromCSV(@"Path to file", ",");
If you modify the comment I added inside the try/catch, you can filter out the rows you aren't interested in. For example, to skip the rows with hyphens, you can add a simple check like:
if (currentRow.Length > 0 && currentRow[0].StartsWith("-----")) // currentRow is a string[], so test its first field
{
continue;
}
else
{
//If/else statement from the original code that adds the data to a DataRow and then adds it to the DataTable
}
Then you can simply add more similar checks to include/not include certain rows in your file. Good luck!
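As a rough illustration of what those checks could look like when collected in one place, here is a sketch of a skip predicate. RowFilterSketch and ShouldSkip are invented names, and the rules are assumptions about this particular file: separator rows are runs of at least five hyphens, the header reappears verbatim, and "random" rows can be recognised by having a different number of fields than the header.

```csharp
using System;
using System.Linq;

public static class RowFilterSketch
{
    // Returns true for rows that should not be loaded.
    // 'header' is null until the first real header row has been captured.
    public static bool ShouldSkip(string[] fields, string[] header)
    {
        // Blank rows.
        if (fields == null || fields.All(string.IsNullOrWhiteSpace))
            return true;

        // Separator rows such as "-----------".
        string first = (fields[0] ?? "").Trim();
        if (first.Length >= 5 && first.All(c => c == '-'))
            return true;

        if (header != null)
        {
            // The same header repeated further down the file.
            if (fields.SequenceEqual(header))
                return true;

            // Assumption: stray rows do not have the header's field count.
            if (fields.Length != header.Length)
                return true;
        }
        return false;
    }

    public static void Main()
    {
        string[][] rows =
        {
            new[] { "Id", "Name" },
            new[] { "-----------" },
            new[] { "1", "Ann" },
            new[] { "some random text" },
            new[] { "Id", "Name" },
            new[] { "2", "Bob" }
        };

        string[] header = null;
        foreach (string[] row in rows)
        {
            if (ShouldSkip(row, header)) continue;
            if (header == null) { header = row; continue; } // first surviving row is the header
            Console.WriteLine(string.Join("|", row));
        }
    }
}
```

In ReadInDataFromCSV the call would sit right after parser.ReadFields(), with a continue when it returns true.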
I have a small program to read CSV files to build datatable out of it. One requirement is to ignore commas (commas in names, etc) if the commas are between quotation marks. Example.
Name, Age, Location
"Henderson, David", 32, London
John Smith, 19, Belfast
The program should ignore the comma after Henderson and read Henderson, David as one field. My current code can't do this; it adds an extra column at the end. So how can I achieve it? The solution should not replace the comma between the quotation marks. Thanks.
My current code.
Public Function BuildDataTable() As DataTable
Dim myTable As DataTable = New DataTable("MyTable")
Dim i As Integer
Dim myRow As DataRow
Dim fieldValues As String()
Dim myReader As StreamReader = New StreamReader(_fileFullPath, Encoding.GetEncoding("iso-8859-1"))
Try
fieldValues = myReader.ReadLine().Split(_seperator)
'Create data columns accordingly
If _hasheader = False Then
For i = 0 To fieldValues.Length() - 1
myTable.Columns.Add(New DataColumn("Column(" & i & ")"))
Next
Else
'if the file has header, take the first row as header for datatable
For i = 0 To fieldValues.Length() - 1
myTable.Columns.Add(New DataColumn(fieldValues(i).Replace(" ", "")))
Next
End If
myRow = myTable.NewRow
If _hasheader = False Then
For i = 0 To fieldValues.Length() - 1
myRow.Item(i) = fieldValues(i).ToString
Next
myTable.Rows.Add(myRow)
End If
While myReader.Peek() <> -1
fieldValues = myReader.ReadLine().Split(_seperator)
myRow = myTable.NewRow
For i = 0 To fieldValues.Length() - 1
myRow.Item(i) = fieldValues(i).Trim.ToString
Next
If Not csv2xml.AreAllColumnsEmpty(myRow) = True Then
myTable.Rows.Add(myRow)
End If
End While
Catch ex As Exception
End Try
End Function
You're looking to use the double quote character as a text qualifier in your CSV. Text qualifiers allow you to use your field delimiter character(s) in a field value if the field is enclosed in the text qualifier character.
You can program this yourself, but that would be a mistake. There are plenty of free and capable CSV parsers that can do this for you. Since you're using Visual Basic, you can take a look at the TextFieldParser class.
You'll still need to write code that will write a CSV's contents into a DataTable.
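A minimal sketch of that glue code follows, written in C# but using the same TextFieldParser class (it lives in Microsoft.VisualBasic.FileIO and needs a reference to Microsoft.VisualBasic). QuotedCsvSketch and Parse are names made up here. The sketch assumes the first row is a header, strips spaces from header names the way the question's code does, and does no handling of rows that have more fields than the header.

```csharp
using System;
using System.Data;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class QuotedCsvSketch
{
    // Reads comma-delimited text into a DataTable, honouring quoted fields.
    public static DataTable Parse(TextReader input)
    {
        var table = new DataTable("MyTable");
        using (var parser = new TextFieldParser(input))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true; // commas inside "..." stay in the field
            parser.TrimWhiteSpace = true;            // drops the space after each comma

            if (parser.EndOfData) return table;

            // Assumption: the first row holds the column names.
            foreach (string name in parser.ReadFields())
                table.Columns.Add(name.Replace(" ", ""));

            while (!parser.EndOfData)
                table.Rows.Add(parser.ReadFields());
        }
        return table;
    }

    public static void Main()
    {
        string csv =
            "Name, Age, Location\n" +
            "\"Henderson, David\", 32, London\n" +
            "John Smith, 19, Belfast\n";

        DataTable dt = Parse(new StringReader(csv));
        foreach (DataRow row in dt.Rows)
            Console.WriteLine(row["Name"] + " | " + row["Age"] + " | " + row["Location"]);
    }
}
```

For a file on disk, a StreamReader opened with the iso-8859-1 encoding from the question can be passed in place of the StringReader.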
I found the following that seems to work:
http://www.vbcode.com/asp/showsn.asp?theID=13645
Another option is the GenericParser over at codeproject.com. Don't let the fact that the code in the article is written in C# bother you; you can still reference the DLL (GenericParsing.dll) in your project and use it in VB.
The nice thing about this parser is it includes a method you can use to return a DataTable for you from a CSV. Here's an example which works with your sample data:
Using parser As New GenericParsing.GenericParserAdapter(CSV_FILE_FULLNAME)
parser.ColumnDelimiter = ","
parser.TextQualifier = """"
parser.FirstRowHasHeader = True
Dim dt As DataTable = parser.GetDataTable()
End Using
I'm not familiar with Visual Basic but I think you should not use a Split() function to split the line.
fieldValues = myReader.ReadLine().Split(_seperator) ' DO NOT do this
Instead, write your own split function that reads the characters one by one, and keep a flag that records whether you are between the double quotation marks.
UPDATE
I'm sorry, I know too little about VB or C# to write a runnable code snippet.
Please read this pseudocode (in fact it is JavaScript); I hope it is useful.
function split_with_quote(string, delimiter, quotation) {
if (delimiter == null) delimiter = ',';
if (quotation == null) quotation = '"';
var in_quotation = false;
var result = [];
var part = '';
for (var i = 0; i < string.length; i++) {
var ch = string[i];
if (ch == quotation) in_quotation = !in_quotation;
if (ch == delimiter && !in_quotation) {
result.push(part);
part = '';
} else {
if (ch != quotation) part += ch;
}
}
result.push(part); // push the final field, which has no trailing delimiter
return result;
}
a = 'abc,def,"ghi,jkl",123';
split_with_quote(a); // ["abc", "def", "ghi,jkl", "123"]
I have a DataGridView control (DataGridView6) that displays a list of managers. I want to generate a new DataGridView every time I add a new manager to the list and put it in a specific place on my form.
EDIT:
Say I have a main DataGridView and I want to add another DataGridView of the same size directly below it. How would I achieve this using the event handler method described in your answer below? I'm not sure if this is the most efficient way of displaying new members in the program, though...
How can I do this as simply as possible?
Use the DataGridView's "RowsAdded" event. Every time you add a new row (ie manager) to DataGridView6, have the event handler create a new DataGridView and place it where you want it.
It's hard to give a more detailed answer without the specifics of your implementation, but something like that should work.
EDIT - So something like this?
DataGridView dgv = new DataGridView();
dgv.Location = new Point(DataGridView6.Location.X,DataGridView6.Location.Y + <somevalue>);
If you need to keep adding them below this, you could just make a variable NextY that you increment each time you add a new one. You can store them all in a LinkedList or something similar so you can access them easily in order.
I'm not very good at VB, so I've written it in C# first:
DataGridView DataGridView6;
DataGridView DataGridView7;
DataGridViewRow CreateRow(object data) {
DataGridViewRow row = null;
int index = DataGridView6.Rows.Add();
row = DataGridView6.Rows[index];
// row.Cells[0].Value = something;
// basically, add your data
return row;
}
void DisplayManagerRow(DataGridViewRow row) {
DataGridView7.DataSource = null;
int columns = (DataGridView6.Columns != null) ? DataGridView6.Columns.Count : 0;
if ((row != null) && (0 < columns)) {
DataGridView7.Columns.Clear();
List<DataGridViewColumn> cols = new List<DataGridViewColumn>(columns);
for (int i = 0; i < columns; i++) {
DataGridViewColumn dgvCol = (DataGridViewColumn)DataGridView6.Columns[i].Clone();
DataGridView7.Columns.Add(dgvCol);
}
DataGridView7.Rows.Add(row);
}
}
Now, to try this in VB:
Private DataGridView6 As DataGridView
Private DataGridView7 As DataGridView
Private Function CreateRow(ByVal data As Object) As DataGridViewRow
    Dim index As Integer = DataGridView6.Rows.Add()
    Dim row As DataGridViewRow = DataGridView6.Rows(index)
    ' row.Cells(0).Value = something
    ' basically, add your data
    Return row
End Function
Private Sub DisplayManagerRow(ByVal row As DataGridViewRow)
    DataGridView7.DataSource = Nothing
    Dim columns As Integer = 0
    If DataGridView6.Columns IsNot Nothing Then
        columns = DataGridView6.Columns.Count
    End If
    If row IsNot Nothing AndAlso 0 < columns Then
        DataGridView7.Columns.Clear()
        Dim cols As New List(Of DataGridViewColumn)(columns)
        For i As Integer = 0 To columns - 1
            Dim dgvCol As DataGridViewColumn = CType(DataGridView6.Columns(i).Clone(), DataGridViewColumn)
            DataGridView7.Columns.Add(dgvCol)
        Next
        DataGridView7.Rows.Add(row)
    End If
End Sub
I can't even remember how to write a For loop in VB! Pathetic!
Does that get the point across, though?
Is this what you are trying to do?