SSIS Script howto append text to end of each row in flat file? - ssis

I currently have a flat file with around 1million rows.
I need to add a text string to the end of each row in the file.
I've been trying to adapt the following code but not having any success :-
public void Main()
{
// TODO: Add your code here
var lines = System.IO.File.ReadAllLines(#"E:\SSISSource\Source\Source.txt");
foreach (string item in lines)
{
var str = item.Replace("\n", "~20221214\n");
var subitems = str.Split('\n');
foreach (var subitem in subitems)
{
// write the data back to the file
}
}
Dts.TaskResult = (int)ScriptResults.Success;
}
I can't seem to get the code to recognise the carriage return "\n" & am not sure howto write the row back to the file to replace the existing rather than add a new row. Or is the above code sending me down a rabbit hole & there is an easier method ??
Many thanks for any pointers &/or assistance.

Read all lines is likely getting rid of the \n in each record. So your replace won't work.
Simply append your string and use #billinKC's solution otherwise.
BONUS:
I think DateTime.Now.ToString("yyyyMMdd"); is what you are trying to append to each line

Thanks #billinKC & #KeithL
KeithL you were correct in that the \n was stripped off. So I used a slightly amended version of #billinKC's code to get what I wanted :-
string origFile = #"E:\SSISSource\Source\Sourcetxt";
string fixedFile = #"E:\SSISSource\Source\Source.fixed.txt";
// Make a blank file
System.IO.File.WriteAllText(fixedFile, "");
var lines = System.IO.File.ReadAllLines(#"E:\SSISSource\Source\Source.txt");
foreach (string item in lines)
{
var str = item + "~20221214\n";
System.IO.File.AppendAllText(fixedFile, str);
}
As an aside KeithL - thanks for the DateTime code however the text that I am appending is obtained from a header row in the source file which is being read into a variable in an earlier step.

I read your code as
For each line in the file, replace the existing newline character with ~20221214 newline
At that point, the value of str is what you need, just write that! Instead, you split based on the new line which gets you an array of values which could be fine but why do the extra operations?
string origFile = #"E:\SSISSource\Source\Sourcetxt";
string fixedFile = #"E:\SSISSource\Source\Source.fixed.txt";
// Make a blank file
System.IO.File.WriteAllText(fixedFile, "");
var lines = System.IO.File.ReadAllLines(#"E:\SSISSource\Source\Source.txt");
foreach (string item in lines)
{
var str = item.Replace("\n", "~20221214\n");
System.IO.File.AppendAllText(fixedFile, str);
}
Something like this ought to be what you're looking for.

Related

skipping unwated rows while reading a flat file in SSIS

I have a SSIS package which is trying to read data from a text file. The issue I am facing is that the text file doesn't have very straight forward data as in it has special characters which are creating trouble
For Example, right after the header row, there's a row full of hyphens, something like -----------------------------------------------------------------------------------------
This SSIS is reading as the first value of the first column beacause of which it fails. How do I get rid of this, without actually removing the row from the file itself?
Also, in later part of the file as well, there are some unwanted rows which I would like to ignore, the format of the file is something like this :
Header
Data
Random Rows
Same header row as above
Data
and so on.....
I would like to know if there's a way to handle this with script task or any other way before or while the 'Flat File source' task gets executed, without actually making changes in the original file.
I don't know of anyway to filter these rows on input using the Flat File Source component, but you can definitely do some filtering if you read the file in with a Script Component.
If you add a reference to Microsoft.VisualBasic, you can use the below function to read your CSV into a datatable:
public static DataTable ReadInDataFromCSV(string fileName, string delimiter)
{
DataTable dtOutput = new DataTable();
//How many lines to read in. 0 for unlimited
int numberOfLines = 0;
using (TextFieldParser parser = new TextFieldParser(fileName))
{
parser.TextFieldType = FieldType.Delimited;
parser.SetDelimiters(delimiter);
//Are column names in first row?
bool columnNamesInFirstRow = true;
int rowCounter = 0;
string[] currentRow;
while (!parser.EndOfData && rowCounter <= numberOfLines)
{
try
{
currentRow = parser.ReadFields();
/*****************************
Add some kind of logic here to skip over rows you don't
want to read in
*****************************/
if (columnNamesInFirstRow == true)
{
foreach (string column in currentRow)
{
dtOutput.Columns.Add(column);
}
columnNamesInFirstRow = false;
}
else
{
DataRow dr;
dr = dtOutput.NewRow();
dr.ItemArray = currentRow;
dtOutput.Rows.Add(dr);
columnNamesInFirstRow = false;
}
}
catch (Exception e)
{
Console.WriteLine(e.Message);
}
rowCounter += (numberOfLines == 0) ? 0 : 1;
}
}
return dtOutput;
}
By default, the above code will read a flat file into a DataTable by calling something like:
DataTable myInputData = ReadInDataFromCSV(#"Path to file",",")
If you modify the commend I added inside the try/catch, you can filter out the rows you aren't interested in. For example, to skip the rows with hypens, you can add a simple check like:
if (currentRow.IndexOf("-----") > 0)
{
continue;
}
else
{
//If/else statement from the original code that adds the data to a DataRow and then adds it to the DataTable
}
Then you can simply add more similar checks to include/not include certain rows in your file. Good luck!

Increasing every number in a CSV by one?

This might be obvious to some people, but I have CSV data that I'm storing as a String in which every number is off by -1. I'd like to write a function (in ActionScript 3) in which I go in and increase every value by +1. How can I do this?
My CSV String looks like this:
public static const CSV_DATA:String = "14,15,16,8,9,8,9,8,9,8,9,264,265,266,267,268,269,8,9,260,261,262,263,8,9,1,2,3\n" +
"32,33,34,26,27,26,27,26,27,26,27,282,283,284,285,286,287,26,27,278,279,280,281,26,27,19,20,21\n" +
... etc
Thank you in advance.
In your case, just use String.split(). Primary split string by '\n', secondary split by ','. In result array every string with number process with parseInt function. After that, you can increase your numbers by one.
But if you need to read real CSV files, you can write class for this or use open-source
Here you go:
var CSV_DATA:String = "14,15,16,8,9,8,9,8,9,8,9,264,265,266,267,268,269,8,9,260,261,262,263,8,9,1,2,3,32,33,34,26,27,26,27,26,27,26,27,282,283,284,285,286,287,26,27,278,279,280,281,26,27,19,20,21";
var UPDATED_CSV_DATA:String = addOne(CSV_DATA);
function addOne(CSV:String):String
{
var CSVArr:Array = CSV.split(",");
for(var i=0;i<CSVArr.length;i++)
{
CSVArr[i] = Number(CSVArr[i]) + 1 ;
}
return CSVArr.toString();
}

Creating a line graph with highcharts and data in an external csv

I've read through the Highcharts how-to, checked the demo galleries, searched google, read the X amount of exact similar threads here on stackoverflow yet I cannot get it to work.
I'm logging data in a csv file in the form of date,value.
Here's what the date looks like
1355417598678,22.25
1355417620144,22.25
1355417625616,22.312
1355417630851,22.375
1355417633906,22.437
1355417637134,22.437
1355417641239,22.5
1355417641775,22.562
1355417662373,22.125
1355417704368,21.625
And this is how far I've managed to get the code:
http://jsfiddle.net/whz7P/
This renders a chart, but with no series or data at all. I think I'm fudging things up while formatting the data so it can be interpreted in highcharts.
Anyone able to give a helping hand?
So, you have the following data structure, right ?
1355417598678,22.25
1355417620144,22.25
1355417625616,22.312
1355417630851,22.375
1355417633906,22.437
1355417637134,22.437
1355417641239,22.5
1355417641775,22.562
1355417662373,22.125
1355417704368,21.625
Then you split it into an array of lines, so each array item is a line.
Then for each line you do the following.
var items = line.split(';'); // wrong, use ','
But there ins't ; into the line, you should split using ,.
The result will be a multidimencional array which each item is an array with the following structure. It will be stored in a var named data.
"1355417598678","22.25" // date in utc, value
This is the expected data for each serie, so you can pass it directly to your serie.
var serie = {
data: data,
name: 'serie1' // chose a name
}
The result will be a working chart.
So everything can be resumed to the following.
var lines = data.split('\n');
lines = lines.map(function(line) {
var data = line.split(',');
data[1] = parseFloat(data[1]);
return data;
});
var series = {
data: lines,
name: 'serie1'
};
options.series.push(series);
Looking at your line.split part:
$.get('data.csv', function(data) {
// Split the lines
var lines = data.split('\n');
$.each(lines, function(lineNo, line) {
var items = line.split(';');
It looks like you are trying to split on a semi-colon (;) instead of a comma (,) which is what is in your sample CSV data.
You need to put
$(document).ready(function() {
in the 1st line, and
});
in the last line of the javascript to make this work.
Could you upload your csv file? Is it identical to what you wrote in your original post? I ran into the same problem, and it turns out there are errors in the data file.

Losing leading 0s when string converts to array

I have a textInput control that sends .txt value to an array collection. The array collection is a collection of US zip codes so I use a regular expression to ensure I only get digits from the textInput.
private function addSingle(stringLoader:ArrayCollection):ArrayCollection {
arrayString += (txtSingle.text) + '';
var re:RegExp = /\D/;
var newArray:Array = arrayString.split(re);
The US zip codes start at 00501. Following the debugger, after the zip is submitted, the variable 'arrayString' is 00501. But once 'newArray' is assigned a vaule, it removes the first two 0s and leaves me with 501. Is this my regular expression doing something I'm not expecting? Could it be the array changing the value? I wrote a regexp test in javascript.
<script type="text/javascript">
var str="00501";
var patt1=/\D/;
document.write(str.match(patt1));
</script>
and i get null, which leads me to believe the regexp Im using is fine. In the help docs on the split method, I dont see any reference to leading 0s being a problem.
**I have removed the regular expression from my code completely and the same problem is still happening. Which means it is not the regular expression where the problem is coming from.
Running this simplified case:
var arrayString:String = '00501';
var re:RegExp = /\D/;
var newArray:Array = arrayString.split(re);
trace(newArray);
Yields '00501' as expected. There's nothing in the code you've posted that would strip leading zeros. You may want to dig around a bit more.
This smells suspiciously like Number coercion: Number('00501') yields 501. Read through the docs for implicit conversions and check if any pop up in your code.
What about this ?
/^\d+$/
You can also specify exactly 5 numbers like this :
/^\d{5}$/
I recommend just getting the zip codes instead of splitting on non-digits (especially if 'arrayString' might have multiple zip codes):
var newArray:Array = [];
var pattern:RegExp = /(\d+)/g;
var zipObject:Object;
while ((zipObject = pattern.exec(arrayString)) != null)
{
newArray.push(zipObject[1]);
}
for (var i:int = 0; i < newArray.length; i++)
{
trace("zip code " + i + " is: " + newArray[i]);
}

iTextSharp HTML to PDF preserving spaces

I am using the FreeTextBox.dll to get user input, and storing that information in HTML format in the database. A samle of the user's input is the below:
                                                                     133 Peachtree St NE                                                                     Atlanta,  GA 30303                                                                     404-652-7777                                                                      Cindy Cooley                                                                     www.somecompany.com                                                                     Product Stewardship Mgr                                                                     9/9/2011Deidre's Company123 Test StAtlanta, GA 30303Test test.  
I want the HTMLWorker to perserve the white spaces the users enters, but it strips it out. Is there a way to perserve the user's white space? Below is an example of how I am creating my PDF document.
Public Shared Sub CreatePreviewPDF(ByVal vsHTML As String, ByVal vsFileName As String)
Dim output As New MemoryStream()
Dim oDocument As New Document(PageSize.LETTER)
Dim writer As PdfWriter = PdfWriter.GetInstance(oDocument, output)
Dim oFont As New Font(Font.FontFamily.TIMES_ROMAN, 8, Font.NORMAL, BaseColor.BLACK)
Using output
Using writer
Using oDocument
oDocument.Open()
Using sr As New StringReader(vsHTML)
Using worker As New html.simpleparser.HTMLWorker(oDocument)
worker.StartDocument()
worker.SetInsidePRE(True)
worker.Parse(sr)
worker.EndDocument()
worker.Close()
oDocument.Close()
End Using
End Using
HttpContext.Current.Response.ContentType = "application/pdf"
HttpContext.Current.Response.AddHeader("Content-Disposition", String.Format("attachment;filename={0}.pdf", vsFileName))
HttpContext.Current.Response.BinaryWrite(output.ToArray())
HttpContext.Current.Response.End()
End Using
End Using
output.Close()
End Using
End Sub
There's a glitch in iText and iTextSharp but you can fix it pretty easily if you don't mind downloading the source and recompiling it. You need to make a change to two files. Any changes I've made are commented inline in the code. Line numbers are based on the 5.1.2.0 code rev 240
The first is in iTextSharp.text.html.HtmlUtilities.cs. Look for the function EliminateWhiteSpace at line 249 and change it to:
public static String EliminateWhiteSpace(String content) {
// multiple spaces are reduced to one,
// newlines are treated as spaces,
// tabs, carriage returns are ignored.
StringBuilder buf = new StringBuilder();
int len = content.Length;
char character;
bool newline = false;
bool space = false;//Detect whether we have written at least one space already
for (int i = 0; i < len; i++) {
switch (character = content[i]) {
case ' ':
if (!newline && !space) {//If we are not at a new line AND ALSO did not just append a space
buf.Append(character);
space = true; //flag that we just wrote a space
}
break;
case '\n':
if (i > 0) {
newline = true;
buf.Append(' ');
}
break;
case '\r':
break;
case '\t':
break;
default:
newline = false;
space = false; //reset flag
buf.Append(character);
break;
}
}
return buf.ToString();
}
The second change is in iTextSharp.text.xml.simpleparser.SimpleXMLParser.cs. In the function Go at line 185 change line 248 to:
if (html /*&& nowhite*/) {//removed the nowhite check from here because that should be handled by the HTML parser later, not the XML parser
Thanks for the help everyone. I was able to find a small work around by doing the following:
vsHTML.Replace(" ", " ").Replace(Chr(9), " ").Replace(Chr(160), " ").Replace(vbCrLf, "<br />")
The actual code does not display properly but, the first replace is replacing white spaces with , Chr(9) with 5 , and Chr(160) with .
I would recommend using wkhtmltopdf instead of iText. wkhtmltopdf will output the html exactly as rendered by webkit (Google Chrome, Safari) instead of iText's conversion. It is just a binary that you can call. That being said, I might check the html to ensure that there are paragraphs and/or line breaks in the user input. They might be stripped out before the conversion.