Does anyone have any examples of using EzAPI with a flat file as the data source? All the examples in the documentation start with OleDB connections.
Specifically, I can't work out how to define input and output columns.
Say, for instance, that I have a CSV file with columns for firstname, surname and age. I want to read this into SSIS, sort by age and write out to another text file.
According to the post How to use EzAPI FlatFile Source in SSIS?, I need to define the columns manually, but I can't get the suggested code to work.
If I do:
if (!pkg.Source.OutputColumnExists("col0"))
{
    pkg.Source.InsertOutputColumn("col0");
}
bool newColumnExists = pkg.Source.OutputColumnExists("col0");
newColumnExists is still false.
I think this link will help you: http://blogs.msdn.com/b/mattm/archive/2008/12/30/ezapi-alternative-package-creation-api.aspx. It shows how to create a package with EzAPI.
If you want to add columns to a flat file connection manager, use this code:
var flatFileCm = new EzFlatFileCM(this);
flatFileCm.ConnectionString = file;
foreach (var column in columns)
{
    // Add a new column to the Flat File connection manager
    var flatFileColumn = flatFileCm.Columns.Add();
    flatFileColumn.DataType = DataType.DT_WSTR;
    flatFileColumn.ColumnWidth = 255;
    // The last column ends the row; every other column is delimited by a tab
    flatFileColumn.ColumnDelimiter = columns.GetUpperBound(0) == Array.IndexOf(columns, column) ? "\r\n" : "\t";
    flatFileColumn.ColumnType = "Delimited";
    // Use the import file field name to name the column
    var columnName = flatFileColumn as IDTSName100;
    if (columnName != null) columnName.Name = column;
}
flatFileCm.ColumnNamesInFirstDataRow = true;
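To go from the connection manager to the flow the question describes (read the CSV, sort by age, write a second file), the pieces have to be assembled in an EzDataFlow. A minimal sketch, assuming EzAPI's EzFlatFileSource, EzSortTransform, and EzFlatFileDestination classes; the file paths are hypothetical, and the exact member for marking age as the sort key varies between EzAPI drops, so verify it against EzComponents.cs:
var pkg = new EzPackage();
var dataFlow = new EzDataFlow(pkg);

// Source: the CSV from the question (hypothetical path)
var srcCm = new EzFlatFileCM(pkg);
srcCm.ConnectionString = @"C:\temp\people.csv";
srcCm.ColumnNamesInFirstDataRow = true;
// ... define the firstname, surname, and age columns as shown above ...

var src = new EzFlatFileSource(dataFlow);
src.Connection = srcCm;

// Sort transform; configure "age" as the sort key here (the property
// for doing so differs between EzAPI versions)
var sort = new EzSortTransform(dataFlow);
sort.AttachTo(src);

// Destination: another flat file (hypothetical path)
var destCm = new EzFlatFileCM(pkg);
destCm.ConnectionString = @"C:\temp\people_sorted.csv";
var dest = new EzFlatFileDestination(dataFlow);
dest.Connection = destCm;
dest.AttachTo(sort);

pkg.Execute();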
I have to filter the data, so I need to create a new CSV file based on the filters.
I am having trouble doing it, because the new file does not change after I run the code.
Below is my code. I have two CSV files; Stage_3_try.csv is the one I am trying to add new data to. I used enumerate to get the index of the value I searched for in the previous CSV file.
# Projection
import csv
from csv import writer
import numpy as np
import matplotlib.pyplot as plt

# east_3, north_3, stage_3 (a pandas DataFrame) and point_on_line()
# are defined earlier in the script
A = np.array([316143.8829, 6188926.04])
B = np.array([314288.7418, 6190277.519])

for i in range(0, len(east_3)):
    P = []
    P.append(east_3[i])
    P.append(north_3[i])
    P = np.asarray(P)
    projected = point_on_line(P)  # a function that does the projection
    x_values = [A[0], B[0]]
    y_values = [A[1], B[1]]
    plt.plot(x_values, y_values, 'b-')
    if projected[0] > 315745.75 and projected[1] > 6188289:
        # the 'with' block closes the file; no explicit close() is needed
        with open('Stage_3_try.csv', 'a') as f_out:
            writer = csv.writer(f_out)
            for num, row in enumerate(stage_3['UTM North NAD83']):
                if row == P[1]:
                    writer.writerow(stage_3.loc[num])
                    print(type(stage_3.loc[num]))
        plt.plot(projected[0], projected[1], 'rx')
    else:
        pass
PS: I updated the code; the previous version worked, but when I added it to the loop, it stopped working.
I currently have a flat file with around 1 million rows.
I need to add a text string to the end of each row in the file.
I've been trying to adapt the following code, but without success:
public void Main()
{
    // TODO: Add your code here
    var lines = System.IO.File.ReadAllLines(@"E:\SSISSource\Source\Source.txt");
    foreach (string item in lines)
    {
        var str = item.Replace("\n", "~20221214\n");
        var subitems = str.Split('\n');
        foreach (var subitem in subitems)
        {
            // write the data back to the file
        }
    }
    Dts.TaskResult = (int)ScriptResults.Success;
}
I can't seem to get the code to recognise the carriage return "\n", and I am not sure how to write the row back to the file so that it replaces the existing row rather than adding a new one. Or is the above code sending me down a rabbit hole, and is there an easier method?
Many thanks for any pointers and/or assistance.
ReadAllLines is likely getting rid of the \n in each record, so your Replace won't find anything to match.
Simply append your string, and use @billinKC's solution otherwise.
BONUS:
I think DateTime.Now.ToString("yyyyMMdd") is what you are trying to append to each line.
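In code, that suggestion amounts to something like this (the ~ separator comes from the question's example):
// ReadAllLines already removed the line terminators, so just append
var str = item + "~" + DateTime.Now.ToString("yyyyMMdd");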
Thanks @billinKC & @KeithL
KeithL, you were correct that the \n was stripped off. So I used a slightly amended version of @billinKC's code to get what I wanted:
string origFile = @"E:\SSISSource\Source\Source.txt";
string fixedFile = @"E:\SSISSource\Source\Source.fixed.txt";

// Make a blank file
System.IO.File.WriteAllText(fixedFile, "");

var lines = System.IO.File.ReadAllLines(origFile);
foreach (string item in lines)
{
    var str = item + "~20221214\n";
    System.IO.File.AppendAllText(fixedFile, str);
}
As an aside, KeithL: thanks for the DateTime code; however, the text I am appending comes from a header row in the source file, which is read into a variable in an earlier step.
I read your code as: for each line in the file, replace the existing newline character with ~20221214 plus a newline.
At that point, the value of str is what you need; just write that! Instead, you split on the newline, which gets you an array of values. That could be fine, but why do the extra operations?
string origFile = @"E:\SSISSource\Source\Source.txt";
string fixedFile = @"E:\SSISSource\Source\Source.fixed.txt";

// Make a blank file
System.IO.File.WriteAllText(fixedFile, "");

var lines = System.IO.File.ReadAllLines(origFile);
foreach (string item in lines)
{
    var str = item.Replace("\n", "~20221214\n");
    System.IO.File.AppendAllText(fixedFile, str);
}
Something like this ought to be what you're looking for.
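One more thought: with around a million rows, AppendAllText reopens the output file once per line. A streaming variant with StreamReader/StreamWriter makes a single pass instead; a minimal sketch using the same paths:
string origFile = @"E:\SSISSource\Source\Source.txt";
string fixedFile = @"E:\SSISSource\Source\Source.fixed.txt";

using (var reader = new System.IO.StreamReader(origFile))
using (var writer = new System.IO.StreamWriter(fixedFile))
{
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        // ReadLine strips the terminator, so append the suffix
        // and let WriteLine put the newline back
        writer.WriteLine(line + "~20221214");
    }
}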
I have multiple folders (six or so), each with multiple .CSV files in them. The CSV files are all of the same format:
Heading1,Heading2,Heading3
1,Monday,2.45
2,Monday,3.765...
Each .CSV has the same heading names [same data source for different months]. What is the best way to import these CSVs into SQL Server 2008? The server does not have xp_cmdshell enabled [for security reasons which I cannot modify], so any method which uses that (which I originally tried) will not work.
EDIT
The CSV files are at most 2 MB in size and do not contain any commas (other than those required as delimiters).
Any ideas?
For example, say you have a CSV file named sample.csv on the D:\ drive, with this inside:
Heading1,Heading2,Heading3
1,Monday,2.45
2,Monday,3.765
Then you can use this query:
DECLARE @str nvarchar(max),
        @x xml,
        @head xml,
        @sql nvarchar(max),
        @params nvarchar(max) = '@x xml'

SELECT @str = BulkColumn
FROM OPENROWSET (BULK N'D:\sample.csv', SINGLE_CLOB) AS a

SELECT @head = CAST('<row><s>'+REPLACE(SUBSTRING(@str,1,CHARINDEX(CHAR(13)+CHAR(10),@str)-1),',','</s><s>')+'</s></row>' as xml)

SELECT @x = CAST('<row><s>'+REPLACE(REPLACE(SUBSTRING(@str,CHARINDEX(CHAR(10),@str)+1,LEN(@str)),CHAR(13)+CHAR(10),'</s></row><row><s>'),',','</s><s>')+'</s></row>' as xml)

SELECT @sql = N'
SELECT t.c.value(''s[1]'',''int'') '+QUOTENAME(t.c.value('s[1]','nvarchar(max)'))+',
       t.c.value(''s[2]'',''nvarchar(max)'') '+QUOTENAME(t.c.value('s[2]','nvarchar(max)'))+',
       t.c.value(''s[3]'',''decimal(15,7)'') '+QUOTENAME(t.c.value('s[3]','nvarchar(max)'))+'
FROM @x.nodes(''/row'') as t(c)'
FROM @head.nodes('/row') as t(c)

EXEC sp_executesql @sql, @params, @x = @x
To get output like:
Heading1 Heading2 Heading3
1 Monday 2.4500000
2 Monday 3.7650000
First we take the data in as SINGLE_CLOB with the help of OPENROWSET.
Then we put it all in the @str variable. The part from the beginning to the first \r\n goes into @head; the rest goes into @x, with conversion to XML. Structure:
<row>
  <s>Heading1</s>
  <s>Heading2</s>
  <s>Heading3</s>
</row>
<row>
  <s>1</s>
  <s>Monday</s>
  <s>2.45</s>
</row>
<row>
  <s>2</s>
  <s>Monday</s>
  <s>3.765</s>
</row>
After that we build a dynamic query like:
SELECT t.c.value('s[1]','int') [Heading1],
       t.c.value('s[2]','nvarchar(max)') [Heading2],
       t.c.value('s[3]','decimal(15,7)') [Heading3]
FROM @x.nodes('/row') as t(c)
And execute it. The variable @x is passed as a parameter.
Hope this helps you.
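If client-side code is an option, another route that avoids both xp_cmdshell and BULK INSERT permissions is .NET's SqlBulkCopy. A minimal sketch; the connection string, target table, folder path, and the ReadInDataFromCSV helper (a TextFieldParser-based loader like the one shown later on this page) are assumptions:
using System.Data;
using System.Data.SqlClient;
using System.IO;

class CsvLoader
{
    static void Main()
    {
        // Hypothetical connection string and root folder
        var connectionString = "Server=SERVER;Database=DATABASE;Integrated Security=true;";
        foreach (var csvFile in Directory.EnumerateFiles(@"D:\csvroot", "*.csv", SearchOption.AllDirectories))
        {
            // ReadInDataFromCSV is a TextFieldParser-based helper;
            // an implementation appears later on this page
            DataTable table = ReadInDataFromCSV(csvFile, ",");
            using (var bulk = new SqlBulkCopy(connectionString))
            {
                bulk.DestinationTableName = "dbo.MyCsvTable"; // hypothetical
                bulk.WriteToServer(table);
            }
        }
    }
}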
I ended up solving my problem using a non-SQL answer. Thank you to everyone who helped contribute. I apologise for going with a completely off-field answer using PHP, but here is what I created to solve this problem:
<?php
//////////////////////////////////////////////////////////////////////////////////////////////////
// //
// Date: 21/10/2016. //
// Description: Insert CSV rows into pre-created SQL table with same column structure. //
// Notes: - PHP script needs server to execute. //
// - Can run line by line ('INSERT') or bulk ('BULK INSERT'). //
// - 'Bulk Insert' needs bulk insert user permissions. //
// //
// Currently only works under the following file structure: //
// | ROOT FOLDER //
// | FOLDER 1 //
// | CSV 1 //
// | CSV 2... //
// | FOLDER 2 //
// | CSV 1 //
// | CSV 2... //
// | FOLDER 3... //
// | CSV 1 //
// | CSV 2... //
// //
//////////////////////////////////////////////////////////////////////////////////////////////////
//Error log - must have folder pre-created to work
ini_set("error_log", "phplog/bulkinsertCSV.php.log");
//Set the name of the root directory here (Where the folder's of CSVs are)
$rootPath = '\\\networkserver\folder\rootfolderwithCSVs';
//Get an array with the folder names located at the root directory location
// The '0' is alphabetical ascending, '1' is descending.
$rootArray = scandir($rootPath, 0);
//Set Database Connection Details
$myServer = "SERVER";
$myUser = "USER";
$myPass = "PASSWORD";
$myDB = "DATABASE";
//Create connection to the database
$connection = odbc_connect("Driver={SQL Server};Server=$myServer;Database=$myDB;", $myUser, $myPass) or die("Couldn't connect to SQL Server on $myServer");
//Extend the PHP script execution time limit
set_time_limit(10000);
//Set to true for bulk insert, set to false for line by line insert
// [If set to TRUE] - MUST HAVE BULK INSERT PERMISSIONS TO WORK
$bulkinsert = true;
//For loop that goes through the folders and finds CSV files
loopThroughAllCSVs($rootArray, $rootPath);
//Once procedure finishes, close the connection
odbc_close($connection);
function loopThroughAllCSVs($folderArray, $root){
    $fileFormat = '.csv';
    //Start at index 2 to skip the '.' and '..' entries scandir returns
    for($x = 2; $x < sizeof($folderArray); $x++){
        $eachFileinFolder = scandir($root."\\".$folderArray[$x]);
        for($y = 0; $y < sizeof($eachFileinFolder); $y++){
            $fullCSV_path = $root."\\".$folderArray[$x]."\\".$eachFileinFolder[$y];
            if(substr_compare($fullCSV_path, $fileFormat, strlen($fullCSV_path)-strlen($fileFormat), strlen($fileFormat)) === 0){
                parseCSV($fullCSV_path);
            }
        }
    }
}

function parseCSV($path){
    print_r($path);
    print("<br>");
    if($GLOBALS['bulkinsert'] === false){
        $csv = array_map('str_getcsv', file($path));
        array_shift($csv); //Remove headers
        foreach ($csv as $line){
            writeLinetoDB($line);
        }
    }
    else{
        bulkInserttoDB($path);
    }
}

function writeLinetoDB($line){
    $tablename = "[DATABASE].[dbo].[TABLE]";
    $insert = "INSERT INTO ".$tablename." (Column1,Column2,Column3,Column4,Column5,Column6,Column7)
        VALUES ('".$line[0]."','".$line[1]."','".$line[2]."','".$line[3]."','".$line[4]."','".$line[5]."','".$line[6]."')";
    $result = odbc_prepare($GLOBALS['connection'], $insert);
    odbc_execute($result) or die(odbc_error($GLOBALS['connection']));
}

function bulkInserttoDB($csvPath){
    $tablename = "[DATABASE].[dbo].[TABLE]";
    $insert = "BULK INSERT ".$tablename."
        FROM '".$csvPath."'
        WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\\n')";
    print_r($insert);
    print_r("<br>");
    $result = odbc_prepare($GLOBALS['connection'], $insert);
    odbc_execute($result) or die(odbc_error($GLOBALS['connection']));
}
?>
I ended up using the script above to write to the database line by line... this was going to take hours. So I modified the script to use BULK INSERT, which unfortunately we didn't have 'permissions' to use. Once I 'obtained' permissions, the BULK INSERT method worked a charm.
I have an SSIS package which is trying to read data from a text file. The issue I am facing is that the text file doesn't have very straightforward data: it has special characters which are creating trouble.
For example, right after the header row there's a row full of hyphens, something like -----------------------------------------------------------------------------------------
SSIS reads this as the first value of the first column, which makes the package fail. How do I get rid of this without actually removing the row from the file itself?
Also, in a later part of the file, there are some unwanted rows which I would like to ignore; the format of the file is something like this:
Header
Data
Random Rows
Same header row as above
Data
and so on.....
I would like to know if there's a way to handle this with a script task, or any other way, before or while the Flat File Source executes, without actually making changes to the original file.
I don't know of any way to filter these rows on input using the Flat File Source component, but you can definitely do some filtering if you read the file in with a Script Component.
If you add a reference to Microsoft.VisualBasic, you can use the function below to read your CSV into a DataTable:
// Requires: using System.Data; and using Microsoft.VisualBasic.FileIO;
// (the latter comes from the Microsoft.VisualBasic reference)
public static DataTable ReadInDataFromCSV(string fileName, string delimiter)
{
    DataTable dtOutput = new DataTable();
    //How many lines to read in. 0 for unlimited
    int numberOfLines = 0;
    using (TextFieldParser parser = new TextFieldParser(fileName))
    {
        parser.TextFieldType = FieldType.Delimited;
        parser.SetDelimiters(delimiter);
        //Are column names in first row?
        bool columnNamesInFirstRow = true;
        int rowCounter = 0;
        string[] currentRow;
        while (!parser.EndOfData && rowCounter <= numberOfLines)
        {
            try
            {
                currentRow = parser.ReadFields();
                /*****************************
                Add some kind of logic here to skip over rows you don't
                want to read in
                *****************************/
                if (columnNamesInFirstRow == true)
                {
                    foreach (string column in currentRow)
                    {
                        dtOutput.Columns.Add(column);
                    }
                    columnNamesInFirstRow = false;
                }
                else
                {
                    DataRow dr = dtOutput.NewRow();
                    dr.ItemArray = currentRow;
                    dtOutput.Rows.Add(dr);
                }
            }
            catch (Exception e)
            {
                Console.WriteLine(e.Message);
            }
            //Only advances when a line limit is in effect
            rowCounter += (numberOfLines == 0) ? 0 : 1;
        }
    }
    return dtOutput;
}
By default, the above code will read a flat file into a DataTable by calling something like:
DataTable myInputData = ReadInDataFromCSV(@"Path to file", ",");
If you modify the comment I added inside the try/catch, you can filter out the rows you aren't interested in. For example, to skip the rows of hyphens, you can add a simple check like:
if (currentRow.Length > 0 && currentRow[0].StartsWith("-----"))
{
    //Skip the hyphen divider rows
    continue;
}
else
{
    //If/else statement from the original code that adds the data to a
    //DataRow and then adds it to the DataTable
}
Then you can simply add more similar checks to include/not include certain rows in your file. Good luck!
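For the repeated header rows the question mentions, a similar check works at the same spot in the loop; a sketch, assuming the real header has already been captured into dtOutput.Columns:
// Skip any row that repeats the header: compare the first field
// against the first column name that was already captured
if (dtOutput.Columns.Count > 0 && currentRow.Length > 0
    && currentRow[0] == dtOutput.Columns[0].ColumnName)
{
    continue;
}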
I am transferring some data from one table to another using SSIS with EzAPI. How can I get the number of rows that were transferred?
My setup is as follows:
EzPackage package = new EzPackage();
EzOleDbConnectionManager srcConn;
EzOleDbSource src;
EzOleDbConnectionManager destConn;
EzOleDbDestination dest;
EzDataFlow dataFlow;
destConn = new EzOleDbConnectionManager(package); //set connection string
srcConn = new EzOleDbConnectionManager(package);
dataFlow = new EzDataFlow(package);
src = Activator.CreateInstance(typeof(EzOleDbSource), new object[] { dataFlow }) as EzOleDbSource;
src.Connection = srcConn;
src.SqlCommand = odbcImport.Query;
dest = Activator.CreateInstance(typeof(EzOleDbDestination), new object[] { dataFlow }) as EzOleDbDestination;
dest.Connection = destConn;
dest.AttachTo(src, 0, 0);
dest.AccessMode = AccessMode.AM_OPENROWSET_FASTLOAD;
DTSExecResult result = package.Execute();
Where in this can I add something to get the number of rows? This is for all versions of SQL Server, 2008 R2 and up.
The quick answer is that the Row Count Transformation isn't included out of the box. I had a brief post about that: Row Count with EzAPI.
I downloaded the source project from CodePlex, edited EzComponents.cs (in EzAPI\src), and added the following code:
[CompID("{150E6007-7C6A-4CC3-8FF3-FC73783A972E}")]
public class EzRowCountTransform : EzComponent
{
    public EzRowCountTransform(EzDataFlow dataFlow) : base(dataFlow) { }
    public EzRowCountTransform(EzDataFlow parent, IDTSComponentMetaData100 meta) : base(parent, meta) { }

    public string VariableName
    {
        get { return (string)Meta.CustomPropertyCollection["VariableName"].Value; }
        set { Comp.SetComponentProperty("VariableName", value); }
    }
}
The component ID above is only for 2008.
For 2012, it's going to be E26997D8C-70DA-42B2-8208-A19CE3A9FE41. I don't have a 2012 installation at the moment to confirm I didn't transpose a value there, but you can drop a Row Count component onto a data flow, right-click, and look at the properties: the component/class ID is what that value needs to be. Similar story if you're dealing with 2005.
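If you'd rather not hunt through the designer, the SSIS runtime can also list the installed pipeline components and their IDs. A small sketch, assuming a reference to the Microsoft.SqlServer.ManagedDTS assembly:
using System;
using Microsoft.SqlServer.Dts.Runtime;

class ListPipelineComponents
{
    static void Main()
    {
        var app = new Application();
        // Print the name and class ID of every installed pipeline
        // component; look for "Row Count" in the output
        foreach (PipelineComponentInfo info in app.PipelineComponentInfos)
        {
            Console.WriteLine("{0}: {1}", info.Name, info.ID);
        }
    }
}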
So, once you have the ability to use EzRowCountTransform, you can simply patch it into your existing script.
// Create an instance of our transform
EzRowCountTransform newRC = null;
// Create a variable to use it
Variable newRCVariable = null;
newRCVariable = package.Variables.Add("RowCountNew", false, "User", 0);
// ...
src.SqlCommand = odbcImport.Query;
// New code here too
newRC = new EzRowCountTransform(dataFlow);
newRC.AttachTo(src);
newRC.Name = "RC New Rows";
newRC.VariableName = newRCVariable.QualifiedName;
// Continue old code
I have a presentation on various approaches I've used over time and what I like/don't like about them: Type more, click less: a programmatic approach to building SSIS. It contains sample code for creating the EzRowCountTransform and its usage.
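To close the loop on the original question (not shown above): once package.Execute() returns, the transform has written the row count into the variable created earlier. A minimal sketch following the names used above:
DTSExecResult result = package.Execute();

// The Row Count transform populated User::RowCountNew during execution
int rowsTransferred = (int)newRCVariable.Value;
Console.WriteLine("Rows transferred: " + rowsTransferred);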