Efficiently reading a CSV file in MQL4?

I placed "AAPL.csv" into the MetaTrader terminal folder's Files subfolder (MQL4/Files) to be accessible by the EA. The structure of this csv is as follows:
Date,Open,High,Low,Close,Adj Close,Volume
1980-12-12,0.1283479928970337,0.1289059966802597,0.1283479928970337,0.1283479928970337,0.10092189908027649,469033600
1980-12-15,0.12221000343561172,0.12221000343561172,0.12165199965238571,0.12165199965238571,0.09565676748752594,175884800
I want to read this file, as well as many other similar CSV files, and all of the files have different lengths. My question is: what is the best practice for reading variable-length files? For now, I have managed to read the contents of my file by declaring a two-dimensional array:
string s[7][1000000];
although this is poor programming practice (what if the file only has 500 rows?) and it can still fail if I encounter a longer CSV (what if the file has 1,000,001 rows?). I tried using a dynamic array:
string s[7][];
but it returns the error '[' - invalid index value. Yet another idea I had was to use the FileSize() function and allocate just the necessary amount of memory for the two-dimensional array. However,
int handle=FileOpen(FileName,FILE_CSV|FILE_READ,",");
if(handle>0)
{
   int size = FileSize(handle);
   ...
yielded a size equal to the product of the column count and the row count. I was hoping to obtain a row_count and col_count and use them to define s:
string s[col_count][row_count];
My full working code:
extern string FileName = "AAPL.csv";
int init()
{
   int row=0,col=0;
   string s[7][1000000];
   ResetLastError();
   int handle=FileOpen(FileName,FILE_CSV|FILE_READ,",");
   if(handle>0)
   {
      while(true)
      {
         string temp = FileReadString(handle);
         if(FileIsEnding(handle)) break;   // FileIsEnding = end of file
         s[col][row]=temp;
         if(FileIsLineEnding(handle))      // FileIsLineEnding = end of line
         {
            col = 0;   // reset col to 0 for the next row
            row++;     // next row
         }
         else
         {
            col++;     // next col of the same row
         }
      }
      FileClose(handle);
   }
   else
   {
      Comment("File "+FileName+" not found, the last error is ", GetLastError());
   }
   return(0);
}
int start()
{
   return(0);
}

You should use the first dimension of the array for your rows and the second dimension for your columns; you would therefore declare the array as string s[1][7];
You can then resize the array as you loop through the CSV file, as follows:
string s[1][7];
int handle=FileOpen(FileName,FILE_CSV|FILE_READ,",");
if(handle==INVALID_HANDLE) Print("Error opening file ",GetLastError());
else
{
   int row=0;
   while(!FileIsEnding(handle))
   {
      if(row>0) ArrayResize(s,row+1);           // grows the first (row) dimension only
      for(int col=0; col<7 && !FileIsEnding(handle); col++)
         s[row][col]=FileReadString(handle);    // e.g. s[row][0]=Date, s[row][1]=Open, ...
      row++;
   }
   FileClose(handle);
}
You cannot resize the second dimension of a multi-dimensional array; ArrayResize() only changes the first dimension.
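For comparison outside MQL4, the usual answer to "how big should the array be?" in a general-purpose language is a container that grows one row at a time. A minimal C# sketch, reusing the question's file name and seven-column layout:
using System;
using System.Collections.Generic;
using System.IO;
// Minimal sketch: read a CSV of unknown length by growing a list one row at a
// time instead of preallocating a huge fixed array.
class CsvRead
{
    static void Main()
    {
        var rows = new List<string[]>();
        foreach (string line in File.ReadLines("AAPL.csv"))
            rows.Add(line.Split(','));   // one 7-element array per data row
        Console.WriteLine("Rows read: " + rows.Count);
    }
}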

Related

How to start inserting rows after some specified number into a MySQL database using Pentaho?

Basically, what I want to do is this:
I have a CSV file containing 10,000 rows that I want to insert into the database. When I start my transformation, I want to begin inserting into the database after 4,500 rows.
So I want to skip the number of rows that I specified.
How can I achieve that?
Any help would be great.
Image description: I simply created a transformation that reads data from the CSV and writes it to the database. I do not know which step will help me achieve this.
Note: I have attached my simple transformation.
I haven't found a step that counts the rows processed, but you can use the "User Defined Java Class" step to count the row number and drop the first 4,500 with code like this:
// This will be the counter.
Long rowCount;

public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws KettleException
{
    if (first) {
        rowCount = 0l;
        first = false;
    }
    Object[] r = getRow();
    if (r == null) {
        setOutputDone();
        return false;
    }
    // Increment the counter.
    rowCount++;
    // Check the counter: don't output the current row if it's less than 4501.
    if (rowCount > 4500l) {
        Object[] outputRow = createOutputRow(r, data.outputRowMeta.size());
        // Add the row count to a stream field.
        get(Fields.Out, "Count").setValue(outputRow, rowCount);
        putRow(data.outputRowMeta, outputRow);
    }
    return true;
}
I used the following Kettle file, and that solved my problem.
Thanks to #WorkingHard and #jxc.

SSIS: How can I convert an NTEXT input to a string to perform a Split function in a script component?

I receive a Unicode text flat file in which one column is a single fixed-length value, and the other contains a list of values delimited by a vertical pipe '|'. The length of the second column and the number of delimited values it contains will vary greatly. In some cases the column will be up to 50,000 characters wide, and it could contain a thousand or more delimited values.
Input file Example:
[ObjectGUID]; [member]
{BD3481AF8-2CDG-42E2-BA93-73952AFB41F3}; CN=rGlynn SrechrshiresonIII,OU=Users,OU=PRV,OU=LOL,DC=ent,DC=keke,DC=cqb,DC=corp
{AC365A4F8-2CDG-42E2-BA33-73933AFB41F3}; CN=reeghler Johnson,OU=Users,OU=PRV,OU=LOL,DC=ent,DC=keke,DC=cqb,DC=corp|CN=rCoefler Cellins,OU=Users,OU=PRV,OU=LOL,DC=ent,DC=keke,DC=cqb,DC=corp|CN=rDasije M. Delmogeroo,OU=Users,OU=PRV,OU=LOL,DC=ent,DC=keke,DC=cqb,DC=corp|CN=rCurry T. Carrollton,OU=Users,OU=PRV,OU=LOL,DC=ent,DC=keke,DC=cqb,DC=corp|CN=yMica Macintosh,OU=Users,OU=PRV,OU=LOL,DC=ent,DC=keke,DC=cqb,DC=corp
My idea is to perform a Split operation on this column and create a new row for each value. I am attempting to use a script component to perform the split.
The width of the delimited column can easily exceed the 4000-character limit of DT_WSTR, so I chose NTEXT as the data type. This presents a problem because the .Split method I am familiar with requires a string. I am attempting to convert the NTEXT to a string in the script component.
Here is my code:
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    var stringMember = Row.member.ToString();
    var groupMembers = stringMember.Split('|');
    foreach (var groupMember in groupMembers)
    {
        this.Output0Buffer.AddRow();
        this.Output0Buffer.objectGUID = Row.objectGUID;
        this.Output0Buffer.member = groupMember;
    }
}
The output I am trying to get would be this:
[ObjectGUID] [member]
{BD3481AF8-2CDG-42E2-BA93-73952AFB41F3} CN=rGlynn SrechrshiresonIII,OU=Users,OU=PRV,OU=LOL,DC=ent,DC=keke,DC=cqb,DC=corp
{AC365A4F8-2CDG-42E2-BA33-73933AFB41F3} CN=reeghler Johnson,OU=Users,OU=PRV,OU=LOL,DC=ent,DC=keke,DC=cqb,DC=corp
{AC365A4F8-2CDG-42E2-BA33-73933AFB41F3} CN=rCoefler Cellins,OU=Users,OU=PRV,OU=LOL,DC=ent,DC=keke,DC=cqb,DC=corp
{AC365A4F8-2CDG-42E2-BA33-73933AFB41F3} CN=rDasije M. Delmogeroo,OU=Users,OU=PRV,OU=LOL,DC=ent,DC=keke,DC=cqb,DC=corp
{AC365A4F8-2CDG-42E2-BA33-73933AFB41F3} CN=rCurry T. Carrollton,OU=Users,OU=PRV,OU=LOL,DC=ent,DC=keke,DC=cqb,DC=corp
{AC365A4F8-2CDG-42E2-BA33-73933AFB41F3} CN=yMica Macintosh,OU=Users,OU=PRV,OU=LOL,DC=ent,DC=keke,DC=cqb,DC=corp
But what I am in fact getting is this:
[ObjectGUID] [member]
{BD3481AF8-2CDG-42E2-BA93-73952AFB41F3} Microsoft.SqlServer.Dts.Pipeline.BlobColumn
{AC365A4F8-2CDG-42E2-BA33-73933AFB41F3} Microsoft.SqlServer.Dts.Pipeline.BlobColumn
What might I be doing wrong?
The following code worked:
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    var blobLength = Convert.ToInt32(Row.member.Length);
    var blobData = Row.member.GetBlobData(0, blobLength);        // raw bytes of the NTEXT column
    var stringData = System.Text.Encoding.Unicode.GetString(blobData);
    var groupMembers = stringData.Split('|');
    foreach (var groupMember in groupMembers)
    {
        this.Output0Buffer.AddRow();
        this.Output0Buffer.CN = Row.CN;
        this.Output0Buffer.ObjectGUID = Row.ObjectGUID;
        this.Output0Buffer.member = groupMember;
    }
}
I was trying to perform an implicit conversion as I would in PowerShell, but was actually just passing some object metadata to the string output. This method properly splits my members and builds a complete row.
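For reuse, the conversion can be pulled into a small helper method inside the same script component class (a sketch; the method name BlobToString is hypothetical, and it assumes the blob holds UTF-16 text, which is what DT_NTEXT carries):
// Hypothetical helper: convert an SSIS BlobColumn (DT_NTEXT) to a string.
// Assumes the underlying bytes are UTF-16, as they are for NTEXT data.
private static string BlobToString(Microsoft.SqlServer.Dts.Pipeline.BlobColumn blob)
{
    byte[] bytes = blob.GetBlobData(0, Convert.ToInt32(blob.Length));
    return System.Text.Encoding.Unicode.GetString(bytes);
}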

How to check for end of file in a CSV file before processing it in SSIS

I have created an SSIS package which processes .CSV files using a ForEachLoop container.
All the CSV files contain "END OF FILE" in the last row.
A CSV file should be processed only if its last row contains "END OF FILE".
How can this be done? Please help.
Thanks in advance.
Create a variable named check:
Name    DataType    Value
check   int         0
Let's say you have a package design like the one below.
The Script Task is there to check whether the file has "END OF FILE" in the last row.
In the Script Task, add the variable check to the ReadWriteVariables section and the output variable from the ForEach container (suppose the variable name is LoopFiles) to ReadOnlyVariables.
In the Script Task, add the following code to read the file (there are several ways you can read files):
public void Main()
{
    string file = Dts.Variables["User::LoopFiles"].Value.ToString();
    string line;
    using (StreamReader reader = new StreamReader(file))
    {
        while ((line = reader.ReadLine()) != null)
        {
            if (line.ToLower() == "end of file")
            {
                Dts.Variables["User::check"].Value = 1;
            }
        }
    }
    Dts.TaskResult = (int)ScriptResults.Success;
}
Double-click the green arrow connecting the Script Task and the Data Flow Task. A precedence-constraint dialog box will open; enter an expression along the lines of @[User::check] == 1 (the original screenshot is not available).
There are a number of ways that this could be done. One way would be:
Create the following variables:
EOF_Found Boolean
Row_Count Integer
Bring the data into a dataflow using the Flat File Source
Use a row count component to add the number of rows to Row_Count, to identify the last row later
Use a script component to loop through the rows, adding 1 to a counter for each row
When your counter equals the value in Row_Count (i.e. you are looking at the last row), check the value in the column where you expect "END OF FILE" to appear (this depends on how you set up the flat file connection manager). If it equals "END OF FILE", change the value of EOF_Found to True.
After the script component, add a derived column referencing the value in EOF_Found
Use a conditional split, checking the value of the derived column and only process if True
This solution avoids reading the entire file line by line. I have merged Praveen's code here for the sake of completeness.
public void Main()
{
    string line = ReadLastLine(@"c:\temp\EOF.cs");
    if (line.ToUpper() == "END OF FILE")
    {
        Dts.Variables["User::check"].Value = 1;
    }
    Dts.TaskResult = (int)ScriptResults.Success;
}
public static string ReadLastLine(string path)
{
    using (StreamReader stream = new StreamReader(path))
    {
        string str = stream.ReadToEnd().TrimEnd('\r', '\n'); // ignore a trailing newline
        int i = str.LastIndexOf('\n');
        return str.Substring(i + 1).TrimEnd('\r');
    }
}
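If the files are large, a variation of ReadLastLine can seek close to the end of the file instead of loading all of it with ReadToEnd. A sketch, assuming the last line is shorter than tailBytes and a single-byte-compatible encoding such as ASCII or UTF-8:
// Sketch: read only the last chunk of the file and extract the final line.
// Assumes the last line fits in tailBytes and no multi-byte character
// straddles the seek boundary.
public static string ReadLastLineFast(string path, int tailBytes = 1024)
{
    using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read))
    {
        fs.Seek(Math.Max(0, fs.Length - tailBytes), SeekOrigin.Begin);
        using (var reader = new StreamReader(fs))
        {
            string tail = reader.ReadToEnd().TrimEnd('\r', '\n');
            int i = tail.LastIndexOf('\n');
            return (i >= 0 ? tail.Substring(i + 1) : tail).TrimEnd('\r');
        }
    }
}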

Reading the rowset of a MySQL database returned by row = mysql_fetch_row(result), which is a string array, to extract individual fields from it

When I try to use the code, I find that row[i] is the i'th row in the form of a string. But I need to use my data, which was in the form of an int. Also, row[i] contains the entire row in the form of a string. How do I extract the data from it? I tried to parse the data and convert it to an integer, but the entire data is just one string and doesn't have any spaces! So I am having a difficult time doing the parsing, as I have no way to know where the previous field ended and the next began in the dataset.
So to sum it up, when I do:
mysql_real_connect(conn, "localhost", "root", "abcd", "Hybr", 0, NULL, 0);
mysql_query(conn, "SELECT * FROM Data");
result = mysql_store_result(conn);
num_fields = mysql_num_fields(result);
while ((row = mysql_fetch_row(result)))
{
    for (i = 0; i < num_fields; i++)
    {
        row[i]; // this is a string containing all the fields.
                // I want the individual values!
    }
}
My data 34, 45, host gets converted into the string "34 45 host". And the weird thing is that when I print row[i], at least it prints the spaces, but when I copy it into a char*, the spaces somehow disappear! So it becomes impossible to parse.
I think there is a different way of reading records; maybe I have overlooked some part of the API, but I can't seem to find it...
EDIT
I realised I haven't overlooked the API; it's just that row[i] is an array of strings. I still need help extracting the individual values from it.
You can use atoi() on a field if you know it will be an int. For example, units is an int, and so "SELECT units FROM data" could be read into an int array or whatever you need.
mysql_query(conn, "SELECT units FROM data");
res = mysql_store_result(conn);
num_fields = mysql_num_fields(res);
while ((row = mysql_fetch_row(res)))
{
    for (i = 0; i < num_fields; i++)
        printf("%03i ", atoi(row[i])); /* each field arrives as a string */
    puts("");
}

LINQ variable to list of string without using column names?

In a C# ASP.NET MVC project, I'm trying to make a List<string> from a LINQ variable.
Now this might be a pretty basic thing, but I just cannot get it to work without using the actual column names for the data in that variable. The thing is, in the interest of making the program as dynamic as possible, I'm leaving it up to a stored procedure to fetch the data. There can be any number of columns with any names, depending on where the data is fetched from. All I care about is getting all of their values into a List<string>, so that I can compare user-input values against them in the program.
Referring to the columns by name in the code means I'd have to write dozens of overloaded methods that all basically do the same thing. Below is non-functioning pseudo-code, but it should convey what I mean.
// call the stored procedure
var courses = db.spFetchCourseInformation().ToList();

// if the data fails a check on a single row, it will not pass the check
bool passed = true;
foreach (var i in courses)
{
    // each row should be cast into a list of strings, which can then be
    // validated on a row-by-row basis
    List<string> courseRow = new List<string>();
    courseRow = courses[i]; // yes, obviously this is wrong syntax

    int matches = 0;
    foreach (string k in courseRow)
    {
        if (validator.checkMatch(k))
        {
            matches++;
        }
    }
    if (matches == 0)
    {
        passed = false;
        break;
    }
}
Below is an example of how I currently have to do it, because I need to use the column names:
for (int i = 0; i < courses.Count; i++)
{
    int matches = 0;
    if (validator.checkMatch(courses[i].Name))
        matches++;
    if (validator.checkMatch(courses[i].RandomOtherColumn))
        matches++;
    if (validator.checkMatch(courses[i].RandomThirdColumn))
        matches++;
    if (validator.checkMatch(courses[i].RandomFourthColumn))
        matches++;
    /* etc...
     * etc...
     * you get the point
     * and one of these for each and every possible variation from the stored procedure, NOT good practice
     */
}
Thanks for help!
I'm not 100% sure what problem you are trying to solve (matching user data to a particular record in the DB?), but I'm pretty sure you're going about this in slightly the wrong fashion by putting the data in a List. It should be possible to get your user input into an IDictionary, with the key being the column name and the object being the input data field.
Then when you get the data from the SP, you can get the data back in a DataReader (a la http://msmvps.com/blogs/deborahk/archive/2009/07/09/dal-access-a-datareader-using-a-stored-procedure.aspx).
DataReaders are indexed on column name, so if you run through the keys in the input data IDictionary, you can check the DataReader to see if it has matching data.
using (SqlDataReader reader = Dac.ExecuteDataReader("CustomerRetrieveAll", null))
{
    while (reader.Read())
    {
        foreach (var key in userInputDictionary.AllKeys)
        {
            var data = reader[key];
            if (data != userInputDictionary[key]) continue;
        }
    }
}
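To make the matching step concrete, here is a small sketch of a helper in the same spirit (the name CountMatches and the userInput dictionary are hypothetical; it assumes every input value is a string and the reader is positioned on a row):
// Hypothetical helper: count how many user-supplied fields match the current
// row of an open data reader. The reader is indexed by column name, so no
// column names need to be hard-coded. Requires System, System.Data and
// System.Collections.Generic.
static int CountMatches(IDataReader reader, IDictionary<string, string> userInput)
{
    int matches = 0;
    foreach (KeyValuePair<string, string> pair in userInput)
    {
        object data = reader[pair.Key];
        if (data != DBNull.Value && data.ToString() == pair.Value)
            matches++;
    }
    return matches;
}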
Still not sure about the problem you are solving, but I hope this helps!
A little creative reflection should do the trick.
var courses = db.spFetchCourseInformation();

var values = courses.SelectMany(c => c.GetType().GetProperties()  // gets the properties of your object
    .Select(property => property.GetValue(c, null)));             // gets the value of each property

List<string> stringValues = new List<string>(
    values.Select(v => v == null ? string.Empty : v.ToString())   // some of those values will likely be null
          .Distinct());                                           // remove duplicates
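As a quick illustration of the pattern, here is a self-contained sketch with a hypothetical row type standing in for the stored-procedure result:
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for one row returned by the stored procedure.
class CourseRow
{
    public string Name { get; set; }
    public string RandomOtherColumn { get; set; }
}

class ReflectionDemo
{
    static void Main()
    {
        var courses = new List<CourseRow>
        {
            new CourseRow { Name = "Algebra", RandomOtherColumn = null }
        };

        // Collect every column value of every row without naming any column.
        List<string> stringValues = courses
            .SelectMany(c => c.GetType().GetProperties()
                .Select(p => p.GetValue(c, null)))
            .Select(v => v == null ? string.Empty : v.ToString())
            .Distinct()
            .ToList();

        Console.WriteLine(string.Join(", ", stringValues)); // prints "Algebra, "
    }
}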