How to avoid empty rows after the actual data when using PHPExcel - MySQL

I am using PHPExcel to import data into a MySQL database.
My code is:
require APPPATH . 'phpexcel/PHPExcel/IOFactory.php';
$objPHPExcel = PHPExcel_IOFactory::load($_FILES['ifile']['tmp_name']);
$data = $objPHPExcel->getActiveSheet(0)->toArray(null, true, true, true);
My Excel sheet has only 14 rows, but $objPHPExcel->setActiveSheetIndex(0)->getHighestRow() returns 1047856 rows. Because of this the processing time is far too high, $data returns an error, and the server slows down. How can I avoid this?

No: your Excel sheet has something in those rows, whether it is data, styling, print settings, or whatever; they exist in the Excel file itself. However, there is a getHighestDataRow() method that looks at the actual content of cells rather than simply their existence in the file. It will still count cells that contain a NULL or an empty string, but it is probably better for your use.
If getHighestDataRow() resolves your problem with the row count, then you should probably also consider using rangeToArray() rather than toArray(), so that you read only the populated range.
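For illustration, a minimal sketch of that combination (assuming the same upload handling as in the question; getHighestDataColumn() bounds the columns the same way):
$objPHPExcel = PHPExcel_IOFactory::load($_FILES['ifile']['tmp_name']);
$sheet = $objPHPExcel->getActiveSheet();

// The true extent of the data, ignoring styled-but-empty cells
$highestRow    = $sheet->getHighestDataRow();     // e.g. 14
$highestColumn = $sheet->getHighestDataColumn();  // e.g. 'E'

// Read only that range instead of the whole sheet
$data = $sheet->rangeToArray(
    'A1:' . $highestColumn . $highestRow,
    null,   // value to substitute for empty cells
    true,   // calculate formulas
    true,   // apply cell formatting
    true    // index the array by cell reference, as toArray(..., true) did
);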

Related

Jitterbit: target CSV-file created with only header although "do not create empty files" is checked

In Jitterbit Dataloader 10.37 I want to create CSV files from Salesforce data, but only if the query returns data.
I checked "do not create empty files" on the local file target type, but it is still creating a CSV with just the header and no data. I do not want files created with no data in them. Leaving the header out entirely is not an option; I will need it when the query does return data.
Any suggestions? What am I missing?
I've seen this happen in situations where the write operation is after a couple of other operations. In that instance a header is written in the first operation, then another header is written in a second operation. The first row is read as the header, the second row (another header) is read as data, and written out.
I always add in a condition where I check if one of the fields equals its name. Something like this, to just skip those rows.
<trans>
if (Id == "Id",
    false,
    true
);
</trans>
The best way to do this is to send your output to a variable array and then check the variable to see if data is present. So set your target to a global variable, then add a script after that target and do your validation there. To test your script, use DEBUGBREAK() and look at the variable's content; that way you can see what is going into it.
Then make your condition statement:
if (Length($variable) > 1, RunOperation("operation:myexport"), "novalue");
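Putting those pieces together, a rough sketch of the post-target script ($output stands in for whatever global variable your target writes to; the operation name is from the example above):
<trans>
// Pause here while testing so the content of $output can be inspected
DEBUGBREAK();

// Only run the export operation when the target actually produced data
if (Length($output) > 1,
    RunOperation("operation:myexport"),
    "novalue");
</trans>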

Pulling One Element From A CSV File

I'm trying to write a function that will return the most recent 'closing' value in a csv file containing the data of a cryptocurrency. The csv file contains 6 columns and about 900 rows and I'm looking to only pull one element of the table.
However, I seem to have faced a fair bit of difficulty in pulling this off. The function below returns values from the column I want, but it seems to be pulling them from the very bottom of the document (whereas I want the most recent values).
Also, just a side note to explain what I was attempting to do with the 'count': since I'm expecting the value I want to be located on the second row, I wanted my loop to only iterate through two lines of the file. However, as the function's output went on to reveal, as it currently stands with the counter I'm printing two values from the function.
I understand there must be a much less convoluted way of getting the information I need so am open to any solution to the problem. Though, that being said, I'd be really interested to see where I went wrong here as I'm fairly new to Python.
Thanks a lot!
import csv

def csv_to_close(csv_file):
    with open(f"{csv_file}.csv", 'r') as csvfile:
        csv_file = csv.reader(csvfile)
        running = True
        count = 0
        while running == True:
            if count < 2:
                for column in csv_file:
                    close = column[4]
                    count += 1
            else:
                running = False
            print(close)
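For comparison, a minimal sketch of a direct approach (assuming, as the question does, that the header is on line 1, the most recent row is the first data row, and the closing price is the fifth column):
import csv

def csv_to_close(csv_file):
    with open(f"{csv_file}.csv", 'r', newline='') as csvfile:
        reader = csv.reader(csvfile)
        next(reader)                # skip the header row
        most_recent = next(reader)  # the first data row
        return most_recent[4]       # index 4 == fifth column, 'close'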

Prevent Duplicate headers in flat file destination - SSIS

I need some help.
I am importing some data into a .csv file from an OLE DB source. I don't want the headers to appear twice in the destination. If I uncheck the "Column names in first data row" property, the headers don't get populated on the first execution either.
Output as of now.
Col1,Col2
A,B
Col1,Col2
C,D
How can I make the package run in such a way that if the file is empty , the headers get inserted. Then if the execution happens again, headers are not included,just the data.
There was a similar thread, but I wasn't able to apply the solution, since it wasn't clear how to use expressions to get the number of rows in the destination itself. It was long back, so I created a new question.
Your help is deeply appreciated.
-Akshay
Perhaps I'm missing something, but this works for me; I am not running into the read-only trouble with ColumnNamesInFirstDataRow.
I created a package-level variable named AddHeader, type Boolean, and set it to True. I added a Flat File Connection Manager, named FFCM, and configured it to use a CSV output of 2 columns: HeadCount (int), AddHeader (boolean). In the properties for the Connection Manager, I added an Expression for the property ColumnNamesInFirstDataRow and assigned it a value of @[User::AddHeader].
I added a script task to test the size of the file. It has read/write access to the Variable AddHeader. I then used this script to determine whether the file was empty. If your definition of "empty" is that it has a header row, then I'd adjust the logic in the if check to match that length.
public void Main()
{
    string path = Dts.Connections["FFCM"].ConnectionString;
    System.IO.FileInfo stats = null;
    try
    {
        stats = new System.IO.FileInfo(path);
        // Checking length isn't bulletproof based on how the disk is
        // configured, but should be good enough.
        // http://stackoverflow.com/questions/3750590/get-size-of-file-on-disk
        if (stats != null && stats.Length != 0)
        {
            this.Dts.Variables["AddHeader"].Value = false;
        }
    }
    catch
    {
        // no harm, no foul
    }
    Dts.TaskResult = (int)ScriptResults.Success;
}
I looped through twice to ensure I'd generate the append scenario
I deleted my file and ran the package and only had a header once.
The property that controls whether the column names will be included in the output file or not is ColumnNamesInFirstDataRow. This is a read-only property.
One way to achieve what you are trying to do would be to have two data flow tasks on the control flow surface, preceded by a script task. These two data flow tasks will be identical except that they will refer to two different flat file connection managers. Again, the only difference between those would be the values of ColumnNamesInFirstDataRow: one true, the other false.
Use the script task to decide whether this is the first run or a subsequent run. Persist this information and check it within the script; you can either have a separate table for it or use some log table to infer it.
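A rough sketch of that first-run check (everything named here is hypothetical: an ADO.NET connection manager "LogDb", a log table dbo.ExportLog, and a Boolean package variable FirstRun that the two precedence constraints would test):
public void Main()
{
    bool firstRun = true;

    // Hypothetical ADO.NET connection manager pointing at the log database
    var conn = (System.Data.SqlClient.SqlConnection)
        Dts.Connections["LogDb"].AcquireConnection(Dts.Transaction);

    // Hypothetical log table: one row per completed export
    using (var cmd = new System.Data.SqlClient.SqlCommand(
        "SELECT COUNT(*) FROM dbo.ExportLog WHERE PackageName = @p", conn))
    {
        cmd.Parameters.AddWithValue("@p", "MyExportPackage");
        firstRun = ((int)cmd.ExecuteScalar()) == 0;
    }

    Dts.Variables["FirstRun"].Value = firstRun;
    Dts.TaskResult = (int)ScriptResults.Success;
}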
The following solution worked for me; you can also try it.
Create three variables:
IsHeaderRequired
RowCount
TargetFilePath
Get the source row count using an Execute SQL task and save it in the RowCount variable.
Add a Script task. Add TargetFilePath and RowCount as read-only variables, and IsHeaderRequired as a read/write variable.
Edit the script and add the following lines of code:
string targetFilePath = Dts.Variables["TargetFilePath"].Value.ToString();
int rowCount = (int)Dts.Variables["RowCount"].Value;
System.IO.FileInfo targetFileInfo = new System.IO.FileInfo(targetFilePath);

if (rowCount > 0)
{
    if (targetFileInfo.Length == 0)
    {
        Dts.Variables["IsHeaderRequired"].Value = true;
    }
    else
    {
        Dts.Variables["IsHeaderRequired"].Value = false;
    }
}
Dts.TaskResult = (int)ScriptResults.Success;
Connect your script component to your database
Click the flat file connection manager (i.e. your target file) and go to its properties. In the Expressions property, add the following, as shown in the screenshot:
Map ConnectionString to the variable "TargetFilePath".
Map ColumnNamesInFirstDataRow to "IsHeaderRequired".
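In expression form, the two property mappings described above look like this:
ConnectionString          : @[User::TargetFilePath]
ColumnNamesInFirstDataRow : @[User::IsHeaderRequired]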
[Screenshot: expressions on the flat file connection manager]
[Screenshot: the final package]
Hope this helps
A solution ....
First, add an SSIS integer variable in the scope of the Foreach Loop or higher - I'll call this RowCount - and make its default value negative (this is important!). Next, add a Row Count to your Data Flow, and assign the result to the RowCount SSIS variable we just made. Third, select your Connection Manager (don't double-click) and open the Properties window (F4). Find the Expressions property, select it, and hit the ellipsis (...) button. Select the ColumnNamesInFirstDataRow property, and use an expression like this:
@[User::RowCount] < 0
Now, when your package starts, RowCount has the static value of -1 or another negative number. When the data flow starts for the first time in your loop, the ColumnNamesInFirstDataRow property will have a value of TRUE. When the first data flow completes, the row count (even if it's zero) is written to the RowCount variable. On the second iteration of the loop, the Connection Manager is then reconfigured to NOT write column names...

MySQL - get data from custom field with read-only access to db

I have a text field with data, something like:
[{"id":10001,"timeStarted":1355729600733,"projectId":10002,"issueId":"29732,","userName":"tester","assignee":"test","status":"STARTED","shared":True,"name":"Session 4","projectName":"IDS","assigneeDisplayName":"First1 Last1"},
{"id":10002,"timeStarted":1358354188010,"projectId":10002,"issueId":"","userName":"tester","assignee":"test","status":"CREATED","shared":True,"name":"asdf98798","projectName":"IDS","assigneeDisplayName":"First Last"}]
but with many more rows (it may be 30-40), and there may be two more distinct statuses (four in total).
Is it possible to extract some data from here with read-only access to the DB, using only a MySQL query?
For example, to count the number of items with status "STARTED" and with status "CREATED".
Additional conditions may apply, e.g. where the id lies in a definite interval.
Assuming you're using PHP, you're first better off correcting those unrecognized booleans: the data has True where it should have been true for json_decode() to evaluate it right.
$jsStr = preg_replace_callback(
    '~(?<=[,{[])(".+?"\s*:\s*)(true|false)(?=\s*[,}\]])~i',
    function ($m) { return $m[1] . strtolower($m[2]); },
    $jsStr);
Then to be able to process it you want to use the json_decode() function.
$parsed = json_decode($jsStr);
// see the result if you like:
// print_r($parsed);
Ultimately, if you want to extract some specific information on the client side (using JavaScript), you can use the Array filter() function or a plain loop if you're not using jQuery; otherwise you can use jQuery's filter() function with the necessary conditions.
If you want to do this in PHP, after the string is parsed into JSON you can use the solutions that apply to Javascript.
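For instance, a minimal PHP sketch of the counting example from the question (assuming $jsStr already holds the corrected JSON from above):
$parsed = json_decode($jsStr, true);  // true => associative arrays

$counts = array();
foreach ($parsed as $item) {
    $status = $item['status'];
    $counts[$status] = (isset($counts[$status]) ? $counts[$status] : 0) + 1;
}
// For the two sample rows: array('STARTED' => 1, 'CREATED' => 1)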

SSIS - Is there a Data Flow Source component that will handle CSV files where the column order may change?

We have written a number of SSIS packages that import data from CSV files using the Flat File Source.
It now seems that after these packages are deployed into production, the providers of these files may deliver files where the column order of the files changes (Don't ask!). Currently if this happens, our packages will fail.
For example, an additional column is inserted at the beginning of each row. In this case, the flat file source continues to use the existing column order, which obviously has a detrimental effect on the transformation!
E.g., using a trivial example, the original file has the following content:
OurReference,Client,Amount
235,ClientA,20000.00
236,ClientB,30000.00
The output from the flat file source is:
OurReference Client Amount
235 ClientA 20000.00
236 ClientB 30000.00
Subsequently, the file delivered changes to:
OurReference,ClientReference,Client,Amount
235,A244,ClientA,20000.00
236,B222,ClientB,30000.00
When the existing unchanged package is run against this file, the output from the flat file source is:
OurReference Client Amount
235 A244 ClientA,20000.00
236 B222 ClientB,30000.00
Ideally, we would like to use a data source that will cope with this problem, i.e. one which produces output based on the column names instead of the column order.
Any suggestions would be welcomed!
Not that I know of.
A possibility to check for the problem in advance is to set up two different connection managers, one of which reads each row as a single flat column. That one can read the first row, tell whether it's OK, and abort if not.
If you want to do the work, you can take it a step further and make that flat one-field row the only connection manager, and use a script component in your flow to parse the row and assign values to the columns you need later in the flow.
As far as I know, there is no way to dynamically add columns to the flow at runtime, so all the columns you need will have to be added to the script component's output. Whether they can be found and parsed from each line is up to you. Any "new" (i.e. unanticipated) columns cannot be used; for columns which are missing, you could supply a default or throw an exception.
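A minimal sketch of that script-component idea (the output columns OurReference, Client, and Amount are assumed to have been defined on the component; the single input column Column0 holds the raw line):
private string[] header;

public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    string[] fields = Row.Column0.Split(',');

    if (header == null)
    {
        // First row: remember the column order; downstream logic
        // would need to filter this header row out.
        header = fields;
        return;
    }

    // Assign by name, so the physical column order no longer matters
    Row.OurReference = GetField(fields, "OurReference");
    Row.Client       = GetField(fields, "Client");
    Row.Amount       = GetField(fields, "Amount");
}

private string GetField(string[] fields, string name)
{
    int i = Array.IndexOf(header, name);
    // Default missing columns to an empty string, per the note above
    return (i >= 0 && i < fields.Length) ? fields[i] : string.Empty;
}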
A final possibility is to use the SSIS object model to modify the package before running to alter the connection manager - or even to write the entire package dynamically using the object model based on an inspection of the input file. I have done quite a bit of package generation in C# using templates and then adding information based on metadata I obtained from master files describing the mainframe files.
Best approach would be to run a check before the SSIS package imports the CSV data. This may have to be an external script/application, because I don't think you can manipulate data in the MS Business Intelligence Studio.
Here is a rough approach. I will write down the limitations at the end.
Create a flat file source. Put the entire row in one column.
Do not check Column names in first data row.
Create a Script Component
Code:
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    string sRow = Row.Column0;
    string sManipulated = string.Empty;
    string[] columns = sRow.Split(',');

    foreach (string column in columns)
    {
        sManipulated = string.Format("{0}{1}", sManipulated, column.PadRight(15, ' '));
    }
    /* Note: for the sake of demonstration I am padding to 15 chars. */
    Row.Column0 = sManipulated;
}
Create a flat file destination
Map Column0 to Column0
Limitation: I have arbitrarily padded each field to 15 characters. Points to consider:
1. Do we need each field to be the same size?
2. If yes, what is that size?
A generic way to handle that would be to create a table storing the file name, the fields, and the field sizes. Use the file name to dynamically create the source and destination connection managers, and use the field name and the corresponding field size to decide the padding, as in the sketch below. Not sure if you need this much flexibility. If you have any questions, please respond.
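A sketch of size-driven padding inside the same ProcessInputRow method (the widths here are made up; in practice they would be loaded from the metadata table described above):
// Hypothetical per-field widths, e.g. loaded from the metadata table
int[] fieldSizes = { 12, 20, 15 };

string[] columns = Row.Column0.Split(',');
var sb = new System.Text.StringBuilder();
for (int i = 0; i < columns.Length; i++)
{
    // Fall back to 15 chars for any column the table doesn't describe
    int width = (i < fieldSizes.Length) ? fieldSizes[i] : 15;
    sb.Append(columns[i].PadRight(width, ' '));
}
Row.Column0 = sb.ToString();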