PHPExcel TextValueBinder: don't understand why I need it for CSV

I had to implement the code below for .csv files so that leading zeros would be preserved in the files I read. I did NOT have to do this for .xlsx files.
Why do .csv files need a TextValueBinder while .xlsx files do NOT? CSV files are just plain text, so I am wondering why PHPExcel removes the leading zeros without a TextValueBinder.
Here is the code:
function file_to_obj_php_excel($inputFileName)
{
    $CI =& get_instance();
    if ($CI->config->item('spreadsheet_format') == 'XLSX')
    {
        $objReader = new PHPExcel_Reader_Excel2007();
    }
    else
    {
        $objReader = new PHPExcel_Reader_CSV();
        // Bind every CSV value as a string so that leading zeros are kept
        PHPExcel_Cell::setValueBinder(new TextValueBinder());
    }
    $objReader->setReadDataOnly(true);
    $objPHPExcel = $objReader->load($inputFileName);
    return $objPHPExcel;
}

class TextValueBinder implements PHPExcel_Cell_IValueBinder
{
    public function bindValue(PHPExcel_Cell $cell, $value = null)
    {
        $cell->setValueExplicit($value, PHPExcel_Cell_DataType::TYPE_STRING);
        return true;
    }
}

Excel xls and xlsx files store data types and formatting alongside the data itself; csv files do not, they're purely raw data. So PHPExcel uses a value binder to assign data types, and additional formatting and styling, based on the values loaded from a csv file, in exactly the same way that the import wizard does when you load a csv file into MS Excel.
For example, this allows the Reader to recognise numeric values and store the data as a typed number, or to recognise date or time strings and convert them to an MS Excel timestamp with a number format mask... exactly as MS Excel itself does when you load a csv file.
It also allows you to define custom binders, such as the TextValueBinder in the question, to change this behaviour to suit your own requirements.
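To make the mechanism concrete, here is a small Java sketch (not PHPExcel code) contrasting a type-guessing binder with a text binder. The numeric rules are a simplified assumption for illustration only; PHPExcel's actual DefaultValueBinder rules may differ in detail, so treat this as a model of the idea rather than of the library.

```java
import java.util.regex.Pattern;

/** Hypothetical illustration: a type-guessing binder versus a text binder. */
public class BinderDemo {
    // Simplified numeric rules assumed for this sketch only
    private static final Pattern INTEGER = Pattern.compile("-?\\d{1,18}");
    private static final Pattern DECIMAL = Pattern.compile("-?\\d+\\.\\d+");

    /** Guesses a type from raw CSV text: numeric-looking values become numbers. */
    static Object guessingBinder(String raw) {
        if (INTEGER.matcher(raw).matches()) {
            return Long.valueOf(raw); // "00123" becomes 123: the leading zeros are gone
        }
        if (DECIMAL.matcher(raw).matches()) {
            return Double.valueOf(raw);
        }
        return raw; // anything else stays text
    }

    /** Binds every value as text, like the TextValueBinder in the question. */
    static Object textBinder(String raw) {
        return raw;
    }

    public static void main(String[] args) {
        for (String raw : new String[] {"00123", "42", "abc"}) {
            System.out.println(raw + " -> " + guessingBinder(raw) + " | " + textBinder(raw));
        }
    }
}
```

An xlsx file avoids the guessing step because each cell already carries its type in the file.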

Related

Dart CSV Writing

Hey, so I haven't really messed around with it too much, but I was wondering if there is actually a way (before I go down a never-ending rabbit hole) to read and write CSV files in Dart/Flutter? I need to write to the files, not necessarily read them, and I'm willing to go to quite extreme lengths to do so. Any library works, built-in functions are even better. Any and all help is appreciated!
Use package csv from https://pub.dartlang.org/packages/csv.
The snippets below assume these imports (getApplicationDocumentsDirectory comes from the path_provider package):
import 'dart:convert';
import 'dart:io';
import 'package:csv/csv.dart';
import 'package:path_provider/path_provider.dart';
If you have a List<List<dynamic>> of items that needs to be converted into CSV:
String csv = const ListToCsvConverter().convert(yourListOfLists);
If you want to write the CSV to a file:
/// Write to a file
final directory = await getApplicationDocumentsDirectory();
final pathOfTheFileToWrite = "${directory.path}/myCsvFile.csv";
final file = File(pathOfTheFileToWrite);
await file.writeAsString(csv);
Also, if you want to read a CSV file directly into a List<List<dynamic>>:
final csvCodec = CsvCodec();
final input = File('a/csv/file.txt').openRead();
final fields = await input.transform(utf8.decoder).transform(csvCodec.decoder).toList();
Following the newer package versions and guidelines (2019), use the following code with package csv from https://pub.dartlang.org/packages/csv.
If you have a List<List<dynamic>> of items that needs to be converted into CSV:
String csv = const ListToCsvConverter().convert(yourListOfLists);
Then you can write the string to a file using ordinary file operations:
await file.writeAsString(csv);
Also, if you want to read a CSV file directly into a List<List<dynamic>>:
final input = File('a/csv/file.txt').openRead();
final fields = await input.transform(utf8.decoder).transform(CsvToListConverter()).toList();
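For comparison, here is a rough Java sketch of what a list-of-lists to CSV conversion does, assuming RFC 4180-style quoting: fields containing a comma, a quote or a line break are wrapped in quotes, and inner quotes are doubled. The class and method names are made up for illustration and are not part of the Dart csv package.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Sketch of a list-of-lists to CSV conversion with RFC 4180-style quoting. */
public class ListToCsv {
    /** Quotes a field only when it contains a comma, quote, CR or LF; inner quotes are doubled. */
    static String escape(Object value) {
        String s = String.valueOf(value);
        if (s.contains(",") || s.contains("\"") || s.contains("\n") || s.contains("\r")) {
            return "\"" + s.replace("\"", "\"\"") + "\"";
        }
        return s;
    }

    /** Joins fields with commas and rows with CRLF, the line break RFC 4180 uses. */
    static String convert(List<? extends List<?>> rows) {
        return rows.stream()
                .map(row -> row.stream().map(ListToCsv::escape).collect(Collectors.joining(",")))
                .collect(Collectors.joining("\r\n"));
    }

    public static void main(String[] args) {
        List<List<?>> rows = List.of(List.of("id", "name"), List.of(1, "Smith, John"));
        System.out.println(convert(rows));
    }
}
```

Quoting only when needed keeps simple values readable while still round-tripping values that contain separators.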

How can I continuously read a CSV file in Flink and remove the header

I am working with the Flink streaming API and I want to continuously read CSV files from a folder, ignore the header and convert each row in the CSV file into a Java class (POJO). After all this processing, I should obtain a stream of Java objects (POJOs).
So far, I do the following to partially achieve the behavior (code below):
read the CSV files as regular text files, continuously
get a stream of strings from the CSV files
convert the stream of strings to a stream of Java objects
String path = "/home/cosmin/Projects/flink_projects/flink-java-project/data/";
TextInputFormat format = new TextInputFormat(
        new org.apache.flink.core.fs.Path(path));
DataStream<String> inputStream = streamEnv.readFile(format, path, FileProcessingMode.PROCESS_CONTINUOUSLY, 100);
DataStream<MyEvent> parsedStream = inputStream
        .map((line) -> {
            String[] cells = line.split(",");
            MyEvent event = new MyEvent(cells[1], cells[2], cells[3]);
            return event;
        });
However, this does not remove the header line of each CSV file.
I have read that I can build a custom connector for reading CSV files by using the createInput() or addSource() methods on the StreamExecutionEnvironment class.
Can you give some guidance on how to achieve this? I haven't found any examples beyond the Javadoc.
You could chain a filter function before your map function to filter out header lines:
inputStream.filter(new FilterFunction<String>() {
    @Override
    public boolean filter(String line) {
        // keep every line that is not a header line;
        // "some header identifier" stands for text that only appears in the header
        return !line.contains("some header identifier");
    }
}).map(...); // your map function as before
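One caveat: contains() also drops any data row that happens to include the identifier text, so comparing against the exact header line is safer. The plain-Java sketch below uses java.util.stream instead of Flink so it runs standalone; the header string and the MyEvent stand-in are hypothetical. In Flink, the same check could presumably be passed to inputStream.filter(...) as a lambda, since FilterFunction has a single abstract method.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/** Plain-Java sketch of the filter + map chain, runnable without Flink. */
public class HeaderFilterDemo {
    // Hypothetical header; use the exact first line of your own CSV files
    static final String HEADER = "id,name,type,value";

    // Hypothetical stand-in for MyEvent from the question
    static final class MyEvent {
        final String a;
        final String b;
        final String c;

        MyEvent(String a, String b, String c) {
            this.a = a;
            this.b = b;
            this.c = c;
        }
    }

    // Drops a line only when it equals the header exactly,
    // so data rows that merely contain a header word are kept
    static final Predicate<String> NOT_HEADER = line -> !line.trim().equals(HEADER);

    static List<MyEvent> parse(List<String> lines) {
        return lines.stream()
                .filter(NOT_HEADER)
                .map(line -> line.split(","))
                .map(cells -> new MyEvent(cells[1], cells[2], cells[3]))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Continuous reading of several files yields one header per file
        List<String> lines = List.of(HEADER, "1,alice,click,3", HEADER, "2,bob,view,5");
        for (MyEvent e : parse(lines)) {
            System.out.println(e.a + " " + e.b + " " + e.c);
        }
    }
}
```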

Import Flat File containing multiline fields in SSIS

I would like to import a flat file (*.csv) in SSIS, but one field is multiline text. I do not have a special record delimiter (and there is no way to get one), so the record delimiter is the carriage return/line feed \r\n (CRLF).
The problem is: when SSIS meets a CRLF in a multiline field, it moves on to the next record instead of continuing the multiline field.
Here are the header and the first few lines:
"name", "firstname", "description", "age"
"John", "Smith", "blablablablablabla", 25
"Fred", "Gordon", "blablabla
blablablabla", 33
"Bill", "Buffalo", "bllllllllllllaaaaaaa
blaaaaaaa
blaalalalaaaaaaaaaa", 44
The example above contains 1 header and 3 records. SSIS reads it as 1 header and 6 records and then, of course, raises errors.
I don't know how to handle this problem.
I hope you can help me.
According to your example, the Description field values can contain multiple carriage returns, which cause new lines to be created.
The following record, which appears on multiple lines...
"Bill", "Buffalo", "bllllllllllllaaaaaaa
blaaaaaaa
blaalalalaaaaaaaaaa", 44
should appear as below for SSIS to see the expected number of columns.
"Bill", "Buffalo", "bllllllllllllaaaaaaa blaaaaaaa blaalalalaaaaaaaaaa", 44
There are a couple of approaches to resolving the formatting issue.
If possible, the easiest approach is to follow up with the person who created the file and have them produce it correctly. For example, assuming they're using SQL Server, they can apply the following in their T-SQL statement for the Description field to replace the carriage returns with a blank. (Oracle also has a similar function.)
REPLACE(Description, CHAR(13),' ')
If you need to replace a line feed, use CHAR(10). For CRLF line breaks like the ones in your file, replace both, for example REPLACE(REPLACE(Description, CHAR(13), ''), CHAR(10), ' ').
Otherwise, I understand that contacting the source of the file is not always possible. In this case, you can modify the text file programmatically before feeding it into SSIS. The following link discusses how to use Excel to do this; you can then save to a new csv file and import that through SSIS.
http://www.mrexcel.com/forum/excel-questions/304939-importing-text-data-carriage-returns-into-excel.html
If you are looking at setting up the SSIS package in a job, then you can write a script task in the early part of your control flow that will do the same thing and bypass Excel. The VB code provided in the link can be easily adapted to a script task.
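As a rough illustration of such a preprocessing step, here is a Java sketch (not the VB code from the link) that rejoins records by tracking whether the reader is inside a double-quoted field, rather than counting commas, so commas inside the description do not matter. It assumes that fields containing line breaks are quoted, as in your example, and that the file is UTF-8; the class name and command-line usage are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Sketch: rejoin CSV records whose quoted fields contain line breaks. */
public class MultilineCsvJoiner {
    /**
     * Splits text into records. A CR, LF or CRLF inside double quotes is replaced by one space;
     * outside quotes it ends the record. Empty records (blank lines) are dropped.
     */
    static List<String> joinRecords(String text) {
        List<String> records = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes; // an escaped "" flips twice, leaving the state unchanged
                current.append(c);
            } else if (c == '\r' || c == '\n') {
                if (c == '\r' && i + 1 < text.length() && text.charAt(i + 1) == '\n') {
                    i++; // treat CRLF as a single line break
                }
                if (inQuotes) {
                    current.append(' ');
                } else if (current.length() > 0) {
                    records.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            records.add(current.toString());
        }
        return records;
    }

    // Hypothetical usage: java MultilineCsvJoiner <source.csv> <destination.csv>
    public static void main(String[] args) throws IOException {
        String text = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        Files.write(Paths.get(args[1]), joinRecords(text), StandardCharsets.UTF_8);
    }
}
```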
Hope this helps.
Given that the source of the text files cannot be contacted and that the number of columns in each csv will vary, the best option for performing an import is to use a variation of option 2 of Answer #1. This will require some customization and a script task in the control flow.
On the server where the SSIS package will be running, create a bucket folder where a temporary text file will be saved. Each time a CSV file is processed, a temporary file called "destFile.csv" will be created from it and this is what you will import. Each time a different csv file is processed by the script task, it will save to this temporary file and location.
Create two variables in the SSIS package. One for the source file and the second for the destination file.
Create a script task and define the two variables being sent to it.
Add the following C# to the script task, and remember to replace the sourceFile and destinationFile assignments at the top. They should be set to the new user variables just created.
using System;
using System.Collections.Generic;
using System.IO;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            // In the script task, set these two paths from the SSIS user variables instead
            string sourceFile = @"C:\test\tempfile.csv";
            string destinationFile = @"C:\test\destFile.csv";
            List<string> lines = new List<string>();
            int headerCommaCount = -1;

            if (File.Exists(sourceFile))
            {
                using (StreamReader file = new StreamReader(sourceFile))
                {
                    string line;
                    while ((line = file.ReadLine()) != null)
                    {
                        int commaCount = line.Split(',').Length - 1;

                        // Header line: its comma count is the base all following rows are compared with
                        if (headerCommaCount < 0)
                        {
                            headerCommaCount = commaCount;
                            lines.Add(line);
                            continue;
                        }

                        // Fewer commas than the header means the row was split by an embedded
                        // line break: append the following lines until the count matches (or EOF).
                        // Note: commas inside quoted fields are counted too.
                        string templine = line;
                        string next;
                        while (commaCount < headerCommaCount && (next = file.ReadLine()) != null)
                        {
                            templine = templine + " " + next;
                            commaCount = templine.Split(',').Length - 1;
                        }
                        lines.Add(templine); // save the (re)joined row
                    }
                }
            }

            // Recreates the destination file and writes the repaired rows to it
            File.WriteAllLines(destinationFile, lines);
        }
    }
}
I wrote this quickly as a console application so that it would be easier to convert to a C# script task. It ran successfully on your initial example file. It iterates through the source text file, concatenates the lines that have been split apart, and then saves the result to a destination file. The destination file is recreated and populated each time it is run. You can test this first as a console application in Visual Studio, and also add a Console.WriteLine(line) call just above or below the lines.Add(...) calls in the code.
After this, all you need to do is import from the temporary destination file to your database.
Hope this helps.

Apache Camel + CSV + header

I have a CSV file as follows:
A;B;C
1;test;22
2;test2;33
where the first line is a kind of header and the others are data. I need to import all data rows with respect to the header and report how many rows are correct and how many are not.
My first idea is to split source file to multiple files in the form of:
file1:
A;B;C
1;test;22
file2:
A;B;C
2;test2;33
How can I do this in Camel, and how can I collect the data necessary to print a summary report?
Take a look at BeanIO and the Camel BeanIO component.
It looks like a good fit for your scenario.
You could probably build upon the example code on the first page of the BeanIO site.
BeanIO
http://beanio.org/
Camel BeanIO component
http://camel.apache.org/beanio.html
You should not need to split your incoming file if the only thing you need to do is collect and count successful and unsuccessful records.
If the CSV is not too big and fits in memory, I would read and convert the CSV file to a list of Java objects. The latest Camel CSV component can convert a CSV file into a List<Map>; before Camel 2.13 it produced List<List>. After reading the CSV file into a List, you can write your own processor to iterate over the List and check its content.
You can unmarshal the file as CSV, remove the first line (header) and then do your validations as desired. Here is an example Camel route:
from("file:mydir/filename?noop=true")
    .unmarshal()
    .csv() // default CSV settings use ',' as delimiter; the file in the question uses ';', so configure it
    .process(validateFile())
    .to("log:my.package?multiline=true");
Then you need to define the validateFile() method using the Camel Processor class, like this:
public Processor validateFile() {
    return new Processor() {
        @Override
        @SuppressWarnings("unchecked")
        public void process(Exchange exchange) throws Exception {
            List<List<String>> data = (List<List<String>>) exchange.getIn().getBody();
            List<String> headerLine = data.remove(0); // the first row is the header
            System.out.println("header: " + headerLine);
            System.out.println("total lines: " + data.size());
            // iterate over each data row
            for (List<String> line : data) {
                System.out.println("Total columns: " + line.size());
                System.out.println(line.get(0)); // first column
            }
        }
    };
}
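For the summary report itself, the counting can be kept separate from Camel. This Java sketch takes the unmarshalled List<List<String>> (header first) and returns how many rows pass and fail. The validity rule (same column count as the header, no blank cells) is an assumption for illustration, so substitute your own checks. Unlike data.remove(0), it does not modify the list.

```java
import java.util.List;

/** Sketch of the summary report: counts good and bad data rows of an unmarshalled CSV. */
public class CsvSummary {
    /** Assumed rule for this sketch: same column count as the header and no blank cells. */
    static boolean isValid(List<String> header, List<String> row) {
        return row.size() == header.size()
                && row.stream().noneMatch(cell -> cell == null || cell.trim().isEmpty());
    }

    /** Returns {good, bad}; the first row of data is treated as the header. */
    static int[] summarize(List<List<String>> data) {
        List<String> header = data.get(0);
        int good = 0;
        int bad = 0;
        for (List<String> row : data.subList(1, data.size())) {
            if (isValid(header, row)) {
                good++;
            } else {
                bad++;
            }
        }
        return new int[] {good, bad};
    }

    public static void main(String[] args) {
        List<List<String>> data = List.of(
                List.of("A", "B", "C"),
                List.of("1", "test", "22"),
                List.of("2", "test2", "33"),
                List.of("3", "", "44"));
        int[] result = summarize(data);
        System.out.println("correct: " + result[0] + ", incorrect: " + result[1]);
    }
}
```

Inside the processor, the same call would be made on the exchange body before printing or writing the report.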
In this method you can validate each line and its columns as you wish, then print the result or even write the report to another output file.
For reference, see the File and CSV component pages in the Apache Camel docs:
http://camel.apache.org/file.html
http://camel.apache.org/csv.html

How to insert Excel data into a database with Java

I want to insert data from an Excel file into a local database on a UNIX server with Java, without any manipulation of the data.
1- Someone told me that I have to convert the Excel file to .csv to work with UNIX. I created a CSV file for each sheet (I have 12) with a macro. The problem is that it changed the date format from DD-MM-YYYY to MM-DD-YYYY. How can I avoid this?
2- I used the LOAD DATA command to insert data from the CSV files into my database. There's a date column that is optionally filled in the Excel file, so in the CSV it becomes ,, and LOAD DATA doesn't work (a value is needed). How can I fix this?
Thanks for your help.
It should be quite easy to read the values from Excel with Apache POI. That saves you the extra step of converting to another format, and the problems that arise when your data contains commas and you convert to CSV.
Save the Excel file in CSV (comma-separated values) format. It will then be easy to read and parse with fairly simple use of StringTokenizer.
Use MySQL (or SQLite depending on your needs) and JDBC to load data into the database.
Here is a CSVEnumeration class I developed:
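package com.aepryus.util;

import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.NoSuchElementException;

public class CSVEnumeration implements Enumeration<String> {
    private final List<String> tokens = new ArrayList<String>();
    private int index = 0;

    // Splits one CSV line into fields. Double-quoted fields may contain commas,
    // "" inside quotes is an escaped quote, and a trailing comma yields an empty last field.
    public CSVEnumeration(String line) {
        int i = 0;
        while (i <= line.length()) {
            StringBuilder sb = new StringBuilder();
            if (i < line.length() && line.charAt(i) == '"') {
                i++; // skip the opening quote
                while (i < line.length()) {
                    char c = line.charAt(i);
                    if (c == '"' && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        sb.append('"'); // escaped quote
                        i += 2;
                    } else if (c == '"') {
                        i++; // closing quote
                        break;
                    } else {
                        sb.append(c);
                        i++;
                    }
                }
                // skip anything between the closing quote and the next comma
                while (i < line.length() && line.charAt(i) != ',') {
                    i++;
                }
            } else {
                while (i < line.length() && line.charAt(i) != ',') {
                    sb.append(line.charAt(i));
                    i++;
                }
            }
            tokens.add(sb.toString());
            i++; // step over the comma (or past the end of the line)
        }
    }

    // Enumeration =================================================================
    public boolean hasMoreElements() {
        return index < tokens.size();
    }

    public String nextElement() {
        if (!hasMoreElements()) {
            throw new NoSuchElementException();
        }
        return tokens.get(index++);
    }
}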
If you break the lines of the CSV file up using split and then feed them one by one into the CSVEnumeration class, you can then step through the fields. Or here is some code I have lying around that uses StringTokenizer to parse the lines. csv is a string that contains the entire contents of the file.
StringTokenizer lines = new StringTokenizer(csv, "\n\r");
lines.nextToken(); // skip the header line
while (lines.hasMoreElements()) {
    String line = lines.nextToken();
    Enumeration e = new CSVEnumeration(line);
    for (int i = 0; e.hasMoreElements(); i++) {
        String token = (String) e.nextElement();
        switch (i) {
            case 0: /* do stuff */ break;
        }
    }
}
I suggest MySQL for its performance, and it is open source.
There are two situations:
If you just want to store the Excel cell values in the database, you can convert the Excel file to CSV format and simply use MySQL's LOAD DATA command.
If you have to do some manipulation before the values go into the tables, I suggest Apache POI. I've used it and it works fine; whatever your Excel format, you just have to use the correct implementation (HSSF for .xls, XSSF for .xlsx).
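For the two problems in the question (swapped date order and empty date fields), one option is to preprocess the date column before LOAD DATA. This Java sketch, with a hypothetical class name, reformats a date cell into the YYYY-MM-DD form MySQL uses for DATE values and turns an empty cell into \N, which LOAD DATA reads as NULL when the default backslash escape character is in effect. The input pattern is an assumption: "MM-dd-yyyy" matches what the macro wrote.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

/** Sketch: normalise one CSV date cell before MySQL's LOAD DATA reads it. */
public class DateFieldFixer {
    /**
     * Parses the cell with the given pattern and returns ISO yyyy-MM-dd.
     * An empty cell becomes \N, LOAD DATA's marker for NULL.
     */
    static String toMysqlDate(String cell, DateTimeFormatter inputFormat) {
        String s = cell.trim();
        if (s.isEmpty()) {
            return "\\N";
        }
        return LocalDate.parse(s, inputFormat).toString(); // LocalDate.toString() is yyyy-MM-dd
    }

    public static void main(String[] args) {
        // The macro in the question wrote MM-DD-YYYY; use "dd-MM-yyyy" if the CSV keeps DD-MM-YYYY
        DateTimeFormatter csvFormat = DateTimeFormatter.ofPattern("MM-dd-yyyy");
        System.out.println(toMysqlDate("03-12-2020", csvFormat));
        System.out.println(toMysqlDate("", csvFormat));
    }
}
```

Writing dates in ISO form sidesteps the day/month ambiguity entirely, whichever order Excel used.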
We are using SQLite in our Java application. It's serverless, really simple to use and very efficient.