Pentaho Kettle conversion from String to Integer/Number error - csv

I am new to Pentaho Kettle and I am trying to build a simple data transformation (filter, data conversion, etc.), but I keep getting errors when reading my CSV data file, whether I use CSV File Input or Text File Input.
The error is:
... couldn't convert String to number : non-numeric character found at position 1 for value [ ]
What does this mean exactly and how do I handle it?
Thank you in advance

I have solved it. The idea is similar to what #nsousa suggested, but I didn't use the Trim option because I tried it and it didn't work in my case.
What I did was specify that if the value is a single space, it should be set to null: in the Fields tab of the Text File Input step, set the Null if column to a space.

That value looks like a blank space. Set the Format of the Integer field to # and set its trim type to both.

Related

CSV data with comma values throws an error while processing the file through the BizTalk flat file disassembler

I'm going to pick up a CSV file in BizTalk, and after some processing I want to update two or more different systems with it.
To read the CSV file, I'm using the default flat file disassembler to break it up and construct it as XML with the help of a generated schema. I can do that successfully with consistent data; however, if the data contains a comma (other than as a delimiter), BizTalk fails!
Is there any other way to do this without using a custom pipeline component?
I'm expecting a simple configuration within the flat file disassembler component!
So, here's the deal. BizTalk is not failing. Well, it is, but that is the expected and correct behavior.
What you have is an invalid CSV file. The CSV specification disallows the comma in field data unless a wrap character is used. Either way, both are reserved characters.
To accept the comma in field data, you must choose a wrap character and set that in the Wrap Character property in the Flat File Schema.
This is valid:
1/1/01,"Smith, John", $5000
This is not:
1/1/01,Smith, John, $5000
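To see why the wrap character matters, here is a minimal sketch outside of BizTalk, in plain C with made-up buffer sizes: a quote-aware scan keeps "Smith, John" together as one field, while a naive split on every comma would report four fields for the invalid record above.
/* Minimal sketch of quote-aware CSV splitting (not BizTalk code).
   Field count and length limits are arbitrary for the example. */
#include <stdio.h>
#include <string.h>

/* Split one CSV record, honoring '"' as the wrap character. */
static int split_record(const char *line, char fields[][64], int max_fields)
{
    int nfields = 0, pos = 0, in_quotes = 0;
    for (const char *p = line; ; p++) {
        if (*p == '"') {
            in_quotes = !in_quotes;              /* toggle wrap state, drop the quote */
        } else if ((*p == ',' && !in_quotes) || *p == '\0') {
            fields[nfields][pos] = '\0';         /* close the current field */
            nfields++;
            pos = 0;
            if (*p == '\0' || nfields == max_fields)
                break;
        } else if (pos < 63) {
            fields[nfields][pos++] = *p;         /* ordinary character, copy it */
        }
    }
    return nfields;
}

int main(void)
{
    char fields[8][64];
    int n = split_record("1/1/01,\"Smith, John\", $5000", fields, 8);
    printf("%d fields\n", n);                    /* prints 3: the comma in the name survives */
    for (int i = 0; i < n; i++)
        printf("[%s]\n", fields[i]);
    return 0;
}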
Since your schema definition has ',' as the delimiter, the flat file disassembler will treat data containing a comma as two fields and fail due to the mismatch in the number of columns.
You have a few options:
Add a new field to the schema, if you know the comma in the data will only ever be present in a particular field.
Change the delimiter in the flat file from , to | (pipe) or some other character, so the data does not conflict with the delimiter (see the example below).
Or, as you mentioned, manipulate the flat file in a custom pipeline component, which should be the last resort if the two options above are not feasible.
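For example, with a pipe delimiter the record from the earlier answer needs no wrap character at all, since the comma in the name no longer collides with the field separator:
1/1/01|Smith, John|$5000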

Neo4j CSV Load: How to avoid Null and escape characters

I am trying to load a large volume of data into a graph using a CSV load script (xyx.cpl) and Neo4jShell.
Mostly it works well, but sometimes I receive the following errors:
Cannot merge node using null property value ...
Error related to escape characters
So I'm seeking assistance to understand the best way to handle these issues in the import script.
Thanks in advance
Cannot merge node using null property value
You can use a WITH statement to filter out rows that have a null value for the property you are using in the MERGE. For example:
LOAD CSV WITH HEADERS FROM "file:///file.csv" AS row
WITH row WHERE row.name IS NOT NULL
MERGE (p:Person {name: row.name})
SET p.age = row.age
...
Error related to escape characters
Can you be a bit more specific about the error you are getting / show a Cypher and data example?
Without seeing your specific error/code, here is some info that might help:
the character for string quotation within your CSV file is the double quote "
the escape character is \
more info and some examples are in the Neo4j LOAD CSV documentation
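For example (illustrative data only), a field that contains both the delimiter and a quote would look like this in the CSV file:
name,remark
"Smith, John","He said \"hello\" and left"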

Losing data in random fields when importing from a file into a table using phpMyAdmin

I have an Access DB. I exported the tables to .xlsx, then saved them as .ods using OpenOffice,
because I found out that phpMyAdmin/MySQL no longer supports Excel files. I have my MySQL database formatted exactly as it should be to accept the data. I import, and everything seems fine except for one little detail.
In some fields, the value is NULL instead of the value it should have according to the .ods file. Some rows show the same value for that field correctly, some show NULL.
Also, the "faulty" rows have some fields that show the value 0 for fields that where empty in the imported file (instead of NULL). Default value for those fields in mySQL is NULL. Each row has many fields like that and all of the same data type (tinyint). Some appear correctly NULL and some have the value 0....
I can't see a pattern on all these.
Any help is appreciated.
Check that imported strings have ("") quotes and NULL values do not, and that everything is separated appropriately, usually by a comma (,) with the record/row delimited by a semicolon (;). The best way to check what MySQL is looking for is to export some existing data in the same format and compare it against what you are trying to import. One little missed quote and the deal is off. Be consistent in the use of either double (") or single (') quotes; the backtick (`) character is not used, as far as I know. If you are squeezing your data through an application that applies "smart quotes", as MS Word or OpenOffice can, this too can cause issues. Add the word NULL, either with or without quotes, in your CSV import where values are appropriately missing.

How to deal with a string with a comma in it from a CSV, when we have to read the data using LoadRunner?

When I use LoadRunner, it can read data from a CSV file. As we know, a CSV file is separated by commas.
The question is: if a parameter value in the CSV itself contains a comma, the string will be split into several segments, which is not what I want.
How can we get the original data with the comma in it?
When the data has a comma in it, use an escape character to store the data in the parameter.
For example, if the name is 'Smith, John', it can be stored as Smith\, John in the LoadRunner data file.
When you save a file in Excel that has commas in the actual cell data, the whole cell will be wrapped in double-quote (") characters. It also seems that cells with a space in them are wrapped in quotes.
Example
ColA,ColB,"ColC with a , inside",ColD,ColE
More info on CSV file format: http://www.parse-o-matic.com/parse/pskb/CSV-File-Format.htm
The answer to the question is that perhaps the easiest way to deal with the , separator is to change the separator to a ; character. This is also a valid separator in CSV files.
Then the example would be:
ColA;ColB;"ColC with a , inside";ColD;ColE
Maybe the right way is to use C functions to read the data from the file (for example fopen/fread)? Once you have read a line, you can use strchr to find the first quote character and the second quote character. Everything in that interval is one value, and it doesn't matter if a comma is inside; see the sketch below.
For documentation on fopen, fread, and strchr, you can refer to the HP or C function references.
Hope this helps.
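A rough sketch of that idea in standard C follows. The file name and the assumption that each record fits on one line are made up for illustration; in an actual LoadRunner script the same logic would sit in an Action() block, and fgets is used here for simple line-by-line reading instead of fread.
#include <stdio.h>
#include <string.h>

int main(void)
{
    char line[1024];
    FILE *fp = fopen("data.csv", "r");            /* hypothetical data file */
    if (fp == NULL)
        return 1;

    while (fgets(line, sizeof line, fp) != NULL) {
        char *start = strchr(line, '"');          /* first quote character */
        if (start != NULL) {
            char *end = strchr(start + 1, '"');   /* matching second quote */
            if (end != NULL) {
                *end = '\0';                      /* terminate at the closing quote */
                /* everything between the quotes is one value,
                   commas inside it included */
                printf("quoted value: %s\n", start + 1);
            }
        }
    }
    fclose(fp);
    return 0;
}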
Assuming you are reading from a data file for the parameters, just use a custom separator. Comma is the default, but you can define it to be whatever you want. Whenever I have a comma in the variable data, I tend to use a pipe symbol ('|') to distinguish the columns of data in the data file.
Examine your parameter dialog carefully and you will see where to make the change.

MySQL Exporting Directly to CSV, Excel Displaying Formatted Values as ####### and NULLs as \N

I'm using a SELECT INTO OUTFILE to extract rows from a database into CSV.
I've got a couple of issues I am trying to deal with.
The first is that I am formatting currency values in the format "$135,300.00".
When I open the CSV in Excel, it shows all the currency fields as "########" until they are clicked on.
Also, null values are exported as "\N"; I would like them to simply be empty.
Thanks for any help you can provide.
The #### thing is an Excel feature. It shows any values as such when the cell is too small to show the full number. Just increase the column width until the cells are big enough.
As for the \N thing, the MySQL reference says:
If the FIELDS ESCAPED BY character is empty, no characters are escaped and NULL is output as NULL, not \N.
So you're probably using a FIELDS ESCAPED BY character in the query (the default is the backslash). It'd help to see the full query you're using.
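Since the original query wasn't posted, the following is only a guess at the shape of the fix, sketched with the MySQL C client API: the table, columns, credentials and output path are all hypothetical. The interesting part is the query string, where ESCAPED BY '' switches off the \N escape and IFNULL() turns NULL values into empty strings rather than the literal word NULL.
#include <stdio.h>
#include <mysql.h>   /* header location varies by install, e.g. <mysql/mysql.h> */

int main(void)
{
    /* Hypothetical export query: empty escape character plus IFNULL()
       so NULLs come out as empty fields instead of \N. */
    const char *query =
        "SELECT order_date, IFNULL(amount, '') "
        "INTO OUTFILE '/tmp/export.csv' "
        "FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"' ESCAPED BY '' "
        "LINES TERMINATED BY '\\n' "
        "FROM orders";

    MYSQL *conn = mysql_init(NULL);
    if (conn == NULL)
        return 1;
    if (mysql_real_connect(conn, "localhost", "user", "password",
                           "shop", 0, NULL, 0) == NULL) {
        fprintf(stderr, "connect failed: %s\n", mysql_error(conn));
        return 1;
    }
    if (mysql_query(conn, query) != 0)
        fprintf(stderr, "export failed: %s\n", mysql_error(conn));
    mysql_close(conn);
    return 0;
}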