WEKA does not recognise different attributes from a .csv file

I have a problem when opening a dataset in WEKA. While in its .csv format all the variables and respective values are clearly distinguished, in WEKA I only have one attribute which looks like this:
PINCP; AGEP; LANX; RACWHT; RACBLK; RACASN; SEX; etc.
and the associated values look similar, also separated by semicolons.
Do you have any suggestions on how to make this work?
Thanks in advance!

CSV stands for comma-separated values. Your dataset, on the other hand, uses semicolons as the separator between cells.
When opening your dataset in the Weka Explorer, tick the "Invoke options dialog" checkbox in the file chooser dialog. Then change the fieldSeparator option from a comma to a semicolon.
If you are using the command line to load your dataset, use the -F option of the CSVLoader class.
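To see why Weka ends up with a single attribute, here is a minimal sketch using Python's standard csv module (an illustration, not Weka itself). Parsing semicolon-separated text with the default comma delimiter yields one field per line. Setting the delimiter to ';' yields separate columns. The column values are made up. skipinitialspace=True is used because the question's header has a space after each semicolon.

```python
import csv
import io

# Semicolon-separated sample in the style of the question (values are made up).
sample = "PINCP; AGEP; SEX\n35000; 41; 2\n"

# Default delimiter is ',', so each whole line becomes a single field,
# which is what shows up as one attribute.
rows_comma = list(csv.reader(io.StringIO(sample)))

# With ';' as the delimiter (the counterpart of fieldSeparator / -F),
# the columns separate; skipinitialspace drops the space after each ';'.
rows_semicolon = list(csv.reader(io.StringIO(sample), delimiter=";", skipinitialspace=True))

print(rows_comma[0])      # ['PINCP; AGEP; SEX']
print(rows_semicolon[0])  # ['PINCP', 'AGEP', 'SEX']
```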

Related

cannot load simple csv file into tableau public 9.3

I am trying to load the following simple CSV file into Tableau Public 9.3:
customers,item1,item2,item3,item4
1,0,0,0,0
2,0,0,0,0
3,0,0,0,0
However, it doesn't read the file as separate columns, even though the field separator is set to Comma. Instead, it treats the whole line as one column. Any help would be greatly appreciated!
If you change your locale settings to English (US), you will be able to load the file. You should also be able to work around this by creating a schema.ini file.
Go to Data > Manage fields > [Field] Options
You can also control how an imported CSV behaves after import, either by splitting individual columns (which will stay split when the data is updated) or by changing the settings at the CSV level.
That didn't work for me. So I reopened the .csv file in Excel and saved it again in .csv format with ',' as the delimiter.
After that, my file was a .csv with ';' as the delimiter, and it worked with Tableau.
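The last workaround, re-saving so the file ends up with ';' as its delimiter, can also be done without Excel. Below is a minimal Python sketch. It assumes your Tableau/locale setup expects ';', which is what the answers above suggest. change_delimiter is a hypothetical helper name.

```python
import csv
import io

def change_delimiter(text: str, old: str = ",", new: str = ";") -> str:
    """Rewrite delimited text with a different field separator.

    The csv module handles quoting, so a value that happens to contain
    the new separator is wrapped in double quotes instead of being split.
    """
    reader = csv.reader(io.StringIO(text), delimiter=old)
    out = io.StringIO()
    # lineterminator="\n" is an arbitrary choice for this sketch.
    writer = csv.writer(out, delimiter=new, lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()

data = "customers,item1,item2,item3,item4\n1,0,0,0,0\n2,0,0,0,0\n3,0,0,0,0\n"
print(change_delimiter(data), end="")
```

For a real file, open it with newline="" and the correct encoding, then pass the text through the same function.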

SQL Server Export Unicode & Import via SSIS

(SQL Server 2008)
So here's my task:
I need to export query results to a file and then use SSIS to import that file into another DB.
Specific to the task, the data contains every awkward Unicode character you can think of, so delimiting with commas, pipes, etc. is out of the question.
Here are the options SSMS gives me for export format:
Column Aligned
Comma/Tab/Space delimited
Custom delimiter
And here are the options SSIS gives me for a flat file data source:
Delimited (custom)
Fixed Width
Ragged Right
So, given that a delimiter character is out of the question, I cannot see another method that both SSMS and SSIS agree on.
Such as fixed width?
It seems strange that two closely related Microsoft products have such different options.
Or have I missed something here?
Any advice appreciated!
It seems you need to try out different combinations of options when creating the delimited flat file (for your exported query results).
Try setting the Code page to UTF-8, with and without the Unicode option. Also set the Text qualifier to " (or any other character you think might work), and try different column delimiters.
Once you are able to create the delimited file, apply the same settings to the file when importing it into the other DB.
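The idea behind the text qualifier can be sketched with Python's csv module. This is an illustration only, not SSIS. Every field is wrapped in " and embedded quotes are doubled, so commas, pipes, quotes and line breaks inside a value survive a UTF-8 round trip, as long as the reader uses exactly the same settings. The rows are made up. Whether the SSIS 2008 Flat File source treats doubled qualifiers and embedded line breaks the same way is an assumption you would need to test against your own data.

```python
import csv
import io

# Hypothetical rows with awkward content: commas, pipes, quotes,
# an embedded newline and non-ASCII text.
rows = [
    ["id", "name", "note"],
    ["1", "Zoë | Ünïcödé", 'says "hi", then leaves'],
    ["2", "日本語", "line one\nline two"],
]

buf = io.StringIO()
# QUOTE_ALL wraps every field in the text qualifier ("); embedded quotes
# are doubled, so the delimiter can appear inside a value.
csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\r\n").writerows(rows)

# Encode as UTF-8, like a file written with code page 65001.
payload = buf.getvalue().encode("utf-8")

# The importer must use the SAME settings: UTF-8, ',' delimiter, '"' qualifier.
parsed = list(csv.reader(io.StringIO(payload.decode("utf-8"))))
assert parsed == rows  # every value comes back unchanged
```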

Importing CSV file in Talend - how to set options to match Excel

I have a CSV file that I can open in Excel 2012 and it comes in perfectly. When I try to set up the metadata for this CSV file in Talend, the fields (columns) are not split the same way as Excel splits them. I suspect I am not setting the metadata properly.
The specific issue is that I have a column with string data in it which may contain commas within the string. For example suppose I have a CSV file with three columns: ID, Name and Age which looks like this:
ID,Name,Age
1,Ralph,34
2,Sue,14
3,"Smith, John", 42
When Excel reads this CSV file it looks at the second element of the third row ("Smith, John") as a single token and places it into a cell by itself.
In Talend, it tries to break this same token into two, since there is a comma within the token. Apparently Excel ignores all delimiters within a quoted string, while Talend by default does not.
My question is: how do I get Talend to behave the same as Excel?
If you use the tFileInputDelimited component to read this CSV file, set the field separator to "," and, under the CSV options of this component, enable the Text Enclosure option with a double quote ("\""). Even if you use metadata, there is an option to define the string/text enclosure; set it to a double quote there as well to resolve your problem.
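What the text enclosure changes can be shown with Python's csv module on the question's sample. This illustrates the concept and is not Talend's API. With '"' as the quote character, the comma inside "Smith, John" is part of the value. Treating '"' as ordinary text (QUOTE_NONE) reproduces the splitting described in the question.

```python
import csv
import io

sample = 'ID,Name,Age\n1,Ralph,34\n2,Sue,14\n3,"Smith, John", 42\n'

# quotechar='"' plays the role of the "text enclosure": delimiters inside
# the enclosure are data. skipinitialspace removes the space in ", 42".
with_enclosure = list(csv.reader(io.StringIO(sample), quotechar='"', skipinitialspace=True))

# QUOTE_NONE treats '"' as plain text, mimicking a reader without an
# enclosure, so "Smith, John" is split into two fields.
without_enclosure = list(csv.reader(io.StringIO(sample), quoting=csv.QUOTE_NONE))

print(with_enclosure[3])     # ['3', 'Smith, John', '42']
print(without_enclosure[3])  # ['3', '"Smith', ' John"', ' 42']
```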

SSIS package for export data into csv file to FTP

I'm creating an SSIS package that writes a .csv file to my local server and then transfers it to FTP.
When I get my CSV onto the FTP server and open it in Excel, my data gets shifted over into other columns. Is there some internal setting I need to change?
I also tried different text qualifiers, but it still did not work.
It sounds like there may be hidden characters in your data set. If you are using commas, you may want to consider a less commonly used character for the delimiter, such as a pipe "|". For instance, an address may naturally contain commas, whereas a pipe in an address field is probably a typo and far less likely. Things that shift data cells are often characters like tabs and CRLF line breaks. You can also open your data set in a text editor like Notepad++ and choose View > Show Symbol > Show All Characters to see exactly which character it is. If it's rampant in your data set, you can use the REPLACE function in a Derived Column transformation to scrub the data as it comes out of the data source.
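The scrub-then-delimit approach from this answer can be sketched in Python. It is an analogue of a Derived Column REPLACE, not SSIS code. Runs of tabs, CRs and LFs inside a value are replaced with a single space, and rows are written with a pipe delimiter. The set of "hidden" characters and the sample rows are assumptions. Extend the pattern for whatever the text editor reveals in your data.

```python
import csv
import io
import re

# Assumed culprits: tab, carriage return, line feed (extend as needed).
HIDDEN = re.compile(r"[\t\r\n]+")

def scrub(value: str) -> str:
    """Replace runs of tab/CR/LF with one space and trim the ends."""
    return HIDDEN.sub(" ", value).strip()

def to_pipe_delimited(rows) -> str:
    """Scrub every field, then write the rows with '|' as the delimiter."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="|", lineterminator="\r\n")
    for row in rows:
        writer.writerow([scrub(field) for field in row])
    return out.getvalue()

# Hypothetical addresses: commas are fine because '|' is the delimiter.
rows = [["id", "address"], ["1", "12 Main St,\r\nApt 4"], ["2", "Unit\t7, Oak Rd"]]
print(to_pipe_delimited(rows), end="")
```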

How to deal with a string containing a comma in a CSV file when reading the data with LoadRunner?

LoadRunner can read data from a CSV file. As we know, a CSV file is separated by commas.
The problem is that if a parameter value in the CSV contains a comma itself, the string is split into several segments. That is not what I want.
How can we get the original data, comma included?
When data has a comma, use an escape character to store the data in the parameter.
For example, if the name is 'Smith, John', it can be stored as Smith\, John in the LoadRunner data file.
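The escape-character idea can be demonstrated with Python's csv module, which supports a backslash escape through its escapechar parameter. This shows the general technique only. It does not show how LoadRunner's own parameter parser works, and the LoadRunner behaviour is as the answer describes it.

```python
import csv
import io

# Reading: the backslash marks the following comma as data, not a separator.
line = "Smith\\, John,42\n"
fields = next(csv.reader(io.StringIO(line), escapechar="\\"))

# Writing: with QUOTE_NONE, the writer escapes embedded delimiters
# instead of wrapping the value in quotes.
out = io.StringIO()
writer = csv.writer(out, quoting=csv.QUOTE_NONE, escapechar="\\", lineterminator="\n")
writer.writerow(["Smith, John", "42"])
written = out.getvalue()

print(fields)           # ['Smith, John', '42']
print(written, end="")  # Smith\, John,42
```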
When you save a file in Excel that has commas in the actual cell data, the whole cell is enclosed in double quotes ("). It also seems that cells containing a space are enclosed in double quotes.
Example
ColA,ColB,"ColC with a , inside",ColD,ColE
More info on CSV file format: http://www.parse-o-matic.com/parse/pskb/CSV-File-Format.htm
The answer to the question is that perhaps the easiest way to deal with , separators is to change the separator to a ; character. This is also a widely used separator in CSV files (Excel, for example, uses it in locales where the comma is the decimal separator).
Then the example would be:
ColA;ColB;"ColC with a , inside";ColD;ColE
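How quoting interacts with the choice of separator can be checked with Python's csv writer, used here as a stand-in for Excel. Its default QUOTE_MINIMAL mode quotes a field only when the field contains the delimiter, the quote character or a line-terminator character. With ',' the middle cell must be quoted. With ';' it need not be, although a quoted value such as the one in the example above is still read correctly.

```python
import csv
import io

row = ["ColA", "ColB", "ColC with a , inside", "ColD", "ColE"]

def write_row(delimiter: str) -> str:
    """Write one row with the given delimiter using default (minimal) quoting."""
    out = io.StringIO()
    csv.writer(out, delimiter=delimiter, lineterminator="\n").writerow(row)
    return out.getvalue()

print(write_row(","), end="")  # ColA,ColB,"ColC with a , inside",ColD,ColE
print(write_row(";"), end="")  # ColA;ColB;ColC with a , inside;ColD;ColE

# Quoting is optional with ';' but harmless: the quoted form parses the same.
quoted = 'ColA;ColB;"ColC with a , inside";ColD;ColE'
print(next(csv.reader([quoted], delimiter=";")) == row)  # True
```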
Maybe the right way is to use C functions to read the data from the file (for example fopen/fread)? Once you have read it, you can use strchr to find the first and second quote characters. Everything between them is one value, and it doesn't matter if there is a comma inside.
For documentation on fopen, fread and strchr, refer to the HP (LoadRunner) function reference or any C function reference.
Hope this will help you.
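The quote-scanning approach in the previous answer can be sketched as a character loop. The sketch is in Python for illustration rather than the C functions the answer names, and split_quoted is a hypothetical helper. A separator only ends a field when it is outside a pair of quote characters. The sketch is deliberately simplified: it does not handle doubled quotes ("") inside a value or values spanning several lines.

```python
def split_quoted(line: str, sep: str = ",", quote: str = '"') -> list[str]:
    """Split one line on sep, ignoring separators between quote characters.

    Simplified: no support for doubled quotes ("") or multi-line values.
    """
    fields = []
    current = []
    in_quotes = False
    for ch in line.rstrip("\r\n"):
        if ch == quote:
            in_quotes = not in_quotes  # opening/closing quote; not kept
        elif ch == sep and not in_quotes:
            fields.append("".join(current))  # separator outside quotes ends a field
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    return fields

print(split_quoted('ColA,ColB,"ColC with a , inside",ColD,ColE'))
# ['ColA', 'ColB', 'ColC with a , inside', 'ColD', 'ColE']
```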
Assuming you are reading from a data file for the parameters, just use a custom separator. Comma is the default, but you can define it to be whatever you want. Whenever I have a comma in the variable data, I tend to use a pipe symbol, '|', to distinguish the columns of data in the data file.
Examine your parameter dialog carefully and you will see where to make the change.