weka not setting the attribute values when reading unlabeled data from mysql - mysql

I am trying to have Weka read unlabeled data in from MySQL. I set the Class attribute to have '?' for all the values, but I can't set the types of classes to choose from, such as yes and no. I even tried the ARFF-to-MySQL conversion, using the ARFF file I had already tested with, and it loaded everything with null for the class values and no types. Has anyone done this, and have I just missed something in the wiki and docs?
Example:
data passed to the ARFF-to-MySQL method -> @attribute Class {yes,no}
bla,bla,bla,?
data that gets put into MySQL -> @attribute Class {}
bla,bla,bla,null
Is there something wrong with the method? If not, how do I, using the Weka library, add the yes and no values back?
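For reference, one way to get the yes/no labels back with the Weka Java API is to rebuild the header after pulling the rows out of MySQL and copy the instances across. This is only a minimal sketch, assuming Weka 3.7+, that the Class column is the last one returned by the query, and placeholder connection details and table name:

import java.util.ArrayList;
import weka.core.Attribute;
import weka.core.DenseInstance;
import weka.core.Instances;
import weka.core.Utils;
import weka.experiment.InstanceQuery;

public class RelabelClassValues {
    public static void main(String[] args) throws Exception {
        // Pull the unlabeled rows out of MySQL (URL, user, password, query are placeholders).
        InstanceQuery query = new InstanceQuery();
        query.setDatabaseURL("jdbc:mysql://localhost:3306/mydb");
        query.setUsername("user");
        query.setPassword("secret");
        query.setQuery("SELECT * FROM mytable");
        Instances raw = query.retrieveInstances();
        query.disconnectFromDatabase();

        // Rebuild the header: copy every attribute except the class, then declare
        // the class as a proper {yes,no} nominal attribute again.
        ArrayList<Attribute> attrs = new ArrayList<Attribute>();
        for (int i = 0; i < raw.numAttributes() - 1; i++) {
            attrs.add((Attribute) raw.attribute(i).copy());
        }
        ArrayList<String> labels = new ArrayList<String>();
        labels.add("yes");
        labels.add("no");
        attrs.add(new Attribute("Class", labels));

        Instances data = new Instances("relabelled", attrs, raw.numInstances());
        data.setClassIndex(data.numAttributes() - 1);

        // Copy the rows across, leaving every class value missing ('?').
        for (int i = 0; i < raw.numInstances(); i++) {
            double[] vals = new double[data.numAttributes()];
            for (int j = 0; j < raw.numAttributes() - 1; j++) {
                vals[j] = raw.instance(i).value(j);
            }
            vals[vals.length - 1] = Utils.missingValue();
            data.add(new DenseInstance(1.0, vals));
        }

        System.out.println(data);   // header now reads @attribute Class {yes,no}
    }
}

The rebuilt dataset declares the class as {yes,no} and leaves every class value missing ('?'), which is what the original ARFF file contained before the round trip.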

Related

Importing CSV database with null character as empty strings

I have hundreds of CSV files, and each one has lots of null characters in it. It is like that because some of the cells must be empty. But when I try to import them into MySQL Workbench using the import wizard, I keep getting the same error: "Unhandled exception: line contains NULL byte".
What I would like to do is:
a) be able to import these files without the error above
b) convert all null cells to empty strings.
Since there are hundreds of CSV files like this one, each around 300 MB, replacing the characters before importing doesn't seem to be a quick, viable option.
Is there a way to force MySQL Workbench to accept the files with the null characters in them?
I have googled many answers, none of which seems to be applicable to this case.
Many thanks
Since MySQL Workbench version 8.0.16 (released on 04/25/2019) there has been an additional option for uploading .csv files -- "null and NULL word as SQL keyword".
When this option is set to NO, an unquoted NULL in the .csv file (,NULL, rather than ,"NULL",) will be filled in as empty, provided your field's default value is empty.
I hope this answer solves other people's similar problems :)
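If that option is not available in your Workbench version, another route for point b) is a streaming pre-pass that simply drops the 0x00 bytes, so the affected cells become empty strings; even a 300 MB file goes through quickly. A minimal sketch, with placeholder file names:

import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class StripNulBytes {
    public static void main(String[] args) throws IOException {
        Path in  = Paths.get("input.csv");        // original file (placeholder name)
        Path out = Paths.get("input_clean.csv");  // cleaned copy for the import wizard

        try (InputStream  is = new BufferedInputStream(Files.newInputStream(in));
             OutputStream os = new BufferedOutputStream(Files.newOutputStream(out))) {
            byte[] buf = new byte[64 * 1024];
            int len;
            while ((len = is.read(buf)) != -1) {
                for (int i = 0; i < len; i++) {
                    // Drop every 0x00 byte; the affected cells become empty strings.
                    if (buf[i] != 0x00) {
                        os.write(buf[i]);
                    }
                }
            }
        }
    }
}

Wrap the body in a loop over the folder of files and point the wizard at the cleaned copies.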

csv data with comma values throws error while processing the file through the BizTalk flatfile Disassembler

I'm going to pick up a CSV file in BizTalk, and after some processing I want to update it with two or more different systems.
To get the CSV file in, I'm using the default flat-file disassembler to break it apart and construct it as XML with the help of the generated schema. I can do that successfully with consistent data; however, if the data contains a comma (other than the delimiters), BizTalk fails!
Is there any other way to do this without using a custom pipeline component?
I'm expecting a simple configuration within the flat-file disassembler component!
So, here's the deal. BizTalk is not failing. Well, it is, but that is the expected and correct behavior.
What you have is an invalid CSV file. The CSV specification disallows commas in field data unless a wrap character is used. Either way, both are reserved characters.
To accept the comma in field data, you must choose a wrap character and set that in the Wrap Character property in the Flat File Schema.
This is valid:
1/1/01,"Smith, John", $5000
This is not:
1/1/01,Smith, John, $5000
Since your schema definition has ',' as the delimiter, the flat-file disassembler will treat data containing a comma as two fields and will fail due to the mismatch in column count.
You have a few options:
Either add a new field to the schema, if you know the comma in the data will only ever appear in one particular field.
Or change the delimiter in the flat file from , to | (pipe) or some other character, so that the data does not conflict with the delimiter.
Or, as you mentioned, manipulate the flat file in a custom pipeline component, which should be the last resort if the two options above are not feasible.
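To make the column-count mismatch concrete, here is a small illustration (plain Java, not a BizTalk component): a naive comma split of the unwrapped line yields four fields, while a wrap-character-aware split of the quoted line yields the expected three.

import java.util.ArrayList;
import java.util.List;

public class WrapCharacterDemo {

    // Naive split: every comma is treated as a delimiter, exactly like a flat-file
    // schema with ',' as the delimiter and no wrap character configured.
    static String[] naiveSplit(String line) {
        return line.split(",");
    }

    // Wrap-character-aware split: commas inside double quotes stay in the field.
    static List<String> wrappedSplit(String line) {
        List<String> fields = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (char c : line.toCharArray()) {
            if (c == '"') {
                inQuotes = !inQuotes;
            } else if (c == ',' && !inQuotes) {
                fields.add(current.toString().trim());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString().trim());
        return fields;
    }

    public static void main(String[] args) {
        String unwrapped = "1/1/01,Smith, John, $5000";
        String wrapped   = "1/1/01,\"Smith, John\", $5000";
        System.out.println(naiveSplit(unwrapped).length);   // 4 fields -> column mismatch
        System.out.println(wrappedSplit(wrapped).size());    // 3 fields -> matches the schema
    }
}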

Import csv in Rapidminer is not loading data properly

Importing a CSV in RapidMiner is not loading the data properly into the attributes/columns and returns errors.
I have set the parameter values correctly in the 'Data Import Wizard'.
Column Separation is set to comma, and when I check the "Use Quotes" parameter I see that too many "?" values appear in the columns even though there is data in the actual CSV file.
And when I do not check the "Use Quotes" option, I notice that the content of the columns is distributed across different columns, i.e., data does not appear in the correct column. It also gives an error for the date column.
How do I resolve this? Any suggestions please? I watched a lot of RapidMiner videos and read about it, but that did not help.
I am trying to import Twitter conversations data which I exported from a 3rd-party SaaS tool that extracts Twitter data for us.
Could someone help me soon please? Thanks, Geeta
It's virtually impossible to debug this without seeing the data.
The use quotes option requires that each field is surrounded by double quotes. Do not use this if your data does not contain these because the input process will import everything into the first field.
When you use comma as the delimiter, the observed behaviour is likely to be because there are additional commas contained in the data. This seems likely if the data is based on Twitter. This confuses the import because it is just looking for commas.
Generally, if you can get the input data changed, try to get it produced using a delimiter that cannot appear in the raw text data. Good examples would be | or tab. If you can get quotes around the fields, this will help because it allows delimiter characters to appear in the field.
Date formats can be handled using the date format parameter, but my advice is to import the date field as a polynominal and then convert it to a date later using the Nominal to Date operator. This gives more control, especially when the input data is not clean.
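If you can regenerate the export yourself, here is a sketch of the kind of output that imports cleanly: tab-delimited, every field wrapped in quotes, embedded quotes doubled. The file name, field names, and values are made up for illustration.

import java.io.IOException;
import java.io.PrintWriter;
import java.nio.file.Files;
import java.nio.file.Paths;

public class SafeExport {

    // Wrap a field in double quotes and double any embedded quotes (the usual CSV
    // escape), so commas and quotes inside tweet text cannot split or shift columns.
    static String quote(String field) {
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) throws IOException {
        // Made-up records; in practice these would come from the SaaS export.
        String[][] rows = {
            {"2015-06-01 10:15:00", "@someone", "Hello, world, again"},
            {"2015-06-01 10:16:30", "@other",   "Re: \"quoted\" text and, commas"}
        };

        try (PrintWriter out = new PrintWriter(Files.newBufferedWriter(Paths.get("tweets.tsv")))) {
            for (String[] row : rows) {
                StringBuilder line = new StringBuilder();
                for (int i = 0; i < row.length; i++) {
                    if (i > 0) {
                        line.append('\t');   // tab delimiter instead of comma
                    }
                    line.append(quote(row[i]));
                }
                out.println(line);
            }
        }
    }
}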

Determining data type in RapidMiner

How is it possible to set attributes when we import data from a CSV file? I have set "y" as the label, but I cannot run the process because the input ExampleSet has no attributes.
Is there anyone who can help me to solve it?
You can change the roles and types of the attributes under "data set meta data information" in the Read CSV parameters.
But as mentioned before, I would recommend running the import wizard!
I think another way of solving this problem is to use the Set Role operator on a dataset pre-loaded from CSV.

Replace missing value with cell above in either Perl or MySQL?

I'm importing a CSV file of contacts, and where one parent has many children it leaves the duplicated values blank. I need to make sure that they are populated by the time they reach the database, however.
Is there a way that I can implement the following when I'm importing a .csv file into Perl and then exporting into MySQL?
if (value is null)
value = value above.
Thanks!
Why don't you keep the values you read from the CSV file in an array (e.g. @FIELD_DATA) as you go? Then, when you encounter an empty field while iterating over a row (e.g. column 4), you can write:
unless (length($CSV_FIELD[4])) {
    $CSV_FIELD[4] = $FIELD_DATA[4];    # fill the blank cell from the row above
}
@FIELD_DATA = @CSV_FIELD;              # remember this row for the next iteration
Not with an import statement, AFAIK. You could, however, make use of triggers (http://dev.mysql.com/doc/refman/5.0/en/triggers.html). Keep in mind, though, that this will seriously impact the performance of the import.
Also: if they are duplicate values, you should have a critical look at your database model or your setup overall.