Determining data type in RapidMiner

How is it possible to set attributes when importing data from a CSV file? I have set "y" as the label, but I cannot run the process because the input ExampleSet has no attributes.
Can anyone help me solve this?

You can change the roles and types of the attributes via the "data set meta data information" parameter of the Read CSV operator.
But, as mentioned before, I would recommend running the import wizard instead.

Another way to solve this is to apply the Set Role operator to the data set after it has been loaded from the CSV file.

Related

TPT12109: Export Operator does not support JSON column. Is there a way to export results involving a JSON column other than BTEQ export?

I have a table with 2 million records and I am trying to dump its contents in JSON format. The issue is that TPT export does not allow JSON columns, and BTEQ export would take a long time. Is there any way to handle this export in a more optimized way?
Your help is really appreciated.
If the JSON values are not too large, you could potentially CAST them in your SELECT as VARCHAR(64000) CHARACTER SET LATIN, or VARCHAR(32000) CHARACTER SET UNICODE if you have non-LATIN characters, and export them in-line.
Otherwise, each JSON object has to be transferred DEFERRED BY NAME, where each object is stored in a separate file and the corresponding filename is stored in the output row. In that case you would need to use BTEQ or the TPT SQL Selector operator, or write your own application.
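If you go the "write your own application" route, here is a rough sketch of the in-line CAST idea using the Python teradatasql driver; the connection details, table and column names are placeholders, not taken from the question:

import teradatasql  # Teradata's Python DBAPI driver

# Placeholder connection details and object names.
with teradatasql.connect(host="tdhost", user="dbuser", password="dbpass") as con:
    with con.cursor() as cur:
        # Cast the JSON column to VARCHAR so it can be fetched in-line.
        cur.execute(
            "SELECT col1, col2, "
            "CAST(jsn_obj AS VARCHAR(64000) CHARACTER SET LATIN) AS jsn_txt "
            "FROM mydb.mytable"
        )
        with open("export_rows.txt", "w") as out:
            while True:
                rows = cur.fetchmany(10000)  # stream in batches instead of fetching 2 million rows at once
                if not rows:
                    break
                for col1, col2, jsn_txt in rows:
                    out.write(f"{col1},{col2},{jsn_txt}\n")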
You can do one thing: load the JSON-formatted rows into another Teradata table, keep that table's column as VARCHAR, and then do a TPT export of that column/table. It should work.
INSERT INTO test (col1, col2, ..., jsn_obj)
SELECT col1, col2, ...,
       JSON_Compose(<columns you want to include in your JSON object>)
FROM <schemaname>.<tablename>;

How to read multiple CSV files with different schemas in PySpark?

I have different CSV files kept in subfolders of a given folder; some of them have one column-name format and some have another.
april_df = spark.read.option("header", True).option("inferSchema", True).csv('/mnt/range/2018_04_28_00_11_11/')
The above command only picks up one format and ignores the other. Is there a quick way to handle this in the parameters, like mergeSchema for Parquet?
The format of some of the files is:
id ,f_facing ,l_facing ,r_facing ,remark
and the others are:
id, f_f, l_f ,r_f ,remark
But there is a chance that some columns will be missing in the future, so I need a robust way to handle this.
It is not. Either the column should be filled with nulls in the pipeline, or you will have to specify the schema before you import the files. But if you have an idea of which columns might be missing in the future, you could choose or adjust the schema based on df.columns (for example, on its length), although that seems tedious. A sketch of this idea follows below.
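As a rough illustration (the folder paths are placeholders and the column mapping is based only on the two header formats shown above), one way is to normalise each file's columns before unioning the DataFrames:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Canonical column names every file should end up with.
CANONICAL = ["id", "f_facing", "l_facing", "r_facing", "remark"]
# Alternative header names mapped to the canonical ones.
RENAMES = {"f_f": "f_facing", "l_f": "l_facing", "r_f": "r_facing"}

def normalise(df):
    # Trim stray spaces in header names, apply the rename map,
    # and add any missing canonical column as null.
    for c in df.columns:
        clean = c.strip()
        df = df.withColumnRenamed(c, RENAMES.get(clean, clean))
    for c in CANONICAL:
        if c not in df.columns:
            df = df.withColumn(c, F.lit(None).cast("string"))
    return df.select(CANONICAL)

paths = ["/mnt/range/2018_04_28_00_11_11/", "/mnt/range/another_subfolder/"]  # placeholder paths
dfs = [normalise(spark.read.option("header", True).option("inferSchema", True).csv(p)) for p in paths]

combined = dfs[0]
for df in dfs[1:]:
    combined = combined.unionByName(df)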

Import CSV in RapidMiner is not loading data properly

Importing a CSV into RapidMiner is not loading the data properly into the attributes/columns and returns errors.
I have set the parameter values correctly in the 'Data Import Wizard'.
Column separation is set to comma, and when I check the "Use Quotes" parameter I see too many "?" values in the columns, even though there is data in the actual CSV file.
When I do not check the "Use Quotes" option, the content is spread across the wrong columns, i.e., data does not appear in the correct column. It also gives an error for the date column.
How can I resolve this? Any suggestions, please? I have watched a lot of RapidMiner videos and read about it, but that did not help.
I am trying to import Twitter conversation data which I exported from a third-party SaaS tool that extracts Twitter data for us.
Could someone help me soon, please? Thanks, Geeta
It's virtually impossible to debug this without seeing the data.
The "Use Quotes" option requires that each field is surrounded by double quotes. Do not use it if your data does not contain them, because the import process will then put everything into the first field.
When you use a comma as the delimiter, the behaviour you observe is most likely caused by additional commas inside the data itself, which is very plausible for Twitter text. This confuses the import because it simply splits on commas.
Generally, if you can get the input data changed, try to have it produced with a delimiter that cannot appear in the raw text, such as | or tab. If you can get quotes around the fields, that also helps, because it allows delimiter characters to appear inside a field.
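This is not RapidMiner-specific, but a tiny Python illustration with made-up data shows why an unquoted comma inside a field shifts the columns, while quoting keeps them aligned:

import csv
from io import StringIO

quoted = '"id","text"\n"1","hello, world"\n'   # fields wrapped in double quotes
unquoted = 'id,text\n1,hello, world\n'         # same data, no quoting

print(list(csv.reader(StringIO(quoted))))    # [['id', 'text'], ['1', 'hello, world']]    -> 2 columns, as intended
print(list(csv.reader(StringIO(unquoted))))  # [['id', 'text'], ['1', 'hello', ' world']] -> the text spills into a third column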
Date formats can be handled using the date format parameter, but my advice is to import the date field as polynominal and then convert it to a date later using the Nominal to Date operator. This gives more control, especially when the input data is not clean.

Weka not setting the attribute values when reading unlabeled data from MySQL

I am trying to have Weka read unlabeled data in from MySQL. I set the Class attribute to '?' for all the values, but I can't set the class values to choose from, such as yes and no. I even tried the ARFF-to-MySQL route, using an ARFF file that I had already tested, and it loaded everything with null for the class values and with no class values declared. Has anyone done this, or have I just missed something in the wiki and docs?
Example:
data given to the ARFF-to-MySQL method -> @attribute Class {yes,no}
bla,bla,bla,?
data that gets put into MySQL -> @attribute Class {}
bla,bla,bla,null
Is there something wrong with the method? If not, how do I add the yes and no values back using the Weka library?

Replace missing value with cell above in either Perl or MySQL?

I'm importing a CSV file of contacts, and where one parent has many children the duplicated values are left blank. However, I need to make sure they are populated by the time they reach the database.
Is there a way to implement the following when I'm importing the .csv file with Perl and then exporting it to MySQL?
if (value is null)
value = value above.
Thanks!
Why don't you keep the values you read from the CSV file in an array (e.g. @FIELD_DATA)? Then, when you encounter an empty field while iterating over a row (e.g. for column 4), you can write:
unless (length($CSV_FIELD[4])) {
    $CSV_FIELD[4] = $FIELD_DATA[4];
}
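Not the Perl from the answer above, just a sketch of the same carry-forward idea in Python for comparison; the file names are placeholders, and the filled file can then be loaded into MySQL as usual:

import csv

last_seen = {}  # most recent non-empty value seen in each column

with open("contacts.csv", newline="") as src, open("contacts_filled.csv", "w", newline="") as dst:
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        for col, value in row.items():
            if value:                          # non-empty: remember it
                last_seen[col] = value
            else:                              # empty: carry the value above down
                row[col] = last_seen.get(col, "")
        writer.writerow(row)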
Not with an import statement, as far as I know. You could, however, make use of triggers (http://dev.mysql.com/doc/refman/5.0/en/triggers.html). Keep in mind, though, that this will seriously impact the performance of the import.
Also: if they are duplicate values, you should take a critical look at your database model or your setup overall.