Salutations. I am having trouble preparing data for the FP-Growth algorithm. I have integer attributes like age and real-valued attributes like pulse and temperature. What should I do with these values? Filter them out or turn them into binominal attributes?
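One common way to keep numeric attributes for frequent-itemset mining is to discretize them into a few ranges and then treat each "attribute=range" pair as a true/false item. In RapidMiner this roughly corresponds to a Discretize operator followed by Nominal to Binominal. The plain-Python sketch below only illustrates the idea; the cut points, labels, and helper names are assumptions made up for the example, not values from the original data.

```python
def bin_value(value, edges, labels):
    """Return the label of the first bin whose upper edge is >= value."""
    for edge, label in zip(edges, labels):
        if value <= edge:
            return label
    return labels[-1]  # values above the last edge fall into the final bin


def to_items(record, binning):
    """Turn one record into a set of 'attribute=bin' items."""
    items = set()
    for name, value in record.items():
        edges, labels = binning[name]
        items.add(f"{name}={bin_value(value, edges, labels)}")
    return items


# Hypothetical cut points: two edges define three bins per attribute.
binning = {
    "age": ([17, 64], ["child", "adult", "senior"]),
    "pulse": ([59, 100], ["low", "normal", "high"]),
    "temperature": ([36.0, 37.5], ["low", "normal", "fever"]),
}

record = {"age": 42, "pulse": 110, "temperature": 38.2}
items = to_items(record, binning)
print(sorted(items))
```

Each record then becomes a transaction of items, which is the shape FP-Growth expects. Filtering the numeric attributes out instead would discard whatever patterns they carry.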
I want to use RapidMiner or GATE to extract a summary of the entities (characters), or just their main characteristics, and the relations between entities in a story. Do you have an idea or a sample I can modify for that purpose?
I tried RapidMiner extensions such as Aylien and Rosette, but the Extract Entities operator asks for an attribute parameter, and I couldn't work out what its value should be or where to get it. How do I then continue with finding the relations between entities?
When using the Extract Entities operator from the Aylien extension for RapidMiner, the input attribute parameter should be the attribute (a column in the example set) that contains the text examples you want to analyze.
For more inspiration take a look at the text mining section of the RapidMiner Community.
I have category information in the training input data, and I am wondering what the best way to normalize it is.
The category information is things like "city", "gender", etc.
I'd like to use Keras to handle the process.
scikit-learn has a preprocessing module with functions to normalize or scale your data.
This video gives an example of how to preprocess data that will be used for training a model with Keras. The preprocessing there is done with the module mentioned above.
As shown in the video, with scikit-learn's MinMaxScaler class you can specify a range that you want your data to be transformed into, and then fit your data to that range using the MinMaxScaler.fit_transform() function.
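Note that min-max scaling applies to numeric columns. Categorical values such as "city" or "gender" are usually turned into numbers first, for example with one-hot encoding (scikit-learn offers OneHotEncoder for this). The sketch below shows both transformations in plain Python so the arithmetic is visible. The function names and sample values are made up for illustration; this is not the scikit-learn API.

```python
def min_max_scale(values, low=0.0, high=1.0):
    """Linearly map values so the minimum becomes `low` and the maximum `high`."""
    smallest, largest = min(values), max(values)
    if largest == smallest:
        # A constant column carries no spread; map everything to `low`.
        return [low for _ in values]
    span = largest - smallest
    return [low + (v - smallest) * (high - low) / span for v in values]


def one_hot(values):
    """Encode each category as a 0/1 vector with one position per category."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1
        rows.append(row)
    return categories, rows


# Made-up sample data.
ages = [20, 30, 40]
cities = ["Paris", "Oslo", "Paris"]

scaled_ages = min_max_scale(ages)
categories, encoded_cities = one_hot(cities)
print(scaled_ages)
print(categories, encoded_cities)
```

The encoded columns can then be concatenated with the scaled numeric columns before being passed to a Keras model.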
I am trying to complete the San Francisco Crime Classification Kaggle exercise with RapidMiner, but I can't find any help on how to create a CSV in the Kaggle submission format from RapidMiner.
Once you have the example set containing the results, you need to export it using the Write CSV operator. This will create columns in the CSV file with names matching the names of the example set attributes.
You need to submit the output of your trained model on the test dataset. The submission matrix will have dimensions like 87k * 40: one column for the Id and 39 columns, one per applicable prediction (crime type). For each example, the row is filled with the predicted probability of each crime type. Columns should be labeled as follows:
Id WARRANTS ... LARCENY
i.e., the first column is for the Id and the rest are for the crime types.
Use the Write CSV operator in RapidMiner to write out the data.
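To make the target layout concrete, here is a small Python sketch that writes a file in the shape described above: a header row with Id followed by the class names, then one row of probabilities per test example. It only illustrates the format and does not replace the Write CSV operator. The three class names and the probability values are an illustrative subset invented for the example; the real submission has 39 class columns.

```python
import csv
import io


def write_submission(out, class_names, predictions):
    """Write a header row plus one row of class probabilities per example."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Id"] + list(class_names))
    for row_id, probabilities in predictions:
        # Classes the model did not score get probability 0.0.
        writer.writerow([row_id] + [probabilities.get(name, 0.0) for name in class_names])


# Illustrative subset of class columns and made-up probabilities.
class_names = ["WARRANTS", "VANDALISM", "LARCENY"]
predictions = [
    (0, {"WARRANTS": 0.1, "VANDALISM": 0.2, "LARCENY": 0.7}),
    (1, {"VANDALISM": 1.0}),
]

buffer = io.StringIO()
write_submission(buffer, class_names, predictions)
submission_text = buffer.getvalue()
print(submission_text)
```

In RapidMiner the same result depends on the example set having exactly these attributes, in this order, before it reaches Write CSV.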
I am new to WEKA. I am using the Car Evaluation dataset. First, I copied all attributes, instances and values correctly into Excel and saved them as a csv file. I opened that csv file in WEKA. I can see the counts of classes, attributes, etc. However, I cannot see them for the doors and persons attributes. I am getting "Attribute is neither numeric nor nominal."
These attributes take values such as "2", "3" and "more", so they mix numeric and nominal values. In WEKA their type is string. How can I change the attribute types, or which method should I apply to see their visualization and counts?
WEKA can read a csv file, but the csv gives no information about the type of the attributes. That is why WEKA encourages you to use arff file format. arff format is the same as csv except that it has a header that describes the variables (and allows comments and other documentation). The header will contain things like
@attribute mpg numeric
@attribute cyl numeric
@attribute doors {2,3,more}
to indicate that mpg and cyl will have numeric values while doors will be a nominal attribute (a factor) that can take on any of the three values "2", "3", or "more". You will need to be sure that you specify all of the possible values for nominal attributes like doors. A complete file also needs a @relation line before the attribute declarations and a @data line before the rows. You can simply add the header in a text editor if you know what the header should look like. You can get more details on the arff format at this WEKA site or this University of Waikato site.
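If typing the header by hand is error-prone, it can also be generated. The Python sketch below builds an arff header from a list of attribute descriptions: a numeric attribute is declared as numeric, and a nominal one lists its allowed values in braces. The relation name "cars" and the helper names are assumptions for illustration.

```python
def arff_attribute(name, values=None):
    """Declare a numeric attribute, or a nominal one if values are given."""
    if values is None:
        return f"@attribute {name} numeric"
    allowed = ",".join(values)
    return "@attribute " + name + " {" + allowed + "}"


def arff_header(relation, attributes):
    """Build the header lines that precede the csv-style data rows."""
    lines = [f"@relation {relation}"]
    lines += [arff_attribute(name, values) for name, values in attributes]
    lines.append("@data")
    return "\n".join(lines)


# "cars" is a made-up relation name; the attributes mirror the example above.
header = arff_header(
    "cars",
    [("mpg", None), ("cyl", None), ("doors", ["2", "3", "more"])],
)
print(header)
```

The rows of the original csv file (without its own header line) go directly after the @data line.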
Perhaps you should decide whether to make the attribute all numeric or all nominal (also known as categorical, i.e. all strings).
Benefits of an all numeric attribute: algorithms can determine a mathematical relationship between that attribute and any other attribute, including the target (or desired output), e.g., correlation, dependence/independence, covariance. Furthermore, if you use tree-based algorithms, nodes can define decision rules such as doors>3 or persons<2.
Benefits of an all nominal attribute: algorithms can finish faster because of the limited number of things that can be done with categorical values. Cons: most algorithms do not directly support nominal attributes. Tree-based algorithms are limited in the type of decision nodes they can produce, e.g., doors is '3' or persons is not 'more'.
Caveat: if the attribute you are dealing with is the target or desired output, having it all numeric will make WEKA interpret it as a regression problem, while having that attribute as nominal will automatically be interpreted as a classification problem.
If you are interested in making your attribute all numeric, you could probably replace all occurrences of "more" with, say, -1 using Excel.
If later down the road you need to go from an all numeric to a nominal attribute, you could simply use a filter to do that. Or if you are using the Java API you could check Walter's solution:
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.NumericToNominal;

public class Main {
    public static void main(String[] args) throws Exception {
        // Load training instances with numeric attributes.
        // The file path is taken from the first command-line argument.
        Instances originalTrain = DataSource.read(args[0]);

        // Configure the filter to convert attributes 1 and 2 (1-based indices).
        NumericToNominal convert = new NumericToNominal();
        String[] options = new String[2];
        options[0] = "-R";
        options[1] = "1-2"; // range of attributes to make nominal
        convert.setOptions(options);
        convert.setInputFormat(originalTrain);

        // Apply the filter; the original instances are left unchanged.
        Instances newData = Filter.useFilter(originalTrain, convert);

        System.out.println("Before");
        for (int i = 0; i < 2; i = i + 1) {
            System.out.println("Nominal? " + originalTrain.attribute(i).isNominal());
        }

        System.out.println("After");
        for (int i = 0; i < 2; i = i + 1) {
            System.out.println("Nominal? " + newData.attribute(i).isNominal());
        }
    }
}
USING SCHEME:
I am working on a question in my assignment that asks the follows:
A toy data structure specifies the name of a toy, a description, the acquisition price, and the recommended sales price. Create
constructors and accessors for the toy structure, ensuring that you do
not use Racket’s builtin structs [1 marks]. Using only your accessors
to get at the data in your structure, define a function that sorts a
list of toy structures by the difference between the two prices [2
marks].
Is the question asking me to create my own toy data structure, with a constructor that takes the name of the toy, the description, and the prices, and accessors to get the fields of a toy such as the name or description?
If so, how do I do this?
Is there a keyword like struct in Scheme?
Is the question asking me to create my own toy data structure, with a constructor that takes the name of the toy, the description, and the prices, and accessors to get the fields of a toy such as the name or description? If so, how do I do this?
Yes, that's exactly what the assignment says. Hint: a simple list will do, store each field in a position and the accessors will return the element at the position. For instance, if we store the name in the first position, then the accessor will return the car of the list, and so on.
Is there a keyword like struct in scheme?
Yes, oddly enough in Racket it is called struct (it's a macro, not a keyword). Also, in R7RS (or in SRFI-9) we have record types. But the assignment specifically states that you must not use these!
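The list-based hint above can be sketched as follows. The sketch is written in Python only to show the shape of the idea; in Scheme the constructor would build the list with list, and the accessors would use car, cadr, caddr and cadddr instead of indexing. All function names and the sample toys are made up for illustration, and the sort order (ascending by sales price minus acquisition price) is one reading of the assignment.

```python
def make_toy(name, description, acquisition_price, sales_price):
    """Constructor: store each field at a fixed position in a list."""
    return [name, description, acquisition_price, sales_price]


# Accessors: each one returns the element at its field's position.
def toy_name(toy):
    return toy[0]


def toy_description(toy):
    return toy[1]


def toy_acquisition_price(toy):
    return toy[2]


def toy_sales_price(toy):
    return toy[3]


def price_difference(toy):
    """Difference between the two prices, using only the accessors."""
    return toy_sales_price(toy) - toy_acquisition_price(toy)


def sort_toys(toys):
    """Return a new list of toys ordered by ascending price difference."""
    return sorted(toys, key=price_difference)


# Made-up sample toys.
toys = [
    make_toy("ball", "a red ball", 2, 5),
    make_toy("robot", "a tin robot", 10, 11),
    make_toy("kite", "a paper kite", 4, 9),
]
sorted_names = [toy_name(toy) for toy in sort_toys(toys)]
print(sorted_names)
```

Because the sorting code touches the data only through the accessors, the underlying representation could later change without affecting it.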