Export Distribution Model in RapidMiner - rapidminer

I have an example set in RapidMiner. It has two columns, for example:
colA colB
a 1
a 2
b 3
b 2
I used Naive Bayes. Its distribution table gives the probability of each colB value for each colA value, for example P(2) = 0.5.
I need to export that distribution table.
Write Model, Write Excel, and Write CSV do not help.
What should I do?
Thanks in advance.

The simplest solution is to select the table with your mouse (Ctrl+A works as well) and copy and paste it.
Unfortunately, this only works manually. If you have to export the data very often, the next best step is to write your own operator for it (which is actually quite simple and requires only basic Java skills):
http://docs.rapidminer.com/developers/

Yes, you can. If you install the Reporting extension from the Marketplace (it's free), you can export the distribution table, plot view, or text view.
Here's a sample process.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="7.0.000">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="6.0.002" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="reporting:generate_report" compatibility="5.3.000" expanded="true" height="68" name="Generate Report" width="90" x="45" y="34">
<parameter key="report_name" value="myReport"/>
</operator>
<operator activated="true" class="retrieve" compatibility="7.0.000" expanded="true" height="68" name="Golf" width="90" x="112" y="85">
<parameter key="repository_entry" value="//Samples/data/Golf"/>
</operator>
<operator activated="true" class="retrieve" compatibility="7.0.000" expanded="true" height="68" name="Golf-Testset" width="90" x="179" y="210">
<parameter key="repository_entry" value="//Samples/data/Golf-Testset"/>
</operator>
<operator activated="true" class="naive_bayes" compatibility="7.0.000" expanded="true" height="82" name="Naive Bayes" width="90" x="246" y="34"/>
<operator activated="true" class="reporting:report" compatibility="5.3.000" expanded="true" height="68" name="Report" width="90" x="380" y="34">
<parameter key="report_name" value="myReport"/>
<parameter key="report_item_header" value="Distribution Table"/>
<parameter key="specified" value="true"/>
<parameter key="reportable_type" value="Distribution Model"/>
<parameter key="renderer_name" value="Distribution Table"/>
<list key="parameters">
<parameter key="min_row" value="1"/>
<parameter key="max_row" value="2147483647"/>
<parameter key="min_column" value="1"/>
<parameter key="max_column" value="2147483647"/>
<parameter key="sort_column" value="2147483647"/>
<parameter key="sort_decreasing" value="false"/>
</list>
</operator>
<operator activated="true" class="apply_model" compatibility="7.0.000" expanded="true" height="82" name="Apply Model" width="90" x="514" y="120">
<list key="application_parameters"/>
</operator>
<connect from_op="Golf" from_port="output" to_op="Naive Bayes" to_port="training set"/>
<connect from_op="Golf-Testset" from_port="output" to_op="Apply Model" to_port="unlabelled data"/>
<connect from_op="Naive Bayes" from_port="model" to_op="Report" to_port="reportable in"/>
<connect from_op="Report" from_port="reportable out" to_op="Apply Model" to_port="model"/>
<connect from_op="Apply Model" from_port="labelled data" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="90"/>
<portSpacing port="sink_result 2" spacing="18"/>
</process>
</operator>
</process>
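
To make clear what the exported table contains, here is a minimal Python sketch that rebuilds a distribution table for the toy data from the question and serialises it as CSV. It is not RapidMiner code. It assumes colA is the label and colB is treated as a nominal attribute, and it ignores the Laplace correction that RapidMiner's Naive Bayes can apply. The function names `distribution_table` and `to_csv` are made up for this illustration.

```python
from collections import Counter, defaultdict
import csv
import io

# Toy data from the question; treating colA as the label is an assumption.
rows = [("a", 1), ("a", 2), ("b", 3), ("b", 2)]


def distribution_table(rows):
    """Return {label: {value: P(value | label)}} from (label, value) pairs."""
    counts = defaultdict(Counter)
    for label, value in rows:
        counts[label][value] += 1
    return {
        label: {value: n / sum(c.values()) for value, n in c.items()}
        for label, c in counts.items()
    }


def to_csv(table):
    """Serialise the table as CSV text, one row per (label, value) pair."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["colA", "colB", "probability"])
    for label in sorted(table):
        for value in sorted(table[label]):
            writer.writerow([label, value, table[label][value]])
    return buf.getvalue()


table = distribution_table(rows)
print(to_csv(table))
```

With the four rows above, each colB value that occurs for a given colA value gets probability 0.5, which matches the P(2) = 0.5 mentioned in the question.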

Related

Rapidminer count total occurrence and sort by date

I have a RapidMiner example set like this:
ID Issue Exp
100 9/8/2020 11/8/2020
100 8/5/2019 9/5/2019
101 6/3/2020 10/1/2020
102 8/15/2020 12/12/2020
I want to add a new column that counts the occurrences of each ID cumulatively, ordered by the earliest issue date, so that I know how many occurrences there were as of each date.
The output should look like this:
ID Issue Exp Count
100 8/5/2019 9/5/2019 1
100 9/8/2020 11/8/2020 2
101 6/3/2020 10/1/2020 1
102 8/15/2020 12/12/2020 1
But when I aggregate by ID and count, I get the total per ID, repeated on every row of that ID. So for ID 100 it shows 2 both times.
What I want is a running count: ID 100 has only one issue date in 2019, so the count there is 1; when ID 100 appears again in 2020, the count is 2. Sorting by date matters because it puts the occurrences of each ID in the correct order.
Any help is appreciated.
Thanks.
One approach is to use the Loop Values operator to loop through all the possible values of the ID attribute, use each value to filter the example set (which has already been sorted), generate a new incrementing id within the filtered set, and finally append all the filtered examples back together.
Here's the process and corresponding XML to do this.
<?xml version="1.0" encoding="UTF-8"?><process version="9.9.000">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="9.9.000" expanded="true" name="Process">
<parameter key="logverbosity" value="init"/>
<parameter key="random_seed" value="2001"/>
<parameter key="send_mail" value="never"/>
<parameter key="notification_email" value=""/>
<parameter key="process_duration_for_mail" value="30"/>
<parameter key="encoding" value="SYSTEM"/>
<process expanded="true">
<operator activated="true" class="retrieve" compatibility="9.9.000" expanded="true" height="68" name="Retrieve occById" width="90" x="45" y="34">
<parameter key="repository_entry" value="//Local Repository/data/occById"/>
</operator>
<operator activated="true" class="blending:sort" compatibility="9.9.000" expanded="true" height="82" name="Sort" width="90" x="179" y="34">
<list key="sort_by">
<parameter key="ID" value="ascending"/>
<parameter key="Issue" value="ascending"/>
</list>
</operator>
<operator activated="true" class="concurrency:loop_values" compatibility="9.9.000" expanded="true" height="82" name="Loop Values" width="90" x="313" y="34">
<parameter key="attribute" value="ID"/>
<parameter key="iteration_macro" value="loop_value"/>
<parameter key="reuse_results" value="false"/>
<parameter key="enable_parallel_execution" value="true"/>
<process expanded="true">
<operator activated="true" class="filter_examples" compatibility="9.9.000" expanded="true" height="103" name="Filter Examples" width="90" x="112" y="34">
<parameter key="parameter_string" value="ID=%{loop_value}"/>
<parameter key="parameter_expression" value=""/>
<parameter key="condition_class" value="attribute_value_filter"/>
<parameter key="invert_filter" value="false"/>
<list key="filters_list">
<parameter key="filters_entry_key" value="ID.eq.%{loop_value}"/>
</list>
<parameter key="filters_logic_and" value="true"/>
<parameter key="filters_check_metadata" value="true"/>
</operator>
<operator activated="true" class="generate_id" compatibility="9.9.000" expanded="true" height="82" name="Generate ID" width="90" x="313" y="34">
<parameter key="create_nominal_ids" value="false"/>
<parameter key="offset" value="0"/>
</operator>
<connect from_port="input 1" to_op="Filter Examples" to_port="example set input"/>
<connect from_op="Filter Examples" from_port="example set output" to_op="Generate ID" to_port="example set input"/>
<connect from_op="Generate ID" from_port="example set output" to_port="output 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="source_input 2" spacing="0"/>
<portSpacing port="sink_output 1" spacing="0"/>
<portSpacing port="sink_output 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="append" compatibility="9.9.000" expanded="true" height="82" name="Append" width="90" x="447" y="34">
<parameter key="datamanagement" value="double_array"/>
<parameter key="data_management" value="auto"/>
<parameter key="merge_type" value="all"/>
</operator>
<connect from_op="Retrieve occById" from_port="output" to_op="Sort" to_port="example set input"/>
<connect from_op="Sort" from_port="example set output" to_op="Loop Values" to_port="input 1"/>
<connect from_op="Loop Values" from_port="output 1" to_op="Append" to_port="example set 1"/>
<connect from_op="Append" from_port="merged set" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
The input data was handcrafted and stored under the name occById in the local repository.
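
The sort, filter per ID, number, and append steps described above amount to a running count per ID. The following Python sketch shows the same logic outside RapidMiner. It assumes the dates are in month/day/year format (suggested by 8/15/2020), and the helper names `parse_date` and `running_count` are made up for this illustration.

```python
from collections import defaultdict
from datetime import datetime

# Rows from the question: (ID, Issue, Exp).
rows = [
    (100, "9/8/2020", "11/8/2020"),
    (100, "8/5/2019", "9/5/2019"),
    (101, "6/3/2020", "10/1/2020"),
    (102, "8/15/2020", "12/12/2020"),
]


def parse_date(text):
    """Parse a date string, assuming month/day/year order."""
    return datetime.strptime(text, "%m/%d/%Y")


def running_count(rows):
    """Sort by ID and issue date, then number the rows within each ID."""
    ordered = sorted(rows, key=lambda r: (r[0], parse_date(r[1])))
    seen = defaultdict(int)
    result = []
    for id_, issue, exp in ordered:
        seen[id_] += 1  # nth occurrence of this ID so far
        result.append((id_, issue, exp, seen[id_]))
    return result


for row in running_count(rows):
    print(row)
```

The last element of each tuple corresponds to the Count column requested in the question.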

How to store association rules from RapidMiner into MySQL table

I need to export FP-Growth association rules from RapidMiner to a MySQL database.
The table contains these columns: premises, conclusion, support, and confidence.
Which operator should I use?
You can use the "Association Rules to ExampleSet" operator from the Converters extension, available at the RapidMiner Marketplace. The relevant attributes from the resulting example set can then easily be stored in a database.
See the sample process below for an example.
<?xml version="1.0" encoding="UTF-8"?><process version="9.0.002">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="6.0.002" expanded="true" name="Process" origin="GENERATED_TUTORIAL">
<process expanded="true">
<operator activated="true" class="retrieve" compatibility="9.0.002" expanded="true" height="68" name="Iris" origin="GENERATED_TUTORIAL" width="90" x="45" y="120">
<parameter key="repository_entry" value="//Samples/data/Iris"/>
</operator>
<operator activated="true" class="discretize_by_frequency" compatibility="7.1.001" expanded="true" height="103" name="Discretize by Frequency" origin="GENERATED_TUTORIAL" width="90" x="179" y="120">
<parameter key="number_of_bins" value="5"/>
<parameter key="range_name_type" value="short"/>
</operator>
<operator activated="true" class="nominal_to_binominal" compatibility="7.1.001" expanded="true" height="103" name="Nominal to Binominal" origin="GENERATED_TUTORIAL" width="90" x="313" y="120">
<parameter key="transform_binominal" value="true"/>
<parameter key="use_underscore_in_name" value="true"/>
</operator>
<operator activated="true" class="concurrency:fp_growth" compatibility="9.0.002" expanded="true" height="82" name="FPGrowth" origin="GENERATED_TUTORIAL" width="90" x="447" y="120">
<parameter key="min_support" value="0.1"/>
<parameter key="find_min_number_of_itemsets" value="false"/>
<parameter key="min_number_of_itemsets" value="1"/>
<enumeration key="must_contain_list"/>
</operator>
<operator activated="true" class="create_association_rules" compatibility="9.0.002" expanded="true" height="82" name="Create Association Rules" origin="GENERATED_TUTORIAL" width="90" x="581" y="120"/>
<operator activated="true" class="converters:rules_2_example_set" compatibility="0.4.000" expanded="true" height="82" name="Association Rules to ExampleSet" width="90" x="782" y="136"/>
<connect from_op="Iris" from_port="output" to_op="Discretize by Frequency" to_port="example set input"/>
<connect from_op="Discretize by Frequency" from_port="example set output" to_op="Nominal to Binominal" to_port="example set input"/>
<connect from_op="Nominal to Binominal" from_port="example set output" to_op="FPGrowth" to_port="example set"/>
<connect from_op="FPGrowth" from_port="frequent sets" to_op="Create Association Rules" to_port="item sets"/>
<connect from_op="Create Association Rules" from_port="rules" to_op="Association Rules to ExampleSet" to_port="rules input"/>
<connect from_op="Association Rules to ExampleSet" from_port="example set" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="90"/>
<portSpacing port="sink_result 2" spacing="18"/>
</process>
</operator>
</process>
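
Once the rules are rows in an example set, storing them is an ordinary table insert. As a rough illustration of the target table layout, here is a Python sketch that uses the standard-library `sqlite3` module as a stand-in for MySQL (an assumption made so the example runs without a server). The two rules are invented placeholder values, not output of the process above, and `store_rules` is a hypothetical helper.

```python
import sqlite3

# Placeholder rules, invented for illustration only:
# (premises, conclusion, support, confidence)
rules = [
    ("bread", "butter", 0.30, 0.75),
    ("bread, jam", "butter", 0.10, 0.90),
]


def store_rules(conn, rules):
    """Create the rules table if needed and insert one row per rule."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS rules ("
        "premises TEXT, conclusion TEXT, support REAL, confidence REAL)"
    )
    conn.executemany("INSERT INTO rules VALUES (?, ?, ?, ?)", rules)
    conn.commit()


# An in-memory SQLite database stands in for the MySQL server here.
conn = sqlite3.connect(":memory:")
store_rules(conn, rules)
stored = conn.execute(
    "SELECT premises, conclusion, support, confidence "
    "FROM rules ORDER BY confidence DESC"
).fetchall()
print(stored)
```

A real MySQL setup would use a MySQL driver with its own connection call and placeholder style; the column layout is the part that carries over.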

Calculate Percentage Values

My RapidMiner process produces the following results:
Row No. Count
1 9.0
2 11.0
3 32.0
If I want to calculate:
(9/32)*100 and
(11/32)*100
from this result set, how would I do it?
The solution is not quite straightforward, because RapidMiner normally treats examples (rows) as independent of each other.
What you can do is extract the required value as a macro and use it in the Generate Attributes operator.
See the sample process below for a solution to your particular problem. Just copy and paste the XML into the process window in RapidMiner.
Also feel free to ask further questions, or re-post this one, in the RapidMiner community forum.
<?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="7.6.001" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="generate_data_user_specification" compatibility="7.6.001" expanded="true" height="68" name="Generate Data by User Specification" width="90" x="112" y="85">
<list key="attribute_values">
<parameter key="Count" value="9"/>
</list>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="generate_data_user_specification" compatibility="7.6.001" expanded="true" height="68" name="Generate Data by User Specification (2)" width="90" x="112" y="187">
<list key="attribute_values">
<parameter key="Count" value="11"/>
</list>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="generate_data_user_specification" compatibility="7.6.001" expanded="true" height="68" name="Generate Data by User Specification (3)" width="90" x="112" y="340">
<list key="attribute_values">
<parameter key="Count" value="32"/>
</list>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="append" compatibility="7.6.001" expanded="true" height="124" name="Append" width="90" x="380" y="187"/>
<operator activated="true" class="extract_macro" compatibility="7.6.001" expanded="true" height="68" name="Extract Macro" width="90" x="581" y="187">
<parameter key="macro" value="divisor"/>
<parameter key="macro_type" value="data_value"/>
<parameter key="attribute_name" value="Count"/>
<parameter key="example_index" value="3"/>
<list key="additional_macros"/>
<description align="center" color="green" colored="true" width="126">Extracting the third value as a macro. It can then be called using the %{macro_name} syntax</description>
</operator>
<operator activated="true" class="generate_attributes" compatibility="7.6.001" expanded="true" height="82" name="Generate Attributes" width="90" x="782" y="187">
<list key="function_descriptions">
<parameter key="Percentage" value="5"/>
</list>
<description align="center" color="green" colored="true" width="126">Creating a new Attribute (column) with the desired calculation&lt;br&gt;&lt;br&gt;Check the final paragraph of the help text for the &quot;Generate Attributes&quot; Operator for a description of how to work with macros</description>
</operator>
<connect from_op="Generate Data by User Specification" from_port="output" to_op="Append" to_port="example set 1"/>
<connect from_op="Generate Data by User Specification (2)" from_port="output" to_op="Append" to_port="example set 2"/>
<connect from_op="Generate Data by User Specification (3)" from_port="output" to_op="Append" to_port="example set 3"/>
<connect from_op="Append" from_port="merged set" to_op="Extract Macro" to_port="example set"/>
<connect from_op="Extract Macro" from_port="example set" to_op="Generate Attributes" to_port="example set input"/>
<connect from_op="Generate Attributes" from_port="example set output" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<description align="center" color="yellow" colored="false" height="581" resized="true" width="444" x="56" y="18">Generating sample data to fit the original problem</description>
</process>
</operator>
</process>
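
The idea of extracting one value and dividing every row by it can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not RapidMiner expression syntax. Note that the Extract Macro operator above uses example index 3, while Python lists are zero-based, so the same row is index 2 here. The function name `percentages` is made up for this sketch.

```python
# Count column from the question.
counts = [9.0, 11.0, 32.0]


def percentages(counts, divisor_index):
    """Divide every count by the value at divisor_index and scale to percent."""
    divisor = counts[divisor_index]  # plays the role of the extracted macro
    return [count / divisor * 100 for count in counts]


result = percentages(counts, 2)
print(result)  # [28.125, 34.375, 100.0]
```

The first two values are the (9/32)*100 and (11/32)*100 asked for in the question.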

Creating a pareto chart in RapidMiner

I am not able to plot a simple Pareto chart.
When I try to create one, I get a blank plot, and I cannot select a value for "Count Value".
What am I missing here?
My sample data is generated by the following XML process:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.015">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.3.015" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="generate_data_user_specification" compatibility="5.3.015" expanded="true" height="60" name="Generate Data by User Specification" width="90" x="447" y="75">
<list key="attribute_values">
<parameter key="category" value="&quot;black&quot;"/>
<parameter key="Incidents" value="10"/>
</list>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="generate_data_user_specification" compatibility="5.3.015" expanded="true" height="60" name="Generate Data by User Specification (2)" width="90" x="447" y="390">
<list key="attribute_values">
<parameter key="category" value="&quot;blue&quot;"/>
<parameter key="Incidents" value="2"/>
</list>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="generate_data_user_specification" compatibility="5.3.015" expanded="true" height="60" name="Generate Data by User Specification (3)" width="90" x="447" y="210">
<list key="attribute_values">
<parameter key="category" value="&quot;green&quot;"/>
<parameter key="Incidents" value="7"/>
</list>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="generate_data_user_specification" compatibility="5.3.015" expanded="true" height="60" name="Generate Data by User Specification (4)" width="90" x="447" y="165">
<list key="attribute_values">
<parameter key="category" value="&quot;white&quot;"/>
<parameter key="Incidents" value="8"/>
</list>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="generate_data_user_specification" compatibility="5.3.015" expanded="true" height="60" name="Generate Data by User Specification (5)" width="90" x="447" y="300">
<list key="attribute_values">
<parameter key="category" value="&quot;red&quot;"/>
<parameter key="Incidents" value="2"/>
</list>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="generate_data_user_specification" compatibility="5.3.015" expanded="true" height="60" name="Generate Data by User Specification (6)" width="90" x="447" y="480">
<list key="attribute_values">
<parameter key="category" value="&quot;Yellow&quot;"/>
<parameter key="Incidents" value="1"/>
</list>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="generate_data_user_specification" compatibility="5.3.015" expanded="true" height="60" name="Generate Data by User Specification (7)" width="90" x="447" y="705">
<list key="attribute_values">
<parameter key="category" value="&quot;Gray&quot;"/>
<parameter key="Incidents" value="1"/>
</list>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="generate_data_user_specification" compatibility="5.3.015" expanded="true" height="60" name="Generate Data by User Specification (8)" width="90" x="447" y="840">
<list key="attribute_values">
<parameter key="category" value="&quot;Navy&quot;"/>
<parameter key="Incidents" value="1"/>
</list>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="generate_data_user_specification" compatibility="5.3.015" expanded="true" height="60" name="Generate Data by User Specification (9)" width="90" x="447" y="570">
<list key="attribute_values">
<parameter key="category" value="&quot;Purple&quot;"/>
<parameter key="Incidents" value="1"/>
</list>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="append" compatibility="5.3.015" expanded="true" height="220" name="Append" width="90" x="715" y="120"/>
<connect from_op="Generate Data by User Specification" from_port="output" to_op="Append" to_port="example set 1"/>
<connect from_op="Generate Data by User Specification (2)" from_port="output" to_op="Append" to_port="example set 5"/>
<connect from_op="Generate Data by User Specification (3)" from_port="output" to_op="Append" to_port="example set 4"/>
<connect from_op="Generate Data by User Specification (4)" from_port="output" to_op="Append" to_port="example set 2"/>
<connect from_op="Generate Data by User Specification (5)" from_port="output" to_op="Append" to_port="example set 3"/>
<connect from_op="Generate Data by User Specification (6)" from_port="output" to_op="Append" to_port="example set 6"/>
<connect from_op="Generate Data by User Specification (7)" from_port="output" to_op="Append" to_port="example set 9"/>
<connect from_op="Generate Data by User Specification (8)" from_port="output" to_op="Append" to_port="example set 7"/>
<connect from_op="Generate Data by User Specification (9)" from_port="output" to_op="Append" to_port="example set 8"/>
<connect from_op="Append" from_port="merged set" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
I found a workaround (thanks to Andrew), but it only works for this example set.
I had to "de-aggregate" the data and add a new polynominal attribute with the same value for every example.
Then I could create a Pareto chart, group by 'category', and set the count column to the new attribute.
That gave me a Pareto chart for the example set. I then did the same with my own dataset.
I guess that, since the Pareto chart cannot be configured, it works poorly when the group-by attribute has many different values.
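
To show what the workaround does to the data, here is a small Python sketch of the de-aggregation step and of the numbers a Pareto chart is built from. It reads "de-aggregate" as "one row per incident", which is an interpretation of the workaround and not something the post spells out. The constant marker `"x"` and the function names are made up for this illustration.

```python
from collections import Counter

# Aggregated sample data from the XML process above: category -> Incidents.
aggregated = {
    "black": 10, "blue": 2, "green": 7, "white": 8, "red": 2,
    "Yellow": 1, "Gray": 1, "Navy": 1, "Purple": 1,
}


def de_aggregate(aggregated):
    """Emit one row per incident, each carrying the same constant marker."""
    return [
        (category, "x")
        for category, incidents in aggregated.items()
        for _ in range(incidents)
    ]


def pareto(rows):
    """Return (category, count, cumulative percent), sorted by count descending."""
    counts = Counter(category for category, _ in rows)
    total = sum(counts.values())
    cumulative = 0
    table = []
    for category, count in counts.most_common():
        cumulative += count
        table.append((category, count, round(100 * cumulative / total, 1)))
    return table


rows = de_aggregate(aggregated)
for line in pareto(rows):
    print(line)
```

The bars of the chart correspond to the counts and the line to the cumulative percentage.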

RapidMiner: filtering out n-grams with n>3

I have about 2 million messages (data tables).
I'd like to filter out messages that contain a frequent X-gram with X > 3
(frequency as a percentage of all messages).
For example:
Message 1 = "1 2 3 4 5"
Message 2 = "1 2 3 4 6"
Message 3 = "1 2 3"
M1 and M2 both contain the 4-gram 1_2_3_4, so I want to exclude them; only M3 should remain in the result.
You can use the Text Processing extension to find n-grams, count how many are longer than three, and add that number to the example set to allow subsequent filtering. You can also retain the original data.
Here's an example that you could copy (note that you have to install the Text Processing extension from the RapidMiner Marketplace):
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="6.5.000">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="6.5.000" expanded="true" name="Process">
<parameter key="logverbosity" value="init"/>
<parameter key="random_seed" value="2001"/>
<parameter key="send_mail" value="never"/>
<parameter key="notification_email" value=""/>
<parameter key="process_duration_for_mail" value="30"/>
<parameter key="encoding" value="SYSTEM"/>
<parameter key="parallelize_main_process" value="false"/>
<process expanded="true">
<operator activated="true" class="generate_data_user_specification" compatibility="6.5.000" expanded="true" height="60" name="Generate Data by User Specification" width="90" x="45" y="75">
<list key="attribute_values">
<parameter key="message" value="&quot;1 2 3 4&quot;"/>
</list>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="generate_data_user_specification" compatibility="6.5.000" expanded="true" height="60" name="Generate Data by User Specification (2)" width="90" x="45" y="165">
<list key="attribute_values">
<parameter key="message" value="&quot;1 2 3 4 5&quot;"/>
</list>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="generate_data_user_specification" compatibility="6.5.000" expanded="true" height="60" name="Generate Data by User Specification (3)" width="90" x="45" y="255">
<list key="attribute_values">
<parameter key="message" value="&quot;1 2 3&quot;"/>
</list>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="append" compatibility="6.5.000" expanded="true" height="112" name="Append" width="90" x="246" y="75">
<parameter key="datamanagement" value="double_array"/>
<parameter key="merge_type" value="all"/>
</operator>
<operator activated="true" class="nominal_to_text" compatibility="6.5.000" expanded="true" height="76" name="Nominal to Text" width="90" x="380" y="75">
<parameter key="attribute_filter_type" value="all"/>
<parameter key="attribute" value=""/>
<parameter key="attributes" value=""/>
<parameter key="use_except_expression" value="false"/>
<parameter key="value_type" value="nominal"/>
<parameter key="use_value_type_exception" value="false"/>
<parameter key="except_value_type" value="file_path"/>
<parameter key="block_type" value="single_value"/>
<parameter key="use_block_type_exception" value="false"/>
<parameter key="except_block_type" value="single_value"/>
<parameter key="invert_selection" value="false"/>
<parameter key="include_special_attributes" value="false"/>
</operator>
<operator activated="true" class="text:process_document_from_data" compatibility="6.5.000" expanded="true" height="76" name="Process Documents from Data" width="90" x="514" y="75">
<parameter key="create_word_vector" value="false"/>
<parameter key="vector_creation" value="Term Occurrences"/>
<parameter key="add_meta_information" value="true"/>
<parameter key="keep_text" value="true"/>
<parameter key="prune_method" value="none"/>
<parameter key="prune_below_percent" value="3.0"/>
<parameter key="prune_above_percent" value="30.0"/>
<parameter key="prune_below_rank" value="0.05"/>
<parameter key="prune_above_rank" value="0.95"/>
<parameter key="datamanagement" value="double_sparse_array"/>
<parameter key="select_attributes_and_weights" value="false"/>
<list key="specify_weights"/>
<parameter key="parallelize_vector_creation" value="false"/>
<process expanded="true">
<operator activated="true" class="multiply" compatibility="6.5.000" expanded="true" height="94" name="Multiply" width="90" x="44" y="30"/>
<operator activated="true" class="text:tokenize" compatibility="6.5.000" expanded="true" height="60" name="Tokenize" width="90" x="179" y="30">
<parameter key="mode" value="regular expression"/>
<parameter key="characters" value=".:"/>
<parameter key="expression" value="\s"/>
<parameter key="language" value="English"/>
<parameter key="max_token_length" value="3"/>
</operator>
<operator activated="true" class="text:generate_n_grams_terms" compatibility="6.5.000" expanded="true" height="60" name="Generate n-Grams (Terms)" width="90" x="179" y="120">
<parameter key="max_length" value="4"/>
</operator>
<operator activated="true" class="text:filter_tokens_by_content" compatibility="6.5.000" expanded="true" height="60" name="Filter Tokens (by Content)" width="90" x="179" y="210">
<parameter key="condition" value="contains match"/>
<parameter key="regular_expression" value="(_.){3,}"/>
<parameter key="case_sensitive" value="false"/>
<parameter key="invert condition" value="false"/>
</operator>
<operator activated="true" class="text:extract_token_number" compatibility="6.5.000" expanded="true" height="60" name="Extract Token Number" width="90" x="179" y="300">
<parameter key="metadata_key" value="numberOfNGramsGT3"/>
<parameter key="condition" value="all"/>
<parameter key="case_sensitive" value="false"/>
<parameter key="invert_condition" value="false"/>
</operator>
<connect from_port="document" to_op="Multiply" to_port="input"/>
<connect from_op="Multiply" from_port="output 1" to_op="Tokenize" to_port="document"/>
<connect from_op="Multiply" from_port="output 2" to_port="document 1"/>
<connect from_op="Tokenize" from_port="document" to_op="Generate n-Grams (Terms)" to_port="document"/>
<connect from_op="Generate n-Grams (Terms)" from_port="document" to_op="Filter Tokens (by Content)" to_port="document"/>
<connect from_op="Filter Tokens (by Content)" from_port="document" to_op="Extract Token Number" to_port="document"/>
<portSpacing port="source_document" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="filter_examples" compatibility="6.5.000" expanded="true" height="94" name="Filter Examples" width="90" x="782" y="75">
<parameter key="parameter_expression" value=""/>
<parameter key="condition_class" value="custom_filters"/>
<parameter key="invert_filter" value="false"/>
<list key="filters_list">
<parameter key="filters_entry_key" value="numberOfNGramsGT3.eq.0"/>
</list>
<parameter key="filters_logic_and" value="true"/>
<parameter key="filters_check_metadata" value="true"/>
</operator>
<connect from_op="Generate Data by User Specification" from_port="output" to_op="Append" to_port="example set 1"/>
<connect from_op="Generate Data by User Specification (2)" from_port="output" to_op="Append" to_port="example set 2"/>
<connect from_op="Generate Data by User Specification (3)" from_port="output" to_op="Append" to_port="example set 3"/>
<connect from_op="Append" from_port="merged set" to_op="Nominal to Text" to_port="example set input"/>
<connect from_op="Nominal to Text" from_port="example set output" to_op="Process Documents from Data" to_port="example set"/>
<connect from_op="Process Documents from Data" from_port="example set" to_op="Filter Examples" to_port="example set input"/>
<connect from_op="Filter Examples" from_port="example set output" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
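
The process above drops every message that contains any n-gram longer than three terms. The question also asks for a frequency threshold, so here is a hedged Python sketch of that variant, run on the three messages from the question. It only looks at 4-grams: any longer n-gram that is frequent contains a 4-gram that occurs in at least as many messages, so checking 4-grams is sufficient. The threshold of 0.5 and the function names are assumptions made for this illustration.

```python
from collections import Counter

# Messages from the question.
messages = ["1 2 3 4 5", "1 2 3 4 6", "1 2 3"]


def four_grams(message):
    """Return the set of 4-grams of whitespace-separated tokens."""
    tokens = message.split()
    return {"_".join(tokens[i:i + 4]) for i in range(len(tokens) - 3)}


def filter_messages(messages, min_fraction):
    """Keep messages that contain no 4-gram found in >= min_fraction of messages."""
    doc_freq = Counter()
    for message in messages:
        doc_freq.update(four_grams(message))  # count each 4-gram once per message
    threshold = min_fraction * len(messages)
    frequent = {gram for gram, n in doc_freq.items() if n >= threshold}
    return [m for m in messages if not (four_grams(m) & frequent)]


kept = filter_messages(messages, 0.5)
print(kept)  # ['1 2 3']
```

Here 1_2_3_4 occurs in two of the three messages, so M1 and M2 are removed and only M3 remains.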