Creating new attribute - rapidminer

In RapidMiner I have a data set with an attribute called address, which contains a property address. I need to create a new attribute that holds only the last three words of each address. For example, for "231 new road County Dublin Ireland" the new attribute should contain "County Dublin Ireland". Could anybody help me with this process? I am very new to RapidMiner. I tried the Generate Attributes operator with its function expressions, but had no success.

There might be an easier way to achieve that, but you can use the Execute Script operator and some regular expressions. This example script will replace the values of attribute "att1" with only the last three words:
import java.util.regex.*
exampleSet = operator.getInput(ExampleSet.class)
Pattern p = Pattern.compile("^.*?(\\S+\\s\\S+\\s\\S+)\$")
for(Example example : exampleSet){
    value = example["att1"]
    print(value)
    Matcher m = p.matcher(value)
    if(m.matches()){
        example["att1"] = m.group(1)
    }
}
return exampleSet
Edit:
There really is a much easier way: use the Generate Extract operator with the regular expression (\S+\s\S+\s\S+)$. You may need to adapt the regular expression to your data.
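To see what that regular expression captures before wiring it into RapidMiner, it can be tried on a sample address. The following is only a sketch in plain JavaScript, not RapidMiner code, and lastThreeWords is a hypothetical helper name:

```javascript
// Hypothetical helper: returns the last three whitespace-separated words,
// using the same pattern as in the answer: (\S+\s\S+\s\S+)$
function lastThreeWords(address) {
  const match = /(\S+\s\S+\s\S+)$/.exec(address.trim());
  // Fall back to the unchanged input when there are fewer than three words
  // (or when words are separated by more than one whitespace character).
  return match ? match[1] : address;
}

console.log(lastThreeWords('231 new road County Dublin Ireland'));
```

Because the pattern uses a single \s between words, addresses with double spaces would not match; replacing \s with \s+ is one way to adapt it.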

Change currency format in Apps Script

I have a Sheet with many euro values that have three digits after the decimal point (for example, 2,154 €). I would like to convert this document to PDF and attach it to an email.
When I copy it to a temporary Sheet, the value changes and I get 2.154 instead. I would like to change the format of this cell.
So I applied setNumberFormat (.setNumberFormat('#,###.000 [$€]')) to this value, but I don't get the result I want. I obtain 2.154 €, but I would like "," rather than "." to separate the integer part from the decimals. I tried changing the format to .setNumberFormat('#,###,000 [$€]'), but then my result is 2.154000 €.
I don't want to call toString and then replace, because I think it should be possible to get what I want with setNumberFormat itself.
Can anyone help me with that, please? I haven't included my code because it is long and, apart from the setNumberFormat call, not relevant, but I can edit my post if you need it. Sorry for my English; I don't speak or write it very well.
Dots and commas have special meanings in this mask-like parameter.
The numberFormat parameter of setNumberFormat() is documented here.
According to the documentation, a dot marks the position of the decimal separator in the mask and a comma marks the position of the thousands separator.
The symbol actually displayed for the decimal separator, however, is determined by the spreadsheet's locale setting.
You can change that setting in the UI via File > Settings > General > Locale, or from Apps Script with SpreadsheetApp.getActive().setSpreadsheetLocale('XXXXX').
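Putting the two steps together might look like the sketch below. applyEuroFormat is a hypothetical helper, the range 'B2:B10' and the locale code 'fr_FR' are assumptions chosen for illustration, and the mask is the one from the question. The helper takes the spreadsheet as a parameter so it can be exercised outside Apps Script with a stand-in object.

```javascript
// Hypothetical helper. `spreadsheet` is expected to be an Apps Script
// Spreadsheet object, e.g. the result of SpreadsheetApp.getActive().
function applyEuroFormat(spreadsheet, a1Notation, locale) {
  // The locale decides which symbol is displayed for the decimal separator.
  spreadsheet.setSpreadsheetLocale(locale);
  // In the mask itself the dot still marks where the decimals start.
  spreadsheet.getRange(a1Notation).setNumberFormat('#,###.000 [$€]');
}

// Inside Apps Script (assumed range and locale code):
// applyEuroFormat(SpreadsheetApp.getActive(), 'B2:B10', 'fr_FR');
```

Note that changing the locale affects the whole spreadsheet, so doing this on the temporary copy rather than the original is probably the safer choice.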

Jora: property names with dots

I am using the Chrome plugin JsonDiscovery, which uses Jora to query JSON. When I query the MS DevOps API, it returns fields with dots in their names, and I couldn't write the query because Jora interprets a dot as a step to the next field in the hierarchy.
{
Microsoft.VSTS.Common.ValueArea: "Business",
Microsoft.VSTS.Scheduling.Effort: 40,
Microsoft.VSTS.Scheduling.StartDate: "2021-01-11T03:00:00Z",
}
Does someone know how I can write the query with those dots in the name?
When a property name contains characters that are not allowed in an identifier, use the same approach as in JavaScript, i.e. $['Microsoft.VSTS.Common.ValueArea'].
In Jora, foo['bar'] is the same as foo.bar. The bracket form accepts any characters in the property name, while the dot form is faster to type and easier to read.
After debugging Jora many times, I found the pick method.
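Since the answer points to JavaScript for the approach, the difference can be sketched in plain JavaScript (not Jora itself). The object below is a cut-down, hypothetical version of the API response from the question:

```javascript
// Cut-down sample of the response; keys are quoted because they contain dots.
const workItemFields = {
  'Microsoft.VSTS.Common.ValueArea': 'Business',
  'Microsoft.VSTS.Scheduling.Effort': 40,
};

// Bracket notation treats the whole string as one property name.
const viaBrackets = workItemFields['Microsoft.VSTS.Common.ValueArea'];

// Dot notation walks a hierarchy: it looks for a property named "Microsoft",
// which does not exist here, so the chain yields undefined.
const viaDots = workItemFields.Microsoft?.VSTS?.Common?.ValueArea;

console.log(viaBrackets, viaDots);
```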
..pick("Microsoft.VSTS.Common.ValueArea")

Google Apps Script findtext searchpattern format?

I am trying to pass in simple regex strings like
findText("/a/");
or
findText(/a/);
but it does not find anything. If I pass in only the text, it works, like this:
findText("a");
How to pass regex strings in there?
It's not super clear in the documentation of findText, but the documentation for replaceText is more clear:
The search pattern is passed as a string, not a JavaScript regular expression object.
The example in the replaceText documentation shows that your third form is the correct one (the search for a is passed as just the string "a"):
body.replaceText("^.*Apps ?Script.*$", "Apps Script");
Obviously, String.search() will work here as well, but if you want to manipulate the attributes of the text rather than just the string contents, the built-in JavaScript function might leave you hanging.
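Why the "/a/" form finds nothing can be illustrated in plain JavaScript: when a pattern is supplied as a string, the slashes are no longer delimiters but literal characters to match. This sketch uses JavaScript's RegExp constructor as a stand-in for the engine behind findText, which is an assumption made only for illustration:

```javascript
const text = '1212a1212';

// Pattern given as the string "/a/": the slashes are part of the pattern,
// so it only matches text that literally contains "/a/".
const slashesFound = new RegExp('/a/').test(text);

// Pattern given as the string "a": matches the letter a.
const bareFound = new RegExp('a').test(text);

console.log(slashesFound, bareFound);
```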
Use the String.search() method.
function test(){
var testString = "1212a1212";
var results = testString.search(/a/);
Logger.log(results); //results = 4;
}
It is possible to use regular expressions with the findText(searchPattern) function; however, the expression needs to be in RE2 syntax.
For example if you wanted to do a case insensitive search for the word 'antevasin' you could specify
let searchResult = DocumentApp.getActiveDocument().getBody().editAsText().findText( '(?i)antevasin' );
where (?i) turns on case-insensitive matching and would find 'Antevasin' in the document.
This page and this one have some examples and more detail.

Formatting String Value as Monetary Value in Google Apps Script

I have created a Google Form that has a field where a numeric value is entered by the user (with numeric validation on the form) and is then submitted. When the number (e.g., 34.00) gets submitted, it appears as 34 in the Google spreadsheet, which is annoying but understandable. I already have a script that runs when the form is submitted to generate a nicely-formatted version of the information that was submitted on the form, but I'm having trouble formatting that value as a monetary value (i.e., 34 --> $34.00) using the Utilities.formatString function. Can anyone help? Thanks in advance.
The values property of a form submission event as documented in Event Objects is an array of Strings, like this:
['2015/05/04 15:00', 'amin@example.com', 'Bob', '27', 'Bill', '28', 'Susan', '25']
As a result, a script that wishes to use any of these values as anything but a String will need to do explicit type conversion, or coerce the string to a number using the unary plus operator (+).
var numericSalary = +e.values[9];
Alternatively, you could take advantage of the built-in type determination of Sheets, by reading the submitted value from the range property also included in the event. Just as it does when you type in values at the keyboard, Sheets does its best to interpret the form values - in this case, the value in column J will have been interpreted as a Number, so you could get it like this:
var numericSalary = e.range.getValues()[0][9]; // getValues() returns a 2D array: [row][column]
That will be slower than using the values array, and it will still provide an unformatted value.
Formatting
Utilities.formatString uses "sprintf-like" formatting values. If you search the web, you'll find lots of references for sprintf format specifiers, some of which are helpful. Here's a format that will turn a floating-point number into a dollar-formatted string:
'$%.2f'
$ - nothing magic, just a dollar sign
% - magic begins here, the start of a format
.2 - defines a number with two decimal places, but unspecified digits before the radix
f - expect a floating point number
So this is your simplest line of code that will do the conversion you're looking for:
var currentSalary = Utilities.formatString( '$%.2f', +e.values[9] );
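The same conversion can be tried outside Apps Script with standard JavaScript, which is handy for checking the expected strings. This is only a sketch: toDollars and toDollarsWithCommas are hypothetical helper names, and toFixed is used here in place of Utilities.formatString.

```javascript
// Hypothetical helper: coerce a form value (a string) to a number and
// format it with a dollar sign and two decimal places.
function toDollars(rawValue) {
  const amount = Number(rawValue);
  if (Number.isNaN(amount)) {
    throw new TypeError('Not a numeric value: ' + rawValue);
  }
  return '$' + amount.toFixed(2);
}

// Hypothetical variant that also inserts thousands separators.
// It must be called on a number; a string's toLocaleString() does nothing.
function toDollarsWithCommas(rawValue) {
  return Number(rawValue).toLocaleString('en-US', {
    style: 'currency',
    currency: 'USD',
  });
}

console.log(toDollars('34'));
console.log(toDollarsWithCommas('1234.5'));
```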
To format the value as a currency string, convert it to a number first and then call toLocaleString:
var myString = Number(myString).toLocaleString('en-US', { style: 'currency', currency: 'USD' });
If you only care about the $ sign and don't need commas, the code below (also shown in one of the answers above) will suffice.
var myString = Utilities.formatString( '$%.2f', myString );
In my experience, toLocaleString sometimes behaves strangely in Apps Script compared with JavaScript elsewhere.
I know this was years ago, but it might help someone else.
You should be able to add the commas with .toLocaleString(). Call it on the number rather than on the already formatted string (a string's toLocaleString() returns the string unchanged), then add the '$' sign at the beginning by concatenating.
Ex:
myString = '$' + myNumber.toLocaleString('en-US', { minimumFractionDigits: 2, maximumFractionDigits: 2 });

Parsing and formatting search results

Search:
Scripting+Language Web+Pages Applications
Results:
...scripting language originally...producing dynamic web pages. It has...graphical applications....purpose scripting language that is...d creating web pages as output...
Suppose I want one value that sets the number of characters to allow as padding on either side of the matched terms, and another value that sets how many matches will be shown in the result (i.e., I want to see only the first 5 matches, nothing more).
How exactly would you go about doing this?
This is pretty language-agnostic, but I will be implementing the solution in a PHP environment, so please restrict answers to options that do not require a specific language or framework.
Here's my thought process: create an array from the search words. Determine which search word has the lowest index regarding where it's found in the article-body. Gather that portion of the body into another variable, and then remove that section from the article-body. Return to step 1. You might even add a counter to each word, skipping it when the counter reaches 3 or so.
Important:
The solution must match all search terms in a non-linear fashion. Meaning, term one should be found after term two if it exists after term two. Likewise, it should be found after term 3 as well. Term 3 should be found before term 1 and 2, if it happens to exist before them.
The solution should allow me to declare "Only allow up to three matches for each term, then terminate the summary."
Extra Credit:
Get the padding-variable to optionally pad words, rather than chars.
My thought process:
Create a results array that supports non-unique name/value pairs (PHP supports this in its standard array object)
Loop through each search term and find its character starting position in the search text
Add an item to the results array that stores this character position it has just found with the actual search term as the key
When you've found all the search terms, sort the array ascending by value (the character position of the search term)
Now, the search results will be in order that they were found in the search text
Loop through the results array and use the specified word padding to get words on each side of the search term while also keeping track of the word count in a separate name/value pair
Pseudocode, or my best attempt at it:
function string GetSearchExcerpt(searchText, searchTerms, wordPadding = 0, searchLimit = 3)
{
    results = new array()
    startIndex = 0
    foreach (searchTerm in searchTerms)
    {
        charIndex = searchText.FindByIndex(searchTerm, startIndex) // finds 1st position of searchTerm starting at startIndex
        results.Add(searchTerm, charIndex)
        startIndex = charIndex + 1
    }
    results = results.SortByValue()
    lastSearchTerm = ""
    searchTermCount = new array()
    outputText = ""
    foreach (searchTerm => charIndex in results)
    {
        searchTermCount[searchTerm]++
        if (searchTermCount[searchTerm] <= searchLimit)
        {
            // WordPadding is a simple function that moves left or right a given number of words starting at a specified character index and returns those words
            outputText += "..." + WordPadding(-wordPadding, charIndex) + "<strong>" + searchTerm + "</strong>" + WordPadding(wordPadding, charIndex)
        }
    }
    return outputText
}
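The steps above could be turned into runnable code roughly as follows. This is a sketch in JavaScript rather than PHP, with several assumptions of my own: the name collectExcerpts is hypothetical, padding is counted in characters (not words), matching is case-insensitive, every term is searched from the start of the text so that terms can be found in any order, and the text is not HTML-escaped.

```javascript
// Returns one snippet per match, ordered by position in the text.
// padding: characters of context kept on each side of a match.
// limit: maximum number of matches collected per search term.
function collectExcerpts(text, terms, padding = 10, limit = 3) {
  const haystack = text.toLowerCase();
  const hits = [];

  for (const term of terms) {
    const needle = term.toLowerCase();
    if (needle.length === 0) continue; // skip empty terms

    // Each term is searched from position 0, independently of the others.
    let from = 0;
    for (let count = 0; count < limit; count++) {
      const at = haystack.indexOf(needle, from);
      if (at === -1) break;
      hits.push({ at, length: needle.length });
      from = at + needle.length;
    }
  }

  // Order matches by where they occur, not by the order of the terms.
  hits.sort((a, b) => a.at - b.at);

  return hits.map(({ at, length }) => {
    const start = Math.max(0, at - padding);
    const end = Math.min(text.length, at + length + padding);
    return '...' + text.slice(start, at) +
      '<strong>' + text.slice(at, at + length) + '</strong>' +
      text.slice(at + length, end);
  });
}

const snippets = collectExcerpts(
  'PHP is a scripting language for web pages.',
  ['web pages', 'scripting language'],
  3
);
console.log(snippets.join(''));
```

Returning an array instead of one concatenated string leaves it to the caller to cap the total number of snippets shown, for example with snippets.slice(0, 5).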
Personally I would convert the search terms into Regular Expressions and then use a Regex Find-Replace to wrap the matches in strong tags for the formatting.
Most likely the RegEx route would be your best bet. So in your example, you would end up with three separate RegEx values.
Since you want a non-language dependent solution I will not put the actual expressions here as the exact syntax varies by language.