NA value in RapidMiner?

What is the equivalent of R's 'NA' | 'NULL' | 'None' value in RapidMiner? Or is there some other value for denoting that a value is missing and should not be taken into consideration when making calculations?

Missing values show up as '?' in the data view of example sets, and they are counted in the meta-data view. Imported data can obviously contain them, and you can generate your own using the Declare Missing Values operator, or within the Generate Attributes operator with an expression containing an invalid mathematical operation such as 0/0. You can test for the presence of a missing value by using the missing() function within the Generate Attributes operator.
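For example, a minimal Generate Attributes sketch (the attribute name att1 and the default value 0 are assumptions, not from the original question):
cleaned = if(att1 < 0, 0/0, att1)
filled = if(missing(att1), 0, att1)
The first expression turns an impossible value into a missing one via 0/0; the second tests with missing() and substitutes a default.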

It is probably the question mark: ?

Related

SSIS REPLACE Function - Derived Column

I have a variable with a value of '1617'. It is currently a DT_WSTR datatype. Sometimes I need it as a string, other times as an integer.
I am using a Derived Column to replace the ' characters so that I can cast this value as an integer.
My replace function is not working.
REPLACE([User::schoolYear],"'","")
What am I doing wrong?
The problem with your supplied expression is that you are not referencing the variable schoolYear correctly. Sometimes you can address a variable as @schoolYear, but the consistent, explicit syntax I would encourage is @[User::schoolYear]. That way, you can identify the namespace in case someone like me has used a custom namespace.
Your Derived Column expression then becomes
REPLACE(@[User::schoolYear],"'","")
You have a space before and after the single quote, which is causing the replace to fail.
Modify the expression as given below. I have tested it, and it works fine.
REPLACE(@[User::schoolYear],"'","")
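Since the stated goal is to cast the cleaned value to an integer, a hedged follow-up sketch (the DT_I4 target type is an assumption) is to wrap the same expression in a cast:
(DT_I4)REPLACE(@[User::schoolYear],"'","")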

Reading negative numbers in a column

I'm using SSIS to separate good data from unusable data. In order to do that I used Derived Columns, a Script Task, and a Conditional Split, where I assigned certain conditions. One of the conditions I need to apply is that none of the numbers in one column can be negative. I'm guessing that the best way to solve this is the Conditional Split, but I cannot get it to work. I'm new to SSIS, so any help would be appreciated.
You'd have an Expression like
[MyCaseSensitiveColumnName] < 0
and then name the output path something like BadData_NegativeValue
From the comments
that is what I did before, but I'm getting an error saying that the data types "DT_WSTR" and "DT_I4" are incompatible for binary operator ">"
That error message indicates that you are attempting to compare a unicode string (DT_WSTR) and an integer (DT_I4) and that the expression language does not allow it.
To resolve this type incompatibility, you would need to first convert the value of MyCaseSensitiveColumnName from DT_WSTR to an integer.
I'd likely add a Derived Column Component to my data flow and create a new column called MyCaseSensitiveColumnNameAsInteger with an expression like
(DT_I4) [MyCaseSensitiveColumnName]
Now, that may be perilous depending on the quality of your source data. I don't know why you are pulling numeric data in as a string. If there could be non-whole numbers in the data set, then we will need to check before making the cast. If there are NULLs in that dataset, those too may cause issues (see the guard sketch below).
That would result in our conditional split check becoming
[MyCaseSensitiveColumnNameAsInteger] < 0
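If NULLs really are present, a hedged variant of the Derived Column expression could guard the cast (the fallback value 0 is an assumption; pick whatever sentinel suits your data):
ISNULL([MyCaseSensitiveColumnName]) ? 0 : (DT_I4)[MyCaseSensitiveColumnName]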

Search JSON value in whereIn in Laravel?

$exams=[{"id":1}];
$toReturn['exams'] = exams_list::whereIn('examClasses', $exams)->get()->toArray();
How to do this?
You can do it as below:
$exam_id = 1;
$toReturn['exams'] = exams_list::where('examClasses->id', $exam_id)->get()->toArray();
// As per your question:
// $exams = [{"id":1}]; =======> wrong way (this is not valid PHP)
$exams = ["id" => 1];
$toReturn['exams'] = exams_list::where('examClasses->id', $exams["id"])->get()->toArray();
See the example given below; refer to https://dev.mysql.com/doc/refman/5.7/en/json-search-functions.html
column->path
In MySQL 5.7.9 and later, the -> operator serves as an alias for the JSON_EXTRACT() function when used with two arguments, a column identifier on the left and a JSON path on the right that is evaluated against the JSON document (the column value). You can use such expressions in place of column identifiers wherever they occur in SQL statements.
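For reference, a sketch of the raw MySQL this builds (assuming a table exams_list with a JSON column examClasses):
SELECT * FROM exams_list WHERE examClasses->'$.id' = 1;
-- equivalent to
SELECT * FROM exams_list WHERE JSON_EXTRACT(examClasses, '$.id') = 1;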

I want to extract the parameters of a URL in MySQL

I have in my database a column with the parameter values of a URL. I want, with an SQL query, to put those parameters in different columns. An example:
I currently have a column named parameters with, for instance, this value: pOrgNum=j11000&pLanguage=nl&source=homepage
Now I want three columns, pOrgNum | pLanguage | source, holding the values of my parameters.
The problem is that I don't know the order of my parameters or their lengths, so I can't use, for example, substring(parameters,9,6) to extract the parameter pOrgNum. Can someone help me please?
There's a MySQL UDF that you can use to do exactly this, which also handles decoding the params and handles most character encodings, etc.
https://github.com/StirlingMarketingGroup/mysql-get-url-param
Examples
select `get_url_param`('https://www.youtube.com/watch?v=KDszSrddGBc', 'v');
-- "KDszSrddGBc"
select `get_url_param`('watch?v=KDszSrddGBc', 'v');
-- "KDszSrddGBc"
select `get_url_param`('watch?v=KDszSrddGBc', 'x');
-- null
select `get_url_param`('https://www.google.com/search?q=cgo+uint32+to+pointer&rlz=1C1CHBF_enUS767US767&oq=cgo+uint32+to+pointer&aqs=chrome..69i57.12106j0j7&sourceid=chrome&ie=UTF-8', 'q');
-- "cgo uint32 to pointer"
select `get_url_param`('/search?q=Na%C3%AFvet%C3%A9&oq=Na%C3%AFvet%C3%A9', 'q');
-- "Naïveté"
Disclaimer, I am the author.
I achieved this by taking the part of the string to the right of the search parameter, then the part of that result to the left of the first &.
This handles:
if the parameter was the last in the url (so no "&" follows it)
if the parameter does not exist (returns blank)
varying lengths of the search string (provided you replace "utm_medium" everywhere)
This finds the value of "utm_medium" in a field named url:
IF(locate("utm_medium", url)=0, '', LEFT(RIGHT(url,length(url)-locate("utm_medium",url)-length("utm_medium")),IF(locate("&",RIGHT(url,length(url)-locate("utm_medium",url)-length("utm_medium")))=0,length(RIGHT(url,length(url)-locate("utm_medium",url)-length("utm_medium")+1)),locate("&",RIGHT(url,length(url)-locate("utm_medium",url)-length("utm_medium"))))-1)) utm_medium
To use, find and replace url with your field name, and utm_medium with your url parameter.
May be inefficient, but gets the job done, and couldn't find an easy answer elsewhere
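For a quick sanity check of the same right-then-left idea, here is a simplified hedged sketch (the literal query string is made up, and the LOCATE() = 0 guard from the full expression above is omitted; concatenating a trailing '&' avoids the last-parameter special case):
SELECT LEFT(
         RIGHT(url, LENGTH(url) - LOCATE('utm_medium=', url) - LENGTH('utm_medium=') + 1),
         LOCATE('&', CONCAT(RIGHT(url, LENGTH(url) - LOCATE('utm_medium=', url) - LENGTH('utm_medium=') + 1), '&')) - 1
       ) AS utm_medium
FROM (SELECT 'a=1&utm_medium=email&b=2' AS url) AS t;
-- returns 'email'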
This code works in MySQL (it returns everything after the last '='):
SELECT SUBSTRING_INDEX(URL_FIELD, '=', -1) FROM DemoTable;
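For pulling a specific named parameter with the same built-in, a common nested pattern is shown below (a sketch, assuming the parameters column from the question; note that it returns the whole string when the parameter name is absent):
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(parameters, 'pLanguage=', -1), '&', 1) AS pLanguage
FROM DemoTable;
-- 'nl' for pOrgNum=j11000&pLanguage=nl&source=homepage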

Why does SSIS TOKEN function fail to count adjacent column delimiters?

I ran into a problem with SQL Server Integration Services 2012's new string function in the Expression Editor called TOKEN().
This is supposed to help you parse a delimited record. If the record comes out of a flat file, you can do this with the Flat File Source. In this case, I am dealing with old delimited import records that were stored as strings in a database VARCHAR field. Now they need to be extracted, massaged, and re-exported as delimited strings. For example:
1^Apple^0001^01/01/2010^Anteater^A1
2^Banana^0002^03/15/2010^Bear^B2
3^Cranberry^0003^4/15/2010^Crow^C3
If these strings are in a column called OldImportRecord, the delimiter is a caret (as shown), and we wish to put the fifth field into a Derived Column, we would use an expression like:
TOKEN(OldImportRecord,"^",5)
This returns Anteater, Bear, Crow, etc. In fact, we can create Derived Columns for each of the fields in this record (note that the index is one-based), change them as needed, and then build another delimited record for export.
Here's the problem. What if some of our data includes some empty strings (or Nulls rendered as empty strings)?
4^^0004^6/15/2010^Duck^D4
TOKEN() fails to count the adjacent column delimiters, which throws off the column count: it now sees only five columns instead of six. Our TOKEN(OldImportRecord,"^",5) returns "D4" instead of the intended "Duck". When we extract the fourth column, we wind up trying to put "Duck" into a Date column, and all sorts of fun ensues.
Here's a partial workaround:
TOKEN(REPLACE(OldImportRecord,"^^","^ ^"),"^",5)
Notice this misses every second delimiter pair, so it will fail for a string like "5^^^^Emu^E5", which looks like "5^ ^^ ^Emu^E5" after the REPLACE(). The column count is still wrong.
So here's my full workaround. This includes two nested REPLACE() calls, an RTRIM() to remove the superfluous spaces, and a DT_STR cast because I would like to keep the result in VARCHAR:
(DT_STR,255,1252)RTRIM(TOKEN(REPLACE(REPLACE(OldImportRecord,"^^","^ ^"),"^^","^ ^"),"^",5))
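To see why the second REPLACE() is needed, here is a quick check in plain SQL (just to watch the string transform; the same REPLACE semantics apply in the SSIS expression):
SELECT REPLACE('5^^^^Emu^E5', '^^', '^ ^');
-- 5^ ^^ ^Emu^E5 (one ^^ pair still left)
SELECT REPLACE(REPLACE('5^^^^Emu^E5', '^^', '^ ^'), '^^', '^ ^');
-- 5^ ^ ^ ^Emu^E5 (every token position now non-empty)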
I am posting this for information, since others may also run into this problem.
Does anyone have a better workaround, or even a real solution?
Reason for the issue:
The TOKEN function in SSIS uses the implementation of the strtok function in C++. I gathered this information while reading the book Microsoft® SQL Server® 2012 Integration Services. It is mentioned as a note on page 113 (I like this book! Lots of nice information.).
I searched for the implementation of strtok function and I found the following links.
INFO: strtok(): C Function -- Documentation Supplement - The code sample in this link shows that the function does ignore consecutive delimiter characters.
The answers to the following SO questions point out that strtok function is designed to ignore consecutive delimiters.
Need to know when no data appears between two token separators using strtok()
strtok_s behaviour with consecutive delimiters
I think that the TOKEN and TOKENCOUNT functions are working as per design but whether that is how SSIS should behave might be a question for the Microsoft SSIS team.
Original post (the section above is an update):
I created a simple package in SSIS 2012 based on your data inputs. As you had described in your question, the TOKEN function does not behave as intended. I agree with you that the function doesn't seem to work. This post is not an answer to your original issue.
Here is an alternative way to write the expression in a relatively simpler fashion. It will only work if the last segment in your input record always has a value (say A1, B2, C3, etc.).
The expression can be rewritten as:
(DT_STR,50,1252)TOKEN(OldImportRecord,"^",TOKENCOUNT(OldImportRecord,"^") - 1)
This statement takes the input record as the first parameter and the delimiter caret (^) as the second. The third parameter calculates the total number of segments in the record when split by the delimiter; if you have data in the last segment, you are guaranteed to have at least two segments, so you can subtract 1 to fetch the penultimate segment.
I created a simple package with a data flow task. An OLE DB source retrieves the data, and the derived column transformation parses and splits the data as per the screenshot below. The output is then inserted into the destination table. You can see the source and destination tables in the last screenshot. The destination table has two columns: the first stores the penultimate segment data, and the second the segment count based on the delimiter (which again isn't correct). You can see that the last record didn't fetch the correct result. If the last record didn't have the value 8, the above expression would fail, because it would evaluate to a zero index.
Hope that helps to simplify your expression.
If you don't hear from anyone else, I would recommend logging this issue in Microsoft Connect website.
Create table and populate scripts:
CREATE TABLE [dbo].[SourceTable](
[OldImportRecord] [varchar](50) NOT NULL
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[DestinationTable](
[NewImportRecord] [varchar](50) NOT NULL,
[CaretCount] [int] NOT NULL
) ON [PRIMARY]
GO
INSERT INTO dbo.SourceTable (OldImportRecord) VALUES
('1^Apple^0001^01/01/2010^Anteater^A1'),
('2^Banana^0002^03/15/2010^Bear^B2'),
('3^Cranberry^0003^4/15/2010^Crow^C3'),
('4^^0004^6/15/2010^Duck^D4'),
('5^^^^Emu^E5'),
('6^^^^Geese^F6'),
('^^^^Pheasant^G7'),
('8^^^^Sparrow^');
GO
Derived column transformation inside the data flow task, and data in the source and destination tables (screenshots not reproduced here):
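Worked by hand from the expression above, the destination table would hold (NewImportRecord/CaretCount): Anteater/6, Bear/6, Crow/6, Duck/5, Emu/3, Geese/3, Pheasant/2, and 8/2. Note that the last row returns 8 rather than Sparrow, which is exactly the failure described above.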
Not only does TOKEN skip adjacent delimiters, it also skips leading and trailing delimiters as well. So, using your example, if you had a "good" field that looks like this:
1^Apple^0001^01/01/2010^Anteater^A1
Followed by one with adjacent and leading delimiters like this:
^^^0004^6/15/2010^Duck^
TOKENCOUNT would only find two delimiters and you'd end up with 0004 assigned to Token1, 6/15/2010 for Token2, and Duck for Token3.
I used a different kind of replace. Rather than placing spaces between adjacent delimiters, which wouldn't help with leading or trailing ones, I used REPLACE to surround the delimiters with characters I absolutely wouldn't find in my text. The following expression works well for me. It's wordy, but it is what it is.
(DT_STR,255,1252)REPLACE(TOKEN(REPLACE(OldImportRecord,"^","~^~"),"^",1),"~","")
Of course, you'd replace the number 1 with whatever Token you wanted and adjust the cast according to your needs. Hope that helps.
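To see what the surround trick produces, a quick check in plain SQL (just to inspect the transformed string):
SELECT REPLACE('^^^0004^6/15/2010^Duck^', '^', '~^~');
-- ~^~~^~~^~0004~^~6/15/2010~^~Duck~^~
Every token position is now non-empty (the empty ones become '~' or '~~'), so TOKEN sees them all, and stripping the '~' characters afterwards restores the original values, including the empties.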