Can a resample result object be converted to a BMR result object in mlr? - mlr

I want to convert a resample result object to a BMR result object and combine it with a previous BMR result object. This is possible in mlr3 (as_benchmark_result() and $combine()), but I am not sure if it is also possible in mlr.

Unfortunately not. This is one of the limitations of the old mlr that is handled better in mlr3.
(If you are missing a feature from mlr in mlr3, please open an issue in the mlr3 repo on GitHub 🙂️)
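For reference, a minimal sketch of the mlr3 workflow the question refers to; the task, learners, and resampling here are just examples:
library(mlr3)

task = tsk("iris")                                   # example task
bmr  = benchmark(benchmark_grid(task, lrn("classif.rpart"), rsmp("cv", folds = 3)))
rr   = resample(task, lrn("classif.featureless"), rsmp("cv", folds = 3))

# Convert the ResampleResult and merge it into the existing BenchmarkResult
bmr$combine(as_benchmark_result(rr))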

Related

Read json with glue crawler return UNKNOWN classification

I have a JSON file of the following format:
{"result": [{"key1":"value1", "key2":"value2", "key3":"value3"}]}
When I use the crawler, the table created has classification UNKNOWN. I have done some research, and if you make a custom classifier with the JSONPath $[*] you should be able to get the whole array. Unfortunately this does not work, for me at least. I created a new crawler after creating the classifier, as it would not work if the old crawler was updated with the classifier.
Has anyone run into this issue and can be of help?
Your JSONPath assumes that the root is a collection, e.g.
[{"result ..},{}]
Since your root is not a collection, try a JSONPath like this:
$.result
That assumes the whole object is the value you want; you may also want to do:
$.result[*]
That will get each entry in the result collection as a separate object.
I found a workaround.
In my Python script I select the "result" array, so the output no longer has the "result" key. I can then use the classifier with the JSONPath $[*]. This workaround worked fine for me; a sketch is shown below.
Have a nice one!
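For illustration, a minimal sketch of that workaround in Python (file names are hypothetical): load the JSON, keep only the array under "result", and write it back so the root is a collection that the $[*] classifier can match.
import json

with open("input.json") as f:           # hypothetical input file
    data = json.load(f)

# Keep only the array under "result" so the JSON root becomes a collection
with open("output.json", "w") as f:     # hypothetical output file
    json.dump(data["result"], f)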

Write PySpark Dataframe to GCS with Overwrite [duplicate]

I am trying to overwrite a Spark DataFrame's output using the following option in PySpark, but I am not successful:
spark_df.write.format('com.databricks.spark.csv').option("header", "true",mode='overwrite').save(self.output_file_path)
The mode='overwrite' argument is not taking effect.
Try:
spark_df.write.format('com.databricks.spark.csv') \
.mode('overwrite').option("header", "true").save(self.output_file_path)
Spark 2.0 and above have a built-in csv function on the DataFrameWriter:
https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.DataFrameWriter
e.g.
spark_df.write.csv(path=self.output_file_path, header="true", mode="overwrite", sep="\t")
Which is syntactic sugar for
spark_df.write.format("csv").mode("overwrite").options(header="true",sep="\t").save(path=self.output_file_path)
I think what is confusing is finding where exactly the options are available for each format in the docs.
These write-related methods belong to the DataFrameWriter class:
https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.DataFrameWriter
The csv method has these options available, which are also available when using format("csv"):
https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.DataFrameWriter.csv
The way you supply parameters also depends on whether the method takes a single (key, value) pair or keyword arguments. This is fairly standard Python (*args, **kwargs); it just differs from the Scala syntax.
For example
The option(key, value) method takes one option at a time, e.g. .option("header", "true"), while the .options(**options) method takes a set of keyword assignments, e.g. .options(header="true", sep="\t").
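For illustration, a minimal runnable sketch showing the two equivalent ways of passing the same options (the output paths are hypothetical):
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "label"])

# One .option(key, value) call per setting
df.write.format("csv").mode("overwrite") \
    .option("header", "true").option("sep", "\t") \
    .save("/tmp/out_option")

# The same settings as keyword arguments to .options()
df.write.format("csv").mode("overwrite") \
    .options(header="true", sep="\t") \
    .save("/tmp/out_options")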
EDIT 2021
The docs have had a huge facelift, which may be good for new users discovering functionality by requirement, but it does take some adjusting to.
DataFrameReader and DataFrameWriter are now part of the Input/Output section of the API docs: https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql.html#input-and-output
The DataFrameWriter.csv method is now here: https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.sql.DataFrameWriter.csv.html#pyspark.sql.DataFrameWriter.csv

JsPath: navigate backward

let me explain the problem that I'm facing:
I have two JSON objects, let's call them js1 and js2. I need to update js1 using "parts" of js2, and to do that I need to identify where the parts that need to be updated are located in js1.
To do that, I'm using a function that, for a certain input, returns the full JsPath from the root to the input value, and I get back a JsPath like this:
/priceLists(1)/sections(0)/items(0)(0)/itemIdentifier
what I need to do is navigate backward one step, to obtain a JsPath like
/priceLists(1)/sections(0)/items(0)(0)
I'm probably missing something obvious (and I don't have much experience with Scala in general), but I can't find any way to do that.
The only way I found to get rid of the last part of the path is to transform the JsPath into a list of PathNodes, but then I don't know how to turn that list of PathNodes back into a JsPath.
I'm using Play 2.6 and Scala 2.11.8.
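For reference, a minimal sketch of one way to do this, assuming (as in Play 2.6) that JsPath is a case class wrapping a List[PathNode], so the last node can be dropped and the remainder wrapped back into a JsPath; the helper name is hypothetical:
import play.api.libs.json.JsPath

// Drop the last node of a path, e.g.
// /priceLists(1)/sections(0)/items(0)(0)/itemIdentifier  ->  /priceLists(1)/sections(0)/items(0)(0)
def parent(p: JsPath): JsPath =
  if (p.path.isEmpty) p else JsPath(p.path.init)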

Using lapply or for loop on JSON parsed text to calculate mean

I have a JSON file that parses into a multi-layered list (the text is already parsed). Buried within the list there is a layer that includes several values that I need to average. I have code to do this for each row individually, but that is not very time efficient.
mean(json_usage$usage_history[[1]]$used[[1]]$lift)
This returns an average for the numbers in the lift layer of the list for the 1st row. As mentioned, this isn't time efficient when you have a dataset with multiple rows. Unfortunately, I haven't had much success in using either a loop or lapply to do this on the entire dataset.
This is what happens when I try the for loop:
for(i in json_usage$usage_history[[i]]$used[[1]]$lift){
json_usage$mean_lift <- mean(json_usage$usage_history[[i]]$used[[1]]$lift)
}
Error in json_usage$affinity_usage_history[[i]] :
subscript out of bounds
This is what happens when I try lapply:
mean_lift <- lapply(lift_list, mean(lift_list$used$lift))
Error in match.fun(FUN) :
'mean(lift_list$used$lift)' is not a function, character or symbol
In addition: Warning message:
In mean.default(lift_list$used$lift) :
argument is not numeric or logical: returning NA
I am new to R, so I know I am likely doing it wrong, but I haven't found any examples of what I'm trying to do. I'm running out of ideas and growing increasingly frustrated. Please help!
Thank you!
The jsonlite package has a very useful function called flatten that you can use to convert the nested lists that commonly appear when parsing JSON data to a more usable dataframe. That should make it simpler to do the calculations you need.
Documentation is here: https://cran.r-project.org/web/packages/jsonlite/jsonlite.pdf
For an answer to a vaguely similar question I asked (though my issue was with NA data within JSON results), see here: Converting nested list with missing values to data frame in R
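For illustration, a minimal sketch (with a made-up object mirroring the structure described in the question) of applying the single-row calculation to every element of usage_history with sapply:
# Made-up object mirroring the structure in the question
json_usage <- list(
  usage_history = list(
    list(used = list(list(lift = c(1.2, 3.4, 5.6)))),
    list(used = list(list(lift = c(2.0, 4.0))))
  )
)

# One mean per element of usage_history, generalising the single-row call above
json_usage$mean_lift <- sapply(
  json_usage$usage_history,
  function(h) mean(h$used[[1]]$lift, na.rm = TRUE)
)
# json_usage$mean_lift is now c(3.4, 3.0)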

Library to convert JSON string to Erlang record

I have a large JSON string that I want to convert into an Erlang record.
I found the jiffy library, but it does not convert all the way to a record.
For example:
jiffy:decode(<<"{\"foo\":\"bar\"}">>).
gives
{[{<<"foo">>,<<"bar">>}]}
but I want the following output:
{ok,{obj,[{"foo",<<"bar">>}]},[]}
Is there any library that can be used for the desired output?
Or is there a library that can be used in combination with jiffy to further transform its output?
Keep in mind that the JSON string is large, and I want the conversion to take as little time as possible.
Take a look at ejson, from the documentation:
JSON library for Erlang on top of jsx. It gives a declarative interface for jsx by which we need to specify conversion rules and ejson will convert tuples according to the rules.
I made this library to make easy not just the encoding but rather the decoding of JSONs to Erlang records...
In order for ejson to take effect, the source files need to be compiled with the parse_transform ejson_trans. Every record that has a -json attribute can then be converted to JSON.