Is there a method to serialize a machine learning model within tidymodels (similar to pickling a model in Python)?

I am aware that in Python you can serialize an ML model using the pickle module; is there a method to do something similar in the tidymodels space? My goal is to be able to save a trained model to be deployed later.

In R, you can use saveRDS & readRDS to save/load any R object, just like Python's pickle. These functions are not specific to tidymodels; they are base R functions that can be used to serialize any object.
Usage
# Serialize any R object to disk, then restore it in a later session
saveRDS(any_r_object, "filename.rds")
object_name <- readRDS("filename.rds")
There are also the save() and load() functions; they serve the same purpose and are broadly similar to saveRDS() and readRDS(). There are many online discussions/blogs comparing the two approaches.
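The practical difference is that saveRDS() stores a single object without its name, while save() stores one or more objects together with their names, which load() then restores directly into the environment. A quick sketch:
# saveRDS/readRDS: one object, you pick the name on the way back in
saveRDS(mtcars, "cars.rds")
cars2 <- readRDS("cars.rds")

# save/load: objects come back under their original names
save(mtcars, file = "cars.RData")
load("cars.RData")  # recreates `mtcars` in the current environment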

Following up after a good while:
Some model fits in R, e.g. those fitted with the "xgboost" or "keras" engine in tidymodels, require native serialization methods to be saved and reloaded in a new R session properly. saveRDS and readRDS will do the trick most of the time, though they fall short for model objects from packages that require native serialization.
The folks on the tidymodels team put together a new package, bundle, to provide a consistent interface for native serialization of model objects. The bundle() verb prepares a model object for serialization; you can then safely saveRDS() + readRDS() the bundle, pass it between R sessions as you wish, and unbundle() it in the new session:
library(bundle)

mod_bundle <- bundle(mod)
saveRDS(mod_bundle, file = "path/to/file.rds")

# in a new R session:
library(bundle)
mod_bundle <- readRDS("path/to/file.rds")
mod_new <- unbundle(mod_bundle)
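For context, mod above can be any supported model object. A minimal sketch of the kind of fit that needs this treatment, assuming the parsnip package with the xgboost engine (the formula and data are only illustrative):
library(parsnip)

# An xgboost fit holds an external pointer that a plain
# saveRDS()/readRDS() round trip does not restore across sessions
mod <- boost_tree(mode = "regression") %>%
  set_engine("xgboost") %>%
  fit(mpg ~ ., data = mtcars)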

Related

Accessing regmap RegFields

I am trying to find a clean way to access the regmap that is used with *RegisterNode for creating documentation and testing files. The TLRegisterNode has methods for generating the JSON through some Annotations; these are done in the regmap method by adding them to the ElaborationArtefacts object. Other protocols don't seem to have these annotations.
Is there any way to iterate over the "regmap" Register Fields post-elaboration, or during it?
I cannot just access the regmap, as it's not really a val/var, it's a method. I can't quite figure out where this information is being stored, and I don't really believe it's actually "storing" any information so much as simply creating the hardware that attaches the specified logic to the RegisterNode-based logic.
The JSON output is actually fine for me, as I could just write a post-processing script to convert JSON to my required formats, but I'm wondering if I can access this information OR add a custom function call at the end. I cannot extend the case class *RegisterNode, and I'm not sure if it's possible to add custom functions to run at the end of the regmap method.
Here is something I threw together quickly:
//in *RegisterRouter.scala
def customregmap(customFunc: (RegField.Map*) => Unit, mapping: RegField.Map*) = {
regmap(mapping:_*)
customFunc(mapping:_*)
}
def regmap(mapping: RegField.Map*) = {
//normal stuff
}
A user could then create a custom function and pass it to the regmap call or to the RegisterRouter:
def myFunc(mapping: Seq[RegField.Map]): Unit = {
  println("I'm doing my custom function for regmap!")
}
// ...
node.customregmap(myFunc,
  0x0 -> coreControlRegFields,
  0x4 -> fdControlRegFields,
  0x8 -> fdControl2RegFields,
)
This is just a quick example. I believe a better approach, if something like this were possible, would be a Seq of functions attached to the RegisterNode that are run at the end of the regmap method, similar to how TLRegisterNode currently works, so a user could add an arbitrary number of them while still using the plain regmap call. A sketch of that idea follows.
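For illustration, a hypothetical sketch of that callback-list idea (the name regmapCallbacks is invented; this is not existing rocket-chip API):
// in *RegisterRouter.scala (hypothetical)
import scala.collection.mutable.ArrayBuffer

// hooks to run once the register map is known
val regmapCallbacks = ArrayBuffer.empty[Seq[RegField.Map] => Unit]

def regmap(mapping: RegField.Map*) = {
  // ... normal register-mapping logic ...
  regmapCallbacks.foreach(cb => cb(mapping))  // run user hooks last
}

// user code: register any number of hooks
node.regmapCallbacks += { mapping =>
  println(s"regmap has ${mapping.length} address offsets")
}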
Background (not directly part of question):
I have a unified register script that I have built over the years, in which I describe the registers for a particular IP. It works very similarly to RegField/node.regmap, except it obviously doesn't know about diplomacy and the like. It generates the Verilog, but also a variety of files for DV (basic `defines for simple Verilog simulations, and more complex uvm_reg_block defines, with the ability to describe multiples of the IP for a subsystem, all the way up to the SoC level). It also prints out C header files for SW and Sphinx reStructuredText for documentation.
Diplomacy actually solves one of the main issues I've been dealing with, so I'm obviously trying to push most of my newer designs to Chisel/Diplo.
I ended up solving this by creating my own RegisterNode, which is the same as the rocket-chip RegisterNodes except that I use a different ElaborationArtefact to grab the info and store it for later.

Creating a serving graph separately from training in tensorflow for Google CloudML deployment?

I am trying to deploy a tf.keras image classification model to Google CloudML Engine. Do I have to include code to create a serving graph, separate from training, to get it to serve my models in a web app? I already have my model in SavedModel format (saved_model.pb & variables files), so I'm not sure if I need this extra step to get it to work.
e.g. this is code directly from GCP Tensorflow Deploying models documentation
def json_serving_input_fn():
    """Build the serving inputs."""
    inputs = {}
    for feat in INPUT_COLUMNS:
        inputs[feat.name] = tf.placeholder(shape=[None], dtype=feat.dtype)
    return tf.estimator.export.ServingInputReceiver(inputs, inputs)
You are probably training your model with actual image files, while it is best to send images as encoded byte strings to a model hosted on CloudML. Therefore you'll need to specify a ServingInputReceiver function when exporting the model, as you mention. Some boilerplate code to do this for a Keras model:
# Convert the Keras model to a TF Estimator
tf_files_path = './tf'
estimator = tf.keras.estimator.model_to_estimator(keras_model=model,
                                                  model_dir=tf_files_path)

# Your serving input function will accept a string
# and decode it into an image
def serving_input_receiver_fn():
    def prepare_image(image_str_tensor):
        image = tf.image.decode_png(image_str_tensor, channels=3)
        image = tf.cast(image, tf.float32)  # match the map_fn dtype below
        return image  # apply additional processing if necessary

    # Ensure the model is batchable
    # https://stackoverflow.com/questions/52303403/
    input_ph = tf.placeholder(tf.string, shape=[None])
    images_tensor = tf.map_fn(
        prepare_image, input_ph, back_prop=False, dtype=tf.float32)
    return tf.estimator.export.ServingInputReceiver(
        {model.input_names[0]: images_tensor},
        {'image_bytes': input_ph})

# Export the estimator - deploy it to CloudML afterwards
export_path = './export'
estimator.export_savedmodel(
    export_path,
    serving_input_receiver_fn=serving_input_receiver_fn)
You can refer to this very helpful answer for a more complete reference and other options for exporting your model.
Edit: if this approach throws a "ValueError: Couldn't find trained model at ./tf." error, you can try the workaround that I documented in this answer.
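Once deployed, the client side has to base64-encode the image bytes, since CloudML's online-prediction JSON uses a {"b64": ...} wrapper for binary data. A minimal sketch, assuming the google-api-python-client library; the project/model names and file path are placeholders:
import base64
from googleapiclient import discovery

# Read the raw PNG bytes and base64-encode them for the JSON payload
with open('example.png', 'rb') as f:
    image_b64 = base64.b64encode(f.read()).decode('utf-8')

# 'image_bytes' matches the key in the ServingInputReceiver above;
# the {'b64': ...} wrapper tells CloudML to decode it back to bytes
body = {'instances': [{'image_bytes': {'b64': image_b64}}]}

service = discovery.build('ml', 'v1')
name = 'projects/my-project/models/my-model'  # placeholders
response = service.projects().predict(name=name, body=body).execute()
print(response['predictions'])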

Rivets.js adapter publish vs setting value

Rivets.js proposes to use the adapter.read and adapter.publish functions to get and set properties of a model while defining binders. I have not found an actual benefit of using read/publish when compared to the standard get/set methodology.
Excerpt from documentation:
adapter.read(model, keypath)
adapter.publish(model, keypath, value)
The source code for read and publish from v0.6.10:
read: function(obj, keypath) {
  return obj[keypath];
},
publish: function(obj, keypath, value) {
  return obj[keypath] = value;
}
I wonder if anyone knows about the benefits that read and publish may offer?
I finally figured this out. The answer is as simple as abstracting the get and set functionality away from the binder. This has no real benefit if you use rivets as-is with the one and only dot (.) adapter it ships with, but the approach comes in very handy when you define custom adapters.
A good example, as in my case, is using the rivets-backbone adapter. The model passed to the binder could be a plain old JavaScript object or a Backbone model. Reading and writing properties on the object varies based on its type; by using the publish and read functions, this logic is abstracted away from the binder's implementation.
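To illustrate, a minimal sketch of such an adapter (the ':' sigil and the instanceof check are just one way to do it, not the actual rivets-backbone source; a real adapter would also implement subscribe/unsubscribe for change notification):
rivets.adapters[':'] = {
  read: function(obj, keypath) {
    // Backbone models expose get(); plain objects use property access
    return obj instanceof Backbone.Model ? obj.get(keypath) : obj[keypath];
  },
  publish: function(obj, keypath, value) {
    if (obj instanceof Backbone.Model) {
      obj.set(keypath, value);
    } else {
      obj[keypath] = value;
    }
  }
};
Binders keep calling read/publish and never need to know which kind of model they received.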

Using toJSONPretty();

I am trying to work with JSON objects with DynamoDB and am having difficulty.
I'm trying to follow the tutorial:
http://aws.amazon.com/blogs/aws/dynamodb-update-json-and-more/
I wanted to use toJSONPretty() on my object, but the method is not recognized. I don't think I have the right Gradle dependencies. I'm currently using:
compile 'com.amazonaws:aws-android-sdk-core:2.2.0'
compile 'com.amazonaws:aws-android-sdk-ddb:2.2.0'
compile 'com.amazonaws:aws-android-sdk-ddb-mapper:2.2.0'
Previously, my dynamo client was set up using the imports:
import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient;
But looking at code from Dynamo/JSON tutorials, I see the import:
import com.amazonaws.services.dynamodbv2.document.DynamoDB;
This seems to be required if you want to use the DynamoDB type, as in:
DynamoDB dynamo = new DynamoDB(new AmazonDynamoDBClient(...));
I don't understand the difference between these libraries or how they relate to each other. Help!
The DynamoDB class is a higher-level abstraction of the AmazonDynamoDB API. You create it with an instance of the AmazonDynamoDB interface or a Regions object; AmazonDynamoDBClient implements the AmazonDynamoDB interface, and even if you pass a Regions object, an instance of a class that implements AmazonDynamoDB is created behind the scenes. From it, you get a Table so that you can perform data-plane operations like GetItem and PutItem. The Item class is the one that has a toJSONPretty method. To sum up, the DynamoDB class uses an implementation of the AmazonDynamoDB interface behind the scenes to provide you with Items that you can call toJSONPretty() on.
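Putting that flow together, a minimal sketch using the document API (the table name and key are placeholders):
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient;
import com.amazonaws.services.dynamodbv2.document.DynamoDB;
import com.amazonaws.services.dynamodbv2.document.Item;
import com.amazonaws.services.dynamodbv2.document.Table;

AmazonDynamoDBClient client = new AmazonDynamoDBClient();  // low-level API
DynamoDB dynamo = new DynamoDB(client);                    // document-API wrapper

Table table = dynamo.getTable("MyTable");                  // placeholder table
Item item = table.getItem("id", "some-key");               // placeholder key
System.out.println(item.toJSONPretty());                   // pretty-printed JSON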

'import' a cujojs/wire context into another

I'm looking for a way to realize the following use case:

1. I have many modules, and each one of them has a wire spec that exposes its components.
2. To assemble an application, I select the modules and use their wire specs.
3. The wire spec of the application is the merge of the wire specs of the used modules: (3.1) I start by 'requiring' the wire spec of each module as an object. (3.2) Then, I merge the objects. (3.3) Finally, I return the result as the object defining the wire spec of the application.
Here is a sample of an application context-spec:
define(["jquery", "module1-wire-spec", "module2-wire-spec"], function(jquery, module1WireSpec, module2WireSpec) {
return jquery.extend(true, module1WireSpec, module2WireSpec);
});
I have read the wire documentation several times, hoping to find a 'native' way to do the above, but so far I have failed to find one.
A 'native' way would be a factory like the 'wire' factory, except that instead of creating a child context for each module, it would expose the components of each module as direct components of the application context.
Spring, for instance, allows importing a context definition into another one and the result is as if the content of the imported context has been inlined with the importing context.
A new feature has been added to cujojs/wire to allow import of contexts.
As of version 0.10.8, the keyword imports accepts:
a string, for a single context import,
or an array, for importing a list of contexts.
Check here for more details.
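Based on that description, an application spec could then look something like this (a sketch only; the spec module names are placeholders):
define({
  // pull the components of both module specs directly into this context
  imports: ['module1-wire-spec', 'module2-wire-spec']

  // components declared here can reference the imported components
});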