How to safely clone a PyTorch module? Is creating a new one faster? #PyTorch - deep-learning

I need to repeatedly create some modules that are exactly the same.
In the following code, I fill two identical lists with N identical modules.
MIM_N_cell = []
MIM_S_cell = []
for i in range(self.num_layers - 1):
    new_MIM_N_cell = MIM_NS_cell(input_dim=self.hidden_dim,
                                 hidden_dim=self.hidden_dim,
                                 kernel_size=self.kernel_size,
                                 model_cfg=model_cfg)
    new_MIM_S_cell = MIM_NS_cell(input_dim=self.hidden_dim,
                                 hidden_dim=self.hidden_dim,
                                 kernel_size=self.kernel_size,
                                 model_cfg=model_cfg)
    MIM_N_cell.append(new_MIM_N_cell)
    MIM_S_cell.append(new_MIM_S_cell)
self.MIM_N_cell = nn.ModuleList(MIM_N_cell)
self.MIM_S_cell = nn.ModuleList(MIM_S_cell)
Should I use .clone() instead of creating a new module each time?
I guess cloning may be faster, but I am not aware of its side effects. If it is worth cloning, how do I use it safely (i.e. without changing the training/testing behaviour of the network)?
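For what it's worth, nn.Module itself has no .clone() method (tensors and parameters do), so the usual way to duplicate a module is copy.deepcopy. A minimal sketch, using nn.Conv2d as a stand-in for MIM_NS_cell (the layer type and sizes are made up for illustration):
import copy

import torch.nn as nn

# A template module; deep copies of it are independent modules whose
# parameters start out equal to the template's (copied values).
template = nn.Conv2d(in_channels=64, out_channels=64, kernel_size=3, padding=1)
cloned_cells = nn.ModuleList(copy.deepcopy(template) for _ in range(4))

# Fresh instances instead: each module gets its own random initialization.
fresh_cells = nn.ModuleList(
    nn.Conv2d(in_channels=64, out_channels=64, kernel_size=3, padding=1)
    for _ in range(4)
)
Either way the modules are built once at model-construction time, so any speed difference is usually negligible; the more important question is whether you want the copies to start from the same initial weights (deepcopy) or from independent initializations (new instances).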

Related

How to add to / amend / consolidate JRuby Profiler data?

Say I have inside my JRuby program the following loop:
loop do
  x = foo()
  break if x
  bar()
end
and I want to collect profiling information just for the invocations of bar. How can I do this? This is what I have got so far:
pd = []
loop do
  x = foo()
  break if x
  pd << JRuby::Profiler.profile { bar() }
end
This leaves me with an array pd of profile data objects, one for each invocation of bar. Is there a way to create a "summary" data object, by combining all the pd elements? Or even better, have a single object, where profile would just add to the existing profiling information?
I googled for documentation of the JRuby::Profiler API, but couldn't find anything except a few simple examples, none of which cover my case.
UPDATE: Here is another attempt, which does not work either.
Since the profile method initially clears the profile data inside the Profiler, I tried to separate the profiling steps from the data initializing steps, like this:
JRuby::Profiler.clear
loop do
  x = foo()
  break if x
  JRuby::Profiler.send(:current_thread_context).start_profiling
  bar()
  JRuby::Profiler.send(:current_thread_context).stop_profiling
end
profile_data = JRuby::Profiler.send(:profile_data)
This seems to work at first, but on closer investigation I found that profile_data then contains only the profiling information from the last (most recent) execution of bar, not from all executions combined.
I figured out a solution, though I have the feeling that I'm using a ton of undocumented features to get it working. I should also add that I am using JRuby 1.7.27, so later JRuby versions might or might not need a different approach.
The problem is that start_profiling (corresponding to the Java method startProfiling in the class Java::OrgJrubyRuntime::ThreadContext) not only turns on the profiling flag, but also allocates a fresh ProfileData object. What we want is to reuse the old object. stop_profiling, on the other hand, only toggles the profiling switch and is harmless.
Unfortunately, ThreadContext does not provide a method to manipulate the isProfiling toggle, so as a first step, we have to add one:
class Java::OrgJrubyRuntime::ThreadContext
  field_writer :isProfiling
end
With this, we can set/reset the internal isProfiling switch. Now my loop becomes:
context = JRuby::Profiler.send(:current_thread_context)
JRuby::Profiler.clear
profile_data_is_allocated = nil
loop do
  x = foo()
  break if x
  # The first time, we allocate the profile data
  profile_data_is_allocated ||= context.start_profiling
  context.isProfiling = true
  bar()
  context.isProfiling = false
end
profile_data = JRuby::Profiler.send(:profile_data)
In this solution, I tried to keep as close as possible to the capabilities of the JRuby::Profiler class, but as you can see, the only public method still used is clear. Basically, I have reimplemented profiling in terms of the ThreadContext class; if someone comes up with a better way to solve this, I would highly appreciate it.

Splitting a feature collection by system index in Google Earth Engine?

I am trying to export a large feature collection from GEE. I realize that the Python API allows for this more easily than the JavaScript API does, but given a time constraint on my research, I'd like to see if I can extract the feature collection in pieces and then append the separate CSV files once exported.
I tried to use a filtering function to perform the task, one that I've seen used before with image collections. Here is a minimal example of what I am trying to do.
Given a feature collection of 10 spatial points called "points", I tried to create a new feature collection that includes only the first five points:
var points_chunk1 = points.filter(ee.Filter.rangeContains('system:index', 0, 5));
When I execute this function, I receive the following error: "An internal server error has occurred"
I am not sure why this code is not executing as expected. If you know more than I do about this issue, please advise on alternative approaches to splitting my sample, or on where the error in my code lurks.
Many thanks!
system:index is actually an ID given by GEE to each feature, and it's not supposed to be used like an index into an array. I think the JS API should be enough to export a large FeatureCollection, but there is a way to do what you want without relying on system:index, as that might not be consistent.
First, it would be a good idea to already know the number of features you are dealing with, because calling size().getInfo() on a large feature collection can freeze the UI and sometimes make the tab unresponsive. Here I have defined chunk and collectionSize. These need to be defined client-side, because we want to call Export within the loop, which is not possible in server-side loops. Within the loop, you can simply create a subset of features starting at a different offset each time by converting the collection to a list and converting the subset back to a feature collection.
var chunk = 1000;
var collectionSize = 10000;
for (var i = 0; i < collectionSize; i = i + chunk) {
  var subset = ee.FeatureCollection(fc.toList(chunk, i));
  Export.table.toAsset(subset, "description", "/asset/id");
}

How do I package all of my binaries in Bazel?

There is a plethora of BUILD files scattered throughout the hierarchy of my mono repo.
Some of these files contain cc_binary rules.
I know they are all built into bazel-bin, but I'd like to get easy access to them all.
How can I package them all up, and put them all into ~/.bin/ for example?
I see the packaging rules, but it's not clear to me how to write a rule that captures every single program and packages them together.
It may not be the most elegant solution (plus I hope I understood the question), but this is how we do it: by packaging/"tarring" each binary in its own Bazel package / BUILD file:
load("@bazel_tools//tools/build_defs/pkg:pkg.bzl", "pkg_tar")

cc_binary(
    name = "hello",
    ...
)

pkg_tar(
    name = "hello_pkg",
    srcs = [":hello"],
    mode = "0755",
    package_dir = "/usr/bin",
)
And then we'd collect all of those into one overall tarball/package in the project root:
pkg_tar(
    name = "mypkg",
    extension = "tar.gz",
    deps = [
        "//hello:hello_pkg",
        ...
    ],
)
Sometimes we'd actually have multiple such rules for hello, for instance to collect executables under bin and libraries under lib with intermediary hello_bin and hello_lib targets, as sketched below. These would, in the same fashion as mypkg above, first be aggregated into hello_pkg, and that in turn would be used in mypkg.
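As a rough sketch of that layout (the target names here are illustrative rather than taken from a real BUILD file, and ":hello_shared" is a hypothetical library target):
load("@bazel_tools//tools/build_defs/pkg:pkg.bzl", "pkg_tar")

# Binary part of the hello package, staged under /usr/bin.
pkg_tar(
    name = "hello_bin",
    srcs = [":hello"],
    mode = "0755",
    package_dir = "/usr/bin",
)

# Library part, staged under /usr/lib.
pkg_tar(
    name = "hello_lib",
    srcs = [":hello_shared"],
    mode = "0644",
    package_dir = "/usr/lib",
)

# Per-package aggregate; //hello:hello_pkg is what the top-level mypkg depends on.
pkg_tar(
    name = "hello_pkg",
    deps = [
        ":hello_bin",
        ":hello_lib",
    ],
)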

How is a linked list practically possible in AS3.0?

I stumbled across this page: Create Linked List in AS3.
Since AS3.0 is a scripting language, I wonder how a linked list is possible in AS3.0. Isn't a pointer (a way to access a memory location) mandatory to create a linked list? Doesn't that ultimately make an array of data faster in performance?
In AS3 you have object references; you don't have pointers exactly, but you can achieve a linked list using references in a very similar way. The advantage of a linked list (in general) is in insertion and deletion within the list (you don't have to shift all elements as with an array). You still get this benefit using object references.
Note: objects in AS3 are passed by reference; primitives are passed by value.
Effectively, all scripting languages do work with pointers. They just call them something different (most often "references") and hide the complexity (or even the possibility) of managing memory allocation and release.
Having said that, the simplest way to create a linked list in ActionScript (or JavaScript) would be:
var node1 = {value: 1};
var node2 = {value: "foo"};
var node3 = {value: "bar"};
// of course this code should be localized within a separate class
// with some nice API
((node1.next = node2).next = node3).next = null;
// and then used like this, e.g.
var n = node1;
while (n) {
  trace(n.value);
  n = n.next;
}

How can I force Linq to SQL NOT to use the cache?

When I make the same query twice, the second time it does not return new rows from the database (I guess it just uses the cache).
This is a Windows Forms application, where I create the dataContext when the application starts.
How can I force Linq to SQL not to use the cache?
Here is a sample function where I have the problem:
public IEnumerable<Orders> NewOrders()
{
    return from order in dataContext.Orders
           where order.Status == 1
           select order;
}
The simplest way would be to use a new DataContext: given that most of what the context gives you is caching and identity management, it really sounds like you just want a new context. Why did you want to create just one and then hold onto it?
By the way, for simple queries like yours it's more readable (IMO) to use "normal" C# with extension methods rather than query expressions:
public IEnumerable<Orders> NewOrders()
{
    return dataContext.Orders.Where(order => order.Status == 1);
}
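Putting those two suggestions together, a minimal sketch of the "new context per call" approach could look like this (MyDataContext stands in for your generated DataContext subclass; ToList() materializes the results before the context is disposed):
using System.Collections.Generic;
using System.Linq;

public IEnumerable<Orders> NewOrders()
{
    // Create a short-lived context for this unit of work.
    using (var dataContext = new MyDataContext())
    {
        return dataContext.Orders
                          .Where(order => order.Status == 1)
                          .ToList();   // run the query before the context is disposed
    }
}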
EDIT: If you never want it to track changes, then set ObjectTrackingEnabled to false before you do anything. However, this will severely limit its usefulness. You can't just flip the switch back and forth (once you have made queries in between). Changing your design to avoid the singleton context would be much better, IMO.
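If you do go the ObjectTrackingEnabled route, note that (as far as I recall) the flag must be set before the context runs its first query, and it effectively makes the context read-only:
var dataContext = new MyDataContext();      // hypothetical generated context
dataContext.ObjectTrackingEnabled = false;  // must be set before any query executes
var newOrders = dataContext.Orders
                           .Where(order => order.Status == 1)
                           .ToList();       // results are not tracked for SubmitChanges()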
HOW you add an object to the DataContext can matter as to whether or not it will be included in future queries.
Will NOT add the new InventoryTransaction to future in-memory queries
In this example I'm creating an object with an ID and then adding it to the context.
var transaction = new InventoryTransaction()
{
    AdjustmentDate = currentTime,
    QtyAdjustment = 5,
    InventoryProductId = inventoryProductId
};
dbContext.InventoryTransactions.Add(transaction);
dbContext.SubmitChanges();
Linq-to-SQL isn't clever enough to see this as something that needs to be added to the previously cached list of in-memory items in InventoryTransactions.
WILL add the new InventoryTransaction to future in-memory queries
var transaction = new InventoryTransaction()
{
    AdjustmentDate = currentTime,
    QtyAdjustment = 5
};
inventoryProduct.InventoryTransactions.Add(transaction);
dbContext.SubmitChanges();
Wherever possible, use the collections in Linq-to-SQL when creating relationships, not the IDs.
In addition, as Jon says, try to minimize the scope of a DataContext as much as possible.