DolphinDB random sample functions

Can I sample an array a = [1, 2, 3, 4] based on the specified probabilities p = [0.1, 0.1, 0.3, 0.5]?
For example, in python I can use np.random.choice(a=[1, 2, 3, 4], size=100, p=[0.1, 0.1, 0.3, 0.5])

My approach would be to form a new list whose element counts reflect the percentages/probabilities: for example, a uniform random choice over [1, 2, 3, 3, 3, 4, 4, 4, 4, 4] is equivalent to sampling your data with your probabilities.
You can use the take function (https://www.dolphindb.com/help/FunctionsandCommands/FunctionReferences/t/take.html) to help build that vector:
v = take(1, 1) <- take(2, 1) <- take(3, 3) <- take(4, 5)
rand(v, 100)
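For comparison, here is a small NumPy sketch of the same replication idea (assuming, as in this example, that the probabilities are multiples of 0.1):

import numpy as np

a = [1, 2, 3, 4]
p = [0.1, 0.1, 0.3, 0.5]

# replicate each value in proportion to its probability: 1, 1, 3 and 5 copies out of 10
weighted = [v for v, prob in zip(a, p) for _ in range(int(round(prob * 10)))]
# weighted == [1, 2, 3, 3, 3, 4, 4, 4, 4, 4]

s_uniform = np.random.choice(weighted, size=100)   # uniform draw from the replicated list
s_weighted = np.random.choice(a, size=100, p=p)    # weighted draw from the original list
# both samples follow the same distribution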

Related

How can I concatenate pytorch tensors or lists in a distributed multi-node setup?

I am trying to implement something like this for 2 nodes (each node with 2 GPUs):
# Parallel process initiated with torch.distributed.init_process_group()
# All GPUs work in parallel and generate lists like:
[20, 0, 1, 17] for GPU0 of node A
[1, 2, 3, 4] for GPU1 of node A
[5, 6, 7, 8] for GPU0 of node B
[0, 2, 4, 6] for GPU1 of node B
I tried
torch.distributed.reduce()
to get a sum of these 4:
[26, 10, 15, 35]
But what I want is a concatenated version like this
[[20, 0, 1, 17], [1, 2, 3, 4] , [5, 6, 7, 8] , [0, 2, 4, 6]]
Or
[20, 0, 1, 17, 1, 2, 3, 4, 5, 6, 7, 8, 0, 2, 4, 6]
is also OK with me.
Is it possible to achieve this from torch.distributed?
You can use dist.all_gather to do this:
import torch
import torch.distributed as dist
q = torch.tensor([20, 0, 1, 17])  # generated on each gpu (with different values) as you mentioned
all_q = [torch.zeros_like(q) for _ in range(world_size)]  # world_size is the total number of gpu processes you are running; 4 in your case
dist.all_gather(all_q, q)  # fills all_q in place; the return value is None, so don't reassign it
all_q would then have the following:
[torch.tensor([20, 0, 1, 17]), torch.tensor([1, 2, 3, 4]), torch.tensor([5, 6, 7, 8]), torch.tensor([0, 2, 4, 6])]
You can then use torch.cat to collapse all elements into one array if you like.
You can use dist.all_gather_multigpu if you are working with lists of lists of tensors (multiple GPUs per process).
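If you prefer the flat version from your question, that is a one-liner (assuming all_q has been filled as above):

combined = torch.cat(all_q)  # tensor([20, 0, 1, 17, 1, 2, 3, 4, 5, 6, 7, 8, 0, 2, 4, 6])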

How to prevent the initial pytorch variable from changing using a function?

I want to apply a function to the variable x and save the result as y. But why is x also changed, and how can I prevent it?
import torch
def minus_min(raw):
    for col_i in range(len(raw[0])):
        new = raw
        new[:, col_i] = raw[:, col_i] - raw[:, col_i].min()
    return new
x = torch.tensor([[0, 1, 2, 3, 4],
                  [2, 3, 4, 0, 8],
                  [0, 1, 2, 3, 4]])
y=minus_min(x)
print(y)
print(x)
output:
tensor([[0, 0, 0, 3, 0],
        [2, 2, 2, 0, 4],
        [0, 0, 0, 3, 0]])
tensor([[0, 0, 0, 3, 0],
        [2, 2, 2, 0, 4],
        [0, 0, 0, 3, 0]])
Because new = raw does not copy the tensor (it only binds another name to the same object), and this assignment:
new[:, col_i] = raw[:, col_i] - raw[:, col_i].min()
is an in-place operation. Therefore, x and y end up sharing the same underlying data.
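A quick way to confirm the aliasing (a minimal check, not part of the original code):

print(y is x)                        # True: minus_min returned the very same tensor it was given
print(y.data_ptr() == x.data_ptr())  # True: they share the same underlying storage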
The smallest change that would solve this issue would be to make a copy of x inside the function:
def minus_min(raw):
    new = raw.clone()  # <--- here
    for col_i in range(len(raw[0])):
        new[:, col_i] = raw[:, col_i] - raw[:, col_i].min()
    return new
If you want, you can simplify your function (and remove the for loop):
y = x - x.min(dim=0).values
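A quick check of that one-liner (assuming x as originally defined, before any in-place modification):

y = x - x.min(dim=0).values   # subtract the column-wise minima; returns a new tensor
print(y)  # tensor([[0, 0, 0, 3, 0], [2, 2, 2, 0, 4], [0, 0, 0, 3, 0]])
print(x)  # unchanged: tensor([[0, 1, 2, 3, 4], [2, 3, 4, 0, 8], [0, 1, 2, 3, 4]])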

Bokeh plot tag rendering issue

I am using the embed .html example given on the Bokeh site: http://docs.bokeh.org/en/latest/docs/user_guide/embed.html. Note I am using Bokeh 0.12.3. The plots are displaying fine, but the text renders as the exact output from the scatter function, including '{' and '\n' characters.
scatter function:
from bokeh.plotting import figure
from bokeh.models import Range1d
from bokeh.embed import components
def scatter():
    # create some data
    x1 = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    y1 = [0, 8, 2, 4, 6, 9, 5, 6, 25, 28, 4, 7]
    x2 = [2, 5, 7, 15, 18, 19, 25, 28, 9, 10, 4]
    y2 = [2, 4, 6, 9, 15, 18, 0, 8, 2, 25, 28]
    x3 = [0, 1, 0, 8, 2, 4, 6, 9, 7, 8, 9]
    y3 = [0, 8, 4, 6, 9, 15, 18, 19, 19, 25, 28]
    # select the tools we want
    TOOLS = "pan,wheel_zoom,box_zoom,reset,save"
    # the red and blue graphs will share this data range
    xr1 = Range1d(start=0, end=30)
    yr1 = Range1d(start=0, end=30)
    # only the green will use this data range
    xr2 = Range1d(start=0, end=30)
    yr2 = Range1d(start=0, end=30)
    # build our figures
    p1 = figure(x_range=xr1, y_range=yr1, tools=TOOLS, plot_width=300, plot_height=300)
    p1.scatter(x1, y1, size=12, color="red", alpha=0.5)
    p2 = figure(x_range=xr1, y_range=yr1, tools=TOOLS, plot_width=300, plot_height=300)
    p2.scatter(x2, y2, size=12, color="blue", alpha=0.5)
    p3 = figure(x_range=xr2, y_range=yr2, tools=TOOLS, plot_width=300, plot_height=300)
    p3.scatter(x3, y3, size=12, color="green", alpha=0.5)
    # plots can be a single Bokeh Model, a list/tuple, or even a dictionary
    plots = {'Red': p1, 'Blue': p2, 'Green': p3}
    script, div = components(plots)
    return script, div
My flask code is:
script, div = scatter()
return self.render_template('bokeh_example.html', script=script, div=div)
bokeh_example.html:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<link rel="stylesheet" href="http://cdn.bokeh.org/bokeh/release/bokeh-0.12.3.min.css" type="text/css" />
<script type="text/javascript" src="http://cdn.bokeh.org/bokeh/release/bokeh-0.12.3.min.js"></script>
{{ script | safe }}
</head>
<body>
<div class='bokeh'>
<h1>Scatter Example</h1>
{{ div | safe }}
</div>
</body>
</html>
The plots display fine but the div text renders as literals:
{'Red': '\n #this text displays instead of just the string 'Red'
\n #this displays on next line in smaller font
#plot displays fine here
\n #this text displays after the plot instead of creating a blank line.
Any clues?
You are passing a dictionary of plots to components:
plots = {'Red': p1, 'Blue': p2, 'Green': p3}
script, div = components(plots)
return script, div
This means (per the documentation) that the result is not a single script and a single div. Rather, it's a single script and a dictionary mapping your original names to multiple divs:
components({"Red": p1, "Blue": p2, "Green": p3})
#=> (script, {"Red": p1_div, "Blue": p2_div, "Green": p3_div})
Right now you are trying to template the dict itself into your HTML. Presumably Jinja just calls str on the dict to turn it into a string, and the browser doesn't know what to do with that. You need to template each one of the divs in the dict returned by components, individually.
For a suitably updated template, that might look like:
script, divs = scatter()  # notice plural: divS
return self.render_template(
    'bokeh_example.html',
    script=script,
    div_red=divs['Red'],
    div_blue=divs['Blue'],
    div_green=divs['Green'],
)
Or alternatively you might update the template to iterate over divs directly using some of Jinja2's capabilities for iterating over template arguments that are collections.
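For instance, a sketch of that alternative (the divs keyword and the loop are illustrative, not from the original code):

script, divs = scatter()
# In bokeh_example.html, iterate over the dict of divs, e.g.:
#     {% for name, div in divs.items() %} {{ div | safe }} {% endfor %}
return self.render_template('bokeh_example.html', script=script, divs=divs)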

How do I make a function that will allow me to store values in a list?

I need to create a python function, which will recursively store fibonacci values in a list, and then return that list to me. Then, I can print that list. Here is what I have.
def recFib(x):
    result = []
    if x == 1:
        return result.append(1)
    if x == 2:
        return result.append(2)
    for i in range(2, x):
        result.append(recFib(i-1) + recFib(i-2))
    return result
I am new to Python, so a lot of concepts are new to me, and I can't figure out what I'm doing wrong.
Here is a solution based on @Akavall's answer:
def recFib(x):
    fibArray = [0, 1]
    def fib(x):
        if x < len(fibArray):
            return fibArray[x]
        temp = fib(x - 1) + fib(x - 2)
        fibArray.append(temp)
        return temp
    return fib(x)
There are a few issues with your code. For example:
return result.append(1)
will return None, not result with 1 appended to it.
Also,
def recFib(x):
    result = []
will reset result to [] every time recFib is called, which I don't think is what you wanted.
And I am not sure that I understand your entire logic, sorry.
A possible solution is something like this:
def recFib(x):
    n_to_fib = {0: 0, 1: 1}
    def fib(x):
        if x in n_to_fib:
            return n_to_fib[x]
        temp = fib(x - 1) + fib(x - 2)
        n_to_fib[x] = temp
        return temp
    return fib(x)

if __name__ == "__main__":
    result = [recFib(i) for i in range(8)]
    print result
Result:
[0, 1, 1, 2, 3, 5, 8, 13]
And calculating new values is fast because the previous values are stored in n_to_fib dict and thus don't have to be recomputed.
We can also modify recFib to return the list directly, by replacing its last line
return fib(x)
with:
fib(x)
return sorted(n_to_fib.values())
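Putting that together, the modified function would look like this (a small sketch of exactly the change described above):

def recFib(x):
    n_to_fib = {0: 0, 1: 1}
    def fib(x):
        if x in n_to_fib:
            return n_to_fib[x]
        temp = fib(x - 1) + fib(x - 2)
        n_to_fib[x] = temp
        return temp
    fib(x)
    return sorted(n_to_fib.values())

recFib(7)  # [0, 1, 1, 2, 3, 5, 8, 13]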
Try this:
def recFib(n):
    if n == 1:
        return [0]
    elif n == 2:
        return [0, 1]
    else:
        a = recFib(n-1)
        return a + [a[-1] + a[-2]]
The key in the program is here:
return a + [a[-1] + a[-2]]
a       # This gets the most recently created list, since it's `recFib(n-1)`
+       # Concatenates with the next list (unlike `append`, this returns a new list rather than `None`)
[ a[-1] # Last value of `recFib(n-1)`, which is the previous Fibonacci number
+       # Add to
a[-2]]  # Second-to-last value of `recFib(n-1)`, which is the one before it
Here are the breakpoints for recFib(5):
n = 3: a[-1] = 1, a[-2] = 0
n = 4: a[-1] = 1, a[-2] = 1
n = 5: a[-1] = 2, a[-2] = 1
giving [0, 1, 1, 2, 3].
It doesn't require two functions; I sort of find that cheating:
>>> recFib(20)
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181]
>>> recFib(10)
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
>>> recFib(19)
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584]
Generating a list by calling a recursive function in a for loop is very inefficient: you recalculate the whole sequence every time you call the function.
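To see this concretely, here is a rough sketch (the calls counter is added purely for illustration) comparing a single recFib(20) call with a list comprehension over recFib:

calls = 0

def recFib(n):
    global calls
    calls += 1
    if n == 1:
        return [0]
    elif n == 2:
        return [0, 1]
    a = recFib(n - 1)
    return a + [a[-1] + a[-2]]

recFib(20)
print(calls)  # 20: one recursive call per element

calls = 0
[recFib(i) for i in range(1, 21)]
print(calls)  # 210 (1 + 2 + ... + 20): the earlier prefixes are recomputed on every iteration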

Keras ImageDataGenerator not working as expected

I'm trying to build an autoencoder using Keras, based on this example from the docs. Because my data is large, I'd like to use a generator to avoid loading it into memory.
My model looks like:
model = Sequential()
model.add(Convolution2D(16, 3, 3, activation='relu', border_mode='same', input_shape=(3, 256, 256)))
model.add(MaxPooling2D((2, 2), border_mode='same'))
model.add(Convolution2D(8, 3, 3, activation='relu', border_mode='same'))
model.add(MaxPooling2D((2, 2), border_mode='same'))
model.add(Convolution2D(8, 3, 3, activation='relu', border_mode='same'))
model.add(MaxPooling2D((2, 2), border_mode='same'))
model.add(Convolution2D(8, 3, 3, activation='relu', border_mode='same'))
model.add(UpSampling2D((2, 2)))
model.add(Convolution2D(8, 3, 3, activation='relu', border_mode='same'))
model.add(UpSampling2D((2, 2)))
model.add(Convolution2D(16, 3, 3, activation='relu'))
model.add(UpSampling2D((2, 2)))
model.add(Convolution2D(1, 3, 3, activation='sigmoid', border_mode='same'))
model.compile(optimizer='adadelta', loss='binary_crossentropy')
My generator:
from keras.preprocessing.image import ImageDataGenerator
train_datagen = ImageDataGenerator(rescale=1./255)
train_generator = train_datagen.flow_from_directory('IMAGE DIRECTORY', color_mode='rgb', class_mode='binary', batch_size=32, target_size=(256, 256))
And then fitting the model:
model.fit_generator(
    train_generator,
    samples_per_epoch=1,
    nb_epoch=1,
    verbose=1,
)
I'm getting this error:
Exception: Error when checking model target: expected convolution2d_76 to have 4 dimensions, but got array with shape (32, 1)
That looks like the size of my batch rather than a sample. What am I doing wrong?
The error is most likely due to class_mode='binary'. It makes the generator produce binary class labels, so the target has shape (batch_size, 1), while your model produces a four-dimensional output (since the last layer is a convolution).
I guess that you want your label to be the image itself. Based on the source of flow_from_directory and the DirectoryIterator it uses, this cannot be achieved just by changing class_mode. A possible solution would be along the lines of:
train_iterator_ = train_datagen.flow_from_directory('IMAGE DIRECTORY', color_mode='rgb', class_mode=None, batch_size=32, target_size=(256, 256))

def train_generator():
    for x in train_iterator_:
        yield x, x
Note that I set class_mode to None, which makes the generator return just the image instead of an (image, label) tuple. I then define a new generator that yields the image as both the input and the target.
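You would then pass the wrapped generator to fit_generator, roughly like this (a sketch reusing the toy arguments from above; note that train_generator must be called so Keras receives a generator object):

model.fit_generator(
    train_generator(),      # call the wrapper to obtain the generator object
    samples_per_epoch=1,    # same toy values as above; increase for real training
    nb_epoch=1,
    verbose=1,
)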