Pyevolve - sum of chromosome = 1

# Genome instance, 1D List of 20 elements
genome = G1DList.G1DList(20)
# Sets the range max and min of the 1D List
genome.setParams(rangemin=0, rangemax=1)
# Change the initializator to Real values
genome.initializator.set(Initializators.G1DListInitializatorReal)
This gives 20 elements between 0 and 1. I need the sum of all elements in the chromosome to be equal to 1. Any idea how to do this?

I found the answer. I created my own initializator, which is a modified copy of G1DListInitializatorReal:
import random

def G1DListInitializatorRealSumEqualOne(genome, **args):
    """ Real initialization function of G1DList
    This initializator accepts the *rangemin* and *rangemax* genome parameters
    and rescales the generated values so that they sum to 1.
    """
    range_min = genome.getParam("rangemin", 0)
    range_max = genome.getParam("rangemax", 100)
    genome.genomeList = [random.uniform(range_min, range_max) for i in xrange(genome.getListSize())]
    total = sum(genome.genomeList)  # compute the sum once, before rescaling
    genome.genomeList[:] = [x / total for x in genome.genomeList]
I then changed the code in my question to:
# Genome instance, 1D List of 20 elements
genome = G1DList.G1DList(20)
# Sets the range max and min of the 1D List
genome.setParams(rangemin=0, rangemax=1)
# Change the initializator to Real values
genome.initializator.set(G1DListInitializatorRealSumEqualOne)
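The normalisation step can be tried on its own, without Pyevolve. The sketch below is only an illustration: `random_unit_sum` and its parameters are hypothetical names, not part of Pyevolve, and it is written for Python 3. Note that an initializator only constrains the starting population; whether the sum stays at 1 after crossover and mutation depends on the operators used, so the values may need rescaling again later.

```python
import random

def random_unit_sum(size, range_min=0.0, range_max=1.0):
    """Return `size` random values from the range, rescaled so they sum to 1.

    Hypothetical helper for illustration; not a Pyevolve function.
    """
    values = [random.uniform(range_min, range_max) for _ in range(size)]
    total = sum(values)
    if total == 0:
        # Degenerate draw (every value is 0): fall back to equal weights.
        return [1.0 / size] * size
    return [v / total for v in values]

genes = random_unit_sum(20)
```

With a non-negative range, every rescaled value stays between 0 and 1.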


MiniBatchSize is always 1 even when I set it to 32 in the minibatchqueue parameters (MATLAB)

I am following the MATLAB example for a CNN with multiple outputs:
Train Network with Multiple Outputs
https://www.mathworks.com/help/deeplearning/ug/train-network-with-multiple-outputs.html
My input data, however, is a set of colour images with two outputs, as shown below:
Name       Size              Bytes        Class          Attributes
XTrain     128x128x3x2016    792723456    double
yTrain1    1x2016            2230         categorical
yTrain2    1x2016            16128        double
I define the minibatchqueue object as below:
mbq = minibatchqueue(dsTrain,...
    'MiniBatchSize',32,...
    'MiniBatchFcn',@preprocessData,...
    'MiniBatchFormat',{'SSCB','',''});
But the mini-batch size shown in the output is always 1:
mbq =
  minibatchqueue with 3 outputs and properties:
   Mini-batch creation:
           MiniBatchSize: 1
        PartialMiniBatch: 'return'
            MiniBatchFcn: @preprocessDataJwan
    DispatchInBackground: 0
   Outputs:
              OutputCast: {'single'  'single'  'single'}
         OutputAsDlarray: [1 1 1]
         MiniBatchFormat: {'SSCB'  ''  ''}
       OutputEnvironment: {'auto'  'auto'  'auto'}
Even when I check the data size, all 2016 observations are returned at once:
[dlX,dlY1,dlY2] = next(mbq);
size(dlX)
size(dlY1)
size(dlY2)
ans =
   128   128     3  2016
ans =
     2  2016
ans =
     1  2016
Can anyone help me with this?

csv empty strings handling and values appending

I have a csv of ~50 rows (stars) and ~30 columns (name, magnitudes and distance) in which some values are empty strings (''). I am trying to do two things, and none of the help I have found so far has been useful: (1) parse empty strings as 0.0, so that I can (2) append each row to a list of lists (which I call s).
In other words:
- s is a list of stars (each one has all its parameters)
- d is a particular parameter for all the stars (distance), which I obtain correctly.
The big issue is with s. My attempt:
with open('stars.csv', 'r') as mycsv:
    csv_stars = csv.reader(mycsv)
    next(csv_stars)  # skip header
    stars = list(csv_stars)
s = []  # star
d = []  # distances
for row in stars:
    row[row==''] = '0'
    s.append(float(row))  # stars
    d.append(arcsec*AU*float(row[30]))
I can't think of a better syntax, and I get the error
    s.append(float(row)) # stars
TypeError: float() argument must be a string or a number
Later I want to obtain from s the magnitudes for all the stars, separately. But first things first...
@cwasdwa Please look at the code below; it should give you an idea. There may well be a better way. This solution is based on what I understood from your code.
with open('stars.csv', 'r') as mycsv:
    csv_stars = csv.reader(mycsv)
    next(csv_stars)  # skip header
    stars = list(csv_stars)
s = []  # star
d = []  # distances
for row in stars:
    newRow = []  # create a new row list to convert all '' to 0.0
    for x in row:
        if x == '':
            newRow.append(0.0)
        else:
            newRow.append(x)
    s.append(newRow)  # stars
    if row[30] == '':
        value = 0.0
    else:
        value = row[30]
    d.append(arcsec*AU*float(value))
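The answer above leaves the non-empty cells as strings. A possible variant, sketched below in Python 3, converts every numeric cell to float in one place. It rests on assumptions not stated in the question: that the first column is the star name and all remaining columns are numeric. The helper `to_float` and the three-column sample data are hypothetical, and the `arcsec*AU` scaling of the distance is left out.

```python
import csv
import io

def to_float(cell, default=0.0):
    """Convert a csv cell to float, mapping empty or blank strings to `default`."""
    cell = cell.strip()
    return float(cell) if cell else default

# Hypothetical sample standing in for stars.csv: name, one magnitude, distance.
sample = io.StringIO("name,mag,dist\nVega,0.03,7.68\nX1,,25.0\nX2,4.5,\n")
reader = csv.reader(sample)
next(reader)  # skip header

s = []  # one list per star: [name, mag, dist]
d = []  # distance column only
for row in reader:
    numbers = [to_float(x) for x in row[1:]]  # assumes column 0 is the name
    s.append([row[0]] + numbers)
    d.append(numbers[-1])  # assumes distance is the last column
```

Keeping the conversion in a single helper means the empty-string rule is applied the same way to s and d.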

Calculating the average of a column in csv per hour

I have a csv file that contains data in the following format.
Layer  relative_time  Ht      BSs  Vge   Temp  Message
57986  2:52:46        0.00m   87   15.4  None  CMSG
20729  0:23:02        45.06m  82   11.6  None  BMSG
20729  0:44:17        45.06m  81   11.6  None  AMSG
I want to read in this csv file and calculate the average BSs for every hour. My csv file is fairly large, about 2000 values. However, the values are not evenly distributed across the hours. For example,
I have 237 samples from hour 3 and only 4 samples from hour 6. I should also mention that the BSs can be collected from multiple sources. The value always ranges from 20 to 100. Because of this, the result is skewed. For each hour I calculate the sum of BSs for that hour divided by the number of samples in that hour.
The primary purpose is to understand how BSs evolves over time.
But what is the common approach to this problem? Is this where people apply normalization? It would be great if someone could explain how to apply normalization in such a situation.
The code I am using for my processing is shown below. I believe it is correct.
# This 2x24 matrix will contain the number of values recorded per hour
hours_no_values = [[0 for i in range(24)] for j in range(2)]
# This 2x24 matrix will contain mean bss stats per hour
mean_bss_stats = [[0 for i in range(24)] for j in range(2)]
with open(PREFINAL_OUTPUT_FILE) as fin, open(FINAL_OUTPUT_FILE, "w", newline='') as f:
    reader = csv.reader(fin, delimiter=",")
    writer = csv.writer(f)
    header = next(reader)  # <--- Pop header out
    writer.writerow([header[0],header[1],header[2],header[3],header[4],header[5],header[6]])  # <--- Write header
    sortedlist = sorted(reader, key=lambda row: datetime.datetime.strptime(row[1],"%H:%M:%S"), reverse=True)
    print(sortedlist)
    for item in sortedlist:
        rel_time = datetime.datetime.strptime(item[1], "%H:%M:%S")
        if rel_time.hour not in hours_no_values[0]:
            print('item[6] {}'.format(item[6]))
            if 'MAN' in item[6]:
                print('Hour found {}'.format(rel_time.hour))
                hours_no_values[0][rel_time.hour] = rel_time.hour
                mean_bss_stats[0][rel_time.hour] = rel_time.hour
                mean_bss_stats[1][rel_time.hour] += int(item[3])
                hours_no_values[1][rel_time.hour] += 1
            else:
                pass
        else:
            if 'MAN' in item[6]:
                print('Hour Previous {}'.format(rel_time.hour))
                mean_bss_stats[1][rel_time.hour] += int(item[3])
                hours_no_values[1][rel_time.hour] += 1
            else:
                pass
for i in range(0,24):
    if(hours_no_values[1][i] != 0):
        mean_bss_stats[1][i] = mean_bss_stats[1][i]/hours_no_values[1][i]
    else:
        mean_bss_stats[1][i] = 0
pprint.pprint('mean bss stats {} \n hour_no_values {} \n'.format(mean_bss_stats,hours_no_values))
The number of values for each hour, for hours 0 to 23, is as follows:
[31, 117, 85, 237, 3, 67, 11, 4, 57, 0, 5, 21, 2, 5, 10, 8, 29, 7, 14, 3, 1, 1, 0, 0]
You could do it with pandas, using groupby and aggregating the appropriate column:
import pandas as pd
import numpy as np
df = pd.read_csv("your_file")
df.groupby('hour')['BSs'].aggregate(np.mean)
If you don't have that column in the initial dataframe, you could add it:
df['hour'] = your_hour_data
numpy.mean calculates the mean of the array:
Compute the arithmetic mean along the specified axis.
pandas.groupby:
Group series using mapper (dict or key function, apply given function to group, return result as series) or by a series of columns
From pandas docs:
By “group by” we are referring to a process involving one or more of the following steps:
- Splitting the data into groups based on some criteria
- Applying a function to each group independently
- Combining the results into a data structure
Aggregation: computing a summary statistic (or statistics) about each group. Some examples:
- Compute group sums or means
- Compute group sizes / counts
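The split / apply / combine steps quoted above can also be sketched with the standard library alone. The example below is only an illustration: `hourly_mean` is a hypothetical helper, the hour is taken from the text before the first colon of relative_time, and the 'MAN' message filter from the question's code is left out. It returns the sample count next to each mean, so hours backed by only a handful of samples are easy to spot.

```python
from collections import defaultdict

def hourly_mean(rows):
    """rows: iterable of (relative_time 'H:MM:SS', BSs) pairs.

    Returns {hour: (mean_bss, sample_count)}.
    """
    totals = defaultdict(lambda: [0, 0])  # hour -> [sum of BSs, count]
    for rel_time, bss in rows:
        hour = int(rel_time.split(":")[0])  # split: choose the group key
        totals[hour][0] += bss              # apply: accumulate per group
        totals[hour][1] += 1
    # combine: one (mean, count) entry per hour that has data
    return {h: (total / n, n) for h, (total, n) in sorted(totals.items())}

# The three sample rows shown in the question.
rows = [("2:52:46", 87), ("0:23:02", 82), ("0:44:17", 81)]
stats = hourly_mean(rows)
```

Hours with no samples simply do not appear in the result, instead of being reported as a mean of 0.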

Egg dropping in worst case

I have been trying to write an algorithm to compute the maximum number of trials required in the worst case in the egg dropping problem. Here is my Python code:
def eggDrop(n,k):
    eggFloor=[ [0 for i in range(k+1) ] ]* (n+1)
    for i in range(1, n+1):
        eggFloor[i][1] = 1
        eggFloor[i][0] = 0
    for j in range(1, k+1):
        eggFloor[1][j] = j
    for i in range (2, n+1):
        for j in range (2, k+1):
            eggFloor[i][j] = 'infinity'
            for x in range (1, j + 1):
                res = 1 + max(eggFloor[i-1][x-1], eggFloor[i][j-x])
                if res < eggFloor[i][j]:
                    eggFloor[i][j] = res
    return eggFloor[n][k]
print eggDrop(2, 100)
The code outputs a value of 7 for 2 eggs and 100 floors, but the answer should be 14. I don't know what mistake I have made in the code. What is the problem?
The problem is in this line:
eggFloor=[ [0 for i in range(k+1) ] ]* (n+1)
You want this to create a list containing (n+1) lists of (k+1) zeroes. What the * (n+1) does is slightly different: it creates a list containing (n+1) references to the same inner list.
This is an important distinction - because when you start modifying entries in the list - say,
eggFloor[i][1] = 1
this actually changes element [1] of all of the lists, not just the ith one.
To instead create separate lists that can be modified independently, you want something like:
eggFloor=[ [0 for i in range(k+1) ] for j in range(n+1) ]
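The difference between the two constructions is easy to see on a small example. The names `shared` and `separate` below are only illustrative:

```python
shared = [[0] * 3] * 2                   # two references to ONE inner list
separate = [[0] * 3 for _ in range(2)]   # two independent inner lists

# Write to row 0 only, in both structures.
shared[0][1] = 1
separate[0][1] = 1

# In `shared`, both rows are the very same object, so row 1 changed too.
rows_are_same_object = shared[0] is shared[1]
```

After this runs, `shared` is `[[0, 1, 0], [0, 1, 0]]` while `separate` is `[[0, 1, 0], [0, 0, 0]]`.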
With this modification, the program returns 14 as expected.
(To debug this, it might have been a good idea to write a function to print out the eggFloor array and call it at various points in your program, so you can compare it with what you were expecting. It would soon become pretty clear what was going on!)
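For reference, here is a sketch of the same dynamic programme with the fix applied, rewritten for Python 3. It assumes n >= 1 and k >= 1, and it uses float("inf") instead of the string 'infinity', because Python 3 refuses to compare an int with a str. The names `egg_drop` and `trials` are my own.

```python
def egg_drop(n, k):
    """Worst-case number of trials with n eggs and k floors (n >= 1, k >= 1)."""
    # One independent row per egg count, built with a comprehension.
    trials = [[0] * (k + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        trials[i][1] = 1          # one floor needs one trial
    for j in range(1, k + 1):
        trials[1][j] = j          # one egg: try every floor from the bottom
    for i in range(2, n + 1):
        for j in range(2, k + 1):
            best = float("inf")
            for x in range(1, j + 1):
                # Drop from floor x: the egg breaks (i-1 eggs, x-1 floors)
                # or survives (i eggs, j-x floors); take the worse case.
                best = min(best, 1 + max(trials[i - 1][x - 1], trials[i][j - x]))
            trials[i][j] = best
    return trials[n][k]

result = egg_drop(2, 100)
```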

What's the correct way to expand a [0,1] interval to [a,b]?

Many random-number generators return floating numbers between 0 and 1.
What's the best and correct way to get integers between a and b?
Divide the interval [0,1] into B-A+1 bins.
Example: A=2, B=5
        [----+----+----+----]
        0   1/4  1/2  3/4   1
Maps to   2    3    4    5
The problem with the formula
Int (Rnd() * (B-A+1)) + A
is that your Rnd() generation interval is closed on both sides, so both 0 and 1 are possible outputs, and the formula gives B+1 (6 in the example above) when Rnd() is exactly 1.
In a truly random (not pseudo-random) distribution, 1 has probability zero. I think it is safe enough to program something like:
r = Rnd()
if r equal 1
    MyInt = B
else
    MyInt = Int(r * (B-A+1)) + A
endif
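The pseudocode above translates to Python roughly as follows. This is a sketch: `to_int_range` is a hypothetical name, and r is passed in as an argument so the edge cases can be checked directly. Python's own random.random() returns values in [0.0, 1.0), so the guard for r == 1 only matters for generators whose interval is closed at the top.

```python
import random

def to_int_range(r, a, b):
    """Map r in the closed interval [0, 1] to an integer in [a, b]."""
    if r >= 1.0:
        # Only reachable if the generator can return exactly 1.
        return b
    # b - a + 1 equal-width bins, one per integer in [a, b].
    return a + int(r * (b - a + 1))

value = to_int_range(random.random(), 2, 5)
```

With a=2 and b=5, r=0.25 lands in the second bin (3) and r=1.0 is clamped to 5 instead of producing 6.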
Edit
Just a quick test in Mathematica:
Define our function:
f[a_, b_] := If[(r = RandomReal[]) == 1, b, IntegerPart[r (b - a + 1)] + a]
Build a table with 3×10^5 numbers in [1,100]:
table = SortBy[Tally[Table[f[1, 100], {300000}]], First]
Check minimum and maximum:
In[137]:= {Max[First /@ table], Min[First /@ table]}
Out[137]= {100, 1}
Let's see the distribution:
BarChart[Last /@ SortBy[Tally[Table[f[1, 100], {300000}]], First],
 ChartStyle -> "DarkRainbow"]
X = (Rand() * (B - A)) + A
Another way to look at it, where r is your random number in the range 0 to 1:
(1-r)a + rb
As for your additional requirement of the result being an integer, maybe (apart from using built in casting) the modulus operator can help you out. Check out this question and the answer:
Expand a random range from 1–5 to 1–7
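The formula (1-r)a + rb is a plain linear interpolation, which can be sketched as below (`lerp` is my own name for it). Note that it yields real numbers in [a, b]. Simply truncating that result to an integer would make b appear only when r is exactly 1, which is why the integer versions in the other answers use b-a+1 bins instead.

```python
def lerp(r, a, b):
    """Map r in [0, 1] linearly onto the real interval [a, b]."""
    return (1 - r) * a + r * b

midpoint = lerp(0.5, 2, 5)
```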
Well, why not just look at how Python does it itself? Read random.py in your installation's lib directory.
After gutting it to only support the behavior of random.randint() (which is what you want) and removing all error checks for non-integer or out-of-bounds arguments, you get:
import random
def randint(start, stop):
    width = stop+1 - start
    return start + int(random.random()*width)
Testing:
>>> l = []
>>> for i in range(2000000):
...     l.append(randint(3,6))
...
>>> l.count(3)
499593
>>> l.count(4)
499359
>>> l.count(5)
501432
>>> l.count(6)
499616
>>>
Assuming r_a_b is the desired random (real) number between a and b, and r_0_1 is a random number between 0 and 1, the following should work just fine:
r_a_b = (r_0_1 * (b-a)) + a