Adding the results of multiple functions - function

Using Python 3.3.
I've stumbled upon another problem with my program. It's the same solar program. Again I decided to add more functionality. It's basically ad hoc: I'm adding things as I go along. I realize it can be made more efficient, but once I decide it's done, I'll post the whole code.
Anyway, I need to add the results from multiple functions. Here's part of my code:
def janCalc():
    for a in angle(0,360,10): #angle of orientation
        for d in days(1,32,1.0006630137): #day number of year
            for smodule in equation(): #equation() function not shown in this coding
                total_jan+=smodule #total_jan is already defined elsewhere
        avg_jan=total_jan/(60*(1.0006630137*31))
        ratio_jan=avg_jan/5.67
        calcJan=(ratio_jan*4.79)
        yield calcJan
        total_jan=0 #necessary to reset total to 0 for next angle interval
def febCalc():
    for a in angle(0,360,10):
        for d in days ((1.0006630137*31),61,1.0006630137):
            for smodule in equation():
                total_feb+=smodule
        avg_feb=total_feb/(60*(1.0006630137*28))
        ratio_feb=avg_feb/6.56
        calcFeb=(ratio_feb*4.96)
        yield calcFeb
        total_feb=0
#etc..............
Is there any way to add the yields of each function?
e.g. calcJan+calcFeb+.....
I would like to get the total of the results for each angle interval and then divide by 12 to get the average value per interval, like so:
0 degrees---->total/12
10 deg ---->total/12
20 deg ---->total/12
30 deg ---->total/12
........
360 deg ---->total/12
If you need more info, let me know.
ADDENDUM
The problem was essentially solved by @jonrsharpe, but I then ran into another issue:
Traceback (most recent call last):
File "C:\Users\User\Documents\Python\Solar program final.py", line 247, in <module>
output=[sum(vals)/12 for vals in zip(*(gen() for gen in months))]
File "C:\Users\User\Documents\Python\Solar program final.py", line 247, in <listcomp>
output=[sum(vals)/12 for vals in zip(*(gen() for gen in months))]
File "C:\Users\User\Documents\Python\Solar program final.py", line 103, in janCalc
for smodule in equation():
File "C:\Users\User\Documents\Python\Solar program final.py", line 63, in equation
d=math.asin(math.sin(math.radians(23.45))*math.sin(math.radians((360/365.242)*(d-81))))
NameError: global name 'd' is not defined
I've isolated it to:
for d in days ((1.0006630137*31),61,1.0006630137):
for smodule in equation():
It seems I can't call a function from inside a function? I'm not too sure. So even my original code did not work. I assumed it was working because previously I had not defined each month as a function. I should have tested it first.
Do you know how to get around this?
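One likely reading of the traceback, offered as a sketch rather than a confirmed diagnosis: d is a local loop variable inside janCalc, while equation() looks d up as a global name, so the name is not visible there. Passing the day in as an argument avoids the lookup. The names declination, equation(day) and jan_calc below are hypothetical stand-ins, since the real equation() is not shown.

```python
import math

def declination(day):
    # Same formula as the line in the traceback, with the day passed in explicitly.
    return math.asin(
        math.sin(math.radians(23.45))
        * math.sin(math.radians((360 / 365.242) * (day - 81)))
    )

def equation(day):
    # Hypothetical generator standing in for the real equation().
    yield declination(day)

def jan_calc():
    # The loop variable d is local here, so it is handed to equation() as an argument.
    for d in range(1, 4):
        for smodule in equation(d):
            yield smodule

print(list(jan_calc()))
```

Calling a function from inside another function is fine in itself; only the names local to the caller are out of reach for the callee.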

A simple example to demonstrate how to combine multiple generators:
>>> def gen1():
        for x in range(5):
            yield x
>>> def gen2():
        for x in range(5, 10):
            yield x
>>> [sum(vals) for vals in zip(*(gen() for gen in (gen1, gen2)))]
[5, 7, 9, 11, 13]
Or, written out long hand:
output = list(gen1())
for index, value in enumerate(gen2()):
    output[index] += value
You can modify either version to include a division, too, so your case would look something like:
months = [janCalc, febCalc, ...]
output = [sum(vals) / 12 for vals in zip(*(gen() for gen in months))]
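A small runnable sketch of the averaging version, assuming two hypothetical generators (month_a and month_b) in place of the real monthly functions, and dividing by the number of generators rather than a hard-coded 12:

```python
def month_a():
    # Hypothetical stand-in for janCalc: one value per angle interval.
    yield from (1.0, 2.0, 3.0)

def month_b():
    # Hypothetical stand-in for febCalc.
    yield from (3.0, 4.0, 5.0)

months = [month_a, month_b]

# zip pairs up the n-th value of every generator; each tuple is one angle interval.
averages = [sum(vals) / len(months) for vals in zip(*(gen() for gen in months))]
print(averages)  # [2.0, 3.0, 4.0]
```

Note that zip stops at the shortest generator, so every monthly function needs to yield the same number of values.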

Related

Use of function / return

I had the task to code the following:
Take a list of integers and return the value of these numbers added up, but only if they are odd.
Example input: [1,5,3,2]
Output: 9
I did the code below and it worked perfectly.
numbers = [1,5,3,2]
print(numbers)
add_up_the_odds = []
for number in numbers:
    if number % 2 == 1:
        add_up_the_odds.append(number)
print(add_up_the_odds)
print(sum(add_up_the_odds))
Then I tried to re-code it using function definition / return:
def add_up_the_odds(numbers):
    odds = []
    for number in range(1,len(numbers)):
        if number % 2 == 1:
            odds.append(number)
    return odds
numbers = [1,5,3,2]
print (sum(odds))
But I couldn't get it to work. Can anybody help with that?
Note: I'm going to assume Python 3.x
It looks like you're defining your function, but never calling it.
When the interpreter finishes going through your function definition, the function is now there for you to use - but it never actually executes until you tell it to.
Between the last two lines in your code, you need to call add_up_the_odds() on your numbers array, and assign the result to the odds variable.
i.e. odds = add_up_the_odds(numbers)
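Calling the function fixes the undefined odds name, but the rewritten loop also iterates over range(1, len(numbers)), that is, over positions rather than the values in the list, so it would still not produce 9. A minimal corrected sketch, assuming the intent is the same as in the first version:

```python
def add_up_the_odds(numbers):
    """Return a list of the odd values in numbers."""
    odds = []
    for number in numbers:  # iterate over the values, not over range(...)
        if number % 2 == 1:
            odds.append(number)
    return odds

numbers = [1, 5, 3, 2]
odds = add_up_the_odds(numbers)  # the function only runs when it is called
print(sum(odds))  # 9
```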

Columns of Data Frame are Being Swapped: Why is my loop switching the column values when I identify and assign the columns by name?

I need help with the specific code I will paste below. I am using the Ames Housing data set collected by Dean De Cock.
I am using a Python notebook and editing through Anaconda's Jupyter Lab 2.1.5.
The code below is supposed to replace all np.nan or "None" values. For some reason, after repeatedly calling a hand-made function inside a for loop, the columns of the resulting data frame get swapped around.
Note: I am aware I could do this with an "imputer." I plan to select numeric and object type features, impute them separately then put them back together. As a side-note, is there any way I can do that while having the details I output manually using text displayed or otherwise verified?
In the cell in question, the flow is:
1. Get and assign the number of data points in the data frame df_train.
2. Get and assign a series that lists the count of null values in df_train. The syntax is sr_null_counts = df_train.isnull().sum().
3. Create an empty list to which the names of features with more than 5% of their values null are appended. They will be dropped later, outside the for loop. I thought at first that this was the problem, since the command to drop the columns of df_train in-place used to be within the for-loop.
4. Repeatedly call a hand-made function to impute columns whose null values do not exceed 5% of the row count of df_train. I used a function that has a for-loop and nested try-except statements to:
   - Accept a series and, optionally, the series' name when it was a column in a dataframe. It assigns a copy of the passed series to a local variable.
   - In this exact order: (a) try to replace all null (NaN or None) values with the mean of the passed series; (b) if that fails, try to replace all null values with the median of the series; (c) if even that fails, replace all null values with the mode of the series.
   - Return the edited copy of the series with all null values replaced. It should also print out strings that tell me what feature was modified and what summary statistic was used to replace/impute the missing values.
5. The final line drops all the columns marked as having more than 5% missing values.
Here is the full code:
Splitting the main dataframe into a train and test set.
The full data set was loaded through df_housing = pd.read_csv(sep = '\t', filepath_or_buffer = "AmesHousing.tsv").
def make_traintest(df, train_fraction = 0.7, random_state_val = 88):
    df = df.copy()
    df_train = df.sample(frac = train_fraction, random_state = random_state_val)
    bmask_istrain = df.index.isin(df_train.index.values)
    df_test = df.loc[ ~bmask_istrain ]
    return {
        "train":df_train,
        "test":df_test
    }
dict_traintest = make_traintest(df = df_housing)
df_train = dict_traintest["train"]
df_test = dict_traintest["test"]
Get a List of Columns With Null Values
lst_have_nulls = []
for feature in df_housing.columns.values.tolist():
    nullcount = df_housing[feature].isnull().sum()
    if nullcount > 0:
        lst_have_nulls.append(feature)
        print(feature, "\n=====\nNull Count:\t", nullcount, '\n', df_housing[feature].value_counts(dropna = False),'\n*****')
Definition of the hand-made function:
def impute_series(sr_values, feature_name = ''):
    sr_out = sr_values.copy()
    try:
        sr_out.fillna(value = sr_values.mean())
        print("Feature", feature_name, "imputed with mean:", sr_values.mean())
    except Exception as e:
        print("Filling NaN values with mean of feature", feature_name, "caused an error:\n", e)
        try:
            sr_out.fillna(value = sr_values.median())
            print("Feature", feature_name, "imputed with median:", sr_values.median())
        except Exception as e:
            print("Filling NaN values with median for feature", feature_name, "caused an error:\n", e)
            sr_out.fillna(value = sr_values.mode())
            print("Feature", feature_name, "imputed with mode:", sr_values.mode())
    return sr_out
For-Loop
Getting the count of null values, defining the empty list of columns to drop to allow appending, and repeatedly
doing the following: For every column in lst_have_nulls, check if the column has equal, less or more than 5% missing values.
If more, append the column to lst_drop. Else, call the hand-made imputing function. After the for-loop, drop all columns in
lst_drop, in-place.
Where did I go wrong? In case you need the entire notebook, I have uploaded it to Kaggle. Here is a link.
https://www.kaggle.com/joachimrives/ames-housing-public-problem
Update: Problem Still Exists After Testing Anvar's Answer with Changes
When I tried the code of Anvar Kurmukov, my dataframe column values still got swapped. The change I made was adding int and float to the list of dtypes to check. The changes are inside the for-loop:
if dtype in [np.int64, np.float64, int, float].
It may be a problem with another part of my code in the full notebook. I will need to check where it is by calling df_train.info() cell by cell from the top. I tested the code in the notebook I made public. It is in cell 128. For some reason, after running Anvar's code, the df_train.info() method returned this:
1st Flr SF 2nd Flr SF 3Ssn Porch Alley Bedroom AbvGr Bldg Type Bsmt Cond Bsmt Exposure Bsmt Full Bath Bsmt Half Bath ... Roof Style SalePrice Screen Porch Street TotRms AbvGrd Total Bsmt SF Utilities Wood Deck SF Year Built Year Remod/Add
1222 1223 534453140 70 RL 50.0 4882 Pave NaN IR1 Bnk ... 0 0 0 0 0 NaN NaN NaN 0 87000
1642 1643 527256040 20 RL 81.0 13870 Pave NaN IR1 HLS ... 52 0 0 174 0 NaN NaN NaN 0 455000
1408 1409 905427050 50 RL 66.0 21780 Pave NaN Reg Lvl ... 36 0 0 144 0 NaN NaN NaN 0 185000
1729 1730 528218050 60 RL 65.0 10237 Pave NaN Reg Lvl ... 72 0 0 0 0 NaN NaN NaN 0 178900
1069 1070 528180110 120 RL 58.0 10110 Pave NaN IR1 Lvl ... 48 0 0 0 0 NaN NaN NaN 0 336860
tl;dr instead of try: except you should simply use if and check dtype of the column; you do not need to iterate over columns.
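import numpy as np

# Drop columns with more than 5% missing values (drop returns a new frame).
drop_columns = df.columns[df.isna().sum() / df.shape[0] > 0.05]
df = df.drop(drop_columns, axis=1)
num_columns = []
cat_columns = []
for col, dtype in df.dtypes.items():  # iteritems() was removed in pandas 2.0
    if dtype in [np.int64, np.float64]:
        num_columns.append(col)
    else:
        cat_columns.append(col)
df[num_columns] = df[num_columns].fillna(df[num_columns].mean())
# mode() returns a DataFrame; take its first row as the per-column fill values.
df[cat_columns] = df[cat_columns].fillna(df[cat_columns].mode().iloc[0])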
A short comment on the make_traintest function: I would simply return two separate DataFrames instead of a dictionary, or use sklearn.model_selection.train_test_split.
Update: You can check the number of NaN values in a column, but it is unnecessary if your only goal is to impute NaNs.
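To make the dtype-based approach concrete, here is a minimal sketch on a hypothetical four-row frame (the column names area and style are invented for illustration, not taken from the Ames data). It uses select_dtypes instead of a manual loop, which is a swapped-in convenience rather than part of the answer above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the housing data.
df = pd.DataFrame({
    "area": [100.0, np.nan, 300.0, 200.0],
    "style": ["a", "b", None, "b"],
})

# Split columns by dtype instead of relying on try/except.
num_columns = df.select_dtypes(include="number").columns
cat_columns = df.columns.difference(num_columns)

# Numeric columns get their mean; other columns get their most frequent value.
df[num_columns] = df[num_columns].fillna(df[num_columns].mean())
df[cat_columns] = df[cat_columns].fillna(df[cat_columns].mode().iloc[0])

print(df)
```

Assigning the result back by column name keeps every column in place, so this step on its own should not reorder or swap columns.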
Answer
I discovered why my columns appeared to be swapped. They were not actually being swapped. The original problem was that I had not set the "Order" column as the index column. To fix the problem in the notebook on my PC, I simply added the following parameter and value to pd.read_csv: index_col = "Order". That fixed the problem in my local notebook. When I tried it on the Kaggle notebook, however, it did not fix the problem.
The version of the Ames Housing data set I first used on the notebook - for some reason - was also the cause for the column swapping.
Anvar's Code is fine. You may test the code I wrote, but to be safe, defer to Anvar's code. Mine is still to be tested.
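As an illustration of what index_col changes, here is a sketch that reads a hypothetical two-row, tab-separated stand-in for the data file from memory (the values are placeholders, not real rows):

```python
import io
import pandas as pd

# Hypothetical two-row stand-in for AmesHousing.tsv (tab-separated).
tsv = "Order\tPID\tSalePrice\n1\t526301100\t215000\n2\t526350040\t105000\n"

# Without index_col, "Order" is an ordinary data column and the index is 0, 1, ...
default_index = pd.read_csv(io.StringIO(tsv), sep="\t")

# With index_col="Order", that column becomes the row index instead.
order_index = pd.read_csv(io.StringIO(tsv), sep="\t", index_col="Order")

print(default_index.columns.tolist())  # ['Order', 'PID', 'SalePrice']
print(order_index.columns.tolist())    # ['PID', 'SalePrice']
print(order_index.index.tolist())      # [1, 2]
```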
Testing Done
I modified the Kaggle notebook I linked in my question. I used the data set I was actually working in with my PC. When I did that, the code given by Anvar Kurmukov's answer worked perfectly. I tested my own code and it seems fine, but test both versions before trying. I only reviewed the data sets using head() and manually checked the column inputs. If you want to check the notebook, here it is:
https://www.kaggle.com/joachimrives/ames-housing-public-problem/
To test whether the data set was at fault, I created two data frames. One was taken directly from my local file uploaded to Kaggle. The other used the current version of the Ames Iowa Housing data set I had used as input. The columns were properly "aligned" with their expected input. To find the expected column values, I used this source:
http://jse.amstat.org/v19n3/decock/DataDocumentation.txt
Here are the screenshots of the different results I got when I swapped data sets:
With an uploaded copy of my local file:
With the original AmesHousing.csv From Notebook Version 1:
The data set I Used that Caused the Column-swap on the Kaggle Notebook
https://www.kaggle.com/marcopale/housing

Using 2 different outputs of 'return' of a function in separate elements of a plot

I am drawing a plot of voltage per time. For the voltage values, I want the values to be evaluated by a 'scaling' function which converts the values from volts to kilovolts if the biggest element is higher than 1000 volts (11000 volts to 11 KILOvolts).
This function is supposed to return 2 separate outputs; one for (new) values of voltage and one for the unit. The values are fed into the y axis values of the plot and the unit is given to the labeling line of that axis. For example:
import numpy as np
import matplotlib.pyplot as plt
time = np.array([0, 1, 2, 3])
system_voltage1 = np.array([110, 120, 130, 150])
system_voltage2 = np.array([11000, 12000, 13000, 15000])
def scaling_function(input):
    if np.amax(input) < 1000:
        output = input/1
        Voltage_label = 'Voltage in Volts'
    if np.amax(input) > 1000:
        output = input/1000
        Voltage_label = 'Voltage in KILOVolts'
    return(output, Voltage_label)
fig14 = plt.figure(figsize=(16,9))
ax1 = fig14.add_subplot(111)
l1, = ax1.plot(time, scaling_function(system_voltage), color='r')
ax1.set_xlabel("time in second", color='k')
ax1.set_ylabel(Voltage_label, color='k')
Now, I am having trouble calling this function properly. I need ax1.plot to receive only the scaled values from scaling_function(system_voltage), and ax1.set_ylabel(Voltage_label, color='k') to receive Voltage_label.
A) My problem: I don't know how to write the code so that only the first output is used in the plot call and the second output is used for the axis label.
B) Something I tried that didn't work: referring to Voltage_label directly. The name is not recognized outside scaling_function, because it is local to the function and not assigned globally.
Can anyone help me with this?
y,l = scaling_function(system_voltage)
l1, = ax1.plot(time, y, color='r')
ax1.set_xlabel("time in second", color='k')
ax1.set_ylabel(l, color='k')
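The lines above rely on tuple unpacking: a function that returns two values returns a tuple, and y, l = ... splits it into two names. A self-contained sketch without the plotting calls, assuming an if/else so that a maximum of exactly 1000 volts is also handled (the original leaves that case undefined), and using values as a hypothetical parameter name in place of input:

```python
import numpy as np

def scaling_function(values):
    """Return (scaled_values, axis_label) for an array of voltages in volts."""
    if np.amax(values) > 1000:
        return values / 1000, "Voltage in kilovolts"
    return values, "Voltage in volts"

system_voltage2 = np.array([11000, 12000, 13000, 15000])

# Unpack the returned tuple: first item for the y axis, second for the label.
y, label = scaling_function(system_voltage2)
print(y.tolist(), label)
```

The two names can then be passed to ax1.plot and ax1.set_ylabel respectively, as in the answer.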

A function I created can only be called once, second time it shows an error

I created a function to draw a bargraph with turtle. The first call works fine but fails when calling it for the second time:
"File "C:\Users\NTC\AppData\Local\Programs\Python\Python37\lib\turtle.py", line 1292, in _incrementudc raise Terminator turtle.Terminator"
The only thing I tried was using t.terminator at the end, with the same result.
def bar_chart():
    t = turtle.Turtle()
    screen = turtle.Screen()
    ##snip # lines detailing the lines drawing
    for value in range(1,11): # the upper limit has to be more than the highest value in the "count"
        t.forward(20)
        t.write((" " + str(value)), align="left",font=("Arial", 8, "normal"))
    screen.exitonclick()
I just expect to be able to call it multiple times in a looped program.
The screen.exitonclick() function has no business being in a function that you plan to call multiple times. It's logically the last thing you do in your program. You also shouldn't keep allocating turtles to do the same thing, allocate one and reuse it. Something like:
from turtle import Screen, Turtle
def bar_chart(t):
    for value in range(1, 11): # the upper limit has to be more than highest value in the "count"
        t.forward(20)
        t.write((" " + str(value)), align="left", font=("Arial", 8, "normal"))
screen = Screen()
turtle = Turtle()
bar_chart(turtle)
screen.exitonclick()

How do I write a function that takes the average of a list of numbers

I want to avoid importing different modules as that is mostly what I have found while looking online. I am stuck with this bit of code and I don't really know how to fix it or improve on it. Here's what I've got so far.
def avg(lst):
    '''lst is a list that contains lists of numbers; the
    function prints, one per line, the average of each list'''
    for i[0:-1] in lst:
        return (sum(i[0:-1]))//len(i)
Again, I'm quite new and for loops are quite confusing to me, so I would appreciate help getting the output for, say, a list of grades to be separate lines containing the averages. If for lst I passed grades = [[95,92,86,87], [66,54], [89,72,100], [33,0,0]], it should print 4 lines, one with the average of each sublist. I am also to assume in the function that the sublists could have any number of grades, but I can assume that the lists are non-empty.
Edit1: @jramirez, could you explain what that is doing differently from mine? I don't doubt that it is better or that it will work, but I still don't really understand how to recreate this myself... regardless, thank you.
I think this is what you want:
def grade_average(grades):
    for grade in grades:
        avg = 0
        for num in grade:
            avg += num
        avg = avg / len(grade)
        print ("Average for " + str(grade) + " is = " + str(avg))
if __name__ == '__main__':
    grades = [[95,92,86,87],[66,54],[89,72,100],[33,0,0]]
    grade_average(grades)
Result:
Average for [95, 92, 86, 87] is = 90.0
Average for [66, 54] is = 60.0
Average for [89, 72, 100] is = 87.0
Average for [33, 0, 0] is = 11.0
Problems with your code: the extraneous indexing of i; the use of // to truncate the average (use round if you want to round it); and the use of return in the loop, so it would stop after the first average. Your docstring says 'print' but you return instead. This is actually a good thing. Functions should not print the result they calculate, as that makes the answer inaccessible to further calculation. Here is how I would write this, as a generator function.
def averages(gradelists):
    '''Yield average for each gradelist.'''
    for glist in gradelists:
        yield sum(glist) /len(glist)
print(list(averages(
    [[95,92,86,87], [66,54], [89,72,100], [33,0,0]])))
[90.0, 60.0, 87.0, 11.0]
To return a list, change the body of the function to (beginner version)
    ret = []
    for glist in gradelists:
        ret.append(sum(glist) /len(glist))
    return ret
or (more advanced, using list comprehension)
    return [sum(glist) /len(glist) for glist in gradelists]
However, I really recommend learning about iterators, generators, and generator functions (defined with yield).
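As a small illustration of why generator functions are worth learning, this sketch shows that the generator produces one average at a time, only when asked, and that the caller decides whether to print, collect, or keep computing with the values:

```python
def averages(gradelists):
    '''Yield average for each gradelist.'''
    for glist in gradelists:
        yield sum(glist) / len(glist)

gen = averages([[95, 92, 86, 87], [66, 54], [89, 72, 100]])

first = next(gen)  # computes only the first average: 90.0
rest = list(gen)   # consumes the remaining ones: [60.0, 87.0]

# Printing one average per line is now the caller's choice.
for value in [first] + rest:
    print(value)
```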