Pandas function that iterates over values in a series with case statements

I have a dataframe that contains a column of integers. I want to write a function that takes a series as an argument, iterates through each value of the series, applies a case statement to each integer, and returns a new series built from the results. Currently I'm working with the following code and getting errors:
def function(series):
    if series['column_of_ints'] >= 0 and series['column_of_ints'] < 100:
        return series['column_of_ints']
    elif series['column_of_ints'] >= 100 and series['column_of_ints'] < 200:
        return series['column_of_ints'] + 1
    else:
        return series['column_of_ints'] + 2

df['column_of_ints_v2'] = df['column_of_ints'].apply(function, axis=1)

Don't use apply; you can achieve the same result much faster using three .loc calls:
df.loc[(df['column_of_ints'] >= 0) & (df['column_of_ints'] < 100), 'column_of_ints_v2'] = df['column_of_ints']
df.loc[(df['column_of_ints'] >= 100) & (df['column_of_ints'] < 200), 'column_of_ints_v2'] = df['column_of_ints'] + 1
df.loc[(df['column_of_ints'] < 0) | (df['column_of_ints'] >= 200), 'column_of_ints_v2'] = df['column_of_ints'] + 2
Or using np.where:
df['column_of_ints_v2'] = np.where(
    (df['column_of_ints'] >= 0) & (df['column_of_ints'] < 100),
    df['column_of_ints'],
    np.where(
        (df['column_of_ints'] >= 100) & (df['column_of_ints'] < 200),
        df['column_of_ints'] + 1,
        df['column_of_ints'] + 2))
As to why your code fails:
df['column_of_ints'].apply(function, axis=1)
df['column_of_ints'] is a Series, not a DataFrame, and Series.apply has no axis parameter. You can force a DataFrame by using double square brackets:
df[['column_of_ints']].apply(function, axis=1)
If you're applying element-wise to a single column, call df['column_of_ints'].apply(function) instead. Each call then receives a single value, so you don't need the column accessors in your function:
def function(series):
    if series >= 0 and series < 100:
        return series
    elif series >= 100 and series < 200:
        return series + 1
    else:
        return series + 2
But really you should use a vectorised method like my proposal above.
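
As a further illustration of the vectorised approach, here is a minimal sketch using np.select, which scales more readably than nested np.where calls when there are several cases. The sample values in the dataframe are made up for the example; the column names follow the question.

```python
import numpy as np
import pandas as pd

# Made-up sample data; only the column name comes from the question.
df = pd.DataFrame({"column_of_ints": [-5, 0, 50, 100, 150, 200, 250]})
col = df["column_of_ints"]

# One boolean mask per case, checked in order.
conditions = [
    (col >= 0) & (col < 100),
    (col >= 100) & (col < 200),
]
# The value to use where the matching mask is True.
choices = [col, col + 1]

# Rows matching no condition (negative or >= 200) fall through to the default.
df["column_of_ints_v2"] = np.select(conditions, choices, default=col + 2)
print(df)
```

Each condition lines up with one choice, so adding another case means adding one entry to each list rather than nesting another call.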

Related

For loop with different number of iterations based on datetime

I am trying to get hourly data from a JSON file for a 34-month period. To do this I have created a date range which I use in a nested loop to get data for each day for all 24 hours. This works fine.
However, because of daylight saving time, there are only 23 hourly observations on 3 dates, the first being 2020-03-29. Therefore, I would like to loop only 23 times on this date, since my loop crashes otherwise.
Below is my code. Right now it fails on the date check with SyntaxError: invalid syntax, and there is a high risk it will fail on something else once this is fixed.
Thank you.
start_date = date(2020, 1, 1)
end_date = date(2022, 11, 1)

def daterange(start_date, end_date):
    for n in range(int((end_date - start_date).days)):
        yield start_date + timedelta(n)

parsing_range_svk = []
for single_date in daterange(start_date, end_date):
    single = single_date.strftime("%Y-%m-%d")
    parsing_range_svk.append(single)
######################################
svk = []
for i in parsing_range_svk:
    data_json_svk = json.loads(urlopen("https://www.svk.se/services/controlroom/v2/situation?date={}&biddingArea=SE1".format(i)).read())
    if i == '2020-03-29'
        for i in range(23):
            rows = data_json_svk['Data'][0]['data'][i]['y']
    else:
        for i in range(24):
            rows = data_json_svk['Data'][0]['data'][i]['y']
    svk.append(rows)

Don't check explicitly for a date. Instead, use a list comprehension to get the values you need (it works correctly for both 23-hour and 24-hour days):
import json
from urllib.request import urlopen
from datetime import date, timedelta

start_date = date(2020, 1, 1)
end_date = date(2022, 11, 1)

def daterange(start_date, end_date):
    for n in range(int((end_date - start_date).days)):
        yield start_date + timedelta(n)

parsing_range_svk = []
for single_date in daterange(start_date, end_date):
    single = single_date.strftime("%Y-%m-%d")
    parsing_range_svk.append(single)
######################################
url = "https://www.svk.se/services/controlroom/v2/situation?date={}&biddingArea=SE1"
svk = []
for i in parsing_range_svk:
    data_json_svk = json.loads(urlopen(url.format(i)).read())
    # Take however many hourly entries the day actually has
    svk.append([v["y"] for v in data_json_svk["Data"][0]["data"]])
print(svk)
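
To see why the comprehension copes with short days, here is a small offline sketch. The two payloads are fabricated stand-ins that only mimic the nesting used above ("Data" -> first element -> "data" -> "y"); the real service's response may contain more fields, and hourly_values is a hypothetical helper name.

```python
import json

# Fabricated payloads: one normal day (24 points) and one DST day (23 points).
normal_day = json.dumps(
    {"Data": [{"data": [{"x": hour, "y": 100 + hour} for hour in range(24)]}]}
)
short_day = json.dumps(
    {"Data": [{"data": [{"x": hour, "y": 200 + hour} for hour in range(23)]}]}
)

def hourly_values(raw_json):
    """Return every 'y' value present, without assuming a fixed hour count."""
    payload = json.loads(raw_json)
    return [point["y"] for point in payload["Data"][0]["data"]]

svk = [hourly_values(raw) for raw in (normal_day, short_day)]
print([len(day) for day in svk])
```

Because the comprehension iterates over whatever entries exist, no index ever goes past the end of the list, so no date needs special handling.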

Why isn't my counter going up in my class function? (Python)

The counters in the functions "getWeight" and "getPrice" do not go up as I planned. They always revert to the original value, whatever I enter in the console. How can I fix this error?
import math

class ShippingCharges:
    def __init__(self, userInput=None, packNum= None, packPrice = None):
        self.userInput = userInput
        self.packNum = packNum
        self.packPrice = packPrice

    def getPrice (self):
        if (self.userInput <= 2):
            print("Package", self.packNum," will cost $1.10 per pound\n")
        elif(6 >= self.userInput and 2 < self.userInput):
            print("Package", self.packNum, " will cost $2.20 per pound\n")
        elif(10 >= self.userInput and 6 < self.userInput):
            print("Package", self.packNum, "will cost 3.70 per pound\n")
        elif(self.userInput > 10):
            print("Package ", self.packNum, "will cost $3.80 per pound\n")

    def getWeight(self):
        return self.userInput + self.userInput

    def getPrice(self):
        if (self.userInput <= 2):
            return self.packPrice + 1.10
        elif(6 >= self.userInput and 2 < self.userInput):
            return self.packPrice+ 2.20
        elif(10 >= self.userInput and 6 < self.userInput):
            return self.packPrice + 3.70
        elif(self.userInput > 10):
            return self.packPrice + 3.80

    def displayInfo(self):
        print("The total price is: %.2f" % self.getPrice())
        print("The total weight of packages is: %.2f" % self.getWeight())

def main():
    x = 0
    userResponse = "y"
    packNum = 1
    packPrice = 0
    while(x != 1):
        userInput = eval(input("Enter the weight of package in pounds: "))
        if(userInput >0):
            package = ShippingCharges(userInput, packNum, packPrice)
            packNum = packNum + 1
            userResponse = input("Would you like to send another package? y/n \n")
            if(userResponse == "n"):
                break
        elif(userInput <= 0):
            print("Package must be greater than 0")
    package.displayInfo()

main()
If I type "5" 5 times as the userInput, why is the output:
The total charge for all packages is: 2.2
The total weight of all packages is: 10.00
Why is it not?
The total charge for all packages is: 11.00
The total weight of all packages is: 25.00
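
A likely explanation, going only by the code shown: each pass through the loop builds a brand-new ShippingCharges object with packPrice set to 0, and only the last object is displayed, so nothing accumulates. getWeight also returns userInput + userInput, which is why 5 shows up as 10.00. The sketch below is one possible redesign, not the original author's: a single object keeps running totals. The names add_package, rate_for and display_info are hypothetical, and it assumes (from the expected 11.00) that the total is the sum of the per-pound rates.

```python
class ShippingCharges:
    """Keeps running totals across all packages (hypothetical redesign)."""

    def __init__(self):
        self.package_count = 0
        self.total_weight = 0.0
        self.total_price = 0.0

    @staticmethod
    def rate_for(weight):
        # Rate brackets copied from the question's getPrice method.
        if weight <= 2:
            return 1.10
        elif weight <= 6:
            return 2.20
        elif weight <= 10:
            return 3.70
        return 3.80

    def add_package(self, weight):
        if weight <= 0:
            raise ValueError("Package must be greater than 0")
        self.package_count += 1
        self.total_weight += weight
        # Assumption: the total is the sum of per-pound rates, as in the
        # expected output from the question (5 x 2.20 = 11.00).
        self.total_price += self.rate_for(weight)

    def display_info(self):
        print("The total price is: %.2f" % self.total_price)
        print("The total weight of packages is: %.2f" % self.total_weight)


charges = ShippingCharges()  # created once, outside the loop
for weight in [5, 5, 5, 5, 5]:
    charges.add_package(weight)
charges.display_info()
```

In an interactive version, the object would be created before the while loop and add_package called once per valid input.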

writing function , giving input and printing them with a condition

I want to write a function that accepts a parameter and prints output that depends on the input. My program is raising a KeyError. I am looking for output like:
This number is less than 0 and it's spelling is one hundred
thirteen
and my code is:
def word(num):
    d1= {0:'Zero',1:'One',2:'Two',3:'Three',4:'Four',5:'Five',6:'Six',7:'Seven',8:'Eight',9:'Nine',10:'Ten',11:'Eleven',12:'Twelve',13:'Thirteen',14:'Fourteen',15:'Fifteen',16:'Sixteen',17:'Seventeen',18:'Eighteen',19:'Ninteen',20:'Twenty',30:'Thirty',40:'Fourty',50:'Fifty',60:'Sixty',70:'Seventy',80:'Eighty',90:'Ninty'}
    if (num<20):
        return d1[num]
    if (num<100):
        if num % 10 == 0:
            return d1[num]
        else:
            return d1[num // 10 * 10] + ' ' + d1[num % 10]
    if (num < 0):
        return "This number is less than 0 and it's spelling is" + word(num)

print (word(- 100))
print (word(13))

You need to check for negative numbers before the other conditions.
Your code has three if conditions, num < 20, num < 100 and num < 0, and they are checked in that order. A negative number already satisfies num < 20, so d1[num] raises the KeyError and the num < 0 branch at the bottom is never reached.
Reordering the checks to num < 0, num < 20, num < 100 fixes this issue.
Update: You can't call word(num) in your num < 0 block, because it would call itself forever with the same negative number. I don't understand what "is one hundred" means in your expected output. Is it hardcoded text? Then hardcode it, for example:
def word(num):
    d1= {0:'Zero',1:'One',2:'Two',3:'Three',4:'Four',5:'Five',6:'Six',7:'Seven',8:'Eight',9:'Nine',10:'Ten',11:'Eleven',12:'Twelve',13:'Thirteen',14:'Fourteen',15:'Fifteen',16:'Sixteen',17:'Seventeen',18:'Eighteen',19:'Nineteen',20:'Twenty',30:'Thirty',40:'Forty',50:'Fifty',60:'Sixty',70:'Seventy',80:'Eighty',90:'Ninety'}
    # Handle negative numbers first so they never reach the dictionary lookups
    if num < 0:
        return "This number is less than 0 and it's spelling is one hundred"
    if num < 20:
        return d1[num]
    if num < 100:
        if num % 10 == 0:
            return d1[num]
        else:
            return d1[num // 10 * 10] + ' ' + d1[num % 10]

print(word(-100))
print(word(13))
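
If the "one hundred" in the expected output is meant to be the spelling of the number's absolute value rather than fixed text, one possible variation is sketched below. This is a guess at the intent, not something the question confirms. The helper name spell and the 0 to 999 range limit are assumptions made for the example.

```python
ONES = {0: 'Zero', 1: 'One', 2: 'Two', 3: 'Three', 4: 'Four', 5: 'Five',
        6: 'Six', 7: 'Seven', 8: 'Eight', 9: 'Nine', 10: 'Ten',
        11: 'Eleven', 12: 'Twelve', 13: 'Thirteen', 14: 'Fourteen',
        15: 'Fifteen', 16: 'Sixteen', 17: 'Seventeen', 18: 'Eighteen',
        19: 'Nineteen'}
TENS = {20: 'Twenty', 30: 'Thirty', 40: 'Forty', 50: 'Fifty',
        60: 'Sixty', 70: 'Seventy', 80: 'Eighty', 90: 'Ninety'}

def spell(num):
    """Spell a whole number from 0 to 999 (range limit is an assumption)."""
    if not 0 <= num <= 999:
        raise ValueError("spell() only supports 0..999 in this sketch")
    if num < 20:
        return ONES[num]
    if num < 100:
        tens, ones = divmod(num, 10)
        if ones == 0:
            return TENS[tens * 10]
        return TENS[tens * 10] + ' ' + ONES[ones]
    hundreds, rest = divmod(num, 100)
    head = ONES[hundreds] + ' Hundred'
    return head if rest == 0 else head + ' ' + spell(rest)

def word(num):
    # Negative check comes first; the absolute value is what gets spelled.
    if num < 0:
        return "This number is less than 0 and its spelling is " + spell(-num)
    return spell(num)

print(word(-100))
print(word(13))
```

Passing -num to spell avoids the endless recursion that calling word(num) with the same negative value would cause.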

Tweaking a Function in Python

I am trying to get the following code to do a few more tricks:
class App(Frame):
    def __init__(self, master):
        Frame.__init__(self, master)
        self.grid()
        self.create_widgets()

    def create_widgets(self):
        self.answerLabel = Label(self, text="Output List:")
        self.answerLabel.grid(row=2, column=1, sticky=W)

    def psiFunction(self):
        j = int(self.indexEntry.get())
        valueList = list(self.listEntry.get())
        x = map(int, valueList)
        if x[0] != 0:
            x.insert(0, 0)
        rtn = []
        for n2 in range(0, len(x) * j - 2):
            n = n2 / j
            r = n2 - n * j
            rtn.append(j * x[n] + r * (x[n + 1] - x[n]))
        self.answer = Label(self, text=rtn)
        self.answer.grid(row=2, column=2, sticky=W)

if __name__ == "__main__":
    root = Tk()
In particular, I am trying to get it to calculate len(x) * j - 1 terms, and to work for a variety of parameter values. If you try running it you should find that you get errors for larger parameter values. For example with a list 0,1,2,3,4 and a parameter j=3 we should run through the program and get 0123456789101112. However, I get an error that the last value is 'out of range' if I try to compute it.
I believe it's an issue with my function as defined. It seems the issue with parameters has something to do with the way it ties the parameter to the n value. Consider 0123. It works great if I use 2 as my parameter (called index in the function) but fails if I use 3.
EDIT:
def psi_j(x, j):
    rtn = []
    for n2 in range(0, len(x) * j - 2):
        n = n2 / j
        r = n2 - n * j
        if r == 0:
            rtn.append(j * x[n])
        else:
            rtn.append(j * x[n] + r * (x[n + 1] - x[n]))
        print 'n2 =', n2, ': n =', n, ' r =' , r, ' rtn =', rtn
    return rtn
For example, if we have psi_j(x,3) with x = [0,1,2,3,4], we will be able to get [0,1,2,3,4,5,6,7,8,9,10,11] with an error on 12.
The idea though is that we should be able to calculate that last term. It is the 12th term of our output sequence, and 12 = 3*4+0 => 3*x[4] + 0*(x[n+1]-x[n]). Now, there is no 5th term to calculate so that's definitely an issue but we do not need that term since the second part of the equation is zero. Is there a way to write this into the equation?

If we think about the example data [0, 1, 2, 3] and a j of 3, the problem is that we're trying to get x[4] in the last iteration.
len(x) * j - 2 for this data is 10.
range(0, 10) is 0 through 9.
Working through the last iteration by hand, the code resolves to this:
n = 3 # or 9 / 3
r = 0 # or 9 - 3 * 3
rtn.append(3 * x[3] + 0 * (x[3 + 1] - x[3]))
We have code trying to reach x[3 + 1], which doesn't exist when we only have indices 0 through 3.
To fix this, we could rewrite the code like this.
n = n2 / j
r = n2 - n * j
if r == 0:
    rtn.append(j * x[n])
else:
    rtn.append(j * x[n] + r * (x[n + 1] - x[n]))
If r is 0, then (x[n + 1] - x[n]) is irrelevant.
Please correct me if my math is wrong on that. I can't see a case where n >= len(x) and r != 0, but if that's possible, then my solution is invalid.
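
For reference, here is a self-contained Python 3 sketch of the same guard. It uses divmod in place of the Python 2 integer division above. The loop bound (len(x) - 1) * j + 1 is an assumption chosen so that the final term lands exactly on the last element of x; for the list 0,1,2,3,4 and j=3 it gives the 13 terms 0 through 12 that the question expects.

```python
def psi_j(x, j):
    """Expand x into j-spaced interpolated terms, guarding the last index."""
    rtn = []
    # Assumption: the last term corresponds to the final element of x (r == 0).
    last_n2 = (len(x) - 1) * j
    for n2 in range(last_n2 + 1):
        n, r = divmod(n2, j)  # n = position in x, r = offset within the step
        if r == 0:
            # The difference term is multiplied by zero, so skip x[n + 1].
            rtn.append(j * x[n])
        else:
            rtn.append(j * x[n] + r * (x[n + 1] - x[n]))
    return rtn

print(psi_j([0, 1, 2, 3, 4], 3))
```

Whenever r is non-zero, n is at most len(x) - 2, so x[n + 1] always exists with this bound.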

Without understanding what the purpose of the function is (is it a kind of filter? or smoothing function?), I pulled it out of the GUI stuff and tested it alone:
def psiFunction(j, valueList):
    x = map(int, valueList)
    if x[0] != 0:
        x.insert(0, 0)
    rtn = []
    for n2 in range(0, len(x) * j - 2):
        n = n2 / j
        r = n2 - n * j
        print "n =", n, "max_n2 =", len(x) * j - 2, "n2 =", n2, "lx =", len(x), "r =", r
        val = j * x[n] + r * (x[n + 1] - x[n])
        rtn.append(val)
        print j * x[n], r * (x[n + 1] - x[n]), val
    return rtn

if __name__ == '__main__':
    print psiFunction(3, [0, 1, 2, 3, 4])
Calling this module leads to some debugging output and, at the end, the mentioned error message.
Obviously, your x[n + 1] access fails: n is 4 there, so n + 1 is 5, one too many for the x list, which has length 5 and thus indexes from 0 to 4.
EDIT: Your psi_j() gives me the same behaviour.
Let me continue guessing: whatever we want to do, we have to ensure that n + 1 stays below len(x). So maybe a
for n2 in range(0, (len(x) - 1) * j):
would be helpful. It only produces the numbers 0..11, but I think this is the only thing which can be expected out of it: the last items only can be
3*3 + 0*(4-3)
3*3 + 1*(4-3)
3*3 + 2*(4-3)
and stop. And this is achieved with the limit I mention here.

How to perform a conditional assignment on each element of the vector

I have a function like this:
y = -2 with x <= 0
y = -2 + 3x^2 with 0 < x < 1
y = 1 with x >= 1
I need to compute this function on each element of a 1D matrix, without using a loop.
I thought it would be possible to define a function like this one:
function y = foo(x)
    if x<=0
        y=-2;
    elseif x>=1
        y=1;
    else
        y= -2+3*x.^2;
    end
end
But this just produces a single result. How can I operate on all elements? I know the . operator, but how do I access a single element inside an if?

function b = helper(s)
    if s<=0
        b=-2;
    elseif s>=1
        b=1;
    else
        b= -2+3*s^2;
    end
end
Then simply call
arrayfun(@helper, x)
to produce the behaviour you want from your function foo.

Another approach which doesn't need arrayfun() would be to multiply by the conditions:
y = -2*(x <= 0) + (-2+3*x.^2).*(x < 1).*(x > 0) + (x >= 1)
which you could also make a function. This will accept vector inputs for x e.g.
x = [1 4 0 -1 0.5];
y = -2*(x <= 0) + (-2+3*x.^2).*(x < 1).*(x > 0) + (x >= 1)
outputs
y =
1.0000 1.0000 -2.0000 -2.0000 -1.2500
