Function on each row of pandas DataFrame but not generating a new column

I have a data frame in pandas as follows:
A B C D
3 4 3 1
5 2 2 2
2 1 4 3
My final goal is to produce some constraints for an optimization problem using the information in each row of this data frame, so I don't want to generate an output and add it to the data frame. This is how I have tried to do it:
def Computation(row):
    App = pd.Series(row['A'])
    App = App.tolist()
    PT = [row['B']] * len(App)
    CS = [row['C']] * len(App)
    DS = [row['D']] * len(App)
    File3 = tuplelist(zip(PT,CS,DS,App))
    return m.addConstr(quicksum(y[r,c,d,a] for r,c,d,a in File3) == 1)
But it does not work when I call:
df.apply(Computation, axis = 1)
Could you please let me know if there is any way to do this?

.apply will attempt to convert the value returned by the function to a pandas Series or DataFrame. So, if that is not your goal, you are better off using .iterrows():
# iterrows() yields (index, row) pairs
for index, row in df.iterrows():
    constrained = Computation(row)
Also, your Computation can be expressed as:
def Computation(row):
    App = list(row['A']) # Will work as long as row['A'] is iterable
    # For the next 3 lines, see note below.
    PT = [row['B']] * len(App)
    CS = [row['C']] * len(App)
    DS = [row['D']] * len(App)
    File3 = tuplelist(zip(PT,CS,DS,App))
    return m.addConstr(quicksum(y[r,c,d,a] for r,c,d,a in File3) == 1)
Note: [<list>] * n will create n references to the same <list>, not n independent lists, so a change made through one of the references shows up in all of them. If that is not what you want, use a function. See this question and its answers for details. Specifically, this answer.
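To make the note concrete, here is a minimal sketch in plain Python (the variable names are illustrative and do not come from the question). It shows that repeating a list which holds a mutable object repeats the reference, whereas repeating a list of immutable scalars, as Computation does when row['B'] is a number, is harmless.

```python
# Repeating a list that holds a mutable object copies the reference, not the object.
shared = [[]] * 3
shared[0].append("x")          # mutates the single inner list
print(shared)                  # [['x'], ['x'], ['x']]

# A list comprehension builds an independent inner list on each iteration.
independent = [[] for _ in range(3)]
independent[0].append("x")
print(independent)             # [['x'], [], []]

# Repeating an immutable scalar is safe: rebinding one slot leaves the others alone.
scalars = [4] * 3
scalars[0] = 99
print(scalars)                 # [99, 4, 4]
```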

Related

Retrieve data in sets Pandas

I'm retrieving data from the Open Weather Map API. The code below extracts the current weather for more than 500 cities, and I want the log it prints to group the records into sets of 50 each.
I did it in an inefficient way that I would really like to improve!
Many many thanks!
x = 1
for index, row in df.iterrows():
    base_url = "http://api.openweathermap.org/data/2.5/weather?"
    units = "imperial"
    query_url = f"{base_url}appid={api_key}&units={units}&q="
    city = row['Name'] #this comes from a df
    response = requests.get(query_url + city).json()
    try:
        df.loc[index,"Max Temp"] = response["main"]["temp_max"]
        if index < 50:
            print(f"Processing Record {index} of Set {x} | {city}")
        elif index <100:
            x = 2
            print(f"Processing Record {index} of Set {x} | {city}")
        elif index <150:
            x = 3
            print(f"Processing Record {index} of Set {x} | {city}")
    except (KeyError, IndexError):
        pass
        print("City not found. Skipping...")
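One possible way to avoid the growing elif chain, sketched here without the API call: derive the set number and the position within the set from the row index with divmod. The helper name set_and_record, the default set size of 50, and the sample city names are assumptions for illustration, and the sketch assumes the DataFrame has a default 0-based integer index. Unlike the code above, it numbers records from 1 within each set.

```python
def set_and_record(index, set_size=50):
    """Map a 0-based row index to a 1-based (set number, record number) pair."""
    set_number, record = divmod(index, set_size)
    return set_number + 1, record + 1


# Placeholder rows standing in for (index, row['Name']) from df.iterrows().
for index, city in [(0, "Lima"), (49, "Oslo"), (50, "Cairo"), (120, "Quito")]:
    set_number, record = set_and_record(index)
    print(f"Processing Record {record} of Set {set_number} | {city}")
```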

Writing Fibonacci Sequence Elegantly Python

I am trying to improve my programming skills by writing functions in multiple ways; this teaches me new ways of writing code and also helps me understand other people's style of writing code. Below is a function that calculates the sum of all even numbers in a Fibonacci sequence up to the max value. Do you have any recommendations on writing this algorithm differently, maybe more compactly or more Pythonic?
def calcFibonacciSumOfEvenOnly():
    MAX_VALUE = 4000000
    sumOfEven = 0
    prev = 1
    curr = 2
    while curr <= MAX_VALUE:
        if curr % 2 == 0:
            sumOfEven += curr
        temp = curr
        curr += prev
        prev = temp
    return sumOfEven
I do not want to write this function recursively since I know it takes up a lot of memory even though it is quite simple to write.
You can use a generator to produce even numbers of a fibonacci sequence up to the given max value, and then obtain the sum of the generated numbers:
def even_fibs_up_to(m):
    a, b = 0, 1
    while a <= m:
        if a % 2 == 0:
            yield a
        a, b = b, a + b
So that:
print(sum(even_fibs_up_to(50)))
would output: 44 (0 + 2 + 8 + 34 = 44)
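As a different technique from the generator above (not something that answer proposes), one can use the fact that every third Fibonacci number is even and that the even terms satisfy E(n) = 4*E(n-1) + E(n-2), so the parity test can be skipped. A minimal sketch, with the function name chosen for illustration:

```python
def sum_even_fibs_up_to(limit):
    """Sum the even Fibonacci numbers <= limit using E(n) = 4*E(n-1) + E(n-2)."""
    total = 0
    prev, curr = 0, 2          # the first two even Fibonacci numbers
    while curr <= limit:
        total += curr
        prev, curr = curr, 4 * curr + prev
    return total


print(sum_even_fibs_up_to(50))       # 44
print(sum_even_fibs_up_to(4000000))  # 4613732
```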

Storing coefficients from a Regression in Stata

I am trying to store the coefficients from a simulated regression in a variable b1 and b2 in the code below, but I'm not quite sure how to go about this. I've tried using return scalar b1 = _b[x1] and return scalar b2 = _b[x2], from the rclass() function, but that didn't work. Then I tried using scalar b1 = e(x1) and scalar b2 = e(x2), from the eclass() function and also wasn't successful.
The goal is to use these stored coefficients to estimate some value (say rhat) and test the standard error of rhat.
Here's my code below:
program montecarlo2, eclass
    clear
    version 11
    drop _all
    set obs 20
    gen x1 = rchi2(4) - 4
    gen x2 = (runiform(1,2) + 3.5)^2
    gen u = 0.3*rnormal(0,25) + 0.7*rnormal(0,5)
    gen y = 1.3*x1 + 0.7*x2 + 0.5*u
    * OLS Model
    regress y x1 x2
    scalar b1 = e(x1)
    scalar b2 = e(x2)
end
I want to do something like,
rhat = b1 + b2, and then test the standard error of rhat.
Let's hack a bit at your program:
Version 1
program montecarlo2
    clear
    version 11
    set obs 20
    gen x1 = rchi2(4) - 4
    gen x2 = (runiform(1,2) + 3.5)^2
    gen u = 0.3*rnormal(0,25) + 0.7*rnormal(0,5)
    gen y = 1.3*x1 + 0.7*x2 + 0.5*u
    * OLS Model
    regress y x1 x2
end
I cut drop _all as unnecessary given the clear. I cut the eclass. One reason for doing that is that regress will leave e-class results in its wake anyway. Also, you can if you wish add
scalar b1 = _b[x1]
scalar b2 = _b[x2]
scalar r = b1 + b2
either within the program after the regress or immediately after the program runs.
Version 2
program montecarlo2, eclass
    clear
    version 11
    set obs 20
    gen x1 = rchi2(4) - 4
    gen x2 = (runiform(1,2) + 3.5)^2
    gen u = 0.3*rnormal(0,25) + 0.7*rnormal(0,5)
    gen y = 1.3*x1 + 0.7*x2 + 0.5*u
    * OLS Model
    regress y x1 x2
    * stuff to add
end
Again, I cut drop _all as unnecessary given the clear. Now the declaration eclass is double-edged. It gives the programmer scope for their program to save e-class results, but you have to say what they will be. That's the stuff to add indicated by a comment above.
Warning: I've tested none of this. I am not addressing the wider context. @Dimitriy V. Masterov's suggestion of lincom is likely to be a really good idea for whatever your problem is.

Generating random number with different digits

So I need to write a program which generates random numbers from 100 to 999, and the tricky part is that the digits in the number can't be the same.
For example: 222, 212 and so on are not allowed.
So far, I have this:
import random
a = int (random.randint(1,9)) # first digit
b = int (random.randint(0,9)) # second digit
c = int (random.randint(0,9)) # third digit
if (a != b and a != c and b != c):
    print a,b,c
As you can see, I generate all three digits separately. I think it's easier to check if there are same digits in the number.
So, now I want to do a loop that will generate the numbers until the requirements are met (now it either prints blank page or the number). Note that it generates only once and I have to open the program again.
In C++ I did that with the 'while loop'. And here I don't know how to do it.
And one more question. How can I code the number(quantity) of random numbers I want to generate?
To be more specific:
I want to generate 4 random numbers. How should I code it?
P.S.
Thank you all for your answers and suggestions, I am really keen on learning new techniques and codes.
It might be easier to use random.sample to sample from the digits 0-9 without replacement, rejecting any samples which select 0 first:
import random
def pick_number():
    a = 0
    while a == 0:
        a, b, c = random.sample(range(10), 3)
    return 100*a + 10*b + c
Further explanation: range(10) (in Python 2) generates the list [0,1,2,3,4,5,6,7,8,9], and random.sample picks 3 elements from it without replacement. The while loop repeats until the first element, a, is not zero, so when it exits you have the three digits of a number that meets the requirements. You can turn this into a single integer by multiplying the first by 100, the second by 10 and then adding all three.
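Building on the same idea, here is a hedged sketch for Python 3 that also covers the "how do I generate 4 numbers" part of the question. The helper name pick_numbers is an assumption for illustration; random.sample and range come from the standard library.

```python
import random


def pick_number():
    """Return a random integer in 100..999 whose three digits are all different."""
    a = 0
    while a == 0:  # resample until the leading digit is non-zero
        a, b, c = random.sample(range(10), 3)
    return 100 * a + 10 * b + c


def pick_numbers(count):
    """Return a list of `count` such numbers (values may repeat across the list)."""
    return [pick_number() for _ in range(count)]


print(pick_numbers(4))
```

Each number has distinct digits, but the same number can appear more than once in the list; if that matters, the results could be collected in a set until it has the required size.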
To loop until it's ok you can also use while in Python:
from random import randint
a = randint(1, 9)
b = randint(0, 9)
c = randint(0, 9)
while not (a!=b and b!=c and c!=a):
    a = randint(1, 9)
    b = randint(0, 9)
    c = randint(0, 9)
You can also put it in a function:
def generate_number():
    a = randint(1, 9)
    b = randint(0, 9)
    c = randint(0, 9)
    while not (a!=b and b!=c and c!=a):
        a = randint(1, 9)
        b = randint(0, 9)
        c = randint(0, 9)
    return (a, b, c)
And if you want n such numbers (they are not actually numbers since (a, b, c) is a tuple of 3 int values), you can call it n times:
for i in range(n):
    print(generate_number())
If you prefer formatting the values, you can also do:
for i in range(n):
    print('%d %d %d'%generate_number()) # old style formatting
    print('{} {} {}'.format(*generate_number())) # new style
Finally, you can get n from the command line:
import sys
n = int(sys.argv[1]) # command-line arguments are strings, so convert to int
Or you can ask it directly:
n = int(input("Please enter some number: ")) # in python2.x you'd use raw_input instead of input
You'll get an exception if the value cannot be converted; you can catch the exception and loop as for the generation of numbers.
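The catch-and-retry idea could look like the sketch below. To keep it runnable without a keyboard, the loop reads from a list of candidate strings instead of calling input(); the function name first_valid_count and the sample inputs are hypothetical.

```python
def first_valid_count(candidates):
    """Return the first candidate that parses as a positive int, or None."""
    for text in candidates:
        try:
            n = int(text)
        except ValueError:
            print(f"{text!r} is not a whole number, trying again")
            continue
        if n > 0:
            return n
        print(f"{n} is not positive, trying again")
    return None


print(first_valid_count(["four", "-2", "4"]))  # 4
```

With interactive input, the same try/except would sit inside a while True loop around int(input(...)), breaking out once the conversion succeeds.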
Putting it all together with the typical main construct:
from random import randint
import sys
def generate_number():
    a = randint(1, 9)
    b = randint(0, 9)
    c = randint(0, 9)
    while not (a!=b and b!=c and c!=a):
        a = randint(1, 9)
        b = randint(0, 9)
        c = randint(0, 9)
    return (a, b, c)
def main():
    n = int(sys.argv[1]) # command-line arguments are strings, so convert to int
    for i in range(n):
        print('{} {} {}'.format(*generate_number()))
if __name__=='__main__':
    main()
Instead of generating 3 random numbers, create a list of the numbers 0-9 and call random.shuffle on it. You then just pull out as many digits as you need from the shuffled list, and no digit will be repeated:
import random
def generate_number():
    numbers = list(range(10)) # Generates a list 0-9 (list() is needed in Python 3, where range is not a list)
    random.shuffle(numbers) # Shuffle the list in place
    return (numbers[0], numbers[1], numbers[2]) # take the first 3 items
(This answer is pretty much the same as xnx's answer, except that it uses the shuffle method, which can be used in versions prior to 2.3, and it does not loop waiting for a non-zero first digit.)
On top of @xnx's answer, some background: this is called the Fisher-Yates (or Fisher-Yates-Knuth) shuffle, and it is similar to randomly drawing lottery tickets, using each ticket only once. It has O(n) complexity; more to read here: http://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle
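For reference, a minimal sketch of the Fisher-Yates shuffle written out by hand in Python. It illustrates the algorithm only and makes no claim about how random.shuffle is implemented; the function name is made up for the example.

```python
import random


def fisher_yates_shuffle(items):
    """Return a shuffled copy of `items` using the Fisher-Yates algorithm."""
    result = list(items)
    # Walk from the last position down, swapping each slot with a random
    # slot at or before it; each position is fixed exactly once, hence O(n).
    for i in range(len(result) - 1, 0, -1):
        j = random.randint(0, i)  # randint is inclusive on both ends
        result[i], result[j] = result[j], result[i]
    return result


digits = fisher_yates_shuffle(range(10))
print(digits[:3])  # three distinct digits; the first may be 0
```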

Writing to multiple columns (not rows) - Python

I am trying to write two lists to a csv file. I want the lists to feed vertically into the spreadsheet into two columns.
import csv
import os
name = "rr"
newname = name+".csv"
rs = [1,2,3,4]
dr = [2,3,4,5]
with open(newname, 'w') as output:
    writer = csv.writer(output, lineterminator='\n')
    writer.writerow(rs)
    writer.writerow(dr)
I am getting this:
1 2 3 4
2 3 4 5
I want this:
1 2
2 3
3 4
4 5
Using your example lists you could do:
rs = [1,2,3,4]
dr = [2,3,4,5]
output = ""
for r in zip(rs, dr):
    output += str(r[0]) + " " + str(r[1]) + "\n"
# now write the output etc.
This is not complete: it only builds the text and leaves out the writing. But I think what you actually wanted is what the zip built-in provides.
Explanation of what zip() does taken from official documentation:
This function returns a list of tuples, where the i-th tuple contains the i-th element from each of the argument sequences or iterables.
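To finish the idea with the csv module from the question, here is a sketch that hands the zipped pairs to writer.writerows. It writes to an in-memory io.StringIO buffer so that it runs anywhere; with a real file, open(newname, 'w', newline='') would take the buffer's place (Python 3 assumed).

```python
import csv
import io

rs = [1, 2, 3, 4]
dr = [2, 3, 4, 5]

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
# zip pairs the i-th items of both lists, so each pair becomes one CSV row.
writer.writerows(zip(rs, dr))

print(buffer.getvalue())
```

Note that zip stops at the shorter list, so lists of unequal length would lose their trailing items.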