For loop with different number of iterations based on datetime - json

I am trying to get hourly data from a JSON file for a 34-month period. To do this I have created a date range, which I use in a nested loop to get the data for all 24 hours of each day. This works fine.
However, because of daylight saving time, there are only 23 hourly observations on 3 dates, the first being 2020-03-29. I would therefore like to loop only 23 times on those dates, since my loop crashes otherwise.
Below is my code. Right now it fails on the date check with SyntaxError: invalid syntax, and there is a high risk it will fail on something else once this is fixed.
Thank you.
start_date = date(2020, 1, 1)
end_date = date(2022, 11, 1)
def daterange(start_date, end_date):
    for n in range(int((end_date - start_date).days)):
        yield start_date + timedelta(n)
parsing_range_svk = []
for single_date in daterange(start_date, end_date):
    single = single_date.strftime("%Y-%m-%d")
    parsing_range_svk.append(single)
######################################
svk =[]
for i in parsing_range_svk:
    data_json_svk = json.loads(urlopen("https://www.svk.se/services/controlroom/v2/situation?date={}&biddingArea=SE1".format(i)).read())
    if i == '2020-03-29'
        for i in range(23):
            rows = data_json_svk['Data'][0]['data'][i]['y']
    else:
        for i in range(24):
            rows = data_json_svk['Data'][0]['data'][i]['y']
    svk.append(rows)

Don't check explicitly for the date. Instead, use a list comprehension to collect whatever values the response contains; it works for both 23-hour and 24-hour days:
import json
from urllib.request import urlopen
from datetime import date, timedelta

start_date = date(2020, 1, 1)
end_date = date(2022, 11, 1)

def daterange(start_date, end_date):
    for n in range(int((end_date - start_date).days)):
        yield start_date + timedelta(n)

parsing_range_svk = []
for single_date in daterange(start_date, end_date):
    single = single_date.strftime("%Y-%m-%d")
    parsing_range_svk.append(single)
######################################
url = "https://www.svk.se/services/controlroom/v2/situation?date={}&biddingArea=SE1"
svk = []
for i in parsing_range_svk:
    data_json_svk = json.loads(urlopen(url.format(i)).read())
    # one list per day, with as many values as the response actually holds
    svk.append([v["y"] for v in data_json_svk["Data"][0]["data"]])
print(svk)
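To see why the comprehension copes with short days, here is a small offline sketch. The `fake_response` helper and its payload are made up for illustration; they only mimic the `Data[0]["data"][i]["y"]` shape used above and are not the real API response.

```python
def fake_response(hours):
    """Build a made-up payload with one {"x": hour, "y": value} entry per hour."""
    return {"Data": [{"data": [{"x": h, "y": h * 10} for h in range(hours)]}]}


def hourly_values(payload):
    """Collect every "y" value, however many hourly entries the day has."""
    return [v["y"] for v in payload["Data"][0]["data"]]


normal_day = hourly_values(fake_response(24))
short_day = hourly_values(fake_response(23))  # e.g. a day that loses an hour to DST
print(len(normal_day), len(short_day))
```

Because the comprehension iterates over the entries themselves rather than over a fixed range of indices, no special case for particular dates is needed.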

Related

COUNTIFS: Excel to pandas and remove counted elements

I have a COUNTIFS formula in Excel, (COUNTIFS($A$2:$A$6, "<=" & $C4))-SUM(D$2:D3), where A2:A6 is my_list, C4 is the current 'bin' holding the condition, and D* holds the previous results from my_list that met the condition. I am attempting to implement this in Python.
I have looked at previous COUNTIF questions, but I am struggling with the final '-SUM(D$2:D3)' part of the formula.
See the COUNTIFS($A$2:$A$6, "<=" & $C4) section below.
'''
my_list=(-1,-0.5, 0, 1, 2)
bins = (-1, 0, 1)
out = []
for iteration, num in enumerate(bins):
    n = []
    out.append(n)
    count = sum(1 for elem in my_list if elem<=(num))
    n.append(count)
print(out)
'''
out = [[1], [3], [4]]
I need to sum previous elements, that have already been counted, and remove these elements from the next count so that they are not counted twice ( Excel representation -SUM(D$2:D3) ). This is where I need some help! I used enumerate to track iterations. I have tried the code below in the same loop but I can't resolve this and I get errors:
'''
count1 = sum(out[0:i[0]]) for i in (out)
and
count1 = out(n) - out(n-1)
'''
See expected output values in 'out' array for bin conditions below:
I was able to achieve the required output array values by creating an additional if/elif statement to factor out previous array elements and generate a new output array 'out1'. This works but may not be the most efficient way to achieve the end goal:
'''
import numpy as np
my_list=(-1,-0.5, 0, 1, 2)
#bins = np.arange(-1.0, 1.05, 0.05)
bins = (-1, 0, 1)
out = []
out1 = []
for iteration, num in enumerate(bins):
    count = sum(1 for elem in my_list if elem<=(num))
    out.append(count)
    if iteration == 0:
        count1 = out[iteration]
        out1.append(count1)
    elif iteration > 0:
        count1 = out[iteration] - out[iteration - 1]
        out1.append(count1)
print(out1)
'''
I also tried using the below code as suggested in other answers but this didn't work for me:
'''
-np.diff([out])
print(out)
'''
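One possible way to reproduce the '-SUM(D$2:D3)' step, sketched with the standard library only: compute the cumulative counts first, then subtract the previous cumulative count from each one. The function name `counts_per_bin` is hypothetical, and the sketch assumes the bins are sorted in ascending order.

```python
from bisect import bisect_right


def counts_per_bin(values, bins):
    """Count values <= each bin edge, excluding values already counted by earlier bins.

    Assumes `bins` is sorted in ascending order.
    """
    ordered = sorted(values)
    # Cumulative counts: the COUNTIFS(range, "<=" & bin) part.
    cumulative = [bisect_right(ordered, edge) for edge in bins]
    # Subtract the previous cumulative count: the -SUM(previous results) part.
    return [cur - prev for prev, cur in zip([0] + cumulative[:-1], cumulative)]


my_list = (-1, -0.5, 0, 1, 2)
bins = (-1, 0, 1)
print(counts_per_bin(my_list, bins))  # [1, 2, 1]
```

With NumPy, `np.diff(out, prepend=0)` applied to the cumulative counts should give the same numbers. Note that `np.diff` returns a new array and leaves `out` unchanged, which may be why printing `out` afterwards showed no difference.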

Is there a convenient way to call a function multiple times without a loop?

I am currently making some code to randomly generate a set of random dates and assigning them to a matrix. I wish to randomly generate N amount of dates (days and months) and display them in a Nx2 matrix. My code is as follows
function dates = dategen(N)
    month = randi(12);
    if ismember(month,[1 3 5 7 8 10 12])
        day = randi(31);
        dates = [day, month];
    elseif ismember(month,[4 6 9 11])
        day = randi(30);
        dates = [day, month];
    else
        day = randi(28);
        dates = [day, month];
    end
end
For example if I called on the function, as
output = dategen(3)
I would expect 3 dates in a 2x3 matrix. However, I am unsure how to do this. I believe I need to include N into the function somewhere but I'm not sure where or how.
Any help is greatly appreciated.
You can do it using logical indexing as follows:
function dates = dategen(N)
    months = randi(12, 1, N);
    days = NaN(size(months)); % preallocate
    ind = ismember(months, [1 3 5 7 8 10 12]);
    days(ind) = randi(31, 1, sum(ind));
    ind = ismember(months, [4 6 9 11]);
    days(ind) = randi(30, 1, sum(ind));
    ind = ismember(months, 2);
    days(ind) = randi(28, 1, sum(ind));
    % 2-by-N result: row 1 holds months, row 2 holds days.
    % For an N-by-2 matrix of [day, month] rows, use [days; months].' instead.
    dates = [months; days];
end

Formatting data being written to a csv file with csv python 3.7

I am trying to write the barycenter positions of planets to a CSV file. I am using the Skyfield API, csv and Python 3.7. The position output is given as x, y, z coordinates. I want a column for the date/time, which I have, and columns for each of the x, y and z coordinates of each planet, all on the same row. I have tried 2 ways to achieve this: the first gives the data in the columns how I want it but on separate rows, and the second gives the header as I want but puts the coordinates for a planet in a single column rather than 3 columns. I have looked at other formatting examples but none have resolved the issue I have.
#This is first attempt;#
`
from skyfield.api import utc
from skyfield.api import load
import csv
import datetime
from datetime import datetime
from datetime import timedelta, date
planets = load('de421.bsp')
sun = planets['sun']
earth = planets['earth']
moon = planets['moon']
mercury = planets['mercury']
venus = planets['venus']
mars = planets['mars']
JUPITER_BARYCENTER = planets['JUPITER_BARYCENTER']
SATURN_BARYCENTER = planets['SATURN_BARYCENTER']
URANAS_BARYCENTER = planets['URANUS_BARYCENTER']
NEPTUNE_BARYCENTER = planets['NEPTUNE_BARYCENTER']
PLUTO_BARYCENTER = planets['PLUTO_BARYCENTER']
ts = load.timescale()
start_date = date(1986, 11, 8)
end_date = date(1986, 12, 31)
with open('BCRS positions-1.2.csv', 'w') as csvfile:
    for single_date in daterange(start_date, end_date):
        single_date.strftime("%Y/%m/%d")
        #date = datetime.strptime(single_date, "%Y/%m/%d")
        writer = csv.writer(csvfile)
        t = ts.utc(single_date, 10, 30, 0)
        BCRS = ('Date', single_date,
                'Sun-x','Sun-y','Sun-z',sun.at(t).position.au,
                'Mercury-x','Mercury-y','Mercury-z', mercury.at(t).position.au,
                'Venus-x','Venus-y','Venus-z', venus.at(t).position.au,
                'Moon-x','Moon-y','Moon-z',moon.at(t).position.au,
                'Earth-x', 'Earth-y', 'Earth-z', earth.at(t).position.au,
                'Mars-x', 'Mars-y', 'Mars-z', mars.at(t).position.au,
                'Jupiter-x','Jupiter-y','Jupiter-z', JUPITER_BARYCENTER.at(t).position.au,
                'Saturn-x','Saturn-y','Saturn-z', SATURN_BARYCENTER.at(t).position.au,
                'Uranas-x','Uranas-y','Uranas-z', URANAS_BARYCENTER.at(t).position.au,
                'Neptune-x','Neptune-y','Neptune-z', NEPTUNE_BARYCENTER.at(t).position.au,
                'Pluto-x','Pluto-y','Pluto-z', PLUTO_BARYCENTER.at(t).position.au)
        writer.writerow(BCRS)
csvfile.close()`
Output
Date,1986-11-08,Sun-x,Sun-y,Sun-z,[-0.0038418 0.0051725 0.00223502],Mercury-x,Mercury-y,Mercury-z,[0.30680392 0.12163008 0.03220969],Venus-x,Venus-y,Venus-z,[0.48875971 0.4985835 0.19301198],Moon-x,Moon-y,Moon-z,[0.6923354 0.65149371 0.28223558],Earth-x,Earth-y,Earth-z,[0.69095218 0.65328502 0.28325265],Mars-x,Mars-y,Mars-z,[1.38798446 0.094756 0.00565359],Jupiter-x,Jupiter-y,Jupiter-z,[ 4.93083814 -0.48155161 -0.32663981],Saturn-x,Saturn-y,Saturn-z,[-3.16415776 -8.82081358 -3.50690074],Uranas-x,Uranas-y,Uranas-z,[ -2.56470042 -17.41419317 -7.59061325],Neptune-x,Neptune-y,Neptune-z,[ 2.85009934 -27.8321119 -11.46284476],Pluto-x,Pluto-y,Pluto-z,[-22.59654067 -19.26368339 0.79656139]
##This is the second attempt;##
from skyfield.api import utc
from skyfield.api import load
import csv
import datetime
from datetime import datetime
from datetime import timedelta, date
# Sun, Mercury, Venus, Earth, Mars, Jupiter, Saturn, Uranus, Neptune and Pluto
planets = load('de421.bsp')
sun = planets['sun']
earth = planets['earth']
moon = planets['moon']
mercury = planets['mercury']
mars = planets['mars']
venus = planets['venus']
JUPITER_BARYCENTER = planets['JUPITER_BARYCENTER']
SATURN_BARYCENTER = planets['SATURN_BARYCENTER']
URANAS_BARYCENTER = planets['URANUS_BARYCENTER']
NEPTUNE_BARYCENTER = planets['NEPTUNE_BARYCENTER']
PLUTO_BARYCENTER = planets['PLUTO_BARYCENTER']
# Specfiy the date and time (UTC) for planets positions
# date/time format - t = ts.utc(yyyy, mm ,dd, hh, mm, ss)
def daterange(start_date, end_date):
    for n in range(int((end_date - start_date).days)):
        yield start_date + timedelta(n)
ts = load.timescale()
start_date = datetime(1986, 11, 8, 10, 30, 0, tzinfo=utc)
end_date = datetime(1986, 12, 31, 10, 30, 0, tzinfo=utc)
with open('BCRS positions-test.csv', 'w', newline='') as csvFile:
    writer = csv.writer(csvFile, delimiter=',')
    writer.writerow(['Date', 'Sun-x', 'Sun-y', 'Sun-z', 'Mercury-x','Mercury-y','Mercury-z', 'Venus-x','Venus-y','Venus-z', 'Moon-x','Moon-y','Moon-z',
                     'Earth-x', 'Earth-y', 'Earth-z', 'Mars-x', 'Mars-y', 'Mars-z', 'Jupiter-x','Jupiter-y','Jupiter-z',
                     'Saturn-x','Saturn-y','Saturn-z', 'Uranas-x','Uranas-y','Uranas-z', 'Neptune-x','Neptune-y','Neptune-z',
                     'Pluto-x','Pluto-y','Pluto-z'])
    for single_date in daterange(start_date, end_date):
        single_date.strftime("%Y/%m/%d")
        #date = datetime.strptime(single_date, "%Y/%m/%d")
        t = ts.utc(single_date, 10, 30, 0)
        writer.writerow([single_date, sun.at(t).position.au, mercury.at(t).position.au, venus.at(t).position.au, moon.at(t).position.au,
                         earth.at(t).position.au, mars.at(t).position.au, JUPITER_BARYCENTER.at(t).position.au, SATURN_BARYCENTER.at(t).position.au,
                         URANAS_BARYCENTER.at(t).position.au, NEPTUNE_BARYCENTER.at(t).position.au, PLUTO_BARYCENTER.at(t).position.au])
csvFile.close()
Output
Sun-x,Sun-y,Sun-z - I get this as a header in 3 columns
[-0.0038418 0.0051725 0.00223502] I get this in a single column below the header but needs to be in 3 columns one for each x, y, z position
###What I am trying to achieve is;###
Sun-x,Sun-y,Sun-z,Mercury-x,Mercury-y,Mercury-z,Venus-x,Venus-y,Venus-z
-0.003953380897142202,0.004828488778607356,0.0020912483521329586,-0.11600122254182059,-0.39948956059670143,-0.20224588140237967,-0.4899043811693688,0.47522226197221284,0.2444554690221239
Any help with this would be greatly appreciated
I have found the solution I need to achieve what I wanted.
from skyfield.api import wgs84
from skyfield.api import load
from skyfield.api import utc
import csv
from datetime import datetime, timedelta
from itertools import chain

# Ephemeris, Sun and timescale, loaded the same way as in the question.
planets = load('de421.bsp')
sun = planets['sun']
ts = load.timescale()

def weekly_it(start, finish):
    while finish > start:
        start = start + timedelta(weeks=1)
        yield start

start = datetime(1985, 6, 29, 9, 30, 15, tzinfo=utc)
finish = datetime(2022, 1, 1, 9, 30, 15, tzinfo=utc)
fieldnames = ['Date', 'Sun-x', 'Sun-y', 'Sun-z', 'Mercury-x','Mercury-y','Mercury-z', 'Venus-x','Venus-y','Venus-z', 'Moon-x','Moon-y','Moon-z',
              'Earth-x', 'Earth-y', 'Earth-z', 'Mars-x', 'Mars-y', 'Mars-z', 'Jupiter-x','Jupiter-y','Jupiter-z',
              'Saturn-x','Saturn-y','Saturn-z', 'Uranus-x','Uranus-y','Uranus-z', 'Neptune-x','Neptune-y','Neptune-z',
              'Pluto-x','Pluto-y','Pluto-z']
with open('FileName.csv', 'w', newline='') as csvFile:
    writer = csv.writer(csvFile)
    writer.writerow(fieldnames)
    for week in weekly_it(start, finish):
        w = [week.strftime("%Y-%m-%d")]
        # NOTE: `aveley` (the observer position) is not defined anywhere in this post;
        # it has to be created before this loop runs.
        astrometric = aveley.at(ts.utc(week)).observe(sun)
        # Flatten [date] + (x, y, z) so that each coordinate gets its own column.
        # Only the Sun is written here; the other bodies follow the same pattern.
        row = list(chain.from_iterable((w, astrometric.position.au)))
        writer.writerow(row)
I hope this helps someone with the same issue.
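The flattening step can be tried without Skyfield. The sketch below uses made-up (x, y, z) tuples as stand-ins for values such as `sun.at(t).position.au`, and writes to an in-memory buffer instead of a file; the `positions` dictionary and its numbers are assumptions for illustration only.

```python
import csv
import io
from itertools import chain

# Made-up stand-ins for position values, which are (x, y, z) triples.
positions = {
    "Sun": (-0.0038, 0.0052, 0.0022),
    "Mercury": (0.3068, 0.1216, 0.0322),
}

# Three header columns per body: Sun-x, Sun-y, Sun-z, ...
header = ["Date"] + [f"{name}-{axis}" for name in positions for axis in "xyz"]
# Flatten the triples so each coordinate becomes its own CSV cell.
row = ["1986-11-08"] + list(chain.from_iterable(positions.values()))

buffer = io.StringIO()  # in-memory file so the sketch needs no disk access
writer = csv.writer(buffer)
writer.writerow(header)
writer.writerow(row)
print(buffer.getvalue())
```

Building the header and the row from the same dictionary keeps the column names and the values in the same order, which is easy to get wrong when both are typed out by hand.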

Sorting with csv library, error says my dates don't match '%Y-%m-%d' format when it does

I'm trying to sort a CSV by date first and time second. With pandas, it was easy using df = df.sort_values(by=['Date', 'Time_UTC']). With the csv library, my code is:
with open ('eqph_csv_29May2020_noF_5lines.csv') as file:
    reader = csv.DictReader(file, delimiter=',')
    date_sorted = sorted(reader, key=lambda Date: datetime.strptime('Date', '%Y-%m-%d'))
    print(date_sorted)
The datetime documentation clearly says these codes are right. Here's a sample CSV (no delimiter):
Date Time_UTC Latitude Longitude
2020-05-28 05:17:31 16.63 120.43
2020-05-23 02:10:27 15.55 121.72
2020-05-20 12:45:07 5.27 126.11
2020-05-09 19:18:12 14.04 120.55
2020-04-10 18:45:49 5.65 126.54
csv.DictReader returns an iterator that yields a dict for each row in the csv file. To sort it on a column from each row, you need to specify that column in the sort function:
date_sorted = sorted(reader, key=lambda row: datetime.strptime(row['Date'], '%Y-%m-%d'))
To sort on both Date and Time_UTC, you could combine them into one string and convert that to a datetime:
date_sorted = sorted(reader, key=lambda row: datetime.strptime(row['Date'] + ' ' + row['Time_UTC'], '%Y-%m-%d %H:%M:%S'))
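A self-contained sketch of the combined sort, using the sample rows from the question held in memory. It assumes the real file is comma-separated; the helper name `timestamp` is made up for this example.

```python
import csv
import io
from datetime import datetime

# Sample rows from the question, written with commas (an assumption about the real file).
sample = """Date,Time_UTC,Latitude,Longitude
2020-05-28,05:17:31,16.63,120.43
2020-05-23,02:10:27,15.55,121.72
2020-05-20,12:45:07,5.27,126.11
2020-05-09,19:18:12,14.04,120.55
2020-04-10,18:45:49,5.65,126.54
"""


def timestamp(row):
    """Combine the Date and Time_UTC columns into one datetime sort key."""
    return datetime.strptime(row["Date"] + " " + row["Time_UTC"], "%Y-%m-%d %H:%M:%S")


reader = csv.DictReader(io.StringIO(sample))
date_sorted = sorted(reader, key=timestamp)
for row in date_sorted:
    print(row["Date"], row["Time_UTC"], row["Latitude"], row["Longitude"])
```

The original error came from passing the literal string 'Date' to strptime; here the key function receives each row dict and looks the columns up by name.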
Nick's answer worked, and I used it to revise my code. I used csv.reader() instead.
lat, lon = [], []
with open('eqph_csv_29May2020_noF_20lines.csv') as file:
    reader = csv.reader(file, delimiter=',')
    next(reader)  # skip the header row
    date_sorted = sorted(
        reader,
        key=lambda row: datetime.strptime(row[0] + ' ' + row[1], '%Y-%m-%d %H:%M:%S'))
    for row in date_sorted:
        lat.append(float(row[2]))  # column 2 is Latitude
        lon.append(float(row[3]))  # column 3 is Longitude
for pair in zip(lat, lon):
    print(pair)
Result
(6.14, 126.2)
(14.09, 121.36)
(13.74, 120.9)
(6.65, 125.42)
(6.61, 125.26)
(5.49, 126.57)
(5.65, 125.61)
(11.33, 124.64)
(11.49, 124.42)
(15.0, 119.79) # 2020-03-19 06:33:00
(14.94, 120.17) # 2020-03-19 06:49:00
(6.7, 125.18)
(5.76, 125.14)
(9.22, 124.01)
(20.45, 122.12)
(5.65, 126.54)
(14.04, 120.55)
(5.27, 126.11)
(15.55, 121.72)
(16.63, 120.43)

Formatting data in a CSV file (calculating average) in python

import csv
with open('Class1scores.csv') as inf:
    for line in inf:
        parts = line.split()
        if len(parts) > 1:
            print (parts[4])
f = open('Class1scores.csv')
csv_f = csv.reader(f)
newlist = []
for row in csv_f:
    row[1] = int(row[1])
    row[2] = int(row[2])
    row[3] = int(row[3])
    maximum = max(row[1:3])
    row.append(maximum)
    average = round(sum(row[1:3])/3)
    row.append(average)
    newlist.append(row[0:4])
averageScore = [[x[3], x[0]] for x in newlist]
print('\nStudents Average Scores From Highest to Lowest\n')
The code is meant to read the CSV file and, for each row, add up the three scores (column 0 being the user's name) and divide by three. However, it doesn't calculate a proper average; it just takes the score from the last column.
Basically you want statistics of each row. In general you should do something like this:
import csv
with open('data.csv', 'r') as f:
    rows = csv.reader(f)
    for row in rows:
        name = row[0]
        # csv.reader yields strings, so convert the scores before doing arithmetic
        scores = [int(score) for score in row[1:]]
        # calculate statistics of scores
        attributes = {
            'NAME': name,
            'MAX' : max(scores),
            'MIN' : min(scores),
            'AVE' : 1.0 * sum(scores) / len(scores)
        }
        output_mesg = "name: {NAME:s} \t high: {MAX:d} \t low: {MIN:d} \t ave: {AVE:f}"
        print(output_mesg.format(**attributes))
Don't worry too much about whether a particular step is locally inefficient. A good Pythonic script should be as readable as possible to everyone.
In your code, I spot two mistakes:
Appending to row does not reach the output: newlist.append(row[0:4]) keeps only the name and the three scores, so the appended maximum and average are sliced off and x[3] is just the last score.
row[1:3] only gives the second and the third element. row[1:4] gives what you want, as does row[1:] before anything is appended. Slicing in Python is end-exclusive.
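A short sketch of the slicing behaviour, using a made-up row shaped like the question's CSV (a name followed by three scores):

```python
# Made-up row shaped like the CSV in the question: a name followed by three scores.
row = ["John", 10, 7, 9]

two_scores = row[1:3]    # stops before index 3, so the third score is missing
three_scores = row[1:4]  # indices 1, 2 and 3: all three scores
average = sum(three_scores) / len(three_scores)

row.append(max(three_scores))
row.append(average)
kept = row[0:4]  # the appended maximum and average sit at indices 4 and 5, so they are cut off

print(two_scores, three_scores, kept)
```

In the question's code, `x[3]` is read from a list built with `row[0:4]`, which is why it shows the last score rather than the average.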
And some questions for you to think about:
If I can open the file in Excel and it's not that big, why not just do it in Excel? Can I make use of all the tools I have to get work done as soon as possible with least effort? Can I get done with this task in 30 seconds?
Here is one way to do it. See both parts. First, we create a dictionary with names as the key and a list of results as values.
import csv
fileLineList = []
averageScoreDict = {}
with open('Class1scores.csv', newline='') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        fileLineList.append(row)
for row in fileLineList:
    highest = 0
    lowest = 0
    total = 0
    average = 0
    for column in row:
        if column.isdigit():
            column = int(column)
            if column > highest:
                highest = column
            if column < lowest or lowest == 0:
                lowest = column
            total += column
    average = total / 3
    averageScoreDict[row[0]] = [highest, lowest, round(average)]
print(averageScoreDict)
Output:
{'Milky': [7, 4, 5], 'Billy': [6, 5, 6], 'Adam': [5, 2, 4], 'John': [10, 7, 9]}
Now that we have our dictionary, we can create your desired final output by sorting the list. See this updated code:
import csv
from operator import itemgetter
fileLineList = []
averageScoreDict = {} # Creating an empty dictionary here.
with open('Class1scores.csv', newline='') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        fileLineList.append(row)
for row in fileLineList:
    highest = 0
    lowest = 0
    total = 0
    average = 0
    for column in row:
        if column.isdigit():
            column = int(column)
            if column > highest:
                highest = column
            if column < lowest or lowest == 0:
                lowest = column
            total += column
    average = total / 3
    # Here is where we put the empty dictionary created earlier to good use.
    # We assign the key, in this case the contents of the first column of
    # the CSV, to the list of values.
    # For the first line of the file, the key would be 'John'.
    # We are assigning a list to John which is 3 integers:
    # highest, lowest and average (which is a float we round)
    averageScoreDict[row[0]] = [highest, lowest, round(average)]
averageScoreList = []
# Here we "unpack" the dictionary we have created and build a list of the keys,
# which are the names, and the single value we want, in this case the average.
for key, value in averageScoreDict.items():
    averageScoreList.append([key, value[2]])
# Sorting the list using the value instead of the name.
averageScoreList.sort(key=itemgetter(1), reverse=True)
print('\nStudents Average Scores From Highest to Lowest\n')
print(averageScoreList)
Output:
Students Average Scores From Highest to Lowest
[['John', 9], ['Billy', 6], ['Milky', 5], ['Adam', 4]]
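For comparison, a more compact sketch of the same idea. The file contents below are made up (the real Class1scores.csv is not shown in the question), and the sketch reads them from memory rather than from disk.

```python
import csv
import io

# Made-up file contents: a name followed by three scores per line.
sample = """John,10,7,9
Billy,6,5,6
Adam,5,2,4
"""

results = []
for row in csv.reader(io.StringIO(sample)):
    name, scores = row[0], [int(s) for s in row[1:]]
    # Divide by the number of scores actually present instead of a fixed 3.
    results.append([name, round(sum(scores) / len(scores))])

# Highest average first.
results.sort(key=lambda item: item[1], reverse=True)
print(results)
```

Dividing by `len(scores)` rather than a hard-coded 3 keeps the average correct if a row ever has a different number of scores.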