I would like to apply Empirical Orthogonal Function (EOF) analysis to my lat/long/time/temperature dataset. The first problem I face is converting my .csv data into .nc (I need to obtain a three-dimensional dataset, but I have failed so far).
Below is my code and what I get:
import pandas as pd
import xarray

# df is the DataFrame loaded from the .csv (loading code not shown in the post)
new_df = df[['TIME', 'LAT', 'LONG', 'Temperat']].copy()
print("DataFrame Shape:", new_df.shape)
display(new_df.head(5))

ds = xarray.Dataset.from_dataframe(new_df)  # 'ds' instead of 'xr', to avoid shadowing the usual xarray alias
ds.to_netcdf('test.nc')
[image of the dataset]
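A minimal sketch of one way to get a genuinely three-dimensional Dataset (assuming the CSV has one row per (TIME, LAT, LONG) combination; the filename is hypothetical): index the DataFrame by the coordinate columns before converting, so xarray turns them into dimensions instead of a flat integer index.
import pandas as pd
import xarray

df = pd.read_csv('data.csv')  # hypothetical filename
# With a MultiIndex of the coordinate columns, to_xarray() yields dims TIME/LAT/LONG
ds = (
    df[['TIME', 'LAT', 'LONG', 'Temperat']]
    .set_index(['TIME', 'LAT', 'LONG'])
    .to_xarray()
)
ds.to_netcdf('test.nc')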
I am stuck in my code. I am parsing JSON using pandas, but there is one column I cannot transform. The column, shown below, holds JSON inside a list. I want to expand it into 4 columns with the corresponding data in the rows. The column name is "TypeValues". If I use pd.concat to transform it, I get the error message "DataFrame constructor not properly called!". I tried pd.DataFrame(eval(data)), but that gives me "eval() arg 1 must be a string, bytes or code object".
There is another column, ID. For each ID, I have the type of data shown below in the "TypeValues" column, and I want the transformed data keyed by each ID. Does someone have an idea how I can achieve this? (Moreover, some rows of the "TypeValues" column contain [] or are blank.)
The column contents are in the comments section, as the site didn't allow me to post them directly.
Thanks in advance
import numpy as np
import pandas as pd
import json
from pathlib import Path
from pandas.io.json import json_normalize
import ast

# Reading the file
data = pd.read_csv('Test.csv')
Data1 = (pd.concat({k: pd.DataFrame(v) for k, v in data['TypeValues'].pop('TypeValues').items()})).reset_index(level=1, drop=True)
This returns the "DataFrame constructor not properly called!" error.
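For what it's worth, a minimal sketch of one approach, assuming TypeValues holds string representations of lists of dicts (the parsing helper and the blank/[] handling are assumptions about the data described in the post):
import ast
import pandas as pd

data = pd.read_csv('Test.csv')

def parse_cell(cell):
    # Treat NaN, blanks, and '[]' as empty; ast.literal_eval is safer than eval
    if pd.isna(cell) or cell in ('', '[]'):
        return []
    return ast.literal_eval(cell)

# One row per dict in each list, keeping each row's ID alongside
exploded = (
    data.assign(TypeValues=data['TypeValues'].apply(parse_cell))
        .explode('TypeValues')
        .dropna(subset=['TypeValues'])
)

# Expand each dict into its own columns next to the ID
result = pd.concat(
    [exploded[['ID']].reset_index(drop=True),
     pd.json_normalize(exploded['TypeValues'].tolist())],
    axis=1,
)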
I'm working on a project and need to parse a GeoJSON shapefile of flight route airspace in the US. The data is coming from the FAA open data portal: https://adds-faa.opendata.arcgis.com/datasets/faa::route-airspace/about
There seems to be some relevant documentation at /workspace/documentation/product/geospatial-docs/vector_data_in_transforms where it mentions:
A typical Foundry Ontology pipeline for geospatial vector data may include the following steps:
- Convert into rows with GeoJSON representation of the shape for each feature.
However, there isn't actually any guidance on how to go about this when the source is a single GeoJSON file containing a FeatureCollection and the desired output is a dataset with one row per Feature in the collection.
Anyone have a code snippet for accomplishing this? Seems like a pretty generic task in Foundry.
I typically do something like this:
import json

with open('Route_Airspace.geojson', 'r') as f:
    data = json.load(f)

rows = []
for feature in data['features']:
    row = {
        'geometry': json.dumps(feature['geometry']),
        'properties': json.dumps(feature['properties']),
        'id': feature['properties']['OBJECTID']
    }
    rows.append(row)
Note you can leave out the properties, but I like to keep them at this step in case I need them later. Also note this is a good place to set each row's primary key (the Features in this dataset have an OBJECTID property, but this may vary).
This rows list can then be used to initialise a Pandas dataframe:
import pandas as pd
df = pd.DataFrame(rows)
or a Spark dataframe (assuming you're doing this within a transform):
df = ctx.spark_session.createDataFrame(rows)
The resulting dataframes will have one row per Feature, where that feature's shape is contained within the geometry column.
Full example within transform:
from transforms.api import transform, Input, Output
import json

@transform(
    out=Output('Path/to/output'),
    source_df=Input('Path/to/source'),
)
def compute(source_df, out, ctx):
    with source_df.filesystem().open('Route_Airspace.geojson', 'r') as f:
        data = json.load(f)

    rows = []
    for feature in data['features']:
        row = {
            'geometry': json.dumps(feature['geometry']),
            'properties': json.dumps(feature['properties']),
            'id': feature['properties']['OBJECTID']
        }
        rows.append(row)

    df = ctx.spark_session.createDataFrame(rows)
    out.write_dataframe(df)
Note that for this to work, your GeoJSON file needs to be uploaded into a "dataset without a schema" so that the raw file becomes accessible via the FileSystem API.
I want to scrape data at the county level from https://apidocs.covidactnow.org
However, I could only get a dataframe with one row per county, where the data for each date is stored in a dictionary within each row. I would like to access this data and store it in long format (one row per county-date).
import requests
import pandas as pd
import os
if __name__ == '__main__':
    os.chdir('/home/username/Desktop/')
    url = 'https://api.covidactnow.org/v2/counties.timeseries.json?apiKey=ENTER_YOUR_KEY'
    response = requests.get(url).json()
    data = pd.DataFrame(response)
This seems like a trivial question, but I've tried for hours. What would be the best way to achieve that?
Do you mean something like this?
import requests
url = 'https://api.covidactnow.org/v2/states.timeseries.csv?apiKey=YOURAPIKEY'
response = requests.get(url)
csv_response = response.text
# Then you can transform STRING to CSV
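For that last step, a minimal sketch of parsing the csv_response string with pandas (io is from the standard library):
import io
import pandas as pd

df = pd.read_csv(io.StringIO(csv_response))  # parse the CSV text into a DataFrame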
Check this for string to CSV --> python parsing string to csv format
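Alternatively, to stay with the JSON counties endpoint from the question and get one row per county-date, pd.json_normalize can flatten each county's time-series list. The field names used here (actualsTimeseries, fips, county, state) are assumptions based on the v2 API and may need adjusting:
import pandas as pd
import requests

url = 'https://api.covidactnow.org/v2/counties.timeseries.json?apiKey=ENTER_YOUR_KEY'
counties = requests.get(url).json()

# record_path points at the per-date list; meta repeats the county
# identifiers on every row, giving long format (one row per county-date)
long_df = pd.json_normalize(
    counties,
    record_path='actualsTimeseries',
    meta=['fips', 'county', 'state'],
)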
I am a newbie and have this Keras UNet model that I need to train.
https://github.com/Swapneel7/Deep-Learning-Keras/blob/master/UNet%20Keras.pdf
The input tensor does follow the required dimensions of 1024×1024×1.
Input:-
Parse the mat into array
import scipy.io as sio
import tensorflow as tf
import numpy as np
input1 = sio.loadmat('D:\\Users\\svekhande\\tempx.mat')
TensorInput1 = input1['temp']
TensorInput = np.expand_dims(TensorInput1, 2)  # add a trailing channel axis -> (1024, 1024, 1)
print(TensorInput.shape)
Model:-
def unet(input_size=(1024, 1024, 1)):
    inputs = Input(input_size)
    ...
Training:-
model.fit(TensorInput, steps_per_epoch=1, epochs=3)
Error
IndexError: list index out of range
The error indicates I exceeded a bound, so I tried feeding smaller tensors (1023*1023*1), changing the network shape, and resetting the graph, but nothing worked, hence this post.
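Not a fix for the notebook itself, but a common cause of this error is a missing leading batch axis plus missing targets in model.fit. A minimal sketch of the expected shapes, where y is a hypothetical placeholder target and model is the UNet from the post:
import numpy as np

x = np.expand_dims(TensorInput, axis=0)  # (1024, 1024, 1) -> (1, 1024, 1024, 1)
y = np.zeros_like(x)                     # placeholder target, just to illustrate the shapes
model.fit(x, y, batch_size=1, epochs=3)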
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
from matplotlib.finance import candlestick_ohlc
import matplotlib.dates as mdates
import datetime as dt
import urllib
import json
from urllib.request import urlopen
import requests
dataLink ='http://api.huobi.com/staticmarket/btc_kline_015_json.js'
r = requests.get(dataLink) # r is a response object.
quotes = pd.DataFrame.from_records(r.json()) # fetches dataset
quotes[0] = pd.to_datetime(quotes[0].str[:-3], format='%Y%m%d%H%M%S')
#Naming columns
quotes.columns = ["Date","Open","High",'Low',"Close", "Vol"]
#Converting dates column to float values
quotes['Date'] = quotes['Date'].map(mdates.date2num)
#Making plot
fig = plt.figure()
fig.autofmt_xdate()
ax1 = plt.subplot2grid((6,1), (0,0), rowspan=6, colspan=1)
#Converts raw mdate numbers to dates
ax1.xaxis_date()
plt.xlabel("Date")
print(quotes)
#Making candlestick plot
candlestick_ohlc(ax1, quotes.values, width=1, colorup='g', colordown='k',
                 alpha=0.75)
plt.show()
I'm trying to plot a candlestick chart from the JSON data provided by Huobi, but I can't sort the dates out and the plot looks horrible. Can you explain, in fairly simple terms that a novice might understand, what I am doing wrong? My code is above.
Thanks in advance.
You can put the fig.autofmt_xdate() at some point after calling the candlestick function; this will make the dates look nicer.
Concerning the plot itself, you may decide to make the bars a bit smaller, width=0.01, such that they won't overlap.
You may then also decide to zoom in a bit, to actually see what's going on in the chart, either interactively, or programmatically,
ax1.set_xlim(dt.datetime(2017, 4, 17, 8), dt.datetime(2017, 4, 18, 0))
This boiled down to a question of how wide to make the candlesticks given the granularity of the data as determined by the period & length parameters of the json feed. You just have to fiddle around with the width parameter in candlestick_ohlc() until the graph looks right...
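Putting those suggestions together, a minimal sketch of the plotting section (it reuses quotes and the imports from the question; the width and zoom window are illustrative values):
fig = plt.figure()
ax1 = plt.subplot2grid((6, 1), (0, 0), rowspan=6, colspan=1)
ax1.xaxis_date()  # interpret x values as matplotlib date numbers
plt.xlabel("Date")

# Narrow bars so adjacent 15-minute candles don't overlap
candlestick_ohlc(ax1, quotes.values, width=0.01,
                 colorup='g', colordown='k', alpha=0.75)

fig.autofmt_xdate()  # call after plotting so the date labels format nicely
ax1.set_xlim(mdates.date2num(dt.datetime(2017, 4, 17, 8)),
             mdates.date2num(dt.datetime(2017, 4, 18, 0)))
plt.show()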