How do I plot using Panel's FileInput widget?

I am trying to make a plotting system with a Panel FileInput widget that also reads YAML files. I can get my buttons to function and pick my files, but when I try to plot my files I get a TypeError: float() argument must be a string or a number, not 'FileInput'. I have put a sample of my code here, but not the files I am using. I could really use some help with what I am doing wrong; I am just starting to learn Python widgets. Thanks in advance for any help I can get.
import matplotlib.pyplot as plt
import panel as pn
pn.extension()
year = 2006
model_path = pn.widgets.FileInput(multiple = True)
model_path
plot_control_file = pn.widgets.FileInput()
plot_control_file
plt.plot(model_path, plot_control_file)
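For reference, a FileInput widget is not plottable data itself: the uploaded contents live in its .value attribute as raw bytes (a list of bytes objects when multiple=True) and have to be decoded and parsed before being handed to matplotlib. A minimal sketch of that idea, assuming a hypothetical YAML layout with 'x' and 'y' lists:
import io
import yaml
import matplotlib.pyplot as plt
import panel as pn
pn.extension()
file_input = pn.widgets.FileInput(accept='.yaml,.yml')
def plot_uploaded(event):
    # .value holds the raw uploaded bytes, not parsed data
    if file_input.value is None:
        return
    data = yaml.safe_load(io.BytesIO(file_input.value))
    # hypothetical structure: the YAML file contains 'x' and 'y' lists
    plt.plot(data['x'], data['y'])
    plt.show()
file_input.param.watch(plot_uploaded, 'value')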


How to plot a function in Python when both variables cannot be isolated to one side

I am trying to graph variable "u" versus variable "T" for 1<T<1000 (integers). However, the function I have includes both of the variables within an integral so I cannot create an isolated u=f(T) function. My thought process is to manipulate the function so that it is 0=f(T,u) and output a "u" value that minimizes f(T,u) for each T. However, I seem to be struggling a lot with how these variables and functions should be defined. All constants are defined and "E" should be defined through the integration step. The overall function I start with is:
5x10^28 = integral from 0 to infinity of (pi/2) * (8m/h^2) * E^0.5 * (exp((E - u)/(k*T)) + 1)^(-1) dE, integrated with respect to "E"
I am very new to python but the following code is how far I've been able to develop it based on previous forums and video tutorials. Any help is much appreciated!
from scipy.integrate import quad
import numpy as np
import matplotlib.pyplot as plt
import scipy.optimize as spo
m=9.11e-31
h=6.63e-34
k=1.38e-23
T=list(range(1,1001))
def f(E,u):
    return (np.pi/2)*(8*m/(h**2))*(E**0.5)*(1/((np.exp((E-u)/k*T)+1)))
Func_Equal_Zero=quad(f,0,np.inf,args=(u,))[0]-5e-28
Start_Guess_T_u=[500,1e-5]
result=spo.minimize(Func_Equal_Zero,Start_Guess_T_u)
plt.plot(T,u)
plt.figure(figsize=(6,6))
plt.xlabel('Temperature (k)')
plt.ylabel('Chemical Potential (J)')
I expected that I could just define the functions including "u", but Python does not seem to like what I have tried. I am not sure whether any of my other syntax is incorrect, because I cannot get past its issue with defining "u".
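For what it's worth, here is a minimal sketch of the per-temperature root-finding pattern described above (solve 0 = f(T, u) for u at each T). The residual below is a simple stand-in, not the question's integral; the real version would wrap quad() around the integrand and subtract 5e28, and the bracket passed to brentq would need adjusting accordingly.
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import brentq
def residual(u, T):
    # placeholder residual with a single root in u; replace with
    # quad(f, 0, np.inf, args=(u, T))[0] - 5e28 for the real problem
    return u - 1e-20 * np.log(T + 1.0)
temperatures = np.arange(1, 1001)
chemical_potentials = []
for T in temperatures:
    # solve the one-dimensional root problem in u at this temperature
    u_root = brentq(residual, -1e-18, 1e-18, args=(T,))
    chemical_potentials.append(u_root)
plt.figure(figsize=(6, 6))
plt.plot(temperatures, chemical_potentials)
plt.xlabel('Temperature (K)')
plt.ylabel('Chemical Potential (J)')
plt.show()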

Palantir Foundry How to allow dynamic number of input in compute (Code repository)

I have a folder where I will upload one file every month. The file will have the same format in every month.
First problem:
The idea is to concatenate all the files in this folder into one file. Currently I am hardcoding the filenames (filename[0], filename[1], filename[2], ...), but imagine later I will have 50 files; should I explicitly add them all to the transform_df decorator? Is there another method to handle this?
Second problem:
Currently I have, let's say, 4 files (2021_07, 2021_08, 2021_09, 2021_10), and whenever I add the file representing the 2021_12 data I want to avoid changing the code.
If I add input_5 = Input(path_to_2021_12_do_not_exists), the code will not run and will give an error.
How can I implement the code so that it works for future files and ignores an input that does not exist yet, without manually adding a new value to my code each month?
Thank you
# from pyspark.sql import functions as F
from transforms.api import transform_df, Input, Output
from pyspark.sql.functions import to_date, year, col
from pyspark.sql.types import StringType
from myproject.datasets import utils
from pyspark.sql import DataFrame
from functools import reduce
input_dir = '/Company/Project_name/'
prefix_filename = 'DataInput1_'
suffixes = ['2021_07', '2021_08', '2021_09', '2021_10', '2021_11', '2021_12']
filenames = [input_dir + prefix_filename + suffixe for suffixe in suffixes]
@transform_df(
    Output("/Company/Project_name/Data/clean/File_concat"),
    input_1=Input(filenames[0]),
    input_2=Input(filenames[1]),
    input_3=Input(filenames[2]),
    input_4=Input(filenames[3]),
)
def compute(input_1, input_2, input_3, input_4):
    input_dfs = [input_1, input_2, input_3, input_4]
    dfs = []
    def transformation_input(df):
        # some transformation
        return df
    for input_df in input_dfs:
        dfs.append(transformation_input(input_df))
    dfs = reduce(DataFrame.unionByName, dfs)
    return dfs
This question comes up a lot; the simple answer is that you don't. Defining datasets and executing a build on them are two different steps executed at different stages.
Whenever you commit your code and run the checks, your overall Python code is executed during the renderSchrinkwrap stage, except for the compute part. This allows Foundry to discover what datasets exist and publish them.
Publishing involves creating your dataset; whatever is inside your compute function is published into the jobspec of the dataset, so Foundry knows what code to execute whenever you run a build.
Once you hit build on the dataset, Foundry will only pick up whatever is on the jobspec and execute it. Any other code has already run during your checks, and it has run just once.
So any dynamic input/output would require you to re-run checks on your repo, which means that some code change would have had to happen, since Checks are part of the CI process, not part of the build.
Taking a step back, assuming each of your input files has the same schema, Foundry would expect you to have all of those files in the same dataset as append transactions.
This might not be possible though if, for instance, the only indication of the "year" of the data is embedded in the filename; but your sample code indicates that you expect all these datasets to have the same schema and to union together easily.
You can do this manually through the Dataset Preview - just use the Upload File button or drag-and-drop the new file into the Preview window - or, if it's an "end user" workflow, with a File Upload Widget in a Workshop app. You may need to coordinate with your Foundry support team if this widget isn't available.
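If the monthly files do end up as append transactions in a single dataset, the transform collapses to one static Input. A rough sketch, where the combined dataset path is an assumption:
from transforms.api import transform_df, Input, Output
@transform_df(
    Output("/Company/Project_name/Data/clean/File_concat"),
    source=Input("/Company/Project_name/DataInput1_all_months"),  # hypothetical combined dataset
)
def compute(source):
    # the same per-file transformation, now applied to the already-unioned data
    return source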
A bit late to the post, but for anyone who is interested, here is an answer to most of the question. Dynamically determining file names from within a folder is not doable, but having some level of dynamic input is possible, as follows:
# from pyspark.sql import functions as F
from transforms.api import transform, Input, Output
from pyspark.sql.functions import to_date, year, col
from pyspark.sql.types import StringType
from myproject.datasets import utils
from pyspark.sql import DataFrame
# from functools import reduce
from transforms.verbs.dataframes import union_many # use this instead of reduce
input_dir = '/Company/Project_name/'
prefix_filename = 'DataInput1_'
suffixes = ['2021_07', '2021_08', '2021_09', '2021_10', '2021_11', '2021_12']
filenames = [input_dir + prefix_filename + suffixe for suffixe in suffixes]
inputs = {'input{}'.format(index): Input(filename) for (index, filename) in enumerate(filenames)}
@transform(
    output=Output("/Company/Project_name/Data/clean/File_concat"),
    **inputs
)
def compute(output, **kwargs):
    # Extract dataframes from input datasets
    input_dfs = [dataset_df.dataframe() for dataset_name, dataset_df in kwargs.items()]
    dfs = []
    def transformation_input(df):
        # some transformation
        return df
    for input_df in input_dfs:
        dfs.append(transformation_input(input_df))
    # dfs = reduce(DataFrame.unionByName, dfs)
    unioned_dfs = union_many(*dfs)
    output.write_dataframe(unioned_dfs)
A couple of points:
Created a dynamic input dict (expanded in the sketch below).
That dict is read into the transform using **kwargs.
Using the transform decorator, not transform_df, we can extract the dataframes.
(Not in the question) Combine multiple dataframes using the union_many function from the transforms_verbs library.
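For clarity, the dictionary comprehension above expands into keyword arguments equivalent to writing them out by hand, e.g. for the first two months:
from transforms.api import transform, Input, Output
@transform(
    output=Output("/Company/Project_name/Data/clean/File_concat"),
    input0=Input("/Company/Project_name/DataInput1_2021_07"),
    input1=Input("/Company/Project_name/DataInput1_2021_08"),
    # ...one keyword argument per entry in filenames
)
def compute(output, **kwargs):
    # body identical to the compute function above
    ...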

Can we have multiple dashboards of different dropdowns in python dash?

Recently I started using Dash for data visualization, and I'm analyzing stock data using the Quandl API, but I am unable to get multiple dashboards of dropdowns displaying the options of each dataset using a for loop like this:
import dash
import dash_core_components as dcc
import dash_html_components as html
from dash.dependencies import Input, Output
import pandas as pd
import quandl
import plotly.graph_objs as go
import auth
api_key = auth.key
def easy_analysis(quandl_datasets):
    try:
        for dataset in quandl_datasets:
            df = quandl.get(dataset, authtoken=api_key)
            df = df.reset_index()
            app = dash.Dash(__name__)
            app.layout = html.Div([
                html.H3(dataset),
                dcc.Dropdown(
                    id=dataset,
                    options=[{'label': s, 'value': s} for s in df.columns[1:]],
                    value=['Open'],
                    multi=True
                ),
                dcc.Graph(id='dataset' + str(dataset))
            ])
            @app.callback(
                Output('dataset' + str(dataset), 'figure'),
                [Input(dataset, 'value')]
            )
            def draw_graph(dataset):
                graphs = []
                for column in dataset:
                    graphs.append(go.Scatter(
                        x=list(df.Date),
                        y=list(df[column]),
                        name=str(column),
                        mode='lines'
                    ))
                return {'data': graphs}
            app.run_server(debug=True)
    except Exception as e:
        print(str(e))
easy_analysis(['NSE/KOTAKNIFTY', 'NSE/ZENSARTECH', 'NSE/BSLGOLDETF'])
The output I expected was multiple dashboards with all the dropdown options, one after the other. But the result I got was only one dashboard, for the last item in the easy_analysis() list:
easy_analysis(['NSE/KOTAKNIFTY','NSE/ZENSARTECH','NSE/BSLGOLDETF']) considered only 'NSE/BSLGOLDETF'.
What am I supposed to do to fix this and get multiple dashboards, one for each dataset provided in the list? I also checked the Dash User Guide, but could not find what I was looking for.
However, when passed only one argument for a single dataset with the for loop, the code works fine and the graph changes according to the option selected in the dropdown.
The code is here.
The code does not work because you are redefining a Dash app at each iteration of the for loop.
Even if you have three datasets, you need to define the Dash app and its layout only once.
You can make three requests to the Quandl API and - if possible - save everything in the same pandas Dataframe.
One question is whether you want to display all dropdowns and graphs (i.e. a dropdown + graph for each Quandl dataset) or only one dropdown and one graph. I would suggest starting with the first approach, because it's much easier. Anyway, for the second approach you can have a look at this solution.
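A rough sketch of that first approach, defining the app and layout once with a dropdown and graph per dataset. The placeholder dataframes stand in for the quandl.get() calls, and the component ids are assumptions:
import dash
import dash_core_components as dcc
import dash_html_components as html
from dash.dependencies import Input, Output
import plotly.graph_objs as go
import pandas as pd
datasets = ['NSE/KOTAKNIFTY', 'NSE/ZENSARTECH', 'NSE/BSLGOLDETF']
# placeholder frames; in the real code these would come from quandl.get(dataset, authtoken=api_key)
dataframes = {d: pd.DataFrame({'Date': pd.date_range('2019-01-01', periods=5),
                               'Open': range(5), 'Close': range(5)}) for d in datasets}
app = dash.Dash(__name__)
# build the layout once: one dropdown and one graph per dataset
app.layout = html.Div([
    html.Div([
        html.H3(d),
        dcc.Dropdown(id='dropdown-' + d.replace('/', '-'),
                     options=[{'label': c, 'value': c} for c in dataframes[d].columns[1:]],
                     value=['Open'], multi=True),
        dcc.Graph(id='graph-' + d.replace('/', '-')),
    ]) for d in datasets
])
# register one callback per dataset
for d in datasets:
    @app.callback(Output('graph-' + d.replace('/', '-'), 'figure'),
                  [Input('dropdown-' + d.replace('/', '-'), 'value')])
    def update_graph(columns, d=d):
        # the default argument binds the current dataset name to this callback
        df = dataframes[d]
        return {'data': [go.Scatter(x=df.Date, y=df[c], name=c, mode='lines') for c in columns]}
if __name__ == '__main__':
    app.run_server(debug=True)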

Why Bokeh has the following error msg?

I am trying to follow along with a Udemy tutorial and ran into the following error in the generated HTML when running the code below:
Bokeh Error
attempted to retrieve property value for property without value specification
Does anyone have a clue as to why this happened? Thanks!!
from bokeh.plotting import figure
from bokeh.io import output_file, show, gridplot
#from bokeh.sampledata.periodic_table import elements
from bokeh.models import Range1d, PanTool, ResetTool, HoverTool, ColumnDataSource, LabelSet, BoxAnnotation
import pandas
from bokeh.models.annotations import Span  # access objects within annotations
#prepare the output file
output_file("layout.html")
x1, y1 = list(range(0, 10)), list(range(10, 20))
#create a new plot
f1 = figure(width=250, plot_height=250, title="Circles")
f1.circle(x1, y1, size=10, color="navy", alpha=0.5)
#create a span annotation (a vertical reference line)
span_4 = Span(location=4, dimension='height', line_color='grenn', line_width=2)
#define where to add the span_4 object instance, add_layout method
f1.add_layout(span_4)
#create a box annotation
box_2_6 = BoxAnnotation(left=2, right=6, fill_color="firebrick", fill_alpha=0.3)
f1.add_layout(box_2_6)
#show the results
show(f1)
It appears to be because of a typo - line_color='grenn' should be line_color='green'. You're right that it's an unhelpful error message though. I'll try to open a GH issue about it.
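For reference, the corrected line would simply be:
span_4 = Span(location=4, dimension='height', line_color='green', line_width=2)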

Coefficient in support vector regression (SVR) using grid search (GridSearchCV) and Pipeline in Scikit Learn

I am having trouble to access the coefficients of a support vector regression model (SVR) in scikit learn when the model is embedded in a pipeline and a grid search.
Consider the following example:
from sklearn.datasets import load_iris
import numpy as np
from sklearn.grid_search import GridSearchCV
from sklearn.svm import SVR
from sklearn.feature_selection import SelectKBest
from sklearn.pipeline import Pipeline
iris = load_iris()
X_train = iris.data
y_train = iris.target
clf = SVR(kernel='linear')
select = SelectKBest(k=2)
steps = [('feature_selection', select), ('svr', clf)]
pipeline = Pipeline(steps)
grid = GridSearchCV(pipeline, param_grid={"svr__C":[10,10,100],"svr__gamma": np.logspace(-2, 2)})
grid.fit(X_train, y_train)
This seems to work fine but when I try to access the coefficient of the best fitting model
grid.best_estimator_.coef_
I get an error message: AttributeError: 'Pipeline' object has no attribute 'coef_'.
I also tried to access the individual steps of the pipeline:
pipeline.named_steps['svr']
but could not find the coefficients there.
Just happened to come across the same problem, and this post had the answer:
grid.best_estimator_ contains an instance of the pipeline, which consists of steps. The last step should always be the estimator, so you should always find the coefficients at:
grid.best_estimator_.steps[-1][1].coef_
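Equivalently, since the SVR step was named 'svr' in the pipeline above, the fitted estimator inside the best pipeline can also be reached by name. Note that the coefficients are expressed in the reduced feature space produced by SelectKBest:
# access the fitted SVR inside the best pipeline by its step name
best_svr = grid.best_estimator_.named_steps['svr']
print(best_svr.coef_)       # shape (1, k), where k is the number of selected features
print(best_svr.intercept_)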