Linear Regression Function for Sin Graph - regression

I am currently working on a project where I must analyze data and find a period for the graph. The data contains outliers. I need a function that will make a line of best fit for the function.
I attempted to simply get a sin graph on the plot, but I could not even do that. Can anyone give me a starting hint?
import os
import pyfits as fits
import numpy as np
import pylab
import random
import scipy.optimize
import scipy.signal
from numpy import arange
from matplotlib import pyplot
from scipy.optimize import curve_fit
filename = 'C:\Users\Ken Preiser\Desktop\Space thing\Snapshots\BAT_70m_snapshot_SWIFT_J1647.9-4511B.lc'
namePortion = filename[-39:]
hdulist = fits.open(filename, 'readonly', None, False) #{unpacks file) name, mode, memorymap, savebackup
data = hdulist[1].data
datapoints = 23310
def sinfunc(a, b, c): #I tried graphing a sinfunction, but it did not work...
return a*np.sin(bx-c)
time = data.field('TIME')
time = time / 86400.0
timeViewingThreshold = 10
rateViewingThreshold = .01
rate = np.sum(data['RATE'][:,:4], axis=1)
average = np.sum(rate)/23310
error = data.field('ERROR')
error = np.sqrt(np.sum(data['ERROR'][:,:4]**2, axis=1))
print rate.size,(", rate")
print time.size,(", time")
fig = pylab.figure()
ax = fig.add_subplot(111)
ax.set_xlabel('Time')
ax.set_ylabel('Rate')
ax.set_title('Rate vs Time graph: ' + namePortion)
pylab.plot(time, rate, 'o')
pyplot.xlim(min(time) - timeViewingThreshold, max(time) + timeViewingThreshold)
pyplot.ylim(min(rate) - rateViewingThreshold, max(rate) + rateViewingThreshold)
ax.errorbar(time, rate, xerr=0, yerr=error)
pylab.show()
(the outputs)
http://imgur.com/jbfuxOA

You're trying to fit points to the model: y = sin(ax + b). Since you're using linear regression, you need a linear model. So one way to do that is compute arcsin for each point and now compute the linear regression. The model is now: arcsin(y) = ax + b. The regression model gives you a and b which is what you're after. You should be able to test this out pretty quickly in excel, then code it up once the nuances are figured out.

Related

Get Variables in SciPy LeastSq to use them

I need to get fitting result for each parameter created in each least_sq run.
Can anyone guide me on Parameter names inferred from the function arguments in the SciPy LeastSq??
** Good news!
using the lmfit such a wonderful package!**
from scipy.optimize import least_squares
from matplotlib.pylab import plt
import numpy as np
from numpy import exp, linspace, random
from lmfit import Model
def gaussian(x, amp, cen, wid):
return amp * np.exp(-(x-cen)**2 / wid)
x = linspace(-10, 10, 101)
y = gaussian(x, 2.33, 0.21, 1.51) + random.normal(0, 0.2, len(x))
gmodel = Model(gaussian)
params = gmodel.make_params()
print('parameter names: {}'.format(gmodel.param_names))
print('independent variables: {}'.format(gmodel.independent_vars))
result = gmodel.fit(y, params, x=x, amp=5, cen=5, wid=1)
print(result.fit_report())
[n many cases we might want to extract parameters and standard error estimates programatically rather than by reading the fit report][1]

How to plot a 2 dimensional density / heatmap in plotnine?

I am trying to recreate the following graph in plotnine. It's asking me for more details but I don't want to distract from the example. I think it's pretty obvious what I'm trying to do. I have been given a function by a colleague. I'm not interested in rewriting the function. I want to take sm and use plotnine to plot it instead of matplotlib. I plot lots of dataframes with plotnine but I'm not sure how to use it in this case. I have tried on my own to figure it out and I keep getting lost. I hope that for someone more experienced I am overlooking something simple.
import matplotlib.pyplot as plt
def getSuccess(y,x):
return((y*(-x))*.5+.5)
steps = 100
stepSize = 1/steps
sm = []
for y in range(steps*2+1):
sm.append([getSuccess((y-steps)*stepSize,(x-steps)*stepSize) for x in range(steps*2+1)])
plt.imshow(sm)
plt.ylim(-1, 1)
plt.colorbar()
plt.yticks([0,steps,steps*2],[str(y) for y in [-1.0,0.0,1.0]])
plt.xticks([0,steps,steps*2],[str(x) for x in [-1.0,0.0,1.0]])
plt.show()
You could try geom_raster.
I have taken your synthetic data sm and converted to a dataframe as plotnine will need this.
import pandas as pd
import numpy as np
from plotnine import *
df = pd.DataFrame(sm).melt()
df.rename(columns={'variable':'x','value':'density'}, inplace=True)
df.insert(1,'y',df.index % 201)
p = (ggplot(df, aes('x','y'))
+ geom_raster(aes(fill='density'), interpolate=True)
+ labs(x=None,y=None)
+ scale_x_continuous(expand=(0,0), breaks=[0,100,200], labels=[-1,0,1])
+ scale_y_continuous(expand=(0,0), breaks=[0,100,200], labels=[-1,0,1])
+ theme_matplotlib()
+ theme(
text = element_text(family="Calibri", size=9),
legend_title = element_blank(),
axis_ticks = element_blank(),
legend_key_height = 29.6,
legend_key_width = 6,
)
)
p.save(filename='C:\\Users\\BRB\\geom_raster.png', height=10, width=10, units = 'cm', dpi=400)
This result is:

How to feed an LSTM/GRU model multiple independent Time Series?

In Order to explain it simply: I have 53 Oil Producing wells measurements, each well has been measured each day for 6 years, we recorded multiple variables (Pressure, water production, gas production...etc), and our main component(The one we want to study and forecast) is the Oil production rate. How can I Use all the data to train my model of LSTM/GRU knowing that the Oil wells are independent and that the measurments have been done in the same time for each one?
The knowledge that "the measurments have been done in the same time for each [well]" is not necessary if you want to assume that the wells are independent. (Why do you think that that knowledge is useful?)
So if the wells are considered independent, treat them as individual samples. Split them into a training set, validation set, and test set, as usual. Train a usual LSTM or GRU on the training set.
By the way, you might want to use the attention mechanism instead of recurrent networks. It is easier to train and usually yields comparable results.
Even convolutional networks might be good enough. See methods like WaveNet if you suspect long-range correlations.
These well measurements sound like specific and independent events. I work in the finance sector. We always look at different stocks, and each stocks specific time neries using LSTM, but not 10 stocks mashed up together. Here's some code to analyze a specific stock. Modify the code to suit your needs.
from pandas_datareader import data as wb
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.pylab import rcParams
from sklearn.preprocessing import MinMaxScaler
start = '2019-06-30'
end = '2020-06-30'
tickers = ['GOOG']
thelen = len(tickers)
price_data = []
for ticker in tickers:
prices = wb.DataReader(ticker, start = start, end = end, data_source='yahoo')[['Open','Adj Close']]
price_data.append(prices.assign(ticker=ticker)[['ticker', 'Open', 'Adj Close']])
#names = np.reshape(price_data, (len(price_data), 1))
df = pd.concat(price_data)
df.reset_index(inplace=True)
for col in df.columns:
print(col)
#used for setting the output figure size
rcParams['figure.figsize'] = 20,10
#to normalize the given input data
scaler = MinMaxScaler(feature_range=(0, 1))
#to read input data set (place the file name inside ' ') as shown below
df['Adj Close'].plot()
plt.legend(loc=2)
plt.xlabel('Date')
plt.ylabel('Price')
plt.show()
ntrain = 80
df_train = df.head(int(len(df)*(ntrain/100)))
ntest = -80
df_test = df.tail(int(len(df)*(ntest/100)))
#importing the packages
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import Dense, Dropout, LSTM
#dataframe creation
seriesdata = df.sort_index(ascending=True, axis=0)
new_seriesdata = pd.DataFrame(index=range(0,len(df)),columns=['Date','Adj Close'])
length_of_data=len(seriesdata)
for i in range(0,length_of_data):
new_seriesdata['Date'][i] = seriesdata['Date'][i]
new_seriesdata['Adj Close'][i] = seriesdata['Adj Close'][i]
#setting the index again
new_seriesdata.index = new_seriesdata.Date
new_seriesdata.drop('Date', axis=1, inplace=True)
#creating train and test sets this comprises the entire data’s present in the dataset
myseriesdataset = new_seriesdata.values
totrain = myseriesdataset[0:255,:]
tovalid = myseriesdataset[255:,:]
#converting dataset into x_train and y_train
scalerdata = MinMaxScaler(feature_range=(0, 1))
scale_data = scalerdata.fit_transform(myseriesdataset)
x_totrain, y_totrain = [], []
length_of_totrain=len(totrain)
for i in range(60,length_of_totrain):
x_totrain.append(scale_data[i-60:i,0])
y_totrain.append(scale_data[i,0])
x_totrain, y_totrain = np.array(x_totrain), np.array(y_totrain)
x_totrain = np.reshape(x_totrain, (x_totrain.shape[0],x_totrain.shape[1],1))
#LSTM neural network
lstm_model = Sequential()
lstm_model.add(LSTM(units=50, return_sequences=True, input_shape=(x_totrain.shape[1],1)))
lstm_model.add(LSTM(units=50))
lstm_model.add(Dense(1))
lstm_model.compile(loss='mean_squared_error', optimizer='adadelta')
lstm_model.fit(x_totrain, y_totrain, epochs=10, batch_size=1, verbose=2)
#predicting next data stock price
myinputs = new_seriesdata[len(new_seriesdata) - (len(tovalid)+1) - 60:].values
myinputs = myinputs.reshape(-1,1)
myinputs = scalerdata.transform(myinputs)
tostore_test_result = []
for i in range(60,myinputs.shape[0]):
tostore_test_result.append(myinputs[i-60:i,0])
tostore_test_result = np.array(tostore_test_result)
tostore_test_result = np.reshape(tostore_test_result,(tostore_test_result.shape[0],tostore_test_result.shape[1],1))
myclosing_priceresult = lstm_model.predict(tostore_test_result)
myclosing_priceresult = scalerdata.inverse_transform(myclosing_priceresult)
totrain = df_train
tovalid = df_test
#predicting next data stock price
myinputs = new_seriesdata[len(new_seriesdata) - (len(tovalid)+1) - 60:].values
# Printing the next day’s predicted stock price.
print(len(tostore_test_result));
print(myclosing_priceresult);
Final result:
1
[[1396.532]]

Simple regression with Keras seems not working properly

I am trying, just for practising with Keras, to train a network to learn a very easy function.
The input of the network is 2Dimensional . The output is one dimensional.
The function can indeed represented with an image, and the same is for the approximate function.
At the moment I'm not looking for any good generalization, I just want that the network is at least good in representing the training set.
Here I place my code:
import matplotlib.pyplot as plt
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.optimizers import SGD
import random as rnd
import math
m = [
[1,1,1,1,0,0,0,0,1,1],
[1,1,0,0,0,0,0,0,1,1],
[1,0,0,0,1,1,0,1,0,0],
[1,0,0,1,0,0,0,0,0,0],
[0,0,0,0,1,1,0,0,0,0],
[0,0,0,0,1,1,0,0,0,0],
[0,0,0,0,0,0,1,0,0,1],
[0,0,1,0,1,1,0,0,0,1],
[1,1,0,0,0,0,0,0,1,1],
[1,1,0,0,0,0,1,1,1,1]] #A representation of the function that I would like to approximize
matrix = np.matrix(m)
evaluation = np.zeros((100,100))
x_train = np.zeros((10000,2))
y_train = np.zeros((10000,1))
for x in range(0,100):
for y in range(0,100):
x_train[x+100*y,0] = x/100. #I normilize the input of the function, between [0,1)
x_train[x+100*y,1] = y/100.
y_train[x+100*y,0] = matrix[int(x/10),int(y/10)] +0.0
#Here I show graphically what I would like to have
plt.matshow(matrix, interpolation='nearest', cmap=plt.cm.ocean, extent=(0,1,0,1))
#Here I built the model
model = Sequential()
model.add(Dense(20, input_dim=2, init='uniform'))
model.add(Activation('tanh'))
model.add(Dense(1, init='uniform'))
model.add(Activation('sigmoid'))
#Here I train it
sgd = SGD(lr=0.5)
model.compile(loss='mean_squared_error', optimizer=sgd)
model.fit(x_train, y_train,
nb_epoch=100,
batch_size=100,
show_accuracy=True)
#Here (I'm not sure), I'm using the network over the given example
x = model.predict(x_train,batch_size=1)
#Here I show the approximated function
print x
print x_train
for i in range(0, 10000):
evaluation[int(x_train[i,0]*100),int(x_train[i,1]*100)] = x[i]
plt.matshow(evaluation, interpolation='nearest', cmap=plt.cm.ocean, extent=(0,1,0,1))
plt.colorbar()
plt.show()
As you can see, the two function are completely different, and I can't understand why.
I think that maybe model.predict doesn't work as I axpect.
Your understanding is correct; it's just a question of hyperparameter tuning.
I just tried your code, and it looks like you're not giving your training enough time:
Look at the loss, under 100 epochs, it's stuck at around 0.23. But try using the 'adam' otimizer instead of SGD, and increase the number of epochs up to 10,000: the loss now decreases down to 0.09 and your picture looks much better.
If it's still not precise enough for you, you may also want to try increasing the number of parameters: just add a few layers; this will make overfitting much easier ! :-)
I have changed just your network structure and added a training dataset. The loss decreases down to 0.01.
# -*- coding: utf-8 -*-
"""
Created on Thu Mar 16 15:26:52 2017
#author: Administrator
"""
import matplotlib.pyplot as plt
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.optimizers import SGD
import random as rnd
import math
from keras.optimizers import Adam,SGD
m = [
[1,1,1,1,0,0,0,0,1,1],
[1,1,0,0,0,0,0,0,1,1],
[1,0,0,0,1,1,0,1,0,0],
[1,0,0,1,0,0,0,0,0,0],
[0,0,0,0,1,1,0,0,0,0],
[0,0,0,0,1,1,0,0,0,0],
[0,0,0,0,0,0,1,0,0,1],
[0,0,1,0,1,1,0,0,0,1],
[1,1,0,0,0,0,0,0,1,1],
[1,1,0,0,0,0,1,1,1,1]] #A representation of the function that I would like to approximize
matrix = np.matrix(m)
evaluation = np.zeros((1000,1000))
x_train = np.zeros((1000000,2))
y_train = np.zeros((1000000,1))
for x in range(0,1000):
for y in range(0,1000):
x_train[x+1000*y,0] = x/1000. #I normilize the input of the function, between [0,1)
x_train[x+1000*y,1] = y/1000.
y_train[x+1000*y,0] = matrix[int(x/100),int(y/100)] +0.0
#Here I show graphically what I would like to have
plt.matshow(matrix, interpolation='nearest', cmap=plt.cm.ocean, extent=(0,1,0,1))
#Here I built the model
model = Sequential()
model.add(Dense(50, input_dim=2, init='uniform'))## init是关键字,’uniform’表示用均匀分布去初始化权重
model.add(Activation('tanh'))
model.add(Dense(20, init='uniform'))
model.add(Activation('tanh'))
model.add(Dense(1, init='uniform'))
model.add(Activation('sigmoid'))
#Here I train it
#sgd = SGD(lr=0.01)
adam = Adam(lr = 0.01)
model.compile(loss='mean_squared_error', optimizer=adam)
model.fit(x_train, y_train,
nb_epoch=100,
batch_size=100,
show_accuracy=True)
#Here (I'm not sure), I'm using the network over the given example
x = model.predict(x_train,batch_size=1)
#Here I show the approximated function
print (x)
print (x_train)
for i in range(0, 1000000):
evaluation[int(x_train[i,0]*1000),int(x_train[i,1]*1000)] = x[i]
plt.matshow(evaluation, interpolation='nearest', cmap=plt.cm.ocean, extent=(0,1,0,1))
plt.colorbar()
plt.show()

Plotting candlestick data from a dataframe in Python

I would like create a daily candlestick plot from data i downloaded from yahoo using pandas. I'm having trouble figuring out how to use the candlestick matplotlib function in this context.
Here is the code:
#The following example, downloads stock data from Yahoo and plots it.
from pandas.io.data import get_data_yahoo
import matplotlib.pyplot as plt
from matplotlib.pyplot import subplots, draw
from matplotlib.finance import candlestick
symbol = "GOOG"
data = get_data_yahoo(symbol, start = '2013-9-01', end = '2013-10-23')[['Open','Close','High','Low','Volume']]
ax = subplots()
candlestick(ax,data['Open'],data['High'],data['Low'],data['Close'])
Thanks
Andrew.
Using bokeh:
import io
from math import pi
import pandas as pd
from bokeh.plotting import figure, show, output_file
df = pd.read_csv(
io.BytesIO(
b'''Date,Open,High,Low,Close
2016-06-01,69.6,70.2,69.44,69.76
2016-06-02,70.0,70.15,69.45,69.54
2016-06-03,69.51,70.48,68.62,68.91
2016-06-04,69.51,70.48,68.62,68.91
2016-06-05,69.51,70.48,68.62,68.91
2016-06-06,70.49,71.44,69.84,70.11
2016-06-07,70.11,70.11,68.0,68.35'''
)
)
df["Date"] = pd.to_datetime(df["Date"])
inc = df.Close > df.Open
dec = df.Open > df.Close
w = 12*60*60*1000
TOOLS = "pan,wheel_zoom,box_zoom,reset,save"
p = figure(x_axis_type="datetime", tools=TOOLS, plot_width=1000, title
= "Candlestick")
p.xaxis.major_label_orientation = pi/4
p.grid.grid_line_alpha=0.3
p.segment(df.Date, df.High, df.Date, df.Low, color="black")
p.vbar(df.Date[inc], w, df.Open[inc], df.Close[inc], fill_color="#D5E1DD", line_color="black")
p.vbar(df.Date[dec], w, df.Open[dec], df.Close[dec], fill_color="#F2583E", line_color="black")
output_file("candlestick.html", title="candlestick.py example")
show(p)
Code above forked from here:
http://docs.bokeh.org/en/latest/docs/gallery/candlestick.html
I have no reputation to comment #randall-goodwin answer, but for pandas 0.16.2 line:
# convert the datetime64 column in the dataframe to 'float days'
data.Date = mdates.date2num(data.Date)
must be:
data.Date = mdates.date2num(data.Date.dt.to_pydatetime())
because matplotlib does not support the numpy datetime64 dtype
I stumbled across a great pastebin entry: http://pastebin.com/ne7Fjdiq that does this well. I too was having trouble getting the calling syntax right. It usually revolves around transforming your data in simple ways to get the function to work right. My issue was with the datetime. There must be something in my format data. Once I replaced the Date series with range(maxdata) then it worked.
data = pandas.read_csv('data.csv', parse_dates={'Timestamp': ['Date', 'Time']}, index_col='Timestamp')
ticks = data.ix[:, ['Price', 'Volume']]
bars = ticks.Price.resample('1min', how='ohlc')
barsa = bars.fillna(method='ffill')
fig = plt.figure()
fig.subplots_adjust(bottom=0.1)
ax = fig.add_subplot(111)
plt.title("Candlestick chart")
volume = ticks.Volume.resample('1min', how='sum')
value = ticks.prod(axis=1).resample('1min', how='sum')
vwap = value / volume
Date = range(len(barsa))
#Date = matplotlib.dates.date2num(barsa.index)#
DOCHLV = zip(Date , barsa.open, barsa.close, barsa.high, barsa.low, volume)
matplotlib.finance.candlestick(ax, DOCHLV, width=0.6, colorup='g', colordown='r', alpha=1.0)
plt.show()
Here is the solution:
from pandas.io.data import get_data_yahoo
import matplotlib.pyplot as plt
from matplotlib import dates as mdates
from matplotlib import ticker as mticker
from matplotlib.finance import candlestick_ohlc
import datetime as dt
symbol = "GOOG"
data = get_data_yahoo(symbol, start = '2014-9-01', end = '2015-10-23')
data.reset_index(inplace=True)
data['Date']=mdates.date2num(data['Date'].astype(dt.date))
fig = plt.figure()
ax1 = plt.subplot2grid((1,1),(0,0))
plt.ylabel('Price')
ax1.xaxis.set_major_locator(mticker.MaxNLocator(6))
ax1.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d'))
candlestick_ohlc(ax1,data.values,width=0.2)
Found this question when I too was looking how to use candlestick with a pandas dataframe returned from one of the DataReader services like get_data_yahoo. I eventually figured it out. One of the keys was this other question, answered by Wes McKinney and RJRyV. Here is that link:
Pandas convert dataframe to array of tuples
The key was to read the candlestick.py function definition to determine how it expected to receive the data. The date needed to be converted first, then the entire dataframe needed to be converted to an array of tuples.
Here is the final code that worked for me. Maybe there is some other Candlestick chart out there somewhere that works directly on a pandas dataframe returned from one of the stock quote services. That would be very nice.
# Imports
from pandas.io.data import get_data_yahoo
from datetime import datetime, timedelta
import matplotlib.dates as mdates
from matplotlib.pyplot import subplots, draw
from matplotlib.finance import candlestick
import matplotlib.pyplot as plt
# get the data on a symbol (gets last 1 year)
symbol = "TSLA"
data = get_data_yahoo(symbol, datetime.now() - timedelta(days=365))
# drop the date index from the dateframe
data.reset_index(inplace = True)
# convert the datetime64 column in the dataframe to 'float days'
data.Date = mdates.date2num(data.Date)
# make an array of tuples in the specific order needed
dataAr = [tuple(x) for x in data[['Date', 'Open', 'Close', 'High', 'Low']].to_records(index=False)]
# construct and show the plot
fig = plt.figure()
ax1 = plt.subplot(1,1,1)
candlestick(ax1, dataAr)
plt.show()