Legend does not work for scatter plot in Octave

I am trying to draw a scatter plot in Octave 5.1.0 and the legend sometimes does not work.
I import data from external files, scatter part of the data, and add a legend. Only a horizontal line is displayed instead of a full legend box. I don't understand why, since I created a similar plot several weeks ago with a different data set and it worked correctly.
The legend also works with the fltk toolkit, but not with gnuplot. However, I specifically need gnuplot in order to display Cyrillic characters.
clf
graphics_toolkit ("gnuplot")
set (0, "defaultaxesfontname", "Arial")
load cryo.dat
load hy2a.dat
load sent.dat
load saral.dat
load ja2.dat
load ja3.dat
subplot(3,1,1)
hold on
scatter(cryo(:,1),cryo(:,2),40,[0.6 0.6 0.6],'s','filled')
legend("CRYOSAT2","location","northeast")
The first several lines of the cryo.dat file:
57754.985 0.82
57755.999 0.96
57756.999 0.93
57757.999 1.04
57758.999 0.83
57759.999 0.97
57760.999 0.9
57761.999 0.93
57762.999 0.93
57763.999 0.96
57764.999 0.94
57765.999 0.95
57766.999 0.94
57767.999 0.86
57768.999 0.92
57769.999 0.97
57770.999 0.97
57771.999 0.98
57772.999 0.88
57773.999 0.84
57774.999 0.92
57775.999 0.85
57776.999 0.9
I can also reproduce the problem with the rand function:
test(:,1) = rand(100,1);
test(:,2) = rand(100,1);
subplot(3,1,1)
hold on
scatter(test(:,1),test(:,2),40,[0.6 0.6 0.6],'s','filled')
legend('test','location','northeastoutside')
grid on

Related

Extract CSV from plotly plot

I have a .html file which is a plot made with Plotly. Is there an easy/already implemented way of creating a CSV with the data from this plot?
For example consider this plot (Python):
import plotly.express as px
df = px.data.iris() # iris is a pandas DataFrame
fig = px.scatter(df, x="sepal_width", y="sepal_length", color="petal_length", facet_col='species')
fig.write_html('plot.html',include_plotlyjs='cdn')
where df looks like this:
sepal_length sepal_width petal_length petal_width species species_id
0 5.1 3.5 1.4 0.2 setosa 1
1 4.9 3.0 1.4 0.2 setosa 1
2 4.7 3.2 1.3 0.2 setosa 1
3 4.6 3.1 1.5 0.2 setosa 1
4 5.0 3.6 1.4 0.2 setosa 1
.. ... ... ... ... ... ...
145 6.7 3.0 5.2 2.3 virginica 3
146 6.3 2.5 5.0 1.9 virginica 3
147 6.5 3.0 5.2 2.0 virginica 3
148 6.2 3.4 5.4 2.3 virginica 3
149 5.9 3.0 5.1 1.8 virginica 3
[150 rows x 6 columns]
and plot.html contains the resulting interactive plot.
If you open the .html file in a text editor, you will find all of the data shown in the plot embedded in what looks like a JavaScript dictionary. How can one recover a CSV as close as possible to df from this?
I'm not looking for a Python-exclusive answer; it can be anything, though Python is preferred.
Datasets used by Plotly are available in their dedicated repository: plotly/datasets.
The formats used are mainly CSV, JSON and GeoJSON. The Iris data happens to be in CSV format originally (iris.csv), so if you need the whole set you can grab it from there.
Otherwise you can always use df.to_csv().
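If you only have the .html file, one approach is to pull the embedded figure JSON back out of it. Here is a rough sketch, under the assumption that the file was produced by fig.write_html() so the trace data sits inside a Plotly.newPlot(...) call; the file and column names are illustrative:
import json
import pandas as pd

# Read the HTML produced by fig.write_html() (the file name is an assumption).
html = open("plot.html", encoding="utf-8").read()

# The traces are the first JSON array after the "Plotly.newPlot(" call.
start = html.index("Plotly.newPlot(")
start = html.index("[", start)

# raw_decode parses exactly one JSON value, so nested brackets are handled.
traces, _ = json.JSONDecoder().raw_decode(html[start:])

# Each trace carries its own x/y arrays (one trace per facet in the example).
frames = [pd.DataFrame({"x": t["x"], "y": t["y"]}) for t in traces]
pd.concat(frames, ignore_index=True).to_csv("recovered.csv", index=False)
Note that this only recovers the columns that were actually plotted (sepal_width and sepal_length here); anything not in the figure is not in the file.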

Format requirements for reading csv files into q/kdb+

(I'm using 32 bit KDB+ 3.3 on OS X.)
If I copy and paste the iris dataset into Excel and save it as "MS-DOS Comma Separated (.csv)" and read it into kdb+, I get this:
q)("FFFFS";enlist ",")0:`iris.csv
5.1al Length Sepal Width Petal Length Petal Width Species
-------------------------------------------------------------
If I save it as "Windows Comma Separated (.csv)", I get this:
q)("FFFFS";enlist ",")0:`iris.csv
Sepal Length Sepal Width Petal Length Petal Width Species
---------------------------------------------------------
5.1 3.5 1.4 0.2 setosa
4.9 3 1.4 0.2 setosa
4.7 3.2 1.3 0.2 setosa
4.6 3.1 1.5 0.2 setosa
5 3.6 1.4 0.2 setosa
5.4 3.9 1.7 0.4 setosa
4.6 3.4 1.4 0.3 setosa
5 3.4 1.5 0.2 setosa
4.4 2.9 1.4 0.2 setosa
4.9 3.1 1.5 0.1 setosa
..
Obviously saving as a Windows csv is what I need to do, and this answer explains the differences, but why does this matter for kdb+? And is there an option I can add to the code to read in MS-DOS csv files?
I'm running on Windows rather than OS X, so I can only reproduce the opposite problem, but it'll be the same either way.
Use "read0" to see the difference. In my case:
q)read0 `:macintosh.csv
"col1,col2\ra,1\rb,2\rc,3"
q)read0 `:msdos.csv
"col1,col2"
"a,1"
"b,2"
"c,3"
In order for 0: to parse the file as a table, kdb+ expects a list of strings (as in my msdos file) rather than a single string in which the carriage returns were not recognised as line breaks.
So I get:
q)("SI";enlist ",")0:`:msdos.csv
col1 col2
---------
a 1
b 2
c 3
q)("SI";enlist ",")0:`:macintosh.csv
aol1 col2
-----------
You could put something in your code to recognise the situation and handle it accordingly, but it would be slower and less efficient:
q)("SI";enlist ",")0:{$[1=count x;"\r" vs first x;x]}read0 `:msdos.csv
col1 col2
---------
a 1
b 2
c 3
q)("SI";enlist ",")0:{$[1=count x;"\r" vs first x;x]}read0 `:macintosh.csv
col1 col2
---------
a 1
b 2
c 3
It works either way.

Caffe training iteration loss is -nan

I'm trying to implement FCN-8s using my own custom data. While training from scratch, I see that my loss = -nan.
Could someone suggest what's going wrong and how I can correct this? The train_val.prototxt is the same as given in the link above. My custom images are of size 3x512x640 and the labels are 1x512x640; there are 11 different label classes. My solver.prototxt is as follows:
net: "/home/ubuntu/CNN/train_val.prototxt"
test_iter: 13
test_interval: 500
display: 20
average_loss: 20
lr_policy: "fixed"
base_lr: 1e-4
momentum: 0.99
iter_size: 1
max_iter: 3000
weight_decay: 0.0005
snapshot: 200
snapshot_prefix: "train"
test_initialization: false

Multiple regression with lagged time series using libsvm

I'm trying to develop a forecaster for electric consumption, so I want to perform a regression using daily data for an entire year. My dataset has several features. From googling, I've found that my problem is a multiple regression problem (please correct me if I am mistaken).
What I want to do is train an SVM for regression with several independent variables and one dependent variable with n lagged days. Here's a sample of my independent variables; I actually have around 10. (We used PCA to determine which variables were correlated with our problem.)
Day Indep1 Indep2 Indep3
1 1.53 2.33 3.81
2 1.71 2.36 3.76
3 1.83 2.81 3.64
... ... ... ...
363 1.5 2.65 3.25
364 1.46 2.46 3.27
365 1.61 2.72 3.13
And independent variable 1 is actually my dependent variable in the future. So, for example, with p = 2 (lagged days), I would expect my SVM to train on the first 2 time steps of all three independent variables.
Indep1 Indep2 Indep3
1.53 2.33 3.81
1.71 2.36 3.76
And the output value of the dependent variable would be "1.83" (Indep1 at time 3).
My main problem is that I don't know how to train properly. What I was doing is putting all p lagged feature vectors into a single array for my "x" variables, and using the value of independent variable 1 at time p+1 as my "y" variable, since I want to predict the next day's power consumption.
Example of a training pair, with p = 2 and 3 independent variables:
x: [1.53, 2.33, 3.81, 1.71, 2.36, 3.76]
y (next day): [1.83]
I tried making x a two-dimensional array, but when you combine it over several days it becomes a 3D array, which libsvm says it cannot handle.
Perhaps I should switch from libsvm to another tool, or maybe I'm just training incorrectly.
Thanks for your help,
Aldo.
Let me answer using Python/NumPy notation.
Assume the original time series data matrix with columns (Indep1, Indep2, Indep3, ...) is a numpy array data with shape (n_samples, n_variables). Let's generate it randomly for this example:
>>> import numpy as np
>>> n_samples, n_variables = 100, 5
>>> data = np.random.randn(n_samples, n_variables)
>>> data.shape
(100, 5)
If you want to use a window size of 2 time-steps, then the training set can be built as follows:
>>> targets = data[2:, 0] # shape is (n_samples - 2,)
>>> targets.shape
(98,)
>>> features = np.hstack([data[0:-2, :], data[1:-1, :]]) # shape is (n_samples - 2, n_variables * 2)
>>> features.shape
(98, 10)
Now you have a 2D input array and a 1D target array that you can feed to libsvm or scikit-learn.
Edit: it may very well be the case that extracting more time-series-oriented features, such as moving averages, moving min/max, moving differences (time-based derivatives of the signal), or an STFT, would help your SVM model make better predictions.
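As a minimal sketch of such rolling features in NumPy (the window length and the choice of statistics are illustrative assumptions):
import numpy as np

def rolling_stats(data, window=7):
    # Trailing moving average, min and max for every column.
    n_samples, n_variables = data.shape
    out = np.empty((n_samples - window + 1, n_variables * 3))
    for i in range(n_samples - window + 1):
        chunk = data[i:i + window]
        out[i] = np.concatenate([chunk.mean(axis=0),
                                 chunk.min(axis=0),
                                 chunk.max(axis=0)])
    return out

data = np.random.randn(100, 5)
extra = rolling_stats(data)  # shape (94, 15)
These columns can then be np.hstack-ed with the lagged features above once the row ranges are aligned.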

Inverse radix4 FFT

I have a radix-4 FFT that works in the forward direction. How different is the inverse FFT from the forward one? I think the only difference is the twiddle factors. My code is a modified version of Source. Can someone enlighten me on this? Thanks.
My output
50 688
-26 -6
-10 -16
6.0 -26
Expected output
50 688
6 -26
-10 -16
-26 -6
A Google search for "how to compute inverse FFT" gives this as the top result:
http://www.adamsiembida.com/node/23
The equation:
IFFT(X) = 1/N * conj(FFT(conj(X)))
conj() means "complex conjugate", which just means negating the imaginary part of each complex value.
http://en.wikipedia.org/wiki/Complex_conjugate
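The identity is easy to sanity-check in NumPy before porting it into the radix-4 code (a quick illustrative check, not the asker's implementation):
import numpy as np

# Check that IFFT(X) == 1/N * conj(FFT(conj(X))).
X = np.random.randn(16) + 1j * np.random.randn(16)
via_conj = np.conj(np.fft.fft(np.conj(X))) / len(X)
print(np.allclose(via_conj, np.fft.ifft(X)))  # prints True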