another question about spectrum calculation in Octave using FFT:
With Audacity I created a .wav file containing white noise, then filtered with a lowpass with fcut = 1kHz, -20dB/decade. The spectrum analysed by Audacity looks as expected (unfortunately I can't post an image since I don't have enough reputation)
Then I analysed the spectrum with Octave using the following code:
pkg load signal %load signal package
graphics_toolkit ("gnuplot") % use gnuplots
close all
clear all
[stereosig, fs,bit] =wavread('noise_stereo.wav'); % load wav
ch1 = stereosig(:,1); % left
ch2 = stereosig(:,2); % right
NFFT = 1024;
NFFT_half = NFFT/2;
X = fft(ch1, NFFT);
freqvect = (1:NFFT_half+1)./NFFT*fs;
figure(1)
X_mag_scaled = abs(X(1:NFFT_half+1)./NFFT_half);
semilogx(freqvect, 20*log10(X_mag_scaled));
grid on
xlabel('Frequency (Hz)')
ylabel('Mag [dB]')
The plot produced by Octave is extremly "unsmooth", the -20dB/decade can not be recognised properly since the difference between two adjacent points can be > 10dB or so. The 1kHz cut off frequency can only be guessed.
Does anybody have an idea how to get a "smooth" spectrum as it is done in Audacity (which I suppose to be a reasonable solution)?
Philippe
Related
PPO model doesn't iterate through the whole dataframe .. its basically repeating the first step many times (10,000 in this example) ?
In this case, DF's shape is (5476, 28) and each step's obs shape is: (60, 28).. I dont see that its iterating through the whole DF.
# df shape - (5476, 28)
env = MyRLEnv(df)
model = PPO("MlpPolicy", env, verbose=4)
model.learn(total_timesteps=10000)
MyRLEnv:
self.action_space = spaces.Discrete(4)
self.observation_space = spaces.Box(low=-np.inf, high=np.inf, shape=(60, 28) , dtype=np.float64)
Thanks!
I also got stuck few days ago on something similar to this , but after inspecting deeply I found that the learn method actually runs the environment for n times , now this n is equal to total_timesteps/size_of_df , this in your case would be nearly 10000/5476times which is almost equal to 1.8 , so this 1.8 means , the algorithm would reset the environment at the beginning, then run the step method for the entire dataframe and reset the environment again and run the step method for only 80% of the data in dataframe. So, when the PPO Algorithm stops you see only 80% of the dataframe is being ran .
The Actor Critic Algorithms runs the environment numerous number of times to improve it's efficiency, so that is the reason it is usually suggested that in order to get better results we should keep the value of total_timesteps fairly high , so that it can run it on the same data for quite some times to learn better.
Example:
Say my total_timesteps = 10000 and len(df) = 5000,
then in that case it would run for, n = total_timesteps/len(df) = 2 Full scans of the entire dataframe .
I would like to access the raw pixels in the OpenAI gym CartPole-v0 environment without opening a render window. How do I do this?
Example code:
import gym
env = gym.make("CartPole-v0")
env.reset()
img = env.render(mode='rgb_array', close=True) # Returns None
print(img)
img = env.render(mode='rgb_array', close=False)
# Opens annoying window, but gives me the array that I want
print(img.shape)
PS. I am having a hard time finding good documentation for OpenAI gym. Is it just me, or does it simply not exist?
Edit: I don't need to ever open the render video.
I was curious about same so I started looking into the source code and this is what I found.
Open AI uses pyglet for displaying the window and animations.
For showing the animation everything is drawn on to window and then rendered.
And then pyglet stores what is being displayed on to a buffer.
Dummy version of how code is written in open AI
import pyglet
from pyglet.gl import *
import numpy as np
display = pyglet.canvas.get_display()
screen = display.get_screens()
config = screen[0].get_best_config()
pyglet.window.Window(width=500, height=500, display=display, config=config)
# draw what ever you want
#get image from the buffer
buffer = pyglet.image.get_buffer_manager().get_color_buffer()
image_data=buffer.get_image_data()
arr = np.frombuffer(image_data.get_data(),dtype=np.uint8)
print(arr)
print(arr.shape)
output:
[0 0 0 ... 0 0 0]
(1000000,)
so basically every image we get is from buffer of what is being displayed on the window.
So if we don't draw anything on window we get no image so that window is required to get the image.
so you need to find a way such that windows is not displayed but its values are stored in buffer.
I know its not what you wanted but I hope it might lead you to a solution.
I've just gone through half of the gym source code line by line, and I can tell you that 1, the observation space of cartpole is digits to the ai, not pixels. eg, from their cartpole env py file...
Observation:
Type: Box(4)
Num Observation Min Max
0 Cart Position -2.4 2.4
1 Cart Velocity -Inf Inf
2 Pole Angle -0.209 rad (-12 deg) 0.209 rad (12 deg)
3 Pole Angular Velocity -Inf Inf
So, the pixels are for you at this point. And 2, if your goal is to teach the ai on pixels, you will need to render images from your data-in array, then pass them THROUGH the observation space as a pixel array, like Maunish Dave shows. OpenAI's Atari version does this.
If you want a better guide, don't read the OpenAI Docs, read the Stable Baseline docs here: https://stable-baselines.readthedocs.io/
Someone offers an answer here:
https://github.com/openai/gym/issues/374
"The atari and doom environments give pixels in their observations (ie, the return value from step). I don't think any other ones do.
render produces different results on different OSes, so they're not part of any official environment for benchmarking purposes. But if you want to create a new environment where the observation is in pixels, you could implement it by wrapping an existing environment and calling render."
I'm also working on getting raw pixels as well and I'm trying to find a way to see if what has been returned is what I expect it is.
The documentation can be found:
https://gym.openai.com/docs
And a forum for discussing OpenAI:
discuss.openai.com
Although its not very lively.
I have faced the similar problem:
This is how fixed it, in rendering.py file at /gym/envs/classic_control find the following line in the Viewer class:
self.window = pyglet.window.Window(width=width, height=height, display=display)
Change this line to:
self.window = pyglet.window.Window(width=width, height=height, display=display, visible=False)
Hope it helps!!
I have a project where I have to recognize the frequency from an audio file. For this I use a single tone of 10 kHz to see if I can get it working.
Since I am pretty new to Octave, I tried this example with my own audio file.
I tried to understand what happens by doing some research to all functions.
My question here is; if I let specgram plot the figure when I do not specify it's output:
specgram(y,fftn,Fs,hanning(window),step);
it gives a line at 10kHz which is what I want.
But if I specify the output for the specgram function
[S,f,t]= specgram(y,fftn,Fs,hanning(window),step);
and let it plot, it plots the line at 18 kHz.
I figured it have to be in the inputs for the figure and I tried modifying these a bit, but every time I do that Octave gives an error.
I need the frequency as an given output, since I have to do some calculations with it, I figured I need to specify the frequency output.
This is the part of the code that specify the plot for the spectrogram:
step= fix(5*Fs/1000); % stepsize of the window
window= fix(90*Fs/1000); % window size
fftn =2^nextpow2(window); % Size of the FFT block
[S,f,t]= specgram(y,fftn,Fs,hanning(window),step);
S= abs(S(2:fftn*12000/Fs,:)); % Normalize the phase
S= S/max(S(:)); % Normalize the Energy
S= max(S, 10^(-40/10)); % Throw out values below -40 dB and above -3dB
S= min(S, 10^(-3/10));
figure
imagesc(t,f,(log(S)));
Can anyone help me here how to gain the frequency data from the audio file so I can use it in some calculations?
I have searched for answers already in the Octave manual for help and I tried it with various matlab sites. Also checked already many posts here such as:
How does Octave spectrogram 'specgram' from signal work?
Methodology of FFT for Matlab spectrogram / short time Fourier transform functions
P.S. Sorry for my bad English, it's not my native language
I found the answer myself, it turns out it is in this line of code:
S= abs(S(2:fftn*12000/Fs,:));
if I delete this line, the lines are placed on the right frequency in the figure. To me it looks like this line just takes a small space of the fft and replaces it with other frequencies but I'm not shure about that.
I tried to train & cross validate a set of data with 8616 samples using epsilon SVR.
Among the datasets, I take 4368 for test, 4248 for CV.
Kernel type = RBF kernel. Libsvm provides a result as shown below.
optimization finished, #iter = 502363
nu = 0.689607
obj = -6383530527604706.000000, rho = 2884789.960212
nSV = 3023, nBSV = 3004
This is a result gotten by setting
-s 3 -t 2 -c 2^28 -g 2^-13 -p 2^12
(a) What does "nu" means? Sometimes I got nu = 0.99xx for different parameter.
(b) It seems that "obj" is surprisingly large. Does it sounds correct? Libsvm FAQ said this is "optimal objective value of the dual SVM problem". Does it means that this is the min value of f(alpha)?
(c) "rho" is large too. This is the bias term, b. The dataset labels (y) consist of value between 82672 to 286026. So I guess this is reasonable, am I right?
For training set,
Mean squared error = 1.26991e+008 (regression)
Squared correlation coefficient = 0.881112 (regression)
For cross-validation set,
Mean squared error = 1.38909e+008 (regression)
Squared correlation coefficient = 0.883144 (regression)
Using the selected param, I have produced the below result
kernel_type=2 (best c:2^28=2.68435e+008, g:2^-13=0.00012207, e:2^12=4096)
NRMS: 0.345139, best_gap:0.00199433
Mean Absolute Percent Error (MAPE): 5.39%
Mean Absolute Error (MAE): 8956.12 MWh
Daily Peak MAPE: 5.30%
The CV set MAPE is low (5.39%). Using Bias-Variance test, the difference between train set MAPE and CV set MAPE is only 0.00199433, which mean the param seems to be set correctly. But I wonder if the extremely large "obj", "rho" value is correct....
I am very new to SVR, do correct me if my interpretation or validation method is incorrect/insufficient.
Method to calculate MAPE
train_model = svmtrain(train_label, train_data, cmd);
[result_label, train_accuracy, train_dec_values] = svmpredict(train_label, train_data, train_model);
train_err = train_label-result_label;
train_errpct = abs(train_err)./train_label*100;
train_MAPE = mean(train_errpct(~isinf(train_errpct)));
The objective and rho values are high because (most probably) the data were not scaled. Scaling is highly recommended to avoid overflow; the overflow risk also depends on the type of kernel. Btw, when scaling the training data, do not forget to also scale the test data, which is most easily accomplished by scaling all data first, and then splitting them into a training and test set.
I understand the complex output of a DFT contains both "amplitude" and "phase" information at discrete frequencies.
Amplitude[n] = sqrt((r[n]*r[n]) + (i[n]*i[n]))
Phase[n] = (atan2(i[n],r[n]))
Frequency[n] = n * (sample_rate / (fft_input_length / 2))
It seems that I should be able to use the frequency, amplitude, and phase information to calculate the amplitude of each output bin as if the input at the corresponding frequency had a zero-phase alignment in the FFT input. But I am drawing a blank.
Hmm, digging deeper into my problem I discovered that the imaginary potion of the FFT output is always 0.0 regardless of the input. So I am guessing my code is flawed or the algorithm is not what I need.
If you want to rotate all DFT result bins to a phase of zero with reference to the start (sample 0): set r[n] = amplitude[n], i[n] = 0; make sure r[n] is symmetric over the full DFT length if you want strictly real data; and compute the IDFT if needed.