Getting musical notations using aubio library [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
Well, this is my first question on Stack Overflow, so I'm kind of excited about it :)
Here it is: my input is a WAV file. For now, I have recorded a piece on my guitar, so the WAV file contains this instrumental recording. What I want to do is get the note name (A, B, C and so on) of each note that is played. I have heard about techniques like the FFT, but given my poor knowledge of how to use the FFT, I thought of using the aubio library.
So aubio provides:
aubiopitch, which extracts pitch candidates, and
aubiocut, which extracts onsets.
Where I am stuck is: how do I get the frequency at the particular time a note is played using aubio?
I think aubiopitch and aubiocut would help, but I don't understand how to map between them. Any help would be greatly appreciated :)
Hi piem: Thanks for your answer. Could you please analyse this output?
aubiopitch -i Reverse_Open.wav
1.408 68.9486465454
1.536 81.7372512817
1.664 164.290893555
1.792 164.464691162
1.92 82.6862487793
2.048 328.539306641
2.176 218.885116577
2.304 219.06237793
2.432 219.042160034
2.56 219.133621216
2.688 145.751785278
2.816 146.437744141
2.944 146.199829102
3.072 195.059829712
3.2 194.912689209
3.328 195.724975586
3.456 195.517547607
3.584 247.317428589
3.712 246.764221191
3.84 246.857452393
3.968 145.454727173
4.096 328.569610596
4.224 329.625823975
4.352 329.16619873
4.48 328.906402588
4.608 328.96786499
4.736 329.187835693
4.864 145.741394043
My notes, with their frequencies in Hz, are: E (approx. 82), A (110), D (147), G (197), B (247), E (329.2),
which are played at 1.344, 1.888, 2.4, 2.88, 3.36 and 3.872 seconds respectively (according to aubiopitch, which I suppose is correct).
Any idea how I can extract these 6 notes and their times from the above output?

aubiopitch outputs a list of tuples. Each tuple contains two floats:
a timestamp in seconds
a fundamental frequency in Hertz
Here is an example on a guitar sound:
$ aubiopitch -i guitar_Cold_Blood_-_Baby_I_Love_You.wav | head
0.000000 0.000000
0.005805 293.884338
0.011610 386.387207
0.017415 0.000000
0.023220 551.689758
0.029025 3608.569336
0.034830 3588.231201
0.040635 416.824066
0.046440 3606.715576
0.052245 417.116425
If you are curious (please be), you can get the latest git version and try the demo script demo_pitch.py:
$ ./python/demos/demo_pitch.py bass_Don_Ellis_-_Conquistador.wav
You would get the following plot:
The first row represents the waveform.
The second row shows the extracted pitch track, in MIDI units.
The third row shows the confidence of these pitch candidates (using the yinfft algorithm).
In this sample of a bass line, extracting the pitch during the transient attacks is more challenging than in the steady state. Pitch candidates whose confidence falls below an arbitrary threshold (here 0.8) can be discarded (dashed green line), while the others can be kept (solid blue line).
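To go from the (timestamp, frequency) pairs above to note names, one possible approach is to map each frequency to the nearest equal-tempered note and then merge consecutive frames that agree. The sketch below is only an illustration, not part of aubio: the function names freq_to_note and frames_to_notes and the min_frames threshold are hypothetical, and it assumes A4 = 440 Hz. It works on already-parsed aubiopitch output, so it needs nothing beyond the standard library.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def freq_to_note(freq_hz):
    """Return the nearest equal-tempered note name (assuming A4 = 440 Hz)."""
    midi = round(69 + 12 * math.log2(freq_hz / 440.0))
    return NOTE_NAMES[midi % 12] + str(midi // 12 - 1)

def frames_to_notes(frames, min_frames=2):
    """Merge consecutive (time, freq) frames that map to the same note into
    (start_time, note) events; runs shorter than min_frames are dropped
    as likely glitches (hypothetical threshold, tune it for your data)."""
    events = []
    run_note, run_start, run_len = None, None, 0
    for t, f in frames:
        # aubiopitch prints 0 when no pitch was found; treat that as silence.
        note = freq_to_note(f) if f > 0 else None
        if note == run_note:
            run_len += 1
            continue
        if run_note is not None and run_len >= min_frames:
            events.append((run_start, run_note))
        run_note, run_start, run_len = note, t, 1
    if run_note is not None and run_len >= min_frames:
        events.append((run_start, run_note))
    return events

frames = [(0.0, 0.0), (0.1, 110.0), (0.2, 110.5),
          (0.3, 330.0), (0.4, 147.0), (0.5, 146.5)]
print(frames_to_notes(frames))
```

Note that a pitch tracker can lock onto a harmonic: in the output posted in the question, the open A string (110 Hz) appears at about 219 Hz, one octave up, so the note letter is more reliable than the octave number. The start of each run could also be snapped to the nearest onset reported by aubiocut.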

Related

Is it possible to use the prediction of a CNN for images not belonging to the model classes?

I am using photos of faces of individuals aged 10, 12, 14, 16, 18, and 20. The question I am trying to answer is: how does the face "evolve"? Will it remain more or less similar until a certain threshold, where it will suddenly change?
To answer this question, I trained a CNN on 2 classes, the 10-year-old photos (labelled as "0") and the 20-year-old photos (labelled as "1").
I used this model to predict the categories that do not belong to the model's classes (the 12- to 18-year-olds), to compute the average prediction for each age group. The result is shown in the figure, where the values are, respectively, the mean predictions for the 12-, 14-, 16- and 18-year-olds.
[Figure: mean prediction for 12, 14, 16, 18 years old]
My question here is: does it make sense to use this model to predict other age groups and say, for example, "The 12-year-olds have a mean prediction of 0.2, which means they are more similar to the 10-year-old faces than to the 20-year-old faces"?
As the values increase with age, can I say that the faces are getting more similar to 20-year-old faces? And are there any references to articles that use a model to predict images belonging to none of the model's classes?
Thank you!

This is quite an interesting question. Let me answer this in two ways.
Q1: Can a CNN predict an image whose class is not included in the training set?
A1: Yes. However, instead of training the model and comparing the output probabilities, we use a method called “few-shot learning” or “zero-shot learning” to do this. The basic idea is: first, we train the model so that it recognizes the high-level features underlying the data (e.g., the edges or the shape of the eyes in your example). Then we apply the model to a new dataset, relying on its generalizability. This research is also closely related to transfer learning.
As a starting point, here is a good paper.
Generalizing from a Few Examples: A Survey on Few-shot Learning
Q2: If the probability is higher, can we say the faces are more similar to 20-year-old faces?
A2: The short answer is yes. The reason is that there are only two classes in your training data: if the probability is higher, the model is more confident that the picture belongs to class 1 (i.e., the 20-year-old photos). But we cannot be sure how the model makes this prediction. You may want to visualize the outputs of intermediate layers to see which features the model finds. You can have a look at this blog.
Understanding CNN (Convolutional Neural Network)

How to encode poker cards?

I am currently working on a poker AI and I am stuck on this question: what is the best way to encode poker cards for my AI? I am using deep reinforcement learning techniques and I just don't know how to answer this question.
The card information is stored as a string. For example: "3H" would be "three of hearts". I thought about ranking the cards and then attaching values to them, such that a high-rated card like AH ("Ace of hearts") would get a high number like 52 or something like that. The problem with this approach is that it doesn't take the suits into account.
I have seen some methods where they just assign a number to each and every card, such that in the end there are 52 numbers from 0 to 51 (https://www.codewars.com/kata/52ebe4608567ade7d700044a/javascript). The problem I see with that is that my neural net wouldn't, or at least would have difficulty, getting the connection between similar cards like Aces (because, as in the link above, one Ace is labeled 0, another 13, etc.).
Can someone please help me with this question, so that the encoding takes care of suits, values, ranks, etc., and my NN is able to get the connections between similar cards?
Thanks in advance
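One encoding that addresses the concern above is to represent rank and suit separately, so that all four Aces share the same rank bits and differ only in the suit bits. The sketch below is just one option among several, not a recommendation from the thread: the name encode_card and the convention of writing the ten as "T" are assumptions made for illustration.

```python
RANKS = "23456789TJQKA"   # assumption: the ten is written "T", e.g. "TH"
SUITS = "CDHS"            # clubs, diamonds, hearts, spades

def encode_card(card):
    """Encode a card string such as "3H" as 13 one-hot rank bits
    followed by 4 one-hot suit bits (17 values in total)."""
    rank, suit = card[0].upper(), card[1].upper()
    vec = [0] * (len(RANKS) + len(SUITS))
    vec[RANKS.index(rank)] = 1
    vec[len(RANKS) + SUITS.index(suit)] = 1
    return vec

print(encode_card("3H"))
print(encode_card("AH"))
```

With this layout, the first 13 entries of encode_card("AH") and encode_card("AS") are identical, which gives the network a direct signal that both cards are Aces.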

Does any programming language support the expressions like "12.Pounds.ToKilograms()"? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 5 years ago.
Converting units and measurements from one system to another can be achieved in most programming languages in one way or another. But can we express something like "12.Pounds.ToKilograms()" in any programming language?

Not exactly in that syntax but you may want to take a look at Frink: https://frinklang.org
Frink syntax is similar to Google Calculator or Wolfram Alpha but not exactly the same. Whereas Google and Wolfram Alpha use the in keyword to trigger unit conversion, Frink uses the -> operator. So in Frink, the following is valid source code:
// Calculate length of Wifi antenna:
lightspeed / (2.4GHz) / 4 -> inches
As I mentioned, this syntax is similar to Google. For reference, the same calculation in Google syntax is speed of light / 2.4GHz / 4 in inches. Frink predates both Google Calculator and Wolfram Alpha. I first became aware of Frink sometime in the early 2000s.
Frink is unit-aware. A number in Frink always has a unit, even if that unit is simply "scalar" (no units). So to declare a variable that is 12 pounds you'd do:
var x = 12 pounds
To convert you'd do:
x -> kg
Or you can simply write the expression:
12 pounds -> kg

In Smalltalk you could express this as
12 pounds inKilograms
Notice, however, that it is up to you to program both messages, pounds and inKilograms (there are also libraries that do that kind of thing). But the key point is that the expression above is perfectly valid in Smalltalk (even if these messages do not exist).

I can't say I've ever seen this as a valid expression. 12 would have to be defined as a type. Let's assume 12 is an integer. An integer type only knows that it is an integer, and in most languages there are built-in functions/methods for integers. For this to be a valid expression you would need to define a weight type, and within that object you could define conversion methods, or derive child types with conversion methods.
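The idea of a dedicated weight type with conversion methods could look roughly like this in Python. This is a minimal sketch under stated assumptions: the class name Weight and its methods are hypothetical, and the value is stored internally in kilograms.

```python
class Weight:
    """Hypothetical weight type that stores its value in kilograms."""

    KG_PER_POUND = 0.45359237  # exact by definition of the international pound

    def __init__(self, kilograms):
        self.kilograms = kilograms

    @classmethod
    def pounds(cls, value):
        """Build a Weight from a value given in pounds."""
        return cls(value * cls.KG_PER_POUND)

    def to_kilograms(self):
        return self.kilograms

    def to_pounds(self):
        return self.kilograms / self.KG_PER_POUND

print(Weight.pounds(12).to_kilograms())
```

The call reads Weight.pounds(12).to_kilograms() rather than 12.Pounds.ToKilograms(), because the conversion lives on the weight type and not on the integer.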

You could do it in Ruby because of its syntactic sugar.
Actually, there's a Ruby gem, alchemist, that you should check out.
For example you could write:
10.miles.to.meters

What is out of bag error in Random Forests? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
What is out of bag error in Random Forests?
Is it the optimal parameter for finding the right number of trees in a Random Forest?

I will attempt to explain:
Suppose our training data set is represented by T, and suppose the data set has M features (or attributes or variables).
T = {(X1,y1), (X2,y2), ... (Xn, yn)}
and
Xi is input vector {xi1, xi2, ... xiM}
yi is the label (or output or class).
Summary of RF:
The Random Forests algorithm is a classifier based primarily on two methods:
Bagging
Random subspace method.
Suppose we decide to have S trees in our forest. We first create S datasets of the "same size as the original", created by randomly resampling the data in T with replacement (n times for each dataset). This results in datasets {T1, T2, ... TS}. Each of these is called a bootstrap dataset. Because of the "with replacement", every dataset Ti can have duplicate data records, and Ti can be missing several data records from the original dataset. This is called bootstrapping. (en.wikipedia.org/wiki/Bootstrapping_(statistics))
Bagging is the process of taking bootstraps and then aggregating the models learned on each bootstrap.
Now, RF creates S trees and uses m (= sqrt(M) or = floor(ln M + 1)) random subfeatures out of the M possible features to create any tree. This is called the random subspace method.
So for each bootstrap dataset Ti you create a tree Ki. If you want to classify some input data D = {x1, x2, ..., xM}, you let it pass through each tree and produce S outputs (one for each tree), which can be denoted by Y = {y1, y2, ..., yS}. The final prediction is a majority vote on this set.
Out-of-bag error:
After creating the classifiers (S trees), for each (Xi,yi) in the original training set T, select all Tk which do not include (Xi,yi). This subset, pay attention, is a set of bootstrap datasets which do not contain a particular record from the original dataset. This set is called the out-of-bag examples. There are n such subsets (one for each data record in the original dataset T). The OOB classifier is the aggregation of votes ONLY over the trees Kk whose bootstrap dataset Tk does not contain (Xi,yi).
The out-of-bag estimate of the generalization error is the error rate of the out-of-bag classifier on the training set (compare it with the known yi's).
Why is it important?
The study of error estimates for bagged classifiers in Breiman
[1996b], gives empirical evidence to show that the out-of-bag estimate
is as accurate as using a test set of the same size as the training
set. Therefore, using the out-of-bag error estimate removes the need
for a set aside test set.
(Thanks @Rudolf for the corrections. His comments are below.)
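The out-of-bag procedure described above can be sketched in a few lines of Python. To keep the example short and dependency-free, it makes two simplifications that are not part of Random Forests proper: the base learner is a 1-nearest-neighbour rule on a single numeric feature instead of a decision tree, and the random subspace step is omitted. The names oob_error and fit_1nn are hypothetical.

```python
import random
from collections import Counter

def fit_1nn(X_train, y_train):
    """Stand-in base learner: predict the label of the nearest training point."""
    pairs = list(zip(X_train, y_train))
    def predict(x):
        return min(pairs, key=lambda p: abs(p[0] - x))[1]
    return predict

def oob_error(X, y, fit, n_models=50, seed=0):
    """Bagging with an out-of-bag error estimate.

    For each record i, only models whose bootstrap sample did NOT
    contain i are allowed to vote on it."""
    rng = random.Random(seed)
    n = len(X)
    votes = [Counter() for _ in range(n)]
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap: n draws with replacement
        in_bag = set(idx)
        model = fit([X[i] for i in idx], [y[i] for i in idx])
        for i in range(n):
            if i not in in_bag:                      # record i is out-of-bag for this model
                votes[i][model(X[i])] += 1
    # Majority vote per record, skipping records that were never out-of-bag.
    scored = [(i, v.most_common(1)[0][0]) for i, v in enumerate(votes) if v]
    wrong = sum(1 for i, pred in scored if pred != y[i])
    return wrong / len(scored)

# Two well-separated clusters, so the OOB error should be very low.
X = list(range(10)) + list(range(100, 110))
y = [0] * 10 + [1] * 10
print(oob_error(X, y, fit_1nn))
```

Swapping fit_1nn for a function that trains a decision tree on a random subset of features would bring the sketch closer to an actual random forest.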

In Breiman's original implementation of the random forest algorithm, each tree is trained on about 2/3 of the total training data. As the forest is built, each tree can thus be tested (similar to leave one out cross validation) on the samples not used in building that tree. This is the out of bag error estimate - an internal error estimate of a random forest as it is being constructed.
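The "about 2/3" figure follows from bootstrap sampling: a given record is missed by a single draw with probability 1 - 1/n, so it is missed by all n draws with probability (1 - 1/n)^n, which approaches 1/e (about 0.368) as n grows. The expected share of distinct records in a bootstrap sample is therefore about 0.632. A quick check, with in_bag_fraction as a hypothetical helper name:

```python
def in_bag_fraction(n):
    """Expected fraction of distinct records in a bootstrap sample of size n."""
    return 1.0 - (1.0 - 1.0 / n) ** n

for n in (10, 100, 1000):
    print(n, round(in_bag_fraction(n), 4))
```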

What is the origin of magic number 42, indispensable in coding? [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 9 years ago.
Update:
Surprised that it is being so heavily downvoted...
The question is coding-related and before asking this question I have googled for "42" in combination with:
site:msdn.microsoft.com
"code example"
"c#"
"magic number"
And I am not an expert in, or a fan of, Western culture/literature.
I also found "Why are variables “i” and “j” used for counters? [duplicate]", which was not closed but even protected.
I feel that everybody knows it, except me...
What is the origin of the ubiquitous magic number 42 used all over code samples?
How did you come to use 42? Because I have never come across it or used it myself.
After some searching, I found an MSDN doc on it, Magic Numbers: Integers:
"Aside from a book/movie reference, developers often use this as an arbitrary value"
Well, this did not explain anything to me.
Which movies and books have I missed during all those years of being involved in development, coding, programming and IT-related activities like requirements analysis, system administration, etc.?
Some references to some texts using code snippets with 42 (just C#-related):
Jérôme Laban. C# Async Tips and Tricks, Part 3: Tasks and the Synchronization Context
var t = Task.Delay(TimeSpan.FromSeconds(1))
    .ContinueWith
    (
        _ => Task.Delay(TimeSpan.FromSeconds(42))
    );
MSDN Asynchronous Agents Library
send(_target, 42);
Quickstart: Calling asynchronous APIs in C# or Visual Basic
Office.context.document.setSelectedDataAsync(
"<html><body>hello world</body></html>",
{coercionType: "html", asyncContext: 42},
function(asyncResult) {
write(asyncResult.status + " " + asyncResult.asyncContext);
Asynchronous Programming in C++ Using PPL
task<int> myTask = someOtherTask.then([]() { return 42; });
Boxing and Unboxing (C# Programming Guide)
Console.WriteLine(String.Concat("Answer", 42, true));
How To: Override the ToString Method (C# Programming Guide)
int x = 42;
Trace Listeners
// Use this example when debugging.
System.Diagnostics.Debug.WriteLine("Error in Widget 42");
// Use this example when tracing.
System.Diagnostics.Trace.WriteLine("Error in Widget 42");
|| Operator (C# Reference)
// The following line displays True, because 42 is evenly
// divisible by 7.
Console.WriteLine("Divisible returns {0}.", Divisible(42, 7));
// The following line displays False, because 42 is not evenly
// divisible by 5.
Console.WriteLine("Divisible returns {0}.", Divisible(42, 5));
// The following line displays False when method Divisible
// uses ||, because you cannot divide by 0.
// If method Divisible uses | instead of ||, this line
// causes an exception.
Console.WriteLine("Divisible returns {0}.", Divisible(42, 0));
Wikipedia: C Sharp (programming language)
int foo = 42; // Value type.

It's from The Hitchhiker's Guide to the Galaxy.
In The Hitchhiker's Guide to the Galaxy (published in 1979), the
characters visit the legendary planet Magrathea, home to the
now-collapsed planet-building industry, and meet Slartibartfast, a
planetary coastline designer who was responsible for the fjords of
Norway. Through archival recordings, he relates the story of a race of
hyper-intelligent pan-dimensional beings who built a computer named
Deep Thought to calculate the Answer to the Ultimate Question of Life,
the Universe, and Everything. When the answer was revealed to be 42,
Deep Thought explained that the answer was incomprehensible because
the beings didn't know what they were asking. It went on to predict
that another computer, more powerful than itself would be made and
designed by it to calculate the question for the answer. (Later on,
referencing this, Adams would create the 42 Puzzle, a puzzle which
could be approached in multiple ways, all yielding the answer 42.)

The answer is, as people already have stated, The Hitchhiker's Guide to the Galaxy.
I made a little experiment and put a couple of numbers in the search field, and these are the results:
It seems like 42 beats its neighbors clearly, but it can't touch regular numbers like 40, 45 and 50, no matter how magical it is.
It would be interesting to do the same search in source code only.

Dude!
It's the Answer to the Ultimate Question of Life, the Universe, and Everything! As computed by Deep Thought supercomputer, which took 7.5 million years!
http://en.wikipedia.org/wiki/The_answer_to_life_the_universe_and_everything#Answer_to_the_Ultimate_Question_of_Life.2C_the_Universe_and_Everything_.2842.29

Check this out. 42 is the ultimate answer to the ultimate question of life, the universe and everything.

This is from The Hitchhiker's Guide to the Galaxy and is:
The Answer to the Ultimate Question of Life, the Universe, and Everything
WikiLink

Refer to The Hitchhiker's Guide to the Galaxy.
