Fast Fourier Transform for frequency analysis in LabVIEW

I'm currently measuring the signal from a 3-axis vibration sensor. I want to convert my signal to its FFT form so I can analyze its frequency content. Does anyone have an idea how to do this in LabVIEW?

There is an FFT VI under Signal Processing >> Transforms on the Functions Palette that should do what you're asking. Probably not a bad place to start.

Check out Signal Processing >> Waveform Measurements as well. This has slightly higher-level functions that will work out all of the magnitudes and frequencies for you. These do require the full development system, though.

Somehow I want to use the DC component from my Arduino ADC and compute its FFT. I tried the configuration shown here: http://i.stack.imgur.com/Yr5FP.png but the FFT output I get barely changes even when the ADC reading is subjected to large voltage changes over time.
Does anyone have a better way to use the FFT to compute the spectrum of the voltage signal from my ADC?

Related

Deep Learning for Acoustic Emission in concrete fracture specimens: onset-time regression and classification of failure type

How can I use deep learning for both regression and classification tasks?
I am facing a problem with acoustic emission from fractures in concrete specimens. The objective is to automatically find the onset time (the instant at which the acoustic emission begins) and the slope up to the peak value in order to determine the kind of fracture (mode I or mode II, based on the rise angle, RA).
I have tried a region-based CNN working on images of the signals (fine-tuning Faster R-CNN using PyTorch), but unfortunately the results are not outstanding so far.
I would like to work with sequences (time series) of amplitude data at a given sampling frequency, but each recording has a different length. How can I deal with this problem?
Can I make a 1D CNN that performs a sort of anomaly detection based on supervised points that I mark manually on the training examples?
I have a certain number of recordings, sampled at 100 Hz, which I would like to use to train the model. In examples of anomaly detection such as Timeseries anomaly detection using an Autoencoder, they use a single time series and slide a window over it one time step at a time to obtain about 3700 training samples for their neural network. Instead, I have multiple recordings (time series), each with its own onset time and a different overall length in seconds. How can I manage this?
I actually need the time instant at which the signal begins and the maximum point to define the rise angle and classify the type of fracture. Can I perform the classification directly with a CNN, simultaneously with the regression of the onset time?
Thank you in advance!
I finally solved it, thanks to the fundamental suggestion by @JonNordby, using a Sound Event Detection method. We adopted and readapted the code from the YashNita GitHub repository.
I labelled the data according to the following image:
Then, I adopted the method of extracting features by computing the spectrogram of the input signals:
And finally we were able to obtain a more precise recognition output for seismic event detection, which is directly connected to acoustic emission event detection, with the following result:
For the moment, only the event recognition phase has been done, but it would be simple to adapt it to also classify mode I versus mode II cracking.

What is the use of task graphs in CUDA 10?

CUDA 10 added runtime API calls for putting streams (= queues) in "capture mode", so that instead of being executed, the enqueued work is recorded into a "graph". These graphs can then be instantiated and actually executed, or they can be cloned.
But what is the rationale behind this feature? Isn't it unlikely to execute the same "graph" twice? After all, even if you do run the "same code", at least the data is different, i.e. the parameters the kernels take likely change. Or - am I missing something?
PS - I skimmed this slide deck, but still didn't get it.
My experience with graphs is indeed that they are not so mutable. You can change the parameters with 'cudaGraphHostNodeSetParams', but in order for the change of parameters to take effect, I had to rebuild the graph executable with 'cudaGraphInstantiate'. This call takes so long that any gain from using graphs is lost (in my case). Setting the parameters only worked for me when I built the graph manually. When obtaining the graph through stream capture, I was not able to set the parameters of the nodes, as you do not have the node pointers. You would think that calling 'cudaGraphGetNodes' on a stream-captured graph would return the nodes, but the node pointer returned was NULL for me even though the 'numNodes' variable had the correct count. The documentation explicitly mentions this as a possibility but fails to explain why.
Task graphs are quite mutable.
There are API calls for changing/setting the parameters of task graph nodes of various kinds, so one can use a task graph as a template: instead of enqueueing the individual nodes before every execution, one changes the parameters of the relevant nodes before every execution (and perhaps not all nodes actually need their parameters changed).
For example, see the documentation for cudaGraphHostNodeGetParams and cudaGraphHostNodeSetParams.
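To make the template idea concrete, here is a minimal sketch using the CUDA 10.x-era API: a graph is built manually with a single kernel node, instantiated once, and then re-launched with updated parameters via cudaGraphExecKernelNodeSetParams (available from CUDA 10.1). The scale kernel and the update loop are made up purely for illustration.

    #include <cuda_runtime.h>

    // Hypothetical kernel, used only for illustration.
    __global__ void scale(float* data, float factor, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] *= factor;
    }

    int main() {
        const int n = 1 << 20;
        float* d_data;
        cudaMalloc(&d_data, n * sizeof(float));

        // Build a graph manually with a single kernel node.
        cudaGraph_t graph;
        cudaGraphCreate(&graph, 0);

        float factor = 2.0f;
        int   count  = n;
        void* kernelArgs[] = { &d_data, &factor, &count };

        cudaKernelNodeParams params = {};
        params.func           = (void*)scale;
        params.gridDim        = dim3((n + 255) / 256);
        params.blockDim       = dim3(256);
        params.sharedMemBytes = 0;
        params.kernelParams   = kernelArgs;
        params.extra          = nullptr;

        cudaGraphNode_t node;
        cudaGraphAddKernelNode(&node, graph, nullptr, 0, &params);

        cudaGraphExec_t graphExec;
        cudaGraphInstantiate(&graphExec, graph, nullptr, nullptr, 0);

        cudaStream_t stream;
        cudaStreamCreate(&stream);

        // Re-launch the same executable graph with updated parameters,
        // without re-instantiating it.
        for (int iter = 0; iter < 10; ++iter) {
            factor = 1.0f + 0.1f * iter;   // kernelArgs still points at 'factor'
            cudaGraphExecKernelNodeSetParams(graphExec, node, &params);
            cudaGraphLaunch(graphExec, stream);
        }
        cudaStreamSynchronize(stream);

        cudaGraphExecDestroy(graphExec);
        cudaGraphDestroy(graph);
        cudaStreamDestroy(stream);
        cudaFree(d_data);
        return 0;
    }

Because the graph was built manually, the node handle is available, so the parameter update does not require re-instantiating the executable graph.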
Another useful feature is concurrent kernel execution. In manual mode, one can add nodes to the graph with dependencies, and the runtime will exploit the available concurrency automatically using multiple streams. The feature itself is not new, but making it automatic is useful for certain applications.
When training a deep learning model, it often happens that you re-run the same set of kernels in the same order but with updated data. Also, I would expect CUDA to perform optimizations by knowing statically which kernels come next. One can imagine that CUDA could prefetch more instructions or adapt its scheduling strategy when it knows the whole graph.
CUDA Graphs tries to solve the problem that, in the presence of too many small kernel invocations, you see quite some time spent on the CPU dispatching work for the GPU (overhead).
It allows you to trade resources (time, memory, etc.) to construct a graph of kernels that you can launch with a single invocation from the CPU instead of doing multiple invocations. If you don't have enough invocations, or if your algorithm is different each time, then it won't be worth it to build a graph.
This works really well for anything iterative that uses the same computation underneath (e.g., algorithms that need to converge to something) and it's pretty prominent in a lot of applications that are great for GPUs (e.g., think of the Jacobi method).
You are not going to see great results if you have an algorithm that you invoke once, or if your kernels are big; in that case the CPU invocation overhead is not your bottleneck. A succinct explanation of when you need it can be found in the Getting Started with CUDA Graphs post.
Where task-graph-based paradigms shine, though, is when you define your program as tasks with dependencies between them. You give a lot of flexibility to the driver/scheduler/hardware to do the scheduling itself without much fine-tuning on the developer's part. There's a reason why we have spent years exploring the ideas of dataflow programming in HPC.
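To make the trade-off concrete, here is a minimal sketch of the capture-once, launch-many pattern described above (assuming a CUDA 10.1-or-later toolkit; the stepA/stepB kernels are placeholders standing in for a pipeline of many short launches):

    #include <cuda_runtime.h>

    // Placeholder kernels representing a pipeline of many short launches.
    __global__ void stepA(float* x, int n) { int i = blockIdx.x*blockDim.x + threadIdx.x; if (i < n) x[i] += 1.0f; }
    __global__ void stepB(float* x, int n) { int i = blockIdx.x*blockDim.x + threadIdx.x; if (i < n) x[i] *= 0.5f; }

    int main() {
        const int n = 1 << 20;
        float* d_x;
        cudaMalloc(&d_x, n * sizeof(float));

        cudaStream_t stream;
        cudaStreamCreate(&stream);

        // Record the sequence of launches once instead of executing it.
        cudaGraph_t graph;
        cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal);
        stepA<<<(n + 255) / 256, 256, 0, stream>>>(d_x, n);
        stepB<<<(n + 255) / 256, 256, 0, stream>>>(d_x, n);
        cudaStreamEndCapture(stream, &graph);

        cudaGraphExec_t graphExec;
        cudaGraphInstantiate(&graphExec, graph, nullptr, nullptr, 0);

        // Pay the CPU dispatch overhead once per iteration instead of once per kernel.
        for (int iter = 0; iter < 1000; ++iter)
            cudaGraphLaunch(graphExec, stream);
        cudaStreamSynchronize(stream);

        cudaGraphExecDestroy(graphExec);
        cudaGraphDestroy(graph);
        cudaStreamDestroy(stream);
        cudaFree(d_x);
        return 0;
    }

This is exactly the iterative-convergence case mentioned above: the launch sequence is fixed, the data it operates on changes in device memory between launches.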

Does thrust::device_vector.push_back() cause a call to memcpy?

Summary
I'd like some clarification on how the thrust::device_vector works.
AFAIK, writing to an indexed location such as device_vector[i] = 7 is implemented by the host, and therefore causes a call to memcpy. Does device_vector.push_back(7) also call memcpy?
Background
I'm working on a project comparing stock prices. The prices are stored in two vectors. I iterate over the two vectors, and when there's a change in their prices relative to each other, I write that change into a new vector. So I never know how long the resulting vector is going to be. On the CPU the natural way to do this is with push_back, but I don't want to use push_back on the GPU vector if it's going to call memcpy every time.
Is there a more efficient way to build a vector piece by piece on the GPU?
Research
I've looked at this question, but it (and others) are focused on the most efficient way to access elements from the host. I want to build up a vector on the GPU.
Thank you.
Does device_vector.push_back(7) also call memcpy?
No. It does, however, result in a kernel launch per call.
Is there a more efficient way to build a vector piece by piece on the GPU?
Yes.
Build it (or large segments of it) in host memory first, then copy or insert it into device memory in a single operation. You will greatly reduce latency and increase PCIe bus utilization by doing so.
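A minimal sketch of that suggestion, assuming the comparison itself can stay on the host; the vector names and the change rule below are made up for illustration:

    #include <thrust/host_vector.h>
    #include <thrust/device_vector.h>

    int main() {
        // Hypothetical input: two price series already on the host.
        thrust::host_vector<float> prices_a(100000, 10.0f);
        thrust::host_vector<float> prices_b(100000, 10.0f);

        // Build the result incrementally on the host, where push_back is cheap.
        thrust::host_vector<float> changes;
        for (size_t i = 1; i < prices_a.size(); ++i) {
            float rel_prev = prices_a[i - 1] - prices_b[i - 1];
            float rel_curr = prices_a[i]     - prices_b[i];
            if (rel_curr != rel_prev)
                changes.push_back(rel_curr - rel_prev);
        }

        // One bulk host-to-device transfer instead of one per element.
        thrust::device_vector<float> d_changes = changes;

        // ... continue with device-side processing of d_changes ...
        return 0;
    }

The single assignment at the end performs one contiguous host-to-device copy, regardless of how many elements were pushed on the host.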

CUDA FFT plan reuse across multiple 'overlapped' CUDA Stream launches

I'm trying to improve the performance of my code using asynchronous memory transfers overlapped with GPU computation.
Formerly I had code where I created an FFT plan and then made use of it multiple times. In that situation the time invested in creating the CUDA FFT plan is negligible, although according to this earlier post it could be quite significant.
Now that I've moved to streams, what I'm doing is creating the "same" plan "multiple times" and then setting the CUDA FFT stream. According to the answers given by some of you in this other post, this is wasteful. But is there any other way to do it?
NOTE: I'm acquiring the data in real time, so launching a "batch" CUDA FFT is out of the question. What I'm doing is creating and launching a new CUDA stream as a result of each complete pulse transmission.
NOTE 2: I was also considering using a "pool" of "CUDA streams/FFT plans" instead, but I don't think that would be an elegant, sensible solution. Any thoughts?
Is there otherwise a way to "copy" an "existing" FFT plan before I assign the CUDA stream?
Thanks, guys and gals! Hopefully I'll meet some of you in San Jose. =)
Omar
What I'm doing is creating and launching a new CUDA stream as a result of each complete pulse transmission.
Re-use the streams rather than creating a new stream each time. Then you can re-use the plan created for that stream ahead of time, and you have no need to recreate the "same" plan on the fly.
Perhaps this is what you mean by the pool-of-streams method. Your criticism is that it is not "elegant" or "sensible"; I have no idea what that means. Stream re-use in pipelined algorithms is a common tactic, if for no other reason than to avoid the cudaStreamCreate overhead (whatever it may be, large or small).
A cuFFT plan has a stream associated with it. You cannot copy a plan without the stream association; a plan is an opaque container.
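As a rough sketch of the reuse idea (the pool size, FFT length, and round-robin assignment below are illustrative assumptions, not part of the original question): create each stream and its plan once, up front, and then cycle through the slots as pulses arrive.

    #include <cufft.h>
    #include <cuda_runtime.h>

    // Hypothetical sizes, chosen only for illustration.
    #define NUM_STREAMS 4
    #define FFT_SIZE    4096

    int main() {
        cudaStream_t  streams[NUM_STREAMS];
        cufftHandle   plans[NUM_STREAMS];
        cufftComplex* d_data[NUM_STREAMS];

        // Create the pool once, ahead of time: one stream and one plan per slot.
        for (int i = 0; i < NUM_STREAMS; ++i) {
            cudaStreamCreate(&streams[i]);
            cufftPlan1d(&plans[i], FFT_SIZE, CUFFT_C2C, 1);
            cufftSetStream(plans[i], streams[i]);
            cudaMalloc(&d_data[i], FFT_SIZE * sizeof(cufftComplex));
        }

        // For each incoming pulse, pick the next slot round-robin and reuse
        // both its stream and its plan; nothing is created on the fly.
        for (int pulse = 0; pulse < 100; ++pulse) {
            int s = pulse % NUM_STREAMS;
            // cudaMemcpyAsync(d_data[s], <new pulse data>, ..., streams[s]);
            cufftExecC2C(plans[s], d_data[s], d_data[s], CUFFT_FORWARD);
            // ... further processing / async copy back on streams[s] ...
        }

        for (int i = 0; i < NUM_STREAMS; ++i) {
            cudaStreamSynchronize(streams[i]);
            cufftDestroy(plans[i]);
            cudaFree(d_data[i]);
            cudaStreamDestroy(streams[i]);
        }
        return 0;
    }

The per-pulse cost is then just the async copy and the exec call; plan creation and stream creation are paid once at startup.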

Specifying number of samples for a custom xaudio2 effect

I'm trying to write a custom XAudio2 effect that involves a Fourier transform. However, the number of samples given to the process method on each call is not a power of 2 (a precondition of the Fourier transform implementation I have).
Is there a way to force power-of-2-sized sample blocks? Is there a technique for working with non-power-of-2 sizes?
Don't send samples to the FFT on every call in which you are given samples. Buffer (save) them up until you have at least a power-of-two's worth of samples, then process a power-of-two number of samples from your intermediate buffer. Rinse and repeat.
Also, newer FFTs will often allow sizes with prime factors larger than 2.
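A minimal sketch of that buffering approach in plain C++ (FftBuffer and runFFT are hypothetical names for illustration, not XAudio2 APIs):

    #include <vector>
    #include <cstddef>

    // Stand-in for whatever FFT routine the effect actually uses (hypothetical).
    void runFFT(const float* /*samples*/, std::size_t /*count*/) {
        // ... transform exactly 'count' samples ...
    }

    class FftBuffer {
    public:
        explicit FftBuffer(std::size_t fftSize) : fftSize_(fftSize) {}

        // Called from the effect's process method with however many samples
        // the host happens to deliver this time.
        void push(const float* samples, std::size_t count) {
            pending_.insert(pending_.end(), samples, samples + count);
            while (pending_.size() >= fftSize_) {
                runFFT(pending_.data(), fftSize_);
                pending_.erase(pending_.begin(), pending_.begin() + fftSize_);
            }
        }

    private:
        std::size_t fftSize_;          // power of two, e.g. 1024
        std::vector<float> pending_;   // samples saved between calls
    };

In a real effect you would likely use a ring buffer instead of erasing from the front of a vector, and you still need to decide what audio to output while samples are being accumulated (typically you accept a fixed latency of one FFT block).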
If your implementation requires that you have a power of 2 sample size, then you can pad the sample to force it to accept. Zero padding seems to be the easiest/most straight forward.
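For the zero-padding route, a small sketch of rounding a block up to the next power of two before transforming it (the helper name is made up):

    #include <vector>
    #include <algorithm>
    #include <cstddef>

    // Round the block size up to the next power of two and fill the tail with zeros.
    std::vector<float> zeroPadToPowerOfTwo(const float* samples, std::size_t count) {
        std::size_t padded = 1;
        while (padded < count) padded <<= 1;          // next power of two >= count
        std::vector<float> block(padded, 0.0f);       // tail stays zero
        std::copy(samples, samples + count, block.begin());
        return block;
    }

Keep in mind that zero padding changes the effective window length, which affects the frequency resolution and leakage of the resulting spectrum.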
Here is an article that explains another way to do it:
The Chirp z-Transform Algorithm and Its Application