Window function after Fast Fourier Transform - fft

Hi this is open for discussion since I am doing a capstone project. So I am working with a Raspberry Pi4 and we have a piezoelectric sensor hooked up and and LCD screen as well. My team is recording voltages after a playing an instrument and we used the guitar for this. So I have data that we recorded. My question is... do i need to perform a fast Fourier transform on that data first and then do a window function or can i just simply use a window function if i want my data to have better noise measurements? Our project is coded in python and I've been looking at python code for FFT and window functions.
Just been looking online sources for this and I am not sure how i can apply this to my application since I've never done this before.

Related

Is there a reality capture parameter to request the desired number of vertices?

In the previous reality capture system users could set a parameter which would determine the resolution of the output models. I want to wind up with models about 100-150K vertices. Is there a setting that allows me to request the modeler to keep the number of generated vertices within some bounds, somewhere in the forge API?
The vertex/triangle decimation is usually, what can be called "subjective" task, which can also explain why there are so many optimization algorithms in the wild.
One type of optimization you would need and expect for "organic" models, and totally different one for an architectural building.
The Reality Capture API provides you only with raw Hi-Res results, avoiding "opionated" optimizations. This should be considered just as a step in automation pipeline.
Another step, would be, upon receiving, to automatically optimize the resulted mesh based on set of settings you need.
One of these steps could be Design Automation for 3ds Max, where you feed a model and using the ProOptimizer Modifier within 3ds Max, you output the mesh with needed detail. A sample of this step, can be found here: https://forge-showroom.autodesk.io/post/prooptimizer.
There are also numerous opensource solutions which should help you cover this post-processing step.

Is there an open source solution for Multiple camera multiple object (people) tracking system?

I have been trying to tackle a problem where I need to track multiple people through multiple camera viewpoints on a real-time basis.
I found a solution DeepCC (https://github.com/daiwc/DeepCC) on DukeMTMC dataset but unfortunately, this solution has been taken down because of data confidentiality issues. They were using Fast R-CNN for object detection, triplet loss for Re-identification and DeepSort for real-time multiple object tracking.
Questions:
1. Can someone share some other resources regarding the same problem?
2. Is there a way to download and still use the DukeMTMC database for multiple tracking problem?
3. Is anyone aware when the official website (http://vision.cs.duke.edu/DukeMTMC/) will be available again?
Please feel free to provide different variations of the question :)
Intel OpenVINO framewors has all part of this task:
Objects detection with pretrained Faster RCNN, SSD or YOLO.
Reidentification models.
And complete demo application.
And you can use another models. Or if you want to use detection on GPU then take opencv_dnn_cuda for detection and OpenVINO for reidentification.
A good deep learning library that I have used in the past for my work is called Mask R-CNN, or Mask Regions-Convolutional Neural-Network. Although I have only used this algorithm on images and not on videos, the same principles apply, and it's very easy to make the transition to detection objects in a video. The algorithm uses Tensorflow and Keras, where you can split your input data, i.e images of people, into two sets, training, and validation.
For training, use a third party software like via, to annotate the people in the images. After the annotations have been drawn, you will export a JSON file with all annotations drawn, which will be used for the training process. Do the same thing for the validation phase, BUT make sure the images in the validation have not been seen before by the algorithm.
Once you have annotated both groups and generated JSON files, you then can start training the algorithm. Mask R-CNN makes it very easy to train, with all you need to do is pass one line full of commands to start it. If you want to train data on your GPU instead of your CPU, then install Nvidia's CUDA, which works very well with supported GPUs, and requires no coding after the installation.
During the training stage, you will be generating weights files, which are stored in the .h5 format. Depending on the number of epochs you choose, there will be a weights file generated per epoch. Once the training has finished, you then will just have to reference that weights file anytime you want to detect relevant objects, i.e. in your video feed.
Some important info:
Mask R-CNN is somewhat of an older algorithm, but it still works flawlessly today. Although some people have updated the algorithm to Tenserflow 2.0+, to get the best use out of it, use the following.
Tensorflow-gpu 1.13.2+
Keras 2.0.0+
CUDA 9.0 to 10.0
Honestly, the hardest part for me in the past was not using the algorithm, but finding the right versions of Tensorflow, Keras, and CUDA, that all play well with each other, and don't error out. Although the above-mentioned versions will work, try and see if you can upgrade or downgrade certain libraries to see if you can get better results.
Article about Mask R-CNN with video, I find it to be very useful and resourceful.
https://www.pyimagesearch.com/2018/11/19/mask-r-cnn-with-opencv/
The GitHub repo can be found below.
https://github.com/matterport/Mask_RCNN
EDIT
You can use this method across multiple cameras, just set up multiple video captures within a computer vision library like OpenCV. I assume this would be done with Python, which both Mask R-CNN and OpenCV are primarly based in.

Deep Neural Network combined with qlearning

I'm using joint positions from a Kinect camera as my state space but I think it's going to be too large (25 joints x 30 per second) to just feed into SARSA or Qlearning.
Right now I'm using the Kinect Gesture Builder program which uses Supervised Learning to associate user movement to specific gestures. But that requires supervised training which I'd like to move away from. I figure the algorithm might pick up certain associations between joints that I would when I classify the data myself (hands up, step left, step right, for example).
I think feeding that data into a deep neural network and then pass that into a reinforcement learning algorithm might give me a better result.
There was a paper on this recently. https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf
I know Accord.net has both deep neural networks and RL but has anyone combined them together? Any insights?
If I understand correctly from your question + comment, what you want is to have an agent that performs discrete actions using a visual input (raw pixels from a camera). This looks exactly like what DeepMind guys recently did, extending the paper you mentioned. Have a look at this. It is the newer (and better) version of playing Atari games. They also provide an official implementation, which you can download here.
There is even an implementation in Neon which works pretty well.
Finally, if you want to use continuous actions, you might be interested in this very recent paper.
To recap: yes, somebody combined DNN + RL, it works and if you want to use raw camera data to train an agent with RL, this is definitely one way to go :)

AVR-Studio how to output?

I do not have experience with micorcontrollers but I have something related to them. Here is and explanation of my issue:
I have an algorithm, and I want to calculate how many cycles my algorithm would cost on a specific avr microcontroller.
To do that I downloaded AVR-STudio 6, and I used the simulator. I succeeded in obtaining the number of cycles for my algorithm. What I wan to know is that how can I make sure that my algorithm is working as it should be. AVR-Studio allows me to debug using the simulator but I am not able to see the output of my algorithm.
To simplify my question, I would like some help in implementing the hello world example in AVR-Studio, that is I want to see "hello world" in the output window, if that is possible.
My question is not how to program the microcontroller, my question is that how could I see the output of a program in AVR-Studio.
Many thanks
As Hanno Binder suggested in his comment:
Atmel Studio still does not provide any means to display debug messages sent by the program simulated. Your only option is to place breakpoints at apropriate locations and then inspect the state of the device in the simulator. For example the locations in RAM where your result is stored, or the registers in which it may reside; maybe have a 'watch' set on a variable or expression.
I think this is the best answer, watch vairables and memory while in debug mode.
Note: turn off optimization when you want to debug for infomation, or some variables will be optimized away.
the best thing to test if algorithms work is to run them in a regular PC program and feed them with data and compare the results with ground trouth.
Clearly to be able to do this a good programming style is neccessary that separates hardware related tasks from the actual data processing. Additionally you have to keep architectural differences in mind (eg: int=16bit vs. int=32bit --> use inttypes.h)

How can I analyze live data from webcam?

I am going to be working on self-chosen project for my college networking class and I just had a couple questions to help get me started in the right direction.
My project will involve creating a new "physical" link over which data, in the form of text, will be transmitted from one computer to another. This link will involve one computer with a webcam that reads a series of flashing colors (black/white) as binary and converts it to text. Each series of flashes will simulate a packet of data. I will be using OSX an the integrated webcam in a Macbook, the flashing computer will either be windows or osx.
So my questions are: which programming languages or API's would be best for reading live webcam data and analyzing the color of a certain area as well as programming and timing the flashes? Also, would I need to worry about matching the flash rate of the "writing" computer and the frame capture rate of the "reading" computer?
Thank you for any help you might be able to provide.
Regarding the frame capture rate, Shannon sampling theorem says that "perfect reconstruction of a signal is possible when the sampling frequency is greater than twice the maximum frequency of the signal being sampled". In other words if your flashing light switches 10 times per second, you need a camera of more than 20fps to properly capture that. So basically check your camera specs, divide by 2, lower the resulting a little and you have your maximum flashing rate.
Whatever can get the frames will work. If the light conditions in which the camera works are gonna be stable, and the position of the light on images is gonna be static then it is gonna be very very easy with checking the average pixel values of a certain area.
If you need additional image processing you should probably also find out about OpenCV (it has bindings to every programming language).
To answer your question about language choice, I would recommend java. The Java Media Framework is great and easy to use. I have used it for capturing video from webcams in the past. Be warned, however, that everyone you ask will recommend a different language - everyone has their preferences!
What are you using as the flashing device? What kind of distance are you trying to achieve? Something worth thinking about is how are you going to get the receiver to recognise where within the captured image to look for the flashes. Some kind of fiducial marker might be necessary. Longer ranges will make this problem harder to resolve.
If you're thinking about shorter ranges, have you considered using a two-dimensional transmitter? (given that you're using a two-dimensional receiver, it makes sense) and maybe have a transmitter that shows a sequence of QR codes (or similar encodings) on a monitor?
You will have to consider some kind of error-correction encoding, such as a hamming code. While encoding would increase the data footprint, it might give you overall better bandwidth given that you can crank up the speed much higher without having to worry about the odd corrupt bit.
Some 'evaluation' type material might include you discussing the obvious security risks in using such a channel - anyone with line of sight to the transmitter can eavesdrop! You could suggest in your writeup using some kind of encryption, a block cipher in CBC would do, but would require a key-exchange prior to transmission, so you could think about public key encryption.