Inferring depth from a front-facing camera using Deep Reinforcement Learning, ConvNets and RNNs

For a personal project, I thought about building a Pi Car that drives forward in a loop between the living room and the kitchen and is able to steer itself through hallways and avoid collisions.
I was able to create this PoC using Behavioral Cloning. I manually drove the RC car along a black line on the floor while it recorded images.
I then ran the images through a ConvNet and used the model to predict the left and right motor controls. It worked, but left a lot to be desired.
Now I would like to repeat this PoC without manual training. A sonar sensor or LiDAR would work for collision avoidance, but I am hoping to learn more about computer vision.
The approach I have in mind is:
1) have the car in continuous forward motion while it records images
2) feed the images into a ConvNet to learn features like how close an object is
3) feed the output of the ConvNet into an RNN
4) the RNN will guide a reinforcement policy
5) the policy is simple: anything blocking your forward motion should be avoided
This is loosely based on this work at Samsung and UC. My thinking is that the features learned by the CNN will be used by the RNN, as a time series, to learn how close an object is.
Think of the car moving closer and closer to the couch: the features of the couch will change, and depth can hopefully be inferred from that change. Blocked forward motion would mean objects are getting closer.
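To make the pipeline concrete, here is a minimal sketch of how steps 2-4 above could fit together. It assumes PyTorch; the layer sizes, the 8-frame clip length, and the three-action output (left / straight / right) are placeholders for illustration, not the actual project code.

    # Steps 2-4 as one module: ConvNet per frame -> GRU over time -> action logits.
    import torch
    import torch.nn as nn

    class DepthPolicyNet(nn.Module):
        def __init__(self, hidden_size=128, num_actions=3):
            super().__init__()
            # Step 2: ConvNet extracts per-frame features (e.g. how large/close the couch looks).
            self.encoder = nn.Sequential(
                nn.Conv2d(3, 16, kernel_size=5, stride=2), nn.ReLU(),
                nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
                nn.AdaptiveAvgPool2d((4, 4)), nn.Flatten(),
            )
            # Step 3: RNN integrates the features over time, so the frame-to-frame
            # change in appearance can stand in for depth.
            self.rnn = nn.GRU(32 * 4 * 4, hidden_size, batch_first=True)
            # Step 4: policy head scores the discrete motor actions.
            self.policy = nn.Linear(hidden_size, num_actions)

        def forward(self, frames):              # frames: (batch, time, 3, H, W)
            b, t = frames.shape[:2]
            feats = self.encoder(frames.flatten(0, 1)).reshape(b, t, -1)
            out, _ = self.rnn(feats)
            return self.policy(out[:, -1])      # action logits for the latest frame

    logits = DepthPolicyNet()(torch.randn(1, 8, 3, 96, 128))  # one 8-frame clip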
One of the issues right now is: how can I reflect being blocked by objects in the policy outside of a simulator?
In a ROS simulator it would be easy, since Gazebo gives you x, y coordinates and I could define a rate of change that distinguishes good forward motion from being blocked.
But how can I do this on a physical robot that has no localization?
Also, I am not enrolled in any classes and have been following free online material for the last year for all of this. Any critique, feedback, and discussion is very welcome!

Related

How can I track in real-time small fast-moving stones that are free-falling?

I am looking for the fastest real-time object trackers for tracking small, fast-moving stones that are free-falling vertically, given that there can be up to 50 objects in a single frame and their shapes are very similar.
I have trained a YOLOv5 object detection model on stones and the inference speed is pretty good (120 FPS), but when I pass the .pt weights file to the DeepSort algorithm for object tracking and test it on a normal-speed video, it does not track my objects at all. However, when I slowed the video down to 0.25x speed and re-tested DeepSort, it worked, but it was not able to associate stones and differentiate well between them (one ID is given to multiple objects).
Note: for the deep (appearance) part of DeepSort, I am using the weights pre-trained on pedestrians.
Is there any solution to:
1- Make the model work on the normal-speed video without having to slow it down?
2- Solve the problem of ID switching and ID repeating?
3- Should I re-train the deep part of DeepSort on my dataset of stones, or can I use the pre-trained weights?
Any help of any kind will be very appreciated :)
1- Make the model work on the normal-speed video without having to slow it down?
Most of the GitHub repos that implement DeepSort perform the tracking offline. That is, once the object detection + association process is done for a certain frame, it takes the next, and so on, until the video is finished. So the FPS of your video shouldn't affect your tracking results, as the only thing that changes when you slow the video down is the presentation timestamp (PTS) of each video frame.
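To illustrate the offline point: a typical loop just decodes frames one by one, so the wall-clock playback speed never enters it. The file name and the commented-out calls below are placeholders, not the actual YOLOv5/DeepSort API.

    # Frame-by-frame processing with OpenCV: only decoded frames matter, not PTS.
    import cv2

    cap = cv2.VideoCapture("stones.mp4")              # placeholder path
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # detections = detector(frame)                # hypothetical YOLOv5 call
        # tracks = tracker.update(detections, frame)  # hypothetical DeepSort call
    cap.release()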
2- Solve the problem of ID switching and ID repeating?
Most of the DeepSort implementations on GitHub (https://github.com/nwojke/deep_sort, https://github.com/ZQPei/deep_sort_pytorch)
have not implemented the lambda weighting as per Eq. (5) in the paper. This implies that the positions of the objects are not taken into consideration when performing the ID association. In your case this is a waste of information, especially as the stones are falling and their movement is easily predictable.
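For reference, Eq. (5) in the DeepSort paper blends the two distances as c(i, j) = lambda * d1(i, j) + (1 - lambda) * d2(i, j), where d1 is the motion (Mahalanobis) term and d2 the appearance term. Here is a tiny sketch of that weighting, with random placeholder cost matrices rather than the real tracker internals:

    # Blend motion and appearance costs before the assignment step (the Eq. 5 idea).
    import numpy as np
    from scipy.optimize import linear_sum_assignment

    num_tracks, num_detections = 4, 5
    d_motion = np.random.rand(num_tracks, num_detections)      # d1: position/motion term
    d_appearance = np.random.rand(num_tracks, num_detections)  # d2: ReID feature distance

    lam = 0.7                                                  # lambda > 0 brings position back in
    cost = lam * d_motion + (1.0 - lam) * d_appearance
    track_idx, det_idx = linear_sum_assignment(cost)           # Hungarian matching
    print(list(zip(track_idx, det_idx)))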
3- Should I re-train the deep part of DeepSort on my dataset of stones, or can I use the pre-trained weights?
Visually, your stones most likely look very similar. This means that training a custom ReID model on stones would have very little effect on your final tracking results. Hence, in your specific case, it is more important that the stones' positions are taken into consideration when performing the ID association, so we are back at the previous point.
Here is a repo that implements most of what you need (https://github.com/mikel-brostrom/Yolov5_DeepSort_Pytorch)
Start with computer vision basics before powering up your YOLOv5 model. Have you heard about the atmospheric turbulence model? You can read about it here, or just check Chapter 5 (Image Restoration and Reconstruction) of Digital Image Processing, 3rd Edition, by Rafael Gonzalez.
Perhaps this paper will help you understand more about fast-moving objects: https://openaccess.thecvf.com/content/CVPR2021/html/Rozumnyi_DeFMO_Deblurring_and_Shape_Recovery_of_Fast_Moving_Objects_CVPR_2021_paper.html
Good luck and enjoy!

How to create a CNN model for image recognition with TensorFlow to compare with Inception v3

I'm studying image recognition with TensorFlow. I have already read the topic How to Retrain Inception's Layer for New Categories on tensorflow.org, which uses the Inception v3 model.
Now I would like to create my own CNN model in order to compare it with Inception v3, but I don't know where to begin.
Does anyone know of a step-by-step guide for this problem?
I'd appreciate any suggestions.
Thanks in advance.
First baby steps
The gold standard for getting started in image recognition is processing MNIST images. TensorFlow has a great tutorial on how to get started and also how to move to convolutional networks.
From there it is a long, hard road to compete with Inception without just copying someone else's graph. You'll probably want to get a feel for what the different layers of convolution do. I created a basic TensorFlow tutorial which contains an example Python file that demos different convolution graphs and their resulting accuracy.
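If it helps, here is roughly what the MNIST starting point looks like in today's TensorFlow (tf.keras). The architecture is a toy example for getting a feel for convolutions, not the tutorial's exact graph:

    # Minimal MNIST convnet sketch (TensorFlow 2 / tf.keras assumed).
    import tensorflow as tf

    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
    x_train = x_train[..., None] / 255.0     # shape (60000, 28, 28, 1), scaled to [0, 1]
    x_test = x_test[..., None] / 255.0

    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(28, 28, 1)),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(64, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(x_train, y_train, epochs=3, validation_data=(x_test, y_test))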
Going deeper
After conquering MNIST you'll need a lot of images (you can get them from ImageNet), a lot of GPU time (to run all your training), and a software setup so that you can not only run and test your model, but also dozens (if not hundreds) of variations to explore your hyperparameters (like learning rate, convolution size, dropout, etc.). Remember, it took a team of leading-edge machine learning experts to create something like Inception, many, many months (possibly years) of iteration to find the model they use today, and thousands of CPU/GPU hours.
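As a rough idea of the "many variations" part, the sweep itself is just a loop over a grid; build_model and train_and_eval below are hypothetical stand-ins for your own training code:

    # Toy hyperparameter grid sweep (the helper functions are hypothetical).
    import itertools

    grid = itertools.product([1e-2, 1e-3, 1e-4],   # learning rates
                             [3, 5],               # convolution kernel sizes
                             [0.0, 0.25, 0.5])     # dropout rates
    results = {}
    for lr, kernel, dropout in grid:
        # model = build_model(kernel_size=kernel, dropout=dropout)
        # results[(lr, kernel, dropout)] = train_and_eval(model, learning_rate=lr)
        pass
    best = max(results, key=results.get) if results else None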
If you are trying to understand what is going on and what makes a good graph, then trying to recreate Inception is a great idea. If you just want an excellent Image recognition model, then reuse an existing one.
If you are trying to have fun, just do it!
Cheers-

Deep neural network combined with Q-learning

I'm using joint positions from a Kinect camera as my state space, but I think it's going to be too large (25 joints × 30 frames per second) to just feed into SARSA or Q-learning.
Right now I'm using the Kinect Gesture Builder program, which uses supervised learning to associate user movement with specific gestures. But that requires supervised training, which I'd like to move away from. I figure the algorithm might pick up certain associations between joints, as I do when I classify the data myself (hands up, step left, step right, for example).
I think feeding that data into a deep neural network and then passing that into a reinforcement learning algorithm might give me a better result.
There was a paper on this recently. https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf
I know Accord.NET has both deep neural networks and RL, but has anyone combined them? Any insights?
If I understand correctly from your question + comment, what you want is an agent that performs discrete actions using visual input (raw pixels from a camera). This looks exactly like what the DeepMind guys recently did, extending the paper you mentioned. Have a look at this. It is the newer (and better) version of playing Atari games. They also provide an official implementation, which you can download here.
There is even an implementation in Neon which works pretty well.
Finally, if you want to use continuous actions, you might be interested in this very recent paper.
To recap: yes, people have combined DNN + RL, it works, and if you want to use raw camera data to train an agent with RL, this is definitely one way to go :)
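For a feel of what "DNN + RL" amounts to in code, here is a bare-bones sketch of the DQN recipe (PyTorch assumed; the 4-dimensional state, two actions, and buffer sizes are placeholders, and the real thing adds a separate target network and convolutional layers over raw pixels):

    # Q-network + experience replay + epsilon-greedy: the core DQN loop.
    import random
    from collections import deque
    import torch
    import torch.nn as nn

    q_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))  # Q(s, .)
    optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
    replay = deque(maxlen=10_000)            # experience replay buffer
    gamma, epsilon = 0.99, 0.1

    def act(state):
        # epsilon-greedy: explore sometimes, otherwise take argmax_a Q(s, a)
        if random.random() < epsilon:
            return random.randrange(2)
        with torch.no_grad():
            return int(q_net(torch.as_tensor(state, dtype=torch.float32)).argmax())

    def train_step(batch_size=32):
        # regress Q(s, a) toward r + gamma * max_a' Q(s', a') on a sampled minibatch
        if len(replay) < batch_size:
            return
        s, a, r, s2, done = (torch.as_tensor(x, dtype=torch.float32)
                             for x in zip(*random.sample(replay, batch_size)))
        q = q_net(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
        with torch.no_grad():
            target = r + gamma * q_net(s2).max(1).values * (1.0 - done)
        loss = nn.functional.mse_loss(q, target)
        optimizer.zero_grad(); loss.backward(); optimizer.step()

    # during interaction: replay.append((state, action, reward, next_state, float(done)))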

Order-issuing neural network?

I'm interested in writing software that uses machine learning and performs certain actions based on external data.
However, I've run into a problem (one that has always interested me):
how is it possible to write machine learning software that issues orders or sequences of orders?
The problem is that, as I understand it, a neural network gets a bunch of inputs and "recalls" an output based on the results of previous training. Instantly (well, more or less). So I'm not sure how "issuing orders" could fit into that system, especially when the actions performed by the system affect the system with a certain delay. I'm also a bit unsure how it is possible to train this thing.
Examples of such system:
1. First-person shooter enemy controller. As I understand it, it is possible to implement a neural network controller for a bot that will switch between behavior strategies (well, assign priorities to them) based on some inputs (probably something like health, ammo, etc.). But I don't see a way to make a higher-order controller that could issue a sequence of commands like "go there, then turn left". Also, the bot's actions will affect the variables that control its behavior, e.g. shooting reduces ammo, falling from heights reduces health, etc.
2. Automated market trader. It is certainly possible to make a system that will try to predict the next market price of something. However, I don't see how it is possible to make a system that would issue an order to buy something, watch the trend, then sell it back to gain profit / cover losses.
3. Car driver. Again, (as I understand it) it is possible to make a system that will maintain a desired movement vector based on position/velocity/torque data and the results of previous training. However, I don't see a way to make such a system (learn to) perform a sequence of actions.
That is, as I understand it, a neural net is technically a matrix: you give it input, it produces output. But what about generating sequences of actions that could change the environment the program operates in?
If such tasks are not entirely suitable for neural networks, what else could be used?
P.S. I understand that the question isn't exactly clear, and I suspect that I'm missing some knowledge, so I'd appreciate some pointers (books/resources to read, etc.).
You could try to connect the output neurons to controllers directly, e.g. moving forward, turning, or shooting in the first-person shooter, or buy orders for the trader. However, I think that the best results nowadays are gained when you let the neural net solve one rather specific subproblem, and then let a "normal" program interpret its answer. For example, you could let the neural net construct a map overlay of "where do I want to be", which the bot then translates into movements. The neural network for the trader could produce a "how much do I want which paper" score, which the bot then translates into buying or selling orders.
The decision of which subproblem should be solved by a neural network is a very central one for its design. The important thing is that good solutions can be taught to the neural network.
Edit: Expanding on this in the examples: when the first-person shooter bot gets shot, it should not have wanted to be there; when it gets to shoot someone else, it should have wanted to be there more. When the trader loses money on a paper, it should have wanted it less beforehand; if it gains, it should have wanted it more. These things can be taught.
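As a concrete (and entirely made-up) illustration of that split for the trader example: the net only scores "how much do I want each paper", and ordinary code turns the scores into orders. desirability_net and the thresholds below are hypothetical stand-ins:

    # The net scores a subproblem; plain code issues the actual orders.
    import numpy as np

    def desirability_net(features):
        # stand-in for a trained network: one score in [0, 1] per paper
        w = np.random.randn(features.shape[1])
        return 1.0 / (1.0 + np.exp(-features @ w))

    def issue_orders(desirability, holdings, buy_at=0.7, sell_at=0.3):
        orders = []
        for i, (want, held) in enumerate(zip(desirability, holdings)):
            if want > buy_at and not held:
                orders.append(("BUY", i))
            elif want < sell_at and held:
                orders.append(("SELL", i))
        return orders

    features = np.random.rand(5, 8)          # 5 papers, 8 market features each
    print(issue_orders(desirability_net(features), holdings=[0, 1, 0, 0, 1]))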
The problem you are describing is known as Reinforcement Learning. Reinforcement learning is essentially a machine learning algorithm (such as a neural network) coupled with a controller. It has been used for all of the applications you mention, even to drive real cars.

Solar system computer model

I'm interested in building a 3D model of our solar system for web use (probably with AS3 and Papervision) and have been looking into how I would go about encoding the planetary positions. My idea was to download the already-calculated positions from NASA, as calculating the positions myself seems a bit overcomplicated. I'm not sure, though, whether I should use a heliocentric or a geocentric encoding.
I wanted to know if there is anyone with experience in this. Which approach would be better? The NASA JPL website seems to have the positions of all the major bodies in our solar system as geocentric. I can see this becoming a problem later on, though, when adding Voyager and Mars lander missions to the model.
Any feedback, comments and links are very welcome.
EDIT: I have a rough model running that uses heliocentric coordinates, but I haven't been able to find the coordinates for all planets in this format.
UPDATE:
I don't have a lot of detail to provide for now because I really don't know what I'm doing (from the space point of view). I wanted to get a handle on 3D programming, and I am interested in space. The idea was that I would make a rough solar system simulator with, at first, all the planets and their orbiters (maybe excluding satellites at first). Perhaps include a news aggregator and some links to news/resources and so on. The general idea would be to allow people to click around and get super excited about going to the Moon and Mars (for starters).
In the long run I would hopefully be able to add in satellites and the Moon missions (scroll back in time to the '70s and see the Moon missions).
So, to answer Arrieta's question: the idea was not to calculate eclipses but to build an easy-to-approach, interactive space exploratorium, and to learn some 3D and space-related stuff along the way.
Glad you want to build your own simulator, but depending on what you want to do it may be far from an easy task. The simplest approach is as follows:
1. Download the JPL DE405 ephemerides and the subroutines for retrieving the planetary positions (with respect to the Solar System Barycenter).
2. For the requested timespan, compute the positions and display them on screen in a visually appealing manner.
3. Done.
Now, why would you want to do this? If you want to view the planets' orbits, that's it, you are done. If you want to compute geometric events (like eclipses, line of sight, or illumination), then you are in a whole different ball game. That's astronautics, and it is not simple.
Please be more specific. The distinction you make between "geocentric" and "heliocentric" coordinates really involves no major difficulty. If you have all the states in the heliocentric frame, you can compute the geocentric frame by simple vector subtraction. That's not the problem! There are a thousand other problems, but you need to be specific so we can provide more guidance.
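For the vector-subtraction point, here is a minimal sketch using astropy (my assumption, since no toolkit was specified; astropy ships approximate built-in ephemerides and can also load the JPL DE kernels):

    # Barycentric states, then geocentric/heliocentric by simple subtraction.
    from astropy.time import Time
    from astropy.coordinates import get_body_barycentric

    t = Time("2024-01-01T00:00:00")
    mars = get_body_barycentric("mars", t)    # position wrt the Solar System Barycenter
    earth = get_body_barycentric("earth", t)
    sun = get_body_barycentric("sun", t)

    mars_geocentric = mars - earth            # geocentric frame: just subtract vectors
    mars_heliocentric = mars - sun            # heliocentric frame likewise
    print(mars_geocentric.xyz.to("au"), mars_heliocentric.xyz.to("au"))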
JPL has provided high-quality ephemerides for decades now, and we have a full team of brilliant people working on them. It is one of the most difficult things to get right!
Again, provide more details or check out other sources of information.
Please google "Solar System Simulator" (done here, at JPL) and see if it fulfills your needs.
Cheers.
It may be worth checking out the ASCOM Platform (we also have a Stack Exchange site called ASCOM Answers).
The ASCOM Platform has several useful libraries for doing this sort of thing.
USNO NOVAS (Naval Observatory Vector Astrometry)
Kepler orbit engine
The USNO/NOVAS stuff was originally written in C and we've wrapped it up in .NET for ease of use from C# and VB.
As an added bonus (actually it's the raison d'être for ASCOM), the Platform makes it easy for you to control things like telescopes; it's used by Microsoft's World Wide Telescope for exactly that purpose. It might be a fun extension to your model to be able to point a telescope at things.
I'd probably start (well, I did a while back) with heliocentric coordinates and get a few of the planets up and running. But sooner or later you'll want to write a heliocentric-to-geocentric coordinate conversion routine, and its inverse. For some bodies, such as artificial satellites, the geocentric coordinates will be easier to deal with.
You can use the astro-phys API to get JSON-formatted state vectors for all the planets. It calculates them using JPL's DE406, so it's pretty accurate, and it uses the solar system barycenter.
However, if you know where the Sun is relative to the Earth and you're in a geocentric model, you can subtract the position of the Sun from all of the bodies (including the Earth) to get heliocentric coordinates.