Can you please recommend papers/GitHub repos or similar about object detection on RGB-D images (NOT 3D point clouds)? The result should still be objects in rectangles in the 2D image, as in the usual object detection methods like YOLO and others. All I can find is salient object detection methods, but that doesn't seem to be what I'm looking for.
Here is a good primer to start your research.
https://arxiv.org/abs/1907.09236
There are many open-source implementations. Try some and ask on Stack Overflow when you get stuck.
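One common way to keep the output as ordinary 2D boxes is "early fusion": stack the depth map as a fourth input channel and adapt a standard 2D detector. This is an assumption on my part rather than something taken from the paper above; a rough sketch with torchvision's Faster R-CNN (the channel surgery and the 4th normalisation constants are illustrative):

    import torch
    import torchvision

    # Standard 2D detector; num_classes = background + 1 object class (example value).
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(num_classes=2)

    # Replace the first conv so the backbone accepts RGB-D (4 channels) instead of RGB (3).
    old = model.backbone.body.conv1
    model.backbone.body.conv1 = torch.nn.Conv2d(
        4, old.out_channels, kernel_size=old.kernel_size,
        stride=old.stride, padding=old.padding, bias=False)

    # The internal normalisation now needs 4 means/stds (the 4th values are guesses).
    model.transform.image_mean = [0.485, 0.456, 0.406, 0.5]
    model.transform.image_std = [0.229, 0.224, 0.225, 0.25]

    model.eval()
    rgbd = torch.rand(4, 480, 640)       # RGB stacked with a normalized depth channel
    with torch.no_grad():
        out = model([rgbd])[0]           # dict with 'boxes', 'labels', 'scores'
    print(out["boxes"].shape)            # 2D rectangles, just like a normal detector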
I have been trying to tackle a problem where I need to track multiple people through multiple camera viewpoints in real time.
I found a solution, DeepCC (https://github.com/daiwc/DeepCC), built on the DukeMTMC dataset, but unfortunately it has been taken down because of data confidentiality issues. It used Fast R-CNN for object detection, triplet loss for re-identification, and DeepSort for real-time multiple object tracking.
Questions:
1. Can someone share some other resources regarding the same problem?
2. Is there a way to download and still use the DukeMTMC dataset for the multiple-object tracking problem?
3. Is anyone aware when the official website (http://vision.cs.duke.edu/DukeMTMC/) will be available again?
Please feel free to provide different variations of the question :)
The Intel OpenVINO framework has all the parts of this task:
Object detection with pretrained Faster R-CNN, SSD, or YOLO models.
Re-identification models.
A complete demo application.
You can also use other models. If you want to run detection on the GPU, use opencv_dnn_cuda for detection and OpenVINO for re-identification.
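As a rough illustration of the detection half, the sketch below runs a YOLO model through OpenCV's DNN module (the model file names are placeholders, and this is just one convenient wrapper, not necessarily what the OpenVINO demos use):

    import cv2

    # Placeholder model files - use whichever detector you actually download.
    net = cv2.dnn.readNetFromDarknet("yolov4.cfg", "yolov4.weights")

    # Optional: only works if your OpenCV build has CUDA support (opencv_dnn_cuda).
    net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
    net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)

    model = cv2.dnn_DetectionModel(net)
    model.setInputParams(size=(416, 416), scale=1.0 / 255, swapRB=True)

    frame = cv2.imread("frame.jpg")
    class_ids, scores, boxes = model.detect(frame, confThreshold=0.5, nmsThreshold=0.4)
    for (x, y, w, h), score in zip(boxes, scores):
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imwrite("detections.jpg", frame)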
A good deep learning library that I have used in the past for my work is called Mask R-CNN, or Mask Regions-Convolutional Neural Network. Although I have only used this algorithm on images and not on videos, the same principles apply, and it's very easy to make the transition to detecting objects in a video. The implementation uses TensorFlow and Keras, and you split your input data, i.e. images of people, into two sets: training and validation.
For training, use a third-party annotation tool like VIA (the VGG Image Annotator) to annotate the people in the images. After the annotations have been drawn, you export a JSON file with all of the annotations, which will be used for the training process. Do the same thing for the validation phase, BUT make sure the images in the validation set have not been seen before by the algorithm.
Once you have annotated both groups and generated the JSON files, you can start training the algorithm. Mask R-CNN makes it very easy to train; all you need to do is run a single command to start it. If you want to train on your GPU instead of your CPU, install Nvidia's CUDA, which works very well with supported GPUs and requires no coding after the installation.
During the training stage, you will be generating weights files, which are stored in the .h5 format. Depending on the number of epochs you choose, there will be one weights file generated per epoch. Once the training has finished, you will just have to reference that weights file any time you want to detect the relevant objects, e.g. in your video feed.
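For reference, loading a trained weights file and running detection with the Matterport implementation looks roughly like this (the config values and the .h5 file name are placeholders for whatever your own training produced):

    from mrcnn.config import Config
    from mrcnn import model as modellib
    import skimage.io

    class InferenceConfig(Config):
        NAME = "people"          # placeholder, match the name you trained with
        NUM_CLASSES = 1 + 1      # background + person
        GPU_COUNT = 1
        IMAGES_PER_GPU = 1

    config = InferenceConfig()
    model = modellib.MaskRCNN(mode="inference", config=config, model_dir="logs")
    model.load_weights("mask_rcnn_people_0030.h5", by_name=True)  # your trained weights

    image = skimage.io.imread("frame.jpg")
    r = model.detect([image], verbose=0)[0]
    # r["rois"] holds the bounding boxes, r["masks"] the per-instance masks,
    # r["class_ids"] and r["scores"] the labels and confidences.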
Some important info:
Mask R-CNN is somewhat of an older algorithm, but it still works flawlessly today. Although some people have updated the implementation to TensorFlow 2.0+, to get the best use out of it, use the following versions:
Tensorflow-gpu 1.13.2+
Keras 2.0.0+
CUDA 9.0 to 10.0
Honestly, the hardest part for me in the past was not using the algorithm, but finding the right versions of TensorFlow, Keras, and CUDA that all play well with each other and don't error out. Although the above-mentioned versions will work, try and see if you can upgrade or downgrade certain libraries to get better results.
Here is an article about Mask R-CNN with video; I found it very useful and resourceful.
https://www.pyimagesearch.com/2018/11/19/mask-r-cnn-with-opencv/
The GitHub repo can be found below.
https://github.com/matterport/Mask_RCNN
EDIT
You can use this method across multiple cameras; just set up multiple video captures within a computer vision library like OpenCV, as in the sketch below. I assume this would be done with Python, which both Mask R-CNN and OpenCV are primarily used from.
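A minimal sketch of reading several cameras in one loop (the camera indices are placeholders; plug your detector in where indicated):

    import cv2

    # Hypothetical camera indices - adjust to your setup.
    caps = [cv2.VideoCapture(i) for i in (0, 1)]

    while True:
        frames = []
        for cap in caps:
            ok, frame = cap.read()
            if not ok:
                break
            frames.append(frame)
        if len(frames) != len(caps):
            break
        # Run your detector on each frame here, e.g. model.detect(frames)
        for idx, frame in enumerate(frames):
            cv2.imshow("camera %d" % idx, frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break

    for cap in caps:
        cap.release()
    cv2.destroyAllWindows()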
I'm currently trying to make a small platformer, but I don't want to use Tile2D (for specific reasons). What should I use instead to make platforms (objects the player can collide with)?
At the moment, I have a List with every Rectangle the player can collide with and I go through every Rectangle when I want to check collisions, but I find that to be very clunky.
What should I use to make platforms, the player, etc.? I haven't used Box2D yet, so I don't know if it's the thing I need, and I'm also not sure whether Scene2D is what I'm looking for. Any tips would be appreciated. Not sure if this is the right place to post this, but it's worth a try.
Don't mix up two things:
Box2D is a physics engine that lets you simulate a physical world with all of its behaviour: collision handling, applying forces or velocities, etc.
Scene2D is a framework for "cleaning up" the handling of the objects you want to manage. By definition it is a scene graph, which lets you treat bunches of objects as single objects (groups) and apply actions to them (like setting their position on the screen).
So basically, while Box2D is about how objects behave while the application is running, Scene2D is more about how you organize your code before the application runs.
Of course Scene2D is very helpful if you want to implement your own collision mechanism (like you wrote: you have an array of rectangles, then iterate over them and check their positions, etc.), but Box2D delivers this mechanism for you, so you don't have to check anything yourself; you just tell the application what to do when a collision occurs.
Then it becomes a question of whether it is worth implementing your own collision mechanism. The most frequent answer, I guess, is: if the game is simple and the mechanism will be too, then yes. If not, just use a physics engine; don't reinvent the wheel ;)
To read about Box2D and learn how to use it visit:
Libgdx box2d intro
Box2D official manual
To read about Scene2D:
Libgdx Scene2D intro
This tutorial
I need to create an application that takes input from a webcam or camera connected to a computer and detects certain 3D objects.
I could do this from a .3ds file or something else? I'm not quite sure.
I am pretty sure it is possible with Flash AS3? I have been looking into OpenCV but I can't find any examples of this kind of thing.
Any help would be great, and if you have any further questions to understand more, please ask.
Thanks
Frank
EDIT: Oh, and I need this to be a web-based solution, so I was thinking of Python, AS3, something along those lines.
To detect a "3D object" through an inherently 2D medium (a bitmap captured by a camera) is a very complex thing, and requires the detection of lit and shaded areas and how they move in respect to an often known light source. What you likely want to do instead (unless you have access to hardware with a depth buffer, e.g. the Kinect) is to analyze the 2D picture for 2D shapes, i.e. the silhouette of the object that you're looking for.
Have a look at ASFEAT and IN2AR, which are made by the same Russian wunderkind as ASSURF, but are actively developed and don't use patented algorithms.
OpenCV (the port of which to Flash/AS3 is called Marilena) might do the trick, but it's not as optimized for Flash, and requires fairly complex descriptor files. I believe the only ones that are readily available are for face detection.
Your best bet is probably ASSURF, it won't do detection of 3D models but it will do 2D shapes.
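If you end up using OpenCV, here is a very rough Python sketch of the "find the 2D silhouette" idea from above (the Canny thresholds and area cutoff are placeholder values; a real detector would need something much more robust, such as feature matching against your target object):

    import cv2

    cap = cv2.VideoCapture(0)             # default webcam
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        edges = cv2.Canny(gray, 50, 150)  # placeholder thresholds
        contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        for c in contours:
            if cv2.contourArea(c) > 1000:  # ignore tiny blobs
                x, y, w, h = cv2.boundingRect(c)
                cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.imshow("silhouettes", frame)
        if cv2.waitKey(1) & 0xFF == 27:   # Esc to quit
            break
    cap.release()
    cv2.destroyAllWindows()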
I am trying to encode a stream using x264 (by feeding individual images), but what's unusual is that I already have some motion information for my frames. I know exactly which areas have been modified in each frame, and I know where motion has occurred in the frame.
Is there a way to feed x264 my own motion information? I'd like to give it motion vectors for given areas in the frame, and somehow tell it that certain areas in the frame are guaranteed to not have had any motion in them.
I think this might significantly improve the performance of the encoding (because I'm allowing the codec to completely skip the motion estimation phase), and should also somewhat increase quality in cases where the encoder's motion estimation algos might have missed the motion that actually occurred.
Do I need to modify the encoder in order to do this, or is this supported in the existing API?
Short answer: No, you can't feed your own motion estimation data to x264.
Long answer: IIRC, x264 does its work by being fed the raw frames, with no extra data. To accommodate the motion estimation data you have, you'd have to modify the x264 source code.
You may be able to find what you need within common/mvpred.c or encoder/me.c. I'm not sure how many of the x264 developers actually visit Stack Overflow (I know one of their lead developers has an account here), but you can try talking to them through their usual channels: their IRC channel or the doom9 forums.
doom9: http://forum.doom9.org/forumdisplay.php?f=77
doom10: http://doom10.org/index.php?board=5.0
IRC: irc://irc.freenode.net/x264 and irc://irc.freenode.net/x264dev
Mailing list: http://mailman.videolan.org/listinfo/x264-devel
I wish I could give you more information, but unfortunately I'm not particularly well versed in the code base. The developers are always willing and able to help anyone wishing to work on x264 though.
I'm interested in building a 3D model of our solar system for web use (probably with AS3 and Papervision) and have been looking into how I would go about encoding the planetary positions. My idea was to download the already calculated positions from NASA, as calculating the positions myself seems a bit overcomplicated. I'm not sure though whether I should use a heliocentric or a geocentric encoding.
I wanted to know if there is anyone with experience in this. Which approach would be better? The NASA JPL website seems to have the positions of all the major bodies in our solar system as geocentric. I can see this becoming a problem later on, though, when adding Voyager and Mars lander missions to the model.
Any feedback, comments and links are very welcome.
EDIT: I have a rough model running that uses heliocentric coordinates, but I haven't been able to find the coordinates for all planets in this format.
UPDATE:
I don't have a lot of detail to provide for now because I really don't know what I'm doing (from the space point of view). I wanted to get a handle on 3D programming, and I am interested in space. The idea was to make a rough solar system simulator with, at first, all the planets and their orbiters (maybe excluding satellites initially). Perhaps include a news aggregator and some links to news/resources and so on. The general idea would be to allow people to click around and get super excited about going to the Moon and Mars (for a start).
In the long run I would hopefully be able to add in satellites and the Moon missions (scroll back in time to the 70s and see the Moon missions).
So to answer Arrieta's question: the idea was not to calculate eclipses but to build an easy-to-approach, interactive space exploratorium, and to learn some 3D and space-related stuff along the way.
Glad you want to build your own simulator, but depending on what you want to do it may be far from an easy task. The simplest approach is as follows:
Download the JPL-DE405 ephemerides and the subroutines for retrieving the planetary positions (with respect to the Solar System Barycenter); a minimal Python sketch of this step follows the list.
Ask for a timespan, compute the positions, and display them on screen in a visually appealing manner.
Done
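As a sketch of that download-and-retrieve step, one modern option (my assumption; not JPL's original Fortran subroutines) is the Python jplephem package with a DE405 SPK kernel:

    from jplephem.spk import SPK

    kernel = SPK.open("de405.bsp")   # ephemeris kernel downloaded from JPL
    jd = 2457061.5                   # Julian date you want positions for

    # Segment (0, 4): Solar System Barycenter -> Mars barycenter, in km.
    x, y, z = kernel[0, 4].compute(jd)

    # The Earth takes two hops: barycenter -> Earth-Moon barycenter -> Earth.
    earth = kernel[0, 3].compute(jd) + kernel[3, 399].compute(jd)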
Now, why would you want to do this? If you want to view the planets' orbits, that's it. You are done. If you want to compute geometric events (like eclipses, line of sight, or illumination), then you are in a whole different ball game. That's astronautics, and it is not simple.
Please be more specific. The distinction you make between "geocentric" and "heliocentric" coordinates really has no major difficulty involved. If you have all the states in the heliocentric frame, you can compute the geocentric frame by simple vector subtraction. That's not the problem! The problems are a thousand others, but you need to be specific so we can provide more guidance.
JPL has provided high-quality ephemerides for decades now, and we have a full team of brilliant people working on it. It is one of the most difficult things to get right!
Again, provide more details or check out other sources of information.
Please google "Solar System Simulator" (done here, at JPL) and see if it fulfills your needs.
Cheers.
It may be worth checking out the ASCOM Platform (we also have a Stack Exchange site called ASCOM Answers).
The ASCOM Platform has several useful libraries for doing this sort of thing.
USNO NOVAS (Naval Observatory Vector Astrometry)
Kepler orbit engine
The USNO/NOVAS stuff was originally written in C and we've wrapped it up in .NET for ease of use from C# and VB.
As an added bonus (actually it's the raison d'être for ASCOM), the Platform makes it easy for you to control things like telescopes; it's used by Microsoft's WorldWide Telescope for exactly that purpose. It might be a fun extension to your model to be able to point a telescope at things.
I'd probably start (well, I did a while back) with heliocentric coordinates and get a few of the planets up and running. But sooner or later you'll want to write a heliocentric-to-geocentric coordinate conversion routine, and its inverse. For some bodies, such as artificial satellites, the geocentric coordinates will be easier to deal with.
You can use the astro-phys API to get JSON-formatted state vectors for all the planets. It calculates them using JPL's DE406, so it's pretty accurate, and it uses the solar system barycenter.
That said, if you know where the Sun is relative to the Earth and you're in a geocentric model, you can subtract the position of the Sun from all of the bodies (including the Earth) to get heliocentric coordinates.
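A minimal sketch of that frame conversion (plain vector subtraction; the positions below are made-up illustrative numbers, not real ephemeris values):

    import numpy as np

    # Made-up geocentric positions (x, y, z) in AU - purely illustrative.
    geocentric = {
        "sun":   np.array([0.98, 0.10, 0.00]),
        "mars":  np.array([1.60, 0.90, 0.02]),
        "venus": np.array([0.30, -0.65, 0.01]),
    }

    # Heliocentric position = geocentric position minus the geocentric Sun.
    sun = geocentric["sun"]
    heliocentric = {body: pos - sun for body, pos in geocentric.items()}
    heliocentric["earth"] = -sun   # the Earth itself sits at -sun in this frame

    # The inverse (heliocentric -> geocentric) is just adding the Sun back:
    geocentric_again = {body: pos + sun for body, pos in heliocentric.items()}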