6D object pose estimation from a known 3D model - deep-learning

I have photos of an object taken from different angles, and its 3D model.
From this series of photos I need to work out where to place the 3D model, i.e. its position and rotation (a transformation matrix).
Which algorithms are best suited to this, what can I read, and are there any ready-made solutions?
What I've found so far mostly deals with a single photo, but having a series should make finding the pose easier.
Thanks a lot.

Related

Displacement and Velocity in a Single Direction from Apple Core Motion

I am working on a project and need to find velocity and displacement in a single direction (straight up and down). I am using my Apple Watch and retrieving all of the Core Motion data from it. I understand that integrating the acceleration introduces drift, which can make the displacements highly inaccurate. However, I have read that constraining the motion to just one direction can give better results.
If I only want velocity and displacement in one direction, will that truly give me better results? If so, how is this constraint actually applied mathematically?
So far I have only computed the resultant direction of the acceleration from userAcceleration. While looking into the best way to integrate it, I came across this displacement drift issue and want to find a path forward.
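One way to read "constraining to one direction" is to project each acceleration sample onto the vertical axis before integrating, instead of integrating all three axes. Here is a minimal sketch of that idea, in Python rather than Swift for brevity. It assumes each sample carries the userAcceleration and gravity vectors from Core Motion's device-motion data, both in units of g and in the same frame. The function names and the fixed sample interval `dt` are made up for illustration. Note that this only discards the horizontal components; any bias along the vertical axis still accumulates as drift.

```python
import math

G = 9.80665  # metres per second squared in one g


def vertical_acceleration(user_acc, gravity):
    """Project user acceleration (in g) onto the upward axis; returns m/s^2."""
    norm = math.sqrt(sum(c * c for c in gravity))
    if norm == 0.0:
        raise ValueError("gravity vector must be non-zero")
    # The gravity vector points down, so "up" is its negated unit vector.
    up = [-c / norm for c in gravity]
    return G * sum(a * u for a, u in zip(user_acc, up))


def integrate_vertical(samples, dt):
    """Integrate vertical acceleration twice with the trapezoidal rule.

    samples: list of (user_acc, gravity) pairs taken every dt seconds.
    Returns (velocities, displacements), both starting from zero.
    """
    velocity, displacement = 0.0, 0.0
    velocities, displacements = [0.0], [0.0]
    prev_a = vertical_acceleration(*samples[0])
    for user_acc, gravity in samples[1:]:
        a = vertical_acceleration(user_acc, gravity)
        new_velocity = velocity + 0.5 * (prev_a + a) * dt
        displacement += 0.5 * (velocity + new_velocity) * dt
        velocity, prev_a = new_velocity, a
        velocities.append(velocity)
        displacements.append(displacement)
    return velocities, displacements
```

For example, `integrate_vertical([((0, 0, 0.1), (0, 0, -1))] * 101, 0.01)` integrates one second of constant 0.1 g upward acceleration.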

CNN attention/activation maps

What are common techniques for finding which parts of images contribute most to image classification via convolutional neural nets?
In general, suppose we have 2D matrices with float values between 0 and 1 as entries. Each matrix is associated with a label (single-label, multi-class) and the goal is to perform classification via (Keras) 2D CNNs.
I'm trying to find methods to extract relevant subsequences of rows/columns that contribute most to classification.
Two examples:
https://github.com/jacobgil/keras-cam
https://github.com/tdeboissiere/VGG16CAM-keras
Other examples/resources with an eye toward Keras would be much appreciated.
Note my datasets are not actual images, so using methods with ImageDataGenerator might not directly apply in this case.
There are many visualization methods. Each of these methods has its strengths and weaknesses.
However, you have to keep in mind that the methods partly visualize different things. Here is a short overview based on this paper.
You can distinguish between three main visualization groups:
Functions (gradients, saliency map): These methods visualize how a change in input space affects the prediction
Signal (deconvolution, Guided BackProp, PatternNet): the signal (reason for a neuron's activation) is visualized. So this visualizes what pattern caused the activation of a particular neuron.
Attribution (LRP, Deep Taylor Decomposition, PatternAttribution): these methods visualize how much a single pixel contributed to the prediction. As a result you get a heatmap highlighting which pixels of the input image most strongly contributed to the classification.
Since you are asking how much a pixel has contributed to the classification, you should use attribution methods. Nevertheless, the other methods also have their uses.
One nice toolbox for visualizing heatmaps is iNNvestigate.
This toolbox contains the following methods:
SmoothGrad
DeConvNet
Guided BackProp
PatternNet
PatternAttribution
Occlusion
Input times Gradient
Integrated Gradients
Deep Taylor
LRP
DeepLift
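Of the methods in that list, occlusion is the simplest to try by hand, because it needs nothing from the network except its predictions. A minimal, model-agnostic sketch follows. `predict` is a hypothetical callable that returns the score of the class of interest for one 2D matrix (in practice it would wrap your Keras model), and `toy_predict` is only a stand-in so the example runs on its own. This is not how iNNvestigate implements occlusion.

```python
def occlusion_map(predict, matrix, patch=2, baseline=0.0):
    """Heatmap of how much the score drops when each patch is blanked out.

    predict: callable taking a 2D list and returning a float score
             (hypothetical stand-in for a trained model).
    matrix:  2D list of floats (rows x cols).
    """
    rows, cols = len(matrix), len(matrix[0])
    reference = predict(matrix)
    heat = [[0.0] * cols for _ in range(rows)]
    for r0 in range(0, rows, patch):
        for c0 in range(0, cols, patch):
            r1, c1 = min(r0 + patch, rows), min(c0 + patch, cols)
            # Copy the input and overwrite one patch with the baseline value.
            occluded = [row[:] for row in matrix]
            for r in range(r0, r1):
                for c in range(c0, c1):
                    occluded[r][c] = baseline
            drop = reference - predict(occluded)
            # Assign the score drop to every cell of the occluded patch.
            for r in range(r0, r1):
                for c in range(c0, c1):
                    heat[r][c] = drop
    return heat


def toy_predict(m):
    """Toy stand-in for a model: the score depends only on row 1."""
    return sum(m[1]) / len(m[1])


heat = occlusion_map(toy_predict, [[0.5] * 4 for _ in range(4)], patch=1)
row_importance = [sum(row) for row in heat]  # row 1 gets all the weight
```

Summing the heatmap along rows or columns, as in the last line, gives a rough ranking of which rows or columns matter most for the chosen class.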

Orthographic projection - What is the process converting 3d point to 2d

I'm implementing a simple penalty shootout game in ActionScript 3.0. The view of the game is similar to that of the old "Sensible World of Soccer". I want to use 3D game logic with a z dimension, as I think it could help me achieve better collision detection and response. However, I would like to keep the graphics style and view equivalent to those of the old 2D soccer games, so I assume that orthographic projection is suitable for this implementation. Although there is plenty of information on the internet about orthographic projection, I'm a little confused about how to apply it in code.
So my questions are:
What is the step-by-step procedure for converting a 3D (x, y, z) point to a 2D (x', y') point in orthographic projection?
Can we avoid using matrices? If yes, what are the equations that relate the coordinates x', y' to x, y, z?
Do we have to define a camera position and angle before applying the conversion? In my case, the camera will be in a fixed position and at a fixed angle.
DisplayObjects and their descendants (ie MovieClip and Sprite) have a z property you can use to do this without the headaches - they also have rotationX/Y/Z and scaleX/Y/Z properties too!
Using 'z' will adjust the position and scale of an object accordingly (though it will convert vectors to bitmaps). There's no depth sorting, so an object will stay on top of others even if its z co-ord suggests it should be behind them, but for the project you have in mind I can't see this being a problem. It's pretty easy to fix anyway: keep an array of the objects in the scene, sort it by z-position, and reset the depth index of each (or re-add them to the stage in sorted order).
You can use the perspectiveProjection member of a clip to adjust the FOV, origin etc -
Perspective Tutorial
..but you don't need to get any more sophisticated than that. Certainly you don't need to dabble with matrices with a fixed camera view, even if you wanted to calculate this manually as an experiment.
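If you did want to work out the orthographic case by hand as an experiment, a fixed camera reduces to a couple of equations and no matrices. The sketch below is in Python rather than ActionScript, and it assumes a convention that is not in the question: world x points right, y points away from the viewer along the ground, z points up, and the camera is tilted `tilt_deg` degrees away from looking straight down. Under that assumption x' = x and y' = -(y cos(tilt) + z sin(tilt)), plus a scale and a screen origin. `ortho_project` and `draw_order` are illustrative names.

```python
import math


def ortho_project(x, y, z, tilt_deg=60.0, scale=1.0, origin=(0.0, 0.0)):
    """Orthographic projection for a fixed, tilted camera (assumed convention).

    tilt_deg = 0 looks straight down; 90 looks horizontally along +y.
    Returns (screen_x, screen_y, depth); screen y grows downward, and a
    larger depth means the point is farther from the camera.
    """
    t = math.radians(tilt_deg)
    screen_x = origin[0] + scale * x
    screen_y = origin[1] - scale * (y * math.cos(t) + z * math.sin(t))
    depth = y * math.sin(t) - z * math.cos(t)
    return screen_x, screen_y, depth


def draw_order(points, tilt_deg=60.0):
    """Sort points far-to-near so nearer objects are drawn last (on top)."""
    return sorted(points,
                  key=lambda p: ortho_project(*p, tilt_deg=tilt_deg)[2],
                  reverse=True)
```

The `depth` value is only used for sorting, which covers the depth-ordering fix described above: draw the objects in the order `draw_order` returns them.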
Hope this helps

Calculate 3D coordinates from 2D Image plane accounting for perspective without direct access to view/projection matrix

First time asking a question on the stack exchange, hopefully this is the right place.
I can't seem to develop a close enough approximation algorithm for my situation as I'm not exactly the best in terms of 3D math.
I have a 3D environment in which I can access the position and rotation of any object, including my camera, and can run line traces between any two points to get the distance from a point to the point of collision. I also have my camera's field of view. However, I have no access to the world/view/projection matrices.
I also have collections of 2D images that are essentially screenshots of the 3D environment taken from the camera. Each collection is taken from the same point and angle, and a typical set is taken at roughly a 60 degree angle down from the horizon.
I have been able to get to the point of using "registration point entities" that can be placed in the 3D world to represent the corners of the 2D image. When a point is picked on the 2D image, it is read as a coordinate in the range 0-1, which is then interpolated between the 3D positions of the registration points. This seems to work well, but only if the image is taken from a perfect top-down angle. When the camera is tilted and perspective is introduced, the results become grossly inaccurate, as there is no compensation for this perspective.
I don't need to calculate the height of a point, say a window on a skyscraper. I only need the coordinate at ground level: if I extend a line out from a given image-space point, I need at least the point where that line would intersect the ground if nothing were in the way.
All of the material I found about this says to just deproject the point using the world/view/projection matrices. That is straightforward in itself, except that I don't have access to these matrices, only data I can collect at screenshot time, and the other algorithms use complex maths I simply don't grasp yet.
One end goal is to be able to place markers in the 3D environment where a user clicks in the image, without being able to run a simple deprojection from the user's view.
Any help would be appreciated, thanks.
Edit: Herp derp, while my implementation for doing so is a bit odd due to the limitations of my situation, the solution essentially boiled down to ananthonline's answer about simply recalculating the view/projection matrices.
Between position, rotation and FOV of the camera, could you not calculate the View/Projection matrices of the camera (songho.ca/opengl/gl_projectionmatrix.html) - thus allowing you to unproject known 3D points?
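A rough sketch of that idea without building full matrices: construct the camera's basis vectors from its rotation, turn the picked image point into a ray using the FOV, and intersect the ray with the ground. Everything about the conventions here is an assumption and will need adapting to the engine: z is up, the ground is the plane z = `ground_z`, yaw is measured about z from the x axis, pitch is in degrees below the horizon, the FOV is vertical, and (u, v) is the 0-1 image coordinate with (0, 0) at the top-left. The function name is made up.

```python
import math


def image_point_to_ground(cam_pos, yaw_deg, pitch_down_deg,
                          fov_v_deg, aspect, u, v, ground_z=0.0):
    """Cast a ray through image point (u, v) and hit the plane z = ground_z.

    Returns the (x, y, z) hit point, or None if the ray never reaches
    the ground. All axis conventions are assumptions (see the text above).
    """
    yaw = math.radians(yaw_deg)
    p = math.radians(pitch_down_deg)
    # Orthonormal camera basis in a right-handed, z-up world.
    forward = (math.cos(p) * math.cos(yaw), math.cos(p) * math.sin(yaw), -math.sin(p))
    right = (math.sin(yaw), -math.cos(yaw), 0.0)
    up = (math.sin(p) * math.cos(yaw), math.sin(p) * math.sin(yaw), math.cos(p))
    # Half-extents of the image plane at unit distance from the camera.
    half_h = math.tan(math.radians(fov_v_deg) / 2.0)
    half_w = half_h * aspect
    sx = (2.0 * u - 1.0) * half_w   # -half_w at the left edge, +half_w at the right
    sy = (1.0 - 2.0 * v) * half_h   # +half_h at the top edge, -half_h at the bottom
    direction = tuple(f + sx * r + sy * w for f, r, w in zip(forward, right, up))
    if direction[2] >= 0.0:
        return None  # ray is parallel to the ground or points above the horizon
    t = (ground_z - cam_pos[2]) / direction[2]
    if t <= 0.0:
        return None  # camera is at or below the ground plane
    return tuple(c + t * d for c, d in zip(cam_pos, direction))
```

For instance, a camera 10 units above the ground pitched 45 degrees down should see the image centre (u = v = 0.5) land 10 units in front of it. If the ground is not flat, the same ray direction could instead be fed to the line trace you already have.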

Path Finding in 3d environment in Java

I understand that there are a couple of links about pathfinding in 2D. Is there any Java example that shows how to implement pathfinding in a 3D environment?
I have already seen a lot of code and explanations about this, but none of them really help. How can this be implemented in a 3D environment?
If by 3D pathfinding you mean 2D-style pathfinding in 3D space (for example, you have a 3D scene and something needs to find its way around the walls and rooms), then you can just build some system to keep track of all your walls (say... a BSP tree?) and then implement a common pathfinding algorithm such as A*.
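As a concrete illustration of the A* part, here is a minimal sketch on a voxel grid. It is in Python rather than Java, and it swaps the BSP tree for a plain set of blocked cells, which is about the simplest way of keeping track of walls. `astar_3d`, `blocked` and `size` are illustrative names. Movement is limited to the six axis-aligned neighbours with unit cost, so the Manhattan distance is an admissible heuristic.

```python
import heapq


def astar_3d(start, goal, blocked, size):
    """A* on a 6-connected 3D grid.

    start, goal: (x, y, z) integer cells, assumed to be free.
    blocked:     set of (x, y, z) cells that cannot be entered.
    size:        (nx, ny, nz) grid dimensions.
    Returns the list of cells from start to goal, or None if unreachable.
    """
    def heuristic(cell):
        # Manhattan distance never overestimates with unit-cost axis moves.
        return sum(abs(a - b) for a, b in zip(cell, goal))

    moves = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    open_heap = [(heuristic(start), 0, start)]
    came_from = {start: None}
    cost = {start: 0}
    while open_heap:
        _, g, cell = heapq.heappop(open_heap)
        if cell == goal:
            # Walk the parent links back to the start.
            path = []
            while cell is not None:
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        if g > cost[cell]:
            continue  # stale heap entry; a cheaper route was already found
        for move in moves:
            nxt = tuple(c + d for c, d in zip(cell, move))
            if nxt in blocked or any(not 0 <= c < s for c, s in zip(nxt, size)):
                continue
            new_cost = g + 1
            if new_cost < cost.get(nxt, float("inf")):
                cost[nxt] = new_cost
                came_from[nxt] = cell
                heapq.heappush(open_heap, (new_cost + heuristic(nxt), new_cost, nxt))
    return None
```

Calling `astar_3d((0, 0, 0), (2, 0, 0), {(1, 0, 0)}, (3, 3, 3))` has to route around the single blocked cell. For large or sparse scenes a navigation mesh or waypoint graph is usually a better fit than a dense grid, but the search itself stays the same.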