Calculate 3D coordinates from a 2D image plane accounting for perspective, without direct access to the view/projection matrix - language-agnostic

First time asking a question on the stack exchange, hopefully this is the right place.
I can't seem to develop a close enough approximation algorithm for my situation as I'm not exactly the best in terms of 3D math.
I have a 3D environment in which I can access the position and rotation of any object, including my camera, and I can run trace lines between any two points to get the distance from a point to a point of collision. I also have my camera's field of view. However, I do not have any form of access to the world/view/projection matrices.
I also have a collection of 2D images that are essentially screenshots of the 3D environment from the camera. Each collection is taken from the same point and angle, typically about 60 degrees down from the horizon.
I have been able to get to the point of using "registration point entities" placed in the 3D world to represent the corners of the 2D image. When a point is picked on the 2D image, it is read as a coordinate in the range 0-1 and interpolated between the 3D positions of the registration points. This works well, but only if the image is a perfect top-down view. When the camera is tilted and another dimension of perspective is introduced, the results become grossly inaccurate because there is no compensation for that perspective.
I don't need to calculate the height of a point (say, a window on a skyscraper), but I do need at least the coordinate at the base of the image plane; in other words, if I extend a ray out from a specified image-space point, I need the point where that ray would intersect the ground if nothing were in the way.
All of the material I have found says to simply deproject the point using the world/view/projection matrices, which would be straightforward except that I don't have access to these matrices, only data I can collect at screenshot time, and the other algorithms I've found use complex maths I simply don't grasp yet.
One end goal would be to place markers in the 3D environment where a user clicks in the image, without being able to run a simple deprojection from the user's view.
Any help would be appreciated, thanks.
Edit: Herp derp, while my implementation for doing so is a bit odd due to the limitations of my situation, the solution essentially boiled down to ananthonline's answer about simply recalculating the view/projection matrices.

Between the position, rotation and FOV of the camera, could you not calculate the view/projection matrices of the camera (songho.ca/opengl/gl_projectionmatrix.html) - thus allowing you to unproject image points back into 3D?
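For instance, a minimal sketch of that approach (my own illustration, not from the answer; the camera description here - position, yaw/pitch, vertical FOV, aspect ratio - and the Y-up, ground-at-y=0 conventions are all assumptions about the engine):
// Turn a click at (u, v) in [0,1] image space into a ground point, using only
// the camera's position, rotation and FOV - i.e. rebuilding what the view and
// projection matrices would have encoded.
function imagePointToGround(u, v, cam) {
  // cam = { position: {x,y,z}, yaw, pitch, fovY (radians), aspect }
  var cy = Math.cos(cam.yaw),   sy = Math.sin(cam.yaw);
  var cp = Math.cos(cam.pitch), sp = Math.sin(cam.pitch);
  // Camera basis vectors (this block stands in for the view matrix).
  var forward = { x: cy * cp, y: sp, z: sy * cp };
  var right   = { x: -sy,     y: 0,  z: cy };
  var up      = {                               // right x forward
    x: right.y * forward.z - right.z * forward.y,
    y: right.z * forward.x - right.x * forward.z,
    z: right.x * forward.y - right.y * forward.x
  };
  // Map (u, v) to offsets on the image plane (this stands in for the projection).
  var tanHalf = Math.tan(cam.fovY / 2);
  var px = (2 * u - 1) * tanHalf * cam.aspect;
  var py = (1 - 2 * v) * tanHalf;               // image v grows downward
  // Ray through the pixel, then intersect it with the ground plane y = 0.
  var dir = {
    x: forward.x + px * right.x + py * up.x,
    y: forward.y + px * right.y + py * up.y,
    z: forward.z + px * right.z + py * up.z
  };
  if (Math.abs(dir.y) < 1e-9) return null;      // looking parallel to the ground
  var t = -cam.position.y / dir.y;
  if (t < 0) return null;                       // ground is behind the camera
  return { x: cam.position.x + t * dir.x, y: 0, z: cam.position.z + t * dir.z };
}
If the engine reports rotation differently (a quaternion, or Euler angles in another order), only the basis-vector block needs to change.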

Related

Cesium Resampling

I know that Cesium offers several different interpolation methods, including linear (or bilinear in 2D), Hermite, and Lagrange. One can use these methods to resample sets of points and/or create curves that approximate sampled points, etc.
However, the question I have is what method does Cesium use internally when it is rendering a 3D scene and the user is zooming/panning all over the place? This is not a case where the programmer has access to the raster, etc, so one can't just get in the middle of it all and call the interpolation functions directly. Cesium is doing its own thing as quickly as it can in response to user control.
My hunch is that the default is bilinear, but I don't know that nor can I find any documentation that explicitly says what is used. Further, is there a way I can force Cesium to use a specific resampling method during these activities, such as Lagrange resampling? That, in fact, is what I need to do: force Cesium to employ Lagrange resampling during scene rendering. Any suggestions would be appreciated.
EDIT: Here's a more detailed description of the problem…
Suppose I use Cesium to set up a 3-D model of the Earth including a greyscale image chip at its proper location on the model Earth's surface, and then I display the results in a Cesium window. If the view point is far enough from the Earth's surface, then the number of pixels displayed in the image chip part of the window will be fewer than the actual number of pixels that are available in the image chip source. Some downsampling will occur. Likewise, if the user zooms in repeatedly, there will come a point at which there are more pixels displayed across the image chip than the actual number of pixels in the image chip source. Some upsampling will occur. In general, every time Cesium draws a frame that includes a pixel data source there is resampling happening. It could be nearest neighbor (doubt it), linear (probably), cubic, Lagrange, Hermite, or any one of a number of different resampling techniques. At my company, we are using Cesium as part of a large government program which requires the use of Lagrange resampling to ensure image quality. (The NGA has deemed that best for its programs and analyst tools, and they have made it a compliance requirement. So we have no choice.)
So here's the problem: while the user is interacting with the model, for instance zooming in, the drawing process is not in the programmer's control. The resampling is either happening in the Cesium layer itself (hopefully) or in even still lower layers (for instance, the WebGL functions that Cesium may be relying on). So I have no clue which technique is used for this resampling. Worse, if that technique is not Lagrange, then I don't have any clue how to change it.
So the question(s) would be this: is Cesium doing the resampling explicitly? If so, then what technique is it using? If not, then what drawing packages and functions are Cesium relying on to render an image file onto the map? (I can try to dig down and determine what techniques those layers may be using, and/or have available.)
UPDATE: Wow, my original answer was a total misunderstanding of your question, so I've rewritten from scratch.
With the new edits, it's clear your question is about how images are resampled for the screen while rendering. These images are texture maps in WebGL, and the process of getting them to the screen quickly is implemented in hardware, on the graphics card itself. Software on the CPU is not performant enough to map individual pixels to the screen one at a time, which is why we have hardware-accelerated 3D cards.
Now for the bad news: this hardware supports nearest neighbor, linear, and mipmapping. That's it. 3D graphics cards do not use any fancier interpolation, as the work needs to be done in a fraction of a second to keep the frame rate as high as possible.
Mipmapping is described well by @gman in his article WebGL 3D Textures. It's a long article, but search for the word "mipmap" and skip ahead to his description of it. Basically, a single image is reduced into smaller images prior to rendering, so an appropriately-sized starting point can be chosen at render time. But there will always be a final mapping to the screen, and as you can see, the choices are NEAREST or LINEAR.
Quoting @gman's article here:
You can choose what WebGL does by setting the texture filtering for each texture. There are 6 modes
NEAREST = choose 1 pixel from the biggest mip
LINEAR = choose 4 pixels from the biggest mip and blend them
NEAREST_MIPMAP_NEAREST = choose the best mip, then pick one pixel from that mip
LINEAR_MIPMAP_NEAREST = choose the best mip, then blend 4 pixels from that mip
NEAREST_MIPMAP_LINEAR = choose the best 2 mips, choose 1 pixel from each, blend them
LINEAR_MIPMAP_LINEAR = choose the best 2 mips, choose 4 pixels from each, blend them
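For reference (this snippet is not part of the quoted article), those modes are selected per texture with gl.texParameteri; a minimal WebGL sketch, assuming gl is a WebGL rendering context:
var texture = gl.createTexture();
gl.bindTexture(gl.TEXTURE_2D, texture);
// ...upload the image with gl.texImage2D, then build the mip chain:
gl.generateMipmap(gl.TEXTURE_2D);
// Minification: blend 4 pixels from each of the 2 best mips (LINEAR_MIPMAP_LINEAR).
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.LINEAR_MIPMAP_LINEAR);
// Magnification has no mips to choose from, so only NEAREST or LINEAR apply.
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MAG_FILTER, gl.LINEAR);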
I guess the best news I can give you is that Cesium uses the best of those, LINEAR_MIPMAP_LINEAR, to do its own rendering. If you have a strict requirement for more time-consuming imagery interpolation, that means you have a requirement not to use a realtime, hardware-accelerated 3D graphics card, as there is no way to do Lagrange image interpolation during a realtime render.

When drawing on a canvas, should calculations be done relative to cartesian plane coordinates?

I've been seeing a lot of canvas-graphics-related javascript projects and libraries lately and was wondering how they handle the coordinate system. When drawing shapes and vectors on the canvas, are the points calculated based on a cartesian plane and converted for the canvas, or is everything calculated directly for the canvas?
I tried playing around with drawing a circle by graphing all its tangent lines until the line intersections start to resemble a curve and found the difference between the cartesian planes I'm familiar with and the coordinate system used by web browsers very confusing. The function for a circle, for example, "y^2 + x^2 = r^2" would need to be translated to "(y-1)^2 + (x-1)^2 = r^2" to be seen on the canvas. And then negative slopes were positive slopes on the canvas and everything would be upside down :/ .
The easiest way for me to think about it was to pretend the origin of a cartesian plane was in the center of the canvas and adjust my coordinates accordingly. On a 500 x 500 canvas, the center would be 250,250, so if I ended up with a point at 50,50, it would be drawn at (250 + 50, 250 - 50) = (300, 200).
I get the feeling I'm over-complicating this, but I can't wrap my mind around the clean way to work with a canvas.
Current practice can perhaps be exemplified by a quote from David Flanagan's book "JavaScript: The Definitive Guide", which says that
Certain canvas operations and attributes (such as extracting raw pixel values and setting shadow offsets) always use this default coordinate system
(the default coordinate system is that of the canvas). And it continues with
In most canvas operations, when you specify the coordinates of a point, it is taken to be a point in the current coordinate system [that's, for example, the cartesian plane you mentioned, @Walkerneo], not in the default coordinate system.
Why is using a "current coordinate system" more useful than using the canvas coordinate system directly?
First and foremost, I believe, because it is independent of the canvas itself, which is tied to the screen (more specifically, the default coordinate system dimensions are expressed in pixels). Using for instance a Cartesian (orthogonal) coordinate system makes it easy for you (well, for me too, obviously :-D ) to specify your drawing in terms of what you want to draw, leaving the task of how to draw it to the transformations offered by the Canvas API. In particular, you can express dimensions in the natural units of your drawing, and perform a scale and a translation to fit (or not, as the case may be...) your drawing to the canvas.
Furthermore, using transformations is often a clearer way to build your drawing, since it allows you to get "farther" from the underlying coordinate system and specify your drawing in terms of higher-level operations ('scale', 'rotate', 'translate' and the more general 'transform'). The above-mentioned book gives a very nice example of the power of this approach, drawing a Koch (fractal) snowflake in many fewer lines than would be possible (if at all) using canvas coordinates.
The HTML5 canvas, like most graphics systems, uses a coordinate system where (0,0) is in the top left and the x-axis and y-axis go from left to right and top down respectively. This makes sense if you think about how you would create a graphics system with nothing but a block of memory: the simplest way to map coordinates (x,y) to a memory slot is to take x+w*y, where w is the width of a line.
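As a tiny illustration of that mapping (my own example, for a 500-pixel-wide canvas):
var w = 500;                                   // width of one line of pixels
function slot(x, y) { return x + w * y; }      // slot(300, 200) === 100300
// (in practice each slot is 4 bytes of RGBA data, but the layout is the same)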
This means that the canvas coordinate system differs from what you use in mathematics in two ways: (0,0) is not the center like it usually is, and y grows down rather than up. The last part is what makes your figures upside down.
You can set transformations on the canvas that make the coordinate system more like what you are used to:
var ctx = document.getElementById('canvas').getContext('2d');
ctx.translate(250,250); // Move (0,0) to (250, 250)
ctx.scale(1,-1); // Make y grow up rather than down
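After those two calls, points computed on an ordinary Cartesian plane can be drawn directly; a small illustrative addition, using the (50, 50) example from the question:
// Draw a small dot at the Cartesian point (50, 50):
ctx.fillRect(50 - 2, 50 - 2, 4, 4); // lands at canvas pixel (300, 200), up and right of centre
// Caveat: with scale(1,-1) in effect, text and images drawn now would be flipped
// vertically, so reset the transform (or flip back) before drawing those.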

Orthographic projection - What is the process converting 3d point to 2d

I'm implementing a simple penalty shootout game using ActionScript 3.0. The view of the game is similar to the view of the old "Sensible World of Soccer". I want to use 3D game logic, with a z dimension, as I think it could help me achieve better collision detection and response. However, I would like to keep the graphics style and view equivalent to that of the old 2D soccer games. Hence, I assume that orthographic projection is suitable for this implementation. Although there is plenty of information on the internet regarding orthographic projection, I'm a little bit confused about how to actually apply it in code.
So my questions are:
Which is the procedure step by step in order for someone to convert a 3d (x, y, z) point to 2d (x', y') point in orthographic projection?
Can we avoid using matrices? If yes, what are the equations that associate coordinates x', y' with x, y, z?
Do we have to define a camera position and angle before applying the conversion? In my case, camera will be in a fixed position and angle.
DisplayObjects and their descendants (i.e. MovieClip and Sprite) have a z property you can use to do this without the headaches - they also have rotationX/Y/Z and scaleX/Y/Z properties!
Using 'z' will adjust the position and scale of an object accordingly (though it will convert vectors to bitmaps). There's no depth sorting, so an object will stay on top of others even if its z coordinate suggests it should be behind them, but for the project you have in mind I can't see this being a problem - and it's easy to fix anyway: keep an array of the objects in the scene, sort it by z-position, and reset the depth index of each (or re-add them to the stage) in sorted order.
You can use the perspectiveProjection member of a clip to adjust the FOV, origin etc -
Perspective Tutorial
..but you don't need to get any more sophisticated than that. Certainly you don't need to dabble with matrices with a fixed camera view, even if you wanted to calculate this manually as an experiment.
Hope this helps
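For the manual-calculation experiment mentioned above, a minimal matrix-free sketch (my own illustration, not part of the answer; it assumes a fixed camera tilted down by 'tilt' radians, with x to the right, y up for height, and z running away from the camera along the ground):
// Orthographic 3D -> 2D: rotate into camera space, then simply drop the depth.
function orthoProject(p, tilt, scale, originX, originY) {
  var xCam = p.x;                                          // camera right
  var yCam = p.y * Math.cos(tilt) + p.z * Math.sin(tilt);  // camera up
  // Depth would be p.z * Math.cos(tilt) - p.y * Math.sin(tilt); orthographic
  // projection ignores it (no divide), so nothing shrinks with distance.
  return {
    x: originX + xCam * scale,
    y: originY - yCam * scale                              // screen y grows downward
  };
}
The ignored depth value is still worth computing separately if you need draw order, which ties in with the z-sorting mentioned above.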

Calculate the area a camera can see on a plane

I have a camera with the coordinates x,y at height h that is looking onto the x-y-plane at a specific angle, with a specific field of view. I want to calculate the 4 corners the camera can see on the plane.
There is probably some kind of formula for that, but I can't seem to find it on google.
Edit: I should probably mention that I mean a camera in the 3D-Graphics sense. Specifically I'm using XNA.
I've had to do similar things for debugging graphics code in 3D games. I found the easiest way of thinking about it was creating the vectors representing the corners of the screen and then calculating the intersection with whatever relevant objects (in this case, a plane).
Take a look at your view-projection matrix (or whatever your camera's matrix stack looks like multiplied together) and realize that the screen space it outputs to has corners with homogenized coordinates of (-1, -1), (-1, 1), (1, -1), (1, 1). Knowing this, you're left with one free variable and can solve for the vector representing each corner of the camera.
Although, this is a pain. It's much easier to construct a vector for each corner as if the camera weren't rotated or translated, and then transform them by the inverse of the view matrix (the camera's world matrix) to get world-space vectors. Then you can calculate the intersection between the vectors and the plane to get your four corners.
I have a day job, so I leave the math to you. Some links that may help with that, however:
Angle/Field of view calculations
Line plane intersection
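A minimal sketch of that corner-ray approach (illustrative only; it assumes the camera is given by its position and unit forward/right/up vectors, plus vertical FOV and aspect ratio, with the ground being the x-y plane at z = 0 as in the question):
// Returns the 4 ground points seen at the screen corners, or null where a
// corner ray misses the plane (looks at or above the horizon).
function visibleGroundCorners(cam, fovY, aspect) {
  var tanV = Math.tan(fovY / 2);
  var tanH = tanV * aspect;
  var signs = [[-1, -1], [-1, 1], [1, -1], [1, 1]];  // screen-corner offsets
  return signs.map(function (s) {
    // World-space direction of the ray through this screen corner.
    var dir = {
      x: cam.forward.x + s[0] * tanH * cam.right.x + s[1] * tanV * cam.up.x,
      y: cam.forward.y + s[0] * tanH * cam.right.y + s[1] * tanV * cam.up.y,
      z: cam.forward.z + s[0] * tanH * cam.right.z + s[1] * tanV * cam.up.z
    };
    // Ray/plane intersection with z = 0: position.z + t * dir.z = 0.
    if (Math.abs(dir.z) < 1e-9) return null;   // parallel to the plane
    var t = -cam.position.z / dir.z;
    if (t < 0) return null;                     // plane is behind this corner ray
    return { x: cam.position.x + t * dir.x, y: cam.position.y + t * dir.y, z: 0 };
  });
}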
Ignoring lens distortions, and assuming the lens is almost at the focal point, you just have a triangle formed by the sensor size and the lens, and then the lens to the subject - similar triangles give you the size of the subject plane.
If you want a tilted object plane, that's just a projection onto the perpendicular object plane.

Create a Graph from points in a Grid that contains holes

I've got a continuous plane (2D) containing polygonal obstacles. I am uniformly sampling the plane at discrete positions to create a uniform grid of points. The grid does not have points where obstacles lie (i.e., there are holes wherever an obstacle is), as shown in the image below.
(Please view the image at http://i48.tinypic.com/2efnblg.png for a clear idea of what I'm attempting to accomplish. I couldn't embed it.)
Can anyone point me to some good implementations with optimal worst-case time-complexity?
Solved the problem using recursion.
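For illustration only (this is not necessarily the poster's actual solution), one recursive way to build such a graph is a flood fill that links each sample to its existing 4-neighbours, skipping the holes; a sketch, assuming the samples live in a 2D array where obstacle cells are null:
// Builds an adjacency list keyed by "row,col", starting from one seed sample.
function buildGraph(grid, row, col, adjacency, visited) {
  var key = row + "," + col;
  if (visited.has(key)) return;
  visited.add(key);
  adjacency[key] = [];
  var neighbours = [[row - 1, col], [row + 1, col], [row, col - 1], [row, col + 1]];
  for (var i = 0; i < neighbours.length; i++) {
    var r = neighbours[i][0], c = neighbours[i][1];
    var inBounds = r >= 0 && r < grid.length && c >= 0 && c < grid[r].length;
    if (inBounds && grid[r][c] !== null) {        // skip holes left by obstacles
      adjacency[key].push(r + "," + c);           // edge to an existing neighbour
      buildGraph(grid, r, c, adjacency, visited); // recurse into that neighbour
    }
  }
}
// Usage: var adjacency = {}; buildGraph(grid, startRow, startCol, adjacency, new Set());
Each sample is visited once, so construction is linear in the number of grid points; for very large grids an explicit stack avoids deep recursion.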