Cesium Resampling - zooming

I know that Cesium offers several different interpolation methods, including linear (or bilinear in 2D), Hermite, and Lagrange. One can use these methods to resample sets of points and/or create curves that approximate sampled points, etc.
However, the question I have is what method does Cesium use internally when it is rendering a 3D scene and the user is zooming/panning all over the place? This is not a case where the programmer has access to the raster, etc, so one can't just get in the middle of it all and call the interpolation functions directly. Cesium is doing its own thing as quickly as it can in response to user control.
My hunch is that the default is bilinear, but I don't know that nor can I find any documentation that explicitly says what is used. Further, is there a way I can force Cesium to use a specific resampling method during these activities, such as Lagrange resampling? That, in fact, is what I need to do: force Cesium to employ Lagrange resampling during scene rendering. Any suggestions would be appreciated.
EDIT: Here's a more detailed description of the problem…
Suppose I use Cesium to set up a 3-D model of the Earth including a greyscale image chip at its proper location on the model Earth's surface, and then I display the results in a Cesium window. If the view point is far enough from the Earth's surface, then the number of pixels displayed in the image chip part of the window will be fewer than the actual number of pixels that are available in the image chip source. Some downsampling will occur. Likewise, if the user zooms in repeatedly, there will come a point at which there are more pixels displayed across the image chip than the actual number of pixels in the image chip source. Some upsampling will occur. In general, every time Cesium draws a frame that includes a pixel data source there is resampling happening. It could be nearest neighbor (doubt it), linear (probably), cubic, Lagrange, Hermite, or any one of a number of different resampling techniques. At my company, we are using Cesium as part of a large government program which requires the use of Lagrange resampling to ensure image quality. (The NGA has deemed that best for its programs and analyst tools, and they have made it a compliance requirement. So we have no choice.)
So here's the problem: while the user is interacting with the model, for instance zooming in, the drawing process is not in the programmer's control. The resampling is happening either in the Cesium layer itself (hopefully) or in still lower layers (for instance, the WebGL functions that Cesium may be relying on). So I have no clue which technique is used for this resampling. Worse, if that technique is not Lagrange, then I don't have any clue how to change it.
So the question(s) would be this: is Cesium doing the resampling explicitly? If so, then what technique is it using? If not, then what drawing packages and functions is Cesium relying on to render an image file onto the map? (I can try to dig down and determine what techniques those layers may be using, and/or have available.)

UPDATE: Wow, my original answer was a total misunderstanding of your question, so I've rewritten from scratch.
With the new edits, it's clear your question is about how images are resampled for the screen while rendering. These images are texture maps in WebGL, and the process of getting them to the screen quickly is implemented in hardware, on the graphics card itself. Software on the CPU is not performant enough to map individual pixels to the screen one at a time, which is why we have hardware-accelerated 3D cards.
Now for the bad news: this hardware supports nearest neighbor, linear, and mipmapping. That's it. 3D graphics cards do not use any fancier interpolation, as the work needs to be done in a fraction of a second to keep the frame rate as high as possible.
Mipmapping is described well by @gman in his article WebGL 3D Textures. It's a long article, but search for the word "mipmap" and skip ahead to his description of it. Basically, a single image is reduced into smaller images prior to rendering, so an appropriately-sized starting point can be chosen at render time. But there will always be a final mapping to the screen, and as you can see, the choices are NEAREST or LINEAR.
Quoting @gman's article here:
You can choose what WebGL does by setting the texture filtering for each texture. There are 6 modes
NEAREST = choose 1 pixel from the biggest mip
LINEAR = choose 4 pixels from the biggest mip and blend them
NEAREST_MIPMAP_NEAREST = choose the best mip, then pick one pixel from that mip
LINEAR_MIPMAP_NEAREST = choose the best mip, then blend 4 pixels from that mip
NEAREST_MIPMAP_LINEAR = choose the best 2 mips, choose 1 pixel from each, blend them
LINEAR_MIPMAP_LINEAR = choose the best 2 mips, choose 4 pixels from each, blend them
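If you want to see where these settings live, here is a minimal sketch in raw WebGL (plain WebGL calls, not Cesium's internal code; the function name is just illustrative):

```typescript
// Minimal WebGL texture setup; a sketch, not Cesium's internal code.
// `gl` is a WebGLRenderingContext, `image` is a loaded HTMLImageElement
// (dimensions assumed to be powers of two for WebGL 1 mipmapping).
function createFilteredTexture(gl: WebGLRenderingContext, image: HTMLImageElement): WebGLTexture {
  const texture = gl.createTexture()!;
  gl.bindTexture(gl.TEXTURE_2D, texture);
  gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, gl.RGBA, gl.UNSIGNED_BYTE, image);
  gl.generateMipmap(gl.TEXTURE_2D);
  // Minification: the best the hardware offers, trilinear filtering.
  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.LINEAR_MIPMAP_LINEAR);
  // Magnification: only NEAREST or LINEAR are valid here.
  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MAG_FILTER, gl.LINEAR);
  return texture;
}
```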
I guess the best news I can give you is that Cesium uses the best of those, LINEAR_MIPMAP_LINEAR, for its own rendering. If you have a strict requirement for a more time-consuming imagery interpolation, that means you have a requirement not to use a realtime hardware-accelerated 3D graphics card, as there is no way to do Lagrange image interpolation during a realtime render.


YOLOv1 theory of 2 bounding box predictors in each grid cell and its possible power as pseudo anchor boxes

After YOLOv1 there was a trend of using anchor boxes for a while in other iterations as priors (I believe the reason was both to speed up the training and to detect different-sized objects better).
However, YOLOv1 has an interesting mechanism where there are k bounding box predictors per grid cell, which lets them specialize in detecting objects at different scales.
Here is what I wonder, ladies and gentlemen:
Given a very long training time, can these bounding box predictors in YOLOv1 achieve better bounding boxes compared to YOLO9000 or its counterparts that rely on the anchor box mechanism?
In my experience, yes they can. I observed two possible optimization paths, one of which is already implemented in the latest versions of YOLOv3 and v5 by Ultralytics (https://github.com/ultralytics/yolov5).
The first: what I observed was that for YOLOv3, even before training, we can use k-means clustering to ascertain a number of "common box" shapes. When fed into the network as anchor masks, these really improved the performance of the YOLOv3 network for that particular dataset, since the non-max suppression routine had a much better chance of filtering out spurious detections for particular classes in each of the detection heads. To the best of my knowledge, this technique was implemented in the latest iterations of their bounding box regression code. (A sketch of the clustering step follows below.)
The second: suppressing certain layers. In YOLOv3, the network performs detection in three stages, with the idea of progressively detecting from larger objects to smaller objects. YOLOv3 (and in theory v1) can benefit if, with some trial and error, you can ascertain which detection head your network prefers to use, based on the common bounding box shapes you found in the first step.
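For what it's worth, here is a minimal sketch of the clustering step mentioned above, assuming boxes are given as (width, height) pairs normalized to the image size. It uses plain k-means with Euclidean distance; the published anchor-clustering approaches typically use an IoU-based distance instead, so treat this purely as an illustration:

```typescript
type Box = [number, number]; // [width, height], normalized to image size

// Plain k-means over box shapes; a sketch only. Real anchor computation
// usually measures distance as 1 - IoU rather than Euclidean distance.
function kmeansAnchors(boxes: Box[], k: number, iters = 100): Box[] {
  // Initialize centroids from the first k boxes (assumes boxes.length >= k).
  let centroids: Box[] = boxes.slice(0, k).map(b => [...b] as Box);
  for (let iter = 0; iter < iters; iter++) {
    const sums: Box[] = centroids.map(() => [0, 0] as Box);
    const counts = new Array(k).fill(0);
    for (const [w, h] of boxes) {
      let best = 0, bestDist = Infinity;
      centroids.forEach(([cw, ch], i) => {
        const d = (w - cw) ** 2 + (h - ch) ** 2;
        if (d < bestDist) { bestDist = d; best = i; }
      });
      sums[best][0] += w;
      sums[best][1] += h;
      counts[best]++;
    }
    centroids = centroids.map((c, i) =>
      counts[i] ? [sums[i][0] / counts[i], sums[i][1] / counts[i]] as Box : c);
  }
  // Sort by area so anchors map naturally onto small/medium/large heads.
  return centroids.sort((a, b) => a[0] * a[1] - b[0] * b[1]);
}
```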

What is a class score map?

I was going through the VGGNet paper and I came across the testing phase of VGGNet.
During the testing phase, the test image goes through the network and a class score map is obtained. This class score map is spatially averaged to produce a fixed-size vector.
I have googled "class score map", but I couldn't find any relevant results. I wish to know what the role of the class score map is.
Any hint would be greatly appreciated. Thanks.
When you train an image recognition model, you train it for a specific image size (and resolution), let's say n_dims = [256, 256]. Now, in the prediction phase, you have images of different sizes (with respect to pixels), e.g. [1024, 1024]. You extract patches (you can resize the image first by lowering the resolution) and slide your model over the image patches; for each patch, you obtain a prediction over all classes (in a patch, more than one of the objects might be present), which you have to average somehow for the whole image at the end.
See OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks.
Instead, we explore the entire image by densely running the network at each location and at multiple
scales. While the sliding window approach may be computationally prohibitive for certain types
of model, it is inherently efficient in the case of ConvNets (see section 3.5). This approach yields
significantly more views for voting, which increases robustness while remaining efficient. The result
of convolving a ConvNet on an image of arbitrary size is a spatial map of C-dimensional vectors at
each scale.
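If it helps, the "spatially averaged" step is just a mean over all spatial positions of that map; a minimal sketch, assuming the score map is an H x W grid of C-dimensional score vectors (names are illustrative):

```typescript
// Average an H x W map of C-dimensional class scores into one C-vector.
// scoreMap[y][x] is the class-score vector produced at spatial position (y, x).
function spatialAverage(scoreMap: number[][][]): number[] {
  const h = scoreMap.length;
  const w = scoreMap[0].length;
  const c = scoreMap[0][0].length;
  const avg = new Array(c).fill(0);
  for (let y = 0; y < h; y++) {
    for (let x = 0; x < w; x++) {
      for (let k = 0; k < c; k++) {
        avg[k] += scoreMap[y][x][k];
      }
    }
  }
  return avg.map(v => v / (h * w));
}
```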

SpriteStage and multiple spritesheets

My team is currently working on a rather large web application. We have switched from the Flash platform to HTML5 in the hope of a one-size-fits-all platform.
The UI is mainly based on CreateJS, which, by the way, I really enjoy working with.
However, we have now reached the maturity phase and started optimizing some of the animations that don't run smoothly, especially in IE.
The thing is that we have around 1500 sprites (PNGs & JPGs) which are drawn onto a stage. We only draw around 60 of them per frame.
They are rather large (up to 800x800 pixels), and the application engine can choose which 60 to show more or less randomly.
The images are packed in a zip file and unpacked in the browser, and HTML images are constructed by converting the binary data to a base64-encoded string, which is passed to the src property of an image.
So in each frame render, a set of around 60 images is drawn to the stage, and this is for some reason slow.
I have therefore spent some time experimenting with the SpriteStage of CreateJS to take advantage of WebGL, but with only small improvements.
So now I'm considering packing our sprites into spritesheets, which results in many sheets because of the large amount of data.
My question is therefore:
Would SpriteStage gain any improvements if my sprites are spread across multiple sheets? According to the documentation, only spritesheets with a single image are supported.
Best regards
/Mikkel Rasmussen
In general, spritesheets speed up rendering by decreasing the number of draw calls required per frame. Instead of, say, using a different texture and a separate render call for every sprite, spritesheet implementations can draw multiple sprites with one render call, provided that the sprite sheet contains all the different individual sprite images. So to answer your question: you are unlikely to see performance gains if the sprites you want to draw are scattered across different sprite sheets.
Draw calls have significant overhead and it is generally a good idea to minimize them. 1500 individual draw calls would be pretty slow.
I don't know if this is applicable to your situation, but it is possible your bottleneck is not the number of draw calls you dispatch to the GPU but that you are doing too much overdraw, since you mention your sprites are 800x800 each. If that is the case, try to render front to back with the depth test or stencil test turned on.
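To make the draw-call point concrete, the usual idea is to group the visible sprites by the sheet they live on, so each texture is bound as rarely as possible per frame. A rough sketch of the idea, not CreateJS API code (bindSheet and drawBatch are stand-ins for whatever your renderer actually does):

```typescript
interface Sprite {
  sheetId: number;   // which spritesheet/texture this sprite's frame lives on
  frame: number;     // frame index within that sheet
  x: number;
  y: number;
}

// Group visible sprites by sheet so each sheet is bound once per frame.
function renderFrame(
  visible: Sprite[],
  bindSheet: (sheetId: number) => void,
  drawBatch: (sprites: Sprite[]) => void,
): void {
  const batches = new Map<number, Sprite[]>();
  for (const s of visible) {
    const batch = batches.get(s.sheetId) ?? [];
    batch.push(s);
    batches.set(s.sheetId, batch);
  }
  for (const [sheetId, sprites] of batches) {
    bindSheet(sheetId);   // one texture bind...
    drawBatch(sprites);   // ...and ideally one draw call per sheet
  }
}
```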

Calculate 3D coordinates from 2D Image plane accounting for perspective without direct access to view/projection matrix

First time asking a question on the stack exchange, hopefully this is the right place.
I can't seem to develop a close enough approximation algorithm for my situation as I'm not exactly the best in terms of 3D math.
I have a 3d environment in which I can access the position and rotation of any object, including my camera, as well as run trace lines from any two points to get distances between a point and a point of collision. I also have my camera's field of view. I do not have any form of access to the world/view/projection matrices however.
I also have a collection of 2d images that are basically a set of screenshots of the 3d environment from the camera, each collection is from the same point and angle and the average set is taken at about an average of a 60 degree angle down from the horizon.
I have been able to get to the point of using "registration point entities" that can be placed in the 3d world to represent the corners of the 2d image; when a point is picked on the 2d image, it is read as a coordinate in the range 0-1, which is then interpolated between the 3d positions of the registration points. This seems to work well, but only if the image is taken from a perfect top-down angle. When the camera is tilted and another dimension of perspective is introduced, the results become grossly inaccurate, as there is no compensation for this perspective.
I don't need to be able to calculate the height of a point, say a window on a skyscraper, but I do need at least the coordinate at the base of the image plane; in other words, if I extend a line out from a specified image-space point, I need at least the point where that line would intersect the ground if nothing were in the way.
All of the material I found about this says to just deproject the point using the world/view/projection matrices, which I find straightforward in itself, except that I don't have access to these matrices, just the data I can collect at screenshot time, and the other algorithms use complex maths I simply don't grasp yet.
One end goal of this would be to be able to place markers in the 3d environment where a user clicks in the image, while not being able to run a simple deprojection from the user's view.
Any help would be appreciated, thanks.
Edit: Herp derp, while my implementation for doing so is a bit odd due to the limitations of my situation, the solution essentially boiled down to ananthonline's answer about simply recalculating the view/projection matrices.
Between position, rotation and FOV of the camera, could you not calculate the View/Projection matrices of the camera (songho.ca/opengl/gl_projectionmatrix.html) - thus allowing you to unproject known 3D points?
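If rebuilding the full matrices feels heavy, the same idea can also be done as a plain ray cast. A minimal sketch, assuming a Z-up world, a pinhole camera with known position, yaw/pitch, vertical FOV and aspect ratio, and a flat ground plane at z = 0 (all of these conventions are assumptions to adapt to your engine):

```typescript
type Vec3 = [number, number, number];

const add = (a: Vec3, b: Vec3): Vec3 => [a[0] + b[0], a[1] + b[1], a[2] + b[2]];
const scale = (a: Vec3, s: number): Vec3 => [a[0] * s, a[1] * s, a[2] * s];
const cross = (a: Vec3, b: Vec3): Vec3 => [
  a[1] * b[2] - a[2] * b[1],
  a[2] * b[0] - a[0] * b[2],
  a[0] * b[1] - a[1] * b[0],
];
const norm = (a: Vec3): Vec3 => {
  const l = Math.hypot(a[0], a[1], a[2]);
  return [a[0] / l, a[1] / l, a[2] / l];
};

// u, v are normalized image coordinates in [0, 1], (0, 0) = top-left corner.
// yaw/pitch are in radians; pitch is negative when the camera looks down.
function groundPointFromImage(
  camPos: Vec3, yaw: number, pitch: number,
  vFov: number, aspect: number, u: number, v: number,
): Vec3 | null {
  const forward: Vec3 = [
    Math.cos(pitch) * Math.cos(yaw),
    Math.cos(pitch) * Math.sin(yaw),
    Math.sin(pitch),
  ];
  const right = norm(cross(forward, [0, 0, 1]));
  const up = cross(right, forward);
  const halfH = Math.tan(vFov / 2);
  const halfW = halfH * aspect;
  // Direction of the ray through the chosen image point.
  const dir = norm(add(forward, add(
    scale(right, (2 * u - 1) * halfW),
    scale(up, (1 - 2 * v) * halfH),
  )));
  if (dir[2] >= 0) return null;            // ray never reaches the ground
  const t = -camPos[2] / dir[2];           // intersect with plane z = 0
  return add(camPos, scale(dir, t));
}
```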

How can I turn an image file of a game map into boundaries in my program?

I have an image of a basic game map. Think of it as just horizontal and vertical walls which can't be crossed. How can I go from a png image of the walls to something in code easily?
The hard way is pretty straightforward... it's just that if I change the image map, I would like an easy way to translate that to code.
Thanks!
edit: The map is not tile-based. It's top down 2D.
I dabble in video games, and I personally would not want the hassle of checking the boundaries of pictures on the map. Wouldn't it be cleaner if these walls were objects that just happened to have an image property (or something like it)? The image would display, but the object would have well defined coordinates and a function could decide whether an object was hit every time the player moved.
I need more details.
Is your game tile-based? Is it 3D?
If it's tile-based, you could downsample your image to the tile resolution and then do a 1:1 conversion, with each pixel representing a tile.
I suggest writing a script that takes each individual pixel and determines whether it represents part of a wall or not (i.e. black or white). Then code your game so that walls are built from individual little blocks, represented by the pixels. Shouldn't be TOO hard...
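A rough sketch of such a script, assuming the map image has been loaded in the browser and that dark pixels mean "wall" (the canvas approach and the threshold are assumptions, not requirements):

```typescript
// Build a boolean wall grid from an already-loaded map image.
// Dark pixels (below the threshold) are treated as walls.
function imageToWallGrid(image: HTMLImageElement, threshold = 128): boolean[][] {
  const canvas = document.createElement("canvas");
  canvas.width = image.width;
  canvas.height = image.height;
  const ctx = canvas.getContext("2d")!;
  ctx.drawImage(image, 0, 0);
  const { data } = ctx.getImageData(0, 0, image.width, image.height);

  const grid: boolean[][] = [];
  for (let y = 0; y < image.height; y++) {
    const row: boolean[] = [];
    for (let x = 0; x < image.width; x++) {
      const i = (y * image.width + x) * 4;            // RGBA stride
      const brightness = (data[i] + data[i + 1] + data[i + 2]) / 3;
      row.push(brightness < threshold);               // true = wall
    }
    grid.push(row);
  }
  return grid;
}
```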
If you don't need to precompute anything from the map info, you can just check in your runtime logic using a getPixel(x, y)-like function.
Well, I can see a few cases, each with a different "best solution", depending on where your graphics come from:
Your graphics are tiled, and thus you can easily "recognize" a block because it uses the same graphics as other blocks; all you would have to do is write a program that, given a list of "blocking tiles" and a map, produces a "collision map" by comparing each tile against the tiles in the blocking list.
Your graphics are just graphics (e.g. a picture, or some CG artwork) and you don't expect the pixels of one block to be the same as the pixels of another block. You could still try to apply an edge-detection algorithm to your picture, but my guess is that you should rather split your picture into a background layer and a foreground layer, so that the foreground layer has a pre-defined color (or alpha = 0), and test pixels against that color to decide whether things are blocking or not.
You don't have many blocking shapes, but they are usually complex (polygons, ellipses) and would be inefficient to render using a bitmap of the world or to pack as "tile attributes". This is typically the case for point-and-click adventure games, for instance. In that case, you probably want to create paths that match your boundaries with a vector drawing program and dig for a library that does polygon intersection or Bézier collisions.
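If you end up rolling the polygon test yourself rather than finding a library, the classic even-odd ray-casting check is small enough to sketch here (illustrative only):

```typescript
type Point = { x: number; y: number };

// Standard even-odd ray-casting test: counts how many polygon edges a
// horizontal ray from `p` crosses. An odd count means the point is inside.
function pointInPolygon(p: Point, polygon: Point[]): boolean {
  let inside = false;
  for (let i = 0, j = polygon.length - 1; i < polygon.length; j = i++) {
    const a = polygon[i];
    const b = polygon[j];
    const crosses =
      (a.y > p.y) !== (b.y > p.y) &&
      p.x < ((b.x - a.x) * (p.y - a.y)) / (b.y - a.y) + a.x;
    if (crosses) inside = !inside;
  }
  return inside;
}
```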
Good luck and have fun.