XPS text position -- what am I not understanding?

I have multiple examples, this is just the first one. I've been extracting some information from this file (I really wish there were a better way!) and something changed that broke my parser. Here's an example of the problem:
The first line displays the date over at the right margin, the second displays "Room 1" in a slightly larger, bolded font against the left margin farther down the page. Note, though, that the first line has a Y location about 250 units below the second. There's obviously something about the location I'm not understanding.
(I only need to be able to properly Y-sort the items. While this one is in the right order, I have found them out of order in the past.)
<Glyphs RenderTransform="0.0403884,0,0,0.0403119,0,0" Fill="#ff000000" FontUri="/Documents/1/Resources/Fonts/6C15166D-E658-4B97-A6C0-E217017F767F.odttf" FontRenderingEmSize="327.68" StyleSimulations="None" OriginX="14079.4" OriginY="2270.24" Indices="26,60;17,61;3,59;48,63;68,60;85,60;70,60;75,61;3,62;21,60;19,60;21,61;21" UnicodeString="7. March 2022" />
<Glyphs RenderTransform="0.0646215,0,0,0.0644991,0,0" Fill="#ff000000" FontUri="/Documents/1/Resources/Fonts/FF418D14-C0F5-49FA-8F94-42C185369528.odttf" FontRenderingEmSize="327.68" StyleSimulations="None" OriginX="836.8" OriginY="1929.92" Indices="53,59;82,60;82,59;80,62;3,58;20" UnicodeString="Room 1" />

The reason you are seeing this unintuitive behavior can be determined by a careful (if painful) reading of the XPS standard. You are comparing the OriginY values of two sets of glyphs. Per the spec, section 12.1, OriginY is specified in the effective coordinate space. A bit further down, it mentions that RenderTransform establishes a new coordinate frame for the glyph run, and in so doing, it affects the origin x and origin y of the glyphs.
To determine the actual coordinates of the glyphs on the page, you need to apply the RenderTransform to the OriginX and OriginY. An explanation of how the RenderTransform works can be found in section 14.4.1.
For this specific example, it is possible to recover the actual coordinates of the glyphs by multiplying the OriginX by the first value in the RenderTransform, and OriginY by the fourth value in the RenderTransform. This will not hold true in all cases as the fifth and sixth values may specify additional x and y offsets, and the second and third values can introduce skew and rotation effects as well.
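As a concrete illustration, here is a minimal Python sketch that applies the full affine matrix to the glyph origins and Y-sorts the runs; it assumes the XPS matrix attribute order "M11,M12,M21,M22,OffsetX,OffsetY" and reuses the values from the two Glyphs elements above.

# Minimal sketch: map a Glyphs OriginX/OriginY through its RenderTransform so
# runs can be compared (and Y-sorted) in page coordinates rather than raw values.
# Assumes the XPS matrix attribute order "M11,M12,M21,M22,OffsetX,OffsetY".

def page_origin(render_transform, origin_x, origin_y):
    m11, m12, m21, m22, dx, dy = (float(v) for v in render_transform.split(","))
    # Full affine mapping; for the scale-only matrices above this reduces to
    # (m11 * OriginX, m22 * OriginY).
    return (m11 * origin_x + m21 * origin_y + dx,
            m12 * origin_x + m22 * origin_y + dy)

runs = [
    ("7. March 2022", "0.0403884,0,0,0.0403119,0,0", 14079.4, 2270.24),
    ("Room 1", "0.0646215,0,0,0.0644991,0,0", 836.8, 1929.92),
]

# Sort by the transformed Y, not by the raw OriginY.
for text, xf, ox, oy in sorted(runs, key=lambda r: page_origin(r[1], r[2], r[3])[1]):
    print(text, page_origin(xf, ox, oy))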

Related

Is it possible to limit the device/camera movement in Facebook AR Studio?

For example, there is an image using a canvas with a rectangle in World Space. The camera/device can look around freely with the image placed into the "real" world. I wonder if there is a way to limit that movement, left to right and top to bottom, so that the device/camera view stops at a certain point. Even if the user turns the device/camera 360 degrees, the view stays stuck at that point. Say, if the user pans left, the camera/device stops at rotationY: 9; if right, it stops at rotationY: -15; rotationX stays at 0.
I saw there's a BoundBox in the documentation but I'm not sure what that is. There's a DeviceMotionModule but I have no idea how to use it, and I don't know what the given script example is supposed to do.
Look into using DeviceMotion. https://sparkar.com/ar-studio/learn/documentation/reference/classes/devicemotionmodule
The script example rotates the 3d plane according to the rotation of the phone.
You will have to do some maths to position your objects according to your rules and the signals you get from DeviceMotion.
Using the Reactive module you can access the clamp method, which is made to restrict values between two bounds. I recently found this out because I had a similar problem. From this page:
clamp(x: ScalarSignal, min: ScalarSignal, max: ScalarSignal): ScalarSignal
Returns a signal with the value that is the value of the given x signal constrained to lie between the values of the given min and max signals.
Note: The behavior is undefined if min is greater than max.

Anchor Boxes in YOLO: How are they decided?

I have gone through a couple of YOLO tutorials but I am finding it somewhat hard to figure out whether the anchor boxes for each cell the image is divided into are predetermined. In one of the guides I went through, the image was divided into 13x13 cells and it stated that each cell predicts 5 anchor boxes (bigger than it; here's my first problem, because it also says it would first detect what object is present in the small cell before predicting the boxes).
How can the small cell predict anchor boxes for an object bigger than it? Also, it's said that each cell classifies before predicting its anchor boxes; how can the small cell classify the right object in it without querying neighbouring cells, if only a small part of the object falls within the cell?
E.g. say one of the 13x13 cells contains only the white pocket part of a man wearing a T-shirt; how can that cell correctly classify that a man is present without being linked to its neighbouring cells? With a normal CNN, when trying to localize a single object, I know the bounding box prediction relates to the whole image, so at least I can say the network has an idea of what's going on everywhere in the image before deciding where the box should be.
PS: What I currently think of how YOLO works is basically that each cell is assigned predetermined anchor boxes with a classifier at each end, before the boxes with the highest scores for each class are then selected, but I am sure it doesn't add up somewhere.
UPDATE: I made a mistake with this question; it should have been about how regular bounding boxes are decided rather than anchor/prior boxes. So I am marking @craq's answer as correct, because that's how anchor boxes are decided according to the YOLO v2 paper.
I think there are two questions here. Firstly, the one in the title, asking where the anchors come from. Secondly, how anchors are assigned to objects. I'll try to answer both.
Anchors are determined by a k-means procedure, looking at all the bounding boxes in your dataset. If you're looking at vehicles, the ones you see from the side will have an aspect ratio of about 2:1 (width = 2*height). The ones viewed from in front will be roughly square, 1:1. If your dataset includes people, the aspect ratio might be 1:3. Foreground objects will be large, background objects will be small. The k-means routine will figure out a selection of anchors that represent your dataset. k=5 was used for YOLOv2, but different YOLO versions use different numbers of anchors (YOLOv3, for instance, uses 9, three per detection scale).
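To make this concrete, a rough Python/NumPy sketch of the idea (not the actual YOLO code) might look like the following: cluster the dataset's box widths and heights with k-means, using 1 - IoU as the distance, as the YOLOv2 paper describes. The box values here are made up.

import numpy as np

def iou_wh(box, anchors):
    # IoU between one (w, h) box and k (w, h) anchors, all centred at the origin.
    inter = np.minimum(box[0], anchors[:, 0]) * np.minimum(box[1], anchors[:, 1])
    return inter / (box[0] * box[1] + anchors[:, 0] * anchors[:, 1] - inter)

def kmeans_anchors(wh, k=5, iters=100, seed=0):
    # wh: (N, 2) array of ground-truth box widths/heights from the dataset.
    rng = np.random.default_rng(seed)
    anchors = wh[rng.choice(len(wh), k, replace=False)]
    for _ in range(iters):
        # Assign every box to the anchor it overlaps best (distance = 1 - IoU)...
        assign = np.array([np.argmax(iou_wh(b, anchors)) for b in wh])
        # ...then move each anchor to the mean (w, h) of its assigned boxes.
        new = np.array([wh[assign == i].mean(axis=0) if np.any(assign == i) else anchors[i]
                        for i in range(k)])
        if np.allclose(new, anchors):
            break
        anchors = new
    return anchors

# Made-up mix of wide (vehicle-like) and tall (person-like) boxes.
boxes = np.array([[2.0, 1.0], [2.2, 1.1], [1.0, 1.0], [0.9, 1.1], [1.0, 3.0], [1.1, 2.8]])
print(kmeans_anchors(boxes, k=3))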
It's useful to have anchors that represent your dataset, because YOLO learns how to make small adjustments to the anchor boxes in order to create an accurate bounding box for your object. YOLO can learn small adjustments better/easier than large ones.
The assignment problem is trickier. As I understand it, part of the training process is for YOLO to learn which anchors to use for which object. So the "assignment" isn't deterministic like it might be for the Hungarian algorithm. Because of this, in general, multiple anchors will detect each object, and you need to do non-max-suppression afterwards in order to pick the "best" one (i.e. highest confidence).
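Since several anchors will typically fire on the same object, a greedy non-max-suppression pass along these lines is what's usually done afterwards; this is a hedged Python/NumPy sketch with corner-format boxes and made-up values, not any particular YOLO implementation.

import numpy as np

def box_iou(box, boxes):
    # box: (x1, y1, x2, y2); boxes: (N, 4) in the same corner format.
    ix1 = np.maximum(box[0], boxes[:, 0]); iy1 = np.maximum(box[1], boxes[:, 1])
    ix2 = np.minimum(box[2], boxes[:, 2]); iy2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(ix2 - ix1, 0, None) * np.clip(iy2 - iy1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_thresh=0.5):
    # Greedy NMS: keep the best-scoring box, drop everything overlapping it, repeat.
    order = np.argsort(scores)[::-1]
    keep = []
    while len(order):
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        order = rest[box_iou(boxes[i], boxes[rest]) < iou_thresh]
    return keep

boxes = np.array([[10, 10, 50, 50], [12, 12, 52, 52], [100, 100, 140, 150]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # the second box overlaps the first too much and is dropped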
There are a couple of points that I needed to understand before I came to grips with anchors:
Anchors can be any size, so they can extend beyond the boundaries of the 13x13 grid cells. They have to be, in order to detect large objects.
Anchors only enter in the final layers of YOLO. YOLO's neural network makes 13x13x5=845 predictions (assuming a 13x13 grid and 5 anchors). The predictions are interpreted as offsets to anchors from which to calculate a bounding box. (The predictions also include a confidence/objectness score and a class label.)
YOLO's loss function compares each object in the ground truth with one anchor. It picks the anchor (before any offsets) with highest IoU compared to the ground truth. Then the predictions are added as offsets to the anchor. All other anchors are designated as background.
If anchors which have been assigned to objects have high IoU, their loss is small. Anchors which have not been assigned to objects should predict background by setting confidence close to zero. The final loss function is a combination from all anchors. Since YOLO tries to minimise its overall loss function, the anchor closest to ground truth gets trained to recognise the object, and the other anchors get trained to ignore it.
The following pages helped my understanding of YOLO's anchors:
https://medium.com/@vivek.yadav/part-1-generating-anchor-boxes-for-yolo-like-network-for-vehicle-detection-using-kitti-dataset-b2fe033e5807
https://github.com/pjreddie/darknet/issues/568
I think that your statement about the number of predictions of the network could be misleading. Assuming a 13 x 13 grid and 5 anchor boxes, the output of the network has, as I understand it, the following shape: 13 x 13 x 5 x (2+2+nbOfClasses)
13 x 13: the grid
x 5: the anchors
x (2+2+nbOfClasses): (x, y)-coordinates of the center of the bounding box (in the coordinate system of each cell), (h, w)-deviation of the bounding box (deviation from the prior anchor boxes), and a softmax-activated class vector giving a probability for each class.
If you want more information about how the anchor priors are determined, you can take a look at the original paper on arXiv: https://arxiv.org/pdf/1612.08242.pdf.
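Purely as an illustration, the following Python/NumPy sketch lays a flat prediction vector out in the shape described above and pulls out the fields for one cell/anchor; the grid size, anchor count and class count are placeholders (real implementations typically also carry an objectness/confidence score per box).

import numpy as np

grid, num_anchors, nb_classes = 13, 5, 20
fields = 2 + 2 + nb_classes                     # (x, y), (h, w), class scores

raw = np.random.rand(grid * grid * num_anchors * fields).astype(np.float32)
out = raw.reshape(grid, grid, num_anchors, fields)

cell_y, cell_x, anchor = 6, 7, 2                # inspect one prediction
xy = out[cell_y, cell_x, anchor, 0:2]           # box centre, in the cell's coordinate system
hw = out[cell_y, cell_x, anchor, 2:4]           # (h, w) deviation from that anchor's prior
class_scores = out[cell_y, cell_x, anchor, 4:]  # one value per class
print(xy, hw, int(class_scores.argmax()))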

What's the purpose of Canvas.Context Save and Restore in this example?

This page shows some animations in HTML5 canvas. If you look at the source of the scroller, there's a statement to save the context after clearing the rectangle and to restore it after the animation. If I substitute the restore statement with another ctx.clearRect(0, 0, can.width, can.height) statement, nothing works. I thought the restore was restoring the cleared rectangle, but it seems it's restoring more info. What's that extra info that's needed for the next frame?
I am not looking for HTML5 textbook definitions of Save and Restore but I want to understand why they are needed in this specific example.
UPDATE
It's frustrating to get an answer when I specifically mentioned in the question that I don't want the textbook definitions of save() and restore(). I already know save() saves the state of the context and restore() restores it. My question is very specific: why is restore() used the way it is in the example, when all the save() did was save an empty canvas? Why is restoring an empty canvas not the same as clearing it?
Canvas state isn't what's drawn on it. It's a stack of properties which define the current state of the tools which are used to draw the next thing.
Canvas is an immediate-mode bitmap.
Like MS Paint. Once it's there, it's there, so there's no point "saving" the current image data, because that would be like saving the whole JPEG, every time you make a change, every frame...
...no, the state you save is the state which will dictate what coordinate-orientation, dimension-scale, colour, etc, you use to draw the NEXT thing (and all things thereafter, until you change those values by hand).
var canvas = document.createElement("canvas"),
easel = canvas.getContext("2d");
easel.fillStyle = "rgb(80, 80, 120)";
easel.strokeStyle = "rgb(120, 120, 200)";
easel.fillRect(x, y, width, height);
easel.strokeRect(x, y, width, height);
easel.save(); // stores ALL current status properties in the stack
easel.rotate(degrees * Math.PI / 180); // radians
easel.scale(scale_X, scale_Y); // any new coordinates/dimensions will now be multiplied by these
easel.translate(new_X, new_Y); // new origin coordinates, based on rotated orientation, multiplied by the scale-factor
easel.fillStyle = "gold";
easel.fillRect(x, y, width, height); // completely new rectangle
// origin is different, and the rotation is different, because you're in a new coordinate space
easel.clearRect(0, 0, width, height); // not even guaranteed to clear the actual canvas, anymore
easel.strokeRect(width/2, height/2, width, height); // still in the new coordinate space, still with the new colour
easel.restore(); // reassign all of the previous status properties
easel.clearRect(0, 0, width, height);
Assuming that you were only one state-change deep on the stack, that last line, now that your canvas' previous state was restored, should have successfully cleared itself (subpixel shenanigans notwithstanding).
So as you can see, it has very, VERY little to do with erasing the canvas.
In fact, it has nothing to do with erasing it, at all.
It has to do with wanting to draw something, and doing the basic outlining and sweeping colours/styles, and then manually writing in the colours for the smaller details on top, and then manually writing all of the styles back the way they were before, to go back to sweeping strokes for the next object, and on and on...
Instead, save general states that will be reused, create a new state for smaller details, and return to the general state, without having to hard-code it, every time, or write setter functions to set frequently-used values on the canvas over and over (resetting scale/rotation/affine-transforms/colours/fonts/line-widths/baseline-alignment/etc).
In your exact example, then, if you're paying attention, you'll see that the only thing that's changing is the value of step.
They've set the state of a bunch of values for the canvas (colour/font/etc).
And then they save. Well, what did they save?
You're not looking deep enough. They actually saved the default translation (ie: origin=0,0 in original world-space).
But you didn't see them define it?
That's because it's defined as default.
They then increase the step by 1 pixel (actually, they do this first, but it doesn't matter after the first loop -- stay with me here).
Then they set a new origin point for 0,0 (ie: from now on, when they type 0,0 that new origin will point to a completely different place on the canvas).
That origin point is equal to x being the exact middle of the canvas, and y being equal to the current step (ie: pixel 1 or pixel 2, etc... and why the difference between starting at 0 and starting at 1 really doesn't matter).
Then what do they do?
They restore.
Well, what have they restored?
...well, what have they changed?
They're restoring the point of origin to 0,0.
Why?
Well, what would happen if they didn't?
If the canvas is 500px x 200px, and it starts at 0,0 in our current screen space... ...that's great...
Then they translate it to width/2, 1
Okay, so now when they ask to draw text at 0,0 they'll actually be drawing at 250, 1
Wonderful. But what happens next time?
Now they're translating by width/2, 2
You think, well, that's fine... ...the draw call for 0,0 is going to happen at 250, 2, because they've set it to clear numbers: canvas.width/2, 2
Nope. Because current 0,0 is actually 250,1 according to our screen. And one translation is relative to its previous translation...
...so now you're telling the canvas to start at its current coordinates' 0,0 and go right 250, and down 2.
According to the screen (which is like a window, looking at the map, and not the map, itself) we're now 500px to the right, and 3 pixels down from where we started... And only one frame has gone by.
So they restore the map's coordinates to be the same origin as the screen's coordinates (and the rotation to be the same, and the scale, and the skew, etc...), before setting the new one.
And as you might guess, by looking at it, now, you can see that the text should actually move top to bottom. Not right to left, like the page says...
Why do this?
Why go to the trouble of changing the coordinate-system of the drawing-context, when the draw commands give you an x and y right there in the function?
If you want to draw a picture on the canvas, and you know how high and wide it is, and where you'd like the top-left corner to be, why can't you just do this:
easel.drawImage(myImg, x, y, myImg.width, myImg.height);
Well, you can.
You can totally do that. There's nothing stopping you.
In fact, if you wanted to make it zoom around the screen, you could just update the x and y on a timer, and call it a day.
But what about if you were drawing a game character? What if the character had a hat, and had gloved hands, and big boots, and all of those things were drawn separate from the character?
So first you'd say "well, he's standing at x and y in the world, so x plus where his hand is in relation to his body would be x + body.x - hand.x...or was that plus..."
...and now you have draw calls for all of his parts that are all looking like a notebook full of Grade 5 math homework.
Instead, you can say: "He's here. Set my coordinates so that 0,0 is right in the middle of my guy". Now your draw calls are as simple as "My right hand is 6 pixels to the right of the body, my left hand is 3 pixels to the left".
And when you're done drawing your character, you can set your origin back to 0,0 and then the next character can be drawn. Or, if you want to attempt it, you can then translate from there to the origin of the next character, based on the delta from one to the other (this will save you a function call per translation). And then, if you only saved state once the whole time (the original state), at the end, you can return to 0,0 by calling .restore.
The context save() saves things like the transformation and colors, among other state. Then you can change the context and later restore it to the same state as when you saved it. It works like a stack, so you can push multiple canvas states onto the stack and recover them.
http://html5.litten.com/understanding-save-and-restore-for-the-canvas-context/

How to expand to normal vessels with ITK when I have a skeleton line and a radius for every pixel?

I did a thinning operation on vessels, and now I'm trying to reconstruct them.
How can I expand them back to normal vessels in ITK when I have a skeleton line and a radius value for each pixel?
DISCLAIMER: This could be slow, but since no other answer has been suggested, here you go.
Since your question does not indicate this, I'm assuming that you're talking about a 2D image, but the following approach can be extended for 3D too. This is how I'd go about it:
Create a blank image with zero filled pixel values
Create multiple instances of disk/sphere ShapedNeighborhoodIterator each having a different radius on the blank image (choose the most common radii from the vessel width histogram).
Visit each pixel in the binary skeleton image. When you come upon a white (vessel skeleton) pixel, look up the vessel radius at that pixel.
If you already have a ShapedNeighborhoodIterator for that radius value, take the iterator to the pixel location in the blank image and fill up a disk/sphere of white pixels centered about that pixel. If you don't have a ShapedNeighborhoodIterator for that radius value, create one and do the same operation.
Once you finish iterating over the skeletonized image, you will have a reconstructed tree in the other image. Note that step 2 is optional, but will help you achieve faster computation.
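A rough sketch of the same idea in Python/NumPy (plain disk stamping rather than ITK's ShapedNeighborhoodIterator, so the names and details are only illustrative): visit every skeleton pixel, look up its radius, and stamp a filled disk of that radius into an initially blank image.

import numpy as np

def reconstruct_vessels(skeleton, radius_map):
    # skeleton: 2D bool array (the thinned centreline); radius_map: radius per pixel.
    out = np.zeros(skeleton.shape, dtype=np.uint8)
    for r, c in zip(*np.nonzero(skeleton)):
        rad = int(round(radius_map[r, c]))
        r0, r1 = max(r - rad, 0), min(r + rad + 1, out.shape[0])
        c0, c1 = max(c - rad, 0), min(c + rad + 1, out.shape[1])
        yy, xx = np.ogrid[r0:r1, c0:c1]
        # Stamp a filled disk of the stored radius, centred on the skeleton pixel.
        out[r0:r1, c0:c1] |= ((yy - r) ** 2 + (xx - c) ** 2 <= rad * rad).astype(np.uint8)
    return out

# Tiny example: a 3-pixel horizontal skeleton with radius 2 everywhere.
skel = np.zeros((9, 9), dtype=bool)
skel[4, 3:6] = True
print(reconstruct_vessels(skel, np.full((9, 9), 2.0)))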

How to divide tiny double precision numbers correctly without precision errors?

I'm trying to diagnose and fix a bug which boils down to X/Y yielding an unstable result when X and Y are small:
In this case, both cx and patharea increase smoothly. Their ratio follows a smooth asymptote at larger values, but is erratic for "small" numbers. The obvious first thought is that we're reaching the limit of floating-point accuracy, but the actual numbers themselves are nowhere near it. ActionScript "Number" types are IEEE 754 double-precision floats, so they should have about 15 decimal digits of precision (if I read it right).
Some typical values of the denominator (patharea):
0.0000000002119123
0.0000000002137313
0.0000000002137313
0.0000000002155502
0.0000000002182787
0.0000000002200977
0.0000000002210072
And the numerator (cx):
0.0000000922932995
0.0000000930474444
0.0000000930582124
0.0000000938123574
0.0000000950458711
0.0000000958000159
0.0000000962901528
0.0000000970442977
0.0000000977984426
Each of these increases monotonically, but the ratio is chaotic as seen above.
At larger numbers it settles down to a smooth hyperbola.
So, my question: what's the correct way to deal with very small numbers when you need to divide one by another?
I thought of multiplying numerator and/or denominator by 1000 in advance, but couldn't quite work it out.
The actual code in question is the recalculate() function here. It computes the centroid of a polygon, but when the polygon is tiny, the centroid jumps erratically around the place, and can end up a long distance from the polygon. The data series above are the result of moving one node of the polygon in a consistent direction (by hand, which is why it's not perfectly smooth).
This is Adobe Flex 4.5.
I believe the problem most likely is caused by the following line in your code:
sc = (lx*latp-lon*ly)*paint.map.scalefactor;
If your polygon is very small, then lx and lon are almost the same, as are ly and latp. The products lx*latp and lon*ly are both very large compared to the result, so you are subtracting two numbers that are almost equal.
To get around this, we can make use of the fact that:
x1*y2-x2*y1 = (x2+(x1-x2))*y2 - x2*(y2+(y1-y2))
= x2*y2 + (x1-x2)*y2 - x2*y2 - x2*(y1-y2)
= (x1-x2)*y2 - x2*(y1-y2)
So, try this:
dlon = lx - lon
dlat = ly - latp
sc = (dlon*latp-lon*dlat)*paint.map.scalefactor;
The value is mathematically the same, but the terms are an order of magnitude smaller, so the error should be an order of magnitude smaller as well.
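To see the effect, here is a small Python sketch with made-up coordinates (Python floats are the same IEEE 754 doubles as ActionScript Numbers), comparing both forms against an exact rational reference:

from fractions import Fraction

# Hypothetical edge of a tiny polygon far from the origin: each pair of
# coordinates is nearly equal.
lx, lon = 100.0 + 1e-9, 100.0        # x1, x2
ly, latp = 100.0, 100.0 + 2e-9       # y1, y2

direct = lx * latp - lon * ly                      # two large, nearly equal products
rewritten = (lx - lon) * latp - lon * (ly - latp)  # small terms, little cancellation

exact = Fraction(lx) * Fraction(latp) - Fraction(lon) * Fraction(ly)
print(float(abs(Fraction(direct) - exact)))     # error from rounding the big products (up to ~1e-12 here)
print(float(abs(Fraction(rewritten) - exact)))  # many orders of magnitude smaller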
Jeffrey Sax has correctly identified the basic issue - loss of precision from combining terms that are (much) larger than the final result.
The suggested rewriting eliminates part of the problem - apparently sufficient for the actual case, given the happy response.
You may find, however, that if the polygon becomes (much) smaller again and/or moves farther away from the origin, inaccuracy will show up again. In the rewritten formula the terms are still quite a bit larger than their difference.
Furthermore, there's another 'combining large, comparable numbers with different signs' issue in the algorithm. The various 'sc' values in subsequent cycles of the iteration over the edges of the polygon effectively combine into a final number that is (much) smaller than the individual sc(i) are. (If you have a convex polygon you will find that there is one contiguous sequence of positive values and one contiguous sequence of negative values; in non-convex polygons the negatives and positives may be intertwined.)
What the algorithm is doing, effectively, is computing the area of the polygon by adding areas of triangles spanned by the edges and the origin, where some of the terms are negative (whenever an edge is traversed clockwise, viewing it from the origin) and some positive (anti-clockwise walk over the edge).
You get rid of ALL the loss-of-precision issues by defining the origin at one of the polygon's corners, say (lx,ly), and then adding the triangle surfaces spanned by the edges and that corner (so: transforming lon to (lon-lx) and latp to (latp-ly)), with the additional bonus that you need to process two fewer triangles, because the edges that connect to the chosen origin corner obviously yield zero surface.
For the area part that's all. For the centroid part, you will of course have to "transform back" the result to the original frame, i.e. by adding (lx,ly) at the end.
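A hedged Python sketch of that suggestion (illustrative names, not the original recalculate() code): compute the polygon's area and centroid with all coordinates taken relative to the first vertex, then shift the centroid back into the original frame at the end.

def centroid(points):
    # points: list of (x, y) vertices of a simple polygon, in order.
    x0, y0 = points[0]                      # use one corner as the local origin
    a = cx = cy = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        # Work relative to (x0, y0) so every term stays small; the two edges
        # touching the chosen corner contribute zero automatically.
        u1, v1, u2, v2 = x1 - x0, y1 - y0, x2 - x0, y2 - y0
        cross = u1 * v2 - u2 * v1           # twice the signed area of this triangle
        a += cross
        cx += (u1 + u2) * cross
        cy += (v1 + v2) * cross
    a *= 0.5
    # Shift the centroid back into the original frame.
    return x0 + cx / (6.0 * a), y0 + cy / (6.0 * a)

# Tiny square far from the origin: the centroid should sit at its centre.
print(centroid([(1000.0, 2000.0), (1000.001, 2000.0),
                (1000.001, 2000.001), (1000.0, 2000.001)]))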