In YOLOv3, how do you decide which scale an anchor box belongs to? - deep-learning

Anchor boxes are important in YOLOv3, especially when training on a custom dataset.
I know that anchor boxes are calculated from the heights and widths of the bounding boxes using k-means.
Once I have the anchor boxes, they are supposed to be distributed across 3 scales. By default, the 3 scales are 13*13, 26*26 and 52*52.
So the question is: how do I decide which anchor belongs to which scale?
Currently, I sort the anchors by size, since the 13*13 scale should get the larger anchors.
Am I right here?
Following is an example:
Then, I sort them:
In the second image, the first 3 anchor boxes belong to the 13*13 scale.
I have read lots of blogs and code, but it seems no one has explained this clearly, maybe because it is too simple for them.
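A minimal sketch of the sorting described above, assuming anchors are (width, height) pairs, that "size" means area, and that the anchors are split evenly across the scales. It follows the convention of the default YOLOv3 configuration, where the largest anchors go to the coarsest (13*13) grid. The function name `assign_anchors_to_scales` is made up for illustration.

```python
def assign_anchors_to_scales(anchors, grids=(13, 26, 52)):
    """Split anchors evenly across grids, largest anchors on the coarsest grid.

    anchors: list of (width, height) pairs, e.g. from k-means.
    grids:   grid sizes ordered from coarsest to finest.
    """
    # Sort by area, largest first (area is an assumed measure of "size").
    ordered = sorted(anchors, key=lambda wh: wh[0] * wh[1], reverse=True)
    per_scale = len(ordered) // len(grids)
    return {
        grid: ordered[i * per_scale:(i + 1) * per_scale]
        for i, grid in enumerate(grids)
    }


# The default COCO anchors shipped with YOLOv3, in pixels.
coco_anchors = [(10, 13), (16, 30), (33, 23), (30, 61), (62, 45),
                (59, 119), (116, 90), (156, 198), (373, 326)]
by_scale = assign_anchors_to_scales(coco_anchors)
print(by_scale[13])  # the three largest anchors
```

With these anchors, the 13*13 grid receives (373, 326), (156, 198) and (116, 90), which matches the grouping used in the default configuration.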

Related

Anchor Boxes in YOLO : How are they decided

I have gone through a couple of YOLO tutorials, but I am finding it somewhat hard to figure out whether the anchor boxes for each cell the image is divided into are predetermined. In one of the guides I went through, the image was divided into 13x13 cells, and it stated that each cell predicts 5 anchor boxes (bigger than the cell itself; this is my first problem, because the guide also says the cell first detects what object is present in it before predicting the boxes).
How can a small cell predict anchor boxes for an object bigger than itself? It is also said that each cell classifies before predicting its anchor boxes. How can a small cell classify the object in it correctly without querying neighbouring cells if only a small part of the object falls within the cell?
E.g. say one of the cells contains only the white pocket of a man's T-shirt. How can that cell correctly classify that a man is present without being linked to its neighbouring cells? With a normal CNN localizing a single object, I know the bounding box prediction relates to the whole image, so at least I can say the network has an idea of what is going on everywhere in the image before deciding where the box should be.
PS: My current understanding of how YOLO works is that each cell is assigned predetermined anchor boxes with a classifier at each end, and the boxes with the highest scores for each class are then selected. But I am sure it doesn't add up somewhere.
UPDATE: I made a mistake with this question; it should have been about how regular bounding boxes are decided rather than anchor/prior boxes. So I am marking @craq's answer as correct, because that is how anchor boxes are decided according to the YOLO v2 paper.
I think there are two questions here. Firstly, the one in the title, asking where the anchors come from. Secondly, how anchors are assigned to objects. I'll try to answer both.
Anchors are determined by a k-means procedure, looking at all the bounding boxes in your dataset. If you're looking at vehicles, the ones you see from the side will have an aspect ratio of about 2:1 (width = 2*height). The ones viewed from the front will be roughly square, 1:1. If your dataset includes people, the aspect ratio might be 1:3. Foreground objects will be large, background objects will be small. The k-means routine will figure out a selection of anchors that represent your dataset. k=5 for YOLOv2 (YOLOv3 uses 9, three per scale), and the number of anchors differs between YOLO versions.
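As a rough illustration of that k-means step, here is a pure-Python sketch. It assumes boxes are given as (width, height) pairs and uses 1 - IoU as the distance, as the YOLOv2 paper proposes; the function names, the random initialisation and the fixed seed are choices made for this example, not part of any YOLO codebase.

```python
import random


def iou_wh(box, anchor):
    """IoU of two (width, height) boxes, both assumed centred on the same point."""
    inter = min(box[0], anchor[0]) * min(box[1], anchor[1])
    union = box[0] * box[1] + anchor[0] * anchor[1] - inter
    return inter / union


def kmeans_anchors(boxes, k, iters=100, seed=0):
    """Cluster (width, height) pairs; highest IoU means smallest distance."""
    rng = random.Random(seed)
    anchors = rng.sample(boxes, k)  # start from k boxes of the dataset
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            best = max(range(k), key=lambda i: iou_wh(box, anchors[i]))
            clusters[best].append(box)
        new_anchors = []
        for i, members in enumerate(clusters):
            if members:
                # New centroid: mean width and mean height of the cluster.
                mean_w = sum(w for w, _ in members) / len(members)
                mean_h = sum(h for _, h in members) / len(members)
                new_anchors.append((mean_w, mean_h))
            else:
                new_anchors.append(anchors[i])  # keep an empty cluster's anchor
        if new_anchors == anchors:
            break  # assignments no longer change
        anchors = new_anchors
    return sorted(anchors, key=lambda wh: wh[0] * wh[1])


# Toy data: three small, roughly square boxes and three large 2:1 boxes.
toy_boxes = [(10, 10), (12, 11), (11, 12), (100, 50), (110, 55), (105, 52)]
toy_anchors = kmeans_anchors(toy_boxes, k=2)
print(toy_anchors)
```

On this toy data the two anchors end up as the means of the small group and of the large group, so one anchor is roughly square and the other roughly 2:1.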
It's useful to have anchors that represent your dataset, because YOLO learns how to make small adjustments to the anchor boxes in order to create an accurate bounding box for your object. YOLO can learn small adjustments better/easier than large ones.
The assignment problem is trickier. As I understand it, part of the training process is for YOLO to learn which anchors to use for which object. So the "assignment" isn't deterministic like it might be for the Hungarian algorithm. Because of this, in general, multiple anchors will detect each object, and you need to do non-max-suppression afterwards in order to pick the "best" one (i.e. highest confidence).
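The non-max-suppression step mentioned above can be sketched as follows. This is a simplified, class-agnostic version that assumes detections are (box, confidence) pairs with boxes as (x1, y1, x2, y2) corners; real pipelines usually run it per class, and the 0.5 threshold is just an illustrative default.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2) corners."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(detections, iou_threshold=0.5):
    """Greedy NMS: keep the most confident box, drop boxes overlapping it."""
    remaining = sorted(detections, key=lambda d: d[1], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        remaining = [d for d in remaining
                     if iou(best[0], d[0]) < iou_threshold]
    return kept


# Two overlapping detections of one object plus one separate detection.
dets = [((0, 0, 10, 10), 0.9), ((1, 1, 11, 11), 0.8), ((50, 50, 60, 60), 0.7)]
kept = nms(dets)
print([conf for _, conf in kept])
```

Here the 0.8 detection overlaps the 0.9 one with an IoU of about 0.68, so it is suppressed and only the 0.9 and 0.7 detections survive.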
There are a couple of points that I needed to understand before I came to grips with anchors:
Anchors can be any size, so they can extend beyond the boundaries of the 13x13 grid cells. They have to be able to, in order to detect large objects.
Anchors only enter in the final layers of YOLO. YOLO's neural network makes 13x13x5=845 predictions (assuming a 13x13 grid and 5 anchors). The predictions are interpreted as offsets to anchors from which to calculate a bounding box. (The predictions also include a confidence/objectness score and a class label.)
YOLO's loss function compares each object in the ground truth with one anchor. It picks the anchor (before any offsets) with highest IoU compared to the ground truth. Then the predictions are added as offsets to the anchor. All other anchors are designated as background.
If anchors which have been assigned to objects have high IoU, their loss is small. Anchors which have not been assigned to objects should predict background by setting confidence close to zero. The final loss function is a combination from all anchors. Since YOLO tries to minimise its overall loss function, the anchor closest to ground truth gets trained to recognise the object, and the other anchors get trained to ignore it.
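Two of the points above, matching a ground-truth box to the anchor with the highest IoU and interpreting predictions as offsets to an anchor, can be sketched like this. The decoding follows the formulas in the YOLOv2 paper (sigmoid for the centre, exponential for the size). The function names are made up for illustration, and the sketch assumes anchors and ground-truth sizes are in the same units and that the decoded centre is normalised to [0, 1].

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def iou_wh(a, b):
    """IoU of two (width, height) boxes that share the same centre."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def best_anchor(gt_wh, anchors):
    """Index of the anchor (before any offsets) with highest IoU to the truth."""
    return max(range(len(anchors)), key=lambda i: iou_wh(gt_wh, anchors[i]))


def decode(t, anchor, cell, grid_size):
    """Turn raw predictions (tx, ty, tw, th) into a box.

    The centre is the cell's top-left corner plus a sigmoid offset, normalised
    by the grid size; width and height rescale the anchor by exp().
    """
    tx, ty, tw, th = t
    cx, cy = cell
    pw, ph = anchor
    bx = (sigmoid(tx) + cx) / grid_size
    by = (sigmoid(ty) + cy) / grid_size
    bw = pw * math.exp(tw)
    bh = ph * math.exp(th)
    return bx, by, bw, bh


anchors = [(1.0, 1.0), (3.0, 2.0), (8.0, 8.0)]
idx = best_anchor((2.9, 2.1), anchors)  # closest in shape to (3.0, 2.0)
box = decode((0.0, 0.0, 0.0, 0.0), anchors[idx], cell=(6, 6), grid_size=13)
print(idx, box)
```

With all four raw predictions at zero, the box sits in the middle of cell (6, 6), which is the image centre, and keeps the anchor's own width and height. This is why anchors that already resemble the dataset only need small adjustments.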
The following pages helped my understanding of YOLO's anchors:
https://medium.com/@vivek.yadav/part-1-generating-anchor-boxes-for-yolo-like-network-for-vehicle-detection-using-kitti-dataset-b2fe033e5807
https://github.com/pjreddie/darknet/issues/568
I think that your statement about the number of predictions of the network could be misleading. Assuming a 13 x 13 grid and 5 anchor boxes, the output of the network has, as I understand it, the following shape: 13 x 13 x 5 x (2+2+1+nbOfClasses)
13 x 13: the grid
x 5: the anchors
x (2+2+1+nbOfClasses): (x, y)-coordinates of the center of the bounding box (in the coordinate system of each cell), (h, w)-deviation of the bounding box (deviation from the prior anchor boxes), one confidence/objectness score, and a softmax-activated class vector indicating a probability for each class.
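The arithmetic behind that shape can be checked with a few lines. This sketch assumes, as in the YOLOv2 paper, that each anchor carries 4 box values, 1 objectness score and one score per class; the helper name and the 20-class example (Pascal VOC) are chosen for illustration.

```python
def yolo_v2_output_shape(grid, num_anchors, num_classes):
    """Shape of the prediction tensor: grid x grid x anchors x values per anchor."""
    values_per_anchor = 2 + 2 + 1 + num_classes  # (x, y) + (w, h) + objectness + classes
    return (grid, grid, num_anchors, values_per_anchor)


shape = yolo_v2_output_shape(grid=13, num_anchors=5, num_classes=20)
num_boxes = shape[0] * shape[1] * shape[2]  # boxes predicted per image
channels = shape[2] * shape[3]              # depth of the final feature map
print(shape, num_boxes, channels)
```

So "845 predictions" counts boxes, while each of those boxes is itself a vector of 25 numbers in the 20-class case.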
If you want more information about how the anchor priors are determined, you can take a look at the original paper on arXiv: https://arxiv.org/pdf/1612.08242.pdf.

How to position an element on top of another element without using position and margin?

This is my code:
<table>...Some content...</table>
<table>...Another content...</table>
I want to put the second table on top of the first table. This is to be used as an email template (in some clients position and margin are not available).
Those are the only two options available (outside of transform, which definitely won't work if position isn't available) that will allow one element to invade another element's space. If you can't use position or margin, then you're out of luck, and you need to re-evaluate what you are trying to achieve and why. Any chance you could do this with images?
There are always ways...not always elegant, but when you have limited options, 'works' is often all you really need. IMO, creativity is as much about solving a problem with limited options as it is thinking 'outside the box'.
Most email clients allow you to set 'height', so simply wrap the first table (the background) in a div and give that div height:0px;. The table will overflow the div, but the next element won't respect its space because it has 0 height, and will effectively be layered in front.
http://jsfiddle.net/L0d3tnzu/
If you want the size of the tables to match exactly, you'll probably have to explicitly set heights and widths, but the fiddle above illustrates the basic concept. Hope this helps!
EDIT:
Based on the additional info in the comment (the second table should only partly overlap the first table) here is an updated fiddle: https://jsfiddle.net/acq3ob6y/1/
EDIT #2:
Dang. Outlook switching to the Word/Office rendering engine for HTML/CSS might be the only way possible to get WORSE than the IE version. Sigh. (Thanks to @Gortonington for the comment/clarification, though!)
Ok, then, the idea of a background image is only a problem for retina displays (if you want them to be all crisp and beautiful and retina-ie), and retina devices are going to be handling CSS in a more modern way (hopefully!), so how about this as a solution: Media Query targeting device resolution loads CSS with the double-size img and uses css background-size to constrain it: http://jsfiddle.net/tcyjo7ok
Third try is a charm? At least the list of options is growing...
The only way to overlay two elements across email clients is through the use of background images. Even this can break in some clients and requires a lot of conditional and repetitive code (backgrounds.cm is a good resource for email background images).
This is the only option that will display in MOST clients. Even this is still very restricted and not very agile to use (but that is true in ALL email coding). Most other techniques will only work for a couple clients and break completely in all others.

Creating a table-like grid without using table

Please see my awesome graphic below, which is neither to scale nor complete. BUT, I wanted to show the structure I'm going for instead of describing it.
I am creating a space rental system wherein a calendar, structured similarly to the below image, both shows the "taken" spots and also allows a user to click an "open" slot to reserve it themselves. I don't need help with the functionality though, just the layout.
Since this is tabular data at its finest, with headers and everything, I was able to easily create the desired layout that way. However, tables render from left to right, so in the example below, it renders SPACE 1 9:00am, SPACE 2 9:00am, SPACE 3 9:00am, etc. I need it to actually render SPACE 1 9:00am, SPACE 1 10:00am, etc.
The reason is that in order to make each reservation into a "block" represented by the blue squares below, I need to be able to loop through the columns vertically and not through the rows horizontally.
I also want the columns to be a consistent width and be flexible if more spaces are added or if one/some are removed down the road.
I've been playing with flexbox, which I've barely used before, and I'm having no luck at all. I'm not even sure that's the right direction.
My question would be either 1) is there a way to get a standard table to load the way I want or 2) how can I do this without tables?
Maybe bootstrap's grid system will fit your needs. http://getbootstrap.com/css/
You can create a grid like structure by adding columns, up to 12 in a row, and locking those into rows.

Reduce area of svg text

I am currently creating a word cloud using a library developed in house, which uses the SVG text element to display the words. The problem I have encountered is that the area of some words sometimes overlaps other words, as you can see if you inspect test1 in this jsfiddle. This becomes a problem if the words must be clickable.
I want to know if it is possible to reduce the area of the text to the minimum, just wrapping the word, a small padding is accepted.
I have already tried the solution posted in this answer but it didn't work.
I would prefer a css solution if it exists rather than messing with svg but if there is no other option that will do.
Edit: Ok, enough reputation to post images. What I currently have:
What I would like to have:
There are two problems; I currently have only a solution to one. Your text example is misleading. Try Text1g instead to see the descent (i.e. the amount of space below the baseline which the g needs). If you do this, then you'll see that the texts really overlap - you just don't notice because your test text doesn't contain a good set of test characters.
Apart from that, I see that the element is 67px high while the font-size is only 60px. I don't see where the additional 7 pixels are coming from. It's not padding and not margin :-/
Why do you need to know the minimum bounding box?
If it is because you are linking with the element, or applying click events to the words, then you should investigate the pointer-events attribute.
You possibly want something like:
<text ... pointer-events="fill">ejecutar</text>
You will only get events when the pointer is over the fill of the words. This might be a bit fiddly for clicking though because the holes in words will not be clickable.
You could ease that by putting an invisible <rect> of an appropriate size in front of the word with pointer-events="fill". The "fill" value will attract events for where the fill would be even if it is invisible. However that requires you know the bbox of the word, which we already established you don't have (?).
You could give the words an invisible fat stroke and use pointer-events="all". The invisible stroke will make the clickable area (invisibly) fatter and hence the inter-word holes smaller.

Dynamically Populate Content Within Un-Orthodox Grid

I'm currently developing out a blog page with a 3 X 3 grid layout for content to fall into the different boxes (see attached example).
http://imageshack.us/photo/my-images/337/cssex.jpg/
The content blocks in the lighter gray are meant to be stationary, so any updated, recently added, etc. content will not affect these boxes, only the black ones. I'm trying to figure out the best approach with keeping the gray boxes stationary, but allowing the black boxes to be populated dynamically (WordPress blog entries) and floating naturally through the layout.
As of now, I'm thinking that each individual black box will query the recent post that aligns to it. So, the first black box would query the most recent post, the second black box would query the second recent post and so on.
A big order!
Here is the general idea to help get you going:
You need to make those blocks a <div> or <section> with an ID tag like this:
<section id="brief1">
(BTW, you can also use a "table" & merge cells to get that layout, just ensure you use an ID)
Then you need to find a script to update the innerHTML using straight JavaScript, or a JS library like jQuery, MooTools, etc. This will allow you to inject text and/or an image inside those boxes. Example search: http://duckduckgo.com/?q=javascript+update+innerHTML+div
Once you have 1 spot updated with text, it is time to edit that script. Make an array of the ID tags, then loop through all of them to insert new content one at a time.
Good luck! If I see something pre-rolled on my travels, I'll update this thread.