This is a generalization of a specific question I had:
Would it still be called a cellular automaton if the state of a cell depended on the state of all the other cells in the grid?
Yes. Extend the neighborhood so that it covers the whole n-dimensional grid:
http://en.wikibooks.org/wiki/Cellular_Automata/Neighborhood
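As a rough illustration of a neighborhood that spans the whole grid, here is a minimal Python sketch. The update rule (a cell becomes 1 when an odd number of the other cells are 1) is invented purely for the example and does not come from the linked page.

```python
def step(cells):
    """One synchronous update in which every cell sees all the other cells."""
    total = sum(cells)
    # Hypothetical rule: new state is 1 when an odd number of the OTHER cells are 1.
    return [(total - c) % 2 for c in cells]

state = [1, 0, 0, 1, 1]
print(step(state))
```

The update is still local in form (each cell applies the same rule to its neighborhood); the neighborhood just happens to be the entire grid.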
Four common bean trials were established in fields, one trial per year. We combined density, bean genotype, and fungicide to manage white mold with a factorial scheme. The experimental design was a randomized complete block with four replicates. Each trial was analyzed by a three-way ANOVA. The fixed factors were density, genotype, fungicide, and interactions. The random factor was block.
My intent is to treat each trial as a form of replication and to combine all trials in a single, more concise analysis.
We don't want to draw conclusions about differences between trials. We want to draw general conclusions about our treatments.
I have used a mixed model with fixed and random effects like this:
y ~ DENS:GEN:FUNG + (1 | trials) + (1 | trials:block)
I would be very grateful if someone could tell me whether the model is appropriate for my research.
The model:
y ~ DENS:GEN:FUNG + (1 | trials) + (1 | trials:block)
has the following features:
A fixed effect for the 3-way interaction DENS:GEN:FUNG,
Random intercepts for trials, and random intercepts for block nested within trials
It is very rarely a good idea to fit a 3-way interaction as a fixed effect without the 2-way interactions and the main effects. See these for further discussion:
https://stats.stackexchange.com/questions/236113/are-lower-order-interactions-a-prequisite-for-three-way-interactions-in-regressi
https://stats.stackexchange.com/questions/27724/do-all-interactions-terms-need-their-individual-terms-in-regression-model
As for the random structure, yes: based on the description it seems appropriate, although you don't state how many trials there are. If there are very few, it may be better to fit trials as a fixed effect.
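To make the point about lower-order terms concrete, here is a small Python sketch that lists every fixed-effect term a full three-factor factorial contains and builds the corresponding formula string. The helper name expand_full_factorial is made up for this illustration; in R's formula syntax the same set of terms is what DENS*GEN*FUNG expands to.

```python
from itertools import combinations

def expand_full_factorial(factors):
    """List main effects and all interactions, lowest order first."""
    terms = []
    for order in range(1, len(factors) + 1):
        for combo in combinations(factors, order):
            terms.append(":".join(combo))
    return terms

fixed = expand_full_factorial(["DENS", "GEN", "FUNG"])
# Random part copied unchanged from the question.
formula = "y ~ " + " + ".join(fixed) + " + (1 | trials) + (1 | trials:block)"
print(formula)
```

This gives seven fixed-effect terms (three main effects, three 2-way interactions, one 3-way interaction) instead of the single DENS:GEN:FUNG term.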
I'm trying to understand how YOLO works for a project I'm doing. I've gone through the papers, many articles, and blog posts, but I'm still not sure why YOLO divides the entire image into grid cells and performs its computations per cell. What would happen if we considered the whole image as just one cell (without dividing)? What purpose do the grid cells serve? Is there a maximum number of objects a particular cell can detect?
Grid cells give the network's predictions a more structured form. Each grid cell corresponds to a specific region of the image, and it predicts the objects whose centers lie in that region. So the point is to have a structured output representation that takes advantage of the spatial regularity of images.
Each grid cell predicts a vector of the form [objectness_value, bbox_h, bbox_w, bbox_cx, bbox_cy, p1, p2, ..., pn].
objectness_value: how confident the prediction is that an object is present
bbox_h, bbox_w, bbox_cx, bbox_cy: offsets for the bounding box height, width, center coordinate on the x-axis, and center coordinate on the y-axis, respectively.
p1, p2, ..., pn: predicted class probabilities, one per object category (n categories in total).
More grid cells mean more predictions. With a single grid cell (the image itself), you would get a single bounding box prediction. That is not practical, because images usually contain many objects.
Note that a grid cell can make multiple bounding box predictions by adding more bbox offsets to the output vector.
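A minimal Python sketch of the two ideas above: which cell is responsible for an object, and how long the output vector of a cell is. The 13x13 grid, the 416x416 image size, and both function names are assumptions for the example. Repeating the whole vector (including class probabilities) for each extra box is also an assumption; some YOLO versions share the class probabilities across the boxes of a cell.

```python
def responsible_cell(cx, cy, img_w, img_h, grid=13):
    """Return (row, col) of the grid cell that contains the box center."""
    col = min(int(cx / img_w * grid), grid - 1)
    row = min(int(cy / img_h * grid), grid - 1)
    return row, col

def outputs_per_cell(n_classes, boxes_per_cell=1):
    """Length of a cell's output: objectness + 4 bbox offsets + class probabilities per box."""
    return boxes_per_cell * (1 + 4 + n_classes)

# An object centered at (208, 104) in a 416x416 image.
print(responsible_cell(208, 104, 416, 416))
print(outputs_per_cell(n_classes=20))
```

With a 1x1 grid every object would map to the same cell, which is why a single cell cannot describe an image containing several objects.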
I have gone through a couple of YOLO tutorials, but I am finding it somewhat hard to figure out whether the anchor boxes for each cell the image is divided into are predetermined. In one of the guides I went through, the image was divided into 13x13 cells, and it stated that each cell predicts 5 anchor boxes (bigger than the cell itself; this is my first problem, because the guide also says the cell first detects what object is present in it before predicting the boxes).
How can the small cell predict anchor boxes for an object bigger than it? Also, it is said that each cell classifies before predicting its anchor boxes. How can the small cell classify the object correctly without querying neighbouring cells if only a small part of the object falls within the cell?
E.g. say one of the 13x13 cells contains only the white pocket of a T-shirt a man is wearing. How can that cell correctly classify that a man is present without being linked to its neighbouring cells? With a normal CNN localizing a single object, I know the bounding box prediction relates to the whole image, so at least I can say the network has an idea of what's going on everywhere in the image before deciding where the box should be.
PS: My current understanding of how YOLO works is that each cell is assigned predetermined anchor boxes with a classifier at the end of each, and then the boxes with the highest scores for each class are selected, but I am sure it doesn't add up somewhere.
UPDATE: I made a mistake with this question; it should have been about how regular bounding boxes are decided rather than anchor/prior boxes. So I am marking @craq's answer as correct, because that is how anchor boxes are decided according to the YOLO v2 paper.
I think there are two questions here. Firstly, the one in the title, asking where the anchors come from. Secondly, how anchors are assigned to objects. I'll try to answer both.
Anchors are determined by a k-means procedure, looking at all the bounding boxes in your dataset. If you're looking at vehicles, the ones you see from the side will have an aspect ratio of about 2:1 (width = 2*height). The ones viewed from in front will be roughly square, 1:1. If your dataset includes people, the aspect ratio might be 1:3. Foreground objects will be large, background objects will be small. The k-means routine will figure out a selection of anchors that represent your dataset. k=5 for YOLOv2 (YOLOv3 uses 9 anchors, three per detection scale); the number of anchors differs between YOLO versions.
It's useful to have anchors that represent your dataset, because YOLO learns how to make small adjustments to the anchor boxes in order to create an accurate bounding box for your object. YOLO can learn small adjustments better/easier than large ones.
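The k-means step can be sketched in plain Python as follows. This is only an illustration under a few assumptions: boxes are reduced to (width, height) pairs, closeness is measured by IoU of the shapes (as in the YOLO v2 paper, where the distance is 1 - IoU), and the starting centroids are picked deterministically so the example is reproducible. The box values are made up.

```python
def wh_iou(a, b):
    """IoU of two (width, height) boxes that share the same corner."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)

def kmeans_anchors(boxes, k, iters=50):
    # Assumption: deterministic start, k boxes spread evenly through the list.
    stride = len(boxes) // k
    centroids = [boxes[i * stride] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            # Highest IoU is the same as smallest (1 - IoU) distance.
            best = max(range(k), key=lambda j: wh_iou(box, centroids[j]))
            clusters[best].append(box)
        new = []
        for j, members in enumerate(clusters):
            if members:
                new.append((sum(w for w, _ in members) / len(members),
                            sum(h for _, h in members) / len(members)))
            else:
                new.append(centroids[j])  # keep the centroid of an empty cluster
        if new == centroids:
            break
        centroids = new
    return centroids

# Made-up boxes: three wide "vehicles" and three tall "people".
boxes = [(100, 50), (110, 52), (96, 48), (40, 120), (42, 118), (38, 126)]
anchors = kmeans_anchors(boxes, k=2)
print(anchors)
```

On this toy data the two resulting anchors are one wide box and one tall box, which is the kind of dataset-specific prior the paragraph above describes.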
The assignment problem is trickier. As I understand it, part of the training process is for YOLO to learn which anchors to use for which object. So the "assignment" isn't deterministic like it might be for the Hungarian algorithm. Because of this, in general, multiple anchors will detect each object, and you need to do non-max-suppression afterwards in order to pick the "best" one (i.e. highest confidence).
There are a couple of points that I needed to understand before I came to grips with anchors:
Anchors can be any size, so they can extend beyond the boundaries of the 13x13 grid cells. They have to be able to, in order to detect large objects.
Anchors only enter in the final layers of YOLO. YOLO's neural network makes 13x13x5=845 predictions (assuming a 13x13 grid and 5 anchors). The predictions are interpreted as offsets to anchors from which to calculate a bounding box. (The predictions also include a confidence/objectness score and a class label.)
YOLO's loss function compares each object in the ground truth with one anchor. It picks the anchor (before any offsets) with highest IoU compared to the ground truth. Then the predictions are added as offsets to the anchor. All other anchors are designated as background.
If anchors which have been assigned to objects have high IoU, their loss is small. Anchors which have not been assigned to objects should predict background by setting confidence close to zero. The final loss function is a combination from all anchors. Since YOLO tries to minimise its overall loss function, the anchor closest to ground truth gets trained to recognise the object, and the other anchors get trained to ignore it.
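The "pick the anchor with the highest IoU" step might look roughly like this in Python. It is a sketch, not YOLO's actual code: it compares shapes only, treating the anchor and the ground-truth box as if they shared a corner, and the anchor sizes (in grid-cell units) are invented for the example.

```python
def wh_iou(a, b):
    """IoU of two (width, height) boxes that share the same corner."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)

def assign_anchor(gt_wh, anchors):
    """Index of the anchor whose shape overlaps the ground-truth box most."""
    return max(range(len(anchors)), key=lambda i: wh_iou(gt_wh, anchors[i]))

# Hypothetical anchors: square, wide (2:1) and tall (1:3).
anchors = [(1.0, 1.0), (2.0, 1.0), (1.0, 3.0)]
print(assign_anchor((1.9, 1.1), anchors))  # a car seen from the side
print(assign_anchor((0.9, 2.8), anchors))  # a standing person
```

Only the chosen anchor contributes a box-regression loss for that object; the others are pushed towards low confidence, as described above.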
The following pages helped my understanding of YOLO's anchors:
https://medium.com/@vivek.yadav/part-1-generating-anchor-boxes-for-yolo-like-network-for-vehicle-detection-using-kitti-dataset-b2fe033e5807
https://github.com/pjreddie/darknet/issues/568
I think that your statement about the number of predictions of the network could be misleading. Assuming a 13 x 13 grid and 5 anchor boxes, the output of the network has, as I understand it, the following shape: 13 x 13 x 5 x (2+2+1+nbOfClasses)
13 x 13: the grid
x 5: the anchors
x (2+2+1+nbOfClasses): (x, y)-coordinates of the center of the bounding box (in the coordinate system of each cell), (h, w)-deviation of the bounding box (relative to the prior anchor boxes), one confidence/objectness score, and a softmax-activated class vector giving a probability for each class.
If you want more information about how the anchor priors are determined, you can take a look at the original paper on arXiv: https://arxiv.org/pdf/1612.08242.pdf.
I am a back-end developer and new to Foundation as well as to CSS. I have seen a few front-end developers struggle to convert a design into CSS when the designer has not considered the grid structure while designing. This generally happens because the default 12-column grid doesn't provide the flexibility to place the elements exactly as desired.
Since Foundation lets us use a custom column count, is it wise to use it? Most grids use 12 columns because 12 is a convenient number (divisible by 2, 3, 4 and 6). What if we use a 60-column grid with the same gutter we would use for a 12-column grid (say 20px)? I believe it should give us more flexibility to place the elements.
Let me explain. Suppose we need a three-column structure for a webpage, with a ratio of 3:3:4, and we do not want to leave any offsets. I am not sure how we can achieve this with 12 grid columns, except maybe by positioning the elements manually. But with a 60-column grid, we can easily achieve this by using large-18, large-18, large-24, with the gutter as required, say 20px.
Some may say that with a 20px gutter in a 60-column grid, the gutters between the columns would take up most of the space on the webpage. But no: gutters are imaginary until we use actual columns. So here space for only 3 gutters will be used, and the rest will be the columns with the ratio 3:3:4.
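The arithmetic behind the 3:3:4 example can be checked with a short Python sketch. Both helper names are made up for the illustration, and gutters are ignored; the code only looks at how column spans divide the row.

```python
from fractions import Fraction

def ratio_fits(ratio, total_columns):
    """True if the ratio can be expressed in whole columns with no offset."""
    return total_columns % sum(ratio) == 0

def column_widths(spans, total_columns):
    """Width of each column as a percentage of the row."""
    if sum(spans) != total_columns:
        raise ValueError("spans must add up to the grid's column count")
    return [float(Fraction(s, total_columns) * 100) for s in spans]

print(ratio_fits((3, 3, 4), 12))
print(ratio_fits((3, 3, 4), 60))
print(column_widths([18, 18, 24], 60))
```

A 3:3:4 split needs a column count divisible by 10, which is why it works with 60 columns but not with 12.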
This is my understanding of the grid structure. Can someone with more knowledge let me know whether what I am assuming actually makes sense, or whether there are other points I am missing that may haunt me if I use a 60-column grid?
Great questions.
I'm definitely no expert, but I have used Foundation quite a bit. I'll assume you're using Foundation 6. There are a couple of ways you could go about accomplishing these tasks, and I've used both successfully in projects.
The first way is to change the default grid to as many columns as you feel comfortable maintaining. Personally, the most I've ever used is a 24-column layout. It's flexible enough to fit most layouts. Change this in your _settings.scss:
$grid-column-count: 24;
If that generally won't be flexible enough for you, you can keep the regular grid and also define a custom grid that you apply with a class. That way, you've got the regular 12/24-column grid for most pages, but you can call on the custom grid for special cases. http://foundation.zurb.com/sites/docs/grid.html#grid-row
@include grid-row($columns, $behavior, $width, $cf, $gutter) { }
Lastly, if you don't want to use any of that special stuff, you can use percentages in your columns and omit the standard large-# column classes. So your custom SCSS would look something like this: http://foundation.zurb.com/sites/docs/grid.html#columns
.special-column {
  @include grid-column(30%);
}
.slightly-larger-column {
  @include grid-column(40%);
}
Not sure if this answers all your questions, but it sounds like there's just some confusion about best practices. You can make your grid as large as you want, but you'll have to maintain it either way.
I know that there is a function called nlfilter in MATLAB. What I'm trying to find is the 4-neighbours of a pixel. Does that mean a 2x2 window? Can we do that using nlfilter?
Thanks.
I think you might find this easier to comprehend if you think of it in terms of blocks rather than in terms of neighbors. So a 2x2 neighborhood is actually just a 2x2 block.
If you are talking about a center pixel relative to a north, south, east, west pixel, then you would want to use a 3x3 block. Unfortunately, that block would also include northeast, northwest, southeast, southwest neighbors.
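One way around the unwanted diagonal pixels is to slide a 3x3 window but only read the four cross-shaped positions. Here is a minimal sketch of that idea, written in plain Python rather than MATLAB; the function name is made up, summing the neighbours is just an example operation, and treating out-of-image neighbours as zero is an assumption.

```python
def four_neighbour_sum(img):
    """For every pixel, sum its north, south, west and east neighbours."""
    rows, cols = len(img), len(img[0])
    offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # N, S, W, E; diagonals skipped
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            for dr, dc in offsets:
                rr, cc = r + dr, c + dc
                # Neighbours outside the image contribute nothing (zero padding).
                if 0 <= rr < rows and 0 <= cc < cols:
                    out[r][c] += img[rr][cc]
    return out

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
result = four_neighbour_sum(img)
print(result[1][1])  # 2 + 8 + 4 + 6
```

The same effect can be had inside a 3x3 block function by multiplying the block with a cross-shaped mask before combining the values.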
Here is an example of sliding neighborhood operations in Matlab using nlfilter.