Interpretation of yolov5 output - deep-learning

I am making a face mask detection project and I trained my model using ultralytics/yolov5. I saved the trained model as an ONNX file; you can find the model file here: model.onnx. Now I want to use this model.onnx with OpenCV to detect face masks in real time. The input image size during training was 320x320. You can visualize this model using Netron.
I have written this code to capture the image from the webcam and pass it to model.onnx to predict my bounding boxes. The code is as follows:
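import numpy as np
import onnxruntime

# model_path is assumed to be defined elsewhere (path to model.onnx)
session = onnxruntime.InferenceSession(model_path)  # create once, not on every frame
input_name = session.get_inputs()[0].name
output_name = session.get_outputs()[0].name

def predict(img):
    # img: 320x320x3 OpenCV frame (HWC, BGR). Flip BGR -> RGB and transpose to
    # NCHW; a plain reshape would scramble pixels across channels.
    # Scale to [0, 1] as YOLOv5 does during training.
    data = img[:, :, ::-1].transpose(2, 0, 1)[np.newaxis].astype('float32') / 255.0
    result = session.run([output_name], {input_name: data})
    result = np.array(result)
    print(result.shape)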
The output of result.shape is (1, 1, 3, 40, 40, 85).
Can anyone help me interpret this shape, and how can I use this result array to get my class, bounding box and confidence?

I've never worked with a pure yolov5 model, but here's the output format for yolov5s. It looks like it should be similar.
Output tensor structure (yolov5s):
output_tensor[a, b, c, d]
a -> image index (if your input is a batch of images, this tells you which image's output you're looking at; if your input is just one image, leave this as 0)
b -> index of image in batch
c -> information about the bounding box
   0, 1 -> x and y coordinates of the bounding box center
   2, 3 -> width and height of the bounding box
   4 -> bounding box (objectness) confidence
   5 - 84 -> single class confidences (one per class, 80 for the COCO-pretrained model)
d -> index of the proposed bounding box
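For the raw (1, 1, 3, 40, 40, 85) tensor in the question, here is a sketch of the decoding step. It assumes the export kept YOLOv5's raw head output, so the axes would read: the list returned by session.run, batch, 3 anchors, a 40x40 grid (320 / 40 = stride 8), and 85 values per prediction laid out as listed above. Since 85 = 5 + 80 implies 80 classes, a model trained only on mask classes would be expected to have fewer channels, so it is worth checking which model was exported. The decode formulas mirror YOLOv5's Detect layer; the anchor sizes are the repository's default stride-8 anchors, which is an assumption because a trained model may carry different ones. The 20x20 and 10x10 heads are probably separate ONNX outputs that get_outputs()[0] skips, and overlapping boxes still need non-maximum suppression afterwards.

```python
import numpy as np

# Assumption: default YOLOv5 anchors (pixels) for the stride-8 (P3) head.
# A custom-trained model may use different anchors after AutoAnchor.
P3_ANCHORS = np.array([[10, 13], [16, 30], [33, 23]], dtype=np.float32)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def decode_p3(raw, img_size=320, anchors=P3_ANCHORS, conf_thres=0.25):
    """Decode one raw YOLOv5 head of shape (3, ny, nx, 85).

    Returns an (N, 6) array of [x1, y1, x2, y2, score, class_id] in input pixels.
    """
    na, ny, nx, _ = raw.shape
    stride = img_size / nx                    # 320 / 40 = 8
    p = sigmoid(raw)                          # every channel is a raw logit

    # Cell offsets: gy varies along the rows, gx along the columns
    gy, gx = np.meshgrid(np.arange(ny), np.arange(nx), indexing="ij")
    grid = np.stack((gx, gy), axis=-1)        # (ny, nx, 2) in (x, y) order

    xy = (p[..., 0:2] * 2.0 - 0.5 + grid) * stride              # box centre
    wh = (p[..., 2:4] * 2.0) ** 2 * anchors[:, None, None, :]   # box size
    obj = p[..., 4]                                             # objectness
    cls_prob = p[..., 5:]                                       # class scores

    cls_id = cls_prob.argmax(axis=-1)
    score = obj * cls_prob.max(axis=-1)       # confidence = objectness * class prob

    keep = score > conf_thres
    xy, wh, score, cls_id = xy[keep], wh[keep], score[keep], cls_id[keep]
    boxes = np.concatenate((xy - wh / 2, xy + wh / 2), axis=1)  # xywh -> xyxy
    return np.column_stack((boxes, score, cls_id))
```

With the question's array, this would be called as decode_p3(result[0, 0]).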

Related

Convert Autodesk Viewer Units to Inches

I am using the viewer with the Edit2D library and am trying to convert the length between two x and y points into real measurements.
For example, after a shape is drawn using the polygon tool, I want to get the length of the first edge.
I get the drawn shape from the event shown below, take its first two points, and compute the distance between them. The result seems to be in Autodesk units or something similar. Is there an easy way to convert it to feet or inches?
I have found
Edit2DExtension.defaultContext.unitHandler.fromDisplayUnits()
as well as
Edit2DExtension.defaultContext.unitHandler.toDisplayUnits()
and also
Autodesk.Viewing.Private.convertUnits().
I've tried all three, but am unsure how to use them and haven't found any good results with them yet.
There may be a way to do it through Edit2D, but I haven't found one yet, and there is next to no documentation I can find on this library.
beforeEdit2DAction(event) {
    console.log('After Shape has been drawn -> ', event);
    let shape = event.action.shape;
    let pointA = shape._loops[0][0]; // Value: {x: 21.393766403198242, y: 20.934386880096092}
    let pointB = shape._loops[0][1]; // Value: {x: 25.082155227661133, y: 20.934386880096092}
    // Distance between 2 points (Assuming Autodesk units)
    let length = Autodesk.Edit2D.Math2D.distance2D(pointA, pointB); // 3.6883888244628906
    // Need to convert to real world units (preferably ft or inches)
}
The real length is 29.5 feet.
Any ideas or comments are welcome! Thanks
Edit: Trying Petr's suggestion here's what it returned:
That's an interesting question. The "unit handler" keeps track of two types of units:
layer units (Edit2DExtension.defaultContext.unitHandler.config.layerUnits, can be inch for example)
display units (Edit2DExtension.defaultContext.unitHandler.config.displayUnits)
These two properties control how the actual lengths and areas are displayed. For example, the unit handler's toDisplayUnits method is implemented like so:
toDisplayUnits(fromUnits, value) {
    this.updateConfig();
    return Autodesk.Viewing.Private.convertUnits(fromUnits, this.config.displayUnits, this.config.scaleFactor, value);
}
With that, configuring fromUnits and displayUnits (and scale) properly should give you the real measurements you need.
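To make the arithmetic concrete, here is a hypothetical Python helper (convert_length is not part of the Autodesk API) showing what a unit conversion with a scale factor has to do: convert from the layer units to inches, apply the drawing scale, then convert to the target unit. The question's numbers hint at a scale: 29.5 ft / 3.6884 ≈ 8, which would fit layer units of inches on a sheet drawn at 1/8" = 1'-0" (scale factor 96). That is an inference from the numbers, not something read from the drawing.

```python
# Hypothetical helper, NOT the Autodesk API: it only mimics the arithmetic a
# conversion like convertUnits(fromUnits, toUnits, scaleFactor, value) implies.
INCHES_PER_UNIT = {
    "in": 1.0,
    "ft": 12.0,
    "mm": 1 / 25.4,
    "cm": 1 / 2.54,
    "m": 1 / 0.0254,
}


def convert_length(value, from_units, to_units, scale_factor=1.0):
    """Convert a length measured on the sheet into real-world units."""
    inches = value * INCHES_PER_UNIT[from_units] * scale_factor  # apply drawing scale
    return inches / INCHES_PER_UNIT[to_units]
```

Under that assumed scale, convert_length(3.6883888244628906, "in", "ft", scale_factor=96) gives about 29.51 ft, close to the expected 29.5 ft.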

Cut ultrasound signal between specific values using Octave

I have an ultrasound wave (graph axes: Volt vs microsecond) and need to cut the signal/wave between two specific values to further analyze this clipping. My idea is to cut the signal at 0.2 V (y-axis). The wave is sine-shaped, as shown in the figure, with the desired cutoff points in red.
In my current code, I'm cutting the signal from 1900 to 4000 ms (x-axis) (Aa = A(1900:4000);) and then I want to do the aforementioned clipping and proceed with the code.
Does anyone know how I could do this y-axis clipping?
Thanks!! :)
clear
clf
pkg load signal
for k=1:2
  w=1
  filename=strcat("PCB 2.1 (",sprintf("%01d",k),").mat")
  load(filename)
  Lthisrun=length(A);
  Pico(k,1:Lthisrun)=A;
  Aa = A(1900:4000);
  Ah= abs(hilbert(Aa));
  step=100;
  hold on
  i=1;
  Ac=0;
  for index=1:step:3601
    Ac(i+1)=Ac(i)+Ah(i);
    i=i+1
    r(k)=trapz(Ac)
  end
end
OK, you want to look only at values "above the noise" in your data, or in this case "clip out" everything below 0.2 V. The easiest way to do this is with logical indexing: you take an array and create a subarray that drops every element not meeting a given logical condition. See this example:
f = @(x) sin(x)./x;
x = [-100:.1:100];
y = f(x);
plot(x,y);
figure;
x_trim = x(y>0.2);
y_trim = y(y>0.2);
plot(x_trim, y_trim);
From your question it looks like you want to do the clipping after applying the horizontal windowing from 1900-4000. (you say that that is in milliseconds, but your image shows the pulse being much sooner than 1900 ms). In any case, something like
Ab = Aa(Aa > 0.2);
will create another array Ab that will only contain the portions of Aa with values above 0.2. You may need to do something similar (see the example) for the horizontal axis if your x-data is not just the element index.
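The same windowing and logical indexing can be sketched in NumPy for anyone doing this in Python. The function name and the sample interval dt are illustrative. Note that Octave's 1-based, inclusive A(1900:4000) becomes the 0-based slice a[1899:4000]. Keeping the indices alongside the values preserves the time axis, which is the horizontal-axis step mentioned above.

```python
import numpy as np


def window_and_clip(a, start=1900, stop=4000, threshold=0.2, dt=1.0):
    """Mimic Aa = A(1900:4000); Ab = Aa(Aa > 0.2) from the Octave answer.

    start/stop are Octave-style (1-based, inclusive) indices.
    Returns the sample times (index * dt) and the values above threshold.
    """
    idx = np.arange(start - 1, stop)   # 0-based indices covered by the window
    aa = a[start - 1:stop]             # Aa = A(1900:4000)
    mask = aa > threshold              # logical index, like Aa > 0.2
    return idx[mask] * dt, aa[mask]
```

Using np.abs(aa) > threshold instead would also keep the negative lobes, if that is what the red cutoff points mean.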

Random graph generator

I am interested in generating weighted, directed random graphs with node constraints. Is there a graph generator in R or Python that is customizable? The only one I am aware of is igraph's erdos.renyi.game() but I am unsure if one can customize it.
Edit: the customizations I want to make are 1) drawing a weighted graph and 2) constraining some nodes from drawing edges.
In python-igraph, you can use the Graph.Erdos_Renyi() generator.
Setting a low p value leaves some nodes without edges, although which nodes end up isolated is random rather than chosen by you.
Graph.Erdos_Renyi(n, p, m, directed=False, loops=False)  # defaults; pass either p or m, not both
Example:
from igraph import Graph, plot
g = Graph.Erdos_Renyi(n=10, p=0.1, directed=True)
plot(g)
With p=0.1 you can see that some nodes do not have edges.
For the weights you can do something like:
n_edges = g.ecount()  # number of edges
g.es["weight"] = list(range(1, n_edges + 1))  # one weight per edge
g.es["label"] = g.es["weight"]  # show the weights on the plot
plot(g)
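Graph.Erdos_Renyi has no option for excluding particular nodes, so one way to get both weights and node constraints is to generate the edge list yourself and then load it into igraph. The sketch below is a plain-Python directed G(n, p) variant; constrained_gnp and its blocked parameter are hypothetical names, and "constrained" is interpreted here as "may not have outgoing edges".

```python
import random


def constrained_gnp(n, p, blocked=(), weight_range=(1.0, 10.0), seed=None):
    """Directed Erdos-Renyi G(n, p) with random edge weights.

    Nodes in `blocked` never get outgoing edges (an illustrative constraint;
    also check `v in blocked` to forbid incoming edges).
    Returns a list of (source, target, weight) tuples.
    """
    rng = random.Random(seed)
    blocked = set(blocked)
    edges = []
    for u in range(n):
        if u in blocked:
            continue                          # constraint: u draws no edges
        for v in range(n):
            if u != v and rng.random() < p:   # each ordered pair with prob. p, no loops
                edges.append((u, v, rng.uniform(*weight_range)))
    return edges
```

The result can be turned into a weighted igraph graph with g = Graph(n=10, edges=[(u, v) for u, v, _ in edges], directed=True) followed by g.es["weight"] = [w for _, _, w in edges].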

Using dxfwrite or ezdxf to create dxf text in z direction

I would like to use either dxfwrite or ezdxf to create text along (WCS) y direction, and with height in the (WCS) z direction.
Using AutoCAD, I have done this by setting a UCS and entering text.
How can I do in dxfwrite or ezdxf (or any other python friendly library)?
dxf.ucs('textucs',xaxis=(0.,1.,0),yaxis=(0.,0.,1.))
lab = dxf.mtext('hello',np.array([0.,0.,.5]),layer='mylay',height=0.3)
doesn't work, presumably because I have only created UCS, and am not using it.
Defining a UCS does nothing here: dxfwrite and ezdxf are not CAD applications, so new entities are not placed relative to a UCS you define.
This example uses ezdxf to write a text in the YZ-plane:
import ezdxf

dwg = ezdxf.new('R2000')  # MTEXT requires DXF R13 or later
modelspace = dwg.modelspace()
modelspace.add_mtext("This is a text in the YZ-plane",
                     dxfattribs={
                         'width': 12,                  # reference rectangle width
                         'text_direction': (0, 1, 0),  # write in y direction
                         'extrusion': (1, 0, 0),       # normal vector of the text plane
                     })
dwg.saveas('mtext_in_yz_plane.dxf')
dxfwrite writes DXF R12 files and the MTEXT entity requires DXF R13 or later, so mtext in dxfwrite is emulated with a bunch of TEXT entities.
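To check that this combination really gives text whose height runs along +z, note that in a right-handed frame the text's local "up" axis is extrusion × text_direction. The small helper below (text_up_vector is an illustrative name, not an ezdxf function) computes that vector and rejects a text_direction that does not lie in the text plane.

```python
def cross(a, b):
    """3D cross product of two vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def text_up_vector(extrusion, text_direction):
    """Direction in which the text height grows: extrusion x text_direction.

    text_direction must be perpendicular to the extrusion (the plane normal).
    """
    dot = sum(e * d for e, d in zip(extrusion, text_direction))
    if abs(dot) > 1e-9:
        raise ValueError("text_direction must be perpendicular to extrusion")
    return cross(extrusion, text_direction)
```

For extrusion (1, 0, 0) and text_direction (0, 1, 0) this gives (0, 0, 1), so the text reads along +y and its height grows along +z, as the question asks.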

How does this work in computing the depth map?

From this site: http://www.catalinzima.com/?page_id=14
I've always been confused about how the depth map is calculated.
The vertex shader function calculates position as follows:
VertexShaderOutput VertexShaderFunction(VertexShaderInput input)
{
    VertexShaderOutput output;
    float4 worldPosition = mul(input.Position, World);
    float4 viewPosition = mul(worldPosition, View);
    output.Position = mul(viewPosition, Projection);
    output.TexCoord = input.TexCoord;          // pass the texture coordinates further
    output.Normal = mul(input.Normal, World);  // get normal into world space
    output.Depth.x = output.Position.z;
    output.Depth.y = output.Position.w;
    return output;
}
What are output.Position.z and output.Position.w? I'm not sure as to the maths behind this.
And in the pixel shader there is this line: output.Depth = input.Depth.x / input.Depth.y;
So output.Depth is output.Position.z / output.Position.w? Why do we do this?
Finally in the point light shader (http://www.catalinzima.com/?page_id=55) to convert this output to be a position the code is:
//read depth
float depthVal = tex2D(depthSampler,texCoord).r;
//compute screen-space position
float4 position;
position.xy = input.ScreenPosition.xy;
position.z = depthVal;
position.w = 1.0f;
//transform to world space
position = mul(position, InvertViewProjection);
position /= position.w;
Again, I don't understand this. I sort of see why we use InvertViewProjection, since we multiplied by the view projection earlier, but setting w to 1 and then dividing the whole position by w confuses me quite a bit.
To understand this completely, you'll need to understand the algebra that underpins 3D transforms. SO doesn't really lend itself to matrix math (or I don't know how to use it for that), so this will have to be without fancy formulas. Here is a high-level explanation, though:
If you look closely, you'll notice that all transformations applied to a vertex position (from model to world to view to clip coordinates) use 4D vectors. That's right, 4D. Why, when we live in a 3D world? Because in that 4D representation, all the transformations we usually want to apply to vertices, including translation, are expressible as a matrix multiplication, and perspective projection becomes a matrix multiplication followed by a division by w. This is not the case if we stay in a 3D representation: a 3x3 matrix always maps the origin to itself, so it cannot translate. And matrix multiplications are what a GPU is good at.
What does a vertex in 3D correspond to in 4D? This is where it gets interesting. The point (x, y, z) corresponds to the line (a·x, a·y, a·z, a) for any nonzero a. We can pick any point on this line to do the math we need, and we usually pick the easiest one, a=1 (that way, we don't have to do any multiplication, just set w=1).
So that answers pretty much all the math you're looking at. To lift a 3D point into 4D we set w=1; to get a component back from a 4D vector, so that we can compare it against ordinary 3D quantities, we divide that component by w. That is why the pixel shader stores Position.z / Position.w as the depth, and why the light shader rebuilds the position with w = 1, multiplies by InvertViewProjection, and then divides by the resulting w.
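A small NumPy illustration of why the extra coordinate helps: with w = 1, a 4x4 matrix can add a fixed offset, which no 3x3 matrix can do. The matrix uses the row-vector convention, matching mul(position, M) in the HLSL above; translation() is an illustrative helper, not a library function.

```python
import numpy as np


def translation(tx, ty, tz):
    """4x4 translation matrix for row vectors, as in HLSL's mul(position, M)."""
    t = np.eye(4)
    t[3, :3] = (tx, ty, tz)          # the offset lives in the last row
    return t


p = np.array([1.0, 2.0, 3.0, 1.0])   # the 3D point (1, 2, 3) with w = 1
moved = p @ translation(10.0, 0.0, 0.0)
# moved is (11, 2, 3, 1): the w = 1 entry is what lets the matrix add the offset
```

A direction vector with w = 0 would pass through the same matrix unchanged, which is why normals and directions are not affected by translation.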
This coordinate system, if you want to dive deeper, is called homogeneous coordinates.
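The whole round trip from the question can be checked numerically. The sketch below projects a world point, stores depth = z / w as the pixel shader does, then rebuilds the position the way the light shader does (screen xy plus stored depth, w = 1, multiply by the inverse view-projection, divide by w). The perspective matrix follows the row-vector, [0, 1]-depth layout of XNA's Matrix.CreatePerspectiveFieldOfView as far as I know it; treat the exact matrix and the helper names as assumptions, since the reconstruction works for any invertible view-projection.

```python
import numpy as np


def perspective_rh(fov_y, aspect, near, far):
    """Row-vector, right-handed perspective matrix mapping depth to [0, 1]."""
    ys = 1.0 / np.tan(fov_y / 2)
    p = np.zeros((4, 4))
    p[0, 0] = ys / aspect
    p[1, 1] = ys
    p[2, 2] = far / (near - far)
    p[2, 3] = -1.0                          # w_clip = -z_view
    p[3, 2] = near * far / (near - far)
    return p


def depth_round_trip(world_pos, view, proj):
    """Project a world point, store depth = z / w, then rebuild the point."""
    view_proj = view @ proj                           # row vectors: v @ View @ Projection
    clip = np.append(world_pos, 1.0) @ view_proj      # output.Position
    ndc = clip / clip[3]                              # perspective divide
    depth = ndc[2]                                    # Depth.x / Depth.y

    position = np.array([ndc[0], ndc[1], depth, 1.0])  # screen xy, stored depth, w = 1
    position = position @ np.linalg.inv(view_proj)      # InvertViewProjection
    position /= position[3]                             # divide by w
    return depth, position[:3]
```

Dividing by w at the end works because position @ inv(view_proj) equals (world, 1) scaled by 1 / w_clip; the final division removes that scale.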