How to do detections in YOLOv5 only in a region of interest? - deep-learning

I want to detect objects only in a specified region and ignore all the other detections outside the ROI.

If I understand your question correctly, you want to detect objects which are present on the road surface.
One way to do that would be to first detect the road surface, either by detecting lane markings (https://github.com/amusi/awesome-lane-detection) or by using a road free-space detection model (https://github.com/fabvio/ld-lsi/). Then either feed only that part of the image to your YOLOv5, or feed it the complete image and afterwards filter the detections based on whether they lie on the road surface (i.e. whether each object's bounding box overlaps the road surface): keep the ones that do and ignore the rest.
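If you go with the second option (run YOLOv5 on the full image and filter afterwards), the filtering step is just a box-vs-road-surface overlap check. Here is a rough sketch, assuming the road surface comes as a binary mask and the detections as (x1, y1, x2, y2, conf, cls) tuples; the function and parameter names are mine for illustration, not part of YOLOv5:

```python
import numpy as np

def filter_detections_by_roi(detections, road_mask, min_overlap=0.3):
    """Keep detections whose bounding box overlaps the road-surface mask.

    detections:  iterable of (x1, y1, x2, y2, conf, cls) boxes in pixel
                 coordinates (adapt to whatever format your pipeline gives you).
    road_mask:   HxW boolean numpy array, True where the road surface is.
    min_overlap: minimum fraction of the box area that must lie on the road.
    """
    h, w = road_mask.shape
    kept = []
    for x1, y1, x2, y2, conf, cls in detections:
        # Clip the box to the image and round to integer pixel indices.
        ix1, iy1 = max(0, int(x1)), max(0, int(y1))
        ix2, iy2 = min(w, int(x2)), min(h, int(y2))
        if ix2 <= ix1 or iy2 <= iy1:
            continue
        box_area = (ix2 - ix1) * (iy2 - iy1)
        road_pixels = road_mask[iy1:iy2, ix1:ix2].sum()
        if road_pixels / box_area >= min_overlap:
            kept.append((x1, y1, x2, y2, conf, cls))
    return kept
```

The same idea works if your ROI is just a fixed rectangle or polygon instead of a predicted mask; only the overlap test changes.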

Related

Get closest mesh point to actor?

My player has a collision sphere to detect any static mesh that gets close to it.
I need to find the closest point on the static meshes that are colliding with it.
I think I could use "Get Actor Bounds" to get the mesh boundaries and then use them to find the closest point but I'm not sure how to do it.
I also thought about using a trace but I would need to cast many of them in order to find the right one, and I would need a way to make the trace hit only the meshes I care about.
Right now I'm simply using the "Get Actor Location" but that gives me the center of the static mesh.
How should I approach the problem?
The straightforward way to get the closest point is to compute the distance from your point to every vertex and keep the minimum: a simple for loop with a running minimum, as in the sketch below.
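Something like this sketch of the loop (plain Python/NumPy to show the idea, not Unreal code, and assuming you have already managed to extract the vertex positions):

```python
import numpy as np

def closest_vertex(point, vertices):
    """Return the vertex nearest to `point`.

    point:    array-like of shape (3,)
    vertices: array of shape (N, 3) -- however you extract them from the mesh.
    """
    vertices = np.asarray(vertices, dtype=float)
    # Squared distances are enough for the comparison; skip the sqrt.
    d2 = np.sum((vertices - np.asarray(point, dtype=float)) ** 2, axis=1)
    return vertices[np.argmin(d2)]
```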
Accessing mesh vertices can be a bit tricky in Unreal, especially for a StaticMesh, because the vertex data is stored on the GPU, so you have to do fairly heavy conversions to read it back. I don't recommend iterating over vertices if you want a real-time game.
To avoid iterating over every mesh vertex, you could also look at this function:
https://docs.unrealengine.com/4.26/en-US/BlueprintAPI/Collision/GetClosestPointonCollision/
Alternatively, you could do a multi-trace with a big sphere and iterate over every hit location, but I am not sure the location in the Break Hit Result is always the closest point on the object.

In Object Detection, do you train the CNN classifier on the Ground Truth bounding boxes?

Let's take R-CNN, for example. I know that there is the region proposal network and then a separate classification network, with the general idea being that it finds potential regions that could be an object, and then passes those regions to the classifier to figure out what it is. I'm wondering how that classifier gets trained if I have a custom dataset. Does it simply extract all the bounding boxes, create new images with those bounding box coordinates, preprocess them, and then use them for training?
In other words, are the classifiers used in object detection models trained on images generated based on the bounding box coordinates or is it more complicated than that?
Based on what I understood from your question: you want to understand how the classifier network gets trained?
When we design a detector network followed by a classifier network, the two networks are trained on different kinds of training data. Say you want to detect different classes of vehicles such as truck, bus, van, car, bike, etc.
Detector network: this network is trained on images with bounding boxes marked around the vehicles in the scene, i.e. the coordinates of the bounding boxes. At test time this part of the pipeline gives you bounding boxes (coordinates) around the vehicles.
Classifier network: this one is trained on cropped vehicle images, all resized to the same dimensions, with class labels such as truck = 1, bus = 2, van = 3, car = 4 and so on.
So when testing the whole pipeline (detector + classifier), the detector gives you multiple bounding boxes depending on how many vehicles are in the scene. You then crop those boxes, resize all the crops to the same dimensions and feed them to the classifier one by one; a scene with 5 vehicles, for example, yields 5 separate crops. You also need to keep the coordinates of each fed crop so you can mark the class and location of the vehicle in the result.
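As a rough sketch of that crop/resize/classify loop (the detector and classifier here are hypothetical placeholders for whatever models you actually trained, not a specific library API):

```python
import numpy as np

def detect_and_classify(image, detector, classifier, input_size=(224, 224)):
    """Two-stage pipeline sketch.

    detector(image)  -> list of (x1, y1, x2, y2) integer pixel boxes (assumed)
    classifier(crop) -> class label for one resized crop (assumed)
    """
    results = []
    for (x1, y1, x2, y2) in detector(image):
        crop = image[y1:y2, x1:x2]              # cut out the detected vehicle
        crop = resize(crop, input_size)         # bring every crop to one size
        label = classifier(crop)                # e.g. 1=truck, 2=bus, ...
        results.append(((x1, y1, x2, y2), label))  # keep coords + class
    return results

def resize(crop, size):
    """Nearest-neighbour resize, just to keep the sketch self-contained."""
    h, w = crop.shape[:2]
    ys = np.linspace(0, h - 1, size[1]).astype(int)
    xs = np.linspace(0, w - 1, size[0]).astype(int)
    return crop[ys][:, xs]
```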

Surface mesh to volume mesh

I have a closed surface mesh generated using Meshlab from point clouds. I need to get a volume mesh for that so that it is not a hollow object. I can't figure it out. I need to get an *.stl file for printing. Can anyone help me to get a volume mesh? (I would prefer an easy solution rather than a complex algorithm).
Given an oriented watertight surface mesh, an oracle function can be derived that determines whether a query line segment intersects the surface (and where): shoot a ray from one end-point and use the even-odd rule (after having spatially indexed the faces of the mesh).
Volumetric meshing algorithms can then be applied using this oracle function to tessellate the interior, typically variants of Marching Cubes or Delaunay-based approaches (see 3D Surface Mesh Generation in the CGAL documentation). The initial surface will however not be exactly preserved.
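For illustration, here is a minimal sketch of that even-odd containment test on a triangle mesh, without the spatial index (so it is linear in the number of faces per query); the array layout and function names are assumptions for the example:

```python
import numpy as np

def point_inside_mesh(p, vertices, faces, eps=1e-9):
    """Even-odd test: shoot a ray from p and count triangle crossings.

    vertices: (N, 3) float array, faces: (M, 3) int array of a closed,
    self-intersection-free triangle mesh. Degenerate grazing hits
    (ray through an edge or vertex) are not handled here.
    """
    vertices = np.asarray(vertices, dtype=float)
    p = np.asarray(p, dtype=float)
    direction = np.array([1.0, 0.0, 0.0])       # arbitrary ray direction
    hits = 0
    for i0, i1, i2 in faces:
        v0, v1, v2 = vertices[i0], vertices[i1], vertices[i2]
        # Moller-Trumbore ray/triangle intersection.
        e1, e2 = v1 - v0, v2 - v0
        h = np.cross(direction, e2)
        a = np.dot(e1, h)
        if abs(a) < eps:                        # ray parallel to triangle
            continue
        f = 1.0 / a
        s = p - v0
        u = f * np.dot(s, h)
        if u < 0.0 or u > 1.0:
            continue
        q = np.cross(s, e1)
        v = f * np.dot(direction, q)
        if v < 0.0 or u + v > 1.0:
            continue
        t = f * np.dot(e2, q)
        if t > eps:                             # intersection in front of p
            hits += 1
    return hits % 2 == 1                        # odd number of crossings = inside
```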
To my knowledge, MeshLab supports only surface meshes, so it is unlikely to provide a ready-to-use filter for this. Volume mesher packages should however offer this functionality (e.g. TetGen).
The question is not perfectly clear, so let me try a different interpretation. According to your last sentence:
I need to get an *.stl file for printing
It means that you need a 3D model that is suitable for fabrication on a 3D printer, i.e. a watertight mesh. A watertight mesh is one that defines the interior of a volume unambiguously: it is closed (no boundary), 2-manifold (meaning each edge is shared by exactly two faces), and free of self-intersections.
MeshLab provides tools for visualizing boundaries, non-manifold edges and self-intersections. Correcting them is possible in many different ways (deleting the non-manifold parts and filling the holes, or a more drastic remeshing).

Detect intersection without causing bodies to collide

I want to detect the intersection of two objects (sprites) in my scene. I don't want the object geometric intersection to cause a collision between the bodies in the scene.
I've created PhysicalBody for both of my object shapes, but I can't find a way to detect the intersection without having both bodies hit each other on impact.
I'm using cocos2d-x 3+ with the default chipmunk engine (which I'd like to stick with for now)
The question is: how do I detect the intersection of elements without having them physically push each other when they intersect?
The answer is very simple (though it took me two days to figure it out): when a contact is detected and onContactBegin() is called for the relevant shapes, returning false from that callback stops the physical interaction, while the callback itself still tells you that the shapes intersect.

AS3: How to access pixel data efficiently?

I'm working on a game.
The game requires entities to analyse an image and head towards pixels with specific properties (high red channel, etc.)
I've looked into Pixel Bender, but this only seems useful for writing new colors to the image. At the moment, even at a low resolution (200x200) just one entity scanning the image slows to 1-2 Frames/second.
I'm embedding the image and instancing it as a Bitmap as a child of the stage. The 1-2 FPS situation is using BitmapData.getPixel() (on each pixel) with a distance calculation beforehand.
I'm wondering if there's any way I can do this more efficiently... My first thought was some sort of spatial partitioning coupled with splitting the image up into many smaller pieces.
I also feel like Pixel Bender should be able to help somehow, however I've had little experience with it.
Cheers for any help.
Jonathan
Let us call the pixels which entities head towards "attractors" because they attract the entities.
You describe a low frame rate due to scanning for attractors. This indicates that you may be scanning the image at every frame. You don't specify whether the scanned image is static or changes as frequently as, e.g., a video input. If the image changes every frame, so that you must somehow re-calculate the attractors, then what you are attempting is real-time computer vision on the ABC Virtual Machine; please see below.
If you have an unchanging image, then the most important optimization you can make is to scan the image one time only, then save a summary (or "memoization") of the locations of the attractors. At each rendering frame, rather than scan the entire image, you can search the list or array of known attractors. When the user causes the image to change, you can recalculate from scratch, or update your calculations incrementally -- as you see fit.
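To make the idea concrete, here is a small sketch of that scan-once-then-look-up approach, written in Python/NumPy to show the idea rather than AS3, and assuming you already have the red channel as a 2D array:

```python
import numpy as np

def find_attractors(red_channel, threshold=200):
    """One-time scan: memoize the coordinates of 'attractor' pixels.

    red_channel: HxW array of red values (0-255). Returns an (N, 2) array
    of (x, y) positions whose red value exceeds the threshold.
    """
    ys, xs = np.nonzero(red_channel > threshold)
    return np.column_stack([xs, ys])

def nearest_attractor(entity_pos, attractors):
    """Per-frame lookup: find the attractor closest to one entity."""
    d2 = np.sum((attractors - np.asarray(entity_pos)) ** 2, axis=1)
    return attractors[np.argmin(d2)]

# Usage: scan once (re-run only when the image actually changes) ...
#   attractors = find_attractors(red_of_image)
# ... then every frame, per entity:
#   target = nearest_attractor((ex, ey), attractors)
```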
If you are attempting to do real-time computer vision with ActionScript 3, I suggest you look at the new vector types of Flash 10.1 and also that you look into using either abcsx to write ABC assembly code, or use Adobe's Alchemy to compile C onto the Flash runtime. ABC is the byte code of Flash. In other words, reconsider the use of AS3 for real-time computer vision.
BitmapData has a getPixels method (note the plural). It returns a byte array of all the pixels, which can be iterated much faster than a nested pair of for loops calling getPixel on every pixel. Unfortunately, byte arrays are, as the name implies, one-dimensional arrays of bytes, so iterating pixel by pixel (4 bytes each) requires a for loop rather than a for-each loop. On the other hand, you get each pixel's color channels as individual bytes, which sounds like what you want (finding pixels with a high red channel), so you won't have to bitwise-and each pixel value to isolate a particular channel.
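To make the stride-of-4 iteration concrete, here is the same idea sketched in Python over a flat byte buffer; it assumes the A, R, G, B per-pixel byte order that getPixels appears to use, so double-check the channel offsets in your own setup:

```python
def high_red_pixels(pixel_bytes, width, red_threshold=200):
    """Walk a flat byte buffer with a stride of 4 bytes per pixel.

    pixel_bytes: bytes/bytearray laid out as A, R, G, B per pixel, row by row
                 (assumed ordering; verify against your data).
    Returns (x, y) coordinates of pixels whose red channel exceeds the threshold.
    """
    hits = []
    for i in range(0, len(pixel_bytes), 4):
        red = pixel_bytes[i + 1]          # byte 0 is alpha, byte 1 is red (assumed)
        if red > red_threshold:
            pixel_index = i // 4
            hits.append((pixel_index % width, pixel_index // width))
    return hits
```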
I read somewhere that getPixel is very slow, so that's where I figured you'd save the most. I could be wrong, so it'd be worth timing it.
I would say Heath Hunnicutt's answer is a good one. If the image doesn't change, just store all the color values in a Vector or ByteArray or whatever and use it as a lookup table so you don't need to call getPixel() every frame.