Custom data labeling and training - YOLOv5 detection problem - yolov5

I am doing some cat poop research, and I tried to use YOLOv5 to detect different types of poop in the litter box. I collected about 130 poop pictures (just poop close-ups with no background), labeled them and used Roboflow to get the annotations, then followed the Colab notebook to train on the data and got the best.pt file for detection. When I run detection on a random litter box picture, the rectangle just marks the whole image or half of the image instead of marking the poops in that image.
Then I tried labeling 3 litter box images (marking the poops inside each litter box image) and did it all over again. But when I run detection on a litter box image, nothing happens. I am so confused. Is it because poop shapes and colors are so different from one another that the detection didn't work?
Could anyone give me some clues on how to label the images and train on them?
Thank you

First, I must say that your project is interesting and funny as well, no offence.
Your problem is most likely due to the number of training images. We can't expect the model to detect reliably after training it with only 130 images. The usual recommendation is at least 1,500 images per class.
And some tips for labelling images in Roboflow (the sketch after these tips shows the label format the boxes end up in):
Draw a box that includes all the parts of your object of interest. Don't leave any areas out.
Try to avoid overlapping boxes.
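For reference, here is a minimal sketch of the YOLO label format that Roboflow exports: one .txt file per image, one line per object, all coordinates normalized to the image size. The helper function and the example box are mine, just for illustration.

```python
def to_yolo_line(class_id, box, img_w, img_h):
    """Convert a pixel-space box (x_min, y_min, x_max, y_max) into one
    YOLO label line: <class_id> <x_center> <y_center> <width> <height>,
    all normalized to [0, 1]."""
    x_min, y_min, x_max, y_max = box
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"

# e.g. one poop at pixels (120, 340)-(210, 420) in a 640x640 litter-box photo:
print(to_yolo_line(0, (120, 340, 210, 420), 640, 640))
# -> "0 0.257812 0.593750 0.140625 0.125000"
```

Checking a few of your exported .txt files against this format is a quick way to rule out a coordinate or class-id mix-up, which can produce exactly the "box covers the whole image" symptom you describe.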

Related

Locate/Extract Patches from an Image

I have an image (e.g. 60x60) with multiple items inside it. The items are square boxes, say 4x4 in size, and are randomly placed within the image. The boxes (items) themselves are created with random patterns: some random pixels switched on and others switched off. So it could be the same box repeated twice (or more, in the case of more than 2 items) in the image, or the boxes could be entirely different.
I'm looking to create a deep learning model that could take in the original image (60x60) and output all the patches in the image.
This is all I have for now, but I can definitely share more details as the discussion starts. I'd be interested to weigh in different options that can help me achieve this objective. Thanks.
I would solve this using object detection. First, I would train a network to detect those box-like objects by cutting out patches of those objects. Then I would run a Faster R-CNN or something similar on it.
You might want to take a look at the Stanford lecture on detection (slides here: http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture11.pdf).
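If you go the Faster R-CNN route, a minimal torchvision sketch might look like the following; the two classes (background plus one "patch" class) and the input size are assumptions about your setup.

```python
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# Start from a COCO-pretrained detector and swap the classification head
# for 2 classes: background + "patch".
model = fasterrcnn_resnet50_fpn(weights="DEFAULT")
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes=2)

# (Fine-tune on your labeled patches here, then run inference.)
model.eval()
image = torch.rand(3, 60, 60)  # stand-in for one 60x60 input image
with torch.no_grad():
    prediction = model([image])[0]  # dict with "boxes", "labels", "scores"
print(prediction["boxes"])
```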

How to remove graphics from a scanned document before passing it to Tesseract for OCR?

I'm working on an OCR project, but I don't know how to remove graphics from the scanned document image before passing it to Tesseract.
Some scanned documents from which I want to remove the graphics are below:
http://www.mediafire.com/view/hvmpty2z3cw3vao/IMG_0087.JPG
http://www.mediafire.com/view/1sgy5s2aaj2o8y3/IMG_0086.JPG
Any advice is much appreciated. Many thanks.
As the text areas are usually sparse and do not connect to each other, you may consider running Sobel edge detection on the original image and taking the biggest connected area above some threshold as the graphic area.
Meanwhile, as the graphic is a rectangular area, another way is to apply a Hough transform to detect the straight lines that form a rectangle of 4 lines. If you go this way, it's recommended that you downscale the image first to reduce the computational cost.
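A rough OpenCV sketch of the first idea; the file name, Canny thresholds and kernel size are placeholders, and the assumption is that the graphic forms the largest connected blob of edges on the page.

```python
import cv2
import numpy as np

img = cv2.imread("IMG_0087.JPG", cv2.IMREAD_GRAYSCALE)

# Edge map (Canny here; a Sobel magnitude would work too), dilated so the
# dense edges of the graphic merge into one big connected component.
edges = cv2.Canny(img, 50, 150)
edges = cv2.dilate(edges, np.ones((5, 5), np.uint8), iterations=2)

num, labels, stats, _ = cv2.connectedComponentsWithStats(edges)
largest = 1 + int(np.argmax(stats[1:, cv2.CC_STAT_AREA]))  # label 0 is background
x, y, w, h = stats[largest, :4]

img[y:y + h, x:x + w] = 255  # white out the graphic region before OCR
cv2.imwrite("no_graphics.jpg", img)
```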
You can start by detecting text areas using an algorithm available in AForge.Net. See HorizontalRunLengthSmoothing and VerticalRunLengthSmoothing. The algorithm is not very complicated, and you can easily implement it using your favorite image processing library; a sketch follows below. The only constraint is that you need to know the approximate size of the characters in your images.
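For illustration, a small NumPy sketch of horizontal run-length smoothing; max_gap plays the role of the approximate character size mentioned above, and the vertical variant is the same idea applied to columns.

```python
import numpy as np

def horizontal_rls(binary, max_gap):
    """Run-length smoothing along rows: fill runs of background (0) pixels
    that are shorter than max_gap and lie between foreground (1) pixels,
    so neighboring characters merge into solid text blocks."""
    out = binary.copy()
    for row in out:                      # each row is a view into `out`
        fg = np.flatnonzero(row)         # column indices of foreground pixels
        if fg.size < 2:
            continue
        for start, step in zip(fg[:-1], np.diff(fg)):
            if 1 < step <= max_gap:      # step of 1 means adjacent pixels
                row[start + 1:start + step] = 1
    return out

# Usage on a 0/1 binarized page image:
# smoothed = horizontal_rls(binarized_page, max_gap=20)
```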

Programmatically create background images in Flex 3

I'm developing a visualization for certain parts of a warehouse with Flex 3. In this visualization there are a lot of blocks on which 1 to x pallets can be placed, where x is between 9 and 15. I need to represent each pallet with a black square, each place that is already assigned to a pallet but not physically occupied with a grey square, and each free place with a white square. I first thought of just using a canvas for each place on a block and changing its color when the state changes. But the hundreds of canvases that result from this approach are not updated quickly enough for my purposes (the screen freezes for a few seconds).
I don't want to use embedded images because of the great number of images I would have to embed in the application (those images appear in 4 orientations).
My idea was to create a background image that reflects the state of the whole block, generated only when that state is actually needed, and to cache these images so that the computation time is spread over the whole runtime.
My problem now is that I don't know how to create them in a way that lets me use them as backgroundImages. As far as I understand, I would need them as a class object, but I don't know how to achieve that without embedding the images.
I'm of course open to better approaches to solve my problem. Thanks for your support.
I would suggest using the Graphics property of a Sprite, for example. It provides a basic drawing API for lines, circles and rectangles.
Besides, you can draw bitmap images onto the Graphics to produce more advanced results.

How to find pixel co-ordinates of corners of a square pattern?

This may not be programming related, but programmers would possibly be in the best position to answer it.
For camera calibration I have an 8 x 8 square pattern printed on a sheet of paper. I have to manually enter the corner co-ordinates into a text file. The software then picks them up from there and computes the calibration parameters.
Is there a script or some software that I can run on these images and get the pixel co-ordinates of the 4 corners of each of the 64 squares?
You can do this with a traditional chessboard pattern (i.e. black and white squares with no gaps) using cvFindChessboardCorners(). You can read more about the function in the OpenCV API Reference and see some sample code in O'Reilly's OpenCV Book or elsewhere online. As an added bonus, OpenCV has built-in functions that calculate the intrinsic parameters of the camera and an array of extrinsic parameters for the multiple views of a planar calibration object.
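In OpenCV's Python API the same function is cv2.findChessboardCorners; a minimal sketch follows. The file name is a placeholder, and note that the function looks for inner corners, so an 8 x 8-squares board gives a 7 x 7 corner grid.

```python
import cv2

img = cv2.imread("pattern.jpg", cv2.IMREAD_GRAYSCALE)
pattern_size = (7, 7)  # inner corners of an 8x8-squares chessboard

found, corners = cv2.findChessboardCorners(img, pattern_size)
if found:
    # Refine to sub-pixel accuracy before writing the coordinates out.
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001)
    corners = cv2.cornerSubPix(img, corners, (11, 11), (-1, -1), criteria)
    for x, y in corners.reshape(-1, 2):
        print(f"{x:.2f} {y:.2f}")
```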
I would:
apply a threshold and get a binarized image.
apply a Sobel X filter to the image. You get an image with the vertical lines; these belong to the sides of the squares that are almost vertical. Keep this as image1.
apply a Sobel Y filter to the image. You get an image with the horizontal lines; these belong to the sides of the squares that are almost horizontal. Keep this as image2.
compute (image1 AND image2). Since a corner is exactly where a near-vertical side meets a near-horizontal side, you get a black image with white pixels indicating the corner positions. A sketch follows below.
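A sketch of those steps with OpenCV and NumPy; the file name is a placeholder, and each corner comes out as a small cluster of pixels rather than a single point.

```python
import cv2
import numpy as np

img = cv2.imread("pattern.jpg", cv2.IMREAD_GRAYSCALE)
_, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)

image1 = np.abs(cv2.Sobel(binary, cv2.CV_64F, 1, 0)) > 0  # near-vertical sides
image2 = np.abs(cv2.Sobel(binary, cv2.CV_64F, 0, 1)) > 0  # near-horizontal sides

corners = image1 & image2  # both filters respond where two sides meet
ys, xs = np.nonzero(corners)
print(list(zip(xs.tolist(), ys.tolist())))  # candidate corner coordinates
```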
Hope it helps.
I'm sure there are many computer vision libraries with varying capabilities and licenses out there, but one that I can remember off the top of my head is ARToolKit, which should be able to recognize this pattern. And if that's not possible, it comes with a set of very good patterns that are tailored so that they can be recognized even if they're partially obscured.
I don't know ARToolKit (although I've heard a lot about it), but with OpenCV this processing is trivial.

Idea for a morphing CAPTCHA

I've been thinking of a dynamic way of creating a CAPTCHA that uses morphing shapes or dynamic colors.
My first idea is to have a graphic, Flash or something, that gradually changes from, say, a square into a sphere. The user will be required to click a button when it becomes spherical enough.
Second idea is to have an area of color that slowly changes from, say, red to blue, and the user will be required to press a button when it becomes blue enough.
Third idea is a combination of both methods.
I'd say the difficulty will be to match the clicks with the transitions. But it should be hard for automated code to detect shades or shapes.
Could people please offer some comments on my idea?
edit -
Thanks for the feedback. I'm now considering using Flash-based playback of a server-fed video feed of a few colored shapes that morph into other colored shapes. The user will be required to pause the feed when the colors and shapes match some canned question, such as: click on the video when you see two green squares turn into three blue triangles. The shapes will be amongst other overlapping, moving, morphing shapes. Fun for the whole family!
Color is a bad idea as (a) it's very easy for a computer to detect and (b) very hard for some humans (the color blind) to detect. Even if you're OK with denying access to the disabled, you'd have to worry about different monitors, systems, lighting conditions, etc. giving rise to different color perceptions.
How hard do you think it is for a computer to compare the red component and the blue component of a pixel (or averaged over several pixels)? Trivial. So this isn't a problem for a computer.
Similarly, it isn't that hard to program the difference between a square and a circle. One has straight lines, the other doesn't!
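To make the point concrete, "wait until it is blue enough" falls to a few lines of Python; the screenshot file name and the threshold are hypothetical.

```python
from PIL import Image
import numpy as np

# Average the red and blue channels of a captured frame of the CAPTCHA.
frame = np.asarray(Image.open("captcha_frame.png").convert("RGB"), dtype=float)
mean_red = frame[..., 0].mean()
mean_blue = frame[..., 2].mean()

if mean_blue > 1.5 * mean_red:  # arbitrary "blue enough" rule
    print("click now")
```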
Good idea; you could also make the shapes keep turning or moving.
I don't know if it would be safer than a regular letter CAPTCHA, though.
I'm not sure why you think color would be any harder to detect than text. Shapes possibly, but they would have to be more complex than n-sided polygons. The gradual animation is a good idea, however. But if you can code it to be shown, someone can code something that watches it.
The real test is to prove humanness by identifying semantic meanings, rather than syntactic meanings.
For instance, show pictures of animals and make the user click when a bird shows up. Or just say "click on the thing that can fly" and show some pictures of animals. This would be rather unbeatable by a machine until all the images had been cataloged. The trouble with CAPTCHAs, of course, is that they try to express semantics with syntax, defeating themselves from the outset.
You're on the right track, and I'm sure your proofs of concept are interesting. But remember: made by a computer, solved by a computer.
Although these ideas will almost certainly work, it's a security-through-obscurity effect. Classic CAPTCHA images are "one-way" in that the correct answer can't (theoretically) be deduced by a computer. The problem with saying "click here when the image turns blue" is that a computer could easily do this, if somebody considered the stakes to be worth developing a program for.
Additionally, unusual CAPTCHAs will force your users to think. Depending on your audience, this may mean losing some users.
I did a fair bit of research when developing a CAPTCHA system, and the classic method of printing text to an image seems to be the most effective. The trick is not in having lots of "background noise" behind the text, or in different colours. It's about the following two things:
1) Random text kerning, with most or all letters slightly overlapping each other.
2) Random distortion, translation and rotation of the text.
If you have a look at Google's CAPTCHA, they pretty much use only those two features: https://www.google.com/accounts/NewAccount?service=mail
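For what it's worth, a small Pillow sketch of those two tricks; the font path, sizes and random ranges are arbitrary choices.

```python
import random
from PIL import Image, ImageDraw, ImageFont

text = "x7Kp3"
font = ImageFont.truetype("DejaVuSans.ttf", 36)  # any TTF font on your system
img = Image.new("L", (220, 70), 255)
draw = ImageDraw.Draw(img)

x = 10
for ch in text:
    # 1) random kerning: draw each letter with vertical jitter, then advance
    #    by less than the glyph width so neighboring letters overlap.
    draw.text((x, random.randint(5, 20)), ch, font=font, fill=0)
    x += int(draw.textlength(ch, font=font) * random.uniform(0.6, 0.85))

# 2) random distortion: here just a small rotation of the whole image;
#    per-letter rotation and warping would be stronger still.
img = img.rotate(random.uniform(-8.0, 8.0), expand=True, fillcolor=255)
img.save("captcha.png")
```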