Best way to identify a person without using facial recognition (deep learning) - deep-learning

I have a cctv video where I want to identify a person. I tried both facial recognition and object tracking but both failed to produce high accuracy since the quality of the frame isn't great and the face disappears from the frame sometimes.
I have simplified the problem as much as I can and now thinking about training a YOLOV3 on the person and do object tracking or training on Resnet50 as a classification problem.
I have also looked into re-identification but not sure if it will work in this use case or not.
So the problem now is simplified to given an image of people and objects in hostile environment, how do you find and identify specific person?
thanks

It seems that deep learning is precisely the tool to use for identifying a specific person. And without facial recognition that seems impossible, unless the person wears the same clothes every time and that's your criteria for "specific person".
Consider using Face-API.js -- you provide several photos of the specific person and you can then detect whether they are in a particular image.

If you are still open to use video as input and not a specific frame you can look into person identification through gait.
One example of a deep learning implementation would be:
https://github.com/marian-margeta/gait-recognition

Related

How to create a people tracking with reidentification model?

I am currently working on a project where I want to build a model which can detect and track people with a unique ID. The main issue is when a person leaves the frame and comes back after some time. Currently, I am working with yolov4 and Deepsort to detect and track. But it is failing in this situation.
Please suggest some approach where we can do detection, reidentification and tracking of people or cars or any other object.
Thank you :)
Although YOLOv4 can detect people in an image/video stream, I think it might be too general in your case. When a person leaves the frame and comes back, ideally the model should remember seeing that person before.
One way to tackle this is to train on images of the people you want to detect.
E.g. in a system like yours, you could take multiple images of the people you want to track from different angles and label them using their unique identifiers. Afterwards you could train the model using this data (for your downstream task). This will ideally give more specific results for detecting and tracking people with their unique identifiers as opposed to the general people detection when using YOLOv4 as is..
That said, I understand that taking lots of images of people may not be practical in certain scenarios. In that case you may want to look at techniques that produce accurate results with minimal data such as domain adaptation (https://arxiv.org/abs/1812.11806). However in an application for tracking and detecting people, I'm assuming you want minimal misclassifications.. Hence you could say it's always a tradeoff.
You can find out more about dealing with lack of data in this article: (https://www.kdnuggets.com/2019/06/5-ways-lack-data-machine-learning.html)
However I think this is a better place to start for a re-identification model: (https://github.com/KaiyangZhou/deep-person-reid)
It has ample documentation to get you started..

Semantic segmentation without labels in a single class

I am kind of new to semantic segmentation. I am trying to perform segmentation of images having defects.
I have the defect images annotated using a annotation tool and I created the mask for each image. I wanted to predict If an image has defect and where exactly it is located. But my problem is my defects does not look same in all the images. Example: Defects on steel- Steel breakage, erroded surface etc. I am just trying to classify if the image has defect or not and where it is located. So is it wrong to train the neural network with these all types considered as defects even though not everything lookalike?
I thought to do a binary segmentation of defect to no defect. If I am not correct how can I perform segmentation for defect and non defect images?
You first have to well define your problem and your objectives:
If you only want to detect if your image has a defect or not, it's a binary classification problem and you affect a label (0 or 1) to each image.
If you want to localise the defect approximatively (like a bounding box), it's an object detection problem and it can be realised with one or more classes.
If you want to localise precisely the defect (in order to performe measures for instance) the best is semantic segmentation or instance segmentation.
If you want to classify the defect, you will need to create classes for each defect you want to classify.
There is no magical solution because it depends of the objectives of your project. I can give you the following advices because I made an internship on a similar project :
Look carefully at your data, if you have thousands of images it will take a long to create your semantic segmentation dataset. Be smarter by using data augmentation techniques.
If you want to classify the defects, be sure to have enough defects of each type to train your network. If your network only sees one defect type per epoch, it can't learn to detect it.
Be sure that your network can detect the defects you're providing (not a scratch of two pixels for instance or alignement defects).
Performing semantic segmentation to only knows if there is a defect or not seems overkill because it's a long and complex process (rebuilding the image, memory of intermediaries images in Unet, lot of computations). If you really want to apply this method, you may create a threshold to detect if the number of detected pixels as defect allows to classify the image as 'presenting a defect' or not.
One class should be enough for your use-case. If you want to be able to distinguish between different types of defects though, you could try creating attributes for that class. So the class would be if a pixel has a defect or not, and the attribute would be breakage, eroded pixel, etc. Then you could train a model to detect a crack on the semantic class and another one to identify which type of defect it is.
Make sure to use an annotation tool that supports creating attributes. Personally, I use hasty.ai as their automation assistants are great! But I guess most tools should be able to do so.

Chess Engine - Confusion in engine types - Flash as3

I am not sure this kind of question has been asked before, and been answered, by as far as my search is concerned, I haven't got any answer yet.
First let me tell you my scenario.
I want to develop a chess game in Flash AS3. I have developed the interface. I have coded the movement of the pieces and movement rules of the pieces. (Please note: Only movement rules yet, not capture rules.)
Now the problem is, I need to implement the AI in chess for one player game. I am feeling helpless, because though I know each and every rules of the chess, but applying AI is not simple at all.
And my biggest confusion is: I have been searching, and all of my searches tell me about the chess engines. But I always got confused in two types of engines. One is for front end, and second is real engines. But none specifies (or I might not get it) which one is for which.
I need a API type of some thing, where when I can get searching of right pieces, and move according to the difficulty. Is there anything like that?
Please note: I want an open source and something to be used in Flash.
Thanks.
First of all http://nanochess.110mb.com/archive/toledo_javascript_chess_3.html here is the original project which implements a relatively simple AI (I think it's only 2 steps deep) in JavaScript. Since that was a contest project for minimal code, it is "obfuscated" somewhat by hand-made reduction of the source code. Here's someone was trying to restore the same code to a more or less readable source: https://github.com/bormand/nanochess .
I think that it might be a little bit too difficult to write it, given you have no background in AI... I mean, a good engine needs to calculate more then two steps ahead, but just to give you some numbers: the number of possible moves per step, given all pieces are on the board would be approximately 140 at max, the second step thus would be all the combination of these moves with all possible moves of the opponent and again this much combinations i.e. 140 * 140 * 140. Which means you would need a very good technique to discriminate the bad moves and only try to predict good moves.
As of today, there isn't a deterministic winning strategy for chess (in other words, it wasn't solved by computers, like some other table games), which means, it is a fairly complex game, but an AI which could play at a hobbyist level isn't all that difficult to come up with.
A recommended further reading: http://aima.cs.berkeley.edu/
A Chess Program these days comes in two parts:
The User Interface, which provides the chess board, moves view, clocks, etc.
The Chess Engine, which provides the ability to play the game of chess.
These two programs use a simple text protocol (UCI or XBoard) to communicate with the UI program running the chess engine as a child process and communicating over pipes.
This has several significant advantages:
You only need one UI program which can use any compliant chess engine.
Time to develop the chess engine is reduced as only a simple interface need be provided.
It also means that the developers get to do the stuff they are good at and don't necessarily have to be part of a team in order to get the other bit finished. Note that there are many more chess engines than chess UI's available today.
You are coming to the problem with several disadvantages:
As you are using Flash, you cannot use this two program approach (AFAIK Flash cannot use fork(). exec(), posix_spawn()). You will therefore need to provide all of the solution which you should at least attempt to make multi-threaded so the engine can work while the user is interacting with the UI.
You are using a language which is very slow compared to C++, which is what engines are generally developed in.
You have access to limited system resources, especially memory. You might be able to override this with some setting of the Flash runtime.
If you want your program to actually play chess then you need to solve the following problems:
Move Generator: Generates all legal moves in a position. Some engine implementations don't worry about the "legal" part and prune illegal moves some time later. However you still need to detect check, mate, stalemate conditions at some point.
Position Evaluation: Provide a score for a given position. If you cannot determine if one position is better for one side than another then you have no way of finding winning moves.
Move Tree and pruning: You need to store the move sequences you are evaluating and a way to prune (ignore) branches that don't interest you (normally because you have determined that they are weak). A chess move tree is vast given every possible reply to every possible move and pruning the tree is the way to manage this.
Transpotion table: There are many transpositions in chess (a position reached by moving the pieces in a different order). One method of avoiding the re-evaluation of the position you have already evaluated is to store the position score in a transposition table. In order to do that you need to come up with a hash key for the position, which is normally implemented using Zobrist hash.
The best sites to get more detailed information (I am not a chess engine author) would be:
TalkChess Forum
Chess Programming Wiki
Good luck and please keep us posted of your progress!

Interesting NLP/machine-learning style project -- analyzing privacy policies

I wanted some input on an interesting problem I've been assigned. The task is to analyze hundreds, and eventually thousands, of privacy policies and identify core characteristics of them. For example, do they take the user's location?, do they share/sell with third parties?, etc.
I've talked to a few people, read a lot about privacy policies, and thought about this myself. Here is my current plan of attack:
First, read a lot of privacy and find the major "cues" or indicators that a certain characteristic is met. For example, if hundreds of privacy policies have the same line: "We will take your location.", that line could be a cue with 100% confidence that that privacy policy includes taking of the user's location. Other cues would give much smaller degrees of confidence about a certain characteristic.. For example, the presence of the word "location" might increase the likelihood that the user's location is store by 25%.
The idea would be to keep developing these cues, and their appropriate confidence intervals to the point where I could categorize all privacy policies with a high degree of confidence. An analogy here could be made to email-spam catching systems that use Bayesian filters to identify which mail is likely commercial and unsolicited.
I wanted to ask whether you guys think this is a good approach to this problem. How exactly would you approach a problem like this? Furthermore, are there any specific tools or frameworks you'd recommend using. Any input is welcome. This is my first time doing a project which touches on artificial intelligence, specifically machine learning and NLP.
The idea would be to keep developing these cues, and their appropriate confidence intervals to the point where I could categorize all privacy policies with a high degree of confidence. An analogy here could be made to email-spam catching systems that use Bayesian filters to identify which mail is likely commercial and unsolicited.
This is text classification. Given that you have multiple output categories per document, it's actually multilabel classification. The standard approach is to manually label a set of documents with the classes/labels that you want to predict, then train a classifier on features of the documents; typically word or n-gram occurrences or counts, possibly weighted by tf-idf.
The popular learning algorithms for document classification include naive Bayes and linear SVMs, though other classifier learners may work too. Any classifier can be extended to a multilabel one by the one-vs.-rest (OvR) construction.
A very interesting problem indeed!
On a higher level, what you want is summarization- a document has to be reduced to a few key phrases. This is far from being a solved problem. A simple approach would be to search for keywords as opposed to key phrases. You can try something like LDA for topic modelling to find what each document is about. You can then search for topics which are present in all documents- I suspect what will come up is stuff to do with licenses, location, copyright, etc. MALLET has an easy-to-use implementation of LDA.
I would approach this as a machine learning problem where you are trying to classify things in multiple ways- ie wants location, wants ssn, etc.
You'll need to enumerate the characteristics you want to use (location, ssn), and then for each document say whether that document uses that info or not. Choose your features, train your data and then classify and test.
I think simple features like words and n-grams would probably get your pretty far, and a dictionary of words related to stuff like ssn or location would finish it nicely.
Use the machine learning algorithm of your choice- Naive Bayes is very easy to implement and use and would work ok as a first stab at the problem.

Order-issuing neural network?

I'm interested in writing certain software that uses machine learning, and performs certain actions based on external data.
However I've run into problem (that was always interesting to me) -
how is it possible to write machine learning software that issues orders or sequences of orders?
The problem is that as I understand it, neural network gets bunch on inputs, and "recalls" output based on results of previous trainings. Instantly (well, more or less). So I'm not sure how "issuing orders" could fit into that system, especially when actions performed by system affect the system with certain delay. I'm also a bit unsure how is it possible to train this thing.
Examples of such system:
1. First person shooter enemy controller. As I understand it, it is possible to implement neural network controller for the bot that will switch bot behavior strategies(well, assign priorities to them) based on some inputs (probably something like health, ammo, etc). But I don't see a way to make higher-order controller, that could issue sequence of commands like "go there, then turn left". Also, bot's actions will affect variables that control bot's behavior. I.e. shooting reduces ammo, falling from heights reduces health, etc.
2. Automated market trader. It is certainly possible to make system that will try to predict the next market price of something. However, I don't see how is it possible to make system that would issue order to buy something, watch the trend, then sell it back to gain profit/cover up losses.
3. Car driver. Again, (as I understand it) it is possible to make system that will maintain desired movement vector based on position/velocity/torque data and results of previous training. However I don't see a way to make such system (learn to) perform sequence of actions.
I.e. as I understood it, neural net is technically a matrix - you give it input, it produces output. But what about generating sequences of actions that could change environment program operates in?
If such tasks are not entirely suitable for neural networks, what else could be used?
P.S. I understand that the question isn't exactly clear, and I suspect that I'm missing some knowledge. So I'll appreciate some pointers (i.e. books/resources to read, etc).
You could try to connect the output neurons to controllers directly, e.g. moving forward, turning, or shooting in the ego shooter, or buying orders for the trader. However, I think that the best results are gained nowadays when you let the neural net solve one rather specific subproblem, and then let a "normal" program interpret its answer. For example, you could let the neural net construct a map overlay of "where do I want to be", which the bot then translates into movements. The neural network for the trader could produce a "how much do I want which paper", which the bot then translates into buying or selling orders.
The decision which subproblem should be solved by a neural network is a very central one for its design. The important thing is that good solutions can be taught to the neural network.
Edit: Expanding this in the examples: When the ego shooter bot gets shot, it should not have wanted to be there; when it gets to shoot someone else, it should have wanted to be there more. When the trader loses money from a paper, it should have wanted it less before; if it gains, it should have wanted it more. These things can be taught.
The problem you are describing is known as Reinforcement Learning. Reinforcement learning is essentially a machine learning algorithm (such as a neural network) coupled with a controller. It has been used for all of the applications you mention, even to drive real cars.