How to create a people tracking with reidentification model? - deep-learning

I am currently working on a project where I want to build a model that can detect and track people, assigning each a unique ID. The main issue arises when a person leaves the frame and comes back after some time. Currently, I am using YOLOv4 and DeepSORT to detect and track, but this setup fails in that situation.
Please suggest some approach where we can do detection, reidentification and tracking of people or cars or any other object.
Thank you :)

Although YOLOv4 can detect people in an image/video stream, I think it might be too general in your case. When a person leaves the frame and comes back, ideally the model should remember seeing that person before.
One way to tackle this is to train on images of the people you want to detect.
For example, in a system like yours, you could take multiple images of the people you want to track from different angles and label them with their unique identifiers. Afterwards you could train the model on this data (for your downstream task). Ideally this gives more specific results for detecting and tracking people by their unique identifiers, as opposed to the general person detection you get when using YOLOv4 as is.
That said, I understand that taking lots of images of people may not be practical in certain scenarios. In that case you may want to look at techniques that produce accurate results with minimal data, such as domain adaptation (https://arxiv.org/abs/1812.11806). However, in an application for tracking and detecting people, I'm assuming you want minimal misclassifications, so it is always a tradeoff between data requirements and accuracy.
You can find out more about dealing with lack of data in this article: (https://www.kdnuggets.com/2019/06/5-ways-lack-data-machine-learning.html)
However I think this is a better place to start for a re-identification model: (https://github.com/KaiyangZhou/deep-person-reid)
It has ample documentation to get you started.
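To make the "remember seeing that person before" idea concrete, here is a minimal sketch of the matching step only. It assumes you already have an appearance embedding for each detected person crop (for instance from a re-identification model such as the one linked above); the `ReidGallery` class, the 3-dimensional vectors, and the 0.8 threshold are all hypothetical and chosen for illustration, not taken from any library.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class ReidGallery:
    """Maps persistent IDs to the appearance embeddings seen so far."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold  # hypothetical value; tune on your own data
        self.embeddings = {}        # person_id -> list of embeddings
        self.next_id = 0

    def identify(self, embedding):
        """Return the ID of the best match, or register a new ID."""
        best_id, best_score = None, -1.0
        for person_id, stored in self.embeddings.items():
            score = max(cosine_similarity(embedding, e) for e in stored)
            if score > best_score:
                best_id, best_score = person_id, score
        if best_id is not None and best_score >= self.threshold:
            self.embeddings[best_id].append(embedding)
            return best_id
        new_id = self.next_id
        self.next_id += 1
        self.embeddings[new_id] = [embedding]
        return new_id


# Made-up embeddings; real ones would have hundreds of dimensions.
gallery = ReidGallery(threshold=0.8)
first = gallery.identify([0.9, 0.1, 0.0])     # new person
other = gallery.identify([0.0, 0.2, 0.95])    # a different person
again = gallery.identify([0.85, 0.15, 0.05])  # first person re-enters the frame
print(first, other, again)
```

The tracker would call `identify` whenever a track is created, so a person who re-enters the frame gets their old ID back instead of a fresh one. The gallery is kept in memory and never pruned here, which a real system would need to handle.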

Related

Best way to identify a person without using facial recognition (deep learning)

I have a cctv video where I want to identify a person. I tried both facial recognition and object tracking but both failed to produce high accuracy since the quality of the frame isn't great and the face disappears from the frame sometimes.
I have simplified the problem as much as I can, and I am now thinking about either training YOLOv3 on the person and doing object tracking, or training ResNet50 and treating it as a classification problem.
I have also looked into re-identification, but I am not sure whether it will work in this use case.
So the problem is now simplified to this: given an image of people and objects in a hostile environment, how do you find and identify a specific person?
thanks
It seems that deep learning is precisely the tool to use for identifying a specific person. And without facial recognition that seems impossible, unless the person wears the same clothes every time and that's your criterion for "specific person".
Consider using Face-API.js -- you provide several photos of the specific person and you can then detect whether they are in a particular image.
If you are still open to use video as input and not a specific frame you can look into person identification through gait.
One example of a deep learning implementation would be:
https://github.com/marian-margeta/gait-recognition

Combine two views of an image to strengthen the output of a DL model

I have been trying this problem for weeks but to no avail.
My problem is:
Deep Learning Model has the following information:
INPUT: Sequence of Images
OUTPUT: What is happening in the images, i.e. which activity is taking place, chosen from a set of 10 activities.
I have two cameras recording the same activity from two views, how could I combine those two views to improve the accuracy?
I think you should use DELF features, extract features of both similar images and combine them.
How to combine the two views is fully dependent on your understanding of the problem. Let me give you two different examples,
CASE I: when you review your training data, you can easily tell which camera is better for some of the data, e.g. one camera may capture everything useful while the other may not due to possible occlusions (note: I am not saying one camera is always better than the other). In this case, you may use a late fusion technique that fuses only the two resulting features representing the sequences from the two cameras.
CASE II: it is difficult for you to tell which camera is better. This basically indicates that you may not see a large performance boost after you consider both cameras, though maybe some small improvement.
Finally, when you say two cameras, is it possible for you to do something like binocular stereo vision? In that case, you may obtain extra depth information that is not available from any single camera and that may be helpful for the recognition task.
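As a rough illustration of late fusion, the sketch below averages the per-class probabilities produced by two separately trained models, one per camera. This is the simplest score-level variant; fusing intermediate features, as described in CASE I, follows the same pattern but concatenates feature vectors before a shared classifier. The function names, the three-class vectors (instead of your 10 activities), and the weights are made up for illustration.

```python
def late_fusion(probs_a, probs_b, weight_a=0.5):
    """Weighted average of two per-class probability vectors."""
    if len(probs_a) != len(probs_b):
        raise ValueError("both views must score the same classes")
    weight_b = 1.0 - weight_a
    return [weight_a * pa + weight_b * pb for pa, pb in zip(probs_a, probs_b)]


def predict(probs):
    """Index of the highest-scoring class."""
    return max(range(len(probs)), key=lambda i: probs[i])


# Hypothetical softmax outputs for 3 activities from each camera's model.
camera_1 = [0.40, 0.45, 0.15]   # partly occluded view, unsure
camera_2 = [0.70, 0.20, 0.10]   # clear view, confident
fused = late_fusion(camera_1, camera_2)
print(predict(camera_1), predict(camera_2), predict(fused))
```

If you know from your training data that one camera is usually more reliable, `weight_a` can be shifted towards it, or learned on a validation set.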

Chess Engine - Confusion in engine types - Flash as3

I am not sure whether this kind of question has been asked and answered before, but as far as my search goes, I haven't found an answer yet.
First let me tell you my scenario.
I want to develop a chess game in Flash AS3. I have developed the interface. I have coded the movement of the pieces and movement rules of the pieces. (Please note: Only movement rules yet, not capture rules.)
Now the problem is that I need to implement the AI for a one-player game. I am feeling helpless, because although I know each and every rule of chess, applying AI is not simple at all.
And my biggest confusion is this: I have been searching, and all of my searches tell me about chess engines. But I always get confused by two types of engines. One is for the front end, and the second is the real engine. But nothing specifies (or I might not get it) which one is for which.
I need some kind of API that can search for the right piece to move and choose a move according to the difficulty level. Is there anything like that?
Please note: I want an open source and something to be used in Flash.
Thanks.
First of all, here is the original project, which implements a relatively simple AI (I think it's only 2 steps deep) in JavaScript: http://nanochess.110mb.com/archive/toledo_javascript_chess_3.html . Since that was a contest project for minimal code, it is "obfuscated" somewhat by hand-made reduction of the source code. Here someone has tried to restore the same code to a more or less readable source: https://github.com/bormand/nanochess .
I think it might be a little too difficult to write, given that you have no background in AI. A good engine needs to calculate more than two steps ahead. Just to give you some numbers: the number of possible moves per step, with all pieces on the board, would be approximately 140 at most. The second step would then be every combination of these moves with all possible replies by the opponent, and the third step multiplies again, i.e. 140 * 140 * 140 combinations. This means you need a very good technique to discard the bad moves and only explore the promising ones.
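One standard technique for skipping bad lines without examining them is alpha-beta pruning on top of minimax search. The sketch below is not from the linked project; it runs on a tiny hand-made game tree (nested lists whose leaves are position scores) and counts how many leaves each search has to evaluate. The tree and the counter argument are purely illustrative.

```python
def minimax(node, maximizing, counter):
    """Plain minimax: visits every leaf."""
    if isinstance(node, (int, float)):
        counter[0] += 1
        return node
    values = [minimax(child, not maximizing, counter) for child in node]
    return max(values) if maximizing else min(values)


def alphabeta(node, maximizing, counter,
              alpha=float("-inf"), beta=float("inf")):
    """Minimax with alpha-beta pruning: same result, fewer leaves."""
    if isinstance(node, (int, float)):
        counter[0] += 1
        return node
    if maximizing:
        best = float("-inf")
        for child in node:
            best = max(best, alphabeta(child, False, counter, alpha, beta))
            alpha = max(alpha, best)
            if alpha >= beta:   # opponent would never allow this line
                break
        return best
    best = float("inf")
    for child in node:
        best = min(best, alphabeta(child, True, counter, alpha, beta))
        beta = min(beta, best)
        if alpha >= beta:       # we already have a better option elsewhere
            break
    return best


# Three candidate moves, each with three possible replies (leaf = score).
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
full_count, pruned_count = [0], [0]
full_value = minimax(tree, True, full_count)
pruned_value = alphabeta(tree, True, pruned_count)
print(full_value, full_count[0], pruned_value, pruned_count[0])
```

Both searches agree on the best score, but the pruned search looks at fewer leaves. On a real chess tree with a good move ordering the saving grows dramatically with depth, which is what makes looking more than two steps ahead feasible.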
As of today, there isn't a deterministic winning strategy for chess (in other words, it wasn't solved by computers, like some other table games), which means, it is a fairly complex game, but an AI which could play at a hobbyist level isn't all that difficult to come up with.
A recommended further reading: http://aima.cs.berkeley.edu/
A Chess Program these days comes in two parts:
The User Interface, which provides the chess board, moves view, clocks, etc.
The Chess Engine, which provides the ability to play the game of chess.
These two programs communicate using a simple text protocol (UCI or XBoard): the UI program runs the chess engine as a child process and talks to it over pipes.
This has several significant advantages:
You only need one UI program which can use any compliant chess engine.
Time to develop the chess engine is reduced as only a simple interface need be provided.
It also means that the developers get to do the stuff they are good at and don't necessarily have to be part of a team in order to get the other bit finished. Note that there are many more chess engines than chess UI's available today.
You are coming to the problem with several disadvantages:
As you are using Flash, you cannot use this two-program approach (AFAIK Flash cannot use fork(), exec(), or posix_spawn()). You will therefore need to provide the whole solution yourself, and you should at least attempt to make it multi-threaded so the engine can work while the user is interacting with the UI.
You are using a language which is very slow compared to C++, which is what engines are generally developed in.
You have access to limited system resources, especially memory. You might be able to override this with some setting of the Flash runtime.
If you want your program to actually play chess then you need to solve the following problems:
Move Generator: Generates all legal moves in a position. Some engine implementations don't worry about the "legal" part and prune illegal moves some time later. However you still need to detect check, mate, stalemate conditions at some point.
Position Evaluation: Provide a score for a given position. If you cannot determine if one position is better for one side than another then you have no way of finding winning moves.
Move Tree and pruning: You need to store the move sequences you are evaluating and a way to prune (ignore) branches that don't interest you (normally because you have determined that they are weak). A chess move tree is vast given every possible reply to every possible move and pruning the tree is the way to manage this.
Transposition table: There are many transpositions in chess (the same position reached by moving the pieces in a different order). One method of avoiding the re-evaluation of a position you have already evaluated is to store the position score in a transposition table. In order to do that you need to come up with a hash key for the position, which is normally implemented using Zobrist hashing.
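The transposition table idea from the last item can be sketched in a few lines. This is a simplified illustration, not code from any engine: it hashes only piece placement and ignores side to move, castling rights, and en passant, which a real Zobrist key also includes. The square numbering (a1 = 0 up to h8 = 63), the seed, and the stored score are assumptions made for the example.

```python
import random

PIECES = "PNBRQKpnbrqk"     # white and black piece letters
rng = random.Random(2024)   # fixed seed so the keys are reproducible
# One random 64-bit number per (piece, square) pair.
ZOBRIST = {(p, sq): rng.getrandbits(64) for p in PIECES for sq in range(64)}


def position_key(position):
    """XOR together the keys of every (piece, square) in the position."""
    key = 0
    for square, piece in position.items():
        key ^= ZOBRIST[(piece, square)]
    return key


def move_piece(key, piece, from_sq, to_sq):
    """Incrementally update a key for a non-capturing move."""
    return key ^ ZOBRIST[(piece, from_sq)] ^ ZOBRIST[(piece, to_sq)]


# White knights on b1 (1) and g1 (6), kings on e1 (4) and e8 (60).
start = {1: "N", 6: "N", 4: "K", 60: "k"}
key0 = position_key(start)

# Nf3 then Nc3, versus Nc3 then Nf3: same final position.
order_a = move_piece(move_piece(key0, "N", 6, 21), "N", 1, 18)
order_b = move_piece(move_piece(key0, "N", 1, 18), "N", 6, 21)

transposition_table = {order_a: 0.3}   # hypothetical evaluation score
print(order_a == order_b, transposition_table.get(order_b))
```

Because XOR is its own inverse, moving a piece only needs two XOR operations on the existing key, so the engine never has to rehash the whole board during search.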
The best sites to get more detailed information (I am not a chess engine author) would be:
TalkChess Forum
Chess Programming Wiki
Good luck and please keep us posted of your progress!

How to partition a problem into smaller understandable portions?

I'm not sure if it's possible to give general advice on this topic, but please try. It's hard to explain my case because it's too complex to explain. And that's exactly the problem.
I seem to constantly stumble on a situation where I try to design some part of my project, but it has so many things to take into consideration that I'm unable to get a grasp of it.
Are there any general tips or advice on how to look at my system in smaller pieces at a time? How to find smaller portions that could be designed separately on their own?
Create a glossary.
In other words, identify the terms that are meaningful to the project domain — not from the programmer's point of view, but from a user's, who is familiar with the subject matter.
Then define the terms as precisely and discretely as you can. A good definition in this form can serve as a kind of pseudocode.
Since you have not identified even the domain of your problem, I'll choose a random example. In a civilian personnel system, you might have terms like:
billet: a term of service (from start date to end date) at a particular grade and step
employee: a series of billets associated with a particular SSN
grade and step: row and column in the federal general schedule
And so on. This isn't to identify functional units, as it sounds like you are trying to do, but it's a good preparatory step before doing so, so that you can express your functional steps in well-defined terms.
Your key goals are:
High cohesion: Code (methods, fields, classes) within one piece/module/partition should interact intensively; it should make sense for these elements to know about each other. If you find that some of them don't interact much with the rest, they probably belong somewhere else or should form their own partition. If you find code outside interacting intensively with the partition and knowing too much about its inner workings, it probably belongs inside. The typical example is found in OO code written in procedural style, with "dumb" data objects and "manager" code that operates on them but should really be part of the data objects.
Loose coupling: Interaction between pieces/modules/partitions should only happen through narrow, well-defined, well-documented APIs. Try to identify such APIs and see what code is needed to implement them and what code will use them.
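A small sketch may make the two goals more tangible. All names here (`ReportStore`, `InMemoryStore`, `Order`, `archive_order`) are invented for illustration and are not tied to any particular domain. `Order` keeps its data and the logic that needs that data together (cohesion), while `archive_order` only knows about a narrow two-method API (coupling).

```python
from typing import Protocol


class ReportStore(Protocol):
    """Narrow API: the only thing callers are allowed to rely on."""

    def save(self, name: str, body: str) -> None: ...

    def load(self, name: str) -> str: ...


class InMemoryStore:
    """One possible implementation; its dict is a private detail."""

    def __init__(self) -> None:
        self._items = {}

    def save(self, name: str, body: str) -> None:
        self._items[name] = body

    def load(self, name: str) -> str:
        return self._items[name]


class Order:
    """High cohesion: prices and the code that totals them live together,
    instead of a "dumb" data object plus an external manager."""

    def __init__(self, prices: list) -> None:
        self._prices = list(prices)

    def total(self) -> float:
        return sum(self._prices)

    def summary(self) -> str:
        return f"{len(self._prices)} items, total {self.total():.2f}"


def archive_order(order: Order, store: ReportStore, name: str) -> None:
    """Loose coupling: depends on the ReportStore API, not a concrete class."""
    store.save(name, order.summary())


store = InMemoryStore()
archive_order(Order([9.99, 5.01]), store, "order-1")
print(store.load("order-1"))
```

Swapping `InMemoryStore` for a file or database implementation would not require touching `Order` or `archive_order`, which is a practical test of whether a partition boundary is in the right place.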
It's useful to approach problem decomposition both top-down and bottom-up.
If you're having trouble splitting a big problem into two or more smaller problems, try to think of the smallest possible problems that will need to be solved. Once those are handled, you may start to see ways to combine them into larger problems as you approach your original large problem.
When I find myself copying and pasting chunks of code with minimal adjustments I realize that's a "partition" and then create a class, method, function, or whatever.
Actually, this is what the whole object-oriented approach is about. Try thinking of your application as tangible things that do stuff. Write pseudocode describing what the things are and what they do; I find lots of "partitions" this way.
Here's a try, kind of wild guess.
People usually underestimate how long it will take them to do the work. If your project is large, then most likely you'll need several people to work on it, so you can try planning with that in mind. Now a person can be expected to hold just one area in the head, so you'll need to explain to him exactly what kind of task he's supposed to do.
So I'd say you should try to write a job description that should encompass as much as possible for one person to seriously concentrate on. Repeat, until you have broken your project into parts you wanted to. As a benefit, you're ready to assemble your team. But if you find out the parts are small, maybe you'll still be able to do it yourself.

How to measure usability to get hard data?

There are a few posts on usability but none of them was useful to me.
I need a quantitative measure of usability of some part of an application.
I need to estimate it in hard numbers to be able to compare it with future versions (e.g. for reporting purposes). The simplest way is to count clicks and keystrokes, but this seems too simple (for example, is the cost of filling a text field a simple sum of typing all the letters? I guess it is more complicated).
I need some mathematical model for that so I can estimate the numbers.
Does anyone know anything about this?
P.S. I don't need links to resources about designing user interfaces. I already have them. What I need is a mathematical apparatus to measure existing applications interface usability in hard numbers.
Thanks in advance.
http://www.techsmith.com/morae.asp
This is what Microsoft used in part when they spent millions redesigning Office 2007 with the ribbon toolbar.
Here is how Office 2007 was analyzed:
http://cs.winona.edu/CSConference/2007proceedings/caty.pdf
Be sure to check out the references at the end of the PDF too, there's a ton of good stuff there. Look up how Microsoft did Office 2007 (regardless of how you feel about it), they spent a ton of money on this stuff.
Your main ideas to approach in this are Effectiveness and Efficiency (and, in some cases, Efficacy). The basic points to remember are outlined on this webpage.
What you really want to look at doing is 'inspection' methods of measuring usability. These are typically more expensive to set up (both in terms of time, and finance), but can yield significant results if done properly. These methods include things like heuristic evaluation, which is simply comparing the system interface, and the usage of the system interface, with your usability heuristics (though, from what you've said above, this probably isn't what you're after).
More suited to your use, however, will be 'testing' methods, whereby you observe users performing tasks on your system. This is partially related to the point of effectiveness and efficiency, but can include various things, such as the "Think Aloud" concept (which works really well in certain circumstances, depending on the software being tested).
Jakob Nielsen has a decent (short) article on his website. There's another one, but it's more related to how to test in order to be representative, rather than how to perform the testing itself.
Consider measuring the time to perform critical tasks (using a new user and an experienced user) and the number of data entry errors for performing those tasks.
First you want to define goals: for example increasing the percentage of users who can complete a certain set of tasks, and reducing the time they need for it.
Then get two cameras and a few users (5-10), give them a list of tasks to complete, and ask them to think out loud. Half of the users should use the "old" system, the rest should use the new one.
Review the tapes, measure the time it took, measure success rates, discuss endlessly about interpretations.
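Once the tapes are reviewed, the numbers can be summarized per design along these lines. This is only a sketch: the session data is invented, and with 5 users per group the results are indicative at best, so they should be read together with the observations rather than on their own.

```python
from statistics import mean


def summarize(sessions):
    """sessions: list of (seconds_taken, completed) tuples for one design."""
    completed_times = [t for t, done in sessions if done]
    return {
        "success_rate": sum(1 for _, done in sessions if done) / len(sessions),
        # Time is averaged over successful attempts only.
        "mean_time_s": mean(completed_times) if completed_times else None,
    }


# Hypothetical observations for one task, five users per design.
old_design = [(95, True), (120, True), (140, False), (110, True), (135, False)]
new_design = [(70, True), (85, True), (90, True), (130, False), (75, True)]

old_summary = summarize(old_design)
new_summary = summarize(new_design)
print(old_summary)
print(new_summary)
```

Averaging time over successful attempts only is a design choice; including failed attempts (or capping them at a time limit) gives a different, equally defensible number, so whichever rule you pick should stay the same across versions.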
Alternatively, you can develop a system for bucket-testing -- it works the same way, though it makes it far more difficult to find out something new. On the other hand, it's much cheaper, so you can do many more iterations. Of course that's limited to sites you can open to public testing.
That obviously implies you're trying to get comparative data between two designs. I can't think of a way of expressing usability as a value.
You might want to look into the GOMS model (Goals, Operators, Methods, and Selection rules). It is a very difficult research tool to use in my opinion, but it does provide a "mathematical" basis to measure performance in a strictly controlled environment. It is best used with "expert" users. See this very interesting case study of Project Ernestine for New England Telephone operators.
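For a feel of what such a model looks like in practice, here is a sketch of the Keystroke-Level Model (KLM), a simplified member of the GOMS family that is not named in the answer above. It estimates expert, error-free task time by summing standard operator times. The values below are the commonly cited ones from Card, Moran and Newell; treat them as rough defaults rather than measurements of your users. The `fill_text_field` operator sequence is an assumption about how the field is used.

```python
# Commonly cited KLM operator times in seconds; rough defaults only.
OPERATOR_SECONDS = {
    "K": 0.28,  # keystroke or button press, average non-expert typist
    "P": 1.10,  # point at a target with the mouse
    "H": 0.40,  # move hands between keyboard and mouse
    "M": 1.35,  # mental preparation before an action
}


def klm_estimate(sequence):
    """Sum operator times for a string such as 'MHPK'."""
    return sum(OPERATOR_SECONDS[op] for op in sequence)


def fill_text_field(text):
    """Assumed steps: think, reach for mouse, point at field, click,
    return to keyboard, then type each character."""
    return "MHPK" + "H" + "K" * len(text)


sequence = fill_text_field("Smith")
estimate = klm_estimate(sequence)
print(sequence, round(estimate, 2))
```

This shows why the cost of a text field is more than the sum of its letters: in this example the typing itself accounts for 1.4 s of an estimated 4.93 s, and the rest is pointing, hand movement, and thinking time.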
Measuring usability quantitatively is an extremely hard problem. I tackled this as a part of my doctoral work. The short answer is, yes, you can measure it; no, you can't use the results in a vacuum. You have to understand why something took longer or shorter; simply comparing numbers is worse than useless, because it's misleading.
For comparing alternate interfaces it works okay. In a longitudinal study, where users are bringing their past expertise with version 1 into their use of version 2, it's not going to be as useful. You will also need to take into account time to learn the interface, including time to re-understand the interface if the user's been away from it. Finally, if the task is of variable difficulty (and this is the usual case in the real world) then your numbers will be all over the map unless you have some way to factor out this difficulty.
GOMS (mentioned above) is a good method to use during the design phase to get an intuition about whether interface A is better than B at doing a specific task. However, it only addresses error-free performance by expert users, and only measures low-level task execution time. If the user figures out a more efficient way to do their work that you haven't thought of, you won't have a GOMS estimate for it and will have to draft one up.
Some specific measures that you could look into:
Measuring clock time for a standard task is good if you want to know what takes a long time. However, lab tests generally involve test subjects working much harder and concentrating much more than they do in everyday work, so comparing results from the lab to real users is going to be misleading.
Error rate: how often the user makes mistakes or backtracks. Especially if you notice the same sort of error occurring over and over again.
Appearance of workarounds; if your users are working around a feature, or taking a bunch of steps that you think are dumb, it may be a sign that your interface doesn't give the tools to figure out how to solve their problems.
Don't underestimate simply asking users how well they thought things went. Subjective usability is finicky but can be revealing.