Combine two views of an image to strengthen the output of a DL model - deep-learning

I have been trying this problem for weeks but to no avail.
My problem is:
Deep Learning Model has the following information:
INPUT: Sequence of Images
OUTPUT: What is happening in the image, i.e. categorise what is the activity happening from a sequence of 10 activities.
I have two cameras recording the same activity from two views, how could I combine those two views to improve the accuracy?

I think you should use DELF features, extract features of both similar images and combine them.

How to combine the two views is fully dependent on your understanding of the problem. Let me give you two different examples,
CASE I: when you review your training data, you can easily tell which camera is better for some data. e.g. one camera may capture everything useful, while the other camera may not due to possible occlusions (note: I am not saying one camera is always better than the other). In this case, you may use a later fusion technique to only fuse the two resulting features representing the sequences from two cameras.
CASE II: it is difficult for you to tell which camera is better. This basically indicates that you may not see performance boost after you consider both cameras, but maybe some small improvement.
Finally, when you say two cameras, is it possible for you do something like the binocular stereo vision? In this case, you may obtain the extra depth information that is not included in any single camera, and maybe helpful for the recognition task.

Related

How to create a people tracking with reidentification model?

I am currently working on a project where I want to build a model which can detect and track people with a unique ID. The main issue is when a person leaves the frame and comes back after some time. Currently, I am working with yolov4 and Deepsort to detect and track. But it is failing in this situation.
Please suggest some approach where we can do detection, reidentification and tracking of people or cars or any other object.
Thank you :)
Although YOLOv4 can detect people in an image/video stream, I think it might be too general in your case. When a person leaves the frame and comes back, ideally the model should remember seeing that person before.
One way to tackle this is to train on images of the people you want to detect.
E.g. in a system like yours, you could take multiple images of the people you want to track from different angles and label them using their unique identifiers. Afterwards you could train the model using this data (for your downstream task). This will ideally give more specific results for detecting and tracking people with their unique identifiers as opposed to the general people detection when using YOLOv4 as is..
That said, I understand that taking lots of images of people may not be practical in certain scenarios. In that case you may want to look at techniques that produce accurate results with minimal data such as domain adaptation (https://arxiv.org/abs/1812.11806). However in an application for tracking and detecting people, I'm assuming you want minimal misclassifications.. Hence you could say it's always a tradeoff.
You can find out more about dealing with lack of data in this article: (https://www.kdnuggets.com/2019/06/5-ways-lack-data-machine-learning.html)
However I think this is a better place to start for a re-identification model: (https://github.com/KaiyangZhou/deep-person-reid)
It has ample documentation to get you started..

Training Faster R-CNN with multiple objects in an image

I want to train Faster R-CNN network with my own images to detect faces. I have checked quite a few Github libraries, but this is the example of the training file I always find:
/data/imgs/img_001.jpg,837,346,981,456,cow
/data/imgs/img_002.jpg,215,312,279,391,cat
But I can't find an example how to train with images containing couple objects. Should it be:
1) /data/imgs/img_001.jpg,837,346,981,456,cow,215,312,279,391,cow
or
2) /data/imgs/img_001.jpg,837,346,981,456,cow
/data/imgs/img_001.jpg,215,312,279,391,cow
?
I just could not help myself but quote FarCry3 here: "The definition of insanity is doing the same thing over and over and expecting different results."
(Note that this is purely in an entertaining context, and not meant to insult you in any way; I would not take the time to answer your question if I didn't think it worthwile)
In your second example, you would feed the exact same input data, but require the network to learn two different outcomes. But, as you already noted, it is not very common for many of the libraries to support multiple labels per image.
Oftentimes, this is purely done for the sake of simplicity, as it requires you to change your metrics, to accomodate for multiple outputs: Instead of having one-hot encoded targets, you now could have multiple "targets".
This is even more challenging in the task of object detection (and not object classification, as described before), since you now have to decide how you represent your targets.
If it is possible at all, I would personally restrict myself to labeling one class per image, or have a look at another image library that does support that, since the effort of rewriting that much code is probably not worth the minute improvement in the results.

Chess Engine - Confusion in engine types - Flash as3

I am not sure this kind of question has been asked before, and been answered, by as far as my search is concerned, I haven't got any answer yet.
First let me tell you my scenario.
I want to develop a chess game in Flash AS3. I have developed the interface. I have coded the movement of the pieces and movement rules of the pieces. (Please note: Only movement rules yet, not capture rules.)
Now the problem is, I need to implement the AI in chess for one player game. I am feeling helpless, because though I know each and every rules of the chess, but applying AI is not simple at all.
And my biggest confusion is: I have been searching, and all of my searches tell me about the chess engines. But I always got confused in two types of engines. One is for front end, and second is real engines. But none specifies (or I might not get it) which one is for which.
I need a API type of some thing, where when I can get searching of right pieces, and move according to the difficulty. Is there anything like that?
Please note: I want an open source and something to be used in Flash.
Thanks.
First of all http://nanochess.110mb.com/archive/toledo_javascript_chess_3.html here is the original project which implements a relatively simple AI (I think it's only 2 steps deep) in JavaScript. Since that was a contest project for minimal code, it is "obfuscated" somewhat by hand-made reduction of the source code. Here's someone was trying to restore the same code to a more or less readable source: https://github.com/bormand/nanochess .
I think that it might be a little bit too difficult to write it, given you have no background in AI... I mean, a good engine needs to calculate more then two steps ahead, but just to give you some numbers: the number of possible moves per step, given all pieces are on the board would be approximately 140 at max, the second step thus would be all the combination of these moves with all possible moves of the opponent and again this much combinations i.e. 140 * 140 * 140. Which means you would need a very good technique to discriminate the bad moves and only try to predict good moves.
As of today, there isn't a deterministic winning strategy for chess (in other words, it wasn't solved by computers, like some other table games), which means, it is a fairly complex game, but an AI which could play at a hobbyist level isn't all that difficult to come up with.
A recommended further reading: http://aima.cs.berkeley.edu/
A Chess Program these days comes in two parts:
The User Interface, which provides the chess board, moves view, clocks, etc.
The Chess Engine, which provides the ability to play the game of chess.
These two programs use a simple text protocol (UCI or XBoard) to communicate with the UI program running the chess engine as a child process and communicating over pipes.
This has several significant advantages:
You only need one UI program which can use any compliant chess engine.
Time to develop the chess engine is reduced as only a simple interface need be provided.
It also means that the developers get to do the stuff they are good at and don't necessarily have to be part of a team in order to get the other bit finished. Note that there are many more chess engines than chess UI's available today.
You are coming to the problem with several disadvantages:
As you are using Flash, you cannot use this two program approach (AFAIK Flash cannot use fork(). exec(), posix_spawn()). You will therefore need to provide all of the solution which you should at least attempt to make multi-threaded so the engine can work while the user is interacting with the UI.
You are using a language which is very slow compared to C++, which is what engines are generally developed in.
You have access to limited system resources, especially memory. You might be able to override this with some setting of the Flash runtime.
If you want your program to actually play chess then you need to solve the following problems:
Move Generator: Generates all legal moves in a position. Some engine implementations don't worry about the "legal" part and prune illegal moves some time later. However you still need to detect check, mate, stalemate conditions at some point.
Position Evaluation: Provide a score for a given position. If you cannot determine if one position is better for one side than another then you have no way of finding winning moves.
Move Tree and pruning: You need to store the move sequences you are evaluating and a way to prune (ignore) branches that don't interest you (normally because you have determined that they are weak). A chess move tree is vast given every possible reply to every possible move and pruning the tree is the way to manage this.
Transpotion table: There are many transpositions in chess (a position reached by moving the pieces in a different order). One method of avoiding the re-evaluation of the position you have already evaluated is to store the position score in a transposition table. In order to do that you need to come up with a hash key for the position, which is normally implemented using Zobrist hash.
The best sites to get more detailed information (I am not a chess engine author) would be:
TalkChess Forum
Chess Programming Wiki
Good luck and please keep us posted of your progress!

Drawing two-dimensional point-graphs

I've got a list of objects (probably not more than 100), where each object has a distance to all the other objects. This distance is merely the added absolute difference between all the fields these objects share. There might be few (one) or many (dozens) of fields, thus the dimensionality of the distance is not important.
I'd like to display these points in a 2D graph such that objects which have a small distance appear close together. I'm hoping this will convey clearly how many sub-groups there are in the entire list. Obviously the axes of this graph are meaningless (I'm not even sure "graph" is the correct word to use).
What would be a good algorithm to convert a network of distances into a 2D point distribution? Ideally, I'd like a small change to the distance network to result in a small change in the graphic, so that incremental progress can be viewed as a smooth change over time.
I've made a small example of the sort of result I'm looking for:
Example Graphic http://en.wiki.mcneel.com/content/upload/images/GraphExample.png
Any ideas greatly appreciated,
David
Edit:
It actually seems to have worked. I treat the entire set of values as a 2D particle cloud, construct inverse square repulsion forces between all particles and linear attraction forces based on inverse distance. It's not a stable algorithm, the result tends to spin violently whenever an additional iteration is performed, but it does always seem to generate a good separation into visual clusters:
alt text http://en.wiki.mcneel.com/content/upload/images/ParticleCloudSolution.png
I can post the C# code if anyone is interested (there's quite a lot of it sadly)
Graphviz contains implementations of several different approaches to solving this problem; consider using its spring model graph layout tools as a basis for your solution. Alternatively, its site contains a good collection of source material on the related theory.
The previous answers are probably helpful, but unfortunately given your description of the problem, it isn't guaranteed to have a solution, and in fact most of the time it won't.
I think you need to read in to cluster analysis quite a bit, because there are algorithms to sort your points into clusters based on a relatedness metric, and then you can use graphviz or something like that to draw the results. http://en.wikipedia.org/wiki/Cluster_analysis
One I quite like is a 'minimum-cut partitioning algorithm', see here: http://en.wikipedia.org/wiki/Cut_(graph_theory)
You might want to Google around for terms such as:
automatic graph layout; and
force-based algorithms.
GraphViz does implement some of these algorithms, not sure if it includes any that are useful to you.
One cautionary note -- for some algorithms small changes to your graph content can result in very large changes to the graph.

Can a Sequence Diagram realistically capture your logic in the same depth as code?

I use UML Sequence Diagrams all the time, and am familiar with the UML2 notation.
But I only ever use them to capture the essence of what I intend to do. In other words the diagram always exists at a level of abstraction above the actual code. Every time I use them to try and describe exactly what I intend to do I end up using so much horizontal space and so many alt/loop frames that its not worth the effort.
So it may be possible in theory but has anyone every really used the diagram in this level of detail? If so can you provide an example please?
I have the same problem but when I realize that I am going low-level I re-read this:
You should use sequence diagrams
when you want to look at the behavior
of several objects within a single use
case. Sequence diagrams are good at
showing collaborations among the
objects; they are not so good at
precise definition of the behavior.
If you want to look at the behavior of
a single object across many use cases,
use a state diagram. If you want
to look at behavior across many use
cases or many threads, consider an
activity diagram.
If you want to explore multiple
alternative interactions quickly, you
may be better off with CRC cards,
as that avoids a lot of drawing and
erasing. It’s often handy to have a
CRC card session to explore design
alternatives and then use sequence
diagrams to capture any interactions
that you want to refer to later.
[excerpt from Martin Fowler's UML Distilled book]
It's all relative. The law of diminishing returns always applies when making a diagram. I think it's good to show the interaction between objects (objectA initializes objectB and calls method foo on it). But it's not practical to show the internals of a function. In that regard, a sequence diagram is not practical to capture the logic at the same depth as code. I would argue for intricate logic, you'd want to use a flowchart.
I think there are two issues to consider.
Be concrete
Sequence diagrams are at their best when they are used to convey to a single concrete scenario (of a use case for example).
When you use them to depict more than one scenario, usually to show what happens in every possible path through a use case, they get complicated very quickly.
Since source code is just like a use case in this regard (i.e. a general description instead of a specific one), sequence diagrams aren't a good fit. Imagine expanding x levels of the call graph of some method and showing all that information on a single diagram, including all if & loop conditions..
That's why 'capturing the essence' as you put it, is so important.
Ideally a sequence diagram fits on a single A4/Letter page, anything larger makes the diagram unwieldy. Perhaps as a rule of thumb, limit the number of objects to 6-10 and the number of calls to 10-25.
Focus on communication
Sequence diagrams are meant to highlight communication, not internal processing.
They're very expressive when it comes to specifying the communication that happens (involved parties, asynchronous, synchronous, immediate, delayed, signal, call, etc.) but not when it comes to internal processing (only actions really)
Also, although you can use variables it's far from perfect. The objects at the top are, well, objects. You could consider them as variables (i.e. use their names as variables) but it just isn't very convenient.
For example, try depicting the traversal of a linked list where you need to keep tabs on an element and its predecessor with a sequence diagram. You could use two 'variable' objects called 'current' and 'previous' and add the necessary actions to make current=current.next and previous=current but the result is just awkward.
Personally I have used sequence diagrams only as a description of general interaction between different objects, i.e. as a quick "temporal interaction sketch". When I tried to get more in depth, all quickly started to be confused...
I've found that the best compromise is a "simplified" sequence diagram followed by a clear but in depth description of the logic underneath.
The answer is no - it does capture it better then your source code!
At least in some aspects. Let me elaborate.
You - like the majority of the programmers, including me - think in source code lines. But the software end product - let's call it the System - is much more than that. It only exists in the mind of your team members. In better cases it also exists on paper or in other documented forms.
There are plenty of standard 'views' to describe the System. Like UML Class diagrams, UML activity diagrams etc. Each diagram shows the System from another point of view. You have static views, dynamic views, but in an architectural/software document you don't have to stop there. You can present nonstandard views in your own words, e.g. deployment view, performance view, usability view, company-values view, boss's favourite things view etc.
Each view captures and documents certain properties of the System.
It's very important to realize that the source code is just one view. The most important though because it's needed to generate a computer program. But it doesn't contain every piece of information of your System, nor explicitly nor implicitly. (E.g. the shared data between program modules, what are only connected via offline user activity. No trace in the source). It's just a static view which helps very little to understand your processes, the runtime dynamics of your living-breathing program.
A classic example of the Observer pattern. Especially if it used heavily, you'll hardly understand the System mechanis from the source code. That's why you use Sequence diagrams in that case. It captures the 'dynamic logic' of your system a lot better than your source code.
But if you meant some kind of business logic in great detail, you are better off with plain text/source code/pseudocode etc. You don't have to use UML diagrams just because they are the standard. You can use usecase modeling without drawing usecase diagrams. Always choose the view what's the best for you and for your purpose.
U.M.L. diagrams are guidelines, not strictly rules.
You don't have to make them exactly & detailed as the source code, but, you may try it, if you want it.
Sometimes, its possible to do it, sometimes, its not possible, because of the detail or complexity of systems, or don't have the time or details to do it.
Cheers.
P.D.
Any cheese-burguer or tuna-fish-burguer for the cat ?