Order-issuing neural network? - language-agnostic

I'm interested in writing software that uses machine learning and performs actions based on external data.
However, I've run into a problem (one that has always interested me):
how is it possible to write machine learning software that issues orders or sequences of orders?
The problem is that, as I understand it, a neural network gets a bunch of inputs and "recalls" an output based on the results of previous training. Instantly (well, more or less). So I'm not sure how "issuing orders" could fit into that system, especially when the actions performed by the system affect the system itself with some delay. I'm also a bit unsure how it's possible to train such a thing.
Examples of such system:
1. First-person shooter enemy controller. As I understand it, it is possible to implement a neural network controller for a bot that will switch the bot's behavior strategies (well, assign priorities to them) based on some inputs (probably something like health, ammo, etc). But I don't see a way to make a higher-order controller that could issue a sequence of commands like "go there, then turn left". Also, the bot's actions will affect the variables that control its behavior; i.e. shooting reduces ammo, falling from heights reduces health, etc.
2. Automated market trader. It is certainly possible to make a system that will try to predict the next market price of something. However, I don't see how it is possible to make a system that would issue an order to buy something, watch the trend, and then sell it back to take a profit or cover its losses.
3. Car driver. Again, as I understand it, it is possible to make a system that will maintain a desired movement vector based on position/velocity/torque data and the results of previous training. However, I don't see a way to make such a system (learn to) perform a sequence of actions.
I.e. as I understand it, a neural net is technically a matrix: you give it input, it produces output. But what about generating sequences of actions that could change the environment the program operates in?
If such tasks are not entirely suitable for neural networks, what else could be used?
P.S. I understand that the question isn't exactly clear, and I suspect that I'm missing some knowledge. So I'll appreciate some pointers (i.e. books/resources to read, etc).

You could try to connect the output neurons to controllers directly, e.g. moving forward, turning, or shooting in the first-person shooter, or buy orders for the trader. However, I think the best results nowadays come from letting the neural net solve one rather specific subproblem and then letting a "normal" program interpret its answer. For example, you could let the neural net construct a map overlay of "where do I want to be", which the bot then translates into movements. The neural network for the trader could produce a "how much do I want which security", which the bot then translates into buy or sell orders.
Deciding which subproblem the neural network should solve is central to the design. The important thing is that good solutions can be taught to the neural network.
Edit: Expanding on the examples: when the shooter bot gets shot, it should not have wanted to be there; when it gets to shoot someone else, it should have wanted to be there more. When the trader loses money on a security, it should have wanted it less before; if it gains, it should have wanted it more. These things can be taught.

The problem you are describing is known as reinforcement learning. Reinforcement learning is essentially a machine learning model (such as a neural network) coupled with a controller that selects actions and learns from delayed rewards. It has been used for all of the applications you mention, even to drive real cars.
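To give a flavor of the mechanics, here is a minimal tabular Q-learning sketch. It is a toy: env.reset(), env.step() and env.actions are an assumed interface, not any particular library.

    import random

    # Tabular Q-learning: observe a state, pick an action, receive a reward,
    # and fold delayed consequences into the value table via bootstrapping.
    # Sequences of actions then emerge from repeatedly following the policy.
    ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # learning rate, discount, exploration

    def q_learning(env, episodes=1000):
        q = {}  # (state, action) -> estimated long-term value
        for _ in range(episodes):
            state, done = env.reset(), False
            while not done:
                # epsilon-greedy: mostly exploit the table, sometimes explore
                if random.random() < EPSILON:
                    action = random.choice(env.actions)
                else:
                    action = max(env.actions, key=lambda a: q.get((state, a), 0.0))
                next_state, reward, done = env.step(action)
                # immediate reward plus discounted best future value
                best_next = 0.0 if done else max(
                    q.get((next_state, a), 0.0) for a in env.actions)
                old = q.get((state, action), 0.0)
                q[(state, action)] = old + ALPHA * (reward + GAMMA * best_next - old)
                state = next_state
        return q

The delayed-effects issue from the question is handled by the discounted best_next term: an action is credited not just for its immediate reward but for the value of the states it leads to.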


Is it possible to perform Object Detection without proper training data

For my task, I am given a series of frames extracted from a Tom and Jerry video. I need to detect the objects in a frame (in my case, the objects are Tom and Jerry) and their locations. Since my dataset is different from the classes in ImageNet, I am stuck with no training data.
I have searched extensively, and there seem to be some tools where I need to manually mark the location of objects. Is there any way to do this without such manual work?
Any suggestions would be really helpful, thanks a lot!
"Is there any way to do this without such manual work?"
Welcome to the current state of machine learning, driven by data-hungry networks and a lot of labor on the dataset-creation side :) Labels are here and will stay for a while; they are how you tell your network (via the loss function) what you want it to do.
But you are not in that bad a situation at all, because you can go for a pre-trained net and just fine-tune it on your lovely Jerry and Tom (acquiring the training data will take 1-2 h itself).
So what is this fine-tuning and how does it work? Let's say you take a net pre-trained on ImageNet; it performs reasonably well on the classes defined in ImageNet, and it will be your starting point. This network has already learned quite abstract features of all the ImageNet objects, which is why it is capable of transfer learning from a reasonably small number of new class samples. Now when you add Tom and Jerry to the network output and fine-tune it on a small amount of data (20-100 samples), it will perform not that badly (I'd guess accuracy somewhere in 65-85%). So here is what I suggest:
google for a pre-trained net that is easy to interact with. I found this. See chapter 4, Transfer Learning with Your Own Image Dataset.
pick some labeling tool.
label 20-100 Toms and Jerries with bounding boxes. For a small dataset like this, divide it into ./train (80%) and ./test (20%). Try to catch different poses, different backgrounds, and frames distinct from each other. Go for some augmentation.
remove the last network layer and add a layer with 2 new outputs, Tom and Jerry (see the sketch after this list).
train it (fine-tune it), check accuracy on your test set.
have fun! Train it again with more data.
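To make the "remove the last layer" step concrete, here is roughly what it could look like. This is a sketch in PyTorch rather than the framework of the linked tutorial, it covers classification only (a real detector such as Faster R-CNN adds box regression on top), and train_loader is a hypothetical loader over your labeled crops.

    import torch
    import torch.nn as nn
    from torchvision import models

    model = models.resnet18(pretrained=True)   # ImageNet weights as the starting point

    for param in model.parameters():           # freeze the already-learned features
        param.requires_grad = False

    # replace the last layer with 2 new outputs: Tom and Jerry
    model.fc = nn.Linear(model.fc.in_features, 2)

    optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
    criterion = nn.CrossEntropyLoss()

    def fine_tune(train_loader, epochs=5):
        for _ in range(epochs):
            for images, labels in train_loader:   # your 20-100 labeled samples
                optimizer.zero_grad()
                loss = criterion(model(images), labels)
                loss.backward()
                optimizer.step()

Only the new head is trained here; unfreezing the last convolutional block as well is a common variation once the head has converged.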
"Is it possible to perform Object Detection without proper training
data?"
It's kinda is, but I can't imagine anything simpler than fine-tuning. We can talk here about:
A. Non-machine-learning approaches: computer vision with hand-crafted features and manually defined parameters, used as a detector. In your case this is rather not the way you want to go; however, some box sliding plus manual color-histogram thresholding may work for Tom and Jerry (the threshold parameters may naturally be subject to tuning; a sketch follows after this list). This is quite often more work than the proposed fine-tuning. Sometimes it is a way to label thousands of samples, then correct the labels, then train more powerful detectors. For numerous tasks this approach is enough, and the benefit can be light weight and speed.
B. Machine learning approaches that deal with no proper training data, or with a small amount of data, the way humans do. This is mainly an emerging field, currently under active R&D; a few of my favorites are:
fine-tuning pre-trained nets. Hey, we are using this because it is so simple!
one-shot approaches, like triplet-loss+deep-metrics
memory augmented neural networks used in one/few shot context
unsupervised, semi-supervised approaches
bio-plausible nets, including no-backprop approach with only last layer tuned via supervision
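For point A above, a crude color-thresholding detector might look like the following sketch (OpenCV 4; the HSV bounds and minimum blob area are invented placeholders you would tune per character):

    import cv2
    import numpy as np

    # Hand-crafted detector: keep pixels inside a tuned HSV color range, then
    # take bounding boxes of the large connected blobs that survive.
    def detect_by_color(frame_bgr, lo=(100, 50, 50), hi=(130, 255, 255), min_area=500):
        hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
        mask = cv2.inRange(hsv, np.array(lo), np.array(hi))
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        return [cv2.boundingRect(c) for c in contours
                if cv2.contourArea(c) >= min_area]  # list of (x, y, w, h)

Cartoon characters have flat, consistent colors, which is why this kind of thresholding can be good enough to bootstrap labels that you then correct by hand.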

Deep Neural Network combined with qlearning

I'm using joint positions from a Kinect camera as my state space, but I think it's going to be too large (25 joints x 30 frames per second) to just feed into SARSA or Q-learning.
Right now I'm using the Kinect Gesture Builder program, which uses supervised learning to associate user movement with specific gestures. But that requires supervised training, which I'd like to move away from. I figure the algorithm might pick up associations between joints that I would make myself when classifying the data (hands up, step left, step right, for example).
I think feeding that data into a deep neural network and then passing that into a reinforcement learning algorithm might give me a better result.
There was a paper on this recently. https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf
I know Accord.NET has both deep neural networks and RL, but has anyone combined them? Any insights?
If I understand your question + comment correctly, what you want is an agent that performs discrete actions using visual input (raw pixels from a camera). This looks exactly like what the DeepMind guys recently did, extending the paper you mentioned. Have a look at this. It is the newer (and better) version of playing Atari games. They also provide an official implementation, which you can download here.
There is even an implementation in Neon which works pretty well.
Finally, if you want to use continuous actions, you might be interested in this very recent paper.
To recap: yes, somebody has combined DNNs + RL, it works, and if you want to use raw camera data to train an agent with RL, this is definitely one way to go :)
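For flavor, the heart of such an agent is just a network that maps pixels to one value per discrete action. Here is a minimal sketch in PyTorch (not the official implementation; the sizes assume the common 84x84, 4-frame input, and the replay memory and target network that make DQN actually work are omitted):

    import torch
    import torch.nn as nn

    # DQN-style Q-network: stacked grayscale frames in, one Q-value per action out.
    class QNetwork(nn.Module):
        def __init__(self, n_actions, in_channels=4):   # 4 stacked 84x84 frames
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(in_channels, 32, kernel_size=8, stride=4), nn.ReLU(),
                nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
                nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
                nn.Flatten(),
                nn.Linear(64 * 7 * 7, 512), nn.ReLU(),  # 7x7 follows from 84x84 input
                nn.Linear(512, n_actions),              # one Q-value per action
            )

        def forward(self, x):
            return self.net(x / 255.0)  # scale raw pixel values to [0, 1]

    def act(model, state):
        # greedy policy: pick the action with the highest estimated value
        with torch.no_grad():
            return int(model(state.unsqueeze(0)).argmax(dim=1))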

Interesting NLP/machine-learning style project -- analyzing privacy policies

I wanted some input on an interesting problem I've been assigned. The task is to analyze hundreds, and eventually thousands, of privacy policies and identify their core characteristics. For example: do they take the user's location? Do they share or sell data with third parties? Etc.
I've talked to a few people, read a lot about privacy policies, and thought about this myself. Here is my current plan of attack:
First, read a lot of privacy policies and find the major "cues" or indicators that a certain characteristic is met. For example, if hundreds of privacy policies contain the same line, "We will take your location.", that line could be a cue with 100% confidence that the policy involves taking the user's location. Other cues would give much smaller degrees of confidence about a characteristic. For example, the presence of the word "location" might increase the likelihood that the user's location is stored by 25%.
The idea would be to keep developing these cues, and their appropriate confidence weights, to the point where I could categorize all privacy policies with a high degree of confidence. An analogy here could be made to email spam-catching systems that use Bayesian filters to identify which mail is likely commercial and unsolicited.
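A rough sketch of the kind of cue scoring I have in mind (all cues, weights, and the prior are invented for illustration):

    import math

    # Each cue shifts the log-odds that a policy has a characteristic,
    # loosely like the word weights in a naive Bayes spam filter.
    CUES = {
        "collects_location": [
            ("we will take your location", 4.0),   # near-certain phrase
            ("location", 0.3),                     # weak hint
        ],
    }

    def probability(policy_text, characteristic):
        text = policy_text.lower()
        log_odds = -1.0  # prior: assume most policies lack the characteristic
        for phrase, weight in CUES[characteristic]:
            if phrase in text:
                log_odds += weight
        return 1.0 / (1.0 + math.exp(-log_odds))  # squash log-odds to [0, 1]

    print(probability("We will take your location.", "collects_location"))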
I wanted to ask whether you guys think this is a good approach to this problem. How exactly would you approach a problem like this? Furthermore, are there any specific tools or frameworks you'd recommend using. Any input is welcome. This is my first time doing a project which touches on artificial intelligence, specifically machine learning and NLP.
This is text classification. Given that you have multiple output categories per document, it's actually multilabel classification. The standard approach is to manually label a set of documents with the classes/labels that you want to predict, then train a classifier on features of the documents; typically word or n-gram occurrences or counts, possibly weighted by tf-idf.
The popular learning algorithms for document classification include naive Bayes and linear SVMs, though other classifier learners may work too. Any classifier can be extended to a multilabel one by the one-vs.-rest (OvR) construction.
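In scikit-learn terms, that whole pipeline is only a few lines. A sketch with a toy corpus (the documents and label names are placeholders):

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import MultiLabelBinarizer
    from sklearn.svm import LinearSVC

    # Each policy carries zero or more characteristics (multilabel).
    docs = ["We will take your location.",
            "We share data with third parties.",
            "We collect your location and share it with partners."]
    labels = [{"location"}, {"third_party"}, {"location", "third_party"}]

    binarizer = MultiLabelBinarizer()
    y = binarizer.fit_transform(labels)          # one binary column per label

    # tf-idf weighted word/bigram counts -> one linear SVM per label (OvR)
    clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                        OneVsRestClassifier(LinearSVC()))
    clf.fit(docs, y)

    pred = clf.predict(["Your location may be shared with third parties."])
    print(binarizer.inverse_transform(pred))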
A very interesting problem indeed!
On a higher level, what you want is summarization: a document has to be reduced to a few key phrases. This is far from being a solved problem. A simpler approach would be to search for keywords rather than key phrases. You can try something like LDA for topic modelling to find what each document is about, and then search for topics that are present in all documents; I suspect what will come up is stuff to do with licenses, location, copyright, etc. MALLET has an easy-to-use implementation of LDA.
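If you would rather stay in Python than use MALLET, the same LDA idea in gensim looks roughly like this (the toy token lists stand in for tokenized, stopword-filtered policies):

    from gensim import corpora, models

    texts = [["location", "share", "third", "party"],
             ["copyright", "license", "terms"],
             ["location", "data", "collect"]]

    dictionary = corpora.Dictionary(texts)            # token <-> id mapping
    corpus = [dictionary.doc2bow(t) for t in texts]   # bag-of-words vectors

    lda = models.LdaModel(corpus, num_topics=2, id2word=dictionary, passes=10)
    for topic in lda.print_topics():
        print(topic)  # inspect which themes (location, licensing, ...) emerge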
I would approach this as a machine learning problem where you are trying to classify things in multiple ways, i.e. wants location, wants SSN, etc.
You'll need to enumerate the characteristics you want to detect (location, SSN), and then for each document say whether it uses that info or not. Choose your features, train on your data, then classify and test.
I think simple features like words and n-grams would probably get you pretty far, and a dictionary of words related to things like SSNs or location would finish it nicely.
Use the machine learning algorithm of your choice; naive Bayes is very easy to implement and use, and would work OK as a first stab at the problem.

Modeling an RTS or how Blizzard was able to put together Starcraft 1 & 2?

Not sure if this question is related to software development, but I hope someone can at least point me in the right direction (no, not that direction..)
I am very curious as to how Blizzard achieves such a balance of strategic/tactical forces in their games. If you look at StarCraft 1, or now 2, each race has unique features that sort of counter the unique features of the other races, and together they create a pretty beautiful (to my mind at least) balance.
Is there some sort of area of mathematics that could help model these things? How do they do it, basically?
I don't have a full answer, but here is what I know. Initially, when a game is technically ready, the balance is not ideal. When they started the first public beta there were holes in the balance, which they patched very fast. They let players (testers) play as-is, captured statistics on the percentage of wins per race, and tuned the parameters accordingly. By the end of the beta the ratio was almost ideal: 33%/33%/33%.
I've no idea how Blizzard specifically did it - it might have just been through a lot of user testing.
However, the entire field of CS and statistics dedicated to these kinds of problems is simulation. Generally the idea is to construct a model of your system and then generate inputs according to some statistical distribution to try to understand the behaviour of that model. In the case of finding game balance, you would want to show that most sequences of game events led to some kind of equilibrium. This would probably involve some kind of stochastic modeling.
Linear algebra, maybe a little calculus, and a lot of testing.
There's no real mystery to this sort of thing, but people often just don't think about it in the rigorous terms you need to get a system that is fairly well-balanced.
Basically the designers know how quickly you can gather resources (both the best case and the average case), they know how long it takes to build a unit, and they know roughly how powerful a unit is (e.g. by reference to approximations such as damage per second). With this, you can ensure a unit's cost in resources makes sense. And from that, it's possible to compare resource gathering with unit cost to model the strength of a force growing over time. Similarly you can measure a force's capacity for damage over time, and compare the two. If there's a big disparity then they can tweak one or more of the variables to reduce it.
Obviously there are many permutations of units, but the interesting thing is that you only really need to understand the most powerful permutations - if a player picks a poor combination then it's ok if they lose. You don't want every approach to be equally good, as that would imply a boring game where your decisions are meaningless.
This is, of course, a simplification, but it helps get everything in roughly the right place and ensure that there are no units that are useless. Then testing can hammer down the rough edges and find most of the exploits that remain.
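To illustrate the kind of arithmetic this involves, here is a toy model (all numbers invented) comparing how much total damage-per-second two unit choices can field over time, given an income rate, a unit cost, a build time, and a unit DPS:

    # Toy balance check: income accumulates each second, one production queue
    # builds units when affordable, and fielded DPS is units * unit_dps.
    def dps_over_time(income_per_sec, unit_cost, unit_dps, build_time, horizon=600):
        strength, resources, units, done_at = [], 0.0, 0, None
        for t in range(horizon):
            resources += income_per_sec
            if done_at is None and resources >= unit_cost:
                resources -= unit_cost       # start building the next unit
                done_at = t + build_time
            if done_at is not None and t >= done_at:
                units, done_at = units + 1, None
            strength.append(units * unit_dps)
        return strength

    cheap = dps_over_time(income_per_sec=1.0, unit_cost=50, unit_dps=5, build_time=20)
    elite = dps_over_time(income_per_sec=1.0, unit_cost=150, unit_dps=16, build_time=45)
    print(cheap[300], elite[300])  # compare force strength at the 5-minute mark

If one curve dominates the other at every point in time, the cheap-but-weak versus expensive-but-strong trade-off is broken and a cost or stat needs tweaking.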
If you've tried SCII, you've noticed that Blizzard records the relevant data for each game played on B.Net. They did the same in WC3, and presumably in SC1. With enough games stored, it is possible to get accurate results from statistical analysis. For example, the Protoss should win a third of all match-ups between similarly skilled opponents. If this is not the case, Blizzard can analyze the games where the Protoss won vs. the games where they lost, and see which units made the difference. Then they can nerf those units (with a bit of in-house testing) and introduce the change in the next patch. Balancing is a difficult problem: a change that fixes problems in top-level games may break the balance in mid-level games.
Proper testing in a closed beta is impossible - you just can't accumulate enough data. This is why Blizzard do open betas. Even so, the real test is the release of the game.
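The statistical check itself can be very simple; for example, a chi-square test of observed win counts against the ideal one-third split (counts invented):

    from scipy.stats import chisquare

    # Invented counts: wins by race between similarly rated opponents.
    # Under perfect balance each race expects one third of all wins.
    wins = [3400, 3150, 3450]
    expected = [sum(wins) / 3] * 3

    stat, p = chisquare(wins, f_exp=expected)
    print(f"chi2={stat:.1f}, p={p:.4f}")  # a small p suggests the balance is off

The harder part, as noted above, is the follow-up analysis of which units caused the skew, and whether a fix at one skill level breaks another.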

How to measure usability to get hard data?

There are a few posts on usability but none of them was useful to me.
I need a quantitative measure of usability of some part of an application.
I need to estimate it in hard numbers to be able to compare it with future versions (e.g. for reporting purposes). The simplest way is to count clicks and keystrokes, but this seems too simple (for example, is the cost of filling a text field simply the sum of typing all the letters? I guess it is more complicated).
I need some mathematical model for that so I can estimate the numbers.
Does anyone know anything about this?
P.S. I don't need links to resources about designing user interfaces. I already have them. What I need is a mathematical apparatus to measure an existing application interface's usability in hard numbers.
Thanks in advance.
http://www.techsmith.com/morae.asp
This is what Microsoft used in part when they spent millions redesigning Office 2007 with the ribbon toolbar.
Here is how Office 2007 was analyzed:
http://cs.winona.edu/CSConference/2007proceedings/caty.pdf
Be sure to check out the references at the end of the PDF too, there's a ton of good stuff there. Look up how Microsoft did Office 2007 (regardless of how you feel about it), they spent a ton of money on this stuff.
Your main ideas to approach in this are Effectiveness and Efficiency (and, in some cases, Efficacy). The basic points to remember are outlined on this webpage.
What you really want to look at is 'inspection' methods of measuring usability. These are typically more expensive to set up (both in terms of time and money), but can yield significant results if done properly. These methods include things like heuristic evaluation, which is simply comparing the system interface, and the usage of the system interface, against your usability heuristics (though, from what you've said above, this probably isn't what you're after).
More suited to your use, however, will be 'testing' methods, whereby you observe users performing tasks on your system. This is partially related to effectiveness and efficiency, but can include various techniques, such as the "think aloud" protocol (which works really well in certain circumstances, depending on the software being tested).
Jakob Nielsen has a decent (short) article on his website. There's another one, but it's more related to how to test in order to be representative, rather than how to perform the testing itself.
Consider measuring the time to perform critical tasks (using a new user and an experienced user) and the number of data entry errors for performing those tasks.
First you want to define goals: for example increasing the percentage of users who can complete a certain set of tasks, and reducing the time they need for it.
Then, get two cameras, a few users (5-10) give them a list of tasks to complete and ask them to think out loud. Half of the users should use the "old" system, the rest should use the new one.
Review the tapes, measure the time it took, measure success rates, discuss endlessly about interpretations.
Alternatively, you can develop a system for bucket-testing -- it works the same way, though it makes it far more difficult to find out something new. On the other hand, it's much cheaper, so you can do many more iterations. Of course that's limited to sites you can open to public testing.
That obviously implies you're trying to get comparative data between two designs. I can't think of a way of expressing usability as a value.
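If you do collect comparative numbers, the analysis can stay simple; for example, a two-sample t-test on task completion times for the two designs (times invented):

    from scipy import stats

    # Invented completion times (seconds) for the same task on two designs.
    old_design = [92, 105, 88, 120, 99, 111, 95]
    new_design = [74, 81, 90, 69, 85, 78, 88]

    t, p = stats.ttest_ind(old_design, new_design, equal_var=False)  # Welch's t-test
    print(f"t={t:.2f}, p={p:.4f}")  # a small p suggests the speedup is real

With the 5-10 users typical of usability tests, the sample is small, so treat the p-value as a sanity check rather than proof.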
You might want to look into the GOMS model (Goals, Operators, Methods, and Selection rules). It is a very difficult research tool to use in my opinion, but it does provide a "mathematical" basis to measure performance in a strictly controlled environment. It is best used with "expert" users. See this very interesting case study of Project Ernestine for New England Telephone operators.
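GOMS has a simplified cousin, the Keystroke-Level Model (KLM), which is easy to script. Here is a sketch using the standard published operator times; the task breakdown is an invented example:

    # Keystroke-Level Model: a GOMS-family back-of-envelope estimate of
    # expert, error-free task time. Operator times are the standard KLM values.
    KLM = {
        "K": 0.2,   # press a key
        "B": 0.1,   # press a mouse button
        "P": 1.1,   # point with the mouse
        "H": 0.4,   # move hand between keyboard and mouse
        "M": 1.35,  # mental preparation
    }

    def estimate(ops):
        return sum(KLM[op] for op in ops)

    # fill a name field: think, point at the field, click, hand back to
    # keyboard, type 10 characters
    fill_field = ["M", "P", "B", "H"] + ["K"] * 10
    print(f"{estimate(fill_field):.2f} s")  # 4.95 s

This directly answers the "is filling a text field just the sum of its keystrokes?" question from the post: KLM adds pointing, homing, and mental-preparation operators on top of the raw keystrokes.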
Measuring usability quantitatively is an extremely hard problem. I tackled this as a part of my doctoral work. The short answer is, yes, you can measure it; no, you can't use the results in a vacuum. You have to understand why something took longer or shorter; simply comparing numbers is worse than useless, because it's misleading.
For comparing alternate interfaces it works okay. In a longitudinal study, where users are bringing their past expertise with version 1 into their use of version 2, it's not going to be as useful. You will also need to take into account time to learn the interface, including time to re-understand the interface if the user's been away from it. Finally, if the task is of variable difficulty (and this is the usual case in the real world) then your numbers will be all over the map unless you have some way to factor out this difficulty.
GOMS (mentioned above) is a good method to use during the design phase to get an intuition about whether interface A is better than B at doing a specific task. However, it only addresses error-free performance by expert users, and only measures low-level task execution time. If the user figures out a more efficient way to do their work that you haven't thought of, you won't have a GOMS estimate for it and will have to draft one up.
Some specific measures that you could look into:
Measuring clock time for a standard task is good if you want to know what takes a long time. However, lab tests generally involve test subjects working much harder and concentrating much more than they do in everyday work, so comparing results from the lab to real users is going to be misleading.
Error rate: how often the user makes mistakes or backtracks. Especially if you notice the same sort of error occurring over and over again.
Appearance of workarounds: if your users are working around a feature, or taking a bunch of steps that you think are dumb, it may be a sign that your interface doesn't give them the tools to figure out how to solve their problems.
Don't underestimate simply asking users how well they thought things went. Subjective usability is finicky but can be revealing.