How to apply model-free deep reinforcement learning when access to the real environment is hard? - reinforcement-learning

Deep reinforcement learning can be very useful when applied to real-world problems of a highly dynamic nature; finance and healthcare are a couple of examples. But for these kinds of problems it is hard to build a simulated environment. So what are the possible things to do?

Let me first comment on a couple of concepts, trying to give you future research directions according to your comments:
Probably the term "forecast" is not suitable to describe the kind of problems solved by reinforcement learning. In some sense, RL needs to perform an internal forecasting process to choose the best actions in the long term, but the problem being solved is that of an agent choosing actions in an environment. So, if your problem is a forecasting problem, other techniques are probably more suitable than RL.
Between tabular methods and deep Q-learning there exist many other methods that may be more suitable for your problem. They are probably less powerful but easier to use (more stable, less parameter tuning, etc.). You can combine Q-learning with other function approximators (simpler than a deep neural network), as in the sketch below. In general, the best choice is the simplest one able to solve the problem.
I don't know how to simulate the problem of human activities with first-person vision. In fact, I don't fully understand the problem setup.
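As a hedged illustration of that middle ground (the second point above), here is a minimal sketch of Q-learning with a linear function approximator, one weight vector per action. The feature representation and hyperparameters are assumptions; in practice something like tile coding is a common way to build the features:

```python
import numpy as np

class LinearQ:
    """Q-learning with a linear approximator: Q(s, a) = w[a] . phi(s)."""
    def __init__(self, n_features, n_actions, alpha=0.01, gamma=0.99):
        self.w = np.zeros((n_actions, n_features))
        self.alpha, self.gamma = alpha, gamma

    def q_values(self, features):
        return self.w @ features                    # one Q estimate per action

    def update(self, features, action, reward, next_features, done):
        target = reward
        if not done:
            target += self.gamma * self.q_values(next_features).max()
        td_error = target - self.q_values(features)[action]
        self.w[action] += self.alpha * td_error * features  # semi-gradient TD step
```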
And regarding the original question of applying RL without access to a simulated environment: as I previously said in the comments, if you have enough data, you could probably apply an RL algorithm. I'm assuming that you can store data from your environment but cannot easily interact with it. This is typical, for example, in medical domains, where there exist plenty of data about [patient status, treatment, next patient status], but you cannot interact with patients by applying random treatments. In this situation, there are some facts to take into account:
RL methods generally consume a very large quantity of data. This is especially true when they are combined with deep nets. How much data is necessary depends entirely on the problem, but be ready to store millions of tuples [state, action, next state] if your environment is complex.
The stored tuples should be collected with a policy that contains some exploratory actions. The RL algorithm will try to find the best possible actions among the ones contained in the data. If the agent can interact with the environment, it should choose exploratory actions to find the best one. Similarly, if the agent cannot interact and the data is instead gathered in advance, this data should also contain exploratory actions. The papers Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method and Tree-Based Batch Mode Reinforcement Learning could be helpful for understanding these concepts.
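As a rough sketch of the batch setting those papers address (a hedged illustration, not the papers' exact method): given a fixed set of stored tuples, Fitted Q Iteration repeatedly regresses bootstrapped Q targets, here with the extremely randomized trees used in the tree-based paper. The dataset layout (including a reward and terminal flag per tuple) and the hyperparameters are assumptions:

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

def fitted_q_iteration(transitions, n_actions, n_iterations=50, gamma=0.99):
    """transitions: list of (state, action, reward, next_state, done)
    tuples collected in advance with an exploratory policy."""
    states = np.array([t[0] for t in transitions])
    actions = np.array([t[1] for t in transitions]).reshape(-1, 1)
    rewards = np.array([t[2] for t in transitions])
    next_states = np.array([t[3] for t in transitions])
    not_done = 1.0 - np.array([t[4] for t in transitions], dtype=float)

    X = np.hstack([states, actions])       # regress Q on (state, action) pairs
    q = None
    for _ in range(n_iterations):
        if q is None:
            targets = rewards              # first pass: Q ~ immediate reward
        else:
            # Bootstrapped target: r + gamma * max_a' Q(s', a')
            next_q = np.column_stack([
                q.predict(np.hstack([next_states,
                                     np.full((len(next_states), 1), a)]))
                for a in range(n_actions)
            ])
            targets = rewards + gamma * not_done * next_q.max(axis=1)
        q = ExtraTreesRegressor(n_estimators=50).fit(X, targets)
    return q   # greedy policy: argmax over actions a of Q(state, a)
```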

Related

human trace data for evaluation of reinforcement learning agent playing Atari?

In recent reinforcement learning research on Atari games, agent performance is evaluated using human starts.
[1507.04296] Massively Parallel Methods for Deep Reinforcement Learning
[1509.06461] Deep Reinforcement Learning with Double Q-learning
[1511.05952] Prioritized Experience Replay
In the human start evaluation, learned agents begin episodes from randomly sampled points of a human professional's game-play.
My question is:
Where can I get this human professional's game-play trace data?
For fair comparison, the trace data should be the same across studies, but I could not find the data.
I'm not aware of that data being publicly available anywhere. Indeed, as far as I know, all the papers that use such human start evaluations were written by the same lab/organization (DeepMind), so it is quite possible that DeepMind has kept the data internal and hasn't shared it with external researchers.
Note that the paper Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents proposes a different (arguably better) approach for introducing the desired stochasticity into the environment to disincentivize an algorithm from simply memorizing strong sequences of actions. Their approach, referred to as sticky actions, is described in Section 5.2 of that paper. In Section 5.3 they also describe numerous disadvantages of other approaches, including those of the human starts approach.
In addition to arguably being the better approach, sticky actions also have the advantage that they can very easily be implemented and used by all researchers, allowing for fair comparisons; see the sketch below. So, I'd strongly recommend simply using sticky actions instead of human starts. The obvious disadvantage is that you can no longer easily compare results to those reported in the DeepMind papers that used human starts, but those evaluations have numerous flaws anyway, as described in the paper linked above (human starts can be considered one flaw, but they often have others too, such as reporting the result of the best run instead of the average of multiple runs, etc.).
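For concreteness, here is a minimal sketch of sticky actions; the Gym-style `env` interface is an assumption, while the repeat probability ς = 0.25 matches the value used in the paper:

```python
# Sticky actions: with probability sigma the emulator repeats the agent's
# previous action instead of the newly chosen one (Machado et al., Sec. 5.2).
import random

class StickyActionWrapper:
    def __init__(self, env, sigma=0.25):   # sigma = 0.25 as in the paper
        self.env = env
        self.sigma = sigma
        self.prev_action = 0               # start from the no-op action

    def reset(self):
        self.prev_action = 0
        return self.env.reset()

    def step(self, action):
        if random.random() < self.sigma:   # sticky: repeat the old action
            action = self.prev_action
        self.prev_action = action
        return self.env.step(action)
```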

model for general artificial intelligence in today's deep-learning and computational neuroscience community

I've followed recent DL progress from NIPS, although I haven't tracked what is happening in the Computational Neuroscience (CN) field. But I wonder: why is there so little work on general artificial intelligence (GAI)? For example, some kind of Hebbian network used to build all of supervised/unsupervised/reinforcement learning.
Or can anyone tell me the state of the art in neural networks for GAI from the CN field, or point me to a review?
Thanks a lot!
All three fields, Deep Learning (e.g. what you read at NIPS), Artificial General Intelligence (AGI), and Computational Neuroscience, are separate fields with some, but not too much, overlap.
A question like "what is the state of the art?" cannot really be answered in such a general form and would also soon be outdated. You have to be more specific. But do check some sources in the specific field; Wikipedia might be a good start.

Interesting NLP/machine-learning-style project: analyzing privacy policies

I wanted some input on an interesting problem I've been assigned. The task is to analyze hundreds, and eventually thousands, of privacy policies and identify their core characteristics. For example: do they collect the user's location? Do they share or sell data with third parties? Etc.
I've talked to a few people, read a lot about privacy policies, and thought about this myself. Here is my current plan of attack:
First, read a lot of privacy policies and find the major "cues" or indicators that a certain characteristic is met. For example, if hundreds of privacy policies contain the same line, "We will take your location.", that line could be a cue with 100% confidence that the policy includes collection of the user's location. Other cues would give much smaller degrees of confidence about a certain characteristic. For example, the presence of the word "location" might increase the likelihood that the user's location is stored by 25%.
The idea would be to keep developing these cues and their associated confidence levels to the point where I could categorize all privacy policies with a high degree of confidence. An analogy could be made to email spam-catching systems that use Bayesian filters to identify which mail is likely commercial and unsolicited.
I wanted to ask whether you guys think this is a good approach to this problem. How exactly would you approach a problem like this? Furthermore, are there any specific tools or frameworks you'd recommend using? Any input is welcome. This is my first time doing a project which touches on artificial intelligence, specifically machine learning and NLP.
This is text classification. Given that you have multiple output categories per document, it's actually multilabel classification. The standard approach is to manually label a set of documents with the classes/labels that you want to predict, then train a classifier on features of the documents; typically word or n-gram occurrences or counts, possibly weighted by tf-idf.
The popular learning algorithms for document classification include naive Bayes and linear SVMs, though other classifier learners may work too. Any classifier can be extended to a multilabel one by the one-vs.-rest (OvR) construction.
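A minimal sketch of that pipeline in scikit-learn; the example texts and label names are hypothetical placeholders:

```python
# Multilabel text classification: tf-idf n-gram features with a
# one-vs-rest linear SVM, one binary classifier per label.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC

policies = ["We may collect your location ...",
            "We never share data with third parties ..."]
labels = [{"collects_location"}, {"no_third_party_sharing"}]

binarizer = MultiLabelBinarizer()
Y = binarizer.fit_transform(labels)        # one binary column per label

clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),   # unigram + bigram features
    OneVsRestClassifier(LinearSVC()),
)
clf.fit(policies, Y)

pred = clf.predict(["We share your location with advertisers."])
print(binarizer.inverse_transform(pred))
```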
A very interesting problem indeed!
On a higher level, what you want is summarization: a document has to be reduced to a few key phrases. This is far from being a solved problem. A simple approach would be to search for keywords as opposed to key phrases. You could try something like LDA for topic modelling to find what each document is about. You can then search for topics that are present in all documents; I suspect what will come up is material to do with licenses, location, copyright, etc. MALLET has an easy-to-use implementation of LDA.
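If you would rather stay in Python than use MALLET's Java implementation, gensim offers a comparable LDA; here is a minimal sketch (the toy corpus and topic count are assumptions for illustration only):

```python
from gensim import corpora, models

docs = [["collect", "location", "data"],
        ["share", "third", "party", "advertisers"],
        ["copyright", "license", "terms"]]

dictionary = corpora.Dictionary(docs)              # token -> id mapping
corpus = [dictionary.doc2bow(d) for d in docs]     # bag-of-words vectors
lda = models.LdaModel(corpus, num_topics=2, id2word=dictionary)

for topic_id, words in lda.print_topics():
    print(topic_id, words)                         # inspect dominant terms
```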
I would approach this as a machine learning problem where you are trying to classify things in multiple ways, i.e. wants location, wants SSN, etc.
You'll need to enumerate the characteristics you want to use (location, SSN), and then for each document say whether that document uses that information or not. Choose your features, train on your data, and then classify and test.
I think simple features like words and n-grams would probably get you pretty far, and a dictionary of words related to things like SSNs or location would finish it nicely.
Use the machine learning algorithm of your choice; Naive Bayes is very easy to implement and use, and would work OK as a first stab at the problem.
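As a hedged sketch of that first stab, combining Naive Bayes with the kind of word dictionary mentioned above (the indicator words, labels, and training texts are hypothetical):

```python
# Naive Bayes over a hand-built dictionary of indicator words.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

indicator_words = ["location", "gps", "ssn", "social", "security"]

clf = make_pipeline(
    CountVectorizer(vocabulary=indicator_words),   # count only dictionary hits
    MultinomialNB(),
)
clf.fit(["we collect gps location data",
         "we never ask for your ssn"],
        ["wants_location", "no_ssn"])

print(clf.predict(["your location may be stored"]))  # -> ['wants_location']
```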

Genetic Programming Online Learning

Has anybody seen a GP implemented with online learning rather than the standard offline learning? I've done some work with genetic programs and I simply can't figure out a good way to make the learning process online.
Please let me know if you have any ideas, seen any implementations, or have any references that I can look at.
Per the Wikipedia link, online learning "learns one instance at a time." The online/offline labels usually refer to how training data is fed to a supervised regression or classification algorithm. Since genetic programming is a heuristic search that uses an evaluation function to evaluate the fitness of its solutions, rather than a training set with labels, those terms don't really apply.
If what you're asking is whether the output of the GP algorithm (i.e. the best phenotype) can be used while it's still "searching" for better solutions, I see no reason why not, assuming it makes sense for your domain/application. Once the fitness of your GA/GP's population reaches a certain threshold, you can apply that solution in your application and continue to run the GP, switching to a new solution when a better one becomes available.
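As a toy illustration of that anytime usage, here a plain bit-string GA stands in for a real GP system; the objective, operators, and population settings are illustrative assumptions:

```python
# Keep evolving, but deploy the best individual whenever it beats the
# currently deployed one: the application always has a usable solution
# while the search continues.
import random

def fitness(ind):                          # toy objective: count of 1-bits
    return sum(ind)

def mutate(ind, rate=0.05):                # per-bit flip mutation
    return [b ^ (random.random() < rate) for b in ind]

population = [[random.randint(0, 1) for _ in range(32)] for _ in range(50)]
deployed, deployed_fit = None, float("-inf")

for generation in range(200):
    population.sort(key=fitness, reverse=True)
    best = population[0]
    if fitness(best) > deployed_fit:       # better solution found: switch
        deployed, deployed_fit = best[:], fitness(best)
        # ...hand `deployed` to the running application here...
    survivors = population[:25]            # truncation selection
    population = survivors + [mutate(random.choice(survivors))
                              for _ in range(25)]

print(deployed_fit, deployed)
```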
One approach along this line is an algorithm called rtNEAT, which attempts to use a genetic algorithm to generate and update a neural network in real time.
I found a few examples by doing a Google Scholar search for online Genetic Programming.
An On-Line Method to Evolve Behavior and to Control a Miniature Robot in Real Time with Genetic Programming
It actually looks like they found a way to make GP modify the machine code of the robot's control system during actual activities - pretty cool!
Those same authors went on to produce more related work, such as this improvement:
Evolution of a world model for a miniature robot using genetic programming
Hopefully their work will be enough to get you started - I don't have enough experience with genetic programming to be able to give you any specific advice.
It actually looks like they found a way to make GP modify the machine code of the robot's control system during actual activities - pretty cool!
Yes, the department at Uni Dortmund was heavily into linear GP :-)
Direct execution of GP programs vs. interpreted code has some advantages, though these days you'd probably rather go with languages such as Java, C#, or Obj-C that allow you to write classes/methods at runtime, while still benefiting from a managed runtime rather than running on the raw CPU.
The online-learning approach doesn't seem like anything absolutely novel or different from 'classic GP' to me.
From my understanding it's just a case of extending the set of training/fitness/test cases during runtime?
Cheers,
Jay

So was that Data Structures & Algorithms course really useful after all?

I remember when I was in DSA I was like, wtf, O(n), and wondering where I would use it other than in grad school, or if you're not a PhD like Bloch. Somehow uses for it do pop up in business analysis, so I was wondering: when have you guys had to call up your big-O skills to see how to write an algorithm, which data structure did you use to fit the problem, or did you have to actually create a new DS (like your own implementation of a splay tree or trie)?
Understanding Data Structures has been fundamental to many of the projects I've worked on, and that goes beyond the ten minute song 'n dance one does when asked such a question in an interview situation.
Granted, modern environments with all sorts of collection classes can make light work of storing and accessing large amounts of data, but having an understanding that a particular problem is best solved with a particular data structure can be a great timesaver. And by "timesaver" I mean "the difference between something working and not working".
Honestly, being able to answer that stuff is my biggest criterion for taking interviewees seriously in an interview. Knowing how basic data structures work, basic O(n) analysis, and some light theory is really crucial to being able to write large applications successfully.
It's important in the interview because it's important on the job. I've worked with techs in the past who were self-taught, without taking a data structures course or reading a data structures book, and their code is occasionally bad in ways they should have seen coming.
If you don't know that n² is going to run slowly compared to n log n, you've got more to learn.
As far as the latter half of the data structures course goes, it isn't generally applicable to most tech jobs, but if you ever do wind up needing it, you'll wish you had paid more attention.
Big-O notation is one of the basic notations used when describing algorithms implemented by a particular library. For example, all documentation on the STL that I've seen describes various operations in terms of big-O, so naturally you have to understand the difference between, e.g., O(1), O(log n) and O(n) to understand the implications of your choice of STL containers and algorithms. MSDN also does that for .NET classes, and IIRC Java documentation does that for standard Java classes. So, I'd say that knowing the notation is pretty much a requirement for understanding the documentation of most popular frameworks out there.
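A quick Python illustration of why those documented complexities matter (the same point applies to the STL's containers): membership tests are O(n) on a list but O(1) on average on a hash-based set:

```python
# Compare membership-test cost: linear scan of a list vs. hash lookup.
import timeit

n = 100_000
as_list = list(range(n))
as_set = set(as_list)

print(timeit.timeit(lambda: n - 1 in as_list, number=100))  # O(n) per test
print(timeit.timeit(lambda: n - 1 in as_set, number=100))   # O(1) average
```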
Sure (even though I'm a humble MS in EE; no PhD, no CS, differently from my colleague Joshua Bloch), I write a lot of stuff that needs to be highly scalable (or components that may need to be reused in highly scalable apps), so big-O considerations are almost always at work in my design (and it's not hard to take them into account). The data structures I use are almost always from Python's simple but rich supply (which I did lend a hand in developing ;-)); rarely is a totally custom one needed (rather than building on top of list, dict, etc.), but when it does happen (e.g. the bitvectors in my open source project gmpy), it's no big deal.
I was able to use B-trees right when I learned about them in my algorithms class (that was about 15 years ago, when far fewer open source implementations were available). And even later, knowledge about the differences between e.g. container classes came in handy...
Absolutely: even though stacks, queues, etc. are pretty straightforward, it helps to have been introduced to them in a disciplined fashion.
B-trees and more advanced sorting are a bit more difficult, so learning them early was a big benefit, and I have indeed had to implement each of them at various points.
Finally, I created an algorithm for singly-connected components a few years back that was significantly better than the one our signal-processing team was using, but I couldn't convince them it was better until I could show that it was O(n) complexity rather than O(n log n); see the sketch below.
...just to name a few examples.
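The signal-processing algorithm itself isn't described above, so as a generic stand-in, here is a sketch of linear-time (O(n + m)) connected components via breadth-first search:

```python
# Connected components of an undirected graph in O(n + m) time:
# each node and edge is visited at most once.
from collections import deque

def connected_components(adjacency):
    """adjacency: dict mapping node -> list of neighbour nodes."""
    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        component, queue = [], deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.append(node)
            for nbr in adjacency[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        components.append(component)
    return components

print(connected_components({1: [2], 2: [1], 3: []}))  # -> [[1, 2], [3]]
```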
Of course, if you are content to remain a CRUD-system hacker with no real desire to do more than collect a paycheck, then it may not be necessary...
I found my knowledge of data structures very useful when I needed to implement a customizable event-driven system about ten years ago. That's the biggie, but I use that sort of knowledge fairly frequently in lesser ways.
For me, knowing the exact algorithms has been... nice as background knowledge. However, the thing that's been most useful is the more general habit of paying attention to how different pieces of an algorithm interact. For instance, there can be places in code where moving one piece of code (e.g., outside a loop) can make a huge difference in both time and space.
It's less the specific knowledge the course taught and more that it acted like several years of experience: the course took all the variations of something that might take years to encounter (and have drilled into you) through pure "real world experience" and condensed them.
The title of your question asks about data structures and algorithms, but the body of your question focuses on complexity analysis, so I'll focus on that too:
There are lots of programming jobs where being able to do complexity analysis is at least occasionally useful. See What career can I hope for if I like algorithms? for some examples of these.
I can think of several instances in my career where either I or a co-worker discovered a piece of code whose (usually time, sometimes space) complexity was higher than it should have been, e.g. something that was quadratic or cubic when it could have been linear or O(n log n). Such code would work fine on small inputs, but on larger inputs would quickly become really slow or consume all available memory. Knowing alternative algorithms and data structures, their complexities, and how to analyze the complexity of new algorithms you build is vital to being able to correct these problems (or avoid them in the first place).
Networking is all I've used it for: in an implementation of traveling salesman.
Unfortunately I do a lot of "line of business" and "forms over data" apps, so most problems I work on can be solved by hammering together arrays, linked lists, and hash tables. However, I've had the chance to work my data structures magic here and there:
Due to weird, complex business rules, I worked on an application which used a custom thread pool implemented as a leftist heap.
My dev team struggled to write a complex multithreaded app. It was plagued with race conditions, deadlocks, and lousy performance due to very fine-grained locking. We reworked the code to avoid sharing state between threads, opting to write a very lightweight wrapper to facilitate message passing. After converting our linked lists and hash tables to immutable stacks and immutable red-black trees, we had no more problems with thread safety or performance. The resulting code was immaculate and surprisingly readable.
Frequently, a business rules engine requires you to roll your own state machine, which is very naturally modelled as a graph where vertices are states and edges are transitions between states.
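A minimal sketch of that last point, with a state machine stored as a transition graph; the states and events are hypothetical business-rule placeholders:

```python
# States are vertices; (state, event) -> next_state entries are the edges.
transitions = {
    ("draft",          "submit"):  "pending_review",
    ("pending_review", "approve"): "active",
    ("pending_review", "reject"):  "draft",
    ("active",         "expire"):  "archived",
}

def step(state, event):
    try:
        return transitions[(state, event)]
    except KeyError:
        raise ValueError(f"no transition from {state!r} on {event!r}")

state = "draft"
for event in ["submit", "approve", "expire"]:
    state = step(state, event)
print(state)   # -> archived
```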
If for no other reason, I'm glad I took the time to read about data structures and algorithms simply to be able to picture novel problems a little differently, especially combinatorial problems and graph problems. Graph theory is no longer a synonym for "scary".