The Data Daily

r/MachineLearning - [D] Reinforcement Learning: Novelty, Uncertainty, Exploration, Unsupervised Categorization, and Long-term Memory

Hey all, I’ve been thinking about RL for the past few months and I was curious to see if anyone here could give some guidance. Basically pointing me to papers or just a good dialogue would be much appreciated. I’m not in school so I don’t have much access to others interested in the field.

Uncertainty and exploration: I’ve been tinkering with CartPole using an ε-greedy exploration method. But I don’t like fixed or pre-determined exploration rates because they’re just not realistic. One way I’ve approached this differently is to increase the likelihood of exploration when the net is uncertain about which action to take. I’ve implemented this by looking at the certainty conveyed by the softmax output; higher certainty is conveyed by a larger distance between outputs. Note that certainty doesn’t entail accuracy, merely a large amount of consistent training for the current state. This does work, but my experience is that it takes longer to converge. Open to suggestions.
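For concreteness, here’s a minimal sketch of that idea: scale the exploration probability with the entropy of the softmax output, so near-uniform outputs (uncertain) explore more and peaked outputs (certain) explore less. The bounds `eps_min`/`eps_max` and the function names are my own illustrative choices, not anything from the post.

```python
import numpy as np

def exploration_prob(logits, eps_min=0.05, eps_max=0.5):
    """Map softmax uncertainty to an exploration probability.
    Near-uniform outputs (high entropy) -> explore more often.
    eps_min/eps_max are illustrative bounds."""
    z = logits - logits.max()                # stable softmax
    p = np.exp(z) / np.exp(z).sum()
    entropy = -(p * np.log(p + 1e-12)).sum()
    u = entropy / np.log(len(p))             # 0 = certain, 1 = maximally uncertain
    return eps_min + (eps_max - eps_min) * u

def select_action(logits, rng=np.random.default_rng()):
    """Epsilon-greedy with an uncertainty-dependent epsilon."""
    if rng.random() < exploration_prob(logits):
        return int(rng.integers(len(logits)))  # explore
    return int(np.argmax(logits))              # exploit
```

With uniform logits this returns `eps_max` exactly; with one logit far above the rest it approaches `eps_min`.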

Novelty nets: Along the lines of thought above, it would be nice if, upon entering a state, the agent knew whether it had been there before. Easy enough for the finite case, right? But not so for continuous spaces. It’d be great if this could be accomplished with a neural net, but my understanding is that it’s impossible: you can only update a net with new info via backprop, and one can’t train on data unseen. Which leads to my next line of thought...
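A non-neural baseline for the continuous case is worth having on hand: score novelty as the distance to the nearest previously seen state, and call a state "new" when that distance exceeds some threshold. This is a sketch of my own, not from the post; the class name and the idea of keeping a raw state buffer are illustrative (in practice one would cap or compress the buffer).

```python
import numpy as np

class NoveltyBuffer:
    """Novelty score for continuous states: Euclidean distance to the
    nearest previously seen state. A brand-new buffer reports infinite
    novelty. Storing every raw state is illustrative, not scalable."""
    def __init__(self):
        self.states = []

    def novelty(self, s):
        if not self.states:
            return float("inf")
        dists = np.linalg.norm(np.asarray(self.states) - np.asarray(s), axis=1)
        return float(dists.min())

    def add(self, s):
        self.states.append(np.asarray(s, dtype=float))
```

A revisited state scores 0; a state far from everything seen scores high, which could feed directly into the exploration rate above.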

Unsupervised categorization: If you’ve followed my previous two points, this will make more sense. It’s a given that learning good categories enables good RL, but most robust categorization methods seem to involve supervised learning. I attribute this to the fact that nets can learn to engineer better distance metrics than the ones classically used in unsupervised learning. It strikes me that, in the same way people abandoned hand-engineered features in favor of learning them, the future of unsupervised learning will involve learning the best distance metric for the data set at hand. BUT, I’m not really sure where to start on this. If I could integrate a good unsupervised method that just so happened to have a way to judge classification uncertainty, then I could address the novelty and exploration points above in one blow. This leads to my last thought...
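One small example of "unsupervised categorization with an uncertainty readout" is soft k-means: the soft assignment matrix gives each point a distribution over clusters, and a near-uniform row flags a point the method is uncertain about. This is my own toy sketch under assumed knobs (`beta` for assignment sharpness, a simple farthest-point initialisation), not a proposal for the learned metrics discussed above.

```python
import numpy as np

def soft_kmeans(X, k=2, beta=5.0, iters=50):
    """Toy soft k-means with deterministic farthest-point init.
    Returns cluster centers and an (n, k) soft-assignment matrix;
    a near-uniform row is a crude per-point uncertainty signal.
    beta and the init scheme are illustrative choices."""
    X = np.asarray(X, dtype=float)
    centers = [X[0]]
    for _ in range(k - 1):  # farthest-point initialisation
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[int(np.argmax(d))])
    centers = np.array(centers)
    for _ in range(iters):
        d = np.linalg.norm(X[:, None] - centers[None], axis=2)  # (n, k)
        w = np.exp(-beta * d)
        w /= w.sum(axis=1, keepdims=True)  # soft assignments
        centers = (w.T @ X) / w.sum(axis=0)[:, None]
    return centers, w
```

On well-separated data the rows of `w` come out near one-hot (confident); a point sitting between clusters gets a flatter row, which is exactly the uncertainty signal that could drive the novelty and exploration ideas above. The fixed Euclidean metric here is of course the thing the post wants to replace with a learned one.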

Long-term memory: Robust unsupervised learning like that mentioned above would also enable a very compact form of memory storage, and storage in a way that doesn’t depend on unrolling RNNs through time. We certainly retain memories bizarrely well. I remember things from both my childhood and yesterday, likely using the same retrieval methods. As Sutton has pointed out, “What function approximation can’t do, however, is augment the state representation with memories of past observations.” I just feel like we need a better way to address long-term memory and its access. For example, I can see a new scene that triggers an old memory, a scenario that maybe could be well approximated by an LSTM. But could it follow the memory down, so to speak: one access triggering a related memory, and so on, until that linkage chain is exhausted and the useful memories assimilated? I think an unsupervised learning method could very well enable this by use of its learned relation methods.
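The "one access triggering a related memory" idea can at least be stated mechanically: retrieve the memory nearest the current query, then the memory nearest that one, and keep hopping until the nearest unvisited memory is too far away. A rough sketch, with `max_hop` as a purely hypothetical stopping threshold and memories assumed to live in some embedding space:

```python
import numpy as np

def recall_chain(query, memories, max_hop=2.0, max_len=10):
    """Follow a chain of associations: retrieve the memory nearest the
    query, then the memory nearest that one, and so on, stopping when
    the nearest unvisited memory is farther than max_hop (hypothetical
    threshold). 'memories' is an (n, d) array of memory embeddings."""
    memories = np.asarray(memories, dtype=float)
    visited = []
    cur = np.asarray(query, dtype=float)
    remaining = list(range(len(memories)))
    while remaining and len(visited) < max_len:
        d = np.linalg.norm(memories[remaining] - cur, axis=1)
        j = int(np.argmin(d))
        if d[j] > max_hop:
            break  # linkage chain exhausted
        visited.append(remaining.pop(j))
        cur = memories[visited[-1]]
    return visited
```

The quality of such a chain would depend entirely on the metric the memories are embedded under, which is where the learned distance metrics from the previous point would come in.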

Thanks to anyone who stuck with me, all thoughts welcome.
