Campbell, M., Hoane, A. J. Jr & Hsu, F.-h. Deep Blue. Artif. Intell. 134, 57–83 (2002).
Silver, D. et al. Mastering the game of Go with deep neural networks and tree search. Nature 529, 484–489 (2016).
Bellemare, M. G., Naddaf, Y., Veness, J. & Bowling, M. The arcade learning environment: an evaluation platform for general agents. J. Artif. Intell. Res. 47, 253–279 (2013).
Machado, M. et al. Revisiting the arcade learning environment: evaluation protocols and open problems for general agents. J. Artif. Intell. Res. 61, 523–562 (2018).
Silver, D. et al. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science 362, 1140–1144 (2018).
Schaeffer, J. et al. A world championship caliber checkers program. Artif. Intell. 53, 273–289 (1992).
Brown, N. & Sandholm, T. Superhuman AI for heads-up no-limit poker: Libratus beats top professionals. Science 359, 418–424 (2018).
Moravčík, M. et al. DeepStack: expert-level artificial intelligence in heads-up no-limit poker. Science 356, 508–513 (2017).
Vlahavas, I. & Refanidis, I. Planning and Scheduling. Technical Report (EETN, 2013).
Segler, M. H., Preuss, M. & Waller, M. P. Planning chemical syntheses with deep neural networks and symbolic AI. Nature 555, 604–610 (2018).
Sutton, R. S. & Barto, A. G. Reinforcement Learning: An Introduction 2nd edn (MIT Press, 2018).
Deisenroth, M. & Rasmussen, C. PILCO: a model-based and data-efficient approach to policy search. In Proc. 28th International Conference on Machine Learning, ICML 2011 465–472 (Omnipress, 2011).
Heess, N. et al. Learning continuous control policies by stochastic value gradients. In NIPS'15: Proc. 28th International Conference on Neural Information Processing Systems Vol. 2 (eds Cortes, C. et al.) 2944–2952 (MIT Press, 2015).
Levine, S. & Abbeel, P. Learning neural network policies with guided policy search under unknown dynamics. Adv. Neural Inf. Process. Syst. 27, 1071–1079 (2014).
Hafner, D. et al. Learning latent dynamics for planning from pixels. Preprint at https://arxiv.org/abs/1811.04551 (2018).
Kaiser, L. et al. Model-based reinforcement learning for Atari. Preprint at https://arxiv.org/abs/1903.00374 (2019).
Buesing, L. et al. Learning and querying fast generative models for reinforcement learning. Preprint at https://arxiv.org/abs/1802.03006 (2018).
Espeholt, L. et al. IMPALA: scalable distributed deep-RL with importance weighted actor-learner architectures. In Proc. International Conference on Machine Learning, ICML Vol. 80 (eds Dy, J. & Krause, A.) 1407–1416 (2018).
Kapturowski, S., Ostrovski, G., Dabney, W., Quan, J. & Munos, R. Recurrent experience replay in distributed reinforcement learning. In International Conference on Learning Representations (2019).
Horgan, D. et al. Distributed prioritized experience replay. In International Conference on Learning Representations (2018).
Puterman, M. L. Markov Decision Processes: Discrete Stochastic Dynamic Programming 1st edn (John Wiley & Sons, 1994).
Coulom, R. Efficient selectivity and backup operators in Monte-Carlo tree search. In International Conference on Computers and Games 72–83 (Springer, 2006).
Wahlström, N., Schön, T. B. & Deisenroth, M. P. From pixels to torques: policy learning with deep dynamical models. Preprint at http://arxiv.org/abs/1502.02251 (2015).
Watter, M., Springenberg, J. T., Boedecker, J. & Riedmiller, M. Embed to control: a locally linear latent dynamics model for control from raw images. In NIPS'15: Proc. 28th International Conference on Neural Information Processing Systems Vol. 2 (eds Cortes, C. et al.) 2746–2754 (MIT Press, 2015).
Ha, D. & Schmidhuber, J. Recurrent world models facilitate policy evolution. In NIPS'18: Proc. 32nd International Conference on Neural Information Processing Systems (eds Bengio, S. et al.) 2455–2467 (Curran Associates, 2018).
Gelada, C., Kumar, S., Buckman, J., Nachum, O. & Bellemare, M. G. DeepMDP: learning continuous latent space models for representation learning. In Proc. 36th International Conference on Machine Learning: Volume 97 of Proc. Machine Learning Research (eds Chaudhuri, K. & Salakhutdinov, R.) 2170–2179 (PMLR, 2019).
van Hasselt, H., Hessel, M. & Aslanides, J. When to use parametric models in reinforcement learning? Preprint at https://arxiv.org/abs/1906.05243 (2019).
Tamar, A., Wu, Y., Thomas, G., Levine, S. & Abbeel, P. Value iteration networks. Adv. Neural Inf. Process. Syst. 29, 2154–2162 (2016).
Silver, D. et al. The predictron: end-to-end learning and planning. In Proc. 34th International Conference on Machine Learning Vol. 70 (eds Precup, D. & Teh, Y. W.) 3191–3199 (JMLR, 2017).
Farahmand, A. M., Barreto, A. & Nikovski, D. Value-aware loss function for model-based reinforcement learning. In Proc. 20th International Conference on Artificial Intelligence and Statistics: Volume 54 of Proc. Machine Learning Research (eds Singh, A. & Zhu, J.) 1486–1494 (PMLR, 2017).
Farahmand, A. Iterative value-aware model learning. Adv. Neural Inf. Process. Syst. 31, 9090–9101 (2018).
Farquhar, G., Rocktaeschel, T., Igl, M. & Whiteson, S. TreeQN and ATreeC: differentiable tree planning for deep reinforcement learning. In International Conference on Learning Representations (2018).
Oh, J., Singh, S. & Lee, H. Value prediction network. Adv. Neural Inf. Process. Syst. 30, 6118–6128 (2017).
Krizhevsky, A., Sutskever, I. & Hinton, G. E. ImageNet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 25, 1097–1105 (2012).
He, K., Zhang, X., Ren, S. & Sun, J. Identity mappings in deep residual networks. In 14th European Conference on Computer Vision 630–645 (2016).
Hessel, M. et al. Rainbow: combining improvements in deep reinforcement learning. In Thirty-Second AAAI Conference on Artificial Intelligence (2018).
Schmitt, S., Hessel, M. & Simonyan, K. Off-policy actor-critic with shared experience replay. Preprint at https://arxiv.org/abs/1909.11583 (2019).
Azizzadenesheli, K. et al. Surprising negative results for generative adversarial tree search. Preprint at http://arxiv.org/abs/1806.05780 (2018).
Mnih, V. et al. Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015).
OpenAI. OpenAI Five. OpenAI https://blog.openai.com/openai-five/ (2018).
Vinyals, O. et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 575, 350–354 (2019).
Jaderberg, M. et al. Reinforcement learning with unsupervised auxiliary tasks. Preprint at https://arxiv.org/abs/1611.05397 (2016).
Silver, D. et al. Mastering the game of Go without human knowledge. Nature 550, 354–359 (2017).
Kocsis, L. & Szepesvári, C. Bandit based Monte-Carlo planning. In European Conference on Machine Learning 282–293 (Springer, 2006).
Rosin, C. D. Multi-armed bandits with episode context. Ann. Math. Artif. Intell. 61, 203–230 (2011).
Schadd, M. P., Winands, M. H., Van Den Herik, H. J., Chaslot, G. M.-B. & Uiterwijk, J. W. Single-player Monte-Carlo tree search. In International Conference on Computers and Games 1–12 (Springer, 2008).
Pohlen, T. et al. Observe and look further: achieving consistent performance on Atari. Preprint at https://arxiv.org/abs/1805.11593 (2018).
Schaul, T., Quan, J., Antonoglou, I. & Silver, D. Prioritized experience replay. In International Conference on Learning Representations (2016).
Cloud TPU. Google Cloud https://cloud.google.com/tpu/ (2019).
Coulom, R. Whole-history rating: a Bayesian rating system for players of time-varying strength. In International Conference on Computers and Games 113–124 (2008).
Nair, A. et al. Massively parallel methods for deep reinforcement learning. Preprint at https://arxiv.org/abs/1507.04296 (2015).
Lanctot, M. et al. OpenSpiel: a framework for reinforcement learning in games. Preprint at http://arxiv.org/abs/1908.09453 (2019).