Mastering Atari, Go, chess and shogi by planning with a learned model


  • 1.

Campbell, M., Hoane, A. J. Jr & Hsu, F.-h. Deep Blue. Artif. Intell. 134, 57–83 (2002).

  • 2.

Silver, D. et al. Mastering the game of Go with deep neural networks and tree search. Nature 529, 484–489 (2016).

  • 3.

Bellemare, M. G., Naddaf, Y., Veness, J. & Bowling, M. The arcade learning environment: an evaluation platform for general agents. J. Artif. Intell. Res. 47, 253–279 (2013).

  • 4.

Machado, M. et al. Revisiting the arcade learning environment: evaluation protocols and open problems for general agents. J. Artif. Intell. Res. 61, 523–562 (2018).

  • 5.

Silver, D. et al. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science 362, 1140–1144 (2018).

  • 6.

Schaeffer, J. et al. A world championship caliber checkers program. Artif. Intell. 53, 273–289 (1992).

  • 7.

Brown, N. & Sandholm, T. Superhuman AI for heads-up no-limit poker: Libratus beats top professionals. Science 359, 418–424 (2018).

  • 8.

Moravčík, M. et al. DeepStack: expert-level artificial intelligence in heads-up no-limit poker. Science 356, 508–513 (2017).

  • 9.

Vlahavas, I. & Refanidis, I. Planning and Scheduling. Technical Report (EETN, 2013).

  • 10.

Segler, M. H., Preuss, M. & Waller, M. P. Planning chemical syntheses with deep neural networks and symbolic AI. Nature 555, 604–610 (2018).

  • 11.

Sutton, R. S. & Barto, A. G. Reinforcement Learning: An Introduction 2nd edn (MIT Press, 2018).

  • 12.

Deisenroth, M. & Rasmussen, C. PILCO: a model-based and data-efficient approach to policy search. In Proc. 28th International Conference on Machine Learning, ICML 2011 465–472 (Omnipress, 2011).

  • 13.

Heess, N. et al. Learning continuous control policies by stochastic value gradients. In NIPS'15: Proc. 28th International Conference on Neural Information Processing Systems Vol. 2 (eds Cortes, C. et al.) 2944–2952 (MIT Press, 2015).

  • 14.

Levine, S. & Abbeel, P. Learning neural network policies with guided policy search under unknown dynamics. Adv. Neural Inf. Process. Syst. 27, 1071–1079 (2014).

  • 15.

Hafner, D. et al. Learning latent dynamics for planning from pixels. Preprint at https://arxiv.org/abs/1811.04551 (2018).

  • 16.

Kaiser, L. et al. Model-based reinforcement learning for Atari. Preprint at https://arxiv.org/abs/1903.00374 (2019).

  • 17.

Buesing, L. et al. Learning and querying fast generative models for reinforcement learning. Preprint at https://arxiv.org/abs/1802.03006 (2018).

  • 18.

Espeholt, L. et al. IMPALA: scalable distributed deep-RL with importance weighted actor-learner architectures. In Proc. International Conference on Machine Learning, ICML Vol. 80 (eds Dy, J. & Krause, A.) 1407–1416 (2018).

  • 19.

Kapturowski, S., Ostrovski, G., Dabney, W., Quan, J. & Munos, R. Recurrent experience replay in distributed reinforcement learning. In International Conference on Learning Representations (2019).

  • 20.

Horgan, D. et al. Distributed prioritized experience replay. In International Conference on Learning Representations (2018).

  • 21.

Puterman, M. L. Markov Decision Processes: Discrete Stochastic Dynamic Programming 1st edn (John Wiley & Sons, 1994).

  • 22.

Coulom, R. Efficient selectivity and backup operators in Monte-Carlo tree search. In International Conference on Computers and Games 72–83 (Springer, 2006).

  • 23.

Wahlström, N., Schön, T. B. & Deisenroth, M. P. From pixels to torques: policy learning with deep dynamical models. Preprint at http://arxiv.org/abs/1502.02251 (2015).

  • 24.

Watter, M., Springenberg, J. T., Boedecker, J. & Riedmiller, M. Embed to control: a locally linear latent dynamics model for control from raw images. In NIPS'15: Proc. 28th International Conference on Neural Information Processing Systems Vol. 2 (eds Cortes, C. et al.) 2746–2754 (MIT Press, 2015).

  • 25.

Ha, D. & Schmidhuber, J. Recurrent world models facilitate policy evolution. In NIPS'18: Proc. 32nd International Conference on Neural Information Processing Systems (eds Bengio, S. et al.) 2455–2467 (Curran Associates, 2018).

  • 26.

Gelada, C., Kumar, S., Buckman, J., Nachum, O. & Bellemare, M. G. DeepMDP: learning continuous latent space models for representation learning. In Proc. 36th International Conference on Machine Learning: Volume 97 of Proc. Machine Learning Research (eds Chaudhuri, K. & Salakhutdinov, R.) 2170–2179 (PMLR, 2019).

  • 27.

van Hasselt, H., Hessel, M. & Aslanides, J. When to use parametric models in reinforcement learning? Preprint at https://arxiv.org/abs/1906.05243 (2019).

  • 28.

Tamar, A., Wu, Y., Thomas, G., Levine, S. & Abbeel, P. Value iteration networks. Adv. Neural Inf. Process. Syst. 29, 2154–2162 (2016).

  • 29.

Silver, D. et al. The predictron: end-to-end learning and planning. In Proc. 34th International Conference on Machine Learning Vol. 70 (eds Precup, D. & Teh, Y. W.) 3191–3199 (JMLR, 2017).

  • 30.

Farahmand, A. M., Barreto, A. & Nikovski, D. Value-aware loss function for model-based reinforcement learning. In Proc. 20th International Conference on Artificial Intelligence and Statistics: Volume 54 of Proc. Machine Learning Research (eds Singh, A. & Zhu, J.) 1486–1494 (PMLR, 2017).

  • 31.

Farahmand, A. Iterative value-aware model learning. Adv. Neural Inf. Process. Syst. 31, 9090–9101 (2018).

  • 32.

Farquhar, G., Rocktaeschel, T., Igl, M. & Whiteson, S. TreeQN and ATreeC: differentiable tree planning for deep reinforcement learning. In International Conference on Learning Representations (2018).

  • 33.

Oh, J., Singh, S. & Lee, H. Value prediction network. Adv. Neural Inf. Process. Syst. 30, 6118–6128 (2017).

  • 34.

Krizhevsky, A., Sutskever, I. & Hinton, G. E. ImageNet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 25, 1097–1105 (2012).

  • 35.

He, K., Zhang, X., Ren, S. & Sun, J. Identity mappings in deep residual networks. In 14th European Conference on Computer Vision 630–645 (2016).

  • 36.

Hessel, M. et al. Rainbow: combining improvements in deep reinforcement learning. In Thirty-Second AAAI Conference on Artificial Intelligence (2018).

  • 37.

Schmitt, S., Hessel, M. & Simonyan, K. Off-policy actor-critic with shared experience replay. Preprint at https://arxiv.org/abs/1909.11583 (2019).

  • 38.

Azizzadenesheli, K. et al. Surprising negative results for generative adversarial tree search. Preprint at http://arxiv.org/abs/1806.05780 (2018).

  • 39.

Mnih, V. et al. Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015).

  • 40.

OpenAI. OpenAI Five. OpenAI https://blog.openai.com/openai-five/ (2018).

  • 41.

Vinyals, O. et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 575, 350–354 (2019).

  • 42.

Jaderberg, M. et al. Reinforcement learning with unsupervised auxiliary tasks. Preprint at https://arxiv.org/abs/1611.05397 (2016).

  • 43.

Silver, D. et al. Mastering the game of Go without human knowledge. Nature 550, 354–359 (2017).

  • 44.

Kocsis, L. & Szepesvári, C. Bandit based Monte-Carlo planning. In European Conference on Machine Learning 282–293 (Springer, 2006).

  • 45.

Rosin, C. D. Multi-armed bandits with episode context. Ann. Math. Artif. Intell. 61, 203–230 (2011).

  • 46.

Schadd, M. P., Winands, M. H., Van Den Herik, H. J., Chaslot, G. M.-B. & Uiterwijk, J. W. Single-player Monte-Carlo tree search. In International Conference on Computers and Games 1–12 (Springer, 2008).

  • 47.

Pohlen, T. et al. Observe and look further: achieving consistent performance on Atari. Preprint at https://arxiv.org/abs/1805.11593 (2018).

  • 48.

Schaul, T., Quan, J., Antonoglou, I. & Silver, D. Prioritized experience replay. In International Conference on Learning Representations (2016).

  • 49.

Cloud TPU. Google Cloud https://cloud.google.com/tpu/ (2019).

  • 50.

Coulom, R. Whole-history rating: a Bayesian rating system for players of time-varying strength. In International Conference on Computers and Games 113–124 (2008).

  • 51.

Nair, A. et al. Massively parallel methods for deep reinforcement learning. Preprint at https://arxiv.org/abs/1507.04296 (2015).

  • 52.

Lanctot, M. et al. OpenSpiel: a framework for reinforcement learning in games. Preprint at http://arxiv.org/abs/1908.09453 (2019).


