A Pseudo-Random Number Generator (PRNG) is any algorithm generating a sequence of numbers approximating properties of random numbers. These numbers are widely employed in mid-level cryptography and in software applications. Test suites are used to evaluate the quality of PRNGs by checking statistical properties of the generated sequences. These sequences are commonly represented bit by bit. This paper proposes a Reinforcement Learning (RL) approach to the task of generating PRNGs from scratch by learning a policy to solve a partially observable Markov Decision Process (MDP), where the full state is the period of the generated sequence, and the observation at each time-step is the last sequence of bits appended to such states. We use Long-Short Term Memory (LSTM) architecture to model the temporal relationship between observations at different time-steps by tasking the LSTM memory with the extraction of significant features of the hidden portion of the MDP’s states. We show that modeling a PRNG with a partially observable MDP and an LSTM architecture largely improves the results of the fully observable feedforward RL approach introduced in previous work.

Pseudo random number generation through reinforcement learning and recurrent neural networks

Parton, Maurizio
2020-01-01

Abstract

A Pseudo-Random Number Generator (PRNG) is any algorithm generating a sequence of numbers approximating properties of random numbers. These numbers are widely employed in mid-level cryptography and in software applications. Test suites are used to evaluate the quality of PRNGs by checking statistical properties of the generated sequences. These sequences are commonly represented bit by bit. This paper proposes a Reinforcement Learning (RL) approach to the task of generating PRNGs from scratch by learning a policy to solve a partially observable Markov Decision Process (MDP), where the full state is the period of the generated sequence, and the observation at each time-step is the last sequence of bits appended to such states. We use Long-Short Term Memory (LSTM) architecture to model the temporal relationship between observations at different time-steps by tasking the LSTM memory with the extraction of significant features of the hidden portion of the MDP’s states. We show that modeling a PRNG with a partially observable MDP and an LSTM architecture largely improves the results of the fully observable feedforward RL approach introduced in previous work.
File in questo prodotto:
File Dimensione Formato  
algorithms-13-00307-v2.pdf

accesso aperto

Descrizione: Article
Tipologia: PDF editoriale
Dimensione 1.14 MB
Formato Adobe PDF
1.14 MB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11564/736716
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 3
  • ???jsp.display-item.citation.isi??? 2
social impact