Original paper: https://doi.org/10.48550/arXiv.1905.11108
Why SQIL?
The main idea in this paper is that the effectiveness of adversarial imitation methods can be matched by a much simpler approach that requires neither adversarial training nor learning a reward function at all.
Give a constant reward of +1 for matching the expert behaviour and a constant reward of 0 for all other behaviour.
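The constant-reward rule can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the `Transition` container, the `relabel` and `sample_balanced_batch` helpers, and the even split between demonstration and agent samples are all assumptions made for this sketch. The point is only that no reward model is trained; rewards are overwritten with constants before the transitions are handed to an off-policy learner.

```python
import random
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class Transition:
    """Hypothetical container for one (s, a, s') step plus its reward."""
    state: Tuple[float, ...]
    action: int
    next_state: Tuple[float, ...]
    reward: float = 0.0


def relabel(transitions: List[Transition], reward: float) -> List[Transition]:
    """Overwrite whatever reward was stored with a constant value."""
    return [replace(t, reward=reward) for t in transitions]


def sample_balanced_batch(
    demo: List[Transition],
    agent: List[Transition],
    batch_size: int,
    rng: random.Random,
) -> List[Transition]:
    """Draw half the batch from demonstrations and half from agent data.

    The 50/50 split is an assumption of this sketch.
    """
    half = batch_size // 2
    return rng.choices(demo, k=half) + rng.choices(agent, k=batch_size - half)


# Expert transitions get reward +1, everything the agent collects gets 0.
demo_buffer = relabel([Transition((0.0,), 1, (1.0,))], reward=1.0)
agent_buffer = relabel([Transition((1.0,), 0, (0.0,), reward=5.0)], reward=0.0)

batch = sample_balanced_batch(demo_buffer, agent_buffer, 4, random.Random(0))
```

The resulting `batch` would then be passed to whatever off-policy Q-learning update is in use; the environment's own reward (5.0 in the agent transition above) is discarded.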