Original paper: https://doi.org/10.48550/arXiv.1905.11108

Why SQIL?

The main idea in this paper is that the effectiveness of adversarial imitation methods can be achieved by a much simpler approach that does not require adversarial training, or indeed learning a reward function at all.

SQIL gives a constant reward of +1 for matching the expert behaviour (transitions taken from the demonstrations) and a constant reward of 0 for all other behaviour (transitions the agent collects itself).
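
The reward rule above can be sketched as a relabelling step on two replay buffers. This is only a minimal illustration, not the authors' implementation: the names `Transition`, `relabel`, and `sample_batch` are hypothetical, and the even split between demonstration and agent samples in each batch is an assumption of this sketch. The off-policy learner that would consume the batch (soft Q-learning in the paper) is not shown.

```python
import random
from typing import List, NamedTuple, Tuple


class Transition(NamedTuple):
    """One environment step (hypothetical container for this sketch)."""
    state: Tuple[float, ...]
    action: int
    reward: float
    next_state: Tuple[float, ...]


def relabel(transitions: List[Transition], reward: float) -> List[Transition]:
    """Overwrite the reward of every transition with a constant."""
    return [t._replace(reward=reward) for t in transitions]


def sample_batch(
    demo_buffer: List[Transition],
    agent_buffer: List[Transition],
    batch_size: int,
    rng: random.Random,
) -> List[Transition]:
    """Draw a batch with half demonstration and half agent transitions.

    The 50/50 split is an assumption of this sketch.
    """
    half = batch_size // 2
    demo = [rng.choice(demo_buffer) for _ in range(half)]
    agent = [rng.choice(agent_buffer) for _ in range(batch_size - half)]
    return demo + agent


# Expert transitions get reward +1, whatever reward they originally carried.
demo_buffer = relabel([Transition((0.0,), 1, 5.0, (1.0,))], reward=1.0)
# Transitions collected by the agent get reward 0.
agent_buffer = relabel([Transition((1.0,), 0, -3.0, (2.0,))], reward=0.0)

batch = sample_batch(demo_buffer, agent_buffer, batch_size=4, rng=random.Random(0))
```

Because the rewards are constants, no reward model or discriminator has to be trained; the batch can be handed directly to any off-policy Q-learning update.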

https://arxiv.org/pdf/1905.11108.pdf