RL notation, following Sutton (control-theory convention → RL convention):
states: x → s
agents
actions: u → a
rewards: -cost → R
environments

The reward hypothesis (Sutton): all of what we mean by goals and purposes can be well thought of as the maximization of the expected value of the cumulative sum of a received scalar signal (called reward).
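
A minimal sketch of "maximizing the expected value of the cumulative sum of reward", assuming a made-up toy setting that is not from Sutton: each episode has 3 steps, and each step pays reward 1 with some fixed probability. The names `cumulative_reward`, `expected_return`, `make_episode_sampler`, and the two behaviors "cautious" and "bold" are hypothetical, chosen only for illustration. The expectation is approximated by averaging over sampled episodes.

```python
import random


def cumulative_reward(rewards):
    """Cumulative sum of the scalar rewards received in one episode."""
    return sum(rewards)


def expected_return(sample_episode, n_episodes=10_000, seed=0):
    """Monte Carlo estimate of the expected cumulative reward."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    total = 0.0
    for _ in range(n_episodes):
        total += cumulative_reward(sample_episode(rng))
    return total / n_episodes


def make_episode_sampler(p_success, n_steps=3):
    """Hypothetical toy environment: each step pays 1 with probability p_success."""
    def sample(rng):
        return [1.0 if rng.random() < p_success else 0.0 for _ in range(n_steps)]
    return sample


# Two hypothetical behaviors, differing only in their per-step success probability.
behaviors = {"cautious": 0.4, "bold": 0.7}
estimates = {
    name: expected_return(make_episode_sampler(p)) for name, p in behaviors.items()
}

# Under the reward hypothesis, the "goal" is the behavior with the highest
# expected cumulative reward.
best = max(estimates, key=estimates.get)
print(estimates, best)
```

With rewards defined as negative costs (R = -cost, as in the notation list), picking the behavior with the largest expected cumulative reward is the same as picking the one with the smallest expected cumulative cost.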

