Working with RLang

There are many ways RLang can be integrated into a project. The simplest involves using a pre-existing RLang-enabled reinforcement learning agent that utilizes RLang information in a predefined manner. In this scenario, the user provides a written RLang program, which can be parsed using parse() or, if the program resides in a .rlang file, parse_file().
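
For example (a minimal sketch; the inline program string and the gridworld.rlang path are illustrative):

import rlang

# Parse an RLang program supplied directly as a string
knowledge = rlang.parse('Constant lava_locs := [[3, 2], [1, 4]]')

# Or parse a program stored in a .rlang file
knowledge = rlang.parse_file("gridworld.rlang")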

It is also possible to write your own RLang-enabled agent that can utilize the information stored in an RLang program (accessible via the RLangKnowledge class). This route requires a more thorough understanding of the RLang Python groundings module.

Using a Pre-built RLang Agent

An RLang project using a pre-built agent might have the following directory structure:

gridworld/
    main.py           # Python code for running the project
    gridworld.rlang   # RLang program containing world information
    vocab.json        # An optional (but useful) file that holds metadata and can reference additional groundings
    groundings.py     # An optional file that can store RLang groundings defined in Python

where main.py could be as simple as:

from simple_rl.run_experiments import run_agents_on_mdp
from simple_rl.agents import QLearningAgent

import rlang
from rlang.agents import RLangQLearningAgent

mdp, states = create_mdp()                    # Construct the MDP and its state list (defined elsewhere by the user)
agent = QLearningAgent(mdp.get_actions())     # Create a baseline Q-Learning agent

knowledge = rlang.parse_file("gridworld.rlang")  # Parse RLang program into knowledge object
rlang_agent = RLangQLearningAgent(mdp.get_actions(), states, knowledge) # Create RLang Q-Learning agent

run_agents_on_mdp([agent, rlang_agent], mdp)  # Compare performance of agents on mdp

and gridworld.rlang could look like this:

import "vocab.json"

Constant lava_locs := [[3, 2], [1, 4], [2, 4], [2, 5]]

Factor position := S[0, 1]
Factor x := position[0]
Factor y := position[1]

Proposition reached_goal := x == 5 and y == 1
Proposition reached_wall := x == 3 and y == 1
Proposition in_lava := position in lava_locs

Effect main:
    if in_lava:
        Reward -1
    if reached_goal:
        Reward 1
    if reached_wall:
        S' -> S

For help on how to write an RLang program, see RLang Language Reference.

Using a Vocabulary File

While optional, vocabulary files enable extremely powerful functionality. A minimal vocab.json file might contain metadata about the MDP being interfaced with, such as the size of the state space and the shape of the action space:

{
  "domain": "gridworld",
  "mdp_metadata": {
    "state_space": {
      "size": 2,
      "dtype": "int"
    },
    "action_space": {
      "shape": 1,
      "dtype": "int"
    }
  }
}

A more powerful vocabulary file can also reference additional RLang groundings declared inside an auxiliary Python file. The vocabulary file below includes two feature groundings (angle_target and hover_target) from an auxiliary module called grounding.py:

Example vocab.json with additional groundings
{
  "domain": "lunarlander",
  "mdp_metadata": {
    "state_space": {
      "size": 8,
      "dtype": "float"
    },
    "action_space": {
      "shape": 1,
      "dtype": "int"
    }
  },
  "modules": [
    {
      "module_name": "grounding",
      "file_name": "examples/lunar_lander/grounding.py"
    }
  ],
  "vocabulary": {
    "features": [
      {
        "name": "angle_target",
        "grounding": "grounding.angle_target"
      },
      {
        "name": "hover_target",
        "grounding": "grounding.hover_target"
      }
    ]
  }
}

An accompanying grounding file grounding.py
from rlang.grounding import Feature

def _angle_target(state):
    position = state[0:2]
    velocity = state[2:4]
    angle_targ = position[0] * 0.5 + velocity[0] * 1.0  # angle should point towards center
    if angle_targ > 0.4:
        angle_targ = 0.4  # more than 0.4 radians (22 degrees) is bad
    if angle_targ < -0.4:
        angle_targ = -0.4
    return angle_targ

def _hover_target(state):
    position = state[0:2]
    hover_targ = 0.55 * abs(
        position[0]
    )
    return hover_targ


angle_target = Feature(_angle_target)
hover_target = Feature(_hover_target)

angle_target and hover_target can now be referenced in an RLang program like any other grounding.
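
For instance, a lunar lander RLang program could build further groundings on top of them. The snippet below is an illustrative sketch; the state index and the names angle and angle_aligned are hypothetical:

import "vocab.json"

Factor angle := S[4]

Proposition angle_aligned := angle == angle_target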

Note

It is possible to use more than one Python module to supply additional groundings.
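
In that case, the "modules" list simply contains one entry per module. The fragment below is a sketch; the second module name and file path are hypothetical:

"modules": [
  {
    "module_name": "grounding",
    "file_name": "examples/lunar_lander/grounding.py"
  },
  {
    "module_name": "extra_grounding",
    "file_name": "examples/lunar_lander/extra_grounding.py"
  }
]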

Creating a Custom RLang Agent using RLangKnowledge

To get the most out of RLang, users should implement their own RLang-enabled reinforcement learning agents. Doing so requires becoming familiar with RLang's groundings module and, most importantly, the RLangKnowledge object, which holds all RLang groundings parsed from an RLang program. RLangQLearningAgent is a good example of how to integrate RLang knowledge into an RL agent:

RLangQLearningAgentClass.py
from collections import defaultdict

from simple_rl.agents import QLearningAgent

from ..grounding.utils.primitives import VectorState


class RLangQLearningAgent(QLearningAgent):
    """Implementation for a Q Learning agent that utilizes RLang hints"""

    def __init__(self, actions, states, knowledge, name="RLang-Q-learning", use_transition=False, alpha=0.1, gamma=0.99,
                 epsilon=0.1, explore="uniform", anneal=False, default_q=0):
        """
        Args:
            actions (list): Contains strings denoting the actions.
            states (list): A list of all possible states.
            knowledge (RLangKnowledge): An RLangKnowledge object.
            name (str): Denotes the name of the agent.
            use_transition (bool): Whether to also use the RLang transition function when initializing the Q-table.
            alpha (float): Learning rate.
            gamma (float): Discount factor.
            epsilon (float): Exploration term.
            explore (str): One of {softmax, uniform}. Denotes explore policy.
            default_q (float): the default value to initialize every entry in the q-table with [by default, set to 0.0]
        """

        def weighted_reward(r_func, state_dict):
            reward = 0
            for k, v in state_dict.items():
                reward += r_func(state=VectorState(k)) * v
            return reward

        def weighted_value(q_func, state_dict):
            reward = 0
            for k, v in state_dict.items():
                maxx = q_func[k][actions[0]]
                for a in actions:
                    val = q_func[k][a]
                    if val > maxx:
                        maxx = val
                reward += maxx * v
            return reward

        q_func = defaultdict(lambda: defaultdict(lambda: default_q))
        reward_function = knowledge.reward_function

        if reward_function:
            for s in states:
                for i in range(len(actions)):
                    a = actions[i]
                    q_func[s][a] = reward_function(state=VectorState(s), action=i)

        transition_function = knowledge.transition_function

        if use_transition and transition_function and reward_function:
            for s in states:
                for i in range(len(actions)):
                    a = actions[i]
                    s_primei = transition_function(state=VectorState(s), action=i)
                    if s_primei:
                        # Q learning Update
                        r_prime = weighted_reward(reward_function, s_primei)
                        v_s_prime = weighted_value(q_func, s_primei)
                        q_func[s][a] += alpha * (r_prime + gamma * v_s_prime)

        super().__init__(actions, name=name, alpha=alpha, gamma=gamma,
                         epsilon=epsilon, explore=explore, anneal=anneal, custom_q_init=q_func, default_q=default_q)

The RLangKnowledge object returned by parse() and parse_file() holds the groundings defined in an RLang program, including the reward_function and transition_function used to initialize the Q-table above.
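
The sketch below shows how a custom agent might query these functions directly. It is a minimal sketch: the absolute import path for VectorState is inferred from the relative import in the agent above, and the example state is taken from the gridworld program:

import rlang
from rlang.grounding.utils.primitives import VectorState

knowledge = rlang.parse_file("gridworld.rlang")

reward_function = knowledge.reward_function          # reward model parsed from the program (may be None)
transition_function = knowledge.transition_function  # transition model parsed from the program (may be None)

state = VectorState([3, 2])  # a lava location, per the Constant lava_locs above

if reward_function:
    print(reward_function(state=state, action=0))    # predicted reward for taking action 0 in this state

if transition_function:
    print(transition_function(state=state, action=0))  # mapping from predicted next states to weights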