Getting Started

RLang is a Domain-Specific Language for communicating domain knowledge to an RL agent. Using RLang, you can specify information about policies, options, transition dynamics, and state factors and features. Using the RLang Python package, you can parse RLang programs into algorithm-agnostic Python objects. This page provides a quick tutorial on getting set up with RLang and writing an RLang program.
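For instance, once the package is installed (see below), parsing a program takes a single call. The following is a minimal sketch based on the full example later on this page; it assumes a gridworld.rlang file exists in the working directory:

import rlang

# Parse the RLang program into an algorithm-agnostic knowledge object,
# which can then be handed to an RLang-aware agent (see the full example below)
knowledge = rlang.parse_file("gridworld.rlang")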

[Figure: RLang agent-environment diagram, illustrating the RLang pipeline]

Installing RLang

RLang is not yet available on PyPI. The source repository is hosted on GitHub.

Check the README for the latest installation instructions, which may look like the following:

$ brew install swig    # macOS only
$ python -m pip install rlang/rlang/dist/rlang-0.2.1-py3-none-any.whl
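After installation, you can verify that the package imports cleanly (this command only imports the module and assumes nothing about its API):

$ python -c "import rlang"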

Full Example

In this example, we’ll look at how RLang can be used to provide domain knowledge about a gridworld environment in order to speed up the learning of an RL agent. This example is pulled directly from the RLang package:

examples/gridworld/
    main.py           # Python code for running the project
    gridworld.rlang   # RLang program containing world information
    vocab.json        # Holds metadata and can reference additional groundings

The project files are included below:

main.py
from simple_rl.run_experiments import run_agents_on_mdp
from simple_rl.tasks import GridWorldMDP
from simple_rl.agents import QLearningAgent

import context
import rlang
from rlang.agents import RLangQLearningAgent


def create_mdp():
    # MDP parameters
    width, height = 6, 6
    lava_locs = [(3, 2), (1, 4), (2, 4), (2, 5)]
    walls = [(3, 1)]
    goal_locs = [(5, 1)]

    mdp = GridWorldMDP(width, height, walls=walls, lava_locs=lava_locs, goal_locs=goal_locs, slip_prob=0.0,
                       step_cost=0)
    states = list()
    for w in range(mdp.width):
        for h in range(mdp.height):
            states.append((w, h))

    return mdp, states


def simple_experiment():
    mdp, states = create_mdp()
    agent = QLearningAgent(mdp.get_actions())
    run_agents_on_mdp([agent], mdp)


def rlang_experiment():
    # We need to know these MDP and Q-Learning parameters
    mdp, states = create_mdp()

    # Parse the RLang program into a knowledge object
    knowledge = rlang.parse_file("gridworld.rlang")

    # Create a baseline Q-Learning agent
    agent = QLearningAgent(mdp.get_actions())

    # Create an RLang Q-Learning agent that uses the parsed knowledge
    rlang_agent = RLangQLearningAgent(
        actions=mdp.get_actions(), states=states, knowledge=knowledge)

    # Compare the performance of both agents on the MDP
    run_agents_on_mdp([agent, rlang_agent], mdp)


if __name__ == '__main__':
    rlang_experiment()

gridworld.rlang
# To start, we import a vocabulary file that contains metadata on the MDP
# and references to additional RLang groundings.

import "vocab.json"

# The position of the agent as well as its x and y coordinates are defined
# as Factors, each representing a portion of the state space.

Factor position := S[0, 1]
Factor x := position[0]
Factor y := position[1]

Constant lava_locs := [[3, 2], [1, 4], [2, 4], [2, 5]]

Proposition reached_goal := x == 5 and y == 1
Proposition reached_wall := x == 3 and y == 1
Proposition in_lava := position in lava_locs

# The following Actions correspond to the four
# discrete actions the agent can take:
#   0 - move up
#   1 - move down
#   2 - move left
#   3 - move right

Action up := 0
Action down := 1
Action left := 2
Action right := 3

# The predicted consequence of each of these actions is
# specified using an Effect: the position of the agent in the next state
# and the next state itself will update accordingly.

Effect action_effect:
    if A == up:
        position' -> position + [0, 1]
        S' -> S + [0, 1]
    elif A == down:
        position' -> position + [0, -1]
        S' -> S + [0, -1]
    elif A == left:
        position' -> position + [-1, 0]
        S' -> S + [-1, 0]
    elif A == right:
        position' -> position + [1, 0]
        S' -> S + [1, 0]


# A reward of 1 is given for reaching the goal coordinates,
# and -1 for moving onto lava.
# The following Effect references the previously defined Propositions.

Effect main:
    if in_lava:
        Reward -1
    if reached_goal:
        Reward 1
    if reached_wall:
        S' -> S     # the state remains the same
    else:
        -> action_effect
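As a concrete worked example of these Effects: if the agent is at position (1, 1) and takes up (action 0), action_effect predicts position' = (1, 1) + (0, 1) = (1, 2). Under the main Effect, standing on one of the lava_locs, such as (3, 2), yields a reward of -1, reaching the goal at (5, 1) yields a reward of 1, and at the wall cell (3, 1) the state is predicted to stay the same.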

vocab.json
{
  "domain": "gridworld",
  "mdp_metadata": {
    "state_space": {
      "size": 2,
      "dtype": "int"
    },
    "action_space": {
      "shape": 1,
      "dtype": "int"
    }
  }
}
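With all three files in place, the comparison can be run from the example directory. A plain invocation such as the following should work, assuming the installation steps above have been completed:

$ python main.py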


For a full list of groundings, see the RLang Language Reference.