Getting Started

RLang is a Domain-Specific Language for communicating domain knowledge to an RL agent. Using RLang, you can specify information about policies, options, transition dynamics, and state factors and features. Using the RLang Python package, you can parse RLang programs into algorithm-agnostic Python objects. This page provides a quick tutorial on getting set up with RLang and writing an RLang program.
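For instance, once the package is installed (see below), parsing a program takes a single call. The following is a minimal sketch based on the full example later on this page; it assumes a gridworld.rlang file exists in the working directory:

import rlang

# Parse the RLang program into an algorithm-agnostic knowledge object,
# which can then be handed to an RLang-aware agent (see the full example below)
knowledge = rlang.parse_file("gridworld.rlang")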

[Figure: RLang agent-environment diagram, illustrating the RLang pipeline]

Installing RLang

RLang is not yet available on PyPI. The source repository is hosted on GitHub.

Check the README for the latest installation instructions, which may look like the following:

$ brew install swig    # macOS only
$ python -m pip install rlang/rlang/dist/rlang-0.2.1-py3-none-any.whl
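After installation, you can verify that the package imports cleanly (this command only imports the module and assumes nothing about its API):

$ python -c "import rlang"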

Full Example

In this example, we’ll look at how RLang can be used to provide domain knowledge about a gridworld environment in order to speed up the learning of an RL agent. This example is pulled directly from the RLang package:

examples/gridworld/
    main.py           # Python code for running the project
    gridworld.rlang   # RLang program containing world information
    vocab.json        # Holds metadata and can reference additional groundings

The project files are included below:

main.py
from simple_rl.run_experiments import run_agents_on_mdp
from simple_rl.tasks import GridWorldMDP
from simple_rl.agents import QLearningAgent

import context
import rlang
from rlang.agents import RLangQLearningAgent


def create_mdp():
    # MDP parameters
    width, height = 6, 6
    lava_locs = [(3, 2), (1, 4), (2, 4), (2, 5)]
    walls = [(3, 1)]
    goal_locs = [(5, 1)]

    mdp = GridWorldMDP(width, height, walls=walls, lava_locs=lava_locs, goal_locs=goal_locs, slip_prob=0.0,
                       step_cost=0)
    states = list()
    for w in range(mdp.width):
        for h in range(mdp.height):
            states.append((w, h))

    return mdp, states


def simple_experiment():
    mdp, states = create_mdp()
    agent = QLearningAgent(mdp.get_actions())
    run_agents_on_mdp([agent], mdp)


def rlang_experiment():
    # We need to know these MDP and Q-Learning parameters
    mdp, states = create_mdp()

    # Parse the RLang program into a knowledge object
    knowledge = rlang.parse_file("gridworld.rlang")

    # Create a baseline Q-Learning agent
    agent = QLearningAgent(mdp.get_actions())

    # Create an RLang Q-Learning agent that uses the parsed knowledge
    rlang_agent = RLangQLearningAgent(
        actions=mdp.get_actions(), states=states, knowledge=knowledge)

    # Compare the performance of both agents on the MDP
    run_agents_on_mdp([agent, rlang_agent], mdp)


if __name__ == '__main__':
    rlang_experiment()

gridworld.rlang
# To start, we import a vocabulary file that contains metadata on the MDP
# and references to additional RLang groundings.

import "vocab.json"

# The position of the agent as well as its x and y coordinates are defined
# as Factors, each representing a portion of the state space.

Factor position := S[0, 1]
Factor x := position[0]
Factor y := position[1]

Constant lava_locs := [[3, 2], [1, 4], [2, 4], [2, 5]]

Proposition reached_goal := x == 5 and y == 1
Proposition reached_wall := x == 3 and y == 1
Proposition in_lava := position in lava_locs

# The following Actions correspond to the four
# discrete actions the agent can take:
#   0 - move up
#   1 - move down
#   2 - move left
#   3 - move right

Action up := 0
Action down := 1
Action left := 2
Action right := 3

# The predicted consequence of each of these actions is
# specified using an Effect: the position of the agent in the next state
# and the next state itself will update accordingly.

Effect action_effect:
    if A == up:
        position' -> position + [0, 1]
        S' -> S + [0, 1]
    elif A == down:
        position' -> position + [0, -1]
        S' -> S + [0, -1]
    elif A == left:
        position' -> position + [-1, 0]
        S' -> S + [-1, 0]
    elif A == right:
        position' -> position + [1, 0]
        S' -> S + [1, 0]


# A reward of 1 is given for reaching the goal coordinates,
# and -1 for moving onto lava.
# The following Effect references the previously defined Propositions.

Effect main:
    if in_lava:
        Reward -1
    if reached_goal:
        Reward 1
    if reached_wall:
        S' -> S     # the state remains the same
    else:
        -> action_effect
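As a concrete worked example of these Effects: if the agent is at position (1, 1) and takes up (action 0), action_effect predicts position' = (1, 1) + (0, 1) = (1, 2). Under the main Effect, standing on one of the lava_locs, such as (3, 2), yields a reward of -1, reaching the goal at (5, 1) yields a reward of 1, and at the wall cell (3, 1) the state is predicted to stay the same.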

vocab.json
{
  "domain": "gridworld",
  "mdp_metadata": {
    "state_space": {
      "size": 2,
      "dtype": "int"
    },
    "action_space": {
      "shape": 1,
      "dtype": "int"
    }
  }
}
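With all three files in place, the comparison can be run from the example directory. A plain invocation such as the following should work, assuming the installation steps above have been completed:

$ python main.py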


For a full list of groundings, see the RLang Language Reference.