Working with RLang
There are many ways RLang can be integrated into a project. The simplest involves using a pre-existing RLang-enabled reinforcement learning agent that utilizes RLang information in a predefined manner. In this scenario, it is up to the user to provide a written RLang program, which can be parsed using parse(), or parse_file() if the program resides in a .rlang file.
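For example, a minimal sketch (assuming parse() accepts RLang source directly as a string, and that the gridworld program from the next section has been saved as gridworld.rlang):

import rlang

# Parse RLang source passed as a string...
knowledge = rlang.parse('Constant goal := [5, 1]')

# ...or parse a program stored in a .rlang file.
knowledge = rlang.parse_file("gridworld.rlang")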
It is also possible to write your own RLang-enabled agent that utilizes the information stored in an RLang program (accessible via the RLangKnowledge class). This route requires a more thorough understanding of RLang's Python groundings module.
Using a Pre-built RLang Agent
An RLang project using a pre-built agent might have the following directory structure:
gridworld/
    main.py           # Python code for running the project
    gridworld.rlang   # RLang program containing world information
    vocab.json        # An optional (but useful) file that holds metadata and can reference additional groundings
    groundings.py     # An optional file that can store RLang groundings defined in Python
where main.py could be as simple as:
from simple_rl.run_experiments import run_agents_on_mdp
from simple_rl.agents import QLearningAgent
import rlang
from rlang.agents import RLangQLearningAgent
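# create_mdp() is assumed to be defined elsewhere in the project; it
# should return the MDP and a list of its states.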
mdp, states = create_mdp()
agent = QLearningAgent(mdp.get_actions()) # Create a baseline Q-Learning agent
knowledge = rlang.parse_file("gridworld.rlang") # Parse RLang program into knowledge object
rlang_agent = RLangQLearningAgent(mdp.get_actions(), states, knowledge) # Create RLang Q-Learning agent
run_agents_on_mdp([agent, rlang_agent], mdp) # Compare performance of agents on mdp
and gridworld.rlang could look like this:
import "vocab.json"
Constant lava_locs := [[3, 2], [1, 4], [2, 4], [2, 5]]
Factor position := S[0, 1]
Factor x := position[0]
Factor y := position[1]
Proposition reached_goal := x == 5 and y == 1
Proposition reached_wall := x == 3 and y == 1
Proposition in_lava := position in lava_locs
Effect main:
    if in_lava:
        Reward -1
    if reached_goal:
        Reward 1
    if reached_wall:
        S' -> S
For help on how to write an RLang program, see RLang Language Reference.
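Once parsed, the Effect block named main supplies the knowledge object's predicted reward and transition information. A minimal sketch of querying it (the VectorState import path mirrors the relative import used by the built-in agents; the printed value is illustrative):

import rlang
from rlang.grounding.utils.primitives import VectorState

knowledge = rlang.parse_file("gridworld.rlang")

# The Effect compiles into predicted reward and transition functions
# that can be queried per state-action pair.
reward_function = knowledge.reward_function
if reward_function is not None:
    # A position in lava_locs should carry the predicted reward of -1.
    print(reward_function(state=VectorState([3, 2]), action=0))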
Using a Vocabulary File
While optional, vocabulary files enable powerful functionality. A minimal vocab.json file might contain metadata on the MDP being interfaced with, such as the size of the state and action spaces:
{
    "domain": "gridworld",
    "mdp_metadata": {
        "state_space": {
            "size": 2,
            "dtype": "int"
        },
        "action_space": {
            "shape": 1,
            "dtype": "int"
        }
    }
}
A more powerful vocabulary file can be used to reference additional RLang groundings declared inside an auxiliary Python file. The following vocabulary file includes two feature groundings (angle_target and hover_target) from an auxiliary module called grounding.py:
Example vocab.json with additional groundings:
{
    "domain": "lunarlander",
    "mdp_metadata": {
        "state_space": {
            "size": 8,
            "dtype": "float"
        },
        "action_space": {
            "shape": 1,
            "dtype": "int"
        }
    },
    "modules": [
        {
            "module_name": "grounding",
            "file_name": "examples/lunar_lander/grounding.py"
        }
    ],
    "vocabulary": {
        "features": [
            {
                "name": "angle_target",
                "grounding": "grounding.angle_target"
            },
            {
                "name": "hover_target",
                "grounding": "grounding.hover_target"
            }
        ]
    }
}
An accompanying grounding file, grounding.py:
from rlang.grounding import Feature


def _angle_target(state):
    position = state[0:2]
    velocity = state[2:4]
    angle_targ = position[0] * 0.5 + velocity[0] * 1.0  # angle should point towards center
    if angle_targ > 0.4:
        angle_targ = 0.4  # more than 0.4 radians (22 degrees) is bad
    if angle_targ < -0.4:
        angle_targ = -0.4
    return angle_targ


def _hover_target(state):
    position = state[0:2]
    hover_targ = 0.55 * abs(position[0])
    return hover_targ


angle_target = Feature(_angle_target)
hover_target = Feature(_hover_target)
angle_target and hover_target can now be referenced in an RLang program like any other grounding.
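For example, a hypothetical program for this domain might compare the lander's angle against angle_target (a sketch only: the Factor index assumes the angle is the fifth state variable, as in Gym's LunarLander, and the proposition name is illustrative):

import rlang

# The features defined in grounding.py are referenced exactly like
# natively declared groundings.
knowledge = rlang.parse("""
import "vocab.json"

Factor angle := S[4]
Proposition aligned := angle == angle_target
""")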
Note: It is possible to use more than one Python module to supply additional groundings.
Creating a Custom RLang Agent using RLangKnowledge
To get the most out of RLang, users should implement their own RLang-enabled reinforcement learning agents. Doing so requires becoming familiar with RLang's groundings module and, most importantly, the RLangKnowledge object, which holds all RLang groundings parsed from an RLang program.
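For instance, a custom agent might compute the expected one-step reward under the program's predicted model. A sketch, assuming (as the agent below does) that the transition function returns a dict mapping predicted next states to their probabilities, and that either function may be absent if the program does not define it (expected_reward is a hypothetical helper):

from rlang.grounding.utils.primitives import VectorState

def expected_reward(knowledge, state, action):
    """Hypothetical helper: expected one-step reward under the RLang model."""
    transition_function = knowledge.transition_function
    reward_function = knowledge.reward_function
    if not (transition_function and reward_function):
        return None  # the program supplies no usable model for this query
    next_states = transition_function(state=VectorState(state), action=action)
    if not next_states:
        return None
    # next_states maps predicted next states to their probabilities.
    return sum(reward_function(state=VectorState(s)) * p
               for s, p in next_states.items())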
RLangQLearningAgent is a good example of how to integrate RLang knowledge into an RL agent:
RLangQLearningAgentClass.py
from collections import defaultdict

from simple_rl.agents import QLearningAgent

from ..grounding.utils.primitives import VectorState


class RLangQLearningAgent(QLearningAgent):
    """Implementation for a Q Learning agent that utilizes RLang hints"""

    def __init__(self, actions, states, knowledge, name="RLang-Q-learning", use_transition=False, alpha=0.1, gamma=0.99,
                 epsilon=0.1, explore="uniform", anneal=False, default_q=0):
        """
        Args:
            actions (list): Contains strings denoting the actions.
            states (list): A list of all possible states.
            knowledge (RLangKnowledge): An RLangKnowledge object.
            name (str): Denotes the name of the agent.
            use_transition (bool): Whether to use the RLang transition function to initialize the q-table.
            alpha (float): Learning rate.
            gamma (float): Discount factor.
            epsilon (float): Exploration term.
            explore (str): One of {softmax, uniform}. Denotes explore policy.
            default_q (float): The default value to initialize every entry in the q-table with [by default, set to 0.0].
        """

        def weighted_reward(r_func, state_dict):
            reward = 0
            for k, v in state_dict.items():
                reward += r_func(state=VectorState(k)) * v
            return reward

        def weighted_value(q_func, state_dict):
            reward = 0
            for k, v in state_dict.items():
                maxx = q_func[k][actions[0]]
                for a in actions:
                    val = q_func[k][a]
                    if val > maxx:
                        maxx = val
                reward += maxx * v
            return reward

        q_func = defaultdict(lambda: defaultdict(lambda: default_q))
        reward_function = knowledge.reward_function

        if reward_function:
            for s in states:
                for i in range(len(actions)):
                    a = actions[i]
                    q_func[s][a] = reward_function(state=VectorState(s), action=i)

        transition_function = knowledge.transition_function

        if use_transition and transition_function and reward_function:
            for s in states:
                for i in range(len(actions)):
                    a = actions[i]
                    s_primei = transition_function(state=VectorState(s), action=i)
                    if s_primei:
                        # Q-learning update
                        r_prime = weighted_reward(reward_function, s_primei)
                        v_s_prime = weighted_value(q_func, s_primei)
                        q_func[s][a] += alpha * (r_prime + gamma * v_s_prime)

        super().__init__(actions, name=name, alpha=alpha, gamma=gamma,
                         epsilon=epsilon, explore=explore, anneal=anneal, custom_q_init=q_func, default_q=default_q)
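In short, the agent warm-starts its Q-table: each entry is initialized with the RLang-predicted reward for that state-action pair, and, when use_transition is set, a one-step backup through the predicted transition model is applied before standard Q-learning begins.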