RLang Language Reference
This page covers the core syntax and semantics of the RLang language.
Structure of an RLang Program
An RLang program has the following structure:
program ::= import* declaration*
where each import statement imports a local vocabulary file (e.g. import "vocab.json") and each declaration is the instantiation of an RLang grounding:
declaration ::= constant NL+ | action NL+ | factor NL+ | proposition NL+ | goal NL+ | feature NL+ | markov_feature NL+ | object_def NL+ | class_def | option | policy | effect
RLang Grounding | Domain | Codomain
---|---|---
Constant | \(\emptyset\) | \(\mathbb{R}^n\), list of \(\mathbb{R}^n\)
Action | \(\emptyset\) | \(\mathcal{A}\)
Factor | \(\mathcal{S}\) | \(\mathbb{R}^n\)
Proposition | \(\mathcal{S}\) | \(\{\top, \bot\}\)
Goal | \(\mathcal{S}\) | \(\{\top, \bot\}\)
Feature | \(\mathcal{S}\) | \(\mathbb{R}^n\)
MarkovFeature | \(\mathcal{S}\times\mathcal{A}\times\mathcal{S}\) | \(\mathbb{R}^n\)
Object | \(\mathcal{S}\) | \(O\)
Policy | \(\mathcal{S}\) | \(\mathcal{A}\)
Option | \(\mathcal{S}\) | \(\mathcal{A}\)
Effect | \(\mathcal{S}\times\mathcal{A}\times\mathcal{S}\) | \(\{\mathcal{S}, \top, \bot, \mathbb{R}^n, R\}\) *

* \(\top, \bot, \mathbb{R}^n\) refer to the potential value of an RLang grounding on the next state; \(R\) refers to a reward.
Syntax of RLang Groundings
Every RLang grounding is a function with a domain of \(\emptyset\), \(\mathcal{S}\), or \(\mathcal{S}\times\mathcal{A}\times\mathcal{S}\) and a codomain of \(\mathcal{S}\), \(\mathcal{A}\), \(\mathbb{R}^n\) for some \(n\in \mathbb{N}\), or \(\{\top, \bot\}\), depending on the grounding's type. Each grounding declared in an RLang program grounds to one or more Python RLang objects, which live in the groundings module and are accessible to the user after parsing via the RLangKnowledge class.
Note
Every RLang grounding declared in an RLang program is static: groundings cannot be re-bound.
Constants
constant ::= "Constant" IDENTIFIER ":=" (arithmetic_exp | boolean_exp)
Constants can be defined and used later in other RLang groundings.
Constant lava_positions := [[0, 1], [5, 2]]
Constant step_cost := -0.1
Constants ground to ConstantGrounding.
Actions
action ::= "Action" IDENTIFIER ":=" (any_number | any_num_array_exp)
Actions can be defined for reference in Policies and Options.
Action up := [0, 1]
Actions ground to ActionReference.
Factors
factor ::= "Factor" IDENTIFIER ":=" any_bound_var
Factors are used to reference independent state variables.
They represent portions of the state space and can be defined using Python's slicing syntax [start?:end?] on the current state variable S:
Factor x_position := S[0]
Factor y_position := S[1]
Factor inventory := S[2:]
Factors ground to Factor.
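The slicing semantics can be sketched in plain Python. This is an illustrative sketch of the semantics only, not the rlang package implementation, and the example state layout is a hypothetical assumption:

```python
# Illustrative sketch of Factor semantics: a Factor selects a fixed
# index or slice of the current state vector S.

def make_factor(index_or_slice):
    """Return a function S -> value that applies a fixed index or slice."""
    def factor(S):
        return S[index_or_slice]
    return factor

x_position = make_factor(0)              # Factor x_position := S[0]
y_position = make_factor(1)              # Factor y_position := S[1]
inventory = make_factor(slice(2, None))  # Factor inventory := S[2:]

S = [3, 5, 1, 0, 2]   # hypothetical state vector
print(x_position(S))  # 3
print(inventory(S))   # [1, 0, 2]
```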
Features
feature ::= "Feature" IDENTIFIER ":=" arithmetic_exp
Features are used to define more complex functions of state. They can be defined using arithmetic operations (+, -, *, /), numeric literals, and function composition.
Feature distance_to_gold := abs([0,4] - position)
Features ground to Feature.
Propositions
proposition ::= "Proposition" IDENTIFIER ":=" boolean_exp
Propositions are functions of the form \(\mathcal{S} \rightarrow \{\top, \bot\}\) that produce a boolean value.
They can be defined using logical operators (and, or, not) and the order relations of the real numbers (<, <=, >, >=, ==, !=):
Proposition at_workbench := position in workbench_locations
Proposition have_bridge_material := iron >= 1 and wood >= 1
Propositions ground to Proposition.
Goals
goal ::= "Goal" IDENTIFIER ":=" boolean_exp
Goals are used to specify goal states given by a proposition.
Goal get_gold := gold >= 1
Goals ground to Goal.
Markov Features
markov_feature ::= "MarkovFeature" IDENTIFIER ":=" arithmetic_exp
Markov Features allow users to compute features on an (\(s,a,s'\)) experience tuple, which can then be used to define partial specifications of functions related to the task, such as action-value functions and transition functions.
The prime operator (') can be used to reference the value of an RLang grounding on the next state.
MarkovFeature inventory_change := inventory' - inventory
MarkovFeatures ground to MarkovFeature.
Objects
object_def ::= "Object" IDENTIFIER ":=" object_instantiation
object_instantiation ::= any_bound_class "(" object_constructor_arg_list ")"
Users can instantiate abstract objects whose properties are functions of an (\(s,a,s'\)) experience tuple. Object classes can come from a grounding file or be defined within an RLang file using Classes. Object properties can be referenced within RLang using dot syntax.
Class Color:
red: int
green: int
blue: int
Object color_from_state := Color(S[0], S[1], S[2])
Proposition is_red := color_from_state.red == 256
Objects ground to MDPObjectGrounding.
Classes
class_def ::= "Class" IDENTIFIER ("(" any_bound_class ")")? ":" INDENT attrs DEDENT
attrs ::= (definitions+=attribute_definition NL*)+
attribute_definition ::= IDENTIFIER ":" type_def
simple_type ::= INT | FLOAT | STR | BOOL | any_bound_class
Users can define classes for abstract objects. A class definition specifies attributes and their types.
Class Color:
red: int
green: int
blue: int
Object red := Color(256, 0, 0)
You can also inherit classes defined in RLang or even from a grounding file:
Class ColorAlpha(Color):
alpha: int
Object semi_red := ColorAlpha(256, 0, 0, 128)
Important
Object attributes are only very loosely typed.
Important
Strings are not yet supported.
Classes ground to subclasses of MDPObject.
Policies
policy ::= "Policy" IDENTIFIER ":" INDENT policy_statement NL* DEDENT
Policies prompt the agent to perform an action or sub-policy in a given situation.
The keyword Execute is used to perform an action or call another policy. Policies can be specified in RLang using conditional expressions with the keywords if, elif, and else.
The following policy instructs the agent to craft iron tools at a workbench by first collecting iron and then navigating to the workbench.
Policy main:
if iron >= 2:
if at_workbench:
Execute Use # Use is an action
else:
Execute go_to_workbench # go_to_workbench is a policy
else:
Execute collect_iron
Note
Naming a policy main marks it as the main policy, which can be accessed from an RLangKnowledge object via knowledge.policy. There can only be one main policy.
Policies can be made probabilistic using with P(float):
Policy random_move:
with P(0.5):
Execute up
or with P(0.5):
Execute down
Policy random_move_syntax_sugar:
Execute up with P(0.5)
or Execute down with P(0.5)
Policies ground to Policy.
Options
option ::= "Option" IDENTIFIER ":" INDENT "init" option_condition INDENT policy_statement NL* DEDENT "until" option_condition NL* DEDENT
Temporally-extended abstract actions can be specified using Options, which include initiation and termination propositions. Initiation propositions are denoted using the keyword init, and termination propositions are denoted using until:
Option build_bridge:
init have_bridge_material and at_workbench
Execute craft_bridge
until bridge in inventory
Note
Any can also be specified in place of both the init and until propositions, and functions the same as True.
Options ground to Option.
Action Restrictions
Action Restrictions are used to specify constraints on the set of possible actions an agent can take in a given circumstance.
The keyword Restrict removes an action from consideration in the given situation, meaning that the action will have probability zero even after learning.
ActionRestriction dont_get_burned:
if (position + [0, 1]) in lava_locations:
Restrict up
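The restriction semantics can be sketched as a filter over the action set. This is a hypothetical Python sketch; the position layout and action names are assumptions for illustration:

```python
# Sketch of ActionRestriction semantics: in states where the
# condition holds, the restricted action is removed from the set of
# available actions. Hypothetical position factor at S[0:2].

def dont_get_burned(S, actions, lava_locations):
    position = (S[0], S[1])
    above = (position[0], position[1] + 1)  # position + [0, 1]
    if above in lava_locations:
        return [a for a in actions if a != "up"]
    return actions

actions = ["up", "down", "left", "right"]
print(dont_get_burned([0, 0], actions, {(0, 1)}))  # ['down', 'left', 'right']
print(dont_get_burned([4, 0], actions, {(0, 1)}))  # unrestricted
```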
Effects
effect ::= "Effect" IDENTIFIER ":" INDENT effect_statement* DEDENT
effect_statement ::= reward | prediction | effect_reference
Effects provide an interface for specifying partial information about the transition and reward functions, allowing users to denote the consequences of an action when performed in a given state.
The following effect captures the predicted consequence of moving left on the x_position factor, stating that the x_position of the agent in the next state will be less than in the current state. This Effect also specifies a -0.1 step penalty regardless of the current state or action.
Effect movement_effect:
if x_position >= 1 and A == left:
x_position' -> x_position - 1
Reward -0.1
When using a factored MDP, Effects can also be used to specify factored transition functions, i.e. transition functions for individual factors, which we call predictions. A prediction can also be made about the full transition function:
Effect tic_tac_toe:
if three_in_a_row:
S' -> empty_board # Board is reset
Effects ground to Effect, which holds a TransitionFunction, a RewardFunction, and a list of Prediction objects.
Expressions and Keywords
RLang provides support for the following expression types.
Important
Restrictions on the kinds of bound variables (i.e. S or health) usable in the following expressions depend on the domains of the groundings they are used in. E.g. Factors can't contain A or S' because they have domain \(\mathcal{S}\).
Arithmetic Expressions
Arithmetic expressions are the most common expression used in defining RLang groundings.
arithmetic_exp ::= L_PAR arithmetic_exp R_PAR | arithmetic_exp (TIMES | DIVIDE) arithmetic_exp | arithmetic_exp (PLUS | MINUS) arithmetic_exp | any_number | any_array | any_bound_var
The following arithmetic expression could appear inside a Feature:
(2 * health) - 1 + S[0]
Boolean Expressions
Boolean expressions are also commonly used in Propositions, Goals, Effects, Options, and Policies.
boolean_exp ::= L_PAR boolean_exp R_PAR | boolean_exp AND boolean_exp | boolean_exp OR boolean_exp | NOT boolean_exp | arithmetic_exp IN arithmetic_exp | boolean_exp (EQ_TO | NOT_EQ) boolean_exp | arithmetic_exp (EQ_TO | LT | GT | LT_EQ | GT_EQ | NOT_EQ) arithmetic_exp | any_bound_var | (TRUE | FALSE)
Examples of boolean expressions:
True
True and not (at_workbench)
health * 2 == 6
Conditional Expressions
The statements usable in a conditional expression differ between Policies and Effects.
conditional_exp ::= IF boolean_exp COL INDENT statement NL* DEDENT (ELIF boolean_exp COL INDENT statement NL* DEDENT)* (ELSE COL INDENT statement NL* DEDENT)?;
Some examples of conditional expressions:
if S[0] == 1 and y_pos == 0:
Execute stay
elif y_pos < 0:
Execute up
else:
Execute down
Probabilistic Expressions
Probabilistic expressions can be used inside Policies, Options, and Effects.
prob_statement ::= prob_condition ":" INDENT statement NL* DEDENT | statement prob_condition NL+
prob_condition ::= "with P(" (any_number | integer_fraction) ")"
Some examples:
Effect probabilistic_reward:
with P(0.2):
Reward 10
or Reward 1 with P(0.8)
Policy up_or_down:
Execute up with P(1/2)
or Execute down with P(1/2)
Special Variables
S, A, and S' are reserved keywords referring to the current state, the current action, and the next state, respectively.
Depending on the type of an RLang grounding, one or more of these keywords can be referenced in its definition.
S # Current state - Used in Factors and Features
A # Current action - Used in Effects
S' # Next state - Used most often in MarkovFeatures