Published as a conference paper at ICLR 2024

LEARNING ADAPTIVE PLANNING REPRESENTATIONS WITH NATURAL LANGUAGE GUIDANCE

Lionel Wong1 Jiayuan Mao1* Pratyusha Sharma1* Zachary S. Siegel2 Jiahai Feng3 Noa Korneev4 Joshua B. Tenenbaum1 Jacob Andreas1
1MIT 2Princeton University 3UC Berkeley 4Microsoft

ABSTRACT

Effective planning in the real world requires not only world knowledge, but the ability to leverage that knowledge to build the right representation of the task at hand. Decades of hierarchical planning techniques have used domain-specific temporal action abstractions to support efficient and accurate planning, almost always relying on human priors and domain knowledge to decompose hard tasks into smaller subproblems appropriate for a goal or set of goals. This paper describes Ada (Action Domain Acquisition), a framework for automatically constructing task-specific planning representations using task-general background knowledge from language models (LMs). Starting with a general-purpose hierarchical planner and a low-level goal-conditioned policy, Ada interactively learns a library of planner-compatible high-level action abstractions and low-level controllers adapted to a particular domain of planning tasks. On two language-guided interactive planning benchmarks (Mini Minecraft and ALFRED Household Tasks), Ada strongly outperforms other approaches that use LMs for sequential decision-making, offering more accurate plans and better generalization to complex tasks.

1 INTRODUCTION

People make complex plans over long timescales, flexibly adapting what we know about the world in general to govern how we act in specific situations. To make breakfast in the morning, we might convert a broad knowledge of cooking and kitchens into tens of fine-grained motor actions in order to find, crack, and fry a specific egg; to achieve a complex research objective, we might plan a routine over days or weeks that begins with the low-level actions necessary to ride the subway to work. The problem of adapting general world knowledge to support flexible long-term planning is one of the unifying challenges of AI. While decades of research have developed representations and algorithms for solving restricted and shorter-term planning problems, generalized and long-horizon planning remains a core, outstanding challenge for essentially all AI paradigms, including classical planning (Erol et al., 1994), reinforcement learning (Sutton et al., 1999), and modern generative AI (Wang et al., 2023a).

How do humans solve this computational challenge? A growing body of work in cognitive science suggests that people construct hierarchical, problem-specific representations of their actions and environment to suit their goals, tailoring how they represent, remember, and reason about the world to plan efficiently for a particular set of tasks (e.g., Ho et al., 2022). In AI, a large body of work has studied hierarchical planning using domain-specific temporal abstractions that progressively decompose high-level goals into sequences of abstract actions, which eventually bottom out in low-level control. An extensive body of work has explored how to plan using these hierarchical action spaces, including robotic task-and-motion planning (TAMP) systems (Garrett et al., 2021) and hierarchical RL frameworks (Sutton et al., 1999). However, identifying a set of abstract actions that are relevant and useful for achieving any given set of goals remains the central bottleneck in general.
Intuitively, useful high-level actions must satisfy many different criteria: they should enable time-efficient high-level planning, correspond to feasible low-level action sequences, and compose and generalize to new tasks. Despite efforts to learn high-level actions automatically in both classical planning (Nejati et al., 2006) and RL formulations (Dietterich, 2000), most state-of-the-art robotics and planning systems rely on human expertise to hand-engineer new planning representations for each new domain (Ahn et al., 2022).

Asterisk indicates equal contribution. Correspondence to zyzzyva@mit.edu. Code for this paper will be released at: https://github.com/CatherineWong/llm-operators

Figure 1: We solve complex planning tasks specified in language and grounded in interactive environments by jointly learning a library of symbolic high-level action abstractions and modular low-level controllers associated with each abstraction. Our system leverages background information in language as a prior to propose useful action abstractions, then uses a hierarchical planning framework to verify and ground them. [Panels: (a) grounded planning tasks in Mini Minecraft and ALFRED Household (e.g., "Bring a hot egg to the table", "Place chilled wine in the cabinet", "Craft a bed"); (b) learning a library of grounded actions with a hierarchical planning framework, via (i) proposing symbolic action abstractions such as chill-object-1, (ii) grounding them with bi-level planning over a low-level policy search, and (iii) retaining a verified grounded action library.]

In this paper, we introduce Action Domain Acquisition (Ada), a framework for using background knowledge from language (conveyed via language models) as an initial source of task-relevant domain knowledge. Ada uses language models (LMs) in an interactive planning loop to assemble a library of composable, hierarchical actions tailored to a given environment and task space. Each action consists of two components: (1) a high-level abstraction represented as a symbolic planning operator (Fikes & Nilsson, 1971) that specifies preconditions and action effects as sets of predicates; and (2) a low-level controller that can achieve the action's effects by predicting a sequence of low-level actions with a neural network or local search procedure. We study planning in a multitask reinforcement learning framework, in which agents interact with their environments to solve collections of tasks of varying complexity. Through interaction, Ada incrementally builds a library of actions, ensuring at each step that learned high-level actions compose to produce valid abstract plans and realizable low-level trajectories.

We evaluate Ada (Fig. 1) on two benchmarks, Mini Minecraft and ALFRED (Shridhar et al., 2020). We compare this approach against three baselines that leverage LMs for sequential decision-making in other ways: to parse linguistic goals into formal specifications that are solved directly by a planner (as in Liu et al. (2023)),
to directly predict sequences of high-level subgoals (as in Ahn et al. (2022)), and to predict libraries of actions defined in general imperative code (as in Wang et al. (2023a)). In both domains, we show that Ada learns action abstractions that allow it to solve dramatically more tasks on each benchmark than these baselines, and that these abstractions compose to enable efficient and accurate planning in complex, unseen tasks.

2 PROBLEM FORMULATION

We assume access to an environment ⟨X, U, T⟩, where X is the (raw) state space, U is the (low-level) action space (e.g., robot commands), and T is a deterministic transition function T : X × U → X. We also have a set of features (or "predicates") P that define an abstract state space S: each abstract state s ∈ S is composed of a set of objects and their features. For example, a simple scene that contains bread on a table could be encoded as an abstract state with two objects A and B, and atoms {bread(A), table(B), on(A, B)}. We assume the mapping from environmental states to abstract states Φ : X → S is given and fixed (though see Migimatsu & Bohg, 2022 for how it might be learned).

In addition to the environment, we have a collection of tasks t. Each t is described by a natural language instruction ℓt, corresponding to a goal predicate gt (which is not directly observed). In this paper, we assume that goal predicates may be defined in terms of abstract states, i.e., gt : S → {T, F}. Our goal is to build an agent that, given the initial state x0 ∈ X and the natural language instruction ℓt, can generate a sequence of low-level actions {u1, u2, ..., uH} ∈ U^H such that gt(Φ(xH)) is true (where xH is the terminal state obtained by sequentially applying {ui} to x0). The agent receives a reward signal only upon achieving the goal specified by gt.

Given a very large number of interactions, a sufficiently expressive reflex policy could, in principle, learn a mapping from low-level states to low-level actions conditioned on the language instruction, π(u | x; ℓt). However, for very long horizons H and large state spaces (e.g., composed of many objects and compositional goals), such algorithms can be highly inefficient or effectively infeasible. The key idea behind our approach is to use natural language descriptions ℓt to bootstrap a high-level action space A over the abstract state space S to accelerate learning and planning.

Figure 2: Representation for our (a) task input (initial state x0 and instruction ℓt, e.g., "Bring a hot egg"), (b) the bi-level planning and execution pipeline used at inference time (abstract state and abstract goal, high-level planner, abstract plan {ai}, and low-level plan {ui} produced by controllers π(u | x; a) : X × A → U), and (c) the abstract state and action representation (predicates P such as receptacleType, objectType, and isHot, and operators A such as HeatObject).

Figure 3: The overall framework.
Given task environment states and descriptions, at each iteration we first propose candidate abstract actions (operators) A′i, then use bi-level planning and execution to solve tasks. We add operators to the operator library based on the execution results.

Formally, our approach learns a library of high-level actions (operators) A. As illustrated in Fig. 2b, each a ∈ A is a tuple ⟨name, args, pre, eff, controller⟩. name is the name of the action; args is a list of variables, usually denoted by ?x, ?y, etc.; pre is a precondition formula defined over the variables args and the features P; and eff is the effect, which is also defined in terms of args and P. Finally, controller : X → U is a low-level policy associated with the action. The semantics of the preconditions and effects is: for any state x such that pre(Φ(x)) holds, executing controller starting in x (for an indefinite number of steps) will yield a state x′ such that eff(Φ(x′)) holds (Lifschitz, 1986). In this framework, A defines a partial, abstract world model of the underlying state space. As shown in Fig. 2b, given the set of high-level actions and a parse of the instruction ℓt into a first-order logic formula, we can leverage symbolic planners (e.g., Helmert, 2006) to first compute a high-level plan {a1, ..., aK} ∈ A^K that achieves the goal ℓt symbolically, and then refine the high-level plan into a low-level plan with the action controllers. This bi-level planning approach decomposes long-horizon planning problems into several short-horizon problems. Furthermore, it can also leverage the compositionality of high-level actions A to generalize to longer plans.

3 ACTION ABSTRACTIONS FROM LANGUAGE

As illustrated in Fig. 3, our framework, Action Domain Acquisition (Ada), learns action abstractions iteratively as it attempts to solve tasks. Our algorithm is given a dataset of tasks and their corresponding language descriptions, the feature set P, and optionally an initial set of high-level action operators A0. At each iteration i, we first use a large language model (LLM) to propose a set of novel high-level action definitions A′i based on the features P and the language goals {ℓt} (Section 3.1). Next, we use an LLM to also translate each language instruction ℓt into a symbolic goal description Ft, and use a bi-level planner to compute a low-level plan to accomplish ℓt (Section 3.2). Then, based on the planning and execution results, we score each operator in A′i and add to the verified library the ones that have yielded successful execution results (Section 3.4). To accelerate low-level planning, we simultaneously learn local subgoal-conditioned policies (i.e., the controllers for each operator; Section 3.3). Algorithm 1 summarizes the overall framework.

A core goal of our approach is to adapt the initial action abstractions proposed from an LLM prior into a set of useful operators A that permit efficient and accurate planning on a dataset of tasks and, ideally, that generalize to future tasks. While language provides a key initial prior, our formulation refines and verifies the operator library to adapt it to a given planning procedure and environment (similar to other action-learning formulations like Silver et al., 2021). This ensures not only that the learned operators respect the dynamics of the environment, but also that their grain of abstraction fits the capacity of the controller, trading off between fast high-level planning and efficient low-level control conditioned on each abstraction.
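The operator tuple above maps naturally onto a small data structure. Below is a minimal Python sketch of one possible representation; the class, field, and helper names are ours for illustration and do not come from the released implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Tuple

# Illustrative aliases: raw environment states and low-level actions are left abstract here.
RawState = dict
LowLevelAction = str

# A lifted or ground atom, e.g. ("isHot", ("?o",)) or ("isHot", ("egg-1",)).
Atom = Tuple[str, Tuple[str, ...]]


@dataclass(frozen=True)
class Operator:
    """One learned high-level action: a PDDL-style symbolic abstraction plus the
    low-level controller that is expected to achieve its effects."""
    name: str                                   # e.g. "heat-object"
    args: Tuple[str, ...]                       # typed variables, e.g. ("?l", "?r", "?o")
    pre: FrozenSet[Atom]                        # conjunction over predicates in P and args
    eff: FrozenSet[Atom]                        # conjunction over predicates in P and args
    controller: Callable[[RawState, FrozenSet[Atom]], LowLevelAction]

    def ground(self, binding: Dict[str, str]) -> Tuple[FrozenSet[Atom], FrozenSet[Atom]]:
        """Substitute concrete objects for variables; the grounded effect is the
        subgoal handed to low-level search (Section 3.2)."""
        def substitute(atoms: FrozenSet[Atom]) -> FrozenSet[Atom]:
            return frozenset(
                (pred, tuple(binding.get(v, v) for v in objs)) for pred, objs in atoms
            )
        return substitute(self.pre), substitute(self.eff)
```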
Algorithm 1: Action Abstraction Learning from Language
Input: Dataset of tasks and their language descriptions {ℓt}
Input: Predicate set P
Input: Optionally, an initial set of abstract operators A0 (or A0 = ∅)
1: Initialize subgoal-conditioned policy πθ.
2: for i = 1, 2, ..., M do
3:   A′i ← Ai−1 ∪ ProposeOperatorDefinitions(P, {ℓt})   // Section 3.1
4:   for each unsolved task j: (x0^(j), ℓt^(j)) do
5:     u ← BiLevelPlan(A′i, ℓt^(j), π)                   // Section 3.2
6:     result(j) ← Execute(x0^(j), u)                    // Execute the plan
7:   θ ← UpdateSubgoalPolicy(θ, result)                  // Section 3.3
8:   Ai ← ScoreAndFilter(A′i, result)                    // Section 3.4
9: return AM

[Figure 4 shows the two-stage pipeline: (a) Stage 1 prompts the LLM with example language-to-plan decompositions (e.g., ";; Bake a bread and bring it to the table. (pick-up bread) (place bread oven) (bake bread) ...") and the target instruction "Bring a hot egg to the table" with its abstract state (on(egg, table), cold(egg), open(microwave), ...), yielding a decomposition that includes the undefined step (heat-object kitchen microwave egg); (b) Stage 2 extracts the undefined name heat-object and prompts the LLM with example operator definitions to produce:

(:action heat-object
 :parameters (?l ?r ?o)
 :precondition (and (receptacleType ?r MicrowaveType)
                    (atLocation ?l)
                    (receptacleAtLocation ?r ?l)
                    (holds ?o))
 :effect (and (isHot ?o))) ]

Figure 4: Our two-stage prompting method for generating candidate operator definitions. (a) Given a task instruction, we first prompt an LLM to generate a candidate symbolic task decomposition. (b) We then extract undefined operator names that appear in the sequences and prompt an LLM to generate symbolic definitions.

3.1 OPERATOR PROPOSAL: A′i ← Ai−1 ∪ ProposeOperatorDefinitions(P, {ℓt})

At each iteration i, we use a pretrained LLM to extend the previous operator library Ai−1 with a large set of candidate operator definitions proposed by the LLM based on the task language descriptions and the environment features P. This yields an extended candidate library A′i, where each a ∈ A′i is a tuple ⟨name, args, pre, eff⟩: name is a human-readable action name and args, pre, eff form a PDDL operator definition. We employ a two-stage prompting strategy: symbolic task decomposition followed by symbolic operator definition.

Example. Fig. 4 shows a concrete example. Given a task instruction (Bring a hot egg to the table) and the abstract state description, we first prompt the LLM to generate an abstract task decomposition, which may contain operator names that are undefined in the current operator library. Next, we extract the names of those undefined operators and prompt the LLM to generate the actual symbolic operator definitions, in this case the new heat-object operator.

Symbolic task decomposition. For a given task ℓt and an initial state x0, we first translate the raw state x0 into a symbolic description Φ(x0). To constrain the length of the state description, we only include unary features in the abstract state (i.e., only object categories and properties). We then present a few-shot prompt to the LLM and query it to generate a proposed task decomposition conditioned on the language description ℓt.
It generates a sequence of named high-level actions and their arguments, which can explicitly include high-level actions that are not yet defined in the current action library. We then extract all the operator names proposed across tasks as the candidate high-level operators. Note that while in principle we might use the LLM-proposed task decomposition itself as a high-level plan, we find empirically that this is less accurate and efficient than a formal planner.

Symbolic operator definition. With the proposed operator names and their usage examples (i.e., the actions and their arguments in the proposed plans), we then few-shot prompt the LLM to generate candidate operator definitions in the PDDL format (argument types, and pre/postconditions defined over features in P). We also post-process the generated operator definitions to remove feature names not present in P and to correct syntactic errors. We describe implementation details for our syntax correction strategy in the appendix.

3.2 GOAL PROPOSAL AND PLANNING: result(j) ← Execute(x0^(j), BiLevelPlan(A′i, ℓt^(j), π))

At each iteration i, we then attempt to BiLevelPlan for unsolved tasks in the dataset. This step attempts to find and execute a low-level action sequence {u1, u2, ..., uH} ∈ U^H for each task, using the proposed operators in A′i, that satisfies the unknown goal predicate gt for that task. This provides the environment reward signal for action learning. Our BiLevelPlan has three steps.

Symbolic goal proposal: As defined in Sec. 2, each task is associated with a queryable but unknown goal predicate gt that can be represented as a first-order logic formula ft over symbolic features in P. Our agent only has access to a linguistic task description ℓt, so we use a few-shot prompted LLM to predict candidate goal formulas F′t conditioned on ℓt and the features P.

High-level planning: Given each candidate goal formula f′t ∈ F′t, the initial abstract problem state s0, and the current candidate operator library A′, we search for a high-level plan PA = {(a1, ō1), ..., (aK, ōK)} as a sequence of high-level actions from A′ concretized with object arguments ō, such that executing the action sequence would satisfy f′t according to the operator definitions. This is a standard symbolic PDDL planning formulation; we use an off-the-shelf symbolic planner, Fast Downward (Helmert, 2006), to find high-level plans.

Low-level planning and environment feedback: We then search for a low-level plan as a sequence of low-level actions {u1, u2, ..., uH} ∈ U^H, conditioned on the high-level plan structure. Each concretized action tuple (ai, ōi) ∈ PA defines a local subgoal sgi, namely the operator postcondition parameterized by the object arguments ōi. For each (ai, ōi) ∈ PA, we therefore search for a sequence of low-level actions ui1, ui2, ... that satisfies the local subgoal sgi. We search with a fixed budget per subgoal, and fail early if we are unable to satisfy the local subgoal sgi. If we successfully find a complete sequence of low-level actions satisfying all local subgoals sgi in PA, we execute all low-level actions and query the hidden goal predicate gt to determine the environment reward. We implement a basic learning procedure to simultaneously learn subgoal-conditioned controllers over time (described in Section 3.3), but our formulation is general and supports many hierarchical planning schemes (such as sampling-based low-level planners (LaValle, 1998) or RL algorithms).
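The three steps of BiLevelPlan can be summarized in a short sketch. The callables passed in below are stand-ins for the components described above (the LLM goal translator, the off-the-shelf symbolic planner, and the budgeted per-subgoal search); their names and signatures are illustrative, not the released interface.

```python
def bi_level_plan(operators, instruction, abstract_state, policy, budget,
                  propose_goals, symbolic_plan, low_level_search):
    """Sketch of BiLevelPlan (Section 3.2).

    propose_goals(instruction)            -> candidate FOL goal formulas (LLM)
    symbolic_plan(ops, state, goal)       -> [(operator, binding), ...] or None (Fast Downward)
    low_level_search(subgoal, policy, n)  -> [low-level actions] or None (budgeted search)
    """
    for goal_formula in propose_goals(instruction):
        high_level_plan = symbolic_plan(operators, abstract_state, goal_formula)
        if high_level_plan is None:
            continue                                   # try the next candidate goal
        low_level_actions = []
        for operator, binding in high_level_plan:
            _, subgoal = operator.ground(binding)      # operator effects define the local subgoal
            segment = low_level_search(subgoal, policy, budget)
            if segment is None:
                break                                  # fail early: subgoal unreachable in budget
            low_level_actions.extend(segment)
        else:
            # All subgoals satisfied: return the plan for execution and goal verification.
            return goal_formula, high_level_plan, low_level_actions
    return None
```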
3.3 LOW-LEVEL LEARNING AND GUIDED SEARCH: θ ← UpdateSubgoalPolicy(θ, result)

The sequence of subgoals sgi corresponding to high-level plans PA already restricts the local low-level planning horizon. However, we further learn subgoal-conditioned low-level policies π(u | x; sg) from environment feedback during training to accelerate low-level planning. To exploit shared structure across subgoals, we learn a single shared controller for all operators, conditioned on x ∈ X and conjunctions of predicates in sg. To maximize learning during training, we use a hindsight goal relabeling scheme (Andrychowicz et al., 2017), supervising on all conjunctions of predicates in the state as we roll out low-level search. While the shared controller could be learned as a supervised neural policy, we find that our learned operators sufficiently restrict the search to permit learning an even simpler count-based model mapping X × sg → u ∈ U. We provide additional details in the Appendix.

3.4 SCORING LLM OPERATOR PROPOSALS: Ai ← ScoreAndFilter(A′i, result)

Finally, we update the learned operator library Ai to retain candidate operators that were useful and successful in bi-level planning. Concretely, we estimate the accuracy of each operator candidate a′i ∈ A′i across the bi-level plan executions as s/b, where b counts the total number of times a′i appeared in a high-level plan and s counts the successful executions of the corresponding low-level action sequence that achieved the subgoal associated with a′i. We retain operators if b > τb and s/b > τr, where τb, τr are hyperparameters. Note that this scoring procedure learns whether operators are accurate and support low-level planning independently of whether the LLM-predicted goals f′t matched the true unknown goal predicates gt.
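The scoring rule is simple enough to state directly in code. The following is a minimal sketch under an assumed record format for execution results; the real system's bookkeeping may differ.

```python
from collections import Counter

def score_and_filter(candidates, executions, tau_b, tau_r):
    """Sketch of ScoreAndFilter (Section 3.4). Each execution record is assumed to be a
    list of (operator_name, subgoal_succeeded) pairs, one per high-level action in the
    executed plan; this record format is illustrative, not the released interface."""
    appearances, successes = Counter(), Counter()
    for record in executions:
        for operator_name, subgoal_succeeded in record:
            appearances[operator_name] += 1            # b: times the operator appeared in a plan
            if subgoal_succeeded:
                successes[operator_name] += 1          # s: times its grounded subgoal was achieved
    # Retain operators that were used often enough (b > tau_b) and reliably succeeded (s/b > tau_r).
    return [
        op for op in candidates
        if appearances[op.name] > tau_b
        and successes[op.name] / appearances[op.name] > tau_r
    ]
```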
4 EXPERIMENTS

Domains. We evaluate our approach on two language-specified planning benchmarks: Mini Minecraft and ALFRED (Shridhar et al., 2020). Mini Minecraft (Fig. 5, top) is a procedurally-generated Minecraft-like benchmark (Chen et al., 2021; Luo et al., 2023) on a 2D grid world that requires complex, extended planning. The agent can use tools to mine resources and craft objects. The ability to create new objects that themselves permit new actions yields an enormous action space at each time step (>2000 actions) and very long-horizon tasks (26 high-level steps for the most complex task, without path-planning). ALFRED (Fig. 5, bottom) is a household planning benchmark of human-annotated but formally verifiable tasks defined over a simulated Unity environment (Shridhar et al., 2020). The tasks include object rearrangements and tasks involving object states such as heating and cleaning. Ground-truth high-level plans in the ALFRED benchmark compose 5-10 high-level operators, and low-level action trajectories have on average 50 low-level actions. There are over 100 objects that the agent can interact with in each interactive environment. See the Appendix for details.

Figure 5: Top: (a) The Mini Minecraft environment, showing an intermediate step towards crafting a bed (example task: "Craft a bed"). (b) An operator (craft-bed) proposed by an LLM and verified by our algorithm through planning and execution; its preconditions require the agent to be at a work station with wood planks and wool in the inventory, and its effects add a bed to the inventory. (c) The low-level crafting actions involved (e.g., mining iron ore with a pickaxe to obtain an iron ingot). Bottom: (a) The ALFRED household environment, with example tasks such as "Wash the dirty bowl before putting the bowl on the counter", "Put chilled wine in the cabinet", "Warm a plate and place it on the table", and "Place a cold potato slice in the oven". (b) Example operators proposed by the LLM and verified by our algorithm: CoolObject, whose preconditions require holding the object at a fridge's location and whose effect marks it cool, and SliceObject, whose preconditions require holding a knife at the sliceable object's location and whose effect marks it sliced. These operators are composed to solve the cold potato slice task.

Experimental setup. We evaluate in an iterative continual learning setting; except on the compositional evaluations, we learn from n=2 iterations through all (randomly ordered) tasks and report final accuracy on those tasks. All experiments and baselines use GPT-3.5. For each task, at each iteration, we sample n=4 initial goal proposals, n=4 initial task decompositions, and n=3 operator definition proposals for each operator name. We report best-of accuracy, scoring a task as solved if verification passes on at least one of the proposed goals. For Minecraft, we set the motion planning budget for each subgoal to 1000 nodes. For ALFRED, which requires a slow Unity simulation, we set it to 50 nodes. Additional temperature and sampling details are in the Appendix.

We evaluate on three Mini Minecraft benchmark variations to test how our approach generalizes to complex, compositional goals. In the simplest Mining benchmark, all goals involve mining a target item from an appropriate initial resource with an appropriate tool (e.g., mining iron from iron ore with an axe). In the harder Crafting benchmark, goals involve crafting a target artifact (e.g., a bed), which may require mining a few target resources. The most challenging Compositional benchmark combines mining and crafting tasks, in environments that only begin with raw resources and two starting tools (axe and pickaxe). Agents may need to compose multiple skills to obtain other downstream resources (see Fig. 5 for an example). To test action generalization, we report evaluation on the Compositional benchmark using only actions learned previously in the Mining and Crafting benchmarks.
We similarly evaluate on an ALFRED benchmark of Simple and Compositional tasks drawn from the original task distribution in Shridhar et al. (2020). This distribution contains simple tasks that require picking up an object and placing it in a new location; tasks that require picking up an object, applying a single household skill to it, and moving it to a new location (e.g., Put a clean apple on the dining table); and compositional tasks that require multiple skills (e.g., Place a hot sliced potato on the counter). We use a random subset of n=223 tasks, selected from an initial 250 that we manually filter to remove completely misspecified goals (which omit any mention of the target object or skill).

Mini Minecraft (n=3)      | LLM Predicts?  | Mining        | Crafting      | Compositional
Low-level Planning Only   | Goal           | 31% (σ=0.0%)  | 9% (σ=0.0%)   | 9% (σ=0.0%)
Subgoal Prediction        | Sub-goals      | 33% (σ=1.6%)  | 36% (σ=5.6%)  | 6% (σ=1.7%)
Code Policy Prediction    | Sub-policies   | 15% (σ=1.2%)  | 39% (σ=3.2%)  | 10% (σ=1.7%)
Ada (Ours)                | Goal+Operators | 100% (σ=0.0%) | 100% (σ=7.5%) | 100% (σ=4.1%)

ALFRED (n=3 replications) | LLM Predicts?  | Original (Simple + Compositional Tasks)
Low-level Planning Only   | Goal           | 21% (σ=1.0%)
Subgoal Prediction        | Sub-goals      | 2% (σ=0.4%)
Code Policy Prediction    | Sub-policies   | 2% (σ=0.9%)
Ada (Ours)                | Goal+Operators | 79% (σ=0.9%)

Table 1: (Top) Results on Mini Minecraft. Our algorithm successfully recovers all intermediate operators for mining and crafting, which enable generalization to more compositional tasks (which use up to 26 operators) without any additional learning. (Bottom) Results on ALFRED. Our algorithm recovers all required household operators, which generalize to more complex compositional tasks. All results report mean performance and standard deviation from n=3 random replications for all models.

Baselines. We compare our method to three baselines for language-guided planning. Low-level Planning Only uses an LLM to predict only the symbolic goal specification, conditioned on the high-level predicates and linguistic goal, then uses the low-level planner to search directly for actions that satisfy that goal. This baseline implements a model like LLM+P (Liu et al., 2023), which uses LLMs to translate linguistic goals into planning-compatible formal specifications, then attempts to plan directly towards these with no additional representation learning. Subgoal Prediction uses an LLM to predict a sequence of high-level subgoals (as PDDL pre/postconditions with object arguments), conditioned on the high-level predicates, the task goal, and the initial environment state. This baseline implements a model like SayCan (Ahn et al., 2022), which uses LLMs to directly predict a goal and a sequence of decomposed formal subgoal representations, then applies low-level planning over these formal subgoals. Code Policy Prediction uses an LLM to predict the definitions of a library of imperative local code policies in Python (with cases and control flow) over an imperative API that can query state and execute low-level actions. Then, as Fast Downward planning is no longer applicable, we also use the LLM to predict the function call sequences with arguments for each task. This baseline implements a model like Voyager (Wang et al., 2023a), which uses an LLM to predict a library of skills implemented as imperative code for solving individual tasks.
Like Voyager, we verify the individual code skills during interactive planning, but do not use a more global learning objective that attempts to learn a concise or non-redundant library.

4.1 RESULTS

What action libraries do we learn? Fig. 5 shows example operators learned in each domain (Appendix A.3 contains the full libraries of operators learned on both domains from a randomly sampled run of the n=3 replications). In Mini Minecraft, we manually inspect the library and find that we learn operators that correctly specify the appropriate tools, resources, and outputs for all intermediate mining actions (on Mining) and crafting actions (on Crafting), allowing perfect direct generalization to the Compositional tasks without any additional training on these complex tasks. In ALFRED, we compare the learned libraries from all runs to the ground-truth operator library hand-engineered in Shridhar et al. (2020). The ground-truth operator set contains 8 distinct operators corresponding to different compositional skills (e.g., Slicing, Heating, Cleaning, Cooling). Across all replications, the model reliably recovers semantically identical (same predicate preconditions and postconditions) definitions for all of these ground-truth operators, except for a single operator that is defined disjunctively (the ground-truth Slice skill specifies either of two types of knives), which we occasionally learn as two distinct operators or only recover with one of these two types. We also inspect the learning trajectory and find that, through the interactive learning loop, we successfully reject many initially proposed operator definitions sampled from the language model that turn out to be redundant (which would make high-level planning inefficient), inaccurate (including a priori reasonable proposals that do not fit the environment specifications, such as proposing to clean objects with just a towel, when our goal verifiers require washing them with water in a sink), or underspecified (such as those that omit key preconditions, yielding under-decomposed high-level task plans that make low-level planning difficult).

Do these actions support complex planning and generalization? Table 1 shows quantitative results from n=3 randomly-initialized replications of all models, to account for random noise in sampling from the language model and stochasticity in the underlying environment (ALFRED). On Minecraft, where goal specification is completely clear due to the synthetic language, we solve all tasks in each evaluation variation, including the challenging Compositional setting: the action libraries learned from simpler mining/crafting tasks generalize completely to complex tasks that require crafting all intermediate resources and tools from scratch. On ALFRED, we vastly outperform all other baselines, demonstrating that the learned operators are much more effective for planning and compose generalizably to more complex tasks. We qualitatively find that failures on ALFRED occur for several reasons. One is goal misspecification, when the LLM does not successfully recover the formal goal predicate (often due to ambiguity in human language), though we find that, on average, 92% of the time the ground truth goal appears as one of the top-4 goals translated by the LLM.
We also find failures due to low-level policy inaccuracy, when the learned policies fail to account for low-level, often geometric details of the environment (e.g., the learned policies are not sufficiently precise to place a tall bottle on an appropriately tall shelf). More rarely, we see planning failures caused by slight operator overspecification (e.g., the Slice case discussed above, in which we do not recover the specific disjunction over the possible knives that can be used to slice). Both operator and goal specification errors could be addressed in principle by sampling more (and more diverse) proposals.

How does our approach compare to using the LLM to predict just goals, or to predict task sequences? As shown in Table 1, our approach vastly outperforms the Low-level Planning Only baseline on both domains, demonstrating the value of the action library for longer-horizon planning. We also find a substantial improvement over the Subgoal Prediction baseline. While the LLM frequently predicts important high-level aspects of the task subgoal structure (as it does to propose operator definitions), it frequently struggles to robustly sequence these subgoals and to predict appropriate concrete object groundings that correctly obey the initial problem conditions or the changing environment state. These errors accumulate over the planning horizon, reflected in decreasing accuracy on the compositional Minecraft tasks (on ALFRED, this baseline struggles to solve any more than the basic pick-and-place tasks, as the LLM struggles to predict subgoals that accurately track whether objects are in appliances or whether the agent's single gripper is already full with an existing tool).

How does our approach compare to using the LLM to learn and predict plans using imperative code libraries? Somewhat surprisingly, we find that the Code Policy Prediction baseline performs unevenly and often very poorly on our benchmarks. (We include additional results in A.2.1 showing that our model also dramatically outperforms this baseline using GPT-4 as the base LLM.) We find several key reasons for the poor performance of this baseline relative to our model, each of which validates the key conceptual contributions of our approach. First, the baseline relies on the LLM as the planner: as the skills are written as general Python functions rather than in any planner-specific representation, we cannot use an optimized planner like Fast Downward. As with Subgoal Prediction, we find that the LLM is not a consistent or accurate planner. While it retrieves generally relevant skills from the library for each task, it often struggles to sequence them accurately or to predict appropriate arguments given the initial problem state. Second, we find that imperative code is in general less suited as a hierarchical planning representation for these domains than the high-level PDDL and low-level local policy search representation we use in our model, because it must use control flow to account for environment details that would otherwise be handled by local search relative to a high-level PDDL action. Finally, our model specifically frames the library learning objective around learning a compact library of skills that enables efficient planning, whereas our Voyager re-implementation (as in Wang et al. (2023a)) simply grows a library of skills which are individually executable and can be used to solve individual, shorter tasks. Empirically, as with the original model in Wang et al.
(2023a), this baseline learns hundreds of distinct code definitions on these datasets, which makes it harder to plan accurately and to generalize to more complex tasks. Taken together, these challenges support our overarching library learning objective for hierarchical planning.

5 RELATED WORK

Planning for language goals. A large body of recent work attempts to use LLMs to solve planning tasks specified in language. One approach is to directly predict action sequences (Huang et al., 2022; Valmeekam et al., 2022; Silver et al., 2022; Wang et al., 2023b), but this has yielded mixed results, as LLMs can struggle to generalize or to produce correct plans as problems grow more complex. To combat this, one line of work has explored structured and iterative prompting regimes (e.g., chain-of-thought and feedback) (Mu et al., 2023; Silver et al., 2023; Zhu et al., 2023). Increasingly, other neuro-symbolic work uses LLMs to predict formal goal or action representations that can be verified or solved with symbolic planners (Song et al., 2023; Ahn et al., 2022; Xie et al., 2023; Arora & Kambhampati, 2023). These approaches leverage the benefits of a known planning domain model; our goal in this paper is to leverage language models to learn this domain model. Another line of research uses LLMs to generate formal planning domain models for specific problems (Liu et al., 2023) and subsequently uses classical planners to solve the task. However, these approaches do not consider generating grounded or hierarchical actions in an environment, and they do not learn a library of operators that can be reused across different tasks. More broadly, we share the broad goal of building agents that can understand language and execute actions to achieve goals (Tellex et al., 2011; Misra et al., 2017; Nair et al., 2022). See also Luketina et al. (2019) and Tellex et al. (2020).

Learning planning domain and action representations from language. Another line of work focuses on learning latent action representations from language (Corona et al., 2021; Andreas et al., 2017; Jiang et al., 2019; Sharma et al., 2022; Luo et al., 2023). Our work differs in that we learn a planning-compatible action abstraction from LLMs, instead of relying on human demonstrations and annotated step-by-step instructions. The more recent Wang et al. (2023a) adopts a similar overall problem specification, learning libraries of actions as imperative code-based policies. Our results show that learning planning abstractions enables better integration with hierarchical planning and, as a result, better performance and generalization to more complex problems. Other recent work (Nottingham et al., 2023) learns an environment model from interactive experience, represented as a task dependency graph; we seek to learn a richer state transition model (which represents the effects of actions), decomposed as operators that can be formally composed to verifiably satisfy arbitrarily complex new goals. Guan et al. (2024), published concurrently, seeks to learn PDDL representations; we show how these can be grounded hierarchically.

Language and code. In addition to Wang et al. (2023a), a growing body of work in program synthesis learns lifted program abstractions that compress longer existing or synthesized programs (Bowers et al., 2023; Ellis et al., 2023; Wong et al., 2021; Cao et al., 2023). These approaches (including Wang et al.
(2023a)) generally learn libraries defined over imperative and functional programming languages, such as LISP and Python. Our work is closely inspired by these and seeks to learn representations suited specifically to solving long-range planning problems.

Hierarchical planning abstractions. The hierarchical planning knowledge that we learn from LLMs and from interaction with the environment is related to hierarchical task networks (Erol et al., 1994; Nejati et al., 2006), hierarchical goal networks (Alford et al., 2016), abstract PDDL domains (Konidaris et al., 2018; Bonet & Geffner, 2020; Chitnis et al., 2021; Asai & Muise, 2020; Mao et al., 2022; 2023), and domain control knowledge (de la Rosa & McIlraith, 2011). Most of these approaches require manually specified hierarchical planning abstractions; others learn them from demonstrations or interactions. By contrast, we leverage human language to guide the learning of such abstractions.

6 DISCUSSION AND FUTURE WORK

Our evaluations suggest a powerful role for language within AI systems that form complex, long-horizon plans: language is a rich source of background knowledge about the right action abstractions for everyday planning domains, containing broad human priors about environments, task decompositions, and potential future goals. A core goal of this paper was to demonstrate how to integrate this knowledge into the search, grounding, and verification toolkits developed in hierarchical planning. We leave open many possible extensions for future work. Key limitations of our current framework point towards important directions for further integrating LMs and hierarchical planning to scale our approach: here, we build on an existing set of pre-defined symbolic predicates for initially representing the environment state; we do not yet tackle fine-grained, geometric motor planning; and we use a general LLM (rather than one fine-tuned for extended planning). Future work might tackle these problems by asking how else linguistic knowledge and increasingly powerful or multimodal LLMs could be integrated here: to propose useful named predicates over initial perceptual inputs (e.g., images) (Migimatsu & Bohg, 2022); or to speed planning by bootstrapping hierarchical planning abstractions using the approach here, and then progressively transferring planning to another model, including an LLM, to later compose and use the learned representations.

Acknowledgement. We thank anonymous reviewers for their valuable comments. We gratefully acknowledge support from ONR MURI grant N00014-16-1-2007; from the Center for Brains, Minds, and Machines (CBMM, funded by NSF STC award CCF-1231216); from NSF grant 2214177; from NSF grants CCF-2217064 and IIS-2212310; from Air Force Office of Scientific Research (AFOSR) grant FA9550-22-1-0249; from ONR MURI grant N00014-22-1-2740; from ARO grant W911NF-23-1-0034; from the MIT-IBM Watson AI Lab; from the MIT Quest for Intelligence; from Intel; and from the Boston Dynamics Artificial Intelligence Institute. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of our sponsors.

REFERENCES

Michael Ahn, Anthony Brohan, Noah Brown, Yevgen Chebotar, Omar Cortes, Byron David, Chelsea Finn, Chuyuan Fu, Keerthana Gopalakrishnan, Karol Hausman, et al. Do as I Can, Not as I Say: Grounding Language in Robotic Affordances. arXiv:2204.01691, 2022.
Ron Alford, Vikas Shivashankar, Mark Roberts, Jeremy Frank, and David W Aha. Hierarchical Planning: Relating Task and Goal Decomposition with Task Sharing. In IJCAI, 2016.

Jacob Andreas, Dan Klein, and Sergey Levine. Modular Multitask Reinforcement Learning with Policy Sketches. In ICML, 2017.

Marcin Andrychowicz, Filip Wolski, Alex Ray, Jonas Schneider, Rachel Fong, Peter Welinder, Bob McGrew, Josh Tobin, OpenAI Pieter Abbeel, and Wojciech Zaremba. Hindsight Experience Replay. In NeurIPS, 2017.

Daman Arora and Subbarao Kambhampati. Learning and Leveraging Verifiers to Improve Planning Capabilities of Pre-trained Language Models. arXiv:2305.17077, 2023.

Masataro Asai and Christian Muise. Learning Neural-Symbolic Descriptive Planning Models via Cube-Space Priors: The Voyage Home (to STRIPS). In IJCAI, 2020.

Blai Bonet and Hector Geffner. Learning First-Order Symbolic Representations for Planning from the Structure of the State Space. In ECAI, 2020.

Matthew Bowers, Theo X Olausson, Lionel Wong, Gabriel Grand, Joshua B Tenenbaum, Kevin Ellis, and Armando Solar-Lezama. Top-Down Synthesis for Library Learning. PACMPL, 7(POPL):1182–1213, 2023.

David Cao, Rose Kunkel, Chandrakana Nandi, Max Willsey, Zachary Tatlock, and Nadia Polikarpova. Babble: Learning Better Abstractions with E-graphs and Anti-unification. PACMPL, 7(POPL):396–424, 2023.

Valerie Chen, Abhinav Gupta, and Kenneth Marino. Ask Your Humans: Using Human Instructions to Improve Generalization in Reinforcement Learning. In ICLR, 2021.

Rohan Chitnis, Tom Silver, Joshua B Tenenbaum, Leslie Pack Kaelbling, and Tomás Lozano-Pérez. GLIB: Efficient Exploration for Relational Model-Based Reinforcement Learning via Goal-Literal Babbling. In AAAI, 2021.

Rodolfo Corona, Daniel Fried, Coline Devin, Dan Klein, and Trevor Darrell. Modular Networks for Compositional Instruction Following. In NAACL-HLT, 2021.

Tomás de la Rosa and Sheila McIlraith. Learning Domain Control Knowledge for TLPlan and Beyond. In ICAPS 2011 Workshop on Planning and Learning, 2011.

Thomas G Dietterich. Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition. JAIR, 13:227–303, 2000.

Kevin Ellis, Lionel Wong, Maxwell Nye, Mathias Sable-Meyer, Luc Cary, Lore Anaya Pozo, Luke Hewitt, Armando Solar-Lezama, and Joshua B Tenenbaum. DreamCoder: Growing Generalizable, Interpretable Knowledge with Wake-Sleep Bayesian Program Learning. Philosophical Transactions of the Royal Society, 381(2251):20220050, 2023.

Kutluhan Erol, James Hendler, and Dana S Nau. HTN Planning: Complexity and Expressivity. In AAAI, 1994.

Richard E Fikes and Nils J Nilsson. STRIPS: A New Approach to the Application of Theorem Proving to Problem Solving. Artif. Intell., 2(3-4):189–208, 1971.

Caelan Reed Garrett, Rohan Chitnis, Rachel Holladay, Beomjoon Kim, Leslie Pack Kaelbling, and Tomás Lozano-Pérez. Integrated Task and Motion Planning. Ann. Rev. Control Robot. Auton. Syst., 4:265–293, 2021.

Lin Guan, Karthik Valmeekam, Sarath Sreedharan, and Subbarao Kambhampati. Leveraging Pre-trained Large Language Models to Construct and Utilize World Models for Model-Based Task Planning. In NeurIPS, 2024.

Malte Helmert. The Fast Downward Planning System. JAIR, 26:191–246, 2006.

Mark K Ho, David Abel, Carlos G Correa, Michael L Littman, Jonathan D Cohen, and Thomas L Griffiths. People Construct Simplified Mental Representations to Plan.
Nature, 606(7912):129–136, 2022.

Wenlong Huang, Pieter Abbeel, Deepak Pathak, and Igor Mordatch. Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents. In ICML, 2022.

Yiding Jiang, Shixiang Shane Gu, Kevin P Murphy, and Chelsea Finn. Language as an Abstraction for Hierarchical Deep Reinforcement Learning. In NeurIPS, 2019.

George Konidaris, Leslie Pack Kaelbling, and Tomás Lozano-Pérez. From Skills to Symbols: Learning Symbolic Representations for Abstract High-Level Planning. JAIR, 61:215–289, 2018.

Steven LaValle. Rapidly-Exploring Random Trees: A New Tool for Path Planning. Research Report 9811, 1998.

Vladimir Lifschitz. On the Semantics of STRIPS. In Workshop on Reasoning about Actions and Plans, 1986.

Bo Liu, Yuqian Jiang, Xiaohan Zhang, Qiang Liu, Shiqi Zhang, Joydeep Biswas, and Peter Stone. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv:2304.11477, 2023.

Jelena Luketina, Nantas Nardelli, Gregory Farquhar, Jakob Foerster, Jacob Andreas, Edward Grefenstette, Shimon Whiteson, and Tim Rocktäschel. A Survey of Reinforcement Learning Informed by Natural Language. In IJCAI, 2019.

Zhezheng Luo, Jiayuan Mao, Jiajun Wu, Tomás Lozano-Pérez, Joshua B Tenenbaum, and Leslie Pack Kaelbling. Learning Rational Subgoals from Demonstrations and Instructions. In AAAI, 2023.

Jiayuan Mao, Tomás Lozano-Pérez, Joshua B. Tenenbaum, and Leslie Pack Kaelbling. PDSketch: Integrated Domain Programming, Learning, and Planning. In NeurIPS, 2022.

Jiayuan Mao, Tomás Lozano-Pérez, Joshua B. Tenenbaum, and Leslie Pack Kaelbling. What Planning Problems Can a Relational Neural Network Solve? In NeurIPS, 2023.

Toki Migimatsu and Jeannette Bohg. Grounding Predicates through Actions, 2022.

Dipendra Misra, John Langford, and Yoav Artzi. Mapping Instructions and Visual Observations to Actions with Reinforcement Learning. In EMNLP, 2017.

Yao Mu, Qinglong Zhang, Mengkang Hu, Wenhai Wang, Mingyu Ding, Jun Jin, Bin Wang, Jifeng Dai, Yu Qiao, and Ping Luo. EmbodiedGPT: Vision-Language Pre-training via Embodied Chain of Thought. arXiv:2305.15021, 2023.

Suraj Nair, Eric Mitchell, Kevin Chen, Silvio Savarese, and Chelsea Finn. Learning Language-Conditioned Robot Behavior from Offline Data and Crowd-Sourced Annotation. In CoRL, 2022.

Negin Nejati, Pat Langley, and Tolga Konik. Learning Hierarchical Task Networks by Observation. In ICML, 2006.

Kolby Nottingham, Prithviraj Ammanabrolu, Alane Suhr, Yejin Choi, Hannaneh Hajishirzi, Sameer Singh, and Roy Fox. Do Embodied Agents Dream of Pixelated Sheep: Embodied Decision Making using Language Guided World Modelling. In ICML, 2023.

Pratyusha Sharma, Antonio Torralba, and Jacob Andreas. Skill Induction and Planning with Latent Language. In ACL, 2022.

Mohit Shridhar, Jesse Thomason, Daniel Gordon, Yonatan Bisk, Winson Han, Roozbeh Mottaghi, Luke Zettlemoyer, and Dieter Fox. ALFRED: A Benchmark for Interpreting Grounded Instructions for Everyday Tasks. In CVPR, 2020.

Tom Silver, Rohan Chitnis, Joshua Tenenbaum, Leslie Pack Kaelbling, and Tomás Lozano-Pérez. Learning Symbolic Operators for Task and Motion Planning. In IROS, 2021.

Tom Silver, Varun Hariprasad, Reece S Shuttleworth, Nishanth Kumar, Tomás Lozano-Pérez, and Leslie Pack Kaelbling. PDDL Planning with Pretrained Large Language Models. In NeurIPS Foundation Models for Decision Making Workshop, 2022.
Tom Silver, Soham Dan, Kavitha Srinivas, Joshua B Tenenbaum, Leslie Pack Kaelbling, and Michael Katz. Generalized Planning in PDDL Domains with Pretrained Large Language Models. arXiv:2305.11014, 2023.

Chan Hee Song, Jiaman Wu, Clayton Washington, Brian M Sadler, Wei-Lun Chao, and Yu Su. LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models. In ICCV, 2023.

Richard S Sutton, Doina Precup, and Satinder Singh. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning. Artif. Intell., 112(1-2):181–211, 1999.

Stefanie Tellex, Thomas Kollar, Steven Dickerson, Matthew Walter, Ashis Banerjee, Seth Teller, and Nicholas Roy. Understanding Natural Language Commands for Robotic Navigation and Mobile Manipulation. In AAAI, 2011.

Stefanie Tellex, Nakul Gopalan, Hadas Kress-Gazit, and Cynthia Matuszek. Robots That Use Language. Annual Review of Control, Robotics, & Autonomous Systems, 3:25–55, 2020.

Karthik Valmeekam, Alberto Olmo, Sarath Sreedharan, and Subbarao Kambhampati. Large Language Models Still Can't Plan (A Benchmark for LLMs on Planning and Reasoning about Change). In NeurIPS Foundation Models for Decision Making Workshop, 2022.

Guanzhi Wang, Yuqi Xie, Yunfan Jiang, Ajay Mandlekar, Chaowei Xiao, Yuke Zhu, Linxi Fan, and Anima Anandkumar. Voyager: An Open-Ended Embodied Agent with Large Language Models. arXiv:2305.16291, 2023a.

Zihao Wang, Shaofei Cai, Guanzhou Chen, Anji Liu, Xiaojian Ma, and Yitao Liang. Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents, 2023b.

Catherine Wong, Kevin M Ellis, Joshua Tenenbaum, and Jacob Andreas. Leveraging Language to Learn Program Abstractions and Search Heuristics. In ICML, 2021.

Yaqi Xie, Chen Yu, Tongyao Zhu, Jinbin Bai, Ze Gong, and Harold Soh. Translating Natural Language to Planning Goals with Large-Language Models. arXiv:2302.05128, 2023.

Xizhou Zhu, Yuntao Chen, Hao Tian, Chenxin Tao, Weijie Su, Chenyu Yang, Gao Huang, Bin Li, Lewei Lu, Xiaogang Wang, Yu Qiao, Zhaoxiang Zhang, and Jifeng Dai. Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory, 2023.

APPENDIX

We will release a complete code repository containing our full algorithm implementation, all baselines, and benchmark tasks. Here, we provide additional details on our implementation choices.

A.1 BENCHMARKS

Mini Minecraft (Fig. 5, top) is a procedurally-generated Minecraft-like benchmark (Chen et al., 2021; Luo et al., 2023) that requires complex, extended planning. The environment places an agent on a 2D map containing various resources, tools, and crafting stations. The agent can use appropriate tools to mine new items from raw resources (e.g., use an axe to obtain wood from trees), or collect resources into an inventory to craft new objects (e.g., combining sticks and iron ingots to craft a sword, which itself can be used to obtain feathers from a chicken). The ability to create new objects that themselves permit new actions yields an enormous action space at each time step (>2000 actions, considering different combinations of items to use) and very long-horizon tasks (26 steps for the most complex task, even without path-planning). The provided environment predicates allow querying object types and inventory contents. Low-level actions allow the agent to move and apply tools to specific resources.
To focus on complex crafting, we provide a low-level move-to action to move directly to specified locations. Linguistic goal specifications are synthetically generated from a simple grammar over craftable objects and resources (e.g., Craft a sword, Mine iron ore).

ALFRED (Fig. 5, bottom) is a household planning benchmark of human-annotated but formally verifiable tasks defined over a simulated Unity environment (Shridhar et al., 2020). The interactive environment places an agent in varying 3D layouts, each containing appliances and dozens of household objects. The provided environment includes predicates for querying object types, object and agent locations, and classifiers over object states (e.g., whether an object is hot or on). Low-level actions enable the agent to pick up and place objects, apply tools to other objects, and open, close, and turn on appliances. As specified in Shridhar et al. (2020), ground-truth high-level plans in the ALFRED benchmark compose 5-10 high-level operators, and low-level action trajectories have on average 50 low-level actions. There are over 100 objects that the agent can interact with in each interactive environment. As with Minecraft, we provide a low-level method to move the agent directly to specified locations. While ALFRED is typically used to evaluate detailed instruction following, we focus on a goal-only setting that only uses the goal specifications. The human-annotated goals introduce ambiguity, underspecification, and errors with respect to the ground-truth verifiable tasks (e.g., people refer to tables without specifying if they mean the side table, dining table, or desk; to a light when there are multiple distinct lamps; or to a cabbage when they want lettuce).

A.2 ADDITIONAL METHODS IMPLEMENTATION DETAILS

A.2.1 LLM PROMPTING

We use gpt-3.5-turbo-16k for all experiments and baselines. Here, we describe the contents of the LLM few-shot prompts used in our method in more detail.

Symbolic Task Decomposition. For all unsolved tasks, at each iteration, we sample a set of symbolic task decompositions as sequences of named high-level actions and their arguments. We construct a few-shot prompt consisting of the following components:

1. A brief natural language header (";;;; Given natural language goals, predict a sequence of PDDL actions");
2. A sequence of example (ℓt, PA) tuples containing linguistic goals and example task decompositions. To avoid biasing the language model in advance, we provide example task decompositions for similar, constructed tasks that do not use any of the skills that need to be learned in our two domains. For example, on ALFRED, these example task decompositions are for example tasks (bake a potato and put it in the fridge; place a baked, grated apple on top of the dining table; place a plate in a full sink; and pick up a laptop and then carry it over to the desk lamp, then restart the desk lamp), and our example task decompositions suggest the named operators BakeObject, GrateObject, FillObject, and RestartObject, none of which appear in the actual training set.
3. At iterations > 0, we also provide a sequence of (ℓt, PA) tuples randomly sampled from any solved tasks and their discovered high-level plans. This means that the few-shot prompt better represents the true task distribution over successive iterations.

In our experiments, we prompt with temperature=1.0 and draw n=4 task decomposition samples per unsolved task.
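For concreteness, the sketch below shows one way the task-decomposition prompt above could be assembled. The exact header strings, formatting, and helper names here are illustrative assumptions, not the released prompts.

```python
def build_decomposition_prompt(instruction, unary_atoms, seed_examples, solved_examples):
    """Sketch of the few-shot task-decomposition prompt (Section A.2.1).

    unary_atoms:     [(predicate, object), ...] unary features of the abstract state
    seed_examples:   [(language goal, example decomposition), ...] hand-written, out-of-domain
    solved_examples: [(language goal, discovered decomposition), ...] added at iterations > 0
    """
    parts = [";;;; Given natural language goals, predict a sequence of PDDL actions"]
    for goal, decomposition in list(seed_examples) + list(solved_examples):
        parts.append(f";; {goal}\n{decomposition}")
    # Only unary features (object categories and properties) keep the state description short.
    state = " ".join(f"({pred} {obj})" for pred, obj in unary_atoms)
    parts.append(f";; {instruction}\n;; State: {state}")
    return "\n".join(parts)
```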
Symbolic Operator Definition. For all unsolved tasks, at each iteration, we sample proposed operator definitions consisting of ⟨args, pre, eff⟩ conditioned on all undefined operator names that appear in the proposed task decompositions. For each operator name, we construct a few-shot prompt consisting of the following components:

1. A brief natural language header ("You are a software engineer who will be writing planning operators in the PDDL planning language. These operators are based on the following PDDL domain definition.");
2. The full vocabulary of high-level environment predicates P, as well as valid named argument values (e.g., object types);
3. A sequence of example ⟨name, args, pre, eff⟩ operator definitions demonstrating the PDDL definition format. As with task decomposition, we do not provide example definitions for any of the operators that we wish to learn on our dataset;
4. At iterations > 0, as many of the validated ⟨name, args, pre, eff⟩ operators defined in the current library as possible (including newly learned operators). When operators share structural patterns, this means that few-shot prompting also better reflects the true operator structure over successive iterations.

In our experiments, we prompt with temperature=1.0 and draw n=3 operator definition samples per undefined operator name. However, in our pilot experiments, we find that sampling directly from the token probabilities defined by this few-shot prompt does not produce sufficiently diverse definitions for each operator name. We instead directly prompt the LLM to produce up to N distinct operator definitions sequentially. We also find that GPT-3.5 frequently produces syntactically invalid operator proposals: proposed operators often include invented predicates and object types that are not defined in the environment vocabulary, do not obey the predicate typing rules, or do not have the correct number and types of arguments. While this might improve with finetuned or larger LLMs, we instead implement a simple post-processing heuristic to correct operators with syntactic errors, or to reject operators altogether: as operator pre- and postconditions are represented as conjunctions of predicates, we remove any invalid predicates (predicates that are invented or that specify invalid arguments); we collect all arguments named across the remaining predicates and use the ground-truth typing to produce the final args; and we reject any operators that have zero valid postcondition predicates. This post-processing procedure frequently leaves operators underspecified (e.g., the resulting operators are now missing necessary preconditions, which were partially generated but syntactically incorrect in the proposal); we allow our full operator learning algorithm to verify and reject these operators.
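As a concrete illustration of this heuristic, the sketch below filters a proposed operator's predicates against the environment vocabulary and ground-truth typing, rebuilds its argument list, and rejects proposals with no valid effects. The Predicate/Operator containers, the arity-based vocabulary, and the variable-to-type map are simplifying assumptions for illustration, not the released code.

```python
# Hypothetical sketch of the syntactic post-processing heuristic for proposed operators.
# Data structures and helper names are illustrative, not the released implementation.
from dataclasses import dataclass

@dataclass
class Predicate:
    name: str
    args: list            # argument variable names, e.g. ["?target"]

@dataclass
class Operator:
    name: str
    args: list            # (variable, type) pairs
    preconditions: list   # conjunction of Predicates
    effects: list         # conjunction of Predicates

def is_valid(pred, predicate_vocab, variable_types):
    """A predicate survives only if its name is in the environment vocabulary,
    it has the declared arity, and every argument has a known ground-truth type."""
    if pred.name not in predicate_vocab:
        return False
    if len(pred.args) != predicate_vocab[pred.name]:
        return False
    return all(v in variable_types for v in pred.args)

def postprocess_operator(proposal, predicate_vocab, variable_types):
    """Drop invalid predicates, rebuild the typed argument list from what survives,
    and reject the operator outright if no valid effect predicates remain."""
    pre = [p for p in proposal.preconditions if is_valid(p, predicate_vocab, variable_types)]
    eff = [p for p in proposal.effects if is_valid(p, predicate_vocab, variable_types)]
    if not eff:
        return None  # zero valid postcondition predicates -> reject
    variables = sorted({v for p in pre + eff for v in p.args})
    typed_args = [(v, variable_types[v]) for v in variables]
    return Operator(proposal.name, typed_args, pre, eff)
```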
Symbolic Goal Proposal. Finally, as described in Section 3.2, we also use an LLM to propose a set of candidate goal definitions F_t for each task, given as FOL formulas defined over the environment predicates P. Our prompting technique is very similar to that used in the rest of our algorithm. For each task, we construct a few-shot prompt consisting of the following components:

1. A brief natural language header ("You are a software engineer who will be writing goal definitions for a robot in the PDDL planning language.");
2. The full vocabulary of high-level environment predicates P, as well as valid named argument values (e.g., object types);
3. A sequence of example (l_t, f_t) pairs of linguistic goals and FOL goal formulas. During training, unlike in the previous prompts (where including the ground-truth operators would solve the learning problem), we do sample an initial set of goal definitions from the training distribution as our initial example supervision; we set this supervision to a randomly sampled fraction (0.1) of the training distribution;
4. At iterations > 0, we also include (l_t, f_t) examples from successfully solved tasks.

In our experiments, we prompt with temperature=1.0 and draw n=4 goal definition samples per unsolved task. As with the operator proposals, we find that sampling directly from the token probabilities defined by this few-shot prompt does not produce sufficiently diverse definitions for each linguistic goal to correct for ambiguity in the human language (e.g., to cover the multiple concrete Table types that a person might mean when referring to a table). We therefore again directly prompt the LLM to produce up to N distinct goal definitions sequentially. We also post-process proposed goals using the same syntactic criterion, removing invalid predicates from the FOL formula and rejecting any empty goals.

Mini Minecraft              LLM Predicts?        Library?        Mining   Crafting   Compositional
Code as Policies            Policy Prediction    Sub-policies    12%      37%        11%
Ours                        Goal+Operators                       100%     100%       100%

ALFRED (n=3 replications)   LLM Predicts?        Library?        Original (Simple + Compositional Tasks)
Code as Policies            Policy Prediction    Sub-policies    11%
Ours                        Goal+Operators                       70%

Table 2: Results with GPT-4 as the LLM backbone. On both Mini Minecraft (top) and ALFRED (bottom), our algorithm recovers all required operators, which generalize to more complex compositional tasks. Switching to GPT-4 does not impact the performance trends observed across the Code as Policies (Voyager) baseline and our method.

A.2.2 POLICY LEARNING AND GUIDED LOW-LEVEL SEARCH

Concretely, we implement our policy-guided low-level action search as follows. We maintain a dictionary D that maps subgoals (conjunctions of atoms) to sets of candidate low-level action trajectories. When planning for a new subgoal sg, if D contains matching entries, we prioritize trying the candidate low-level trajectories stored in D; otherwise, we fall back to a brute-force breadth-first search (BFS) over all possible action trajectories. To populate D, during the BFS we compute the difference in the environment state before and after the agent executes each sampled trajectory t; this state difference can be viewed as a subgoal sg achieved by executing t. Rather than directly adding (sg, t) as a key-value pair to D, we lift the trajectory and the state change by replacing the concrete objects in sg and t with variables. Note that we update D with every sampled trajectory in the BFS, even if it does not achieve the subgoal specified in the BFS search. When the low-level search receives a subgoal sg, we again lift it by replacing objects with variables and try to match it against entries in D. If D contains multiple trajectories for a given lifted subgoal, we track how often each trajectory has succeeded for that subgoal and prioritize trajectories with the most successes.
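To make this procedure concrete, below is a minimal, self-contained sketch of the lifted trajectory cache and its interaction with a brute-force BFS. The state representation (sets of ground atoms), the actions_fn/step_fn environment interface, and the canonical-variable lifting scheme are simplifying assumptions for illustration; in particular, this sketch only lifts objects that appear in the subgoal or state difference itself, and is not the released implementation.

```python
# Hypothetical sketch of the policy-guided low-level search with a lifted trajectory cache.
# The environment interface and all helper names are illustrative assumptions.
from collections import defaultdict, deque

def lift(atoms, trajectory):
    """Replace concrete objects appearing in `atoms` with canonical variables ?v0, ?v1, ...
    in both the atoms (a subgoal or state difference) and the action trajectory."""
    objects = sorted({tok for atom in atoms for tok in atom[1:]})
    mapping = {obj: f"?v{i}" for i, obj in enumerate(objects)}
    rename = lambda item: tuple(mapping.get(t, t) for t in item)
    return tuple(sorted(rename(a) for a in atoms)), tuple(rename(a) for a in trajectory)

class GuidedSearch:
    def __init__(self, actions_fn, step_fn):
        self.actions_fn = actions_fn   # state -> iterable of grounded low-level actions
        self.step_fn = step_fn         # (state, action) -> next state
        # lifted subgoal -> {lifted trajectory: success count}
        self.cache = defaultdict(lambda: defaultdict(int))

    def _record(self, before, after, trajectory):
        """Store any observed state difference, even if it is not the searched-for subgoal."""
        diff = set(after) - set(before)
        if diff and trajectory:
            key, traj = lift(diff, trajectory)
            self.cache[key].setdefault(traj, 0)

    def _ground(self, lifted_traj, subgoal):
        """Invert the canonical variable naming against the current subgoal's objects."""
        objects = sorted({tok for atom in subgoal for tok in atom[1:]})
        mapping = {f"?v{i}": obj for i, obj in enumerate(objects)}
        return tuple(tuple(mapping.get(t, t) for t in action) for action in lifted_traj)

    def solve(self, state, subgoal, max_depth=3):
        """Return a trajectory achieving `subgoal` (a set of atoms) from `state`, or None."""
        key, _ = lift(subgoal, ())
        # (1) Try cached candidates first, ordered by how often they have succeeded before.
        for lifted_traj, _ in sorted(self.cache[key].items(), key=lambda kv: -kv[1]):
            grounded, s = self._ground(lifted_traj, subgoal), state
            for action in grounded:
                s = self.step_fn(s, action)
            if subgoal <= s:
                self.cache[key][lifted_traj] += 1
                return grounded
        # (2) Otherwise fall back to brute-force BFS, populating the cache as we go.
        frontier = deque([(state, ())])
        while frontier:
            s, traj = frontier.popleft()
            if subgoal <= s:
                if traj:
                    k, t = lift(subgoal, traj)
                    self.cache[k][t] += 1
                return traj
            if len(traj) >= max_depth:
                continue
            for action in self.actions_fn(s):
                s2 = self.step_fn(s, action)
                self._record(state, s2, traj + (action,))
                frontier.append((s2, traj + (action,)))
        return None
```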
A.3 EXPERIMENTS Learned Operator Libraries on Minecraft The following shows the full PDDL domain definition including the initial provided vocabulary of symbolic environment constants and predicates, initial pick and place operators and example operator, and all ensuing learned operators combined from the Mining and Crafting benchmarks. 1 ( define ( domain crafting world v20230404 teleport ) 2 ( : requirements : s t r i p s ) 3 ( : types 6 inventory 7 object type 9 ( : constants Published as a conference paper at ICLR 2024 10 Key object type 11 Work Station object type 12 Pickaxe object type 13 Iron Ore Vein object type 14 Iron Ore object type 15 Iron Ingot object type 16 Coal Ore Vein object type 17 Coal object type 18 Gold Ore Vein object type 19 Gold Ore object type 20 Gold Ingot object type 21 Cobblestone Stash object type 22 Cobblestone object type 23 Axe object type 24 Tree object type 25 Wood object type 26 Wood Plank object type 27 Stick object type 28 Sword object type 29 Chicken object type 30 Feather object type 31 Arrow object type 32 Shears object type 33 Sheep object type 34 Wool object type 35 Bed object type 36 Boat object type 37 Sugar Cane Plant object type 38 Sugar Cane object type 39 Paper object type 40 Bowl object type 41 Potato Plant object type 42 Potato object type 43 Cooked Potato object type 44 Beetroot Crop object type 45 Beetroot object type 46 Beetroot Soup object type 48 Hypothetical object type 49 Trash object type 51 ( : predicates 52 ( t i l e u p ?t1 t i l e ?t2 t i l e ) 53 ( tile down ?t1 t i l e ?t2 t i l e ) 54 ( t i l e l e f t ?t1 t i l e ?t2 t i l e ) 55 ( t i l e r i g h t ?t1 t i l e ?t2 t i l e ) 57 ( agent at ? t t i l e ) 58 ( object at ?x object ? t t i l e ) 59 ( inventory holding ? i inventory ?x object ) 60 ( inventory empty ? i inventory ) 62 ( object of type ?x object ?ot object type ) 65 ( : action move to 66 : parameters (? t1 t i l e ?t2 t i l e ) 67 : precondition ( and ( agent at ?t1 ) ) 68 : e f f e c t ( and ( agent at ?t2 ) ( not ( agent at ?t1 ) ) ) 70 ( : action pick up 71 : parameters (? i inventory ?x object ? t t i l e ) 72 : precondition ( and ( agent at ? t ) ( object at ?x ? t ) ( inventory empty ? i ) ) Published as a conference paper at ICLR 2024 73 : e f f e c t ( and ( inventory holding ? i ?x ) ( not ( object at ?x ? t ) ) ( not ( inventory empty ? i ) ) ) 75 ( : action place down 76 : parameters (? i inventory ?x object ? t t i l e ) 77 : precondition ( and ( agent at ? t ) ( inventory holding ? i ?x ) ) 78 : e f f e c t ( and ( object at ?x ? t ) ( not ( inventory holding ? i ?x ) ) ( inventory empty ? i ) ) 80 ( : action mine iron ore 81 : parameters (? t o o l i n v inventory ? t a r g e t i n v inventory ?x object ? t o o l object ? target object ? t t i l e ) 82 : precondition ( and 83 ( agent at ? t ) 84 ( object at ?x ? t ) 85 ( object of type ?x Iron Ore Vein ) 86 ( inventory holding ? t o o l i n v ? t o o l ) 87 ( object of type ? t o o l Pickaxe ) 88 ( inventory empty ? t a r g e t i n v ) 89 ( object of type ? target Hypothetical ) 91 : e f f e c t ( and 92 ( not ( inventory empty ? t a r g e t i n v ) ) 93 ( inventory holding ? t a r g e t i n v ? target ) 94 ( not ( object of type ? target Hypothetical ) ) 95 ( object of type ? target Iron Ore ) 98 ( : action mine wood 2 99 : parameters (? t t i l e ?x object ? t o o l i n v inventory ? t o o l object ? t a r g e t i n v inventory ? target object ) 101 : precondition ( and 102 ( agent at ? t ) 103 ( object at ?x ? 
t ) 104 ( object of type ?x Tree ) 105 ( inventory holding ? t o o l i n v ? t o o l ) 106 ( object of type ? t o o l Axe ) 107 ( inventory empty ? t a r g e t i n v ) 108 ( object of type ? target Hypothetical ) 110 : e f f e c t ( and 111 ( not ( inventory empty ? t a r g e t i n v ) ) 112 ( inventory holding ? t a r g e t i n v ? target ) 113 ( not ( object of type ? target Hypothetical ) ) 114 ( object of type ? target Wood) 117 ( : action mine wool1 0 118 : parameters (? t t i l e ?x object ? t o o l i n v inventory ? t o o l object ? t a r g e t i n v inventory ? target object ) 120 : precondition ( and 121 ( agent at ? t ) 122 ( object at ?x ? t ) 123 ( object of type ?x Sheep ) 124 ( inventory holding ? t o o l i n v ? t o o l ) 125 ( object of type ? t o o l Shears ) 126 ( inventory empty ? t a r g e t i n v ) 127 ( object of type ? target Hypothetical ) 129 : e f f e c t ( and 130 ( not ( inventory empty ? t a r g e t i n v ) ) 131 ( inventory holding ? t a r g e t i n v ? target ) 132 ( not ( object of type ? target Hypothetical ) ) Published as a conference paper at ICLR 2024 133 ( object of type ? target Wool ) 136 ( : action mine potato 0 137 : parameters (? t t i l e ?x object ? t a r g e t i n v inventory ? target object ) 139 : precondition ( and 140 ( agent at ? t ) 141 ( object at ?x ? t ) 142 ( object of type ?x Potato Plant ) 143 ( inventory empty ? t a r g e t i n v ) 144 ( object of type ? target Hypothetical ) 146 : e f f e c t ( and 147 ( not ( inventory empty ? t a r g e t i n v ) ) 148 ( inventory holding ? t a r g e t i n v ? target ) 149 ( not ( object of type ? target Hypothetical ) ) 150 ( object of type ? target Potato ) 153 ( : action mine sugar cane 2 154 : parameters (? t t i l e ?x object ? t o o l i n v inventory ? t o o l object ? t a r g e t i n v inventory ? target object ) 156 : precondition ( and 157 ( agent at ? t ) 158 ( object at ?x ? t ) 159 ( object of type ?x Sugar Cane Plant ) 160 ( inventory holding ? t o o l i n v ? t o o l ) 161 ( object of type ? t o o l Axe ) 162 ( inventory empty ? t a r g e t i n v ) 163 ( object of type ? target Hypothetical ) 165 : e f f e c t ( and 166 ( not ( inventory empty ? t a r g e t i n v ) ) 167 ( inventory holding ? t a r g e t i n v ? target ) 168 ( not ( object of type ? target Hypothetical ) ) 169 ( object of type ? target Sugar Cane ) 172 ( : action mine beetroot 1 173 : parameters (? t t i l e ?x object ? t o o l i n v inventory ? t o o l object ? t a r g e t i n v inventory ? target object ) 175 : precondition ( and 176 ( agent at ? t ) 177 ( object at ?x ? t ) 178 ( object of type ?x Beetroot Crop ) 179 ( inventory holding ? t o o l i n v ? t o o l ) 180 ( inventory empty ? t a r g e t i n v ) 181 ( object of type ? target Hypothetical ) 183 : e f f e c t ( and 184 ( not ( inventory empty ? t a r g e t i n v ) ) 185 ( inventory holding ? t a r g e t i n v ? target ) 186 ( not ( object of type ? target Hypothetical ) ) 187 ( object of type ? target Beetroot ) 190 ( : action mine feather 1 191 : parameters (? t t i l e ?x object ? t o o l i n v inventory ? t o o l object ? t a r g e t i n v inventory ? target object ) 193 : precondition ( and Published as a conference paper at ICLR 2024 194 ( agent at ? t ) 195 ( object at ?x ? t ) 196 ( object of type ?x Chicken ) 197 ( inventory holding ? t o o l i n v ? t o o l ) 198 ( object of type ? t o o l Sword ) 199 ( inventory empty ? t a r g e t i n v ) 200 ( object of type ? target Hypothetical ) 202 : e f f e c t ( and 203 ( not ( inventory empty ? 
t a r g e t i n v ) ) 204 ( inventory holding ? t a r g e t i n v ? target ) 205 ( not ( object of type ? target Hypothetical ) ) 206 ( object of type ? target Feather ) 209 ( : action mine cobblestone 2 210 : parameters (? t t i l e ?x object ? t o o l i n v inventory ? t o o l object ? t a r g e t i n v inventory ? target object ) 212 : precondition ( and 213 ( agent at ? t ) 214 ( object at ?x ? t ) 215 ( object of type ?x Cobblestone Stash ) 216 ( inventory holding ? t o o l i n v ? t o o l ) 217 ( object of type ? t o o l Pickaxe ) 218 ( inventory empty ? t a r g e t i n v ) 219 ( object of type ? target Hypothetical ) 221 : e f f e c t ( and 222 ( not ( inventory empty ? t a r g e t i n v ) ) 223 ( inventory holding ? t a r g e t i n v ? target ) 224 ( not ( object of type ? target Hypothetical ) ) 225 ( object of type ? target Cobblestone ) 228 ( : action mine gold ore1 2 229 : parameters (? t t i l e ?x object ? t o o l i n v inventory ? t o o l object ? t a r g e t i n v inventory ? target object ) 231 : precondition ( and 232 ( agent at ? t ) 233 ( object at ?x ? t ) 234 ( object of type ?x Gold Ore Vein ) 235 ( inventory holding ? t o o l i n v ? t o o l ) 236 ( object of type ? t o o l Pickaxe ) 237 ( inventory empty ? t a r g e t i n v ) 238 ( object of type ? target Hypothetical ) 240 : e f f e c t ( and 241 ( not ( inventory empty ? t a r g e t i n v ) ) 242 ( inventory holding ? t a r g e t i n v ? target ) 243 ( not ( object of type ? target Hypothetical ) ) 244 ( object of type ? target Gold Ore ) 247 ( : action mine coal1 0 248 : parameters (? t t i l e ?x object ? t o o l i n v inventory ? t o o l object ? t a r g e t i n v inventory ? target object ) 250 : precondition ( and 251 ( agent at ? t ) 252 ( object at ?x ? t ) 253 ( object of type ?x Coal Ore Vein ) 254 ( inventory holding ? t o o l i n v ? t o o l ) 255 ( object of type ? t o o l Pickaxe ) Published as a conference paper at ICLR 2024 256 ( inventory empty ? t a r g e t i n v ) 257 ( object of type ? target Hypothetical ) 259 : e f f e c t ( and 260 ( not ( inventory empty ? t a r g e t i n v ) ) 261 ( inventory holding ? t a r g e t i n v ? target ) 262 ( not ( object of type ? target Hypothetical ) ) 263 ( object of type ? target Coal ) 266 ( : action mine beetroot1 0 267 : parameters (? t t i l e ?x object ? t a r g e t i n v inventory ? target object ) 269 : precondition ( and 270 ( agent at ? t ) 271 ( object at ?x ? t ) 272 ( object of type ?x Beetroot Crop ) 273 ( inventory empty ? t a r g e t i n v ) 274 ( object of type ? target Hypothetical ) 276 : e f f e c t ( and 277 ( not ( inventory empty ? t a r g e t i n v ) ) 278 ( inventory holding ? t a r g e t i n v ? target ) 279 ( not ( object of type ? target Hypothetical ) ) 280 ( object of type ? target Beetroot ) 283 ( : action craft wood plank 284 : parameters (? ingredientinv1 inventory ? t a r g e t i n v inventory ? s t a t i o n object ? ingredient1 object ? targe t object ? t t i l e ) 285 : precondition ( and 286 ( agent at ? t ) 287 ( object at ? s t a t i o n ? t ) 288 ( object of type ? s t a t i o n Work Station ) 289 ( inventory holding ? ingredientinv1 ? ingredient1 ) 290 ( object of type ? ingredient1 Wood) 291 ( inventory empty ? t a r g e t i n v ) 292 ( object of type ? target Hypothetical ) 294 : e f f e c t ( and 295 ( not ( inventory empty ? t a r g e t i n v ) ) 296 ( inventory holding ? t a r g e t i n v ? target ) 297 ( not ( object of type ? target Hypothetical ) ) 298 ( object of type ? 
target Wood Plank ) 299 ( not ( inventory holding ? ingredientinv1 ? ingredient1 ) ) 300 ( inventory empty ? ingredientinv1 ) 301 ( not ( object of type ? ingredient1 Wood) ) 302 ( object of type ? ingredient1 Hypothetical ) 305 ( : action craft arrow 306 : parameters (? ingredientinv1 inventory ? ingredientinv2 inventory ? t a r g e t i n v inventory ? s t a t i o n object ? ingredient1 object ? ingredient2 object ? target object ? t t i l e ) 307 : precondition ( and 308 ( agent at ? t ) 309 ( object at ? s t a t i o n ? t ) 310 ( object of type ? s t a t i o n Work Station ) 311 ( inventory holding ? ingredientinv1 ? ingredient1 ) 312 ( object of type ? ingredient1 Stick ) 313 ( inventory holding ? ingredientinv2 ? ingredient2 ) 314 ( object of type ? ingredient2 Feather ) 315 ( inventory empty ? t a r g e t i n v ) 316 ( object of type ? target Hypothetical ) Published as a conference paper at ICLR 2024 318 : e f f e c t ( and 319 ( not ( inventory empty ? t a r g e t i n v ) ) 320 ( inventory holding ? t a r g e t i n v ? target ) 321 ( not ( object of type ? target Hypothetical ) ) 322 ( object of type ? target Arrow ) 323 ( not ( inventory holding ? ingredientinv1 ? ingredient1 ) ) 324 ( inventory empty ? ingredientinv1 ) 325 ( not ( object of type ? ingredient1 Stick ) ) 326 ( object of type ? ingredient1 Hypothetical ) 327 ( not ( inventory holding ? ingredientinv2 ? ingredient2 ) ) 328 ( inventory empty ? ingredientinv2 ) 329 ( not ( object of type ? ingredient2 Feather ) ) 330 ( object of type ? ingredient2 Hypothetical ) 333 ( : action craft beetroot soup 0 334 : parameters (? t t i l e ? s t a t i o n object ? ingredientinv1 inventory ? ingredient1 object ? ingredientinv2 inventory ? ingredient2 object ? t a r g e t i n v inventory ? target object ) 336 : precondition ( and 337 ( agent at ? t ) 338 ( object at ? s t a t i o n ? t ) 339 ( object of type ? s t a t i o n Work Station ) 340 ( inventory holding ? ingredientinv1 ? ingredient1 ) 341 ( object of type ? ingredient1 Beetroot ) 342 ( inventory holding ? ingredientinv2 ? ingredient2 ) 343 ( object of type ? ingredient2 Bowl ) 344 ( inventory empty ? t a r g e t i n v ) 345 ( object of type ? target Hypothetical ) 347 : e f f e c t ( and 348 ( not ( inventory empty ? t a r g e t i n v ) ) 349 ( inventory holding ? t a r g e t i n v ? target ) 350 ( not ( object of type ? target Hypothetical ) ) 351 ( object of type ? target Beetroot Soup ) 352 ( not ( inventory holding ? ingredientinv1 ? ingredient1 ) ) 353 ( inventory empty ? ingredientinv1 ) 354 ( not ( object of type ? ingredient1 Beetroot ) ) 355 ( object of type ? ingredient1 Hypothetical ) 356 ( not ( inventory holding ? ingredientinv2 ? ingredient2 ) ) 357 ( inventory empty ? ingredientinv2 ) 358 ( not ( object of type ? ingredient2 Bowl ) ) 359 ( object of type ? ingredient2 Hypothetical ) 362 ( : action craft paper 0 363 : parameters (? t t i l e ? s t a t i o n object ? ingredientinv1 inventory ? ingredient1 object ? t a r g e t i n v inventory ? target object ) 365 : precondition ( and 366 ( agent at ? t ) 367 ( object at ? s t a t i o n ? t ) 368 ( object of type ? s t a t i o n Work Station ) 369 ( inventory holding ? ingredientinv1 ? ingredient1 ) 370 ( object of type ? ingredient1 Sugar Cane ) 371 ( inventory empty ? t a r g e t i n v ) 372 ( object of type ? target Hypothetical ) 374 : e f f e c t ( and 375 ( not ( inventory empty ? t a r g e t i n v ) ) 376 ( inventory holding ? t a r g e t i n v ? target ) 377 ( not ( object of type ? 
target Hypothetical ) ) 378 ( object of type ? target Paper ) Published as a conference paper at ICLR 2024 379 ( not ( inventory holding ? ingredientinv1 ? ingredient1 ) ) 380 ( inventory empty ? ingredientinv1 ) 381 ( not ( object of type ? ingredient1 Sugar Cane ) ) 382 ( object of type ? ingredient1 Hypothetical ) 385 ( : action craft shears2 2 386 : parameters (? t t i l e ? s t a t i o n object ? ingredientinv1 inventory ? ingredient1 object ? t a r g e t i n v inventory ? target object ) 388 : precondition ( and 389 ( agent at ? t ) 390 ( object at ? s t a t i o n ? t ) 391 ( object of type ? s t a t i o n Work Station ) 392 ( inventory holding ? ingredientinv1 ? ingredient1 ) 393 ( object of type ? ingredient1 Gold Ingot ) 394 ( inventory empty ? t a r g e t i n v ) 395 ( object of type ? target Hypothetical ) 397 : e f f e c t ( and 398 ( not ( inventory empty ? t a r g e t i n v ) ) 399 ( inventory holding ? t a r g e t i n v ? target ) 400 ( not ( object of type ? target Hypothetical ) ) 401 ( object of type ? target Shears ) 402 ( not ( inventory holding ? ingredientinv1 ? ingredient1 ) ) 403 ( inventory empty ? ingredientinv1 ) 404 ( not ( object of type ? ingredient1 Gold Ingot ) ) 405 ( object of type ? ingredient1 Hypothetical ) 408 ( : action craft bowl 1 409 : parameters (? t t i l e ? s t a t i o n object ? ingredientinv1 inventory ? ingredient1 object ? ingredientinv2 inventory ? ingredient2 object ? t a r g e t i n v inventory ? target object ) 411 : precondition ( and 412 ( agent at ? t ) 413 ( object at ? s t a t i o n ? t ) 414 ( object of type ? s t a t i o n Work Station ) 415 ( inventory holding ? ingredientinv1 ? ingredient1 ) 416 ( object of type ? ingredient1 Wood Plank ) 417 ( inventory holding ? ingredientinv2 ? ingredient2 ) 418 ( object of type ? ingredient2 Wood Plank ) 419 ( inventory empty ? t a r g e t i n v ) 420 ( object of type ? target Hypothetical ) 422 : e f f e c t ( and 423 ( not ( inventory empty ? t a r g e t i n v ) ) 424 ( inventory holding ? t a r g e t i n v ? target ) 425 ( not ( object of type ? target Hypothetical ) ) 426 ( object of type ? target Bowl ) 427 ( not ( inventory holding ? ingredientinv1 ? ingredient1 ) ) 428 ( inventory empty ? ingredientinv1 ) 429 ( not ( object of type ? ingredient1 Wood Plank ) ) 430 ( object of type ? ingredient1 Hypothetical ) 431 ( not ( inventory holding ? ingredientinv2 ? ingredient2 ) ) 432 ( inventory empty ? ingredientinv2 ) 433 ( not ( object of type ? ingredient2 Wood Plank ) ) 434 ( object of type ? ingredient2 Hypothetical ) 437 ( : action craft boat 0 438 : parameters (? t t i l e ? s t a t i o n object ? i n g r e d i e n t i n v inventory ? ingredient object ? t a r g e t i n v inventory ? target object ) Published as a conference paper at ICLR 2024 440 : precondition ( and 441 ( agent at ? t ) 442 ( object at ? s t a t i o n ? t ) 443 ( object of type ? s t a t i o n Work Station ) 444 ( inventory holding ? i n g r e d i e n t i n v ? ingredient ) 445 ( object of type ? ingredient Wood Plank ) 446 ( inventory empty ? t a r g e t i n v ) 447 ( object of type ? target Hypothetical ) 449 : e f f e c t ( and 450 ( not ( inventory empty ? t a r g e t i n v ) ) 451 ( inventory holding ? t a r g e t i n v ? target ) 452 ( not ( object of type ? target Hypothetical ) ) 453 ( object of type ? target Boat ) 454 ( not ( inventory holding ? i n g r e d i e n t i n v ? ingredient ) ) 455 ( inventory empty ? i n g r e d i e n t i n v ) 456 ( not ( object of type ? 
ingredient Wood Plank ) ) 457 ( object of type ? ingredient Hypothetical ) 460 ( : action craft cooked potato 1 461 : parameters (? t t i l e ? s t a t i o n object ? ingredientinv1 inventory ? ingredient1 object ? ingredientinv2 inventory ? ingredient2 object ? t a r g e t i n v inventory ? target object ) 463 : precondition ( and 464 ( agent at ? t ) 465 ( object at ? s t a t i o n ? t ) 466 ( object of type ? s t a t i o n Work Station ) 467 ( inventory holding ? ingredientinv1 ? ingredient1 ) 468 ( object of type ? ingredient1 Potato ) 469 ( inventory holding ? ingredientinv2 ? ingredient2 ) 470 ( object of type ? ingredient2 Coal ) 471 ( inventory empty ? t a r g e t i n v ) 472 ( object of type ? target Hypothetical ) 474 : e f f e c t ( and 475 ( not ( inventory empty ? t a r g e t i n v ) ) 476 ( inventory holding ? t a r g e t i n v ? target ) 477 ( not ( object of type ? target Hypothetical ) ) 478 ( object of type ? target Cooked Potato ) 479 ( not ( inventory holding ? ingredientinv1 ? ingredient1 ) ) 480 ( inventory empty ? ingredientinv1 ) 481 ( not ( object of type ? ingredient1 Potato ) ) 482 ( object of type ? ingredient1 Hypothetical ) 483 ( not ( inventory holding ? ingredientinv2 ? ingredient2 ) ) 484 ( inventory empty ? ingredientinv2 ) 485 ( not ( object of type ? ingredient2 Coal ) ) 486 ( object of type ? ingredient2 Hypothetical ) 489 ( : action craft gold ingot 1 490 : parameters (? t t i l e ? s t a t i o n object ? ingredientinv1 inventory ? ingredient1 object ? ingredientinv2 inventory ? ingredient2 object ? t a r g e t i n v inventory ? target object ) 492 : precondition ( and 493 ( agent at ? t ) 494 ( object at ? s t a t i o n ? t ) 495 ( object of type ? s t a t i o n Work Station ) 496 ( inventory holding ? ingredientinv1 ? ingredient1 ) 497 ( object of type ? ingredient1 Gold Ore ) 498 ( inventory holding ? ingredientinv2 ? ingredient2 ) 499 ( object of type ? ingredient2 Coal ) 500 ( inventory empty ? t a r g e t i n v ) Published as a conference paper at ICLR 2024 501 ( object of type ? target Hypothetical ) 503 : e f f e c t ( and 504 ( not ( inventory empty ? t a r g e t i n v ) ) 505 ( inventory holding ? t a r g e t i n v ? target ) 506 ( not ( object of type ? target Hypothetical ) ) 507 ( object of type ? target Gold Ingot ) 508 ( not ( inventory holding ? ingredientinv1 ? ingredient1 ) ) 509 ( inventory empty ? ingredientinv1 ) 510 ( not ( object of type ? ingredient1 Gold Ore ) ) 511 ( object of type ? ingredient1 Hypothetical ) 512 ( not ( inventory holding ? ingredientinv2 ? ingredient2 ) ) 513 ( inventory empty ? ingredientinv2 ) 514 ( not ( object of type ? ingredient2 Coal ) ) 515 ( object of type ? ingredient2 Hypothetical ) 518 ( : action c r a f t s t i c k 0 519 : parameters (? t t i l e ? s t a t i o n object ? ingredientinv1 inventory ? ingredient1 object ? t a r g e t i n v inventory ? target object ) 521 : precondition ( and 522 ( agent at ? t ) 523 ( object at ? s t a t i o n ? t ) 524 ( object of type ? s t a t i o n Work Station ) 525 ( inventory holding ? ingredientinv1 ? ingredient1 ) 526 ( object of type ? ingredient1 Wood Plank ) 527 ( inventory empty ? t a r g e t i n v ) 528 ( object of type ? target Hypothetical ) 530 : e f f e c t ( and 531 ( not ( inventory empty ? t a r g e t i n v ) ) 532 ( inventory holding ? t a r g e t i n v ? target ) 533 ( not ( object of type ? target Hypothetical ) ) 534 ( object of type ? target Stick ) 535 ( not ( inventory holding ? ingredientinv1 ? ingredient1 ) ) 536 ( inventory empty ? 
ingredientinv1 ) 537 ( not ( object of type ? ingredient1 Wood Plank ) ) 538 ( object of type ? ingredient1 Hypothetical ) 541 ( : action craft sword 0 542 : parameters (? t t i l e ? s t a t i o n object ? ingredientinv1 inventory ? ingredient1 object ? ingredientinv2 inventory ? ingredient2 object ? t a r g e t i n v inventory ? target object ) 544 : precondition ( and 545 ( agent at ? t ) 546 ( object at ? s t a t i o n ? t ) 547 ( object of type ? s t a t i o n Work Station ) 548 ( inventory holding ? ingredientinv1 ? ingredient1 ) 549 ( object of type ? ingredient1 Stick ) 550 ( inventory holding ? ingredientinv2 ? ingredient2 ) 551 ( object of type ? ingredient2 Iron Ingot ) 552 ( inventory empty ? t a r g e t i n v ) 553 ( object of type ? target Hypothetical ) 555 : e f f e c t ( and 556 ( not ( inventory empty ? t a r g e t i n v ) ) 557 ( inventory holding ? t a r g e t i n v ? target ) 558 ( not ( object of type ? target Hypothetical ) ) 559 ( object of type ? target Sword ) 560 ( not ( inventory holding ? ingredientinv1 ? ingredient1 ) ) 561 ( inventory empty ? ingredientinv1 ) 562 ( not ( object of type ? ingredient1 Stick ) ) Published as a conference paper at ICLR 2024 563 ( object of type ? ingredient1 Hypothetical ) 564 ( not ( inventory holding ? ingredientinv2 ? ingredient2 ) ) 565 ( inventory empty ? ingredientinv2 ) 566 ( not ( object of type ? ingredient2 Iron Ingot ) ) 567 ( object of type ? ingredient2 Hypothetical ) 570 ( : action craft bed 1 571 : parameters (? t t i l e ? s t a t i o n object ? ingredientinv1 inventory ? ingredient1 object ? ingredientinv2 inventory ? ingredient2 object ? t a r g e t i n v inventory ? target object ) 573 : precondition ( and 574 ( agent at ? t ) 575 ( object at ? s t a t i o n ? t ) 576 ( object of type ? s t a t i o n Work Station ) 577 ( inventory holding ? ingredientinv1 ? ingredient1 ) 578 ( object of type ? ingredient1 Wood Plank ) 579 ( inventory holding ? ingredientinv2 ? ingredient2 ) 580 ( object of type ? ingredient2 Wool ) 581 ( inventory empty ? t a r g e t i n v ) 582 ( object of type ? target Hypothetical ) 584 : e f f e c t ( and 585 ( not ( inventory empty ? t a r g e t i n v ) ) 586 ( inventory holding ? t a r g e t i n v ? target ) 587 ( not ( object of type ? target Hypothetical ) ) 588 ( object of type ? target Bed) 589 ( not ( inventory holding ? ingredientinv1 ? ingredient1 ) ) 590 ( inventory empty ? ingredientinv1 ) 591 ( not ( object of type ? ingredient1 Wood Plank ) ) 592 ( object of type ? ingredient1 Hypothetical ) 593 ( not ( inventory holding ? ingredientinv2 ? ingredient2 ) ) 594 ( inventory empty ? ingredientinv2 ) 595 ( not ( object of type ? ingredient2 Wool ) ) 596 ( object of type ? ingredient2 Hypothetical ) 599 ( : action c r a f t i r o n i n g o t 2 600 : parameters (? t t i l e ? s t a t i o n object ? ingredientinv1 inventory ? ingredient1 object ? ingredientinv2 inventory ? ingredient2 object ? t a r g e t i n v inventory ? target object ) 602 : precondition ( and 603 ( agent at ? t ) 604 ( object at ? s t a t i o n ? t ) 605 ( object of type ? s t a t i o n Work Station ) 606 ( inventory holding ? ingredientinv1 ? ingredient1 ) 607 ( object of type ? ingredient1 Iron Ore ) 608 ( inventory holding ? ingredientinv2 ? ingredient2 ) 609 ( object of type ? ingredient2 Coal ) 610 ( inventory empty ? t a r g e t i n v ) 611 ( object of type ? target Hypothetical ) 613 : e f f e c t ( and 614 ( not ( inventory empty ? t a r g e t i n v ) ) 615 ( inventory holding ? t a r g e t i n v ? 
target ) 616 ( not ( object of type ? target Hypothetical ) ) 617 ( object of type ? target Iron Ingot ) 618 ( not ( inventory holding ? ingredientinv1 ? ingredient1 ) ) 619 ( inventory empty ? ingredientinv1 ) 620 ( not ( object of type ? ingredient1 Iron Ore ) ) 621 ( object of type ? ingredient1 Hypothetical ) 622 ( not ( inventory holding ? ingredientinv2 ? ingredient2 ) ) 623 ( inventory empty ? ingredientinv2 ) Published as a conference paper at ICLR 2024 624 ( not ( object of type ? ingredient2 Coal ) ) 625 ( object of type ? ingredient2 Hypothetical ) Learned Operator Libraries on ALFRED The following shows the full PDDL domain definition including the initial provided vocabulary of symbolic environment constants and predicates, initial pick and place operators, and all ensuing learned operators. 1 ( define ( domain a l f r e d ) 2 ( : requirements : adl 4 ( : types 5 agent loc at io n receptacle object rtype otype 7 ( : constants 8 Candle Type otype 9 Shower Glass Type otype 10 CDType otype 11 Tomato Type otype 12 Mirror Type otype 13 Scrub Brush Type otype 14 Mug Type otype 15 Toaster Type otype 16 Painting Type otype 17 Cell Phone Type otype 18 Ladle Type otype 19 Bread Type otype 20 Pot Type otype 21 Book Type otype 22 Tennis Racket Type otype 23 Butter Knife Type otype 24 Shower Door Type otype 25 Key Chain Type otype 26 Baseball Bat Type otype 27 Egg Type otype 28 Pen Type otype 29 Fork Type otype 30 Vase Type otype 31 Cloth Type otype 32 Window Type otype 33 Pencil Type otype 34 Statue Type otype 35 Light Switch Type otype 36 Watch Type otype 37 Spatula Type otype 38 Paper Towel Roll Type otype 39 Floor Lamp Type otype 40 Kettle Type otype 41 Soap Bottle Type otype 42 Boots Type otype 43 Towel Type otype 44 Pillow Type otype 45 Alarm Clock Type otype 46 Potato Type otype 47 Chair Type otype 48 Plunger Type otype 49 Spray Bottle Type otype 50 Hand Towel Type otype 51 Bathtub Type otype 52 Remote Control Type otype 53 Pepper Shaker Type otype 54 Plate Type otype Published as a conference paper at ICLR 2024 55 Basket Ball Type otype 56 Desk Lamp Type otype 57 Footstool Type otype 58 Glassbottle Type otype 59 Paper Towel Type otype 60 Credit Card Type otype 61 Pan Type otype 62 Toilet Paper Type otype 63 Salt Shaker Type otype 64 Poster Type otype 65 Toilet Paper Roll Type otype 66 Lettuce Type otype 67 Wine Bottle Type otype 68 Knife Type otype 69 Laundry Hamper Lid Type otype 70 Spoon Type otype 71 Tissue Box Type otype 72 Bowl Type otype 73 Box Type otype 74 Soap Bar Type otype 75 House Plant Type otype 76 Newspaper Type otype 77 Cup Type otype 78 Dish Sponge Type otype 79 Laptop Type otype 80 Television Type otype 81 Stove Knob Type otype 82 Curtains Type otype 83 Blinds Type otype 84 Teddy Bear Type otype 85 Apple Type otype 86 Watering Can Type otype 87 Sink Type otype 89 Arm Chair Type rtype 90 Bed Type rtype 91 Bathtub Basin Type rtype 92 Dresser Type rtype 93 Safe Type rtype 94 Dining Table Type rtype 95 Sofa Type rtype 96 Hand Towel Holder Type rtype 97 Stove Burner Type rtype 98 Cart Type rtype 99 Desk Type rtype 100 Coffee Machine Type rtype 101 Microwave Type rtype 102 Toilet Type rtype 103 Counter Top Type rtype 104 Garbage Can Type rtype 105 Coffee Table Type rtype 106 Cabinet Type rtype 107 Sink Basin Type rtype 108 Ottoman Type rtype 109 Toilet Paper Hanger Type rtype 110 Towel Holder Type rtype 111 Fridge Type rtype 112 Drawer Type rtype 113 Side Table Type rtype 114 Shelf Type rtype 115 Laundry Hamper Type rtype 118 ; ; Predicates defined on t h i s domain . 
Note the types f o r each predicate . Published as a conference paper at ICLR 2024 119 ( : predicates 120 ( at Location ?a agent ? l l o c a t i o n ) 121 ( receptacle At Location ? r receptacle ? l l o c a t i o n ) 122 ( object At Location ?o object ? l l o c a t i o n ) 123 ( in Receptacle ?o object ? r receptacle ) 124 ( receptacle Type ? r receptacle ? t rtype ) 125 ( object Type ?o object ? t otype ) 126 ( holds ?a agent ?o object ) 127 ( holds Any ?a agent ) 128 ( holds Any Receptacle Object ?a agent ) 130 ( openable ? r receptacle ) 131 ( opened ? r receptacle ) 132 ( is Clean ?o object ) 133 ( cleanable ?o object ) 134 ( is Hot ?o object ) 135 ( heatable ?o object ) 136 ( is Cool ?o object ) 137 ( coolable ?o object ) 138 ( toggleable ?o object ) 139 ( is Toggled ?o object ) 140 ( s l i c e a b l e ?o object ) 141 ( i s S li ce d ?o object ) 143 ( : action Pickup Object Not In Receptacle 144 : parameters (?a agent ? l l o c a t i o n ?o object ) 145 : precondition ( and 146 ( at Location ?a ? l ) 147 ( object At Location ?o ? l ) 148 ( not ( holds Any ?a ) ) 149 ( f o r a l l 150 (? re receptacle ) 151 ( not ( in Receptacle ?o ?re ) ) 154 : e f f e c t ( and 155 ( not ( object At Location ?o ? l ) ) 156 ( holds ?a ?o ) 157 ( holds Any ?a ) 161 ( : action Put Object In Receptacle 162 : parameters (?a agent ? l l o c a t i o n ?ot otype ?o object ? r receptacle ) 163 : precondition ( and 164 ( at Location ?a ? l ) 165 ( receptacle At Location ? r ? l ) 166 ( object Type ?o ?ot ) 167 ( holds ?a ?o ) 168 ( not ( holds Any Receptacle Object ?a ) ) 170 : e f f e c t ( and 171 ( in Receptacle ?o ? r ) 172 ( not ( holds ?a ?o ) ) 173 ( not ( holds Any ?a ) ) 174 ( object At Location ?o ? l ) 178 ( : action Pickup Object In Receptacle 179 : parameters (?a agent ? l l o c a t i o n ?o object ? r receptacle ) 180 : precondition ( and 181 ( at Location ?a ? l ) Published as a conference paper at ICLR 2024 182 ( object At Location ?o ? l ) 183 ( in Receptacle ?o ? r ) 184 ( not ( holds Any ?a ) ) 186 : e f f e c t ( and 187 ( not ( object At Location ?o ? l ) ) 188 ( not ( in Receptacle ?o ? r ) ) 189 ( holds ?a ?o ) 190 ( holds Any ?a ) 194 ( : action Rinse Object 2 195 : parameters (? toolreceptacle receptacle ?a agent ? l lo ca ti on ?o object ) 197 : precondition ( and 198 ( receptacle Type ? toolreceptacle Sink Basin Type ) 199 ( at Location ?a ? l ) 200 ( receptacle At Location ? toolreceptacle ? l ) 201 ( object At Location ?o ? l ) 202 ( cleanable ?o ) 204 : e f f e c t ( and 205 ( is Clean ?o ) 209 ( : action Turn On Object 2 210 : parameters (?a agent ? l l o c a t i o n ?o object ) 212 : precondition ( and 213 ( at Location ?a ? l ) 214 ( object At Location ?o ? l ) 215 ( toggleable ?o ) 217 : e f f e c t ( and 218 ( is Toggled ?o ) 222 ( : action Cool Object 0 223 : parameters (? toolreceptacle receptacle ?a agent ? l lo ca ti on ?o object ) 225 : precondition ( and 226 ( receptacle Type ? toolreceptacle Fridge Type ) 227 ( at Location ?a ? l ) 228 ( receptacle At Location ? toolreceptacle ? l ) 229 ( holds ?a ?o ) 231 : e f f e c t ( and 232 ( is Cool ?o ) 235 ( : action Slice Object 1 236 : parameters (? t o o l o b j e c t object ?a agent ? l l o c a t i o n ?o object ) 238 : precondition ( and 239 ( object Type ? t o o l o b j e c t Butter Knife Type ) 240 ( at Location ?a ? l ) 241 ( object At Location ?o ? l ) 242 ( s l i c e a b l e ?o ) 243 ( holds ?a ? 
t o o l o b j e c t ) Published as a conference paper at ICLR 2024 245 : e f f e c t ( and 246 ( i s S li ce d ?o ) 249 ( : action Slice Object 0 250 : parameters (? t o o l o b j e c t object ?a agent ? l l o c a t i o n ?o object ) 252 : precondition ( and 253 ( object Type ? t o o l o b j e c t Knife Type ) 254 ( at Location ?a ? l ) 255 ( object At Location ?o ? l ) 256 ( s l i c e a b l e ?o ) 257 ( holds ?a ? t o o l o b j e c t ) 259 : e f f e c t ( and 260 ( i s S li ce d ?o ) 263 ( : action Microwave Object 0 264 : parameters (? toolreceptacle receptacle ?a agent ? l lo ca ti on ?o object ) 266 : precondition ( and 267 ( receptacle Type ? toolreceptacle Microwave Type ) 268 ( at Location ?a ? l ) 269 ( receptacle At Location ? toolreceptacle ? l ) 270 ( holds ?a ?o ) 272 : e f f e c t ( and 273 ( is Hot ?o )