Researchers are tackling the persistent challenge of scalable long-horizon planning for robots operating in complex, real-world households. Zhihong Liu (Shanghai Innovation Institute and Xi’an Jiao Tong University), Yang Li, Rengming Huang, Cewu Lu, and Panpan Cai (Shanghai Innovation Institute and Shanghai Jiao Tong University) present a novel household task planner called Any House Any Task (AHAT). The work is significant because it addresses the limitations of current methods, which struggle with larger environments, lengthy plans, and unclear instructions. AHAT employs a large language model to translate ambiguous human instructions and scene descriptions into actionable subgoals, then utilises symbolic reasoning to create effective long-horizon plans, enhanced by a new reinforcement learning algorithm, TGPO, that refines the reasoning process. Through rigorous experimentation, the team demonstrates that AHAT substantially outperforms existing approaches on complex, human-style household tasks.
The work targets the scalability issues that arise when Large Language Models (LLMs) are applied to complex planning tasks: performance frequently diminishes as environment size, plan length, instruction ambiguity, and constraint complexity increase. AHAT’s central component is an LLM trained to translate task instructions and textual scene graphs into grounded subgoals expressed in the Planning Domain Definition Language (PDDL). These subgoals are then resolved through explicit symbolic reasoning to produce feasible and optimal long-horizon plans. The system’s ability to maintain performance under these conditions is a key indicator of its potential for real-world use in dynamic and unpredictable environments, and this robustness stems from directly optimising the model, via reinforcement learning, to improve the solvability of its subgoals.

Across the benchmark household task-planning environments Behaviour-1K and PARTNR, AHAT consistently outperformed the strongest baselines, including general-purpose large language models such as GPT-5 and Gemini-3, prompting-based planners such as SayPlan and DELTA, and learning-based methods including SFT, GRPO, and Reinforce++. AHAT was notably strong on challenging tasks with highly abstract instructions, where baseline methods degraded substantially, and the gains were especially evident as constraint complexity, plan length, and task ambiguity increased, suggesting a scalability advantage over existing approaches.

A second contribution is TGPO, a novel reinforcement learning algorithm that integrates external correction of intermediate reasoning traces into Group Relative Policy Optimisation (GRPO) to strengthen the model’s ability to decompose complex and ambiguous intentions. This allows the planning process to be refined iteratively, leading to more accurate and efficient task execution. Experiments demonstrate that AHAT achieves significant performance gains over state-of-the-art prompting, planning, and learning methods, particularly on human-style household tasks characterised by brief instructions that nonetheless require complex execution plans.

Long-horizon task planning in large household environments remains an open challenge for robotics: it requires reasoning over diverse skills with heterogeneous constraints, identifying relevant resources from large scene contexts, and, critically, interpreting abstract human instructions that concisely describe goals without explicitly revealing the underlying intentions or subtasks. Prior work, such as SayPlan and DELTA, has studied related settings but focuses primarily on concrete task instructions that explicitly specify subtasks and desired behaviours, and operates over a small action space with fewer than ten actions. As instruction ambiguity increases and the action space expands, the performance of these approaches degrades rapidly, limiting their applicability to realistic household tasks.
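Concretely, the division of labour described above can be pictured as a two-stage loop: the LLM grounds the instruction and textual scene graph into PDDL goal expressions, and a symbolic planner solves each goal in turn. The sketch below is a minimal illustration of that hand-off; the `query_llm` and `solve_pddl` helpers are hypothetical stand-ins, not the authors’ actual interfaces.

```python
# Minimal sketch of the LLM-to-symbolic-planner hand-off (illustrative only).
# `query_llm` and `solve_pddl` are hypothetical stand-ins for a language-model
# call and an off-the-shelf PDDL planner, not the paper's actual API.
from typing import Callable, List

def plan_household_task(
    instruction: str,
    scene_graph_text: str,
    query_llm: Callable[[str], List[str]],
    solve_pddl: Callable[[str], List[str]],
) -> List[str]:
    """Ground an abstract instruction into PDDL subgoals, then solve each in order."""
    prompt = (
        "Scene graph:\n" + scene_graph_text + "\n"
        "Task: " + instruction + "\n"
        "Decompose the task into subtasks and output one PDDL goal per subtask."
    )
    subgoals = query_llm(prompt)            # e.g. ["(and (inside plate_1 sink_1))", ...]
    plan: List[str] = []
    for goal in subgoals:                   # constraint satisfaction and optimal plan
        plan.extend(solve_pddl(goal))       # synthesis are delegated to the planner
    return plan

# Toy usage with stubbed components.
demo = plan_household_task(
    "tidy up the kitchen",
    "(kitchen (table_2 (plate_1)) (sink_1))",
    query_llm=lambda p: ["(and (inside plate_1 sink_1))"],
    solve_pddl=lambda g: ["pick(plate_1, table_2)", "place(plate_1, sink_1)"],
)
print(demo)
```

The point of the split is that the language model only needs to produce a small number of symbolic goals, while the combinatorial work of sequencing low-level actions under constraints is handled by the planner.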
To address these challenges of ambiguity and scale, AHAT is built around two ideas: subgoal-based planning and learning with reasoning-trace revision. Rather than predicting low-level action sequences end to end, AHAT predicts a sequence of task subgoals, each formulated as a PDDL goal and solved by an off-the-shelf PDDL planner. The prediction is augmented with chain-of-thought reasoning: the model first decomposes an abstract task into subtasks and then predicts a corresponding PDDL goal for each subtask. This design lets the LLM focus on intention understanding, task decomposition, and grounding subtasks into formal representations, while delegating constraint satisfaction and optimal plan synthesis to the symbolic planner.

Standard reinforcement learning often struggles with abstract human tasks, where success criteria are implicit and difficult to specify explicitly. The researchers therefore bootstrap reinforcement learning via external correction of intermediate reasoning traces: given a failed plan generated by the policy, an auxiliary model revises the reasoning trace of the task decomposition while leaving subgoal generation to the policy. The corrected trace is integrated into learning through token-level constrained sampling, which forces the tokens corresponding to the revised decomposition trace during policy rollout, whereas the subgoals are generated autoregressively by the policy. Rewards are evaluated on the resulting subgoal plan, and policy gradients are derived accordingly.

While PDDL-based LLM planning and reinforcement learning for large language models have each been studied extensively in isolation, this work integrates them, training a model specialised for PDDL-based subgoal planning with reinforcement learning and applying it to abstract, long-horizon household tasks. To realise this integration, the framework is instantiated on top of GRPO, resulting in TGPO, which improves credit assignment by allowing policy gradients to be estimated under corrected intermediate reasoning while preserving on-policy optimisation of the subgoals.
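The sketch below illustrates the two mechanics just described: forcing the corrected decomposition tokens during rollout while letting the policy continue autoregressively, and computing GRPO-style group-relative advantages from plan-level rewards. The tokenisation, the sampling stub, and the placeholder rewards are assumptions made for illustration; this is not the authors’ TGPO implementation.

```python
# Illustrative sketch (not the authors' TGPO code): constrained rollouts that
# force a corrected reasoning-trace prefix, plus GRPO-style group-relative
# advantages computed from plan-level rewards.
import random
import statistics
from typing import Callable, List, Sequence

def constrained_rollout(
    corrected_trace: List[int],
    sample_next_token: Callable[[Sequence[int]], int],
    eos_token: int,
    max_len: int = 64,
) -> List[int]:
    """Force the revised decomposition tokens, then let the policy generate the subgoals."""
    tokens = list(corrected_trace)              # forced prefix (external correction)
    while len(tokens) < max_len:
        nxt = sample_next_token(tokens)         # subgoal tokens stay on-policy
        tokens.append(nxt)
        if nxt == eos_token:
            break
    return tokens

def group_relative_advantages(rewards: Sequence[float]) -> List[float]:
    """GRPO-style advantages: normalise each rollout's reward within its group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards) or 1.0     # guard against a zero-variance group
    return [(r - mean) / std for r in rewards]

# Toy usage: in practice the rewards would come from checking whether the
# resulting PDDL subgoal plan is solvable and satisfies the task.
rollouts = [constrained_rollout([1, 2, 3], lambda t: random.randint(4, 9), eos_token=9)
            for _ in range(4)]
rewards = [1.0, 0.0, 1.0, 0.0]                  # placeholder plan-level rewards
print(rollouts[0], group_relative_advantages(rewards))
```

Under the description above, the advantages would weight only the policy-generated subgoal tokens, since the forced trace tokens are supplied externally rather than sampled from the policy.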
To support training, a large-scale dataset of 50k long-horizon household planning tasks is constructed. Realistic and diverse user instructions are generated by prompting large language models with combinations of 308 household scene graphs, derived from HSSD and Gibson, and 1.6k user personas that vary in age, occupation, and cultural background. The diversity of the dataset supports generalisable model learning.

AHAT is evaluated on a diverse suite of household planning tasks, including in-distribution settings with held-out scenes and tasks from the training set, out-of-distribution human tasks collected via crowd-sourcing, and two public household task-planning benchmarks, Behaviour-1K and PARTNR. Across all settings, AHAT consistently outperforms the strongest baselines, including general-purpose large language models (e.g. GPT-5 and Gemini-3), prompting-based planners (e.g. SayPlan and DELTA), and learning-based methods such as SFT, GRPO, and Reinforce++. The results show that AHAT scales robustly with increasing constraint complexity, plan length, and task ambiguity.

Symbolic planning casts task planning as a search for a sequence of discrete actions under explicit logical constraints. A standard formalism is PDDL, which specifies a planning domain and a problem in a structured, interpretable way. A PDDL planning problem can be written as P = ⟨D, I, G⟩, where the domain D defines the symbolic vocabulary (predicates and operator schemas), while the problem instantiates the domain with a concrete object set and provides an initial state I and a goal condition G. A symbolic state is a set of grounded predicates over the objects. Each grounded action a is applicable in state s if its preconditions Pre(a) hold, i.e. Pre(a) ⊆ s; executing a transitions the state by applying its effects Eff(a). A PDDL planner (symbolic planner) seeks an action sequence A = (a1, …, aN) such that, starting from I, the resulting state satisfies G. PDDL enables precise feasibility checking and plan synthesis under complex constraints, but its search complexity grows rapidly with the number of objects and grounded action instances.
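As a toy illustration of these semantics, the self-contained sketch below represents states as sets of grounded predicates, checks applicability as Pre(a) ⊆ s, applies add/delete effects, and searches for a plan by breadth-first search. The tiny pick-and-place domain is invented for illustration and is not taken from the paper.

```python
# Toy illustration of PDDL-style semantics: states as sets of grounded
# predicates, applicability as Pre(a) ⊆ s, and add/delete effects.
from collections import deque
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    name: str
    pre: frozenset      # preconditions: grounded predicates that must hold
    add: frozenset      # effects: predicates added to the state
    delete: frozenset   # effects: predicates removed from the state

def applicable(action, state):
    return action.pre <= state          # Pre(a) ⊆ s

def apply_effects(action, state):
    return (state - action.delete) | action.add

def plan(initial, goal, actions):
    """Breadth-first search for an action sequence reaching a state with goal ⊆ state."""
    start = frozenset(initial)
    frontier = deque([(start, [])])
    visited = {start}
    while frontier:
        state, seq = frontier.popleft()
        if goal <= state:
            return seq
        for a in actions:
            if applicable(a, state):
                nxt = frozenset(apply_effects(a, state))
                if nxt not in visited:
                    visited.add(nxt)
                    frontier.append((nxt, seq + [a.name]))
    return None

# Invented toy problem: move a cup from the table to the sink.
actions = [
    Action("pick(cup,table)", frozenset({"on(cup,table)", "handempty"}),
           frozenset({"holding(cup)"}), frozenset({"on(cup,table)", "handempty"})),
    Action("place(cup,sink)", frozenset({"holding(cup)"}),
           frozenset({"on(cup,sink)", "handempty"}), frozenset({"holding(cup)"})),
]
print(plan({"on(cup,table)", "handempty"}, frozenset({"on(cup,sink)"}), actions))
# -> ['pick(cup,table)', 'place(cup,sink)']
```

Real PDDL planners replace the blind search above with heuristic search, but the state-transition semantics are the same, which is why search cost grows quickly with the number of objects and grounded actions.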
Recent work increasingly employs LLMs for household task planning; approaches can be broadly categorised into prompting-based and training-based planners. Prompting-based planning encodes the environment into structured representations (e.g. scene graphs) and queries an LLM to generate plans. SayPlan reduces the effective planning horizon via scene-graph representations and iterative replanning. Subsequent work improves long-horizon reasoning by augmenting state and memory, including maintaining historical actions and related objects, retrieval-augmented memory, and in-context task demonstrations for few-shot grounded planning. Other works explore alternative mechanisms for guiding plan generation, including perception, interaction, and interface constraints. Although LLMs exhibit strong commonsense knowledge, prompting-based pipelines typically incur high latency due to repeated LLM queries, and their performance is upper-bounded because general-purpose LLMs have not been trained for robot task planning.

Training-based planners improve household task planning through supervision and reinforcement learning. LLM-Personalize aligns planning with human preferences via demonstrations and reinforced self-training, while Embodied-Reasoner and EmboMatrix adopt staged pipelines that combine supervised learning and reinforcement learning. Other lines of work extract executable plans through demonstration-conditioned translation, guide generation with grounded models to ensure realisable actions, or learn affordance-centric value functions linking language to physical feasibility. While these end-to-end, training-based approaches can improve plan feasibility to some extent, their performance degrades significantly as operational constraints become more complex or planning horizons increase.

Recent work also combines LLMs and symbolic planners into hybrid LLM-PDDL pipelines for task planning. A common paradigm fixes the PDDL domain and prompts the LLM to translate natural-language instructions into PDDL problems or goals, which are then solved by a PDDL planner. To improve the correctness of LLM-generated PDDL files, several works adopt iterative refinement: PDDLEGO incrementally constructs and revises PDDL using interaction feedback to enable adaptive planning under partial observability, while ISR-LLM, Unidomain, and NL2Plan similarly rely on repeated LLM-driven verification and self-correction to improve symbolic feasibility and robustness, and BoN-iVML uses Best-of-N sampling to improve correctness. Beyond LLM-side refinement, other efforts incorporate human or environment feedback to improve the correctness of the generated PDDL files. To reduce the complexity of solving a task in a single pass, DELTA employs an LLM to decompose long-horizon household tasks into subtasks, which are then solved by a symbolic planner. Overall, these methods largely hinge on a general-purpose LLM’s capability to reliably generate solvable symbolic specifications; consequently, their performance typically degrades as environment size and operational complexity increase.

In contrast, AHAT trains the model via reinforcement learning to predict PDDL subgoals, directly optimising for subgoal solvability, and remains robust as environment size and constraint complexity scale. AHAT is a large language model trained to perform long-horizon task planning in large household environments from a textual scene-graph representation of the environment and an abstract natural-language task instruction. Given a scene and a task, AHAT predicts a sequence of subgoals expressed in PDDL, which are subsequently solved in sequence by off-the-shelf PDDL planners.

The persistent challenge of creating genuinely helpful robots in our homes isn’t about building better grippers or cameras, but about bridging the gap between vague human requests and the millions of tiny, precise actions needed to fulfil them. For years, roboticists have relied on painstakingly hand-coded instructions, a process that’s both brittle and unable to cope with the inherent messiness of real-world environments. Recent enthusiasm for LLMs offered a potential shortcut, promising robots that could simply understand what we want. That promise, however, has repeatedly run into scalability issues: the more complex the task, or the larger the house, the less reliable these systems become.

This work addresses that critical limitation with AHAT, which cleverly combines the intuitive reasoning of LLMs with the precision of symbolic planning. By translating ambiguous instructions into concrete subgoals, and then using established planning algorithms to achieve them, AHAT demonstrates a significant leap in performance, particularly on the kind of everyday household tasks where brevity and flexibility are paramount. The integration of reinforcement learning to refine the LLM’s initial reasoning is a particularly clever touch, allowing the system to learn from its mistakes and improve over time.

However, the reliance on pre-defined planning domains remains a constraint: while effective within those domains, adapting AHAT to entirely new tasks or environments will still require significant effort. Furthermore, the system’s ability to handle genuinely novel situations, such as an unexpected object or a blocked pathway, remains an open question. The next step isn’t simply to scale up the complexity of the plans, but to build systems that can truly reason about the world, anticipating problems and adapting to change with the same common sense that humans take for granted. The convergence of LLMs and symbolic AI is clearly a promising path, but genuine household autonomy still requires a deeper understanding of intelligence itself.
👉 More information
🗞 Any House Any Task: Scalable Long-Horizon Planning for Abstract Human Tasks
🧠 ArXiv: https://arxiv.org/abs/2602.12244
