# Symmetry-Aware Robot Design with Structured Subgroups

Heng Dong¹ Junyu Zhang² Tonghan Wang³ Chongjie Zhang¹

¹Institute for Interdisciplinary Information Sciences, Tsinghua University. ²Huazhong University of Science and Technology. ³Harvard University. Correspondence to: Heng Dong, Chongjie Zhang. Proceedings of the 40th International Conference on Machine Learning, Honolulu, Hawaii, USA. PMLR 202, 2023. Copyright 2023 by the author(s).

**Abstract.** Robot design aims at learning to create robots that can be easily controlled and perform tasks efficiently. Previous works on robot design have demonstrated the ability to generate robots for various tasks, but they search for robots directly in the vast design space and ignore common structures, resulting in abnormal robots and poor performance. To tackle this problem, we propose a Symmetry-Aware Robot Design (SARD) framework that exploits the structure of the design space by incorporating symmetry searching into the robot design process. Specifically, we represent symmetries with the subgroups of the dihedral group and search for the optimal symmetry over these structured subgroups; robots are then designed under the searched symmetry. In this way, SARD can design efficient symmetric robots while still covering the original design space, which we analyze theoretically. We further evaluate SARD empirically on various tasks, and the results show its superior efficiency and generalizability.

## 1. Introduction

Humans have been dreaming of creating creatures with morphological intelligence for decades (Sims, 1994a;b; Yuan et al., 2021; Gupta et al., 2021b). A promising solution to this challenging problem is to generate robots with various functionalities in simulated environments (Wang et al., 2019; Yuan et al., 2021), in which robots' functionalities are largely determined by their designs and control policies. Learning control policies for handcrafted robots with fixed designs has been extensively studied in previous works (Schulman et al., 2017; Fujimoto et al., 2018; Huang et al., 2020; Dong et al., 2022). However, as the other critical component, the design of robots has attracted scant attention and achieved limited success in the literature.

Figure 1. Time-lapse images of robots with different symmetries performing various tasks: (a, c) running forward; (b, d) reaching random goals. Robots designed by prior work (Yuan et al., 2021) do not satisfy any non-trivial symmetry and might be hard to control: (a) the robot deviated from the intended direction; (b) the robot missed the goal. Different tasks may require different symmetries: (c) bilateral symmetry is suitable for tasks involving only running forward; (d) radial symmetry for reaching random goals.

The field of automatic robot design aims at searching for optimal robot morphologies that can be easily controlled and perform various tasks efficiently. This problem has been a long-lasting challenge, mainly for two reasons: 1) the design space, including skeletal structures and attributes of joints and limbs, is large and combinatorial, and 2) evaluating each design requires training and testing an optimal control policy, which is often computationally expensive. For automatic robot design, prior works (Gupta et al., 2021b; Wang et al., 2019) typically adopt evolutionary search (ES) algorithms, where robots are sampled from a large population and learn to perform tasks independently during an iteration. At the end of each iteration, the worst-performing robots are eliminated, and the surviving robots produce child robots by random mutation to maintain the population. Recently, Yuan et al.
(2021) discussed the low sample efficiency of ES-based methods: robots in the population do not share their training experiences, and zeroth-order optimization methods such as ES are sample-inefficient in high-dimensional search spaces (Vemula et al., 2019). They used reinforcement learning (RL) to sample and optimize robot designs by incorporating the design process into the control process and sharing the design and control policies across all robots. Despite this progress, robots designed by these approaches are intuitively abnormal, empirically hard to control, and ultimately perform poorly. Examples are provided in the time-lapse images in Figure 1 (a-b), in which robots designed by prior work (Yuan et al., 2021) perform poorly on different tasks. We hypothesize that the underperformance is attributable to the fact that most prior works search for robots directly in the whole vast design space without exploiting useful structures that could largely reduce it. To verify this hypothesis, in this paper, we explore utilizing symmetry as the key characteristic to unveil the structure of the design space and thereby reduce learning complexity. Symmetry is a structure commonly observed in biological organisms (Savriama & Klingenberg, 2011), e.g., bilateral symmetry in flies (Evans & Bashaw, 2010), radial symmetry in jellyfish (Abrams et al., 2015), and spherical symmetry in bacteria (Shao et al., 2017). From the learning perspective, symmetry-aware robot design has two advantages. First, it requires searching over far fewer robot designs: if one design turns out to be unsuitable for the current task, other designs with the same symmetry can be searched less frequently, as they are likely to be morphologically and functionally similar. Second, symmetric designs can reduce the degree of control required to learn balancing (Raibert, 1986c;a), as in Figure 1 (a, c).
Prior works noticed the benefits of symmetry (Gupta et al., 2021b; Wang et al., 2019), but considered only bilateral symmetry. Other tasks may require different symmetries; for example, tasks that involve running in different directions require radial symmetry (Figure 1 (d)). However, none of the previous works explored learning suitable robot symmetries for different tasks. In this paper, we introduce a novel Symmetry-Aware Robot Design (SARD) framework. Realizing this framework involves two major challenges. The first challenge is how to represent symmetries and how to find the optimal one. To cover a wide range of symmetries while avoiding extra learning complexity, we propose to use the subgroups of the dihedral group (Gallian, 2021) to represent symmetries. Each subgroup represents a kind of symmetry and a symmetric space; the trivial subgroup, containing only the identity element, corresponds exactly to the original design space. To find the optimal symmetry efficiently, we exploit the group structure and adopt a simple local search over the structured subgroups, changing symmetry types smoothly to alleviate the gradient conflict problem (Liu et al., 2021). The second challenge is how to design robots that satisfy a given symmetry. We propose a novel plug-and-play symmetry transformation module that maps any robot design into a given symmetric space. We also provide theoretical analysis verifying that the transformed robot designs lie in the given symmetric space and that this module can cover the whole space. We evaluate our SARD framework on six MuJoCo (Todorov et al., 2012) tasks adapted from Gupta et al. (2021b). SARD significantly outperforms previous state-of-the-art algorithms in terms of both sample efficiency and final performance. The performance comparison and the visualization of the symmetry learning process strongly support the effectiveness of our symmetry searching and transformation approaches.
Our experimental results highlight the importance of considering various symmetries in robot design.

## 2. Related Works

**Modular RL.** The robot design problem typically requires controlling robots with changing morphologies, whose state and action spaces are incompatible across robots. This issue cannot be tackled by the traditional monolithic policies used in single-agent RL, but modular RL, which uses a shared policy to control each actuator separately, holds the promise to solve it. Most prior works in this field represent the robot's morphology as a graph and use GNNs or message-passing networks as policies (Wang et al., 2018; Pathak et al., 2019; Huang et al., 2020). All of these GNN-like works show the benefits of modular policies over a monolithic policy in tasks involving different morphologies. Recent works also propose using Transformers (Vaswani et al., 2017) as policies to overcome the difficulty of message-passing in complex morphologies and further improve performance (Kurin et al., 2020; Gupta et al., 2021a; Dong et al., 2022). In this paper, we build our method on GNN-like policies for a fair comparison with the previous state-of-the-art baseline Transform2Act (Yuan et al., 2021), but note that our method is a plug-and-play module that can be applied to any modular RL policy.

**Robot Design.** The automatic robot design problem aims at searching for robots that can be easily controlled and can perform various tasks efficiently. One line of work in this field focuses only on designing the attributes of robots while ignoring skeletal structures (Ha et al., 2017; Yu et al., 2018; Ha, 2019), which limits the achievable morphologies. Another line of work considers both attribute design and skeleton design, known as combinatorial design optimization.
Previous works mainly utilize evolutionary search (ES) algorithms for combinatorial design optimization (Sims, 1994a; Nolfi & Floreano, 2000; Auerbach & Bongard, 2014; Cheney et al., 2018; Jelisavcic et al., 2019; Zhao et al., 2020), which require robots with different structures to perform a given task independently and do not share experiences between robots. This results in severe sample inefficiency and may require thousands of CPUs to finish one experiment (Gupta et al., 2021b). Recently, Yuan et al. (2021) discussed this issue and used RL to optimize robot designs by incorporating the design procedure into the decision-making process and formulating design optimization as learning a conditional policy. This method has shown great sample efficiency. However, none of the above studies explored the structure of the design space; they search for robots directly in the vast design space, which may end up with abnormal robots and poor performance. As for works that considered symmetry in robot design, NGE (Wang et al., 2019) only considered bilateral symmetry, while our work considers a wide range of symmetries for various tasks.

Figure 2. Symmetry-aware robot design framework. (a) Search for the optimal symmetry in structured subgroups; (b) design the skeletal structure of the robot; (c) design the attributes of the robot's joints and limbs. Joints of the same color are in the same orbit in (b-c).

**Symmetry in Real-World Creatures.**
Our method, which utilizes symmetry as the classification characteristic, shares a similar and intriguing principle with real-world creatures, which may have developed it millions of years ago according to fossil evidence (Evans et al., 2020). Certain symmetries are maintained by natural selection pressures, and deviation from perfect symmetry is negatively correlated with species fitness (Enquist & Arak, 1994). Raibert (1986b) showed that during a series of bouncing and ballistic motions, symmetry contributes to achieving more complicated running behaviors; e.g., reciprocating leg symmetry is essential for making a quadruped gallop, and symmetry of wings can help reduce energy expenditure in flight (Polak & Trivers, 1994). Our experiments in Section 5 reach similar conclusions: symmetry can help reduce control costs.

## 3. Preliminaries

In this section, we introduce the background knowledge and notation necessary to present our method.

**Problem Settings.** We aim to search for a robot design $D$ from a design space $\mathcal{D}$ that can finish different tasks efficiently. A design $D$ includes the robot's skeletal structure, limb-specific attributes, and joint-specific attributes (e.g., limb length, size, and motor strength). Formally, a robot design can be represented by a graph $D = (V, E, Z)$, where each node $v \in V$ represents a joint in the robot morphology and each edge $e = (v_i, v_j) \in E$ represents a limb connecting two joints $v_i, v_j$. Each $z \in Z$ contains the attributes of a joint and the limb attached to it, including scalar and vector values, and $|V| = |Z|$. The designed robot then learns control policies to finish tasks via reinforcement learning (RL). RL formulates a control problem as an infinite-horizon discounted Markov Decision Process (MDP) defined by a tuple $\mathcal{M} = (\mathcal{S}, \mathcal{A}, \mathcal{T}, \rho_0, R, \gamma)$, whose elements are the state set, action set, transition dynamics, initial state distribution, reward function, and discount factor, respectively.
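The design tuple $D = (V, E, Z)$ above can be sketched as a small data structure. This is a minimal sketch; the class and field names are ours, not from the SARD codebase:

```python
from dataclasses import dataclass

# A minimal sketch of the design tuple D = (V, E, Z) described above.
# The class and field names are illustrative, not the paper's code.
@dataclass
class RobotDesign:
    joints: list[int]                 # V: joint ids (graph nodes)
    limbs: list[tuple[int, int]]      # E: limbs connecting two joints (edges)
    attrs: dict[int, dict]            # Z: per-joint attributes (joint + attached limb)

# A 3-joint design: a torso joint with two child joints.
design = RobotDesign(
    joints=[0, 1, 2],
    limbs=[(0, 1), (0, 2)],
    attrs={
        0: {"limb_length": 0.2, "motor_strength": 150.0},
        1: {"limb_length": 0.4, "motor_strength": 100.0},
        2: {"limb_length": 0.4, "motor_strength": 100.0},
    },
)
assert len(design.joints) == len(design.attrs)  # the paper's |V| = |Z|
```

The final assertion mirrors the paper's constraint that every joint carries exactly one attribute entry.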
For a fixed robot design and task, the objective of RL is to learn a control policy $\pi_C$ that maximizes the expected total discounted reward $J(\pi_C) = \mathbb{E}_{\pi_C}\left[\sum_{t=0}^{\infty} \gamma^t R(s_t, a_t)\right]$. Given that the robot design changes during learning, we condition the original transition dynamics and reward function in $\mathcal{M}$ on the design $D$, yielding a more general transition dynamics $\mathcal{T}(s_{t+1} \mid s_t, a_t, D)$ and reward function $R(s_t, a_t, D)$. Hence the RL objective is also conditioned on $D$, and we optimize $J(\pi_C, D)$. The design optimization problem can then be naturally formulated as a bi-level optimization problem (Sinha et al., 2017; Colson et al., 2007):

$$D^* = \arg\max_{D \in \mathcal{D}} J(\pi_C^*, D) \quad \text{s.t.} \quad \pi_C^* = \arg\max_{\pi_C} J(\pi_C, D),$$

where the inner optimization problem is typically solved by RL but is especially computationally expensive for changing robot designs. The outer optimization problem can be solved by evolutionary algorithms (Wang et al., 2019; Gupta et al., 2021b; Sims, 1994a) or RL (Yuan et al., 2021). In this paper, we follow Transform2Act and use RL as the outer problem solver for its efficiency compared with evolutionary algorithms.

**Group Theory.** We use group theory to represent different types of symmetries. Here we briefly introduce some notation; please refer to Appendix A.2 for details. A group $G$ is a set with a binary operation satisfying four basic properties: associativity, closure, the existence of an identity ($e \in G$), and the existence of an inverse for each element ($g^{-1} \in G$ for all $g \in G$) (Gallian, 2021). If a subset $H$ of $G$ is also a group under the operation of $G$, we call it a subgroup of $G$ and write $H < G$. A group action of $G$ on some space $X$ is a function $G \times X \to X$ satisfying $ex = x$ and $(gh)x = g(hx)$ for all $g, h \in G$ and $x \in X$. For an element $g \in G$, we define a transformation function $\alpha_g^X : X \to X$ given by $x \mapsto gx$, which can be interpreted as the transformation of the point $x$ under the group element $g$.
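The group axioms above are easy to check numerically on a concrete finite group of 2×2 rotation and reflection matrices (the dihedral group, which the paper uses to represent symmetries). The sketch below is our illustrative code, not the paper's implementation:

```python
import itertools
import math

# A concrete group to check the axioms above: the 2x2 matrix representation
# of the dihedral group (n rotations by 2*pi*k/n and n reflections).
def rot(k, n):
    a = 2 * math.pi * k / n
    return ((math.cos(a), -math.sin(a)), (math.sin(a), math.cos(a)))

def refl(k, n):  # matrix of rho_k * pi, with pi = reflection across the x-axis
    a = 2 * math.pi * k / n
    return ((math.cos(a), math.sin(a)), (math.sin(a), -math.cos(a)))

def matmul(A, B):
    return tuple(tuple(sum(A[i][k] * B[k][j] for k in range(2))
                       for j in range(2)) for i in range(2))

def canon(A):  # round entries so float matrices can be hashed and compared
    return tuple(tuple(round(x, 6) + 0.0 for x in row) for row in A)

def dihedral(n):
    return {canon(rot(k, n)) for k in range(n)} | {canon(refl(k, n)) for k in range(n)}

G = dihedral(4)
assert len(G) == 8                         # |Dih_n| = 2n
E = canon(rot(0, 4))
assert E in G                              # identity element
for A, B in itertools.product(G, G):       # closure under composition
    assert canon(matmul(A, B)) in G
H = {E, canon(rot(2, 4))}                  # <rho_2>: a two-element subgroup
for A, B in itertools.product(H, H):
    assert canon(matmul(A, B)) in H
```

The last check illustrates the subgroup notion: the 180° rotation together with the identity is itself closed under composition, i.e., $\langle \rho_2 \rangle < \mathrm{Dih}_4$.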
For example, if $x$ is a robot, $g$ could be a rotation of the robot about the z-axis passing through its torso. The orbit of a point $x \in X$ is the set of all its transformations under $G$, denoted $O_G(x) = \{\alpha_g^X(x) \mid g \in G\}$. An important property is that $X$ can be partitioned into disjoint orbits: $X = \bigcup_{x} O_G(x)$.

**Dihedral Group.** The dihedral group is a finite discrete group containing rotation and reflection transformations. A dihedral group $\mathrm{Dih}_n$ ($n \ge 3$) is generated by a rotation transformation $\rho$ (counterclockwise rotation by $360°/n$) and a reflection transformation $\pi$ (reflection across the x-axis). Concretely, $\mathrm{Dih}_n = \{\rho_k, \pi_k \mid k = 0, 1, \ldots, n-1\}$, where $\rho_k = \rho^k$, $\rho_0 = \rho_n = e$, and $\pi_k = \rho_k \pi$. Each group element $g$ has multiple representations; in this paper we consider the permutation representation $P_g$ and the matrix representation $M_g$. Considering the designed robot in Figure 2(c) and taking $\pi_0 \in \mathrm{Dih}_4$ as an example, $P_{\pi_0}$ exchanges joints $v_2, v_4$ and $M_{\pi_0}$ reflects their coordinates across the x-axis. In this paper, we use the subgroups of the dihedral group to represent various symmetries. The subgroups of $\mathrm{Dih}_n$ come in three types: (1) $H_d = \langle \rho_d \rangle$, where $1 \le d \le n$ and $d \mid n$ ($d$ divides $n$); (2) $K_i = \langle \pi_i \rangle$, where $0 \le i \le n-1$; and (3) $H_{k,l} = \langle \rho_k, \pi_l \rangle$, where $k \mid n$ and $0 \le l \le k-1$.

… else No Change, and $a_v^{\mathrm{attr}} = z_v \in Z$. Thus the design is left unchanged.

### A.7. Neighbors of a Subgroup

For any $G' \in \mathrm{Neighbor}(G)$, we have $G < G'$ or $G' < G$; if $G < G'$ and $H$ satisfies $G < H < G'$, then $H = G$ or $H = G'$; if $G' < G$ and $H$ satisfies $G' < H < G$, then $H = G'$ or $H = G$.

### A.8. Derivation of $\Pi_{G,G',\beta}$

Here we prove that

$$\Pi_{G'}(c) = \beta_0 \Pi_G(c) + (1 - \beta_0)\,\Pi_{G'-G}(c), \tag{27}$$

where $\beta_0 = |G|/|G'|$ and $G'-G \triangleq \{g \mid g \in G', g \notin G\}$:

$$\begin{aligned}
\Pi_{G'}(c) &= \frac{1}{|G'|}\sum_{g \in G'} M_g\, c\, P_{g^{-1}} &(28)\\
&= \frac{|G|}{|G'|} \cdot \frac{1}{|G|}\sum_{g \in G} M_g\, c\, P_{g^{-1}} + \frac{|G'|-|G|}{|G'|} \cdot \frac{1}{|G'|-|G|}\sum_{g \in G'-G} M_g\, c\, P_{g^{-1}} &(29)\\
&= \frac{|G|}{|G'|}\,\Pi_G(c) + \frac{|G'|-|G|}{|G'|}\,\Pi_{G'-G}(c) &(30)\\
&= \beta_0\,\Pi_G(c) + (1-\beta_0)\,\Pi_{G'-G}(c). &(31)
\end{aligned}$$

### A.9. A Theorem for $\Pi_{G,G',\beta}$

For the symmetry map $\Pi_{G,G',\beta}$ defined in Equation (4), we have a theorem similar to Theorem 4.2:

Theorem A.1.
The projected vector values $\Pi_{G,G',\beta}(c)$ defined in Equation (4) are $G$-symmetric. And if $c$ is already $G'$-symmetric, $\Pi_{G,G',\beta}(c) = c$.

For any $h \in G$ and any $c$,

$$\begin{aligned}
M_h\,\Pi_{G,G',\beta}(c)\,P_{h^{-1}} &= M_h \big(\beta\,\Pi_G(c) + (1-\beta)\,\Pi_{G'-G}(c)\big) P_{h^{-1}} &(32)\\
&= M_h \Big(\frac{\beta}{|G|}\sum_{g \in G} M_g\, c\, P_{g^{-1}} + \frac{1-\beta}{|G'|-|G|}\sum_{g \in G'-G} M_g\, c\, P_{g^{-1}}\Big) P_{h^{-1}} &(33)\\
&= \frac{\beta}{|G|}\sum_{g \in G} M_h M_g\, c\, P_{g^{-1}} P_{h^{-1}} + \frac{1-\beta}{|G'|-|G|}\sum_{g \in G'-G} M_h M_g\, c\, P_{g^{-1}} P_{h^{-1}} &(34)\\
&= \frac{\beta}{|G|}\sum_{g \in G} M_{hg}\, c\, P_{(hg)^{-1}} + \frac{1-\beta}{|G'|-|G|}\sum_{g \in G'-G} M_{hg}\, c\, P_{(hg)^{-1}} &(35)\\
&= \frac{\beta}{|G|}\sum_{h^{-1}r \in G} M_r\, c\, P_{r^{-1}} + \frac{1-\beta}{|G'|-|G|}\sum_{h^{-1}r \in G'-G} M_r\, c\, P_{r^{-1}} &(36)\\
&= \frac{\beta}{|G|}\sum_{r \in hG} M_r\, c\, P_{r^{-1}} + \frac{1-\beta}{|G'|-|G|}\sum_{r \in h(G'-G)} M_r\, c\, P_{r^{-1}} &(37)\\
&= \frac{\beta}{|G|}\sum_{r \in G} M_r\, c\, P_{r^{-1}} + \frac{1-\beta}{|G'|-|G|}\sum_{r \in G'-G} M_r\, c\, P_{r^{-1}} &(38)\\
&= \beta\,\Pi_G(c) + (1-\beta)\,\Pi_{G'-G}(c) &(39)\\
&= \Pi_{G,G',\beta}(c). &(40)
\end{aligned}$$

Here in Equation (38), $hG = G$ is a basic property of group theory, and we prove $h(G'-G) = G'-G$. Any element of $h(G'-G)$ can be written as $hg$ with $g \in G'-G$. If $hg \notin G'-G$, we have $hg \in G$. Assuming $g_1 \in G$ such that $hg = g_1$, we have $g = h^{-1}g_1 \in G$, which leads to a contradiction, and thus $hg \in G'-G$. On the other hand, for any $g_2 \in G'-G$, we have $g_2 = h(h^{-1}g_2)$ and only need to show that $h^{-1}g_2 \in G'-G$. Otherwise, assuming $h^{-1}g_2 \notin G'-G$, we have $h^{-1}g_2 \in G$ and thus $g_2 \in G$, which also leads to a contradiction; thus $g_2 \in h(G'-G)$. In conclusion, $h(G'-G) = G'-G$. Therefore, $\Pi_{G,G',\beta}(c)$ is $G$-symmetric.

As for the fixing property: for all $g \in G'$ and $G'$-symmetric $c$, we have $M_g\, c\, P_{g^{-1}} = c$, that is, $M_g\, c = c\, P_{g^{-1}}^{-1} = c\, P_g$. It follows that

$$\begin{aligned}
\Pi_{G,G',\beta}(c) &= \frac{\beta}{|G|}\sum_{g \in G} M_g\, c\, P_{g^{-1}} + \frac{1-\beta}{|G'|-|G|}\sum_{g \in G'-G} M_g\, c\, P_{g^{-1}} &(41)\\
&= \frac{\beta}{|G|}\sum_{g \in G} c\, P_g P_{g^{-1}} + \frac{1-\beta}{|G'|-|G|}\sum_{g \in G'-G} c\, P_g P_{g^{-1}} &(42)\\
&= \frac{\beta}{|G|}\sum_{g \in G} c + \frac{1-\beta}{|G'|-|G|}\sum_{g \in G'-G} c = \beta c + (1-\beta) c = c, &(43)
\end{aligned}$$

thus for any $G'$-symmetric $c$, $\Pi_{G,G',\beta}(c) = c$.

## B. Experiment Details

### B.1. Details of the Tasks

Figure 6. Visualization of the training tasks adapted from Gupta et al. (2021b): Point Navigation, Escape Bowl, Patrol, Locomotion on Variable Terrain, Locomotion on Flat Terrain, and Manipulate Box.

We run experiments on six MuJoCo (Todorov et al., 2012) tasks adapted from Gupta et al. (2021b).
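Returning to the properties proved above: they can be spot-checked numerically on a toy example. The sketch below takes the subgroup $G = \{e, \rho_2\} < \mathrm{Dih}_4$ acting on a 4-joint robot, builds the projection $\Pi_G(c) = \frac{1}{|G|}\sum_g M_g\, c\, P_{g^{-1}}$, and checks $G$-symmetry and the fixing property. The concrete matrices and joint layout are our assumptions for illustration, not the paper's code:

```python
# Toy check of Pi_G(c) = (1/|G|) sum_g M_g c P_{g^-1} for G = {e, rho_2}
# acting on a 4-joint robot: M_g is the 2x2 spatial action and P_g permutes
# joints (0<->2, 1<->3). Illustrative example, not the paper's implementation.
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

I2   = [[1, 0], [0, 1]]
R180 = [[-1, 0], [0, -1]]                      # M for rho_2 (its own inverse)
P_e  = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
P_r  = [[0, 0, 1, 0], [0, 0, 0, 1], [1, 0, 0, 0], [0, 1, 0, 0]]  # own inverse
G = [(I2, P_e), (R180, P_r)]

def project(c):
    # Pi_G(c); every g here is an involution, so P_{g^-1} = P_g
    out = [[0.0] * 4 for _ in range(2)]
    for M, P in G:
        t = matmul(matmul(M, c), P)
        for i in range(2):
            for j in range(4):
                out[i][j] += t[i][j] / len(G)
    return out

c = [[1.0, 2.0, -0.5, 0.0],                    # arbitrary 2D joint coordinates,
     [0.5, -1.0, 2.0, 3.0]]                    # one column per joint
p = project(c)
for M, P in G:                                 # G-symmetry: M_h Pi_G(c) P_{h^-1} = Pi_G(c)
    q = matmul(matmul(M, p), P)
    assert all(abs(q[i][j] - p[i][j]) < 1e-9 for i in range(2) for j in range(4))
q2 = project(p)                                # fixing: a G-symmetric input is unchanged
assert all(abs(q2[i][j] - p[i][j]) < 1e-9 for i in range(2) for j in range(4))
```

After projection, opposite joints sit at antipodal coordinates, which is exactly the 180°-rotation symmetry that $G$ encodes.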
These tasks can be categorized into three domains that test the agility (Patrol, Point Navigation), stability (Escape Bowl, Locomotion on Variable Terrain, Locomotion on Flat Terrain), and manipulation (Manipulate Box) abilities of the designed robots. Detailed descriptions of these tasks are listed below.

**Point Navigation.** The agent is generated at the center of a 100 × 100 m² flat arena and needs to reach a random goal (red square in Figure 6) in this arena. The ability to move quickly in any specified direction leads to success in this task. At each time step, the agent receives the reward

$$r_t = w_{ag}\, d_{ag} - w_c \|a\|^2,$$

where $d_{ag}$ is the difference in geodesic distance between the agent and the goal from the previous time step to the current one, $w_{ag} = 100$, and $w_c = 0.001$ weights a penalty term on the action $a$.

**Escape Bowl.** Generated at the center of a bowl-shaped terrain surrounded by small hills, the agent has to escape from the hilly region. This task requires the agent to maximize the geodesic distance from its initial location while traversing random hilly terrain. At each time step, the agent receives the reward

$$r_t = w_d\, d_{as} - w_c \|a\|^2,$$

where $d_{as}$ is the difference in geodesic distance between the agent and its initial location from the previous time step to the current one, $w_d = 1$, and $w_c = 0.001$.

**Patrol.** In this task, the agent is required to run back and forth between two target locations 10 meters apart along the x-axis. Quickly changing direction when the goal (red square in Figure 6) alternates, together with rapid movement, leads to success. The reward function is similar to that of Point Navigation. Additionally, we flip the goal location and provide the agent a sparse reward of 10 when it is within 0.5 m of the goal location.

**Locomotion on Variable Terrain.** At the beginning of an episode, the agent is generated at one end of a 100 × 100 m² square arena.
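The shaping rewards for Point Navigation and Escape Bowl quoted above share the same control penalty; a minimal sketch follows. We assume the distance differences are signed so that useful progress (toward the goal, or away from the start) is positive; this sign convention is our reading of the text, not confirmed by the benchmark code:

```python
# Sketch of the shaping rewards defined above, with the paper's weights.
# Sign conventions for d_ag / d_as are our assumption.
W_C = 0.001  # control penalty weight, shared by both tasks

def control_cost(action, w_c=W_C):
    return w_c * sum(a * a for a in action)

def point_nav_reward(prev_goal_dist, curr_goal_dist, action, w_ag=100.0):
    d_ag = prev_goal_dist - curr_goal_dist      # progress toward the goal
    return w_ag * d_ag - control_cost(action)

def escape_bowl_reward(prev_start_dist, curr_start_dist, action, w_d=1.0):
    d_as = curr_start_dist - prev_start_dist    # progress away from the start
    return w_d * d_as - control_cost(action)
```

For example, moving 0.5 m closer to the goal with zero action yields `point_nav_reward(1.0, 0.5, [0.0]) == 50.0`, while standing still with non-zero action yields a small negative reward.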
By randomly sampling a sequence of obstacles from a uniform distribution over a predefined range of parameter values, we build a brand-new terrain in each episode. The length of flat segments in the variable terrain is $l \in [1, 3]$ m along the desired direction of motion, while the length of obstacle segments is $l \in [4, 8]$ m. We primarily utilize three types of obstacles: 1) Hills, parameterized by the amplitude $a \in [0.6, 1.2]$ m of a sine wave; 2) Steps, a sequence of 8 steps of height 0.2 m; 3) Rubble, a sequence of random bumps (small hills) created by clipping a repeating triangular sawtooth wave at the top, with the height $h$ of each bump sampled from $[0.2, 0.3]$ m. The goal of the agent is to maximize forward displacement over an episode, and this environment is quite challenging for the agent to perform well in.

**Locomotion on Flat Terrain.** Similar to the variable-terrain task, the agent is initialized at one end of a 150 × 150 m² square arena and aims to maximize forward displacement over an episode.

**Manipulate Box.** In a 60 × 40 m² arena similar to the variable terrain, the agent is required to move a box (a small cube of side 0.2 m, shown in Figure 6) from its initial position to the target place (red square). Both the initial box location and the final target location are randomly chosen in each episode, with constraints that lead to a longer path to the destination.

### B.2. Implementation of SARD

We implement SARD based on Transform2Act (Yuan et al., 2021), which uses GNN-based (Scarselli et al., 2008; Bruna et al., 2013; Kipf & Welling, 2016) control policies. GNN-based policies can deal with variable input sizes across different robot designs by sharing parameters between joints. This property allows us to share policies across all designed robots.
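The variable-terrain generation procedure described in B.1 above can be sketched as follows; the segment structure and parameter ranges come from the text, while the function names and the strictly alternating flat/obstacle layout are our assumptions:

```python
import random

# Illustrative generator for the variable-terrain layout described in B.1:
# flat segments of length 1-3 m alternate with obstacle segments of length
# 4-8 m, with obstacle parameters drawn from the quoted ranges. A sketch,
# not the benchmark's actual terrain code.
def sample_terrain(total_len=100.0, seed=0):
    rng = random.Random(seed)
    segments, x = [], 0.0
    while x < total_len:
        flat_len = rng.uniform(1.0, 3.0)
        segments.append(("flat", flat_len, None))
        x += flat_len
        kind = rng.choice(["hills", "steps", "rubble"])
        if kind == "hills":
            param = rng.uniform(0.6, 1.2)      # sine-wave amplitude (m)
        elif kind == "steps":
            param = (8, 0.2)                   # 8 steps of height 0.2 m
        else:
            param = rng.uniform(0.2, 0.3)      # bump height (m)
        obst_len = rng.uniform(4.0, 8.0)
        segments.append((kind, obst_len, param))
        x += obst_len
    return segments

terrain = sample_terrain()
assert all(1.0 <= seg[1] <= 3.0 for seg in terrain if seg[0] == "flat")
assert all(4.0 <= seg[1] <= 8.0 for seg in terrain if seg[0] != "flat")
```

Re-seeding the generator per episode reproduces the "brand-new terrain in each episode" behavior described above.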
Note that our method is general and can be combined with other network structures used in modular RL, e.g., message-passing networks (Huang et al., 2020) and Transformers (Dong et al., 2022; Kurin et al., 2020). However, this sharing also has negative effects, e.g., joints in similar states will choose similar actions, which may severely hinder performance. To solve this problem, Transform2Act adds a joint-specialized MLP (JSMLP) after the GNNs; we follow this setting for fair comparison. For design-policy and control-policy learning, we use Proximal Policy Optimization (PPO) (Schulman et al., 2017), a standard policy gradient method (Williams, 1992). We provide the hyperparameters needed to replicate our experiments in Table 1, and we also include our code in the supplementary material. Experiments are carried out on NVIDIA RTX 2080 Ti GPUs. Taking Point Navigation as an example, SARD requires approximately 10 GB of RAM and 4 GB of GPU memory and takes about 36 hours to finish 50M time steps of training.

### B.3. Details of Baselines

**Transform2Act.** We use the official implementation of Transform2Act, where all networks and optimizations are implemented with PyTorch (Paszke et al., 2019). The GNN layers are GraphConv (Morris et al., 2019) as implemented in the PyTorch Geometric package (Fey & Lenssen, 2019). All policies are optimized with PPO (Schulman et al., 2017) with generalized advantage estimation (GAE) (Schulman et al., 2015). The authors searched the hyperparameters, and we list the selected values in Table 1. We also removed the initial design of Transform2Act to avoid any prior knowledge.

**Handcrafted Robot.** To show the strength of robot design, we also compare SARD with a Handcrafted Robot: the human-designed robot Ant² built with expert knowledge from OpenAI. We directly load the XML file and skip the Design Stage.
²https://github.com/openai/gym/blob/master/gym/envs/mujoco/assets/ant.xml

The Control Stage and optimization are the same as ours. We run it for 50M time steps and report the best performance.

Table 1. Hyperparameters of SARD and Transform2Act.

| Hyperparameter | Value |
| --- | --- |
| Skeleton Design Stage Time Steps N_skel | 5 |
| Attribute Design Stage Time Steps N_attr | 1 |
| GNN Layer Type | GraphConv |
| JSMLP Activation Function | Tanh |
| GNN Size | (64, 64, 64) |
| JSMLP Size | (128, 128) |
| Policy Learning Rate | 5e-5 |
| Value Learning Rate | 3e-4 |
| PPO Clip | 0.2 |
| PPO Batch Size | 50000 |
| PPO Mini-Batch Size | 2048 |
| PPO Iterations Per Batch | 10 |
| Training Epochs | 1000 |
| Discount Factor γ | 0.995 |
| GAE λ | 0.95 |
| Subgroup Exploration Rate ϵ | 0.01 |

Table 2. Training performance of SARD based on different base algorithms.

| Task | NGE | Transform2Act | SARD+NGE | SARD+Transform2Act |
| --- | --- | --- | --- | --- |
| Point Navigation | 1131.50 ± 458.45 | 1618.10 ± 1022.01 | 4729.00 ± 835.62 | 4262.78 ± 738.17 |
| Escape Bowl | 8.65 ± 1.88 | 32.37 ± 13.62 | 15.55 ± 3.69 | 88.61 ± 13.23 |
| Patrol | 1120.30 ± 425.89 | 1995.95 ± 709.07 | 3104.67 ± 1082.03 | 3116.47 ± 801.43 |
| Locomotion on Variable Terrain | 170.85 ± 48.05 | 443.22 ± 74.72 | 408.65 ± 74.95 | 1204.01 ± 96.16 |
| Locomotion on Flat Terrain | 238.65 ± 86.75 | 1067.16 ± 463.55 | 835.75 ± 366.25 | 2438.26 ± 297.09 |
| Manipulate Box | 1061.90 ± 541.47 | 1073.11 ± 467.38 | 1793.00 ± 27.80 | 1604.27 ± 137.72 |

### B.4. Details of Ablations

**SARD (Unstructured).** The Neighbor(G_i) function at iteration i is set to contain all subgroups; all other components are the same as in SARD.

**SARD (K=5).** Divide the interval between two adjacent subgroups into K = 5 parts and keep everything else the same as in SARD.

**SARD (K=1).** Do not divide the interval between two adjacent subgroups; only the original group structure defines the Neighbor(G_i) function at iteration i.

## C. Extra Results

### C.1. Combining SARD with Another Robot Design Method

Our method is a plug-and-play module that can be utilized in other robot design methods. Here we provide experimental results of combining our method (SARD) with NGE (Wang et al., 2019), an ES-based robot design method.
We report the results in Table 2, where SARD+NGE is the implementation of SARD based on NGE and SARD+Transform2Act is our original implementation of SARD based on Transform2Act, denoted SARD (K=3) in our paper. All runs are conducted with 3 random seeds, and each entry in the table is formatted as mean ± std. As shown in Table 2, SARD+NGE outperforms vanilla NGE in all tasks and is even better than our original implementation SARD+Transform2Act in two tasks. The improvement of SARD+BaseAlgo over BaseAlgo (BaseAlgo ∈ {NGE, Transform2Act}) showcases the generality of SARD.

Table 3. Training performance of SARD compared with Transform2Act on its original tasks.

| | Swimmer | 2D Locomotion | Gap Crosser |
| --- | --- | --- | --- |
| Transform2Act | 607.50 ± 89.02 | 3329.00 ± 2094.56 | 1352.20 ± 558.07 |
| SARD | 975.50 ± 9.10 | 3194.00 ± 1695.06 | 1824.43 ± 1322.30 |

### C.2. Results of SARD on the Tasks Used in Transform2Act

For a complete comparison, we also provide extra results on the tasks used in Transform2Act (Yuan et al., 2021). We show the final performance comparison between SARD and Transform2Act in Table 3. Here SARD is our original implementation based on Transform2Act, denoted SARD (K=3) in our paper. All runs are conducted with 3 random seeds, and each entry is formatted as mean ± std. SARD outperforms Transform2Act in most tasks, which further validates the strength of our method. Also note that the 3D Locomotion task in their paper is similar to Locomotion on Flat Terrain in Table 2, so we omit it here. The reported results of Transform2Act are based on their released code.
Figure 7. Hyperparameter search over Dih_n: episode reward versus training time steps (up to 50M) on Point Navigation, Escape Bowl, Patrol, Locomotion on Variable Terrain, Locomotion on Flat Terrain, and Manipulate Box, comparing SARD (Ours, n ∈ {4}), SARD (n ∈ {3, 4, 5}), SARD (n ∈ {3}), and SARD (n ∈ {5}).

### C.3. Different Dihedral Groups

In this paper, we use the subgroups of the dihedral group Dih_n to represent various symmetries, and in Section 5 we set the hyperparameter n to 4. Here we conduct a hyperparameter search to verify this choice; the result is shown in Figure 7. SARD (n ∈ {3}), SARD (n ∈ {4}), and SARD (n ∈ {5}) denote SARD with different dihedral groups, i.e., Dih_3, Dih_4, and Dih_5, respectively. For SARD (n ∈ {3, 4, 5}), we use all three groups simultaneously by regarding group elements with the same matrix representations as neighbors. SARD (n ∈ {4}) outperforms all others in most tasks and is only slightly worse than SARD (n ∈ {5}) in the Point Navigation task. This result validates our hyperparameter choice.

## D. Discussions of Dihedral Groups

In this paper, we use dihedral groups to describe the symmetry of robots mainly for two reasons. (1) Dihedral groups are general enough to represent a wide range of symmetries of robot morphologies: they are generated by basic reflectional and rotational symmetries, which can describe the characteristics of most effective robot morphologies. Besides, related works in biology (Savriama & Klingenberg, 2011; Pappas et al., 2021; Graham et al., 2010) also use the dihedral group as an effective tool to study the symmetry of real-world creatures. (2) Using larger groups may bring in extra learning complexity and lead to poor performance, even though larger groups can contain more symmetries.
In general, the dihedral group is a good trade-off between expressiveness and complexity.

## E. Limitations

In this paper, we use the subgroups of the dihedral group Dih_n to represent a wide range of symmetries while avoiding extra learning complexity. Our method has shown superior efficiency, but the dihedral group is a 2D symmetry group and only contains transformations in the xy-plane. Perhaps because of the influence of gravity, dihedral groups suffice to represent the symmetries of most real-world creatures. Nevertheless, exploring 3D symmetry groups (Savriama & Klingenberg, 2011) in the virtual robot design problem remains worthwhile and is a promising direction for future work. In addition, although the idea of symmetry applies to a wide range of tasks, it may not suit tasks that do not require symmetry, such as single-arm robotic manipulation tasks where we need to design a robot arm as well as its gripper for a particular manipulation task. Intuitively, efficient designs are mostly asymmetric in these situations, and a symmetry constraint might prevent the arm and manipulator from operating in a more effective way, thus hindering training.