# Symmetry-Aware Robot Design with Structured Subgroups

Heng Dong¹ Junyu Zhang² Tonghan Wang³ Chongjie Zhang¹

¹Institute for Interdisciplinary Information Sciences, Tsinghua University. ²Huazhong University of Science and Technology. ³Harvard University. Correspondence to: Heng Dong, Chongjie Zhang. Proceedings of the 40th International Conference on Machine Learning, Honolulu, Hawaii, USA. PMLR 202, 2023. Copyright 2023 by the author(s).

**Abstract.** Robot design aims at learning to create robots that can be easily controlled and perform tasks efficiently. Previous works on robot design have demonstrated the ability to generate robots for various tasks, but they search for robots directly in the vast design space and ignore common structures, resulting in abnormal robots and poor performance. To tackle this problem, we propose a Symmetry-Aware Robot Design (SARD) framework that exploits the structure of the design space by incorporating symmetry searching into the robot design process. Specifically, we represent symmetries with the subgroups of the dihedral group and search for the optimal symmetry over these structured subgroups; robots are then designed under the searched symmetry. In this way, SARD can design efficient symmetric robots while still covering the original design space, which we analyze theoretically. We further evaluate SARD empirically on various tasks, and the results show its superior efficiency and generalizability.

## 1. Introduction

Humans have been dreaming of creating creatures with morphological intelligence for decades (Sims, 1994a;b; Yuan et al., 2021; Gupta et al., 2021b). A promising solution to this challenging problem is to generate robots with various functionalities in simulated environments (Wang et al., 2019; Yuan et al., 2021), in which robots' functionalities are largely determined by their designs and control policies. Learning control policies for handcrafted robots with fixed designs has been extensively studied in previous works (Schulman et al., 2017; Fujimoto et al., 2018; Huang et al., 2020; Dong et al., 2022). However, as the other critical component, the design of robots has attracted scant attention and achieved limited success in the literature.

Figure 1. Time-lapse images of robots with different symmetries performing various tasks: (a, c) running forward; (b, d) reaching random goals. Robots designed by prior work (Yuan et al., 2021) do not satisfy any non-trivial symmetry and might be hard to control: (a) the robot deviated from the intended direction; (b) the robot missed the goal. Different tasks may require different symmetries: (c) bilateral symmetry is suitable for tasks involving only running forward; (d) radial symmetry for reaching random goals.

The field of automatic robot design aims at searching for optimal robot morphologies that can be easily controlled and perform various tasks efficiently. This problem has been a long-lasting challenge, mainly for two reasons: 1) the design space, including skeletal structures and attributes of joints and limbs, is large and combinatorial, and 2) evaluating each design requires training and testing an optimal control policy, which is often computationally expensive. For automatic robot design, prior works (Gupta et al., 2021b; Wang et al., 2019) typically adopt evolutionary search (ES) algorithms, where robots are sampled from a large population and learn to perform tasks independently during an iteration. At the end of each iteration, the worst-performing robots are eliminated, and the surviving robots produce child robots by random mutation to maintain the population. Recently, Yuan et al.
(2021) discussed the low sample efficiency of ES-based methods: robots in the population do not share their training experiences, and zeroth-order optimization methods such as ES are sample-inefficient in high-dimensional search spaces (Vemula et al., 2019). They used reinforcement learning (RL) to sample and optimize robot designs by incorporating the design process into the control process and sharing the design and control policies across all robots. Despite this progress, robots designed by these approaches are intuitively abnormal, empirically hard to control, and ultimately perform poorly. Examples are provided in the time-lapse images in Figure 1 (a-b), in which robots designed by prior work (Yuan et al., 2021) perform poorly on different tasks. We hypothesize that the underperformance is attributable to the fact that most prior works search for robots directly in the whole vast design space without exploiting useful structures that could largely reduce it. To verify this hypothesis, in this paper, we explore utilizing symmetry as the key characteristic to unveil the structure of the design space and thereby reduce learning complexity. Symmetry is a structure commonly observed in biological organisms (Savriama & Klingenberg, 2011), e.g., bilateral symmetry in flies (Evans & Bashaw, 2010), radial symmetry in jellyfish (Abrams et al., 2015), and spherical symmetry in bacteria (Shao et al., 2017). From the learning perspective, symmetry-aware robot design has two advantages. First, it requires searching over far fewer robot designs: if one design turns out to be unsuitable for the current task, other designs with the same symmetry can be searched less frequently, as they are likely to be morphologically and functionally similar. Second, symmetric designs can reduce the degree of control required to learn balancing (Raibert, 1986c;a), as in Figure 1 (a, c).
Prior works noticed the benefits of symmetry (Gupta et al., 2021b; Wang et al., 2019), but considered only bilateral symmetry. Other tasks may require different symmetries; for example, tasks that involve running in different directions require radial symmetry (Figure 1 (d)). However, none of the previous works explored learning suitable robot symmetries for different tasks. In this paper, we introduce a novel Symmetry-Aware Robot Design (SARD) framework. Realizing this framework involves two major challenges. The first challenge is how to represent symmetries and how to find the optimal one. To cover a wide range of symmetries while avoiding extra learning complexity, we propose to use the subgroups of the dihedral group (Gallian, 2021) to represent symmetries. Each subgroup represents a kind of symmetry and a symmetric space; the trivial subgroup, containing only the identity element, corresponds exactly to the original design space. To find the optimal symmetry efficiently, we exploit the group structure and adopt a simple local search over the structured subgroups, changing symmetry types smoothly to alleviate the gradient conflict problem (Liu et al., 2021). The second challenge is how to design robots that satisfy a given symmetry. We propose a novel plug-and-play symmetry transformation module that maps any robot design into a given symmetric space. We also provide theoretical analysis verifying that the transformed robot designs lie in the given symmetric space and that this module can cover the whole space. We evaluate our SARD framework on six MuJoCo (Todorov et al., 2012) tasks adapted from Gupta et al. (2021b). SARD significantly outperforms previous state-of-the-art algorithms in terms of both sample efficiency and final performance. The performance comparison and the visualization of the symmetry learning process strongly support the effectiveness of our symmetry searching and transformation approaches.
Our experimental results highlight the importance of considering various symmetries in robot design.

## 2. Related Works

**Modular RL.** The robot design problem typically requires controlling robots with changing morphologies, whose state and action spaces are incompatible across robots. This issue cannot be tackled by the traditional monolithic policies used in single-agent RL, but modular RL, which uses a shared policy to control each actuator separately, holds the promise to solve it. Most prior works in this field represent the robot's morphology as a graph and use GNNs or message-passing networks as policies (Wang et al., 2018; Pathak et al., 2019; Huang et al., 2020). All of these GNN-like works show the benefits of modular policies over a monolithic policy in tasks involving different morphologies. Recent works also propose using Transformers (Vaswani et al., 2017) as policies to overcome the difficulty of message-passing in complex morphologies and further improve performance (Kurin et al., 2020; Gupta et al., 2021a; Dong et al., 2022). In this paper, we build our method on GNN-like policies for a fair comparison with the previous state-of-the-art baseline Transform2Act (Yuan et al., 2021), but note that our method is a plug-and-play module that can be applied to any modular RL policy.

**Robot Design.** The automatic robot design problem aims at searching for robots that can be easily controlled and can perform various tasks efficiently. One line of work in this field focuses only on designing the attributes of robots while ignoring skeletal structures (Ha et al., 2017; Yu et al., 2018; Ha, 2019), which limits the achievable morphologies. Another line of work considers both attribute design and skeleton design, known as combinatorial design optimization.
Previous works mainly utilize evolutionary search (ES) algorithms for combinatorial design optimization (Sims, 1994a; Nolfi & Floreano, 2000; Auerbach & Bongard, 2014; Cheney et al., 2018; Jelisavcic et al., 2019; Zhao et al., 2020), which require robots with different structures to perform a given task independently and do not share experiences between robots. This results in severe sample inefficiency and may require thousands of CPUs to finish one experiment (Gupta et al., 2021b). Recently, Yuan et al. (2021) discussed this issue and used RL to optimize robot designs by incorporating the design procedure into the decision-making process and formulating design optimization as learning a conditional policy. This method has shown great sample efficiency. However, none of the above studies explored the structure of the design space; they search for robots directly in the vast design space, which may end up with abnormal robots and poor performance. As for works that considered symmetry in robot design, NGE (Wang et al., 2019) only considered bilateral symmetry, while our work considers a wide range of symmetries for various tasks.

Figure 2. Symmetry-aware robot design framework. (a) Search for the optimal symmetry in structured subgroups; (b) design the skeletal structure of the robot; (c) design the attributes of the robot's joints and limbs. Joints of the same color are in the same orbit in (b-c).

**Symmetry in Real-World Creatures.**
Our method, which utilizes symmetry as the classification characteristic, shares a similar and intriguing principle with real-world creatures, which may have developed it millions of years ago according to fossil evidence (Evans et al., 2020). Certain symmetries are maintained by natural selection pressures, and deviation from perfect symmetry is negatively correlated with species fitness (Enquist & Arak, 1994). Raibert (1986b) showed that during a series of bouncing and ballistic motions, symmetry contributes to achieving more complicated running behaviors; e.g., reciprocating leg symmetry is essential for making a quadruped gallop, and symmetry of wings can help reduce energy expenditure in flight (Polak & Trivers, 1994). Our experiments in Section 5 reach similar conclusions: symmetry can help reduce control costs.

## 3. Preliminaries

In this section, we introduce the background knowledge and notation necessary to present our method.

**Problem Settings.** We aim to search for a robot design $D$ from a design space $\mathcal{D}$ that can finish different tasks efficiently. A design $D$ includes the robot's skeletal structure, limb-specific attributes, and joint-specific attributes (e.g., limb length, size, and motor strength). Formally, a robot design can be represented by a graph $D = (V, E, Z)$, where each node $v \in V$ represents a joint in the robot morphology and each edge $e = (v_i, v_j) \in E$ represents a limb connecting two joints $v_i, v_j$. Each $z \in Z$ contains the attributes of a joint and the limb attached to it, including scalar and vector values, and $|V| = |Z|$. The designed robot then learns control policies to finish tasks via reinforcement learning (RL). RL formulates a control problem as an infinite-horizon discounted Markov Decision Process (MDP) defined by a tuple $\mathcal{M} = (\mathcal{S}, \mathcal{A}, \mathcal{T}, \rho_0, R, \gamma)$, whose elements are the state set, action set, transition dynamics, initial state distribution, reward function, and discount factor, respectively.
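The design tuple $D = (V, E, Z)$ above can be sketched as a small data structure. This is a minimal sketch; the class and field names are ours, not from the SARD codebase:

```python
from dataclasses import dataclass

# A minimal sketch of the design tuple D = (V, E, Z) described above.
# The class and field names are illustrative, not the paper's code.
@dataclass
class RobotDesign:
    joints: list[int]                 # V: joint ids (graph nodes)
    limbs: list[tuple[int, int]]      # E: limbs connecting two joints (edges)
    attrs: dict[int, dict]            # Z: per-joint attributes (joint + attached limb)

# A 3-joint design: a torso joint with two child joints.
design = RobotDesign(
    joints=[0, 1, 2],
    limbs=[(0, 1), (0, 2)],
    attrs={
        0: {"limb_length": 0.2, "motor_strength": 150.0},
        1: {"limb_length": 0.4, "motor_strength": 100.0},
        2: {"limb_length": 0.4, "motor_strength": 100.0},
    },
)
assert len(design.joints) == len(design.attrs)  # the paper's |V| = |Z|
```

The final assertion mirrors the paper's constraint that every joint carries exactly one attribute entry.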
For a fixed robot design and task, the objective of RL is to learn a control policy $\pi_C$ that maximizes the expected total discounted reward $J(\pi_C) = \mathbb{E}_{\pi_C}\left[\sum_{t=0}^{\infty} \gamma^t R(s_t, a_t)\right]$. Given that the robot design changes during learning, we condition the original transition dynamics and reward function in $\mathcal{M}$ on the design $D$, yielding a more general transition dynamics $\mathcal{T}(s_{t+1} \mid s_t, a_t, D)$ and reward function $R(s_t, a_t, D)$. Hence the RL objective is also conditioned on $D$, and we optimize $J(\pi_C, D)$. The design optimization problem can then be naturally formulated as a bi-level optimization problem (Sinha et al., 2017; Colson et al., 2007):

$$D^* = \arg\max_{D \in \mathcal{D}} J(\pi_C^*, D) \quad \text{s.t.} \quad \pi_C^* = \arg\max_{\pi_C} J(\pi_C, D),$$

where the inner optimization problem is typically solved by RL but is especially computationally expensive for changing robot designs. The outer optimization problem can be solved by evolutionary algorithms (Wang et al., 2019; Gupta et al., 2021b; Sims, 1994a) or RL (Yuan et al., 2021). In this paper, we follow Transform2Act and use RL as the outer problem solver for its efficiency compared with evolutionary algorithms.

**Group Theory.** We use group theory to represent different types of symmetries. Here we briefly introduce some notation; please refer to Appendix A.2 for details. A group $G$ is a set with a binary operation satisfying four basic properties: associativity, closure, the existence of an identity ($e \in G$), and the existence of an inverse for each element ($g^{-1} \in G$ for all $g \in G$) (Gallian, 2021). If a subset $H$ of $G$ is also a group under the operation of $G$, we call it a subgroup of $G$ and write $H < G$. A group action of $G$ on some space $X$ is a function $G \times X \to X$ satisfying $ex = x$ and $(gh)x = g(hx)$ for all $g, h \in G$ and $x \in X$. For an element $g \in G$, we define a transformation function $\alpha_g^X : X \to X$ given by $x \mapsto gx$, which can be interpreted as the transformation of the point $x$ under the group element $g$.
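The group axioms above are easy to check numerically on a concrete finite group of 2×2 rotation and reflection matrices (the dihedral group, which the paper uses to represent symmetries). The sketch below is our illustrative code, not the paper's implementation:

```python
import itertools
import math

# A concrete group to check the axioms above: the 2x2 matrix representation
# of the dihedral group (n rotations by 2*pi*k/n and n reflections).
def rot(k, n):
    a = 2 * math.pi * k / n
    return ((math.cos(a), -math.sin(a)), (math.sin(a), math.cos(a)))

def refl(k, n):  # matrix of rho_k * pi, with pi = reflection across the x-axis
    a = 2 * math.pi * k / n
    return ((math.cos(a), math.sin(a)), (math.sin(a), -math.cos(a)))

def matmul(A, B):
    return tuple(tuple(sum(A[i][k] * B[k][j] for k in range(2))
                       for j in range(2)) for i in range(2))

def canon(A):  # round entries so float matrices can be hashed and compared
    return tuple(tuple(round(x, 6) + 0.0 for x in row) for row in A)

def dihedral(n):
    return {canon(rot(k, n)) for k in range(n)} | {canon(refl(k, n)) for k in range(n)}

G = dihedral(4)
assert len(G) == 8                         # |Dih_n| = 2n
E = canon(rot(0, 4))
assert E in G                              # identity element
for A, B in itertools.product(G, G):       # closure under composition
    assert canon(matmul(A, B)) in G
H = {E, canon(rot(2, 4))}                  # <rho_2>: a two-element subgroup
for A, B in itertools.product(H, H):
    assert canon(matmul(A, B)) in H
```

The last check illustrates the subgroup notion: the 180° rotation together with the identity is itself closed under composition, i.e., $\langle \rho_2 \rangle < \mathrm{Dih}_4$.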
For example, if $x$ is a robot, $g$ could be a rotation of the robot about the z-axis passing through its torso. The orbit of a point $x \in X$ is the set of all its transformations under $G$, denoted $O_G(x) = \{\alpha_g^X(x) \mid g \in G\}$. An important property is that $X$ can be partitioned into disjoint orbits: $X = \bigcup_{x} O_G(x)$.

**Dihedral Group.** The dihedral group is a finite discrete group containing rotation and reflection transformations. A dihedral group $\mathrm{Dih}_n$ ($n \ge 3$) is generated by a rotation transformation $\rho$ (counterclockwise rotation by $360°/n$) and a reflection transformation $\pi$ (reflection across the x-axis). Concretely, $\mathrm{Dih}_n = \{\rho_k, \pi_k \mid k = 0, 1, \ldots, n-1\}$, where $\rho_k = \rho^k$, $\rho_0 = \rho_n = e$, and $\pi_k = \rho_k \pi$. Each group element $g$ has multiple representations; in this paper we consider the permutation representation $P_g$ and the matrix representation $M_g$. Considering the designed robot in Figure 2(c) and taking $\pi_0 \in \mathrm{Dih}_4$ as an example, $P_{\pi_0}$ exchanges joints $v_2, v_4$ and $M_{\pi_0}$ reflects their coordinates across the x-axis. In this paper, we use the subgroups of the dihedral group to represent various symmetries. The subgroups of $\mathrm{Dih}_n$ come in three types: (1) $H_d = \langle \rho_d \rangle$, where $1 \le d \le n$ and $d \mid n$ ($d$ divides $n$); (2) $K_i = \langle \pi_i \rangle$, where $0 \le i \le n-1$; and (3) $H_{k,l} = \langle \rho_k, \pi_l \rangle$, where $k \mid n$ and $0 \le l \le k-1$.

… else No Change, and $a_v^{\mathrm{attr}} = z_v \in Z$. Thus the design is left unchanged.

### A.7. Neighbors of a Subgroup

For any $G' \in \mathrm{Neighbor}(G)$, we have $G < G'$ or $G' < G$; if $G < G'$ and $H$ satisfies $G < H < G'$, then $H = G$ or $H = G'$; if $G' < G$ and $H$ satisfies $G' < H < G$, then $H = G'$ or $H = G$.

### A.8. Derivation of $\Pi_{G,G',\beta}$

Here we prove that

$$\Pi_{G'}(c) = \beta_0 \Pi_G(c) + (1 - \beta_0)\,\Pi_{G'-G}(c), \tag{27}$$

where $\beta_0 = |G|/|G'|$ and $G'-G \triangleq \{g \mid g \in G', g \notin G\}$:

$$\begin{aligned}
\Pi_{G'}(c) &= \frac{1}{|G'|}\sum_{g \in G'} M_g\, c\, P_{g^{-1}} &(28)\\
&= \frac{|G|}{|G'|} \cdot \frac{1}{|G|}\sum_{g \in G} M_g\, c\, P_{g^{-1}} + \frac{|G'|-|G|}{|G'|} \cdot \frac{1}{|G'|-|G|}\sum_{g \in G'-G} M_g\, c\, P_{g^{-1}} &(29)\\
&= \frac{|G|}{|G'|}\,\Pi_G(c) + \frac{|G'|-|G|}{|G'|}\,\Pi_{G'-G}(c) &(30)\\
&= \beta_0\,\Pi_G(c) + (1-\beta_0)\,\Pi_{G'-G}(c). &(31)
\end{aligned}$$

### A.9. A Theorem for $\Pi_{G,G',\beta}$

For the symmetry map $\Pi_{G,G',\beta}$ defined in Equation (4), we have a theorem similar to Theorem 4.2:

Theorem A.1.
The projected vector values $\Pi_{G,G',\beta}(c)$ defined in Equation (4) are $G$-symmetric. And if $c$ is already $G'$-symmetric, $\Pi_{G,G',\beta}(c) = c$.

For any $h \in G$ and any $c$,

$$\begin{aligned}
M_h\,\Pi_{G,G',\beta}(c)\,P_{h^{-1}} &= M_h \big(\beta\,\Pi_G(c) + (1-\beta)\,\Pi_{G'-G}(c)\big) P_{h^{-1}} &(32)\\
&= M_h \Big(\frac{\beta}{|G|}\sum_{g \in G} M_g\, c\, P_{g^{-1}} + \frac{1-\beta}{|G'|-|G|}\sum_{g \in G'-G} M_g\, c\, P_{g^{-1}}\Big) P_{h^{-1}} &(33)\\
&= \frac{\beta}{|G|}\sum_{g \in G} M_h M_g\, c\, P_{g^{-1}} P_{h^{-1}} + \frac{1-\beta}{|G'|-|G|}\sum_{g \in G'-G} M_h M_g\, c\, P_{g^{-1}} P_{h^{-1}} &(34)\\
&= \frac{\beta}{|G|}\sum_{g \in G} M_{hg}\, c\, P_{(hg)^{-1}} + \frac{1-\beta}{|G'|-|G|}\sum_{g \in G'-G} M_{hg}\, c\, P_{(hg)^{-1}} &(35)\\
&= \frac{\beta}{|G|}\sum_{h^{-1}r \in G} M_r\, c\, P_{r^{-1}} + \frac{1-\beta}{|G'|-|G|}\sum_{h^{-1}r \in G'-G} M_r\, c\, P_{r^{-1}} &(36)\\
&= \frac{\beta}{|G|}\sum_{r \in hG} M_r\, c\, P_{r^{-1}} + \frac{1-\beta}{|G'|-|G|}\sum_{r \in h(G'-G)} M_r\, c\, P_{r^{-1}} &(37)\\
&= \frac{\beta}{|G|}\sum_{r \in G} M_r\, c\, P_{r^{-1}} + \frac{1-\beta}{|G'|-|G|}\sum_{r \in G'-G} M_r\, c\, P_{r^{-1}} &(38)\\
&= \beta\,\Pi_G(c) + (1-\beta)\,\Pi_{G'-G}(c) &(39)\\
&= \Pi_{G,G',\beta}(c). &(40)
\end{aligned}$$

Here in Equation (38), $hG = G$ is a basic property of group theory, and we prove $h(G'-G) = G'-G$. Any element of $h(G'-G)$ can be written as $hg$ with $g \in G'-G$. If $hg \notin G'-G$, we have $hg \in G$. Assuming $g_1 \in G$ such that $hg = g_1$, we have $g = h^{-1}g_1 \in G$, which leads to a contradiction, and thus $hg \in G'-G$. On the other hand, for any $g_2 \in G'-G$, we have $g_2 = h(h^{-1}g_2)$ and only need to show that $h^{-1}g_2 \in G'-G$. Otherwise, assuming $h^{-1}g_2 \notin G'-G$, we have $h^{-1}g_2 \in G$ and thus $g_2 \in G$, which also leads to a contradiction; thus $g_2 \in h(G'-G)$. In conclusion, $h(G'-G) = G'-G$. Therefore, $\Pi_{G,G',\beta}(c)$ is $G$-symmetric.

As for the fixing property: for all $g \in G'$ and $G'$-symmetric $c$, we have $M_g\, c\, P_{g^{-1}} = c$, that is, $M_g\, c = c\, P_{g^{-1}}^{-1} = c\, P_g$. It follows that

$$\begin{aligned}
\Pi_{G,G',\beta}(c) &= \frac{\beta}{|G|}\sum_{g \in G} M_g\, c\, P_{g^{-1}} + \frac{1-\beta}{|G'|-|G|}\sum_{g \in G'-G} M_g\, c\, P_{g^{-1}} &(41)\\
&= \frac{\beta}{|G|}\sum_{g \in G} c\, P_g P_{g^{-1}} + \frac{1-\beta}{|G'|-|G|}\sum_{g \in G'-G} c\, P_g P_{g^{-1}} &(42)\\
&= \frac{\beta}{|G|}\sum_{g \in G} c + \frac{1-\beta}{|G'|-|G|}\sum_{g \in G'-G} c = \beta c + (1-\beta) c = c, &(43)
\end{aligned}$$

thus for any $G'$-symmetric $c$, $\Pi_{G,G',\beta}(c) = c$.

## B. Experiment Details

### B.1. Details of the Tasks

Figure 6. Visualization of the training tasks adapted from Gupta et al. (2021b): Point Navigation, Escape Bowl, Patrol, Locomotion on Variable Terrain, Locomotion on Flat Terrain, and Manipulate Box.

We run experiments on six MuJoCo (Todorov et al., 2012) tasks adapted from Gupta et al. (2021b).
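Returning to the properties proved above: they can be spot-checked numerically on a toy example. The sketch below takes the subgroup $G = \{e, \rho_2\} < \mathrm{Dih}_4$ acting on a 4-joint robot, builds the projection $\Pi_G(c) = \frac{1}{|G|}\sum_g M_g\, c\, P_{g^{-1}}$, and checks $G$-symmetry and the fixing property. The concrete matrices and joint layout are our assumptions for illustration, not the paper's code:

```python
# Toy check of Pi_G(c) = (1/|G|) sum_g M_g c P_{g^-1} for G = {e, rho_2}
# acting on a 4-joint robot: M_g is the 2x2 spatial action and P_g permutes
# joints (0<->2, 1<->3). Illustrative example, not the paper's implementation.
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

I2   = [[1, 0], [0, 1]]
R180 = [[-1, 0], [0, -1]]                      # M for rho_2 (its own inverse)
P_e  = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
P_r  = [[0, 0, 1, 0], [0, 0, 0, 1], [1, 0, 0, 0], [0, 1, 0, 0]]  # own inverse
G = [(I2, P_e), (R180, P_r)]

def project(c):
    # Pi_G(c); every g here is an involution, so P_{g^-1} = P_g
    out = [[0.0] * 4 for _ in range(2)]
    for M, P in G:
        t = matmul(matmul(M, c), P)
        for i in range(2):
            for j in range(4):
                out[i][j] += t[i][j] / len(G)
    return out

c = [[1.0, 2.0, -0.5, 0.0],                    # arbitrary 2D joint coordinates,
     [0.5, -1.0, 2.0, 3.0]]                    # one column per joint
p = project(c)
for M, P in G:                                 # G-symmetry: M_h Pi_G(c) P_{h^-1} = Pi_G(c)
    q = matmul(matmul(M, p), P)
    assert all(abs(q[i][j] - p[i][j]) < 1e-9 for i in range(2) for j in range(4))
q2 = project(p)                                # fixing: a G-symmetric input is unchanged
assert all(abs(q2[i][j] - p[i][j]) < 1e-9 for i in range(2) for j in range(4))
```

After projection, opposite joints sit at antipodal coordinates, which is exactly the 180°-rotation symmetry that $G$ encodes.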
These tasks can be categorized into three domains that test the agility (Patrol, Point Navigation), stability (Escape Bowl, Locomotion on Variable Terrain, Locomotion on Flat Terrain), and manipulation (Manipulate Box) abilities of the designed robots. Detailed descriptions of these tasks are listed below.

**Point Navigation.** The agent is generated at the center of a 100 × 100 m² flat arena and needs to reach a random goal (red square in Figure 6) in this arena. The ability to move quickly in any specified direction leads to success in this task. At each time step, the agent receives the reward

$$r_t = w_{ag}\, d_{ag} - w_c \|a\|^2,$$

where $d_{ag}$ is the difference in geodesic distance between the agent and the goal from the previous time step to the current one, $w_{ag} = 100$, and $w_c = 0.001$ weights a penalty term on the action $a$.

**Escape Bowl.** Generated at the center of a bowl-shaped terrain surrounded by small hills, the agent has to escape from the hilly region. This task requires the agent to maximize the geodesic distance from its initial location while traversing random hilly terrain. At each time step, the agent receives the reward

$$r_t = w_d\, d_{as} - w_c \|a\|^2,$$

where $d_{as}$ is the difference in geodesic distance between the agent and its initial location from the previous time step to the current one, $w_d = 1$, and $w_c = 0.001$.

**Patrol.** In this task, the agent is required to run back and forth between two target locations 10 meters apart along the x-axis. Quickly changing direction when the goal (red square in Figure 6) alternates, together with rapid movement, leads to success. The reward function is similar to that of Point Navigation. Additionally, we flip the goal location and provide the agent a sparse reward of 10 when it is within 0.5 m of the goal location.

**Locomotion on Variable Terrain.** At the beginning of an episode, the agent is generated at one end of a 100 × 100 m² square arena.
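The shaping rewards for Point Navigation and Escape Bowl quoted above share the same control penalty; a minimal sketch follows. We assume the distance differences are signed so that useful progress (toward the goal, or away from the start) is positive; this sign convention is our reading of the text, not confirmed by the benchmark code:

```python
# Sketch of the shaping rewards defined above, with the paper's weights.
# Sign conventions for d_ag / d_as are our assumption.
W_C = 0.001  # control penalty weight, shared by both tasks

def control_cost(action, w_c=W_C):
    return w_c * sum(a * a for a in action)

def point_nav_reward(prev_goal_dist, curr_goal_dist, action, w_ag=100.0):
    d_ag = prev_goal_dist - curr_goal_dist      # progress toward the goal
    return w_ag * d_ag - control_cost(action)

def escape_bowl_reward(prev_start_dist, curr_start_dist, action, w_d=1.0):
    d_as = curr_start_dist - prev_start_dist    # progress away from the start
    return w_d * d_as - control_cost(action)
```

For example, moving 0.5 m closer to the goal with zero action yields `point_nav_reward(1.0, 0.5, [0.0]) == 50.0`, while standing still with non-zero action yields a small negative reward.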
By randomly sampling a sequence of obstacles from a uniform distribution over a predefined range of parameter values, we build a brand-new terrain in each episode. The length of flat segments in the variable terrain is $l \in [1, 3]$ m along the desired direction of motion, while the length of obstacle segments is $l \in [4, 8]$ m. We primarily utilize three types of obstacles: 1) Hills, parameterized by the amplitude $a \in [0.6, 1.2]$ m of a sine wave; 2) Steps, a sequence of 8 steps of height 0.2 m; 3) Rubble, a sequence of random bumps (small hills) created by clipping a repeating triangular sawtooth wave at the top, with the height $h$ of each bump sampled from $[0.2, 0.3]$ m. The goal of the agent is to maximize forward displacement over an episode, and this environment is quite challenging for the agent to perform well in.

**Locomotion on Flat Terrain.** Similar to the variable-terrain task, the agent is initialized at one end of a 150 × 150 m² square arena and aims to maximize forward displacement over an episode.

**Manipulate Box.** In a 60 × 40 m² arena similar to the variable terrain, the agent is required to move a box (a small cube of side 0.2 m, shown in Figure 6) from its initial position to the target place (red square). Both the initial box location and the final target location are randomly chosen in each episode, with constraints that lead to a longer path to the destination.

### B.2. Implementation of SARD

We implement SARD based on Transform2Act (Yuan et al., 2021), which uses GNN-based (Scarselli et al., 2008; Bruna et al., 2013; Kipf & Welling, 2016) control policies. GNN-based policies can deal with variable input sizes across different robot designs by sharing parameters between joints. This property allows us to share policies across all designed robots.
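The variable-terrain generation procedure described in B.1 above can be sketched as follows; the segment structure and parameter ranges come from the text, while the function names and the strictly alternating flat/obstacle layout are our assumptions:

```python
import random

# Illustrative generator for the variable-terrain layout described in B.1:
# flat segments of length 1-3 m alternate with obstacle segments of length
# 4-8 m, with obstacle parameters drawn from the quoted ranges. A sketch,
# not the benchmark's actual terrain code.
def sample_terrain(total_len=100.0, seed=0):
    rng = random.Random(seed)
    segments, x = [], 0.0
    while x < total_len:
        flat_len = rng.uniform(1.0, 3.0)
        segments.append(("flat", flat_len, None))
        x += flat_len
        kind = rng.choice(["hills", "steps", "rubble"])
        if kind == "hills":
            param = rng.uniform(0.6, 1.2)      # sine-wave amplitude (m)
        elif kind == "steps":
            param = (8, 0.2)                   # 8 steps of height 0.2 m
        else:
            param = rng.uniform(0.2, 0.3)      # bump height (m)
        obst_len = rng.uniform(4.0, 8.0)
        segments.append((kind, obst_len, param))
        x += obst_len
    return segments

terrain = sample_terrain()
assert all(1.0 <= seg[1] <= 3.0 for seg in terrain if seg[0] == "flat")
assert all(4.0 <= seg[1] <= 8.0 for seg in terrain if seg[0] != "flat")
```

Re-seeding the generator per episode reproduces the "brand-new terrain in each episode" behavior described above.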
Note that our method is general and can be combined with other network structures used in modular RL, e.g., message-passing networks (Huang et al., 2020) and Transformers (Dong et al., 2022; Kurin et al., 2020). However, this sharing also has negative effects, e.g., joints in similar states will choose similar actions, which may severely hinder performance. To solve this problem, Transform2Act adds a joint-specialized MLP (JSMLP) after the GNNs; we follow this setting for fair comparison. For design-policy and control-policy learning, we use Proximal Policy Optimization (PPO) (Schulman et al., 2017), a standard policy gradient method (Williams, 1992). We provide the hyperparameters needed to replicate our experiments in Table 1, and we also include our code in the supplementary material. Experiments are carried out on NVIDIA RTX 2080 Ti GPUs. Taking Point Navigation as an example, SARD requires approximately 10 GB of RAM and 4 GB of GPU memory and takes about 36 hours to finish 50M time steps of training.

### B.3. Details of Baselines

**Transform2Act.** We use the official implementation of Transform2Act, where all networks and optimizations are implemented with PyTorch (Paszke et al., 2019). The GNN layers are GraphConv (Morris et al., 2019) as implemented in the PyTorch Geometric package (Fey & Lenssen, 2019). All policies are optimized with PPO (Schulman et al., 2017) with generalized advantage estimation (GAE) (Schulman et al., 2015). The authors searched the hyperparameters, and we list the selected values in Table 1. We also removed the initial design of Transform2Act to avoid any prior knowledge.

**Handcrafted Robot.** To show the strength of robot design, we also compare SARD with a Handcrafted Robot: the human-designed robot Ant² built with expert knowledge from OpenAI. We directly load the XML file and skip the Design Stage.
²https://github.com/openai/gym/blob/master/gym/envs/mujoco/assets/ant.xml

The Control Stage and optimization are the same as ours. We run it for 50M time steps and report the best performance.

Table 1. Hyperparameters of SARD and Transform2Act.

| Hyperparameter | Value |
| --- | --- |
| Skeleton Design Stage Time Steps N_skel | 5 |
| Attribute Design Stage Time Steps N_attr | 1 |
| GNN Layer Type | GraphConv |
| JSMLP Activation Function | Tanh |
| GNN Size | (64, 64, 64) |
| JSMLP Size | (128, 128) |
| Policy Learning Rate | 5e-5 |
| Value Learning Rate | 3e-4 |
| PPO Clip | 0.2 |
| PPO Batch Size | 50000 |
| PPO Mini-Batch Size | 2048 |
| PPO Iterations Per Batch | 10 |
| Training Epochs | 1000 |
| Discount Factor γ | 0.995 |
| GAE λ | 0.95 |
| Subgroup Exploration Rate ϵ | 0.01 |

Table 2. Training performance of SARD based on different base algorithms.

| Task | NGE | Transform2Act | SARD+NGE | SARD+Transform2Act |
| --- | --- | --- | --- | --- |
| Point Navigation | 1131.50 ± 458.45 | 1618.10 ± 1022.01 | 4729.00 ± 835.62 | 4262.78 ± 738.17 |
| Escape Bowl | 8.65 ± 1.88 | 32.37 ± 13.62 | 15.55 ± 3.69 | 88.61 ± 13.23 |
| Patrol | 1120.30 ± 425.89 | 1995.95 ± 709.07 | 3104.67 ± 1082.03 | 3116.47 ± 801.43 |
| Locomotion on Variable Terrain | 170.85 ± 48.05 | 443.22 ± 74.72 | 408.65 ± 74.95 | 1204.01 ± 96.16 |
| Locomotion on Flat Terrain | 238.65 ± 86.75 | 1067.16 ± 463.55 | 835.75 ± 366.25 | 2438.26 ± 297.09 |
| Manipulate Box | 1061.90 ± 541.47 | 1073.11 ± 467.38 | 1793.00 ± 27.80 | 1604.27 ± 137.72 |

### B.4. Details of Ablations

**SARD (Unstructured).** The Neighbor(G_i) function at iteration i is set to contain all subgroups; all other components are the same as in SARD.

**SARD (K=5).** Divide the interval between two adjacent subgroups into K = 5 parts and keep everything else the same as in SARD.

**SARD (K=1).** Do not divide the interval between two adjacent subgroups; only the original group structure defines the Neighbor(G_i) function at iteration i.

## C. Extra Results

### C.1. Combining SARD with Another Robot Design Method

Our method is a plug-and-play module that can be utilized in other robot design methods. Here we provide experimental results of combining our method (SARD) with NGE (Wang et al., 2019), an ES-based robot design method.
We report the results in Table 2, where SARD+NGE is the implementation of SARD based on NGE and SARD+Transform2Act is our original implementation of SARD based on Transform2Act, denoted SARD (K=3) in our paper. All runs are conducted with 3 random seeds, and each entry in the table is formatted as mean ± std. As shown in Table 2, SARD+NGE outperforms vanilla NGE in all tasks and is even better than our original implementation SARD+Transform2Act in two tasks. The improvement of SARD+BaseAlgo over BaseAlgo (BaseAlgo ∈ {NGE, Transform2Act}) showcases the generality of SARD.

Table 3. Training performance of SARD compared with Transform2Act on its original tasks.

| | Swimmer | 2D Locomotion | Gap Crosser |
| --- | --- | --- | --- |
| Transform2Act | 607.50 ± 89.02 | 3329.00 ± 2094.56 | 1352.20 ± 558.07 |
| SARD | 975.50 ± 9.10 | 3194.00 ± 1695.06 | 1824.43 ± 1322.30 |

### C.2. Results of SARD on the Tasks Used in Transform2Act

For a complete comparison, we also provide extra results on the tasks used in Transform2Act (Yuan et al., 2021). We show the final performance comparison between SARD and Transform2Act in Table 3. Here SARD is our original implementation based on Transform2Act, denoted SARD (K=3) in our paper. All runs are conducted with 3 random seeds, and each entry is formatted as mean ± std. SARD outperforms Transform2Act in most tasks, which further validates the strength of our method. Also note that the 3D Locomotion task in their paper is similar to Locomotion on Flat Terrain in Table 2, so we omit it here. The reported results of Transform2Act are based on their released code.
Figure 7. Hyperparameter search over Dih_n: episode reward versus training time steps (up to 50M) on Point Navigation, Escape Bowl, Patrol, Locomotion on Variable Terrain, Locomotion on Flat Terrain, and Manipulate Box, comparing SARD (Ours, n ∈ {4}), SARD (n ∈ {3, 4, 5}), SARD (n ∈ {3}), and SARD (n ∈ {5}).

### C.3. Different Dihedral Groups

In this paper, we use the subgroups of the dihedral group Dih_n to represent various symmetries, and in Section 5 we set the hyperparameter n to 4. Here we conduct a hyperparameter search to verify this choice; the result is shown in Figure 7. SARD (n ∈ {3}), SARD (n ∈ {4}), and SARD (n ∈ {5}) denote SARD with different dihedral groups, i.e., Dih_3, Dih_4, and Dih_5, respectively. For SARD (n ∈ {3, 4, 5}), we use all three groups simultaneously by regarding group elements with the same matrix representations as neighbors. SARD (n ∈ {4}) outperforms all others in most tasks and is only slightly worse than SARD (n ∈ {5}) in the Point Navigation task. This result validates our hyperparameter choice.

## D. Discussions of Dihedral Groups

In this paper, we use dihedral groups to describe the symmetry of robots mainly for two reasons. (1) Dihedral groups are general enough to represent a wide range of symmetries of robot morphologies: they are generated by basic reflectional and rotational symmetries, which can describe the characteristics of most effective robot morphologies. Besides, related works in biology (Savriama & Klingenberg, 2011; Pappas et al., 2021; Graham et al., 2010) also use the dihedral group as an effective tool to study the symmetry of real-world creatures. (2) Using larger groups may bring in extra learning complexity and lead to poor performance, even though larger groups can contain more symmetries.
In general, the dihedral group is a good trade-off between expressiveness and complexity.

## E. Limitations

In this paper, we use the subgroups of the dihedral group Dih_n to represent a wide range of symmetries while avoiding extra learning complexity. Our method has shown superior efficiency, but the dihedral group is a 2D symmetry group and only contains transformations in the xy-plane. Perhaps because of the influence of gravity, dihedral groups suffice to represent the symmetries of most real-world creatures. Nevertheless, exploring 3D symmetry groups (Savriama & Klingenberg, 2011) in the virtual robot design problem remains worthwhile and is a promising direction for future work. In addition, although the idea of symmetry applies to a wide range of tasks, it may not suit tasks that do not require symmetry, such as single-arm robotic manipulation tasks where we need to design a robot arm as well as its gripper for a particular manipulation task. Intuitively, efficient designs are mostly asymmetric in these situations, and a symmetry constraint might prevent the arm and manipulator from operating in a more effective way, thus hindering training.