Brain-Body-Task Co-Adaptation can Improve
Autonomous Learning and Speed
of Bipedal Walking

Darío Urbina-Meléndez¹, Hesam Azadjou¹, Francisco J. Valero-Cuevas¹,²,∗

¹ D.U.-M., H.A. and F.V.-C. are with the Alfred E. Mann Department of Biomedical Engineering, University of Southern California, Los Angeles, CA 90089 USA. [urbiname][azadjou][valero]@usc.edu

² F.V.-C. is also with the Division of Biokinesiology and Physical Therapy, University of Southern California, Los Angeles, CA 90089 USA.

∗ F.V.-C. is the corresponding author. [valero]@usc.edu

Abstract

Inspired by animals that co-adapt their brain and body to interact with the environment, we present a tendon-driven and over-actuated (i.e., n joints, n+1 actuators) bipedal robot that (i) exploits its backdrivable mechanical properties to manage body-environment interactions without explicit control, and (ii) uses a simple 3-layer neural network to learn to walk after only 2 minutes of ‘natural’ motor babbling (i.e., an exploration strategy that is compatible with leg and task dynamics; akin to child's play). This brain-body collaboration first learns to produce cyclical foot movements ‘in air’ and, without further tuning, can produce locomotion when the biped is lowered to be in slight contact with the ground. In contrast, training with 2 minutes of ‘naïve’ motor babbling (i.e., an exploration strategy that ignores leg and task dynamics) does not produce consistent cyclical movements ‘in air’, and produces erratic movements and no locomotion when in slight contact with the ground. When the biped was lowered further so that the desired leg trajectories reached 1 cm below ground (making the error between desired and obtained trajectories unavoidable), cyclical movements based on either natural or naïve babbling presented almost equally persistent trends, and locomotion also emerged with naïve babbling. Therefore, we show how continual learning of walking in unforeseen circumstances can be driven by continual physical adaptation rooted in the backdrivable properties of the plant and enhanced by exploration strategies that exploit plant dynamics. Our studies also demonstrate that the bio-inspired co-design and co-adaptation of limbs and control strategies can produce locomotion without explicit control of trajectory errors.

Index Terms:

biped, brain-body-task, co-adaptation, locomotion, motor-babbling, natural-babbling, limited-experience, tendon-driven

I Introduction

Active and explicit control of robotic bipedal locomotion poses multiple challenges, including: i) hybrid dynamics that transition among single- and double-leg stances and aerial phases [1], and ii) actuators with insufficient bandwidth to manage instantaneous impacts [2]. To address these challenges, studies that take inspiration from the musculature of organisms have incorporated mechanical components and architectures that reduce limb inertia by implementing cable-driven (i.e., tendon-driven) structures [3], and that increase the use of passive limb properties to manage impacts [2, 4]. Furthermore, approaches like Zero Moment Point (ZMP) enable balance during bipedal locomotion via quasi-static foot placements [5], as in the low-impact ASIMO robot [6], for which avoiding impacts with the environment is an important design consideration. Truly agile robots need to break from this ‘fear’ of impacts and transition to more dynamic cases. To that end, theories like Hybrid Zero Dynamics have been developed, in which a reset map allows the system to return to stable performance after the impulsive perturbations intrinsic to ground interaction in dynamic behavior [1]. At the furthest extreme, there are robots whose own structure allows them to produce locomotion without feedback control (e.g., [7]). Proof of principle comes from passive walkers that can produce useful movements without sensors and/or actuators [8, 9].

Largely missing from current approaches, however, is the most enviable capability of biological organisms: the ability to co-adapt their control strategies with their bodies to learn locomotion on their own. Therefore, we focused on creating a tendon-driven and over-actuated (i.e., 2 joints, 3 actuators) bipedal robot that implements such brain-body co-adaptation to learn locomotion. To do so, we combined two bio-inspired features: i) backdrivable limbs that adapt to environmental physical constraints (akin to musculotendons) and ii) motor babbling compatible with leg and task dynamics, which allows brain-body collaboration through sparse physical actions (akin to child's play [10, 11]) to heuristically learn to perform tasks [12, 13, 14].

We present a “Natural” motor babbling strategy as an extension of the G2P (“General to Particular”) model-agnostic algorithm [15], which enables bio-inspired learning of locomotion movements in tendon-driven robotic limbs. This natural babbling strategy is an improvement on the naïve babbling strategy previously used by G2P. Data collected during both natural and naïve babbling are used to train a simple 3-layer artificial neural network (ANN) that represents the inverse map from 6D limb kinematics (i.e., for our robot, proximal and distal joint positions, velocities, and accelerations) to 3D motor control sequences (i.e., three motors actuating the joints through tendons).

In [15], it is seen how a naïve babbling strategy causes approximately 80% of the generated data to lie on the edges of the configuration space, away from the area where the locomotion solutions lie. In contrast to this naïve strategy (which persistently coactivates antagonist actuators, imposing movements that can conflict with leg dynamics), natural babbling resembles muscle mutual inhibition in living organisms [16, 17]. This promotes more informative sensory feedback that is compatible with the limb properties. In detail, when performing natural babbling, motor activations: i) produce joint rotations away from their limits of rotation and ii) follow sinusoidal patterns instead of step functions (with a phase shift of 180 ± 20 degrees for pairs of motors that act on the same joint; in other words, two antagonist motors are not simultaneously activated with high activation values). As a result, the leg joints are more homogeneously exposed to the region of the configuration space where locomotion patterns lie, promoting a higher success rate of learning of robotic locomotion (the ratio of experiments where walking is learned to those where it is not learned).

We demonstrate with a physical robot that locomotion can emerge from the co-adaptation of actions learned from limited experience (i.e. few shots of training) enabled by backdrivable limbs that implicitly manage body-environment interactions. The critical factors offered by this algorithm are the data efficiency and low-budget computational requirements, which can serve as a baseline for the lifelong learning of bipedal robots. Our study emulates the adaptive behavior of animals, where continual success of learned actions relies on useful brain-body-environment interaction [18, 19].

II Methods

II-A Robot characteristics

Figure 1: A. Tendon route diagram of one leg. B. Render of the 3D model of the biped. C. Photograph of the tendon-driven bipedal robot. To reduce rotational inertia, motors M1 and M2 (Maxon DCX16S GB KL 24V, 21:1 reduction ratio gearhead) are placed distally to the joints.

For our experiments, we built and used a tendon-driven physical bipedal robot (Figure 1). Each of its legs has hip and knee joints and a ball foot to facilitate the relative rotation of the lower section of the leg with respect to the ground.

The mechanical power to the joints is provided by a structure that resembles a muscle: the force is provided by a motor, while the muscle-joint interface (which in our robot would be the motor-joint interface) is a string that we call tendon. This robot is over-actuated since it has more actuators than degrees of freedom (DoF). The tendon route is shown in Figure 1-A.

The tendon routing of our robot is an evolution of the routing for the robot in our previously published paper [15], where all the motors were placed distally to the leg (i.e., in the hip). Here we simplify the tendon routing by placing only two motors distal to the leg and the third in the thigh. This design decision was made to reduce the torques driving the hip joint, thus potentially simplifying the task of learning a useful movement. The motors (Maxon DCX16S GB KL 24V) include a gearhead (with a reduction ratio of 21:1). The motors are called M1, M2, and M3; for details on their location please refer to Figure 1. Consider two motors A and B set to the same voltage level and mechanical load, A with a gearhead and B without one: motor A reduces the backdrivability of the limb while increasing its mechanical power output capabilities. This is an advantage if the design of the robot is later changed to a heavier one due to a bigger body size and/or the addition of more components (e.g., sensors and actuators).

The range of motion of the joints was larger than in our previous robot designs, allowing us to explore the capability of the robot to track a desired trajectory without hard stops providing physical help. Here it is important to mention that in locomotion experiments the movement of a robot is typically physically limited by two components that serve as boundaries of its feasible configuration space: mechanical constraints (i.e., hard stops) in its own body, and environmental constraints (i.e., objects or the ground itself). By designing our robot to have large ranges of motion that are normally not reached while performing tasks (Figures 4-B and 5-B), we focus on the role that environmental constraints have on the resultant performance of a task.

To keep rotational inertia as low as possible (which has a direct impact on the power consumed to meet the demands of leg movement), and to increase the stiffness of the legs, we used aluminum tubes as the main components of the legs. We used additive manufacturing (3D printing) techniques to construct the joints. We also implemented easy tendon attachment points to facilitate the replacement of tendons, which are the part of the robot that breaks most often.

We built a gantry to support the biped, only allowing its hip to move along the x and z axes in its sagittal plane. The gantry prevents the biped from falling down, allowing us to focus solely on the task of learning a locomotion cycle.

II-B General G2P overview

The first version of the learning algorithm that we use, called General to Particular (G2P), was developed in [15]. This algorithm uses an Artificial Neural Network (ANN) as a map from inputs to outputs (desired kinematics to motor activations, respectively) (Figures 2 and 3). The ANN is trained with input-output data sets obtained from babbling and tested with input-output data sets, also obtained from babbling, by predicting outputs given inputs. The predicted outputs are compared with the ground-truth motor activations; the difference between predicted and recorded values is the error, and the goal is to reduce it. The testing/training data set size ratio is 0.25. We use one ANN per leg. In [15], G2P refines this map with a reinforcement learning approach; in this paper we do not use that part of the algorithm, since we are interested in understanding the value of the data obtained during babbling.
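The data split described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that the stated ratio of 0.25 means n_test / n_train (so the test set is 20% of all samples), and it assumes a random shuffle before splitting. The function name `split_babbling_data` and the synthetic arrays are hypothetical.

```python
import numpy as np

def split_babbling_data(kinematics, activations, test_to_train_ratio=0.25, seed=0):
    """Shuffle paired samples and split so that n_test / n_train ~= ratio."""
    kinematics = np.asarray(kinematics)
    activations = np.asarray(activations)
    n = len(kinematics)
    # ratio r = n_test / n_train  =>  n_test = n * r / (1 + r)
    n_test = int(round(n * test_to_train_ratio / (1.0 + test_to_train_ratio)))
    order = np.random.default_rng(seed).permutation(n)
    test_idx, train_idx = order[:n_test], order[n_test:]
    return (kinematics[train_idx], activations[train_idx],
            kinematics[test_idx], activations[test_idx])

# Hypothetical stand-in data: 1000 samples, 6 kinematic inputs, 3 motor outputs.
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 6))
Y = rng.uniform(0, 255, size=(1000, 3))
X_train, Y_train, X_test, Y_test = split_babbling_data(X, Y)
```

If the ratio were instead meant as test samples over all samples, `n_test` would simply be `n * 0.25`; the text does not settle which reading applies.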

The ANN used for Natural and Naïve G2P (Figures 2 and 3, respectively) represents the inverse map from 6D limb kinematics (i.e., for our robot, proximal and distal joint positions, velocities, and accelerations) to 3D motor control sequences (i.e., three motors actuating the joints through tendons). It has three fully connected layers (input, hidden, and output layers) with 6, 15, and 3 nodes, respectively.

As the transfer function for all nodes, we selected the hyperbolic tangent sigmoid function, an S-shaped function that produces a bounded output between -1 and 1. We chose this function over the logistic sigmoid because its gradient is larger. The higher gradient produces greater sensitivity to changes in the input values and therefore larger weight updates (thus potentially faster learning). We also scaled the values of the output layer (which lie between -1 and 1) to cover the whole range of motor control values (0 to 255).
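A minimal sketch of a 6-15-3 network with tanh transfer functions and an output rescaling to the 0 to 255 motor range is given below. Several details are assumptions made only for illustration: the rescaling is taken to be linear, the tanh is applied to the hidden and output layers, and the weights are drawn from a small Gaussian rather than with the initialization used in the paper. The names `init_network`, `forward`, and `to_pwm` are hypothetical.

```python
import numpy as np

def init_network(rng, n_in=6, n_hidden=15, n_out=3):
    # Small random weights for illustration only (not the paper's initialization).
    return {
        "W1": rng.normal(scale=0.1, size=(n_hidden, n_in)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(scale=0.1, size=(n_out, n_hidden)),
        "b2": np.zeros(n_out),
    }

def forward(params, kinematics):
    """Map a 6D kinematic vector to three tanh outputs in (-1, 1)."""
    hidden = np.tanh(params["W1"] @ kinematics + params["b1"])
    return np.tanh(params["W2"] @ hidden + params["b2"])

def to_pwm(outputs, pwm_min=0, pwm_max=255):
    """Linearly rescale values in [-1, 1] to integer PWM commands (assumed mapping)."""
    scaled = (np.asarray(outputs) + 1.0) / 2.0 * (pwm_max - pwm_min) + pwm_min
    return np.clip(np.rint(scaled), pwm_min, pwm_max).astype(int)

params = init_network(np.random.default_rng(0))
# Joint positions, velocities, and accelerations for two joints (made-up values).
sample = np.array([0.1, -0.2, 0.0, 0.3, 0.05, -0.1])
motor_commands = to_pwm(forward(params, sample))
```

With a linear mapping, an output of -1 corresponds to PWM 0 and an output of 1 to PWM 255, so the bounded tanh output can never request a command outside the motor range.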

The weights and biases were initialized based on the Nguyen–Widrow initialization algorithm [20, 21]; with this, we avoid initializing weights close to the regions where the gradient of the transfer function has very small or high values. Having initial values localized in the mentioned region creates undesired output saturation. To obtain the best results, this approach randomly initializes weights close to the midpoint of the transfer function (i.e., 0 for the cases of our experiments).

As a performance (error) function, we used the mean squared error (MSE): the mean of the squared differences between the values predicted by the ANN and the ground-truth values. Training aims to minimize this overall prediction error.

This error is propagated backward to update the initial weights. Backpropagation is performed with the Levenberg–Marquardt technique, and the assignment of new weights is done with Adaptive Moment Estimation (Adam), a gradient descent method chosen over Momentum, AdaGrad, and RMSProp. Adam is a standard go-to method since it combines benefits of both Momentum and RMSProp: to find the best model weights, it leverages the speed typical of Momentum and the adaptability to gradients with different orientations that RMSProp handles well. Each completed backpropagation pass is counted as one epoch. We set the maximum number of epochs to 100; training also stops early when there is no improvement for 5 epochs.
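The stopping rule (at most 100 epochs, stop after 5 epochs without improvement) can be sketched as a generic loop. This is an illustration under assumptions: the text does not say which error is monitored, so the sketch takes a caller-supplied `validation_error` function, and both callables are hypothetical placeholders for the actual training step.

```python
def train_with_early_stopping(train_one_epoch, validation_error,
                              max_epochs=100, patience=5):
    """Run epochs until max_epochs or until `patience` epochs pass without improvement."""
    best_error = float("inf")
    epochs_without_improvement = 0
    for epoch in range(1, max_epochs + 1):
        train_one_epoch()
        error = validation_error()
        if error < best_error:
            best_error = error
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break
    return epoch, best_error

# Toy run: the error improves for three epochs and then stalls.
errors = iter([1.0, 0.5, 0.4, 0.45, 0.46, 0.47, 0.48, 0.49, 0.3])
stopped_at, best = train_with_early_stopping(lambda: None, lambda: next(errors))
```

In the toy run the best error is reached at epoch 3, the next five epochs bring no improvement, and the loop exits at epoch 8 without ever seeing the later value of 0.3.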

II-C Natural babbling: changes to G2P babbling strategy

As mentioned in the introduction, we made changes to the babbling strategy of G2P to more homogeneously expose the leg joints to the areas in its configuration space where locomotion patterns lie. To keep our focus on assessing the usefulness of the data to produce a mapping with which a desired trajectory can be tracked, we particularly tested the G2P capability to create motor activations to limb kinematics map without any refinement to such a map. With this paper, we show that (for a two DoF, three actuators leg) properly obtained data can be enough to train an ANN to produce useful movement (more details in results and discussion sections).

Before explaining the details of natural babbling, it is important to highlight that naïve babbling consists of random step variations of the PWM signal for each of the motors, and that each motor signal is independent of the others (frequency of step changes: 1.3 Hz), as shown in Figure 2, rightmost panel. For natural babbling, we modified the randomness of motor activations by including the rule that the activation levels of two antagonist motors should be significantly different (as observed for motors M2 and M3 in Figure 3, rightmost panel).
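As a rough sketch of naïve babbling, the generator below draws one independent random PWM level per motor for each step interval at 1.3 Hz over two minutes. The uniform distribution over 0 to 255 and the function name `naive_babbling` are assumptions for illustration; the text only states that the steps are random and independent across motors.

```python
import numpy as np

def naive_babbling(duration_s=120.0, step_hz=1.3, n_motors=3,
                   pwm_min=0, pwm_max=255, seed=0):
    """Return one random PWM level per motor for each step interval."""
    rng = np.random.default_rng(seed)
    n_steps = int(round(duration_s * step_hz))
    # Each motor draws its level independently of the others.
    return rng.integers(pwm_min, pwm_max + 1, size=(n_steps, n_motors))

commands = naive_babbling()  # shape: (number of steps, number of motors)
```

Because the three columns are drawn independently, nothing prevents two antagonist motors from being strongly activated during the same step, which is the behavior natural babbling is designed to avoid.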

For natural babbling (Figure 3), the PWM signal for each of the motors follows a sinusoidal profile. Since the mean value of the signals is 0, only the positive section is used. For each motor, the signal amplitude is varied randomly. The M1 and M2 signals have a phase shift of 180 deg. This avoids simultaneous activations of the motors, which would produce no hip movement [16, 17]. Every 15 seconds, the phase between M1 and M3 was increased by 36 deg, and the baseline of each signal varied by ±30 PWM units (approximately ±1 V). Because this is a digital system, the signal is discretized: the sinusoid-like shape is built from a series of steps. Step frequency: 6 Hz. Sinusoid frequency (every time a period is completed): 0.6 Hz. Frequency of each signal peak: 1.3 Hz. Each peak (natural babbling) has approximately the same width as each step of naïve babbling. All frequencies are reported as approximate values, since slight variations in signaling and sampling frequency are intrinsic to the microcontroller behavior. The limits of rotation of each of the joints were never reached with natural babbling, a crucial point for our results and conclusions (Figure 4-B).
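The following sketch generates half-rectified sinusoidal PWM signals with the parameters listed in this paragraph (6 Hz steps, 0.6 Hz sinusoid, 15 s blocks, 36 deg phase increments). It follows the phase relations literally as stated here (M2 in antiphase with M1, M3 drifting relative to M1). The amplitude range, the non-negative baseline drawn from 0 to 30 PWM units, and redrawing both once per block are assumptions, as is the function name `natural_babbling`.

```python
import numpy as np

def natural_babbling(duration_s=120.0, sample_hz=6.0, sine_hz=0.6,
                     block_s=15.0, phase_step_deg=36.0, baseline_max=30,
                     pwm_max=255, seed=0):
    """Return an (n_samples, 3) array of PWM commands for M1, M2, M3."""
    rng = np.random.default_rng(seed)
    n_samples = int(round(duration_s * sample_hz))
    t = np.arange(n_samples) / sample_hz
    block = (t // block_s).astype(int)            # index of the current 15 s block
    n_blocks = int(block.max()) + 1
    # Phase offsets in degrees: M2 is in antiphase with M1, M3 drifts per block.
    phase_deg = np.stack([
        np.zeros(n_samples),
        np.full(n_samples, 180.0),
        block * phase_step_deg,
    ], axis=1)
    # Random amplitude and baseline per motor, redrawn every block (assumption).
    amplitude = rng.uniform(0.3, 1.0, size=(n_blocks, 3))[block] * (pwm_max - baseline_max)
    baseline = rng.uniform(0, baseline_max, size=(n_blocks, 3))[block]
    wave = np.sin(2 * np.pi * sine_hz * t[:, None] + np.deg2rad(phase_deg))
    rectified = np.clip(wave, 0.0, None)           # keep only the positive half
    pwm = baseline + amplitude * rectified
    return np.clip(np.rint(pwm), 0, pwm_max).astype(int)

pwm = natural_babbling()
```

With the 180 deg shift, whenever one of M1 or M2 is above its baseline the other sits at its baseline, so the two are never strongly activated together.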

II-D Desired foot trajectory characteristics and variations

Before performing hardware experiments, we did a forward kinematics analysis of possible limb movements that allowed us to obtain the desired joint evolution profiles and ranges shown in Figures 4-A and 4-B. The resultant foot trajectory allows the front and back swings of the foot to have different heights (Figure 4-C).

As shown in Figure 6, we divide our experiment into three main conditions determined by the location of the desired trajectory with respect to the ground. The desired trajectory always has the same distance to the robot’s hip; changing the position of the desired trajectory therefore requires changing the robot’s hip height by re-configuring the gantry that prevents the robot from falling down (Figure 1-C). Depending on its location, all, a fraction, or none of the desired trajectory is reachable by the feet of the biped. The three conditions are:

1. Condition 1: Desired trajectories in air. Movement happens only in air, with no interaction with the ground. When performing movements, the foot trajectories are limited only by the characteristics of the biped itself (Figure 6-A).

2. Condition 2: Desired trajectories in slight contact with the ground. The desired foot trajectories are only partially reachable, since they are partially under ground level. In other words, the ground constrains the movement of the robot to stay above the boundary marked by the ground (Figure 6-B).

3. Condition 3: Desired trajectories 1 cm under the ground. The desired trajectories are unreachable, since they are completely under ground level. This is the condition where the biped’s movements are most constrained. Also, for this condition, the area of the feasible joint configuration space is smaller than in Conditions 1 or 2 (i.e., here the biped’s movements are constrained to lie between the limit imposed by the ground and the limits of joint rotation) (Figure 6-C).

II-E Hardware experiments steps

The following steps were performed using both naïve and natural babbling. Eight trials of this experiment were performed, four based on naïve babbling and four on natural babbling. If the biped displaces its body mass by 40 cm, we consider the trial a successful walking trial. The success rate is calculated by dividing the number of successful trials by the number of performed trials of a particular kind (i.e., condition and type of babbling data used). When a result is reported as “mean”, it is the average value from four trials. For the mean values of spread and detrended fluctuation analysis, the number of values considered is eight (left and right legs for each of four trials).

These are the steps we followed to perform our experiments:

1. Collect babbling data for two minutes (Figure 5). Babbling characteristics are described in Section II-B.

2. Train an ANN to map motor activations to limb kinematics as described in Section II-B.

3. With desired trajectories in air (i.e., Section II-D, biped suspended in air, no ground constraint), track the desired foot trajectory (Figure 6-A).

4. With desired trajectories in slight contact with the ground (i.e., Section II-D, biped’s hip at 40 cm off the ground), perform trajectory tracking (Figure 6-B). Measure the time the biped takes to travel 40 cm in case there is successful walking.

5. With desired trajectories 1 cm under the ground (i.e., Section II-D, biped’s hip at 39 cm off the ground), perform trajectory tracking (Figure 6-C). Measure the time the biped takes to travel 40 cm in case there is successful walking.

II-F Data analysis (Spread calculation)

We discretized the area within the desired trajectory into $1\times 1\ \mathrm{mm}^{2}$ pixels and checked whether the foot visited each pixel during a single babbling trial. We then quantified the spread as the ratio of occupied pixels to all pixels. Spread quantifies how well the algorithm (specifically, the babbling) explores different kinematics, based on the locations that the feet have passed through.
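One way to compute such a spread value is sketched below. It assumes the desired trajectory is given as a closed polygon in millimetres, that a pixel belongs to the region when its centre lies inside the polygon, and that a pixel counts as visited when at least one recorded foot position falls in it. These choices and the function names are illustrative assumptions, not details taken from the experiments.

```python
import numpy as np

def points_in_polygon(points, polygon):
    """Even-odd ray casting: True for points inside the closed polygon."""
    x, y = points[:, 0], points[:, 1]
    inside = np.zeros(len(points), dtype=bool)
    x1, y1 = polygon[:, 0], polygon[:, 1]
    x2, y2 = np.roll(x1, -1), np.roll(y1, -1)
    for xa, ya, xb, yb in zip(x1, y1, x2, y2):
        crosses = (ya > y) != (yb > y)
        # x coordinate where this edge meets the horizontal line through each point
        with np.errstate(divide="ignore", invalid="ignore"):
            x_cross = xa + (y - ya) * (xb - xa) / (yb - ya)
        inside ^= crosses & (x < x_cross)
    return inside

def spread(foot_xy_mm, trajectory_xy_mm, pixel_mm=1.0):
    """Fraction of pixels inside the trajectory that the foot visited."""
    foot = np.asarray(foot_xy_mm, float)
    traj = np.asarray(trajectory_xy_mm, float)
    x_min, y_min = np.floor(traj.min(axis=0) / pixel_mm).astype(int)
    x_max, y_max = np.ceil(traj.max(axis=0) / pixel_mm).astype(int)
    ix, iy = np.meshgrid(np.arange(x_min, x_max), np.arange(y_min, y_max),
                         indexing="ij")
    centres = np.stack([ix.ravel() + 0.5, iy.ravel() + 0.5], axis=1) * pixel_mm
    in_region = points_in_polygon(centres, traj).reshape(ix.shape)
    # Mark the pixel that contains each recorded foot position.
    fx = np.floor(foot[:, 0] / pixel_mm).astype(int) - x_min
    fy = np.floor(foot[:, 1] / pixel_mm).astype(int) - y_min
    ok = (fx >= 0) & (fx < ix.shape[0]) & (fy >= 0) & (fy < ix.shape[1])
    visited = np.zeros(ix.shape, dtype=bool)
    visited[fx[ok], fy[ok]] = True
    return (visited & in_region).sum() / in_region.sum()

# Toy check: a 10 mm square region whose left half is visited.
square = np.array([[0, 0], [10, 0], [10, 10], [0, 10]], float)
foot = np.array([[i + 0.5, j + 0.5] for i in range(5) for j in range(10)])
half_spread = spread(foot, square)
```

In the toy check, 50 of the 100 pixels inside the square are visited, so the spread is 0.5.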

II-G Data analysis (Detrended Fluctuation Analysis)

In Detrended Fluctuation Analysis (DFA), the fractal scaling component estimates a time series’ scaling behavior which represents the power law scaling behavior of the time series over various time scales. The steps for DFA are as follows:

1. First, we detrended the time series data of the endpoint’s distance to the hip from each trial by dividing the time series into non-overlapping windows of equal length and then fitting a first-degree polynomial to each window.

2. Then we divided the detrended series into smaller segments of equal length (boxes). The scale factor determines the length of the boxes.

3. Afterward, we calculated the root-mean-square fluctuation (F) for each box in the detrended series.

4. Then, we calculated the average root-mean-square fluctuation across all the boxes at a given scale.

5. We repeated steps 1 to 4 for different scale factor values and plotted the average fluctuation versus the scale factor (DFA curve).

6. Finally, we analyzed the DFA curve to check the time series data for long-term correlations. The DFA curve shows a power-law relationship between the fluctuation and the scale factor, quantified by the slope alpha (fractal scaling component) obtained by linear regression on a log-log scale.
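The steps above can be sketched in a few lines of NumPy. This is a generic DFA sketch rather than the analysis code used for the experiments: it adds the cumulative-sum (profile) step of the standard DFA formulation, which the steps above do not mention explicitly, and the scale values and the function name `dfa_alpha` are arbitrary choices for illustration.

```python
import numpy as np

def dfa_alpha(series, scales, order=1):
    """Estimate the DFA scaling slope alpha of a 1D time series."""
    x = np.asarray(series, float)
    # Integrated profile of the mean-subtracted series (standard DFA step, assumed here).
    profile = np.cumsum(x - x.mean())
    fluctuations = []
    for n in scales:
        n_boxes = len(profile) // n
        boxes = profile[:n_boxes * n].reshape(n_boxes, n)
        t = np.arange(n)
        rms = []
        for box in boxes:
            # Remove the local polynomial trend, then take the RMS of the residual.
            trend = np.polyval(np.polyfit(t, box, order), t)
            rms.append(np.sqrt(np.mean((box - trend) ** 2)))
        fluctuations.append(np.mean(rms))   # average fluctuation at this scale
    # Slope of log F(n) versus log n.
    alpha, _ = np.polyfit(np.log(scales), np.log(fluctuations), 1)
    return alpha

rng = np.random.default_rng(0)
white_noise = rng.normal(size=4096)
random_walk = np.cumsum(white_noise)
scales = [16, 32, 64, 128, 256]
alpha_white = dfa_alpha(white_noise, scales)
alpha_walk = dfa_alpha(random_walk, scales)
```

As a sanity check, uncorrelated white noise is expected to give a slope near 0.5, while its running sum (a random walk, which is strongly persistent) gives a clearly larger slope.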

A higher fractal scaling component indicates that the time series exhibits stronger long-term correlations or persistence over various time scales, which means that the fluctuations in the time series at larger time scales are more correlated, and the time series has a more persistent trend. Conversely, for a lower fractal scaling component, this analysis indicates weaker long-term correlations or anti-persistence in the time series, which means that the fluctuations at larger time scales are less correlated, and the time series has a less persistent trend [22, 23, 24]. We use the persistence of trends and strength of correlation in the legs’ movements as a criterion to compare how well and robustly the biped walks (in case walking is achieved) in different cases and conditions.

III Results

III-A Exploiting limb mechanical properties increases the spread of training data and increases success rate of locomotion learning

All results reported in this subsection correspond to babbling data and walking attempts for Condition 2: Desired trajectories in slight contact with the ground (Figure 6-B). As a reminder, the success rate is calculated by dividing the number of successful trials by the number of performed trials of a particular kind (i.e., condition and type of babbling data used).

Two minutes of natural babbling data are enough to produce locomotion, while 2 minutes of naïve babbling data are not (Figure 6-B). With natural babbling, G2P learned walking in 75% of the trials, compared to 0% of the naïve babbling trials. The mean displacement speed for successful natural-babbling-based trials was 1.9 cm/sec. Speeds for the 3 successful trials (out of 4): 2.45, 1.96, and 1.3 cm/sec.

The difference between the naïve and natural cases, as previously described, resides in the babbling data. More spread babbling data (natural babbling data are more spread than naïve babbling data, as shown in Figure 5) indicate that the babbling was more successful in exploring the leg kinematics, which is the primary purpose of babbling. Consequently, training with naïve babbling data results in a lower success rate than training with natural babbling data.

As shown in Figure 5, natural babbling data are closer to the regions of the configuration space where locomotion solutions lie. If we analyze the spread of this data within the area delimited by a desired trajectory, we see that the spread for the natural babbling data is higher than that of the naïve babbling data. For the trial presented in Figure 5, left-right leg spread of naïve babbling data: 0.14 and 0.18 respectively; left-right leg spread of natural babbling data: 0.60 and 0.53 respectively. Mean spread values for naïve and natural babbling data respectively are: 0.55 and 0.95.

In [25] it is described how a model, to be able to describe a system and accurately predict its behavior, needs to be trained with samples that span the entire range of values such samples could take. In our experiments, most of the naïve babbling points lie away from the desired trajectory and few lie inside it, which in many cases fails to train a model that can accurately predict the behavior inside the desired trajectory. In this work, the behavior is the set of motor commands that pull on the tendons to produce cyclical movements close to the desired trajectory. This is seen in Figure 6-A, where the blue trajectories, based on a model trained with naïve babbling data, fail to closely resemble the desired trajectory. In contrast, natural babbling points, which are more spread and lie inside the desired trajectory, are better suited to train a model that can predict the motor activations required to produce cyclical foot trajectory patterns resembling the desired trajectory. This is seen in Figure 6-A, where the green trajectories, based on a model trained with natural babbling data, resemble the desired trajectory better than those from the naïve-babbling-based experiments.

III-B Placing desired trajectories completely under ground level increases walking success rate and produces faster walking

When the desired trajectories were 1 cm under the ground (Condition 3) (Figure 6-C), G2P learned supported bipedal walking in 100% of the trials based on both naïve and natural babbling. Naïve case speeds were 1.79, 3.27, 1.7, and 2.18 cm/sec; natural case speeds were 5.03, 4.93, 6.19, and 3.81 cm/sec. The mean displacement speeds for these cases were 2.23 cm/sec and 4.99 cm/sec, respectively. For trials based on natural babbling, when going from the condition where the desired trajectories are in slight contact with the ground (Condition 2) to the condition where the desired trajectories are 1 cm under ground, mean speed rose from 1.9 cm/sec to 4.99 cm/sec (to roughly 262% of its Condition 2 value), and the success rate increased from 75% to 100%. For the trials based on naïve babbling, the success rate increased from 0% to 100%.
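The summary figures can be checked directly from the per-trial speeds reported in this section. The short calculation below uses only those reported values; the variable names are illustrative.

```python
from statistics import mean

# Per-trial speeds in cm/sec, as reported in the Results.
condition2_natural = [2.45, 1.96, 1.3]          # 3 successful trials out of 4
condition3_naive = [1.79, 3.27, 1.7, 2.18]
condition3_natural = [5.03, 4.93, 6.19, 3.81]

mean_c2_natural = mean(condition2_natural)
mean_c3_naive = mean(condition3_naive)
mean_c3_natural = mean(condition3_natural)

# Ratio of Condition 3 to Condition 2 mean speed for natural babbling.
speed_ratio = mean_c3_natural / mean_c2_natural

# Success rate: successful trials divided by performed trials.
success_rate_c2_natural = len(condition2_natural) / 4
```

The means come out at about 1.90, 2.24, and 4.99 cm/sec, and the ratio of the natural-babbling means is about 2.6, meaning the Condition 3 mean speed is roughly 262% of the Condition 2 mean.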

For the condition where the desired trajectories are in slight contact with the ground (Condition 2), the biped can only barely touch the ground with fully straight legs, reducing the work that the legs produce to only the swing of the hip. In contrast, when the desired trajectories are 1 cm under ground (Condition 3), the biped can produce work with both hip swing and knee flexion (Figure 6).

Compared to the in-air performance of the biped (desired trajectories in air), when the desired trajectories are in slight contact with the ground, the scaling behavior (see Detrended Fluctuation Analysis in Methods, Section II-G) drops for all our experiments (Figure 7). This shows, as expected, that following the trend on the ground is more complicated for the biped than in the in-air condition. When trained with naïve babbling data and when the desired trajectories are 1 cm under ground (compared to when the desired trajectories are in slight contact with the ground), the generated movement shows significantly higher scaling components (p ≈ 0.03), indicating more persistent locomotion. On the other hand, when trained with natural babbling data and when the desired trajectories are 1 cm under ground (compared to when the desired trajectories are in slight contact with the ground), the scaling component is not significantly different (p ≈ 0.22); however, there is less variance from trial to trial.

For cases trained with natural babbling data, moving the desired trajectory from slightly touching the ground to being completely under ground increases walking speed. This is because, for cases based on natural babbling training data, walking has already emerged when the desired trajectory slightly touches the ground. On the other hand, for cases based on naïve babbling training data, walking first emerges when the desired trajectories are placed 1 cm under ground. Both naïve and natural cases improve when trajectories are placed under ground level, but naïve cases show less improvement after locomotion emerges than cases based on natural babbling.

IV Discussion

This paper aims to motivate the creation of bipedal robots that learn locomotion via data-driven co-adaptation with the dynamics of the plant to manage interactions with the environment. This is made possible by using motor babbling to inform a motion planning strategy that produces cyclical movements that can undergo useful adaptations thanks to the backdrivable and impact-resilient properties of the legs. These properties allow the unsupervised modification of a previously learned behavior to enable the emergence of locomotion under different (previously unseen) conditions. We find that a bio-inspired approach to ‘natural’ motor babbling compatible with the dynamics of the tendon-driven legs improves the success of locomotion learning and performance compared to ‘naïve’ arbitrary motor babbling. The techniques presented here could be further complemented by other relevant approaches such as the calculation of parameters useful to maintain a balanced gait such as zero moment point (ZMP) [5], or hybrid zero dynamics (HZD) [1] that explicitly considers the transitions between locomotor contact states. Even though these techniques are not necessary for the successful performance of our robot, in general they are potential options to further complement the experiments of this paper which do not focus on balance, but particularly on the generation of useful cyclical movements for locomotion.

A central aspect of our results is that the robot’s backdrivable limbs interact with the environment by allowing their movements to adapt to where the desired trajectory of a walking action is located with respect to the ground: in air, in partial contact with the ground (partially reachable) or under ground level (unreachable). For each of these conditions, interference of the desired trajectory with the ground was progressively greater, and the adaptation of a previously learned action was automatically modulated. Thus, the success of the resulting behavior does not depend on explicitly modulating or reducing errors. Rather—similar to the adaptive behavior observed in the locomotion of insects [26], crustaceans [27], and birds [28]—successful locomotion emerges because of, and not in spite of, brain (or controller)-body-environment interactions [18]. This adaptation happened with a performance strategy not explicitly aware of interference or impacts with the environment.

For the natural babbling case (compared to naïve case), we found higher fractal scaling components for cyclical movements with ground interference, as shown in Figure 7-Condition 2, suggesting they are more persistent. In the case of naïve babbling, increased ground interference had a more profound effect. When there was slight contact with the ground, we saw no locomotion and lower fractal scaling components (Figure 7-Condition 2). But when contact with the ground was further increased (Figure 7-Condition 3), locomotion emerged from cyclical movements with higher fractal scaling components comparable to those for natural babbling. These results points to counterintuitive controller-body-environment interactions that produce better locomotion as interference with the ground increases. While we expected the reduction of the workspace of the leg to hamper locomotion, it seems the compliance of the legs (due to their backdriveability) adapt sufficiently well to shape the limits cycles to produce locomotion without the control signals being explicitly aware of control trajectory errors.

Another fundamental aspect of this study is that we prescribed a type of motor babbling (i.e., natural motor babbling) that is compatible with, and exploits, the bio-inspired mechanical properties of the tendon-driven limbs. Although our approach is similar in principle to that of Berniker et al. [29], where the anatomical properties of a bio-inspired limb are exploited, we develop these strategies directly in hardware (and not in simulation). Moreover, we do not explicitly simplify the task by prescribing recurring muscle patterns (i.e., muscle synergies) to produce limb movements. Instead, it is our natural motor babbling that implicitly finds useful patterns of motor activations to the tendons. In fact, natural motor babbling is one of the important extensions to our prior work on autonomous learning of locomotion [15]. By using this type of motor babbling, which tends to avoid antagonist motor commands, we take inspiration from biological organisms, where co-contraction can be energetically wasteful. Actions that leverage the backdrivable mechanical properties of the plant, compatible with the over- and under-determined actuation of its tendon-driven limbs, are parallel to one of the fundamental blocks of limb function [30, 19] to produce oscillatory limb movements (e.g., leg swing [17]).
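The contrast between the two babbling strategies can be illustrated with a small sketch. It is a hypothetical simplification for a single antagonist pair of motors, not the implementation used on the robot (which is available in the supplementary repository). The function names, the hold durations, and the activation ranges are all assumptions chosen only to show the idea: naïve babbling draws each activation independently, whereas natural babbling alternates which motor is strongly active so that the two are rarely co-activated.

```python
import random


def naive_babbling(n_steps, rng, hold=40):
    """Each motor independently draws a uniform activation, held for `hold` steps."""
    a, b = [], []
    while len(a) < n_steps:
        level_a, level_b = rng.random(), rng.random()
        a.extend([level_a] * hold)
        b.extend([level_b] * hold)
    return a[:n_steps], b[:n_steps]


def natural_babbling(n_steps, rng, half_period=(20, 60),
                     high=(0.6, 1.0), low=(0.0, 0.2)):
    """Alternate the strongly active motor so activations oscillate out of phase.

    The ranges `high` and `low` are illustrative assumptions; they keep the
    activation difference between the two motors at 0.4 or more.
    """
    a, b = [], []
    a_is_strong = True
    while len(a) < n_steps:
        duration = rng.randint(*half_period)  # pseudo-random half-cycle length
        strong = rng.uniform(*high)
        weak = rng.uniform(*low)
        if a_is_strong:
            a.extend([strong] * duration)
            b.extend([weak] * duration)
        else:
            a.extend([weak] * duration)
            b.extend([strong] * duration)
        a_is_strong = not a_is_strong  # swap roles for the next half cycle
    return a[:n_steps], b[:n_steps]


def mean_coactivation(a, b):
    """Average of the smaller activation: a simple proxy for co-contraction."""
    return sum(min(x, y) for x, y in zip(a, b)) / len(a)


rng = random.Random(0)
naive_a, naive_b = naive_babbling(2000, rng)
natural_a, natural_b = natural_babbling(2000, rng)

naive_co = mean_coactivation(naive_a, naive_b)
natural_co = mean_coactivation(natural_a, natural_b)
print(f"mean co-activation, naive: {naive_co:.2f}, natural: {natural_co:.2f}")
```

With these assumed ranges the weaker motor never exceeds an activation of 0.2 during natural babbling, so the co-activation proxy is bounded well below what independent uniform draws produce on average.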

V Conclusion

We made changes to the training babbling strategy of G2P to more homogeneously expose a biped’s leg joints to the areas in its configuration space where locomotion patterns lie. We did that by implementing a natural babbling strategy that exploits the tendon-driven bio-inspired mechanical properties of its limbs (i.e., oscillatory movements produced by oscillatory activations, with a significant difference in activation level between antagonist motors). We observed that natural babbling reduces the spread of training data and increases the success rate of locomotion learning when environmental constraints are minimal (Condition 2 of our experiments). Furthermore, we observed that increasing the environmental constraint on the system (interference between the ground and the desired trajectories) increased the tendency of the plant to behave homogeneously across trials (regardless of whether the trials were based on natural or naïve babbling). This shows how, even though the environment (i.e., the ground) generates higher desired-vs-obtained trajectory errors, it also collaborates with the backdrivable biped legs by “guiding” them to perform a successful task by reducing their feasible configuration space.

We present proof-of-principle that effective locomotion can emerge from brain-body-environment interactions driven by a controller that does not aim to reduce errors with respect to desired locomotion trajectories. We find that these effective interactions arise from the co-adaptation facilitated by the bio-inspired backdrivable properties of the limbs. Moreover, the motor commands for cyclical movements are informed by pseudo-random motor babbling that exploits the bio-inspired tendon-driven mechanical and dynamical properties of the limbs. This demonstrates that the bio-inspired co-design and co-adaptation of limbs and control strategies can produce locomotion without explicit control of trajectory errors.

VI Supplementary material

https://github.com/DarioUrbina/natural_babbling

Conflict of Interest Statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Author Contributions

DU-M led the writing of this manuscript, the conceptualization of the studies, the building of the hardware for the experiments, the experimentation, the result analysis, the figure making and the discussion. DU-M performed the experiments and collected the data. HA performed the Detrended Fluctuation Analysis and prepared its figure, participated in the discussions to analyse results and helped write parts of the methods and introduction of the manuscript. FV-C advised DU-M on the initial conceptualization of the studies and provided general direction for the project. All persons designated as authors qualify for authorship, and all those who qualify for authorship are listed.

Funding

Support for DU-M was provided by the joint research fellowship granted by Consejo Nacional de Ciencia y Tecnología and the Viterbi School of Engineering (CONACYT-Mexico) at the University of Southern California (USC). Support to HA was provided by the Graduate School of USC through the Provost Fellowship. Research reported in this publication was supported in part by the National Institute of Neurological Disorders and Stroke of the National Institutes of Health under award number R21NS11361, Department of Defense CDMRP Grant MR150091, DARPA-L2M program grants W911NF1820264 and W911NF2120070, and National Science Foundation Collaborative Research in Computational Neuroscience under award number CRCNS Japan-US 2113096 to FJV-C. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health, the National Science Foundation, the Department of Defense, or DARPA.

Acknowledgments

The authors would like to thank Daniel Wang, Irie Cooper and Yifan Xue for their support in designing and manufacturing the physical system and electronics. Author DU-M would like to thank Professor James Finley for discussions on the paper rationale as well as for providing detailed feedback on the methods and relevance of the findings. Author DU-M would like to thank Professor Nicolas Schweighofer for pointing out the weak points of the presented studies and opportunities for their improvement. The authors would also like to thank Suraj Chakravarthi Raja for providing valuable technical and scientific insights about the experiments reported in this publication. The authors would like to thank Grace Niyo for her support with proofreading the manuscript. The authors acknowledge the access to equipment for building experimental hardware provided by the Baum Family Makerspace of the Viterbi School of Engineering.

References

[1] A. D. Ames and I. Poulakakis, “Hybrid zero dynamics control of legged robots,” 2018.
[2] J. W. Hurst, J. E. Chestnutt, and A. A. Rizzi, “The actuator with mechanically adjustable series compliance,” IEEE Transactions on Robotics, vol. 26, no. 4, pp. 597–606, 2010.
[3] D. Urbina-Meléndez, D. Wang, and F. Valero-Cuevas, “Bio-inspired tendon-driven robotic limbs,” p. 60, 2021.
[4] J. J. Rond, M. C. Cardani, M. I. Campbell, and J. W. Hurst, “Mitigating peak impact forces by customizing the passive foot dynamics of legged robots,” Journal of Mechanisms and Robotics, vol. 12, no. 5, p. 051010, 2020.
[5] M. Vukobratović and B. Borovac, “hybrid zero-moment point—thirty five years of its life,” International journal of humanoid robotics, vol. 1, no. 01, pp. 157–173, 2004.
[6] Y. Sakagami, R. Watanabe, C. Aoyama, S. Matsunaga, N. Higaki, and K. Fujimura, “The intelligent asimo: System overview and integration,” in IEEE/RSJ international conference on intelligent robots and systems, vol. 3. IEEE, 2002, pp. 2478–2483.
[7] A. Badri-Spröwitz, A. Aghamaleki Sarvestani, M. Sitti, and M. A. Daley, “Birdbot achieves energy-efficient gait with minimal control using avian-inspired leg clutching,” Science Robotics, vol. 7, no. 64, p. eabg4055, 2022.
[8] T. McGeer et al., “Passive dynamic walking,” Int. J. Robotics Res., vol. 9, no. 2, pp. 62–82, 1990.
[9] M. Srinivasan and A. Ruina, “Computer optimization of a minimal biped model discovers walking and running,” Nature, vol. 439, no. 7072, pp. 72–75, 2006.
[10] M. S. Fine and K. A. Thoroughman, “Trial-by-trial transformation of error into sensorimotor adaptation changes with environmental dynamics,” Journal of neurophysiology, vol. 98, no. 3, pp. 1392–1404, 2007.
[11] K. E. Adolph, W. G. Cole, M. Komati, J. S. Garciaguirre, D. Badaly, J. M. Lingeman, G. L. Chan, and R. B. Sotsky, “How do you learn to walk? thousands of steps and dozens of falls per day,” Psychological science, vol. 23, no. 11, pp. 1387–1394, 2012.
[12] J. Yoon, T. Kim, O. Dia, S. Kim, Y. Bengio, and S. Ahn, “Bayesian model-agnostic meta-learning,” Advances in neural information processing systems, vol. 31, 2018.
[13] R. Kwiatkowski and H. Lipson, “Task-agnostic self-modeling machines,” Science Robotics, vol. 4, no. 26, p. eaau9354, 2019.
[14] Y. He, C. Zang, P. Zeng, Q. Dong, D. Liu, and Y. Liu, “Convolutional shrinkage neural networks based model-agnostic meta-learning for few-shot learning,” Neural Processing Letters, pp. 1–14, 2022.
[15] A. Marjaninejad, D. Urbina-Meléndez, B. A. Cohn, and F. J. Valero-Cuevas, “Autonomous functional movements in a tendon-driven limb via limited experience,” Nature machine intelligence, vol. 1, no. 3, pp. 144–154, 2019.
[16] B. Day, C. Marsden, J. Obeso, and J. Rothwell, “Reciprocal inhibition between the muscles of the human forearm.” The Journal of physiology, vol. 349, no. 1, pp. 519–534, 1984.
[17] W. O. Friesen, “Reciprocal inhibition: a mechanism underlying oscillatory animal movements,” Neuroscience & Biobehavioral Reviews, vol. 18, no. 4, pp. 547–553, 1994.
[18] H. J. Chiel and R. D. Beer, “The brain has a body: adaptive behavior emerges from interactions of nervous system, body and environment,” Trends in neurosciences, vol. 20, no. 12, pp. 553–557, 1997.
[19] F. J. Valero-Cuevas and A. Erwin, “Bio-robots step towards brain–body co-adaptation,” Nature Machine Intelligence, vol. 4, no. 9, pp. 737–738, 2022.
[20] D. Nguyen and B. Widrow, “Improving the learning speed of 2-layer neural networks by choosing initial values of the adaptive weights,” in 1990 IJCNN international joint conference on neural networks. IEEE, 1990, pp. 21–26.
[21] M. Wayahdi, M. Zarlis, and P. Putra, “Initialization of the nguyen-widrow and kohonen algorithm on the backpropagation method in the classifying process of temperature data in medan,” in Journal of Physics: Conference Series, vol. 1235, no. 1. IOP Publishing, 2019, p. 012031.
[22] C.-K. Peng, S. Havlin, H. E. Stanley, and A. L. Goldberger, “Quantification of scaling exponents and crossover phenomena in nonstationary heartbeat time series,” Chaos: an interdisciplinary journal of nonlinear science, vol. 5, no. 1, pp. 82–87, 1995.
[23] C.-K. Peng, S. V. Buldyrev, S. Havlin, M. Simons, H. E. Stanley, and A. L. Goldberger, “Mosaic organization of dna nucleotides,” Physical review e, vol. 49, no. 2, p. 1685, 1994.
[24] E. A. Ihlen, “Introduction to multifractal detrended fluctuation analysis in matlab,” Frontiers in physiology, vol. 3, p. 141, 2012.
[25] C. C. Aggarwal, A. Hinneburg, and D. A. Keim, “On the surprising behavior of distance metrics in high dimensional space,” in Database Theory—ICDT 2001: 8th International Conference London, UK, January 4–6, 2001 Proceedings 8. Springer, 2001, pp. 420–434.
[26] R. J. Full and D. E. Koditschek, “Templates and anchors: neuromechanical hypotheses of legged locomotion on land,” Journal of experimental biology, vol. 202, no. 23, pp. 3325–3332, 1999.
[27] C. F. Herreid and R. J. Full, “Locomotion of hermit crabs (coenobita compressus) on beach and treadmill,” Journal of Experimental Biology, vol. 120, no. 1, pp. 283–296, 1986.
[28] M. A. Daley, G. Felix, and A. A. Biewener, “Running stability is enhanced by a proximo-distal gradient in joint neuromechanical control,” Journal of Experimental Biology, vol. 210, no. 3, pp. 383–394, 2007.
[29] M. Berniker, A. Jarc, E. Bizzi, and M. C. Tresch, “Simplified and effective motor control based on muscle synergies to exploit musculoskeletal dynamics,” Proceedings of the National Academy of Sciences, vol. 106, no. 18, pp. 7601–7606, 2009.
[30] F. J. Valero-Cuevas, Fundamentals of neuromechanics. Springer, 2016, vol. 8.
