Models of benthic bipedalism

F. Giardina John A. Paulson School of Engineering and Applied Sciences, Harvard University L. Mahadevan John A. Paulson School of Engineering and Applied Sciences, Harvard University Department of Physics, Harvard University Department of Organismic and Evolutionary Biology, Harvard University

Abstract

Walking is a common bipedal and quadrupedal gait and is often associated with terrestrial and aquatic organisms. Inspired by recent evidence of the neural underpinnings of primitive aquatic walking in the little skate Leucoraja erinacea, we introduce a theoretical model of aquatic walking that reveals robust and efficient gaits with modest requirements for body morphology and control. The model predicts undulatory behavior of the system body with a regular foot placement pattern which is also observed in the animal, and additionally predicts the existence of gait bistability between two states, one with a large energetic cost for locomotion and another associated with almost no energetic cost. We show that these can be discovered using a simple reinforcement learning scheme. To test these theoretical frameworks, we built a bipedal robot and show that its behaviors are similar to those of our minimal model: its gait is also periodic and exhibits bistability, with a low efficiency gait separated from a high efficiency gait by a ”jump” transition. Overall, our study highlights the physical constraints on the evolution of walking and provides a guide for the design of efficient biomimetic robots.

I Introduction

The transition of vertebrates from water to land is thought to have occurred around 400 million years ago and required fundamental changes in morphology and behavior. Used to living in near-neutral buoyancy, aquatic vertebrates had to adapt to the effects of gravity on land which required a change in locomotion strategy. Fins provided a basic form of legs that helped early land-dwelling vertebrates to support their body weight and switch from undulatory (swimming by lateral bending of the body) locomotion to ambulatory (walking) locomotion. A common view of the transition from undulation to walking is that it occurred gradually grillner2009measured , consistent with observations in contemporary tetrapods that most closely represent early legged locomotion. For instance, the salamander uses a combination of undulatory and ambulatory locomotion when walking on land chevallier2008organisation , supporting the hypothesis of gradual transition from swimming to walking during vertebrate terrestrialization. While the development of legs can be traced in the fossil record, the origins of neural circuits giving rise to the control required for ambulatory locomotion are unclear. However, recent work jung2018ancient suggests that the neural circuits required for limb control can be found in aquatic vertebrates who are distant relatives to the first tetrapods, indicating that the neuromuscular basis for legged locomotion was present in all vertebrates with paired fins. These observations raise the question of whether a walking gait was actively used in the earliest finned vertebrates or if it only emerged prior to terrestrialization. Given the plethora of extant benthic (living near the bottom of aquatic environments) fish and species that can walk short stretches on land king2011behavioral ; flammang2016tetrapod ; macesic2010comparative ; lucifora2002walking , it is conceivable that their ancestors with similar morphologies might have used these ancient neural circuits to walk. A particularly striking example in this regard is the little skate Leucoraja erinacea (Figs. 1A,B) that is incapable of undulation due to its rather stiff vertebrae, and uses a benthic gait consisting of left-right alternating walking jung2018ancient . These observations are strongly suggestive that walking in aquatic environments can emerge without a prior undulatory gait and that the a requirement for the evolution of walking is the capability to independently control each fin along with a control strategy that synchronizes the walking motion. While there exists strong evidence that independent leg control was present in early vertebrates with paired fins jung2018ancient , what form the locomotion gait adopted and whether a neural control strategy for stable and efficient legged locomotion was feasible remain open questions.

Here, inspired by published video data on the dynamics of walking in the little skate, we study these questions by devising a minimal mathematical model to analyze the stability, energy efficiency, and control complexity of early benthic locomotion. We compare the most efficient gaits predicted by the model to the kinematics of the little skate and show that both are characterized by a left-right alternating leg pattern with an undulating center of mass and a regular foot placement pattern. Closed-form expressions for the dynamics of the model show that the most energy efficient gait is associated with no energetic cost of locomotion and merely requires a simple open-loop control strategy. Additionally, the model also predicts the co-existence of a second gait with much lower efficiency. To complement this explicit dynamic model, we use a reinforcement learning strategy and show that a little skate-like gait is the preferred solution in this framework, suggesting that an evolutionary process can converge to this in nature. Finally, to test our results with hardware, we build a simple bipedal robot and show that it also exhibits bistable behavior for certain control parameters, qualitatively consistent with our theory.

Refer to caption — Figure 1: Little skate locomotion behavior (data obtained from jung2018ancient with permission from the authors) . A, Ventral view of little skate indicating leg length $L$ , footsteps, and leg angle $\alpha$ measured relative to the centerline of the body. Scale bar 1cm. B, Sequence of walking gait indicating trajectory of the pelvic girdle (dashed white line), active legs (solid black lines), passive legs (dashed black lines), and footsteps. C, Trajectory of the pelvic girdle (black line) as a function of dimensionless position with footsteps (circles). D, Left and right leg angles $\alpha$ as a function of dimensionless time and mean foot placement angle $\alpha_{0}$ . The inset shows the dimensionless speed of the pelvic girdle as a function of dimensionless time with $v^{*}$ (dashed line) the approximate lower speed bound during steady state locomotion.

II Mathematical Model

Published videos of the little skate jung2018ancient allowed us to extract foot placements, trajectories, and leg angles as shown in Figs. 1C,D (See also Supplementary Information, SI, and SI Video). In steady locomotion, we observed that as the pelvic girdle position of the skate undulates during locomotion, only one foot is in contact with the ground at any time, and there is little slip between the leg and the ground during contact. This leads to a periodic foot placement pattern over time as shown in Fig. 1C, accompanied by periodic dynamics of the leg angle $\alpha$ (measured with respect to the centerline of the body), as shown in Fig. 1D.

Based on these observations, a minimal model of locomotion suggests modeling the body as a mass $m$ with moment of inertia $I$ which can move and rotate freely in the plane if no legs are placed on the ground, as shown in Fig. 2; assuming perfect neutral buoyancy removes the effect of gravity. Since the legs are very light relative to the massive body, we ignored their mass, assumed them to be directly attached to the center of mass, and allowed them to switch their ground contact state with a frequency $\omega$ . In the absence of slip, upon leg contact with the ground, the velocity of the body $v$ is perpendicular to the leg vector $\bm{r}_{c}$ which points from body to contact point $c$ . The foot-ground contact was modeled as a perfectly inelastic impact which dissipates any velocity component that violates the constraint at foot placement, consistent with a range of previously used simple models of locomotion garcia1998simplest ; goswami1998study ; usherwood2003understanding . During the impulse-free phases, a torque $T$ accelerates the body on a circular path around the active leg with length $L$ and the current contact point $c$ . For simplicity, we assumed the torque to be constant during step $i$ , i.e. $T_{i}(t)=T_{i}$ .

We described the resulting dynamics of this model in terms of generalized coordinates $\bm{q}=[x,y,\theta]^{T}$ , with $x,y$ defining the position in the plane and $\theta$ the body angle with respect to the inertial frame of reference. The rotational degree of freedom was assumed to be decoupled from translation during a single step, but coupling occurs at leg transition as the touchdown angle $\alpha_{0}$ is measured relative to body orientation. In an alternating leg sequence with constant frequency $\omega$ and torque $T$ , the body rotates during a step $i$ by $\Delta\theta_{i}=\pm T/2I\omega^{2}+\dot{\theta}_{i-1}/\omega$ , where $\dot{\theta}_{i}=\dot{\theta}_{i-1}\pm T/I\omega$ . It is easy to verify that the initial conditions $\dot{\theta}_{0}=\mp T/2I\omega,\theta=0$ (positive sign for right leg, negative for left) guarantee at every step $\theta_{i}=0$ . The system is subject to opening and closing bilateral contacts with the ground, which is a problem extensively studied in nonsmooth mechanics brogliato1999nonsmooth . We modeled a closing contact with an inelastic impact law and assumed an initial condition for $\theta_{i}=0,\forall i$ as described above. We considered the transition from active ground contact in the left leg to the right leg only, as the final result is leg independent. The center of mass velocity before impact at step $i$ is denoted by $v^{-}_{i}$ and after impact with $v_{i}^{+}$ . The direction of the velocities is given by the geometry of the problem as shown in Fig. 2A. Pre- and post-impact velocities must be perpendicular to the leg direction (defined by the angles $\alpha^{-}$ and $\alpha_{0}$ for detaching and attaching leg, respectively) due to the pivoting motion. At leg transition, the constraints of both legs may not be compatible, which requires a projection of the velocity of the detaching leg to the direction perpendicular to the attaching leg. This projection is the inelastic impact law at step $i$ and is defined as

v^{+}_{i}=\left|\cos\delta_{i}\right|v^{-}_{i}.

(1)

$\delta_{i}$ is the difference in leg angles at the transition given by $\delta_{i}=\alpha_{0}-\alpha^{-}_{i}-\pi$ as shown in Fig. 2A. Therefore, the mapping of velocity magnitude from pre- to post-impact state is only a function of the leg angles at collision. With the assumption of constant torque $T$ during contact phase, the gained velocity due to torque $T$ over the period $1/\omega$ is $\Delta v=T/\omega Lm$ . The post-impact velocity map follows

v^{+}_{i}=\left|\cos\delta_{i}\right|\left(v^{+}_{i-1}+\Delta v\right)

(2)

Equation (2) contains implicitly $\alpha^{-}_{i}$ in $\delta_{i}$ which itself depends on $v^{+}_{i-1}$ by the relation $\alpha^{-}_{i}=-\alpha_{0}+(v^{+}_{i-1}/\omega L+\gamma/2)$ , with the nondimensional torque $\gamma=T/\omega^{2}L^{2}m$ . This is the evolution of $\alpha$ over one step given the initial velocity and prescribed torque.

The kinematic data in Fig. 1D and SI suggests that, after a short transient phase, the leg torque $T$ , frequency $\omega$ , and leg placement angle $\alpha_{0}$ are constant over consecutive steps. This observation suggests the following open loop bipedal control strategy: alternate in leg activation and keep $T$ , $\omega$ and $\alpha_{0}$ constant over all steps. This allowed us to frame the energetic cost, speed, and stability of the locomotion gait mathematically. Since all control parameters are constant, the speeds at which loss due to collision and gain due to torque $T$ balance and result in no variation in speed over subsequent steps, i.e. the fixed points of the system, are given by the implicit equation

\bar{v}^{*}\left(1-\left|\cos{\delta}\right|\right)=\gamma\left|\cos{\delta}\right|,

(3)

with $\bar{v}^{*}=v^{*}/\omega L$ . The evolution of the fixed point corresponds to a Poincaré map of the dynamics of the studied body. For $v_{0}=v^{*}$ the resulting trajectory of the dynamical system is a limit cycle with period $2/\omega$ , where the prefactor is due to the bilateral symmetry. Continuous undulation of the body and a regular footstep pattern are the prominent features of the gait, which are also observed in recorded animal data shown in 1B. As mentioned above, the derivation of (3) does not include dynamics of the body rotation because we can decouple rotation from translation with the initial rotational velocity $\pm T/2\omega I$ and initial body orientation $\theta_{0}=0$ .

Stability of the discrete map (2) was quantified by applying a linear stability analysis, i.e. searching for $v^{*}$ such that

\left|\frac{\mathrm{d}v_{i}^{+}}{\mathrm{d}v_{i-1}^{+}}\right|_{v_{i-1}^{+}=v^{*}}<1.

(4)

To reveal the possible behaviors and gaits of the model we searched for solutions of the discrete map (3). Finding a solution (i.e. a fixed point $\bar{v}^{*}$ ) in the model equates to finding a periodic gait. For a constant foot placement angle $\alpha_{0}$ the fixed point velocity $\bar{v}^{*}$ is solely defined by the nondimensional torque $\gamma$ . Fig. 3 shows the solutions of (3) in a bifurcation diagram for a fixed foot placement angle $\alpha_{0}$ . Stability analysis of the solutions revealed there are three stable regions of interest in the diagram. For $\gamma\leq 0.2$ we found two solution branches that coexist, one at low speed and the other one at higher speed connected by an unstable region. A system on the edge of the lower branch will experience a sudden jump in its attracting fixed point as the nondimensional torque $\gamma$ is increased, consistent with the existence of a saddle-node bifurcation strogatz2018nonlinear . For $\gamma\in[0.2,0.75]$ only one stable fixed point exists which is the continuation of the upper branch. Lastly, at around $\gamma\approx 0.75$ a period doubling bifurcation occurs, and the system jumps between two solution branches for $\gamma>0.75$ . This gait is asymmetric with a sideway component relative to the body orientation.

Gaits with small $\gamma$ are biologically most plausible as they reduce energetic cost. In fact, there exists a point at $\gamma=0$ , on the upper saddle-node bifurcation in the bifurcation diagram, for which the post-impact velocity is nonzero, indicating a gait with zero energetic cost. This was confirmed by considering the kinetic energy over time: energy fluctuations vanished when $\gamma=0$ , which required the leg vectors $\bm{r}_{c}$ of the left and right leg at transition to be parallel, i.e. $\delta=0$ . The speed of this point which corresponds to the energy-optimal gait was studied next. For small $\gamma$ , the right-hand side of (3) is negligible, and we have the solutions $\bar{v}^{*}=0$ and $\left|\cos{\delta}\right|=1$ . The first fixed-point speed provides the trivial solution on the lower branch. For the second solution, recall that $\delta=2\alpha_{0}-\bar{v}^{*}-\gamma/2-\pi$ which implies $\bar{v}^{*}=2\alpha_{0}-n\pi,n\in\mathbb{Z}$ . For $n=1$ the solution corresponds to the energy-optimal gait. The physical interpretation of this result is that for the energy-optimal gait, the angle $\delta$ has to be zero, or put differently, the legs have to be parallel at the transition which removes any dissipation due to impact. Interestingly, the parameter $n$ suggests that there are an infinite number of energy-optimal gaits. In these gaits, the transition from one leg to the next occurs after $\left|n\right|/2$ rotations around the same ground contact point, effectively resulting in a pirouette-like rotation before the leg transition. Although the velocity can become rather large, these solutions are less interesting in the context of our investigation of directed bipedal locomotion and were not further analyzed.

While real systems will have to operate at $\gamma>0$ to ensure stability, the upper branch for $\gamma<0.2$ is not reachable from rest with the bipedal control strategy, and the system will always converge to the lower, slower and less efficient branch. To switch to the upper branch, one has to literally “leap” onto it using an impulse-like initial actuation strategy. Indeed, such motions are observed in the little skate as it takes off from rest by punting forward with a powerful stroke using both legs at the same time, followed by immediate switching to the alternating gait (See Figs. 1D, SI).

The effect of fluid drag on the fixed points $\bar{v}^{*}$ was studied next. The Reynolds number in the little skate is of the order $10^{3}$ and drag can be expressed using the drag equation $F_{d}=1/2\rho v^{2}C_{d}A$ , with $\rho$ the fluid density, $v$ the body velocity, $C_{d}$ the drag coefficient, and $A$ the reference area. The part in (2) which gets affected by this dissipative effect is $\Delta v$ , i.e. the velocity increase over the contact phase of one leg. The velocity in the case with drag obeys the differential equation

\frac{\mathrm{d}v}{\mathrm{d}t}=\frac{T}{mL}-\frac{1}{2}\rho v^{2}C_{d}A,

(5)

which has the solution

v(t)=\sqrt{\frac{2T}{L\rho C_{d}A}}\tanh\left(\sqrt{\frac{\rho C_{d}AT}{2m^{2}L}}(c_{1}mL+t)\right)

(6)

with $c_{1}$ a constant to be defined by initial conditions. This provides an analogous closed form of (2) and enables the search for fixed points in the system. The solution depends on the drag coefficient $C_{d}$ and we expected to retrieve the original drag-less solution for $C_{d}\rightarrow 0$ . Fig. 4 shows the bifurcation diagram for the case with fluid drag for various drag coefficients and reference area estimated from specimen dimensions. For small drag coefficients, the bifurcation diagram converges to the case with no drag. As the drag coefficient increases, the upper stable branch moves towards larger $\gamma$ values and the fixed-point velocities decrease. We do not know the actual drag coefficient for the little skate, but estimations of a benthic ray (Raja clavata) suggest a drag coefficient of 0.1 (see webb1989station ). At the considered Reynolds numbers the drag coefficient of a sphere in a fluid flow is $C_{d}\approx 0.5$ and even for this conservative case we found bistable parameter values and similar converged fixed-point speeds on the lower and upper branches compared to the no-drag case, although the nondimensional torque $\gamma$ required for a similar speed increases significantly. Around $C_{d}\approx 1.08$ the bistable region disappears completely and is replaced by a continuous stable region before reaching the period-doubling bifurcation.

III Reinforcement Learning of Bipedalism

Our analysis so far shows that aquatic bipedalism has few requirements towards morphology (rudimentary legs) and control (constant torque, touchdown angle and frequency), but we have imposed an alternating leg sequence with fixed foot placement angle and torque for the analysis. We have not addressed the question if this gait can be discovered and if it is optimal when these constraints are relaxed. Discovery in this context requires an aquatic organism with rudimentary appendages to learn the neural control for a bipedal gait, which may be hard if the gait is a “needle in a haystack” in the vast control space. The search for a favorable gait relates to the field of gait selection and optimization hoyt1981gait ; ruina2005collisional ; srinivasan2006computer ; alexander1980optimum . A natural question in the context of motor control learning is if a desired behavior can be acquired by reinforcement glimcher2011understanding . With the objective of maximizing the travelled distance and minimizing the required energy (or equivalently, minimizing energy consumption for a travelled distance), we trained a reinforcement learning (RL) agent lillicrap2015continuous ; sutton2018reinforcement on the model to obtain a given locomotion speed $v_{T}$ . The framework has four state parameters (planar position and velocity of the point mass) and four control parameters (three continuous ones for $T,\omega,\alpha_{0}$ and a binary one for the leg case $l$ ) - for details see Methods. We observed, in most instances of the learning routine like the one shown in Fig. 5, that a one-legged locomotion strategy emerges after only 4 episodes, two-legged locomotion emerges around episode 200, and periodic locomotion with alternating leg cycles emerges via RL around episode 600. This type of gait prevails as the most efficient one and other explored strategies have a worse cumulative reward. We ran $\sim$ 50 instances of learning for 5000 episodes with changing learning parameters and weights for the reward function and found that the left-right alternating gait emerged in 70% of instances and generated the highest reward. The best solution matches the little skate’s observed walking gait in Fig. 1, and we see an undulatory motion of the center of mass and a left-right alternating leg sequence (see Fig. 2C and SI video). For comparison with the bifurcation diagram, the evolution of the best learned RL policy in the parameter space corresponds to the dashed lines in Fig. 3, starting with no forward speed and $\gamma=1$ . The first step uses a large $\gamma$ and over subsequent steps minimizes it while increasing $\bar{v}^{*}$ , eventually settling close to the upper saddle-node bifurcation point at $\gamma=0$ , confirming the optimality of the solution discovered by RL. In the context of our model, these results suggest that, despite the vast solution space of gaits, a left-right alternating bipedal control strategy can and will be discovered and is the optimal solution for energy efficient locomotion.

IV Robotic Model of Bipedalism

Having shown that the bipedal control strategy is the preferred solution in an optimization framework, we turn to realize it experimentally, inspired by a range of recent studies in this vein such as mimicking the legged locomotion of mudskippers mcinroe2016tail , the reconstruction of feasible tetrapod gaits in extinct species nyakatura2019reverse , the cost of transport of high frequency swimming zhu2019tuna , and robophysical studies in general aguilar2016review . We used a supported (simulating neutrally buoyant environments) robotic biped as shown in Fig. 6A. The legs were mechanically constrained to satisfy $\alpha\in[0,2.15]rad$ and we fixed $\omega$ , $L$ , $m$ and varied $T$ to change $\gamma$ . The design of the robot aims to test the planar dynamics of aquatic walking, restricting vertical oscillations of the body (but not the vertical displacement of legs) for simplicity; an unsupported system would require stabilization of vertical body attitude and position by e.g. using a tail or pectoral fins that generate lift. We also ignored the effect of fluid drag for the study of planar dynamics as we found no qualitative change in the bifurcation diagram when fluid drag was considered.

Fig. 6B shows a bottom view of a series of snapshots at different times of two experiments. The system was initialized from rest and with $\gamma=0.123$ and $\gamma=0.148$ . The solid line corresponds to the center of mass trajectory of the robot and the dots to the footsteps of the snapshots. The coexistence of fixed points at $\gamma<0.13$ was tested by initializing the robot from rest and with a flying start, i.e. an initial nonzero velocity. As expected we observed two steady solutions: a slow and a fast gait. Note, however, as $\gamma$ increases, we no longer require a nonzero velocity initialization to reach the fast solution branch, effectively demonstrating that gait transition (resting to walking), acceleration, and stabilization are performed without the need for additional control. As observed in skate experiments and our model, the robot also exhibits undulatory behavior and a regular footstep pattern. The observed versus predicted locomotion speeds are shown in Fig. 6C. The observed locomotion speeds are low when started from rest for $\gamma<0.13$ , but converge to the upper branch of the bifurcation diagram for $\gamma>0.13$ , with exception of the $\gamma=0.123$ case where a convergence to the upper branch occurs at marginally lower $\gamma$ than predicted. All together, the gait is completely determined in terms of a constant rate of motion $\omega$ , range of motion $\alpha_{0}$ , and energetic cost as determined by the constant torque $T$ .

V Discussion

Under the assumptions of neutrally buoyant environments and rudimentary leg-like appendages we showed that a left-right alternating bipedal locomotion strategy exists in a minimal model and that it is stable, energy efficient, learnable, and easily realizable in a robotic platform. The fact that such a simple control strategy leads to remarkably robust and efficient behavior is related to the concept of passive dynamics, as seen for instance in a slinky toy “walking” down a slope without the need of complex control and the passive dynamic walkers mcgeer1990passive that revolutionized the way we think about human-like locomotion. Such systems demonstrate that the appropriate morphology for a particular environment often leads to the most efficient behavior with remarkable simple or no active feedback control. In the same vein, it was shown that anesthetized fish can swim upstream given the right environmental conditions beal2006passive which revealed that the concept of passive dynamics is indeed exploited in natural systems. The little skate robot presented here is not passive as the energy source for motion is an internal actuator and not an external source of energy like gravitational potential energy, but it exploits the same principles of a passive system: sustained locomotion under a constant energy source without feedback control.

The selection of the walking gait in the little skate may be a consequence of an increasing energetic cost of transport for swimming at slow locomotion speeds di2016skating , similar to the gait transition in other vertebrates hoyt1981gait . Metabolic rate measurements of walking skates are yet to be recorded and will provide further insights into energy expenditure as a driver of gait selection, but the passive bipedal gait presented here can help explain the energetic benefits of benthic walking in aquatic environments. It must be added that the little skate uses an alternative legged gait called punting koester2003punting which it uses, for example, to kick-start the left-right alternating locomotion strategy. Punting uses two legs simultaneously and was not discovered in our optimization framework, but it may be the preferred gait when fast acceleration is rewarded over energy efficient locomotion.

Our study complements earlier work on the theoretical existence of zero-energy gaits in terrestrial walkinggomes2011walking by showing how it arises in a minimal theoretical model for aquatic bipedalism, and approximately in robot experiments. In particular, it requires the legs to be collinear at the end/beginning of every footstep, effectively reducing the degrees of freedom of the problem. Instead of controlling the legs individually, one leg can simply mirror the motion of the other leg, reminiscent of mirror algorithms used in other impulsive robotic tasks like juggling buhler1990family . This type of gait can also be realized using a rigid body with two attached rigid legs; walking then corresponds to alternate rotations about a vertical axis, centered about one of two legs. This is similar to the waddling gait of penguins, where lateral undulation is thought to improve the energetics of locomotion griffin2000biomechanics . Of course such zero-energy models do not account for losses due to internal damping, cost of leg swing, or contribution of leg mass to collision, fluid drag etc. Adding fluid drag to our model, we found no qualitative difference in dynamics. Comparing the observed steady locomotion speed in the little skate $v^{*}\in[1.1,1.26]$ , we found that it is generally lower than our measured speeds in robots, and might correspond to the gait with no energy loss (see Figs. 1D, SI).

Together, our results demonstrate the minimal requirements in a neural control strategy (constant force input, stability, learnability) while obtaining high energy efficiency (zero-cost gait). The minimal model of aquatic bipedalism that we present yields a stable, open-loop, energy-efficient gait that can be learned using a minimal reinforcement scheme, and can be physically realized in robot experiments. It is also consistent with biological observations in the little skate. Our study also reinforces how the physical environment, the morphology of the organism and the neural substrates synergistically produce a coordinated walking gait, linking to fundamental questions in passive dynamics, self-organization, and embodiment pfeifer2007self . While we may never know exactly how the first walking gait arose, the combination of the embodied passive dynamics associated with a minimal legged morphology that are ancient standen2014developmental , and the presence of conserved neural circuits that are now known to be equally ancient jung2018ancient may well have helped pave the way for legged-gaits before our aquatic ancestors transitioned to terra firma. Understanding how the brain, body and environment worked together in heterogeneous aquatic and terrestrial environments likely needed to include proprioceptive feedback. But in reliably homogeneous environments, perhaps the simple strategy quantified here was where it all started.

VI Methods

Animal data. Kinematic data from little skates was obtained from supplementary movies in jung2018ancient with permission from the authors. Center of mass position, body orientation, and leg positions were extracted using the software Kinovea. Some characteristics of the extracted data are shown in Table 1. The animals were tested in a tank with a textured PDMS surface for traction of the legs with the substrate. Slip per step of the leg during stance phase varied across individuals and ranged between $0.1$ mm to $1$ mm which corresponds to 0.5% to 5% of the step length. Angle plots of $\alpha$ (Figs. 1D, SI) were obtained from measuring the angle between the centerline (from tracking the connecting line between pelvic girdle and mouth) and vectors pointing from the pelvic girdle to the leg tips. Velocities of pelvic girdle as a function of time (insets in Figs. 1D, SI) were computed from filtered trajectories (local regression using weighted linear least squares and a 2nd degree polynomial model using a span of 10% of the total number of data points) and numerically differentiating them with respect to time. Data was made dimensionless with a leg length $L=1.15cm$ and a frequency $\omega=1.1Hz$ which were extracted from the movies.

Table 1: Mean (bold) and standard deviation (parentheses) of kinematic data from 3 individual skates. Data averaged over 10 steps in experiment excluding initial acceleration.

v_{m}

mean nondimensional velocity,

\phi

phase difference of legs,

\alpha_{p}

peak leg angle, slip/step normalized by leg length.

Skate	$v_{m}$	$\phi$	$\alpha_{p}$	slip/step
1	$\bm{1.3}(0.2)$	$\bm{162^{\circ}}(34^{\circ})$	$\bm{2.19}(0.7)$	$\bm{0.088}(0.1)$
2	$\bm{1.38}(0.16)$	$\bm{170^{\circ}}(10^{\circ})$	$\bm{2.21}(0.1)$	$\bm{0.12}(0.14)$
3	$\bm{1.24}(0.3)$	$\bm{186^{\circ}}(10^{\circ})$	$\bm{2.33}(0.02)$	$\bm{0.006}(0.06)$

Reinforcement learning framework. For the model-based optimization of the little skate gait we used a reinforcement learning (RL) framework due to the obvious links between episodic and biological learning. Other optimization methods such as trajectory optimization can also find the optimal solution, but would not provide insight into the learnability of the walking gait via processes related to biological reinforcement glimcher2011understanding . We chose a deep deterministic policy gradient (DDPG) reinforcement learning agent for the optimization of the little skate gait. DDPG lillicrap2015continuous has the advantage of accepting continuous control inputs, which is commensurate with the biological control capabilities of the little skate. The dynamics for the RL environment are obtained by computing the next step position after placing leg $l$ at angle $\alpha_{0}$ on the ground and applying a torque $T$ for $1/\omega$ seconds. This provides the new position coordinates $x,y$ and velocities $\bm{v}=(\dot{x},\dot{y})^{T}$ . We ignored the rotational degree of freedom of the little skate center of mass for simplicity. At every episode, the center of mass is placed at the origin and its velocity set to zero. The reward of step $i$ was defined as

R_{i}=-|v_{y}-v_{T}|+\Delta y\left(1-\frac{T}{T_{max}}.\right)

(7)

The first term on the left side penalizes variations of the end-of-step vertical component of the velocity from the target speed $v_{T}$ . This term drives the system towards a constant locomotion speed $v_{T}$ . The second term accounts for optimization of the cost of transport, in that it rewards the product of travelled distance $\Delta y$ and negative normalized torque. $T_{max}$ is the maximum applicable torque in the system defined as a bound in the RL problem. The bounds for control parameters were $T\in[0,1],\omega\in[1,1000],\alpha_{0}\in[0,\pi]$ , and $l\in\{-1,1\}$ . We used Matlab’s reinfrocement learning toolbox to train the critic and actor networks with two fully connected layers with 400 and 300 nodes and rectifiers as activation functions (except for the actor ouput where we used a tanh function). The leg case (-1 left, 1 right) was treated as a continuous control variable which was put through a signum function before its use. To test the effects of learning parameters on the converged solution we ran combinations of values for noise variance $\{0.1,0.2,0.3\}$ , discount factor $\{0.8,0.9,0.99\}$ and learning rates $\{5\times 10^{-2},5\times 10^{-3},5\times 10^{-4}\}$ . We ran the RL routine for 5000 episodes (an episode was ended after a maximum of 30 steps or if the center of mass surpassed the bounds at $x=\pm l$ ) for all combinations of parameter values and found convergence to the optimal bipedal gait in 17 of 27 cases, one of them is shown in Fig. 5 (all routines with learning rates of $5\times 10^{-2}$ did not converge). We further asked, whether changing the relative weight of the two terms in the reward function (7) had an effect on the optimal solution of the gait. We ran 20 learning routines of 5000 episodes each and weighted the terms 1:3, 1:1, and 3:1. The solution yielding the highest reward was again of the type shown in Fig. 5 (left-right alternating strategy) and was found in 16 of 20 cases.

Robot experiment. We developed a supported legged robotic system to systematically test the model predictions. The robot body was created using PolyJet technology using VeroWhitePlus material. The robot is powered by a 6V Nickel-metal hydride battery and digitally controlled with an Arduino Uno microcontroller. A motor driver (pololu max14870) operated two 6V DC motors (pololu 50:1 micro metal gear motor medium power) to allow for leg rotation. A servo motor per leg ensured ground contact and clearance of the leg tips (Power HD Sub-Micro Servo HD-1440A). Small rubber pads were glued to the leg tips to reduce slip. Two pins were mounted to the robot structure which prevent the legs from exceeding the angle mechanically, and the main robot structure prevented the leg angle from becoming negative, i.e. we always have $\alpha\in[0,2.15]$ . The mass of the robot was $m=350g$ and leg length $L=8cm$ . The robot was connected with a long and stiff aluminum bar to a ball bearing which moves on a linear 1m steel rod. The steel bar was mounted at an angle of $0.5^{\circ}$ at which the ball bearing could slide along the steel rod. This resulted in a decrease in height of the bar position in direction of travel of the robot. Although this decrease in gravitational potential along the bar could potentially be used as a source for acceleration of the robot, friction inside the ball bearing resulted in a marginal velocity loss if the system is started with an initial speed $v_{0}$ . Note that this is a conservative set-up as our model predicts no velocity loss over time in case of no leg collisions with the ground. The robot is hanging above a glass plate 90cm in length onto which the feet could push against when activated to close ground contact. A webcam recorded the locomotion behavior from the bottom of the glass plate at 30fps and center of mass trajectories obtained by tracking a blue marker on the bottom of the robot were subsequently extracted by analyzing the videos using Matlab. See the SI for an illustration of the set-up.

The control strategy for walking was implemented as follows. Both legs are initialized at $\alpha_{0}=2.15$ before every trial. Leg switching frequency $\omega$ was set to 1.3Hz. At switching time both DC motors reverse their rotation direction and servo motors change their state from lifted to contact or from contact to lifted. The parameter $\gamma$ was tuned by changing the leg torque exerted by the DC motors, which was controlled by adjusting the supplied voltage set by the motor controller. See supplementary videos for various trials with a selection of $\gamma$ ’s.

The data generated for Fig. 6C was obtained from 5 independent robot experiments per error bar. All experiments were initialized from rest except the three error bars on the upper branch in the bistable region which were initialized with a nonzero velocity. The nondimensional initial speed of all flying starts is $v=2.4$ on average with standard deviation $0.4$ which was large enough to push the system to a state which is attracted by the upper branch but not too large such that the speed can converge within the limited number of steps. The experiment duration was 20 steps and the velocities correspond to the one after the 20th leg transition or the last leg transition in the field of the camera’s view. Trials at $\gamma\approx 0.05$ which were started with nonzero velocity often converged to the lower branch and slow velocities. The results shown for this case are the 5 successful cases where terminal velocity in the camera’s field of view did not vanish. However, we cannot guarantee that these cases have converged or if they would further decay in a larger experimental set-up, which may explain the prediction error. In the case of $\gamma\approx 0.123$ we observe slower speeds than expected, which can be explained with the fact that the gait has not completely converged to the steady state speed. These long transient phases were observed in simulations where $\gamma$ is close but past the end of the bistable region, which corresponds well to the position of $\gamma\approx 0.05$ .

Acknowledgements.

We acknowledge support from the Swiss National Science foundation (F.G.) and by a MacArthur Fellowship (L.M.).

References

(1) S. Grillner and T. M. Jessell, “Measured motion: searching for simplicity in spinal locomotor networks,” Current opinion in neurobiology, vol. 19, no. 6, pp. 572–586, 2009.
(2) S. Chevallier, A. J. Ijspeert, D. Ryczko, F. Nagy, and J.-M. Cabelguen, “Organisation of the spinal central pattern generators for locomotion in the salamander: biology and modelling,” Brain research reviews, vol. 57, no. 1, pp. 147–161, 2008.
(3) H. Jung, M. Baek, K. P. D’Elia, C. Boisvert, P. D. Currie, B.-H. Tay, B. Venkatesh, S. M. Brown, A. Heguy, D. Schoppik, et al., “The ancient origins of neural substrates for land walking,” Cell, vol. 172, no. 4, pp. 667–682, 2018.
(4) H. M. King, N. H. Shubin, M. I. Coates, and M. E. Hale, “Behavioral evidence for the evolution of walking and bounding before terrestriality in sarcopterygian fishes,” Proceedings of the National Academy of Sciences, vol. 108, no. 52, pp. 21146–21151, 2011.
(5) B. E. Flammang, A. Suvarnaraksha, J. Markiewicz, and D. Soares, “Tetrapod-like pelvic girdle in a walking cavefish,” Scientific reports, vol. 6, p. 23711, 2016.
(6) L. J. Macesic and S. M. Kajiura, “Comparative punting kinematics and pelvic fin musculature of benthic batoids,” Journal of morphology, vol. 271, no. 10, pp. 1219–1228, 2010.
(7) L. O. Lucifora and A. I. Vassallo, “Walking in skates (chondrichthyes, rajidae): anatomy, behaviour and analogies to tetrapod locomotion,” Biological Journal of the Linnean Society, vol. 77, no. 1, pp. 35–41, 2002.
(8) D. M. Koester and C. P. Spirito, “Punting: an unusual mode of locomotion in the little skate, leucoraja erinacea (chondrichthyes: Rajidae),” Copeia, vol. 2003, no. 3, pp. 553–561, 2003.
(9) R. Pfeifer, M. Lungarella, and F. Iida, “Self-organization, embodiment, and biologically inspired robotics,” science, vol. 318, no. 5853, pp. 1088–1093, 2007.
(10) M. Garcia, A. Chatterjee, A. Ruina, and M. Coleman, “The simplest walking model: stability, complexity, and scaling,” Journal of biomechanical engineering, vol. 120, no. 2, pp. 281–288, 1998.
(11) A. Goswami, B. Thuilot, and B. Espiau, “A study of the passive gait of a compass-like biped robot: Symmetry and chaos,” The International Journal of Robotics Research, vol. 17, no. 12, pp. 1282–1301, 1998.
(12) J. R. Usherwood and J. E. Bertram, “Understanding brachiation: insight from a collisional perspective,” Journal of Experimental Biology, vol. 206, no. 10, pp. 1631–1642, 2003.
(13) B. Brogliato, Nonsmooth mechanics. Springer, 1999.
(14) S. H. Strogatz, Nonlinear dynamics and chaos: with applications to physics, biology, chemistry, and engineering. CRC Press, 2018.
(15) P. W. Webb, “Station-holding by three species of benthic fishes,” Journal of Experimental Biology, vol. 145, no. 1, pp. 303–320, 1989.
(16) D. F. Hoyt and C. R. Taylor, “Gait and the energetics of locomotion in horses,” Nature, vol. 292, no. 5820, p. 239, 1981.
(17) A. Ruina, J. E. Bertram, and M. Srinivasan, “A collisional model of the energetic cost of support work qualitatively explains leg sequencing in walking and galloping, pseudo-elastic leg behavior in running and the walk-to-run transition,” Journal of theoretical biology, vol. 237, no. 2, pp. 170–192, 2005.
(18) M. Srinivasan and A. Ruina, “Computer optimization of a minimal biped model discovers walking and running,” Nature, vol. 439, no. 7072, p. 72, 2006.
(19) R. M. Alexander, “Optimum walking techniques for quadrupeds and bipeds,” Journal of Zoology, vol. 192, no. 1, pp. 97–117, 1980.
(20) P. W. Glimcher, “Understanding dopamine and reinforcement learning: the dopamine reward prediction error hypothesis,” Proceedings of the National Academy of Sciences, vol. 108, no. Supplement 3, pp. 15647–15654, 2011.
(21) T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, and D. Wierstra, “Continuous control with deep reinforcement learning,” arXiv preprint arXiv:1509.02971, 2015.
(22) R. S. Sutton and A. G. Barto, Reinforcement learning: An introduction. MIT press, 2018.
(23) B. McInroe, H. C. Astley, C. Gong, S. M. Kawano, P. E. Schiebel, J. M. Rieser, H. Choset, R. W. Blob, and D. I. Goldman, “Tail use improves performance on soft substrates in models of early vertebrate land locomotors,” Science, vol. 353, no. 6295, pp. 154–158, 2016.
(24) J. A. Nyakatura, K. Melo, T. Horvat, K. Karakasiliotis, V. R. Allen, A. Andikfar, E. Andrada, P. Arnold, J. Lauströer, J. R. Hutchinson, et al., “Reverse-engineering the locomotion of a stem amniote,” Nature, vol. 565, no. 7739, p. 351, 2019.
(25) J. Zhu, C. White, D. Wainwright, V. Di Santo, G. Lauder, and H. Bart-Smith, “Tuna robotics: A high-frequency experimental platform exploring the performance space of swimming fishes,” Science Robotics, vol. 4, no. 34, p. eaax4615, 2019.
(26) J. Aguilar, T. Zhang, F. Qian, M. Kingsbury, B. McInroe, N. Mazouchova, C. Li, R. Maladen, C. Gong, M. Travers, et al., “A review on locomotion robophysics: the study of movement at the intersection of robotics, soft matter and dynamical systems,” Reports on Progress in Physics, vol. 79, no. 11, p. 110001, 2016.
(27) T. McGeer et al., “Passive dynamic walking,” I. J. Robotic Res., vol. 9, no. 2, pp. 62–82, 1990.
(28) D. Beal, F. Hover, M. Triantafyllou, J. Liao, and G. V. Lauder, “Passive propulsion in vortex wakes,” Journal of Fluid Mechanics, vol. 549, p. 385, 2006.
(29) V. Di Santo and C. P. Kenaley, “Skating by: low energetic costs of swimming in a batoid fish,” Journal of Experimental Biology, vol. 219, no. 12, pp. 1804–1807, 2016.
(30) M. Gomes and A. Ruina, “Walking model with no energy cost,” Physical Review E, vol. 83, no. 3, p. 032901, 2011.
(31) M. Buhler, D. E. Koditschek, and P. J. Kindlmann, “A family of robot control strategies for intermittent dynamical environments,” IEEE Control Systems Magazine, vol. 10, no. 2, pp. 16–22, 1990.
(32) T. M. Griffin and R. Kram, “Biomechanics: penguin waddling is not wasteful,” Nature, vol. 408, no. 6815, p. 929, 2000.
(33) E. M. Standen, T. Y. Du, and H. C. Larsson, “Developmental plasticity and the origin of tetrapods,” Nature, vol. 513, no. 7516, p. 54, 2014.