
Metrics for 3D Object Pointing and Manipulation in Virtual Reality

Eleftherios Triantafyllidis, Wenbin Hu, Christopher McGreavy and Zhibin Li. The authors are with the School of Informatics, The University of Edinburgh, Edinburgh, United Kingdom. {eleftherios.triantafyllidis, wenbin.hu, c.mcgreavy, zhibin.li}@ed.ac.uk
Abstract

Assessing the performance of human movements during teleoperation and virtual reality is a challenging problem, particularly in 3D space due to complex spatial settings. Despite the presence of a multitude of metrics, a compelling standardized 3D metric is still missing, aggravating inter-study comparability. Hence, evaluating human performance in virtual environments is a long-standing research goal, and a performance metric that combines two or more metrics under one formulation remains largely unexplored, particularly in higher dimensions. The absence of such a metric is primarily attributed to the discrepancies between pointing and manipulation, the complex spatial variables in 3D, and the combination of translational and rotational movements. In this work, four experiments were designed and conducted with progressively higher spatial complexity to study and compare existing metrics thoroughly. The research goal was to quantify the difficulty of these 3D tasks and to sufficiently model human performance in full 3D peripersonal space. Consequently, a new model extension has been proposed and its applicability validated across all experimental results, showing improved modelling and representation of human performance in combined movements of 3D object pointing and manipulation tasks compared with existing work. Lastly, the implications for 3D interaction, teleoperation and object task design in virtual reality are discussed.

Figure 1: An operator interacts with objects in full 3D virtual reality with all task-related spatial variables.

I INTRODUCTION

With the recent advances in networking and Mixed Reality (MR) technologies, and in conjunction with the increasingly immersive applications of telepresence and teleoperation, the need to measure and model human performance in 3D space has grown rapidly [1, 2]. However, a compelling standardized metric for 3D object selection, such as those seen in Virtual Environments (VEs) and teleoperation, does not yet exist. The absence of a standardized metric severely limits inter-study comparability and, most importantly, transferability between the results of different studies, due to the multitude of different metrics researchers can use [3]. Consequently, progress towards a standardized formulation is still “scattered” and often disregards important aspects that would solidify a concrete and established metric [4].

To propose a higher-dimensional metric for assessing human performance, we investigate Paul Fitts’ original predictive model, commonly known as Fitts’ law [5, 6]. Proposed in 1954, the law has been extensively used in human-computer interaction (HCI) and ergonomics research and still represents the gold standard as a performance metric [2, 7]. This is attributed to the advantage of using Fitts’ law to measure human performance in a time-based approach built on spatial data, effectively combining time- and spatial-based metrics under one formulation. Fitts’ law was originally formulated for 1D translational movements [5], but has since been extended to 2D tasks [7, 8], with its applicability highlighted in rotational tasks as well [9, 10, 11]. Recently, Fitts’ formulation was also extended to some extent to 3D space, limited to translational tasks, with numerous reformulations [1, 12, 13]. Yet, these either disregarded important spatial aspects or limited their findings to very specific settings, aggravating inter-study comparisons, particularly due to variations in the tested motor tasks.

More specifically, previously proposed 3D metrics extending Fitts’ law have either disregarded combining translational and rotational tasks under one setting [1, 12, 13], or have not assessed directional [10, 11] and inclination variations [1, 10, 11, 13] within one exhaustive study. All of the aforementioned factors appear to have significant effects on human performance [2, 4]. More importantly, to date, most studies have focused only on pointing tasks; by extension, manipulation tasks, e.g. those incorporating grasping and physical properties such as gravity and friction forces, have received severely limited attention.

Consequently, we studied the aforementioned factors and spatial complexities seen in full 3D space to investigate their contribution towards a human performance model. More specifically, we performed an exhaustive user study in Peripersonal Space (PPS), i.e. close to the user’s body, in a virtual simulation environment using a VR headset and optical hand tracking. The virtual environment consisted of simulated teleoperation tasks with high-fidelity physics leveraging an anthropomorphic dexterous robotic hand. By collectively combining and adding 3D spatial variables in four progressively more complex experiments, we derived our own model extension and compared the proposed metric with the state of the art, while also verifying their applicability at each stage.

From our results, to the best of our knowledge, we present a first-of-its-kind 3D human performance metric based on Fitts’ law, which extends beyond current work by modelling full 3D space better than existing formulations. More specifically, our metric is able to capture 3D human motion entailing combined translational and rotational movements, with varying directions and inclinations, in both object pointing and manipulation under a single formulation. This metric can be used to assess human performance by modelling the complex motions, Degrees of Freedom (DoFs) and dimensions associated with VEs entailing Virtual Reality (VR) [1, 14] as well as teleoperation [3, 15, 16, 17]. Consequently, the effects of different user interfaces, devices and robotic systems on user performance can be modelled and assessed with such a metric, with the added advantage of combining time- and spatial-based metrics under one model.

The contributions of our work are summarized as follows:

  • We propose a new higher-dimensional metric to assess human motor performance in full 3D space;

  • An intuitive motion re-targeting of hands in high-fidelity physics to allow generalization of our method towards virtual and realistic object interaction and manipulation;

  • A thorough comparison of our proposed metric with others in the literature under different experimental settings and their validity towards higher dimensions;

  • Study of quantities and variables to better explain the 3D spatial relationship of objects and recommendations for the design of pointing and manipulation tasks in VR and teleoperation.

In the remainder of the paper, we first introduce the most widely used extension methods based on Fitts’ law and those to be compared with our metric. We then describe our methodology, apparatus and the design of the experiments, which progressively add more complex spatial variables at each stage. Finally, we derive our proposed metric and discuss and compare our results with the extensions in the related work. A video presentation of this work, including the virtual tasks completed by the participants, is available online: https://youtu.be/MKH2gGJQq2o.

II RELATED WORK

In this section, we introduce Fitts’ original law for translational tasks and the current state of the art of its 2D and 3D extensions, as well as the importance of including rotation into a formulation. Ultimately, we investigate the incorporation of both translational and rotational movements under one combined model. In Table I we summarize all model extensions.

II-A Fitts’ Original Formulation

Fitts’ formulation has been extensively used for motion prediction in HCI and ergonomics research [5, 6]. The formulation predicts the Movement Time (MT), i.e. how long it takes users to point to a target on a screen. It is formulated as:

MT=a+b\cdot ID,\quad\text{where } ID=log_{2}\left(\frac{2A}{W}\right). (1)

Here, A represents the distance between the object and the target, and W represents the width of the target area. The logarithmic term ID represents the Index of Difficulty of the task, measured in bits. The resultant MT is measured in seconds. The constants a and b represent the y-intercept and slope respectively and are derived via regression analysis.
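As a minimal illustration of how the constants a and b are obtained, the sketch below fits Eq. 1 to per-condition data via least squares. The numeric values of A, W and MT are hypothetical, chosen only for illustration; they are not taken from the paper.

```python
import numpy as np

def fitts_id(A, W):
    """Index of difficulty (bits) of Fitts' original formulation, Eq. 1."""
    return np.log2(2.0 * A / W)

# Hypothetical per-condition data: separations A and widths W (same units),
# with the mean movement time MT (seconds) measured for each condition.
A = np.array([12.0, 24.0, 36.0, 48.0])
W = np.array([5.0, 7.5, 10.0, 12.5])
MT = np.array([1.1, 1.4, 1.6, 1.8])

ID = fitts_id(A, W)
b, a = np.polyfit(ID, MT, 1)  # least-squares fit of MT = a + b * ID
print(f"intercept a = {a:.3f} s, slope b = {b:.3f} s/bit")
```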

II-B Importance and Limitations of Fitts’ Law

While extensively used, the original formulation shown in Eq. 1 is too simplistic when full 3D space is considered. More specifically, the formulation is limited in four key areas, namely: (i) lower-dimensional 1D and 2D space; (ii) lower DoFs, entailing only translational tasks without the combination of rotational movements; (iii) pointing tasks without the addition of physical properties such as gravity or friction, e.g. manipulation of objects; and (iv) single-line movements without the use of spatial arrangements, e.g. directions and inclinations. During interactions either in VR or teleoperation, all four aspects are a fundamental and inseparable part of human motion in 3D [2, 10, 14, 18].

Nevertheless, the ability of the law to combine both time- and spatial-based metrics under a single formulation renders the pursuit of extending it to 3D of significant importance. As identified, part of the motivation of this paper stems from supplementing different evaluation metrics with a single metric in 3D space. The derivation of a model extension based on Fitts’ law for full 3D space is expected to increase the overall comparability between the results of different studies attempting to capture 3D human motion [2, 3], particularly given the significant focus on the law in current work.

Hence, relying on multiple types of metrics to assess human performance in either VR or teleoperation should ideally be avoided, as it renders comparability between the results of such studies challenging [2, 16]. Earlier works on teleoperation [16] and recent ones with anthropomorphic robotic hands (19 DoFs) leveraging MR technologies [15] unfortunately do not make use of metrics such as Fitts’ law and instead rely on multiple time-, spatial- and behaviour-based metrics. As a result, comparability even between similar works is hindered, since all compared studies would need to share identical evaluation metrics.

II-C 2-Dimensional Extensions

While originally developed for 1D tasks, Fitts’ predictive model has also been widely applied to 2D pointing tasks [7, 8, 19]. Hoffmann [8] conducted a series of discrete tapping tasks, using participants’ fingers as pointing probes, with the width of the finger added to the ID of Eq. 1 as:

ID=log2(2AW+F),ID=log_{2}\left(\frac{2A}{W+F}\right), (2)

where F represents the index finger pad size, which can be interpreted as the size of the object to be transported to the target W as a natural extension towards pick-and-place tasks [20]. Welford [19] proposed another variant of Eq. 1 as:

ID=log2(AW+0.5),ID=log_{2}\left(\frac{A}{W}+0.5\right), (3)

which has been demonstrated to perform well in 2D task settings.

Mackenzie [7] extended Fitts’ original law with what is known as the Shannon formulation, which yields a better fit and is formulated as:

ID=log2(AW+1).ID=log_{2}\left(\frac{A}{W}+1\right). (4)

The robustness and linear fit of the Shannon model have been demonstrated for both translational [11, 13] and rotational tasks [11]. The resulting IDs of Mackenzie’s model are one bit less than with Fitts’ formulation.

Contrary to Fitts’ model, the Shannon formulation is limited in terms of mathematical expressiveness. Though Mackenzie argued that the addition of the +1 term avoids the negative IDs possible in the original Eq. 1, this justification is not entirely sound: a negative ID in Fitts’ model simply means that the cursor or probe is already within the target area. The term thus has limited theoretical justification and serves the purpose of a higher model fit [4]. Nevertheless, adopting Mackenzie’s model facilitates the comparability of numerous extensions, as it has become the basis for most of the subsequent models presented hereinafter.
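To make the differences between these 2D formulations concrete, the following sketch evaluates Eqs. 1-4 for one illustrative task configuration; the numeric values are assumptions for illustration only.

```python
import numpy as np

def id_fitts(A, W):
    return np.log2(2 * A / W)          # Eq. 1

def id_hoffmann(A, W, F):
    return np.log2(2 * A / (W + F))    # Eq. 2

def id_welford(A, W):
    return np.log2(A / W + 0.5)        # Eq. 3

def id_shannon(A, W):
    return np.log2(A / W + 1)          # Eq. 4

A, W, F = 24.0, 10.0, 4.0  # illustrative separation, target width, finger/object size (cm)
print(f"Fitts:    {id_fitts(A, W):.3f} bits")
print(f"Hoffmann: {id_hoffmann(A, W, F):.3f} bits")
print(f"Welford:  {id_welford(A, W):.3f} bits")
print(f"Shannon:  {id_shannon(A, W):.3f} bits")
```

Since log2(2A/W) = 1 + log2(A/W), the Shannon ID approaches one bit less than Fitts’ ID for large A/W ratios, consistent with the observation above.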

II-D 3-Dimensional Extensions

While Fitts’ law has been applied to some extent to 3D pointing tasks [12, 13], it does not accurately represent 3D movement [2, 11]. Murata and Iwase [13] introduced an extension of Fitts’ law to 3D pointing tasks taking into account the directional angle θ between an origin and a target. Their model is based on Eq. 4 and formulated as:

ID=log_{2}\left(\frac{A}{W}+1\right)+c\cdot\sin{\theta}, (5)

with θ being the azimuth angle, effectively added to the ID. Their work found that directional angles θ had an almost sinusoidal relationship with MT [13].

A later study by Cha and Myung [12] extended the above work by adding inclination angles, representing higher dimensions in finger-aimed pointing tasks in the spherical coordinate system. Based on Eq. 2, it is formulated as:

MT=a+b\cdot\theta_{1}+c\cdot\sin{\theta_{2}}+d\cdot log_{2}\left(\frac{2A}{W+F}\right), (6)

where θ1 and θ2 represent the inclination and azimuth angles from the starting point to the target, respectively. Directional angles again followed an almost sinusoidal relationship with MT, as shown in Murata and Iwase’s work [13]. The constants a, b, c and d are empirically determined through linear regression. However, the work of [12] limited its investigation to forward motions covering azimuth angles between ±30° and ±60°. Recent work by Barrera Machuca and Stuerzlinger [1] accounted for the above by introducing pointing tasks with the use of 3D displays, pointing towards targets at azimuth angles ranging from −90° to 90° and 0° to 180°. Their work confirmed that left-to-right movements were easier than movements away from or towards the user. While this investigation covered a wider range of azimuth angles, it still disregarded inclination angles, as the height of the objects was adjusted to the view height of each participant. However, human motor skills vary significantly with the direction of movements; for example, upward movements appear to be more demanding than downward ones [12]. Hence, it is important to study the effects of both directions and inclinations.
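As a small sketch of how these directional models are evaluated, both equations can be expressed directly; the regression constants below are assumed placeholders, since [12, 13] report their own fitted coefficients, and angles are passed in radians.

```python
import numpy as np

def id_murata_iwase(A, W, theta, c):
    """Eq. 5: Shannon ID plus a sinusoidal azimuth penalty (theta in radians)."""
    return np.log2(A / W + 1) + c * np.sin(theta)

def mt_cha_myung(A, W, F, theta1, theta2, a, b, c, d):
    """Eq. 6: inclination theta1 enters linearly, azimuth theta2 sinusoidally."""
    return a + b * theta1 + c * np.sin(theta2) + d * np.log2(2 * A / (W + F))

# Illustrative evaluation with assumed constants.
print(id_murata_iwase(A=24, W=10, theta=np.deg2rad(90), c=0.3))
print(mt_cha_myung(A=24, W=10, F=4, theta1=np.deg2rad(30),
                   theta2=np.deg2rad(45), a=0.5, b=0.4, c=0.3, d=0.25))
```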

So far in our analysis, no model has investigated rotational variations, or combined rotational with translational movements, all in one setting. Additionally, the spatial arrangements and factors identified in this section should ideally be included as well. Our work aims to address these gaps by effectively combining them in one setting.

Performance Models | MT (sec) | ID (bits) | Derived From | Space | Dir. | Inc. | Rot.
Fitts’ [5] | MT = a + b·ID | ID = log2(2A/W) | [N/A] | 2D | No | No | Yes
Hoffmann’s [8] | MT = a + b·ID | ID = log2(2A/(W+F)) | [5] | 2D | No | No | No
Welford’s [19] | MT = a + b·ID | ID = log2(A/W + 0.5) | [5] | 2D | No | No | No
Shannon’s [7] | MT = a + b·ID | ID = log2(A/W + 1) | [5] | 2D | No | No | Yes
Murata and Iwase’s [13] | MT = a + b·ID | ID = log2(A/W + 1) + c·sin(θ) | [7] | 3D | Yes | No | No
Cha and Myung’s [12] | MT = a + b·θ1 + c·sin(θ2) + d·ID | ID = log2(2A/(W+F)) | [13, 8] | 3D | Yes | Yes | No
TABLE I: Summary of the most widely used models to date based on Fitts’ law and those compared with our proposed model. Each model’s characteristics are summarized, including whether important spatial variables are accounted for in the 3D domain. Dir.: Directions. Inc.: Inclinations. Rot.: Rotation.
Figure 2: Illustration of three types of tasks in 2D space: translational tasks, rotational tasks and the combination of both.

II-E Combining Translation and Rotation

To this point, we have analyzed the most widely used extensions of Fitts’ law, all limited to purely translational tasks. However, while performing almost any type of manipulation or pointing task, we also attempt to match the rotation of the object so as to satisfy certain spatial criteria [2, 16, 21]. In addition, to effectively describe general human movement, the simultaneous presence of both translation and rotation needs to be accounted for. Both are an essential and inseparable part of manipulating either real or virtual objects in the 2D and 3D domains [10, 11, 14]. One early study showed that Fitts’ law can be adjusted and applied to purely 2D rotational tasks, but did not investigate combined movements [9].

The combination of translational and rotational tasks in 2D space is visually depicted in Figure 2. Stoelen and Akin [11] were the first to “merge” translational and rotational terms into an extended formulation based on Eq. 4. They suggested that rotation and translation can be accounted for by adding the indices of difficulty of each. The implication is that rotation and translation are independent transformations explaining the spatial relationship of an object. The simplicity of adding both terms into one index of difficulty is supported by Kulik et al. [10]. The latter also found that long task completion times in object manipulation can be attributed either to increased cognitive workload or to excessive accuracy requirements instructed by the researchers, although this finding still requires further modelling of variations in accuracy requirements to quantify the relationship.

Nevertheless, in both [10] and [11], combined movements and spatial arrangements were limited to 2D, i.e. following only a straight line. As future work, the authors pointed out that it is important to generalize and extend a metric to three dimensions for modelling virtual reality navigation and teleoperation [11]. Our work addresses these limitations.
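For reference, a minimal sketch of this equally weighted combination, using the Shannon form of Eq. 4 for translation and a plausible rotational analogue; the constants a and b are illustrative, not fitted values from [11].

```python
import numpy as np

def stoelen_akin_mt(A, W, alpha, omega, a, b):
    """MT = a + b * (ID_t + ID_r): equally weighted sum of the translational
    and rotational indices of difficulty, following [11] (Shannon form)."""
    id_t = np.log2(A / W + 1)          # translational difficulty
    id_r = np.log2(alpha / omega + 1)  # rotational analogue (assumed form)
    return a + b * (id_t + id_r)

print(stoelen_akin_mt(A=24, W=10, alpha=45, omega=7.5, a=0.4, b=0.6))
```

As discussed in Section VI, our model relaxes the single shared constant b by weighting the two IDs separately.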

III METHODOLOGY AND STUDY DESIGN

In this section, we introduce our system setup including the apparatus and technologies used. We also present our hand-control approach that allowed participants to interact with the virtual objects simulated by realistic high-fidelity physics.

III-A System Overview and Apparatus

Figure 3(a) illustrates the system overview, including the simulation engine Unity3D and all necessary hardware and software. Two input sensors were used to interact with the simulation. The first was the optical hand tracking device, the Leap Motion Hand Controller (LMHC). Equipped with two infrared cameras running at 120 Hz and with a 135° Field of View (FoV), the LMHC interfaced the user’s hand movements with the physics engine PhysX in Unity3D. For visual feedback, high-resolution displays were used to limit distance overestimation and degraded longitudinal control, a known issue in VEs [1, 2]. Consequently, the Virtual Reality Head-Mounted Display (VRHMD) HTC Vive Pro was used, with a 2880 × 1600 pixel resolution display and a 110° FoV at 90 Hz. The photosensors on the VRHMD also served as our second sensor, allowing for head rotations in the VE. Furthermore, ROS# was used to import physics models of robots and objects (Unified Robot Description Format, URDF). For all experiments, the LMHC was fixed on the front of the VRHMD.

The physics simulation time-step was set at 1000 Hz to ensure robust and stable performance with realistic forces and friction. To ensure optimal hand tracking performance, lighting conditions were kept consistent and the operational space was limited to about 100 cm, the maximum reaching bound from the user’s chest. In addition, a low-pass filter with a cutoff frequency of 10 Hz was applied to the LMHC data to reduce noise during retargeting and to ensure continuity and robustness.

III-B Hand Control and Input Interface

As shown in Figure 3(a), the user’s hand movements were mapped onto an anthropomorphic Shadow Robot hand in the simulation. The palm of the simulated hand had 6 DoFs and could move freely around the virtual environment in all axes for both translation and rotation. To tele-control the virtual hand, the Cartesian hand movements and joint positions of the participant were mapped onto those of the hand in the virtual environment. The re-targeting approach was similar to the work in [14], where the joint positions of the user’s fingers were obtained by calculating the angle θ between a joint vector b_{i-1} and its parent b_i. The resultant angle θ from the user’s finger is then used in a joint PD controller to achieve the desired re-targeting of joint motions from the user’s hand to the simulated robotic one. These are formulated as:

\theta=\arccos\left(\frac{\overrightarrow{b_{i}}\cdot\overrightarrow{b_{i-1}}}{\left\|\overrightarrow{b_{i}}\right\|\left\|\overrightarrow{b_{i-1}}\right\|}\right),\qquad\tau_{i}=K_{P_{i}}\cdot(\theta_{d_{i}}-\theta_{c_{i}})-K_{D_{i}}\cdot\dot{\theta}_{i}, (7)

where θ represents the desired angle for the virtual hand to match, and τ_i is the torque applied to each joint i. θ_{d_i} and θ_{c_i} are the desired angle obtained from the human hand via the LMHC and the current angle of the virtual hand joint in the simulation, respectively. Finally, θ̇_i is the measured joint velocity of the virtual hand, used for computing the damping torque.
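A minimal sketch of this retargeting step is given below; the PD gains are hypothetical placeholders, and the LMHC/PhysX interfacing of the actual system is not reproduced here.

```python
import numpy as np

def joint_angle(b_i, b_prev):
    """Angle between a finger bone vector and its parent bone (Eq. 7, left)."""
    cos_t = np.dot(b_i, b_prev) / (np.linalg.norm(b_i) * np.linalg.norm(b_prev))
    return np.arccos(np.clip(cos_t, -1.0, 1.0))  # clip guards against rounding

def pd_torque(theta_d, theta_c, theta_dot, kp=2.0, kd=0.1):
    """PD torque driving a simulated joint toward the tracked angle (Eq. 7, right)."""
    return kp * (theta_d - theta_c) - kd * theta_dot

# Example: bone vectors from the tracker, current simulated joint state.
theta_d = joint_angle(np.array([0.0, 1.0, 0.2]), np.array([0.0, 1.0, 0.0]))
tau = pd_torque(theta_d, theta_c=0.05, theta_dot=0.01)
print(f"desired angle = {theta_d:.3f} rad, torque = {tau:.3f} N*m")
```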

Furthermore, a velocity control signal was applied to the palm of the robotic hand based on the real hand’s position and orientation. Hence, the 6 DoFs of the robotic hand were controlled by matching its translation and orientation with that of the real one captured by the LMHC. The collision between the robotic hand and the objects in the simulation environment was realised via the built-in PhysX 4.0 engine in Unity3D. We furthermore retained the original joint limits as well as colliders of the virtual hand, as specified in the URDF file of the Shadow robot hand to ensure optimal and realistic actuation.

(a) Physics simulation environment and hand setup.
(b) Pointing tasks.
(c) Manipulation tasks.
Figure 3: The system overview and the two types of interaction in the simulation environment. (a) depicts the simulation environment and the retargeting approach: the illustration of the user’s hand, the LMHC visualisation of the joint vectors and the motion retargeting of the robotic hand in Unity3D. (b) and (c) represent the pointing and manipulation tasks respectively, broken down into the four basic phases of interacting with an object. The only difference between (b) and (c) lies in phase P2, where the object is either grasped or attached to the hand depending on the type of interaction.

III-C Task Design and Spatial Assessment

In each task, participants were asked to move an object from a start to a target location. Contrary to the use of a sphere or the index finger of a participant in most related work [1, 12, 13], the use of a cube allowed us to assess rotational variations in our experiment. Due to its identifiable orientation and as one of the most basic 3D shapes, the cube presented a suitable choice to assess both translational and rotational tasks. Regarding rotation, we instructed participants to match the sides of the cube with those of the target cube, as parallel as possible. While using a cube introduces in essence four “correct” rotations and limits to some extent the range of rotations one can investigate (e.g. to a maximum of 45 degrees), it still represents the dominant and most widely used 3D shape in current work [14]. Our approach is influenced by [10, 11], which included rotational tasks but were limited to 2D movements, following only straight lines without directions and inclinations. The targets were arranged in spherical coordinates with the object at the centre. Figure 2 and Figure 1 illustrate movements in 2D and full 3D space respectively.

The interactive object manipulated by users was presented as a solid blue cube, and the target location as a transparent cubic volume. When the object intersected the target and the translational and/or rotational requirements were met, depending on the task type and difficulty, the transparent target would turn green, indicating the success of the task and progressing to the next, as shown in 3(c) and 3(b). For pure translational tasks, a 50% overlap with the target was considered a success, as in Fitts’ original experiment. For rotational tasks, the target rotation α needed to be matched within a certain rotation tolerance ω, i.e. the object-to-target rotation had to be matched in all axes. In Figure 4 we visually depict and describe the mathematical conditions that needed to be satisfied for each type of task to be classified as either a success or an error.

Translational Tasks: T=\begin{cases}1,&A\leqslant\frac{W}{2}\\0,&A>\frac{W}{2}\end{cases}
Rotational Tasks: R=\begin{cases}1,&\alpha_{i}\leqslant\omega_{i}\\0,&\alpha_{i}>\omega_{i}\end{cases}, for all axes i:\mathbb{R}^{3}
Combined Tasks: C=T\cdot R
Notations for Translational Tasks: A represents the distance between two 3D points, hence A=d(W_{x,y,z},F_{x,y,z})=\sqrt{\sum_{i=1}^{3}\left(x_{i}-y_{i}\right)^{2}}.
Notations for Rotational Tasks: \alpha_{i}=\left|W_{\alpha_{i}}-F_{\alpha_{i}}\right|, where W_{\alpha_{i}} and F_{\alpha_{i}} represent the rotations of the target and the object about each axis i, respectively.
Notations for Combined Tasks: Both the translational and rotational requirements need to be met for a success, i.e. T=1 and R=1.
Figure 4: From left to right, the figure represents the progression requirements for Translational, Rotational and Combined tasks. The green cube volume illustrates successful placements. The table represents the task successes (1) or errors (0).
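A minimal sketch of these success checks, assuming per-axis Euler-angle errors and a single scalar tolerance ω for the rotational criterion, which simplifies the all-axes matching condition of Figure 4:

```python
import numpy as np

def translation_success(obj_pos, target_pos, W):
    """T = 1 when the centre-to-centre distance is within half the target width."""
    A = np.linalg.norm(np.asarray(obj_pos) - np.asarray(target_pos))
    return int(A <= W / 2)

def rotation_success(obj_euler, target_euler, omega):
    """R = 1 when every per-axis angular error is within the tolerance omega."""
    alpha = np.abs(np.asarray(target_euler) - np.asarray(obj_euler))
    return int(np.all(alpha <= omega))

def combined_success(obj_pos, target_pos, W, obj_euler, target_euler, omega):
    """C = T * R: both criteria must hold simultaneously."""
    return (translation_success(obj_pos, target_pos, W)
            * rotation_success(obj_euler, target_euler, omega))

# Example: positions in cm, rotations in degrees (illustrative values).
print(combined_success([0, 0, 24], [0, 0, 26], W=8,
                       obj_euler=[0, 42, 0], target_euler=[0, 45, 0], omega=7.5))
```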

III-D Pointing vs Manipulation

In addition to varying all possible spatial variables in 3D object interaction, we also investigated pointing and manipulation tasks, which are the most common types of interactions in both VEs [1] and teleoperated simulation environments [14, 16]. Pointing tasks were investigated as they remain the dominant interaction type in 3D user interfaces and closely resemble Fitts’ original experiment. Furthermore, pointing tasks are a natural extension of peg insertion tasks, commonly seen in teleoperation [16]. To cover the most critical limitation of the current literature on using Fitts’ law for robotics, we also studied manipulation tasks, i.e. involving grasping, as these are the dominant types of object interaction in teleoperation [2].

For pointing tasks, we used the index finger pad of the virtual hand to “attach” the cube to the participant’s hand, allowing users to move their hand and the cube to the target location. The object would attach upon intersection with the index finger, matching the position of the index finger pad while retaining its original orientation. For the manipulation tasks, on the other hand, realistic physics and friction forces were simulated, and participants had to grasp and transport the object to the target location via simulated contact forces. Gravity was enabled for manipulation and a table was simulated by a collision plane. 3(b) and 3(c) visually depict these two types of object interactions.

IV ANALYSIS AND PROCEDURE

All results have been tested for significance (95% CI) via a Repeated-Measures ANOVA (RM-ANOVA). A Shapiro-Wilk Test was used to verify the normality of the data prior to the RM-ANOVA. Mauchly’s Test of Sphericity was used to test sphericity; in cases of violation, a Greenhouse-Geisser Correction (GGC) was used to correct the degrees of freedom, assuming ε < 0.75. For non-parametric data violating normality, an Aligned-Rank Transform (ART) [22] was used to allow the use of the RM-ANOVA on the ranked data. A step-wise linear regression determined the effect of each variable for conceiving our metric equation and its effect on the predictability of MT. The independent spatial variables were fitted using the criteria of a probability of F < .05 to enter, or F > .10 to remove. Finally, we used linear regression analysis (r²) to analyze and compare our proposed model with existing work. Hereinafter, the significance levels are: *p < .05, **p < .01, ***p < .001, with n.s. standing for “not significant”.
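A minimal sketch of the first steps of this pipeline, i.e. the normality check followed by a regression of MT on ID, using SciPy on hypothetical data; the full RM-ANOVA/ART analysis is not reproduced here.

```python
import numpy as np
from scipy import stats

# Hypothetical per-condition mean movement times for one variable.
rng = np.random.default_rng(0)
mt = rng.normal(2.0, 0.4, size=20)

# Step 1: Shapiro-Wilk normality check before the RM-ANOVA. If p < .05 the
# data deviate from normality; an Aligned-Rank Transform would then be
# applied before running the RM-ANOVA on the ranked data.
w, p = stats.shapiro(mt)
print(f"Shapiro-Wilk: W = {w:.3f}, p = {p:.3f}")

# Step 2: simple linear regression of MT on ID, reporting r^2 as in the paper.
ID = np.linspace(1, 5, 20)
slope, intercept, r, p_reg, se = stats.linregress(ID, mt)
print(f"r^2 = {r**2:.3f}")
```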

IV-A Participants

A total of 20 participants (N = 20) were recruited for this study (4 females and 16 males), with ages ranging from 19 to 46 (μ = 27.35, σ = 5.43). The selection criteria we set during recruitment were that each participant (i) was right-handed, (ii) had healthy hand control, (iii) had normal or corrected vision and (iv) was familiar with either video games or VR. Participants who did not meet all of these criteria were excluded from the experiment. Participants were asked to find a balance between minimizing errors and selecting the targets as quickly as they could during target selection and placing.

IV-B Pre-Exposure and Approach

Prior to the commencement of the experiment, participants were briefed, gave formal written consent and had their Interpupillary Distance (IPD) measured for the VRHMD. Furthermore, all participants were allowed to acclimatize to the simulation environment prior to the experiment via a set of 96 training exercises. These covered both pointing and manipulation task types, i.e. 48 for each type. Both translational and rotational movements were included in these exercises, so as to cover the entirety of the spatial complexities later investigated. The training exercises were specific to the acclimatization procedure and independent of those in the experiment. We also randomized the order of all experiments for all participants to counterbalance potential acclimatization or task adaptation. A grand total of 39,040 trials were recorded.

IV-C Experiment Procedure

To investigate the applicability of our method in full 3D space, we conducted four increasingly complex experiments, as shown in Table II. The breakdown into experiments allowed us to compare existing approaches in increasingly complex spatial settings. All four experiments included two different types of interaction: pointing and manipulation. Throughout, positional tracking was disabled but head rotation was allowed; participants were nonetheless asked to retain their body pose without moving their torso. To limit potential tiredness, a resting break of 15 seconds took place after every 15 tasks, during which users could rest their hands. To control the duration of all experiments, a maximum of 15 s (pointing) and 20 s (manipulation) was given for each task, after which the task ended and was considered an error.

Finally, as manipulation admits multiple different grasping techniques, generally argued to be six distinctive grasping types according to the Southampton Hand Assessment Procedure (SHAP), we instructed participants to use precision/tip grasping for all manipulation tasks, as shown in 3(a) and 3(c). Error trials were excluded from the analysis. All of the settings described in this section remained constant across all experiments.

Variables | Experiment 1 | Experiment 2 | Experiment 3 | Experiment 4
F, Object Size (cm) | (3, 4, 5) | (5) | (4, 5) | (4)
W, Target Width (cm) | (5, 7.5, 10, 12.5) | (5, 10) | (5, 10) | (4, 8)
A, Target Separation (cm) | (12, 24, 36, 48) | (12, 24) | (0) | (12, 24)
φ, Direction Angle (°) | (90) | (0, 90, 180, 270) | (0) | (0, 90)
θ, Inclination Angle (°) | (0) | (15, 30, 45) | (0) | (15, 30)
α, Angular Distance (°) | (0) | (0) | (15, 30, 45) | (30, 45)
ω, Angular Tolerance (°) | (0) | (0) | (2.5, 5, 7.5, 10) | (7.5, 15)
Variations × (Reps.) | 48×(5)=240 | 48×(5)=240 | 48×(5)=240 | 64×(4)=256
TABLE II: Parameter settings of all spatial variations investigated.

V EXPERIMENTS

Here we present all experiments (E1 to E4) with increasing spatial complexity, each with its results interpreted and analyzed for both pointing and manipulation. Table II lists all the variations of the investigated variables. Figure 5 visually depicts the relationship between these variables and MT, and Table III summarizes the statistical results and significance levels. Finally, in Table IV and Figure 6, we summarize and compare all models with their respective r² values.

V-A Experiment 1: Purely Translational Tasks

In this experiment, we were primarily interested in approximating Fitts’ original experiment, except in 3D. Hence, we investigated purely translational tasks, varying object size F, target width W and target separation A. Movements were along one line only, with no directions or inclinations, so as to keep the 3D task simple.

V-A1 Design

This study used a 3×4×4 within-subjects design, with the independent variables being: three object sizes (F = 3, 4 and 5 cm), four target widths (W = 5, 7.5, 10 and 12.5 cm) and four target separations (A = 12, 24, 36 and 48 cm). The dependent variable was MT. Each of the 48 task types had 5 repetitions for both pointing and manipulation, hence a total of 480 trials per participant. With 20 participants, a total of 9600 trials were recorded.

V-A2 Pointing Results

The data was not normally distributed, as the Shapiro-Wilk Test yielded p = .029, and thus an ART was used prior to the RM-ANOVA to allow the analysis on the ranked data. The mean MT for pointing tasks was 1.63 ± 0.40 s. Four trials out of 4800 were excluded from the analysis due to errors. Our results showed that increases in object size F and target width W significantly decreased MT (p < .01 and p < .001, respectively). Compared to F and W, the target separation A had the biggest effect on MT, with an almost linearly increasing correlation (p < .001).

V-A3 Manipulation Results

The data was normally distributed (p = .229) and the mean MT was 2.13 ± 0.45 s. In total, 44 trials out of 4800, i.e. 0.9% of the data, were excluded from the analysis due to errors. MT significantly decreased as object size F and target width W increased (p < .001 and p < .001, respectively). The target separation A again had the biggest effect, with MT almost linearly increasing (p < .001).

Variable | Pointing: Sphericity (χ², p, ε) | Pointing: RM-ANOVA (F-Test, ηp², p, r²%) | Manipulation: Sphericity (χ², p, ε) | Manipulation: RM-ANOVA (F-Test, ηp², p, r²%)
Experiment 1
F | χ²(2)=2.468, n.s, n/a | F(2,30)=9.441, 0.386, **, 0.5% | χ²(2)=3.314, n.s, n/a | F(2,30)=14.468, 0.491, ***, 3.1%
W | χ²(5)=6.992, n.s, n/a | F(3,33)=102.522, 0.903, ***, 10.2% | χ²(5)=5.409, n.s, n/a | F(3,33)=19.162, 0.635, ***, 5.3%
A | χ²(5)=8.360, n.s, n/a | F(3,33)=602.976, 0.982, ***, 80.4% | χ²(5)=5.968, n.s, n/a | F(3,33)=197.465, 0.947, ***, 84.7%
Experiment 2
W | n/a | F(1,23)=287.752, 0.925, ***, 40.0% | n/a | F(1,23)=51.431, 0.691, ***, 35.6%
A | n/a | F(1,23)=190.993, 0.893, ***, 43.7% | n/a | F(1,23)=33.986, 0.596, ***, 23.0%
φ | χ²(5)=12.609, *, 0.675 | F(2.025,22.270)=6.918, 0.386, *, 3.0% | χ²(5)=3.722, n.s, n/a | F(3,33)=0.851, 0.072, n.s, 1.0%
θ | χ²(2)=4.346, n.s, n/a | F(2,30)=7.160, 0.323, *, 1.3% | χ²(2)=2.210, n.s, n/a | F(2,30)=5.535, 0.270, *, 5.4%
Experiment 3
F | n/a | F(1,23)=1.551, 0.063, n.s, 0.0% | n/a | F(1,23)=1.195, 0.049, n.s, 0.5%
W | n/a | F(1,23)=0.299, 0.013, n.s, 0.0% | n/a | F(1,23)=40.805, 0.640, ***, 6.4%
α | χ²(2)=0.781, n.s, n/a | F(2,30)=23.639, 0.612, ***, 1.8% | χ²(2)=3.589, n.s, n/a | F(2,30)=12.915, 0.463, ***, 2.9%
ω | χ²(5)=11.386, *, 0.639 | F(1.918,21.093)=184.323, 0.944, ***, 81.8% | χ²(5)=4.454, n.s, n/a | F(3,33)=89.481, 0.891, ***, 73.3%
Experiment 4
W | n/a | F(1,31)<0.001, 0.0, n.s, 0.0% | n/a | F(1,31)=58.815, 0.655, ***, 13.1%
A | n/a | F(1,31)=51.296, 0.623, ***, 8.5% | n/a | F(1,31)=48.664, 0.611, ***, 19.2%
φ | n/a | F(1,31)=2.278, 0.680, n.s, 0.7% | n/a | F(1,31)=0.205, 0.007, n.s, 0.1%
θ | n/a | F(1,31)=1.718, 0.053, n.s, 0.2% | n/a | F(1,31)=2.990, 0.088, n.s, 0.7%
α | n/a | F(1,31)=10.548, 0.254, **, 1.2% | n/a | F(1,31)=10.193, 0.247, **, 4.0%
ω | n/a | F(1,31)=394.944, 0.927, ***, 77.3% | n/a | F(1,31)=122.753, 0.798, ***, 44.3%
TABLE III: Results of all variables and their influence on MT across all experiments. The percentages r² (%) indicate the weight or contribution of each variable towards MT and were obtained via step-wise linear regression.

V-A4 Remarks

From our analysis, we observed that all included variables had a significant effect on MT for both pointing and manipulation. Fitting Fitts’, Shannon’s and Welford’s models to the data showed a high fit for pointing, but a slightly lower fit for manipulation. Only Hoffmann’s model showed a significantly better fit for manipulation, at an approximately 10% increase over the rest, which is likely due to the incorporation of the object size F in the formulation. Murata & Iwase’s and Cha & Myung’s models were excluded from this analysis, as they yield the same results as the models they extend, since directions and inclinations were not assessed in E1.

V-B Experiment 2: Translational Tasks with Directions and Inclinations

E2 extends E1 by adding directions and inclinations, which have significant effects according to [12, 13].

V-B1 Design

The study used a 2×2×4×3 within-subjects design, with the independent variables being: two target widths (W = 5 and 10 cm), two target separations (A = 12 and 24 cm), four direction angles (φ = 0°, 90°, 180° and 270°) and three inclination angles (θ = 15°, 30° and 45°). The dependent variable was MT. A total of 9600 trials were recorded.

V-B2 Pointing Results

The data was normally distributed (p = .077) and the mean MT for pointing tasks was 1.47 ± 0.23 s. A total of 3 trials out of 4800 were excluded from the analysis due to errors. Larger target widths W significantly decreased MT (p < .001), while higher values of target separation A significantly increased MT (p < .001). Directional angles φ significantly affected MT as well (p < .05), following a slight sinusoidal relationship with MT and revealing that front-to-back movements (0°, 180°) take slightly longer than left-to-right ones (90°, 270°), in line with [1]. Finally, higher inclinations θ translated into longer MT (p < .05), matching the findings of [12].

V-B3 Manipulation Results

The data was normally distributed (p = .083) and the mean MT for manipulation tasks was 2.11 ± 0.25 s. A total of 60 trials out of 4800, i.e. 1.25% of the data, were excluded from the analysis due to errors. MT significantly decreased as target width W increased (p < .001). Consistent with E1, an increase in target separation A significantly increased MT (p < .001). Contrary to pointing tasks, directional angles φ did not have an apparent effect on MT (p = .476). Finally, increased inclination angles θ significantly affected MT (p < .05), as in [12].

V-B4 Remarks

All variables had significant effects on MT, except directional angles φ in manipulation. Contrary to E1, we observed that Fitts’, Shannon’s and Welford’s formulations had a relatively higher fit in both pointing and manipulation than Hoffmann’s model, by approximately 5%. This is corroborated by Cha & Myung’s model, which is based on Hoffmann’s and had a very similar fit for both pointing and manipulation. Only Murata & Iwase’s directional model presented a marginally higher fit than the rest in pointing and manipulation tasks.

V-C Experiment 3: Purely Rotational Tasks

To this point, only translational tasks had been investigated. Yet, from the identified literature and intuition, rotation is an inseparable and fundamental part of object interaction [2, 10, 11]. Hence, here translation was eliminated: the object and the target overlapped at the centre, differing only in rotation. Consequently, this task is an almost natural equivalent of Fitts’ original experiment, only for rotation.

V-C1 Design

The study used a 2×2×3×4 within-subjects design, with the independent variables being: two object sizes (F = 4 and 5 cm), two target widths (W = 5 and 10 cm), three target rotations (α = 15°, 30° and 45°) and four rotational tolerances (ω = 2.5°, 5°, 7.5° and 10°). The dependent variable was MT. 9600 trials were recorded.

V-C2 Pointing Results

The data deviated from normality (p < .001), so an ART was applied; the mean MT for pointing tasks was 3.37 ± 1.51 s. A total of 225 trials out of 4800, i.e. 4.6% of the data, were excluded from the analysis due to errors. Increases in both object size F and target width W slightly decreased MT, but not at a significant level (p = .226 and p = .590, respectively). However, we observed a significant correlation between the rotational separation α and MT (p < .001). The largest effect was observed for the rotational tolerance ω, showing an almost inverse exponential relationship with MT (p < .001). The effect of ω was so significant that it had almost the same inverse effect on MT as the equivalent of W and F in translational tasks, though to a higher degree.

V-C3 Manipulation Results

The data violated normality (p < .01), so an ART was used. The mean MT for manipulation tasks was 2.79 ± 0.67 s. A total of 64 trials out of 4800, i.e. 1.3% of the data, were excluded due to errors. In line with pointing, no significant effect of object size F on MT was observed (p = .286). In contrast to pointing, an increase in target width W did significantly decrease MT (p < .001). Increasing the rotational separation α also increased MT significantly (p < .001). Finally, as with pointing, increasing the rotational tolerance ω significantly decreased MT (p < .001) in a non-linear fashion. With the exception of target width W, we observed the same relationships of the tested variables with MT for both pointing and manipulation. Rotational tolerance ω predominantly affected MT in a purely rotational setting. This was further investigated in the final experiment E4 to increase our understanding of the underlying factors.

Figure 5: Results of all tested variables in all experiments and their relationship with MT for both pointing (red) and manipulation (blue). Figure sequence is from left to right and top to bottom. Bars represent the standard deviation.
Figure 6: Regression plots of all models across E1 to E4, depicting line equations and r². Green boxes represent the best model.

V-C4 Remarks

For this experiment, we adjusted all existing equations to accommodate its rotational nature, as detailed in Table IV. By doing so, we were able to apply these methods to the rotational setting and investigate any significant effects for comparison. We thereby further extend the work of [11] and cover all known extensions. Overall, regarding variables influencing MT, the rotational variables α and ω were significant in both cases, although some inconsistencies were observed between pointing and manipulation. Noticeably, none of the adjusted models fitted this rotational task well: all formulations yielded r² ≤ 0.680 for both pointing and manipulation. Contrary to the translational experiments (E1 & E2), these models did not fit the data well in rotational tasks.

V-D Experiment 4: Combined Translational and Rotational Tasks with Directions and Inclinations

For E4, we combined all possible variations from E1, E2 and E3, rendering this experiment a fully combined movement task. We combined translational and rotational task variations with varying directions and inclinations to describe a full 3D task. To limit the number of variations, we restricted the directional movements φ to the front and right-side directions (0°, 90°), still investigating view and lateral directional influences. We also kept the object size fixed at F = 4 cm. These design decisions were made to limit the number of tasks, allowing us to control our already exhaustive experiment.

V-D1 Design

For the final experiment, a 2×2×2×2×2×2 within-subjects design was used, with the independent variables being: two target widths (W = 4 and 8 cm), two target separations (A = 12 and 24 cm), two direction angles (φ = 0° and 90°), two inclination angles (θ = 15° and 30°), two target rotations (α = 30° and 45°) and two rotational tolerances (ω = 7.5° and 15°). The dependent variable was MT. Representing 64 tasks with 4 repetitions each, we conducted 256 trials for both pointing and manipulation tasks, i.e. 512 trials per participant, resulting in 10240 trials in total.

V-D2 Pointing Results

An ART was applied since normality was violated (p = .016). The mean MT for pointing tasks was 2.71 ± 0.55 s. A total of 120 error trials out of 4800, i.e. 2.5% of the data, were excluded from the analysis. Sphericity was met in all cases, since all tested variables had two levels. There was no significant effect of target width W on MT (p > .05). On the other hand, the target separation A, consistent with E1 and E2, significantly increased MT at higher separation values (p < .001). Neither directional arrangements φ nor inclination angles θ had a significant effect on MT (p = .141 and p = .200, respectively). As for rotation, α significantly affected MT (p < .01) and ω had the highest influence on MT (p < .001), consistent with E3.

V-D3 Manipulation Results

The data was normally distributed (p = .091). The mean MT for manipulation tasks was 3.10 ± 0.59 s. A total of 82 trials out of 4800, i.e. 1.7% of the data, were excluded from the analysis due to errors. Sphericity was met in all cases. Contrary to the pointing task, target width W had a significant effect on MT (p < .001). Target separation A significantly increased MT at higher separation values (p < .001). Directional angles φ, as well as inclination angles θ, followed the same trend as in the pointing equivalent, not significantly affecting MT (p = .099 and p = .227, respectively). Rotational separation α significantly increased MT as the degrees of separation rose (p < .01). Rotational tolerance ω again had the biggest influence on MT, consistent with the pointing equivalent and E3 (p < .001).

V-D4 Remarks

For E4, we added the IDs of translation and rotation for all existing approaches. The rotation was adjusted as described in E3 and in Table IV. Despite these models in principle being incompatible with combined movements, we added the two IDs to explore the potential of summing these two separate concepts of motion and to conduct a fair comparison. For example, the combined ID for Fitts’ model would be ID_t = log2(2A/W) for translation plus ID_r = log2(2α/ω) for rotation, with the other formulations following the same fashion. Overall, neither directional φ nor inclination θ angles had any significant effect on MT (p > .05) in pointing or manipulation. For combined movements, adding the IDs of translation and rotation, Fitts’, Shannon’s, Welford’s and even Murata & Iwase’s directional formulation proved insufficient when fitted for pointing (r² < 0.5), but yielded slightly better results for manipulation (r² < 0.77). Hoffmann’s had a significantly higher fit for pointing, with an approximately 10% increase. Consistent with E1, this suggests the necessity of adding the object size F to a model. Cha & Myung’s had overall the highest fit on the data, but only marginally over the rest, primarily due to the incorporation of the object size F, as also seen in Hoffmann’s.

VI MODEL DERIVATION

In most prior work on extending Fitts’ law, a clear definition of distance is missing [4]. For example, the distance from the object to the target (A) can be defined from the object centre to the target centre (A_cc), from the object centre to the target edge (A_ce), or from the object edge to the target edge (A_ee). Such a discrepancy highly influences the formula and can be formulated as:

A_{n}\begin{cases}A_{1}=A_{cc},\\A_{2}=A_{ce}+\frac{W}{2},\\A_{3}=A_{ee}+\frac{W+F}{2},\end{cases}\qquad d_{n}=\sqrt{\sum_{i=1}^{n}\left(x_{i}-y_{i}\right)^{2}}. (8)

From the above, a total of three different target separations are defined as A_n, with A_cc > A_ce > A_ee. The Euclidean distance between two points in 3D is represented as d_n with n = 3. At each stage of the experiments, our results show that 3D translational movement follows Fitts’ formulation closely. Target separation A and target width W are an integral part of the formulation. However, another variable with a determining effect in translational tasks was the object size F, part of Hoffmann’s model but not of Fitts’. Hence, the most appropriate definition of effective target separation would be A_3 from Eq. 8, which includes both W and F. By incorporating this definition into Fitts’ ID shown in Eq. 1, we have:

ID_{t}=log_{2}\left(\frac{2A}{F+W}+1\right). (9)
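A small sketch relating the three separation definitions of Eq. 8 to the resulting translational ID of Eq. 9; the values are illustrative only.

```python
import numpy as np

def separations(A_cc, W, F):
    """Eq. 8: the three interchangeable definitions of target separation."""
    A_ce = A_cc - W / 2          # object centre to target edge
    A_ee = A_cc - (W + F) / 2    # object edge to target edge
    return A_cc, A_ce, A_ee

def id_translation(A_cc, W, F):
    """Eq. 9: translational ID using the centre-to-centre separation, with
    both target width W and object size F in the denominator."""
    return np.log2(2 * A_cc / (F + W) + 1)

print(separations(A_cc=24.0, W=10.0, F=4.0))
print(f"ID_t = {id_translation(24.0, 10.0, 4.0):.3f} bits")
```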

As for rotational movements, the graphs in Figure 5 show that α follows an almost linear relationship with MT, while ω follows an evidently non-linear one. While ω resembles an inverse exponential decay, we believe a 1/x² term is more appropriate; more data points would be needed to shed additional light on this discrepancy.

With the results at hand and the indicated discrepancy between translation and rotation as well as using the aforementioned distance definition, we can formulate an improved and more suitable 3D model as:

\text{Final Model: }\begin{cases}MT=a+b\left[c\cdot ID_{t}+d\cdot ID_{r}\right],\\ID_{t}=log_{2}\left(\frac{2A}{F+W}+1\right),\\ID_{r}=log_{2}\left(\frac{2\alpha}{\omega^{2}}+1\right),\end{cases} (10)

where a, b, c and d are constants determined through regression. We retain Fitts’ original approach, so the constants a and b are the y-intercept and slope respectively. The constants c and d represent the contributions of translation (ID_t) and rotation (ID_r) respectively, as these are two separate concepts of motion according to our analyzed results. It is worth pointing out that the constant b is superfluous in this case, since it multiplies the constants c and d; however, retaining b preserves the resemblance of our model to Fitts’ original formulation in Eq. 1.

This model follows Stoelen & Akin’s proposed formulation [11], in particular the feasibility of adding the IDs of both translation and rotation into one combined index of difficulty. Yet in the formulation of [11], both ID_t and ID_r were controlled by a single constant b, as MT = a + b·[ID_t + ID_r]. However, the assumption that translation and rotation contribute equally (i.e. 50% or 0.5 each) towards task difficulty (ID), and by extension MT, is imprecise and limited.

In particular, our results show that rotation almost predominantly affected MT, primarily attributed to high rotational accuracy requirements (ω = 2.5° in E3). For very small ω, we observed an almost inverse exponential decay of MT, instead of a linear one. Translation and rotation should hence be perceived as separate terms, each quantified by its own constant. This mitigates the limitation of equal weighting as in [11], which our study shows to be unsuitable. In the following section, we analyze the results and compare the proposed model with the other aforementioned versions.
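A minimal sketch of fitting Eq. 10 by least squares on hypothetical per-condition data, with b folded into c and d, which the derivation above notes is possible since b merely scales both constants:

```python
import numpy as np
from scipy.optimize import curve_fit

def id_t(A, W, F):
    return np.log2(2 * A / (F + W) + 1)       # Eq. 9 / Eq. 10, translation

def id_r(alpha, omega):
    return np.log2(2 * alpha / omega**2 + 1)  # Eq. 10, rotation

def final_model(X, a, c, d):
    idt, idr = X
    return a + c * idt + d * idr  # Eq. 10 with b absorbed into c and d

# Hypothetical conditions (A, W, F in cm; alpha, omega in degrees) and
# illustrative mean movement times MT in seconds.
A     = np.array([12, 12, 24, 24, 12, 12, 24, 24], dtype=float)
W     = np.array([4, 8, 4, 8, 4, 8, 4, 8], dtype=float)
F     = np.full(8, 4.0)
alpha = np.array([30, 30, 30, 30, 45, 45, 45, 45], dtype=float)
omega = np.array([7.5, 15, 7.5, 15, 7.5, 15, 7.5, 15], dtype=float)
MT    = np.array([2.9, 2.3, 3.2, 2.6, 3.3, 2.5, 3.6, 2.8])

popt, _ = curve_fit(final_model, (id_t(A, W, F), id_r(alpha, omega)), MT)
a, c, d = popt
print(f"a = {a:.3f} s, c = {c:.3f}, d = {d:.3f}")
```

The relative magnitudes of c and d then quantify how much translation and rotation each contribute to MT, rather than assuming the equal weighting of [11].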

Model Fit (r²) | E1 (Translation Only) | E2 (Translation w/ Directions) | E3 (Rotation Only) | E4 (Trans. w/ Rot. & Directions)
 | Pointing | Manip. | Pointing | Manip. | Pointing | Manip. | Pointing | Manip.
Fitts’ [5] | 0.838 | 0.778 | 0.832 | 0.579 | 0.663 | 0.604 | 0.459 | 0.748
Hoffmann’s [8] | 0.862 | 0.869 | 0.789 | 0.511 | 0.659 | 0.600 | 0.569 | 0.762
Welford’s [19] | 0.852 | 0.776 | 0.832 | 0.577 | 0.673 | 0.610 | 0.465 | 0.755
Shannon’s [7] | 0.859 | 0.773 | 0.831 | 0.575 | 0.680 | 0.613 | 0.469 | 0.758
Murata & Iwase [13] | 0.859 | 0.773 | 0.845* | 0.585* | 0.680 | 0.613 | 0.476 | 0.759
Cha & Myung [12] | 0.862 | 0.869 | 0.833 | 0.574 | 0.659 | 0.600 | 0.578 | 0.770
Our Model | 0.887* | 0.877* | 0.788 | 0.509 | 0.901* | 0.804* | 0.805* | 0.803*
TABLE IV: Overview of all methods with their respective r² fits. For E3 and E4, where rotation is present, existing models are in theory incompatible; to retain consistency and allow for a fair comparison, as motivated by [11], we adjusted and extended them to accommodate rotation as well. For example, Fitts’ ID_t = log2(2A/W) becomes ID_r = log2(2α/ω), and both IDs are added for E4. An asterisk marks the best method in each column.

VII DISCUSSION

This study compared the most widely used extensions of Fitts’ law through four designed experiments in a simulated teleoperation VR setting. This allowed us to evaluate the applicability of each model under various spatial complexities entailing translational and rotational movements. Each experiment included pointing and manipulation tasks, which are the most widely used types of interaction in collaborative VEs, VR simulators and robot teleoperation [1, 14, 16].

The study showed that in the most basic form of 3D object movement, a purely translational setting (E1), Fitts’ law and its extensions are adequate for both target pointing and manipulation along a one-directional line. However, when complexity increased by including directional and inclination angles (E2), the predictive performance of these models dropped slightly in pointing tasks and significantly more in manipulation tasks.

Though Fitts’ law has been extended towards rotation, studies so far have limited their findings to 2D space [9, 10, 11]. Furthermore, combining rotational and translational movements under one setting in 3D remains largely unexplored. While Kulik et al. [10] and Stoelen & Akin [11] studied combined movements, these were still limited to 2D space, following movements across one line only.

These limitations of the state of the art showed their drawbacks in E3 and E4. In a purely rotational setting (E3), all models were insufficient when extended towards pointing, and this was further aggravated in manipulation. However, the most important and critical experiment was E4, where we combined translational and rotational movements with varying spatial arrangements under one setting, rendering the last experiment a full 3D task. In E4, where we effectively investigated the most complex spatial settings in 3D, all compared models proved insufficient for both pointing and manipulation. These observations led us to propose a new metric to overcome these limitations, and the derived model outperformed the other extensions in E1, E3 and, most importantly, E4, as shown in Table IV and Figure 6.

Hence, our metric could be used to assess human movement in teleoperation [16], especially with the use of MR technologies [15]. For example, teleoperation studies attempting to evaluate new sensory feedback or techniques, such as bilateral operation with haptic feedback to complete 3D tasks [18], could greatly benefit from a 3D metric of how these affect human performance. Our results support this, as our formulation can capture the complex spatial settings in 3D space associated with such scenarios. Moreover, as our formulation is based on Fitts’ law and hence combines spatial and time-based metrics under a single model, further studies in HCI/HRI could benefit in terms of standardization [2, 3].

Research Implications and Findings
R.1 Fitts' law can be extended towards 3D, but it does not adequately explain all spatial settings and requires a clear definition of the associated distances in 3D.
R.2 Translation and rotation should be treated as separate concepts and quantities. Our analysis showed that when rotation was present, it influenced $MT$ substantially more than translation, by almost two-thirds. Hence, our model accounts for these two separate concepts of motion by incorporating a constant for each, as they do not contribute equally to $MT$ in our experimental results (see the sketch after this table).
R.3 Pointing and manipulation are perceived as two inherently different types of tasks, yet both follow, and can be modelled by, Fitts' law even in 3D space.
R.4 Object size matters, during pointing but especially during manipulation. For the latter, object size is an integral part of the metric, which is achieved by adding the object size $F$ to the equation, as suggested by Hoffmann [8] and confirmed in our model extension.
R.5 Rotation tolerance $\omega$, or simply the rotational accuracy, significantly affects $MT$. This is particularly the case for very low rotational tolerances ($2.5^\circ$). Hence, strict rotational accuracy requirements during object placement (e.g. in teleoperation) are likely to significantly increase operator completion times.
Recommendations for Task Design in Pointing and Manipulation in VR and Teleoperation
RC1 Avoid imposing overly strict spatial accuracy requirements. This holds especially true for rotation, where very low rotational tolerances lead to very high movement times, following an almost exponential relationship.
RC2 Where possible, avoid pointing tasks that require movements towards or away from the view direction of the user. Our results showed that front-to-back movements take marginally longer than left-to-right ones.
TABLE V: Summary of main results and key implications on object interaction in VEs and simulated teleoperation tasks.
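As a minimal sketch of the separate-constants idea in R.2, the snippet below fits $MT = a + b_t \cdot ID_t + b_r \cdot ID_r$ by ordinary least squares. The data, the constant names $b_t$ and $b_r$, and the exact regression form are illustrative assumptions rather than the exact equation of Section VI.

```python
import numpy as np

# Hypothetical per-condition data: translational and rotational
# indices of difficulty and the observed mean movement times (s).
ID_t = np.array([1.0, 2.0, 3.0, 1.0, 2.0, 3.0])
ID_r = np.array([3.2, 3.2, 3.2, 5.2, 5.2, 5.2])
MT = np.array([0.9, 1.1, 1.3, 1.6, 1.8, 2.1])

# MT = a + b_t * ID_t + b_r * ID_r: separate constants, so translation
# and rotation need not contribute equally to MT.
X = np.column_stack([np.ones_like(ID_t), ID_t, ID_r])
(a, b_t, b_r), *_ = np.linalg.lstsq(X, MT, rcond=None)
print(f"a={a:.3f}, b_t={b_t:.3f}, b_r={b_r:.3f}")
```

If the fitted $b_r$ exceeds $b_t$, rotation contributes more strongly to movement time per unit of difficulty, mirroring the influence of rotation observed in R.2.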

VII-A Interpreting Results on Combined Movements in 3D

For combined movements, even adding the $ID$s of translation and rotation (as in Stoelen and Akin [11], and Kulik et al. [10], for 2D tasks) yields an insufficient model. The latter study observed a linear fit of only $r^2 = .780$ when combining translation and rotation in simple 2D settings with movements along a single line. Yet direct comparisons between different studies are difficult, particularly because coefficients of determination ($r^2$) are highly influenced by the underlying data points. Consequently, different experimental settings and task designs do not allow for inter-study comparability [2, 4]. Hence, we compared these models directly on our data across all four experiments, as summarized in Table IV and Figure 6.

Comparing the most widely used model extensions to date, our proposed model demonstrated better fits than the existing extensions in E1 and E3, and most importantly when translational and rotational movements were combined (E4). In E4, our model fitted both pointing ($r^2 = .805$) and manipulation ($r^2 = .803$) better than any other formulation based on Fitts' law. This result can be attributed to three major factors.

Firstly, our analysis does not support weighing the $ID$s of rotation and translation equally and, by extension, assuming that both have the same linear effect on $MT$. Instead, it is crucial to separate these two terms, which can be accomplished by introducing a different constant for each, as explained in Section VI. Secondly, it should be noted that the rotational tolerance $\omega$ had a non-linear relationship with $MT$ in our study, in contrast to the linear relationship reported in [11]. This difference implies that higher degrees of difficulty arise from matching the rotation of an object in all 3D axes, instead of a single axis as in their work, which was limited to 2D space [11]. Thirdly, we conclude that among the 2D formulations, Hoffmann's [8] indeed works well and indicates the necessity of including the object size $F$, which is why it is part of our model. We can thus infer that both $W$ and $F$ inversely affect $MT$ and should therefore be included in a model.
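To illustrate how such fits are compared, the sketch below computes $r^2$ for a single-term model that uses an effective-tolerance index in the commonly cited form $ID = \log_2(\frac{2A}{W-F})$; this form and all numeric values are assumptions for illustration, not a reproduction of Eq. 2.

```python
import numpy as np

def r_squared(mt_obs, mt_pred):
    """Coefficient of determination used to compare model fits."""
    ss_res = np.sum((mt_obs - mt_pred) ** 2)
    ss_tot = np.sum((mt_obs - np.mean(mt_obs)) ** 2)
    return 1.0 - ss_res / ss_tot

def id_effective(A, W, F):
    """Transporting an object of size F into a target of width W
    leaves an effective clearance of W - F."""
    return np.log2(2 * A / (W - F))

# Hypothetical manipulation conditions: amplitude A, target width W,
# object size F, and observed mean movement times (s).
A = np.array([0.2, 0.2, 0.4, 0.4])
W = np.array([0.10, 0.08, 0.10, 0.08])
F = np.array([0.05, 0.05, 0.05, 0.05])
mt = np.array([1.0, 1.3, 1.4, 1.8])

ID = id_effective(A, W, F)
coeffs = np.polyfit(ID, mt, 1)  # linear fit MT = a + b * ID
print(r_squared(mt, np.polyval(coeffs, ID)))
```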

Our experiments found no evidence of a consistently significant effect of either directional or inclination angles. More specifically, directional angles significantly influenced $MT$ only in E2 and only in pointing ($p < .05$). Furthermore, while inclination angles affected $MT$ for both pointing and manipulation in E2 ($p < .05$), their effect was insignificant in E4 with combined movements. In E4, the translational and rotational variables influenced $MT$ significantly more than inclinations or directions. Nevertheless, despite this negligible effect on $MT$, our study shows marginally worse performance when objects were presented along the view axis, i.e. front-to-back ($90^\circ$, $270^\circ$), rather than laterally, i.e. left-to-right ($0^\circ$, $180^\circ$). This is in line with [1].

Finally, manipulation tasks took longer to complete in all cases except in purely rotational tasks (E3), where pointing proved significantly more difficult, especially at the lowest target tolerance of $\omega = 2.5^\circ$. Overall, this time difference can be partially explained by the duration spent grasping the object. An important finding was that manipulation can be modelled under Fitts' model if the important variable of object size $F$ is taken into account. Object size is mostly overlooked in the state of the art but is addressed in Hoffmann's work [8], as seen in Eq. 2.

In Table V, we summarize the main findings of our work and their implications for task design involving pointing, 3D user interfaces and the manipulation of objects in 3D space, as seen in VR and robotic teleoperation.

VII-B Limitations and Future Work

One limitation to be overcome in future studies is the multitude of different input devices and grasping types one can use, which may significantly influence the formulation of human models [2]. Furthermore, human perception is a complex phenomenon, subject to each individual's exposure to technologies as well as personality-related factors, yet accounting for these is particularly challenging [2].

In this work, we investigated seven distinct spatial variables to explain and model combined translational and rotational variations in full 3D space, as shown in Table II. Due to the large number of variations introduced, we limited the investigation of each variable to a maximum of four levels. Future studies are advised to investigate an even wider range of these variables with more levels, to further increase our understanding of their contribution to task difficulty. For example, replacing the cube in our study with a more complex 3D shape, such as a toy car, would allow a wider range of rotational variations to be investigated. Likewise, when the object to be manipulated is non-rigid, fragile or deformable, this will significantly influence a 3D metric for teleoperation [17].

As with all human performance models derived from Fitts' law, the main limitation is the restriction to stationary pointing and manipulation tasks. None of the models studied in this paper take into account upper-body movements, such as those of the torso and/or shoulders. Furthermore, biomechanical variations among participants, such as the placement of objects near the dominant hand or the size of the controlled end-effector, are all determining factors for a model.

VIII CONCLUSION

In this work, we investigated a new human performance metric in full 3D for both pointing and manipulation with combined translational and rotational movements. We conducted four experiments, each adding progressively higher spatial complexity. In the most basic form of 3D translational pointing and manipulation (E1), we observed that existing approaches can adequately model human performance, but they proved insufficient when spatial arrangements such as inclinations and directions were introduced (E2). Moreover, the majority of these approaches did not model rotational movements well, as shown in E3, and were insufficient when translation and rotation were combined with spatial arrangements, as studied in E4.

To the best of our knowledge, this is the first study to extensively compare the most widely used performance metrics based on Fitts' law. We proposed a new performance model capable of modelling human performance in full 3D space in a simulated teleoperation VR setting with two types of interaction. Our metric modelled human performance better than existing formulations, especially under the increased spatial complexity of full 3D space.

However, numerous aspects remain to be investigated and improved, such as a more comprehensive range of spatial variations (e.g. additional levels of rotations, spatial directions, etc.), more complex shapes (non-primitive 3D objects), different grasping types, hand end-effectors and input devices. Nonetheless, this work is a first attempt towards a higher-dimensional formulation for evaluating full-3D movements, which can be used to assess human performance in a VR-based teleoperation setting, with the potential to be applied in robot shared control.

ACKNOWLEDGMENTS

Supported by the EPSRC Future AI and Robotics for Space (EP/R026092/1), EPSRC CDT in RAS (EP/L016834/1), and the H2020 project Harmony (101017008). Ethics approval: 2019/22258. We also thank Iordanis Chatzinikolaidis for his input.

References

  • [1] M. D. Barrera Machuca and W. Stuerzlinger, “The effect of stereo display deficiencies on virtual hand pointing,” in Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, ser. CHI ’19.   New York, NY, USA: Association for Computing Machinery, 2019. [Online]. Available: https://doi.org/10.1145/3290605.3300437
  • [2] E. Triantafyllidis and Z. Li, “The challenges in modeling human performance in 3d space with fitts’ law,” in Extended Abstracts of the 2021 CHI Conference on Human Factors in Computing Systems, ser. CHI EA ’21.   New York, NY, USA: Association for Computing Machinery, 2021. [Online]. Available: https://doi.org/10.1145/3411763.3443442
  • [3] A. Steinfeld, T. Fong, D. Kaber, M. Lewis, J. Scholtz, A. Schultz, and M. Goodrich, “Common metrics for human-robot interaction,” in Proceedings of the 1st ACM SIGCHI/SIGART Conference on Human-Robot Interaction, ser. HRI ’06.   New York, NY, USA: Association for Computing Machinery, 2006, p. 33–40. [Online]. Available: https://doi.org/10.1145/1121241.1121249
  • [4] H. Drewes, “Only one fitts’ law formula please!” in CHI ’10 Extended Abstracts on Human Factors in Computing Systems, ser. CHI EA ’10.   New York, NY, USA: Association for Computing Machinery, 2010, p. 2813–2822. [Online]. Available: https://doi.org/10.1145/1753846.1753867
  • [5] P. M. Fitts, “The information capacity of the human motor system in controlling the amplitude of movement.” Journal of experimental psychology, vol. 47, no. 6, p. 381, 1954.
  • [6] P. M. Fitts and J. R. Peterson, “Information capacity of discrete motor responses.” Journal of experimental psychology, vol. 67, no. 2, p. 103, 1964.
  • [7] I. S. MacKenzie, “Fitts’ law as a research and design tool in human-computer interaction,” Hum.-Comput. Interact., vol. 7, no. 1, p. 91–139, Mar. 1992. [Online]. Available: https://doi.org/10.1207/s15327051hci0701_3
  • [8] E. R. Hoffmann, “Effective target tolerance in an inverted fitts task,” Ergonomics, vol. 38, no. 4, pp. 828–836, 1995. [Online]. Available: https://doi.org/10.1080/00140139508925153
  • [9] G. V. Kondraske, “An angular motion fitt’s law for human performance modeling and prediction,” in Proceedings of 16th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, vol. 1, 1994, pp. 307–308 vol.1.
  • [10] A. Kulik, A. Kunert, and B. Froehlich, “On motor performance in virtual 3d object manipulation,” IEEE Transactions on Visualization and Computer Graphics, vol. 26, no. 5, pp. 2041–2050, 2020.
  • [11] M. F. Stoelen and D. L. Akin, “Assessment of fitts’ law for quantifying combined rotational and translational movements,” Human Factors, vol. 52, no. 1, pp. 63–77, 2010. PMID: 20653226. [Online]. Available: https://doi.org/10.1177/0018720810366560
  • [12] Y. Cha and R. Myung, “Extended fitts’ law for 3d pointing tasks using 3d target arrangements,” International Journal of Industrial Ergonomics, vol. 43, no. 4, pp. 350–355, 2013. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0169814113000723
  • [13] A. Murata and H. Iwase, “Extending fitts’ law to a three-dimensional pointing task,” Human movement science, vol. 20, no. 6, pp. 791–805, 2001.
  • [14] E. Triantafyllidis, C. Mcgreavy, J. Gu, and Z. Li, “Study of multimodal interfaces and the improvements on teleoperation,” IEEE Access, vol. 8, pp. 78 213–78 227, 2020.
  • [15] S. Fani, S. Ciotti, M. G. Catalano, G. Grioli, A. Tognetti, G. Valenza, A. Ajoudani, and M. Bianchi, “Simplifying telerobotics: Wearability and teleimpedance improves human-robot interactions in teleoperation,” IEEE Robotics Automation Magazine, vol. 25, no. 1, pp. 77–88, 2018.
  • [16] J. M. O’Hara, “Telerobotic control of a dextrous manipulator using master and six-dof hand controllers for space assembly and servicing tasks,” Proceedings of the Human Factors Society Annual Meeting, vol. 31, no. 7, pp. 791–795, 1987. [Online]. Available: https://doi.org/10.1177/154193128703100723
  • [17] R. Wen, K. Yuan, Q. Wang, S. Heng, and Z. Li, “Force-guided high-precision grasping control of fragile and deformable objects using sEMG-based force prediction,” IEEE Robotics and Automation Letters, vol. 5, no. 2, pp. 2762–2769, 2020.
  • [18] K. K. Babarahmati, C. Tiseo, Q. Rouxel, Z. Li, and M. Mistry, “Robust high-transparency haptic exploration for dexterous telemanipulation,” IEEE International Conference on Robotics and Automation (ICRA), 2021.
  • [19] A. T. Welford, Fundamentals of skill, ser. Methuen’s manuals of modern psychology.   London: Methuen, 1968.
  • [20] Y. Wang and C. L. MacKenzie, “Object manipulation in virtual environments: Relative size matters,” in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, ser. CHI ’99.   New York, NY, USA: Association for Computing Machinery, 1999, p. 48–55. [Online]. Available: https://doi.org/10.1145/302979.302989
  • [21] A. Billard and D. Kragic, “Trends and challenges in robot manipulation,” Science, vol. 364, no. 6446, 2019. [Online]. Available: https://science.sciencemag.org/content/364/6446/eaat8414
  • [22] J. O. Wobbrock, L. Findlater, D. Gergle, and J. J. Higgins, “The aligned rank transform for nonparametric factorial analyses using only anova procedures,” in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, ser. CHI ’11.   New York, NY, USA: ACM, 2011, pp. 143–146. [Online]. Available: http://doi.acm.org/10.1145/1978942.1978963