
Dimensional Analysis in Statistical Modelling

Tae Yoon Lee (dlxodbs@student.ubc.ca), James V. Zidek (jim@stat.ubc.ca, www.stat.ubc.ca/~jim), and Nancy Heckman (nancy@stat.ubc.ca, www.stat.ubc.ca/~nancy)
Department of Statistics, University of British Columbia
2207 Main Mall
Vancouver, BC
Canada V6T 1Z4
(2020)
Abstract

Building on recent work in statistical science, the paper presents a theory for modelling natural phenomena that unifies the physical and statistical paradigms, based on the underlying principle that a model must be non-dimensionalizable. After all, such phenomena cannot depend on how the experimenter chooses to assess them. Yet the model itself must be composed of quantities that can be determined theoretically or empirically. Hence, the underlying principle requires that the model represent these natural processes correctly no matter what scales and units of measurement are selected. This goal was realized for physical modelling through the celebrated theories of Buckingham and Bridgman, and for statistical modellers through the invariance principle of Hunt and Stein. The paper shows how the latter can embrace and extend the former. The invariance principle is extended to encompass the Bayesian paradigm, thereby enabling an assessment of model uncertainty. The paper covers topics not ordinarily seen in statistical science regarding the dimensions, scales, and units of quantities in statistical modelling. It shows the special difficulties that can arise when models involve transcendental functions, such as the logarithm, which is used e.g. in likelihood analysis and is a singularity in the Box–Cox family of transformations. Further, it demonstrates the importance of the scale of measurement, in particular how differently modellers must handle ratio- and interval-scales.

Keywords: Buckingham Pi-theorem, statistical invariance principle, Box–Cox transformation, logarithmic transformation, nondimensionalization, dimensional analysis

MSC2020 subject classifications: 62A01, 00A71, 97F70

The research reported in this paper was supported by grants from the Natural Sciences and Engineering Research Council of Canada.

1 Introduction

The importance of dimension, scale and units of measurement is largely ignored in statistical modelling. In fact, an anonymous reviewer stated:

“Generally speaking, statisticians treat data as already dimensionless by taking the numeric part, which is equivalent to dividing them by their own units…”

Others have long recognized the role of dimensions, and hence of their scales and units of measurement, in modelling; dimensions can be used to simplify model fitting by reducing the number of quantities involved to a few non-dimensionalized ones. A principal contribution of this paper is to make clear to statisticians the importance of dimensions, scales and units in statistical modelling. We also develop a framework that incorporates these important ideas via a statistical invariance approach. This allows us to extend the existing work’s focus on physical quantities, which by their nature must lie on ratio-scales, to other scales and to vector spaces for multivariate responses. Yet another contribution is addressing a number of issues that are crucial in laying the foundation for the extension. These include: adopting a sampling distribution supported on an interval-scale when the actual support for the sampling distribution is a subset of a ratio-scale; and the meaninglessness of applying transcendental transformations such as the logarithm to quantities with units.

This paper, which is partly expository, describes and contributes to the unification of two overlapping paradigms for modelling natural phenomena. For simplicity we will refer to these as statistical and what Meinsma (2019) calls physical. Physical models are usually deterministic and developed for a specific phenomenon.

In this approach, model development cannot ignore the dimensions, scales and units of measurement on which empirical implementation and assessment will be based. Indeed, Buckingham (1914) believed that a valid model cannot depend on how an investigator chooses to measure the quantities that describe the phenomena of interest. After all, nature cannot know what the measuring methods of science are. Consequently, Buckingham suggested that any valid model must be nondimensionalizable, leading him to his celebrated Pi-theorem. In contrast, Bridgman (1931) argued that models must depend on the measurements but must be invariant under rescaling. Based on the latter premise he was able to derive the invariant “π\pi-functions” of the measurements that were central to Buckingham’s theory. The work of these pioneers spurred the development of what is now known as dimensional analysis (DA), its notions of dimensional homogeneity (DH) and its quantity calculus, explored in depth in Section 3.

The following example renders the ideas above in a more concrete form.

Example 1.

Newton’s second law of motion exemplifies the physical approach to modelling:

a = F M^{−1}.   (1.1)

Here a denotes acceleration, the second derivative with respect to time of the location of an object, computed with respect to the starting point of its motion. F and M are, respectively, the force acting on the object and its mass. The model in Equation (1.1) satisfies the fundamental requirement of DH – the units on the left hand side are the same as the units on the right hand side. Moreover, all three of the quantities in the model lie on a ratio-scale i.e. they are inherently positive, having a structural 0 for an origin when and where the motion began.

The work of Buckingham and Bridgman cited above implies the quantities in a valid model have to be transformable to dimensionless alternatives called π-functions. In the case of Newton’s law, we can use M and F to transform a into a dimensionless quantity to get the simpler but mathematically equivalent model involving a single π-function:

π(a, F, M) ≡ a F^{−1} M = 1.   (1.2)
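To make the nondimensionalization concrete, here is a minimal sketch (our illustration, not from the paper) that tracks dimension exponents over (length, mass, time) and checks that the π-function a F^{−1} M is dimensionless:

```python
# Dimensions as (length, mass, time) exponent tuples.
ACCEL = (1, 0, -2)   # a: L T^-2
FORCE = (1, 1, -2)   # F: M L T^-2
MASS  = (0, 1, 0)    # M

def combine(*terms):
    """Sum the exponent tuples of (dimension, power) pairs."""
    return tuple(sum(d[i] * p for d, p in terms) for i in range(3))

# pi(a, F, M) = a * F^-1 * M^1
pi_dim = combine((ACCEL, 1), (FORCE, -1), (MASS, 1))
assert pi_dim == (0, 0, 0)  # all exponents vanish: dimensionless
```

The assertion holds because the dimension exponents of a, F^{−1} and M cancel exactly, which is what makes (1.2) a valid π-function.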

In contrast to physical modelling, which commonly takes a bottom-up approach, that of statistics as a discipline was top-down (Magnello, 2009) when, in the latter part of the nineteenth century, Karl Pearson established mathematical statistics with its focus on abstract classes of statistical models. Pearson’s view freed the statistician from dealing with the demanding contextual complexities of specific applications. In his abstract formulation, Pearson was able to: incorporate model uncertainty expressed probabilistically; define desirable model properties; determine conditions under which these occur; and develop practical tools to implement models that possess those properties for a specific application. The emphasis on mathematical foundations led inevitably to an almost total disregard of the constraints brought on by dimension, scale and units of measurement. Thus statisticians often use a symbol like X to mean a number to be manipulated in a formal analysis in equations, models and transformations. On the other hand, scientists use such a symbol to represent some specific aspect of a natural phenomenon or process. The scientist’s goal is to characterize X through a combination of basic principles and empirical analysis. This leads to the need to specify one or more “dimensions” of X, e.g. length L. That in turn leads to the need to specify an appropriate “scale” for X, e.g. categorical, ordinal, interval or ratio. For interval- and ratio-scales, X would have some associated units of measurement depending on the nature and resolution of the device making the measurement. How all of these parts of X fit together is the subject addressed in the realms of measurement theory and DA.

In recent years, the importance of dimensions, scales, and units of measurement has been progressively recognized in statistics. At a deep theoretical level, Hand (1996) considers different interpretations of measurement, studying what things can be measured and how numbers can be assigned to measurements. On a more practical side, inspired by applied scientists, Finney (1977) demonstrates how the principle of DH can be used to assess the validity of a statistical model. The first application of DA in statistical modelling appears in the work of Vignaux and Scott (1999), who develop a framework for applying DA to linear regression models. The practicality of DA in the design of experiments is illustrated to a great extent by Albrecht et al. (2013). While much has been written in this area by nonstatisticians, such as Luce (1964), surprisingly little has been written by statisticians.

A natural statistical approach to these ideas is via the statistical invariance principle due to Hunt and Stein in unpublished work (Lehmann and Romano, 2010, Chapter 6). Despite the abstraction of Pearson’s approach, they articulated an important principle of modelling: when a test of a hypothesis about a natural phenomenon based on a sample of measurements rejects the null hypothesis, that decision should not change if the data were transformed to a different scale, e.g. from degrees Celsius to degrees Fahrenheit. That led them to the statistical invariance principle: methods and models must transform coherently under measurement scale transformations.

However, the link between DA and the statistical invariance principle does not seem to have been recognized until the work of Shen, Lin, and their co-investigators (Lin and Shen, 2013; Shen et al., 2014; Shen, 2015; Shen and Lin, 2018, 2019). They develop a theoretical framework for applying DA to statistical analysis and the design of experiments while showcasing the practical benefits through numerous examples involving physical quantities. In their framework, Buckingham’s Pi-theorem plays the key role in unifying DA and statistical invariance. In our paper, we extend their work in two ways: (1) elucidating the link between DA and statistical invariance by removing the dependency on Buckingham’s Pi-theorem; and (2) in doing so, freeing the theory from its restriction to physical quantities and ultimately embedding scientific modelling within a stochastic modelling framework in a general setting.

This paper considers issues that arise when X lies on an interval-scale with values on the entire real line and when X lies on a ratio-scale, that is, with non-negative values and a real origin 0 having a meaning of “nothingness”. A good example to keep in mind is the possible scales of temperature; it can be measured on the ratio-scale in units of degrees Kelvin (°K), where 0°K means all molecular momentum is lost. Alternatively, it can be measured on the interval-scale in units of degrees Celsius (°C), where 0°C means water freezes. The same probabilistic model cannot be used for both scales although often, in practice, the Gaussian model for the interval-scale is used as an approximation for the ratio-scale.

We conclude this Introduction with a summary of the paper’s contents and findings. Section 2 introduces us to the Unconscious Statistician through examples that illustrate the importance of units of measurement. That demonstrated importance of units then leads us into Section 3, which is largely a review of the basic elements of DA, a subject taught in the physical sciences but rarely in statistics. We describe a key tool, quantity calculus (the algebra of units of measurement), along with DH.

In Section 4, we discuss problems that might arise when statisticians transform variables. Sometimes the transformation leads to an unintended change of scale, e.g. when a Gaussian distribution on (−∞, ∞) is adopted as an approximation to a distribution on (0, ∞). This can matter a lot when populations with responses on a ratio-scale are being compared. We discuss when such an approximation, and hence transformation, may be justified. Even using the famous family of Box–Cox transformations can cause problems, in particular with its limiting case, the logarithmic transformation.

Having investigated units and scales, we then turn to the models themselves. It turns out that when restricted by the need for DH, the class of models is also restricted; that topic is explored in Section 5 where we review the famous Buckingham Pi-theorem. We also see for the first time the ratio-scale’s cousin, the interval-scale. All this points to the need for a general approach to scientific modelling that was foreseen in Hunt and Stein’s unpublished work on the famous invariance principle.

Section 6 gets us to the major contribution of the paper, namely extending the work of recent vintage by statistical scientists (Shen and his co-investigators) on the invariance principle as developed in its classical setting, the frequentist paradigm, and applied principally to variables on a ratio-scale. Our work constitutes a major extension of that scientific modelling paradigm.

In particular, Section 6.5 extends the invariance principle and moves our new approach to scientific modelling to the Bayesian paradigm. This major extension of both the scientific and statistical modelling approaches allows for quantities that could represent uncertain parameters, thereby embedding uncertainty quantification, including model uncertainty itself, directly into the modelling paradigm.

The paper wraps up with discussion in Section 7 and concluding remarks in Section 8. The supplementary material includes additional discussion, in particular a summary of the controversy of ancient vintage about whether or not taking the logarithm of measurements with units is valid, how Buckingham’s theory leads us to the famous Reynolds number, a general theory for handling data on an interval-scale, and finally a brief review of statistical decision analysis for Section 6.5.

2 The unconscious statistician

We start by critically examining key issues surrounding the topics of dimension and measurement scales through the Unconscious Statistician. We present three examples that illustrate some of the issues we’ll be exploring.

Example 2.

Ignoring scale and units of measurement when creating models can lead to difficulties; we cannot ignore the distinction between numbers and measurements. Consider the Poisson random variable X. The claim is often made that the expected value and variance of X are equal. But if X has units, as it did in the distribution’s famous application in 1898 to the number of horse kick deaths in a year in the Prussian army (Härdle and Vogt, 2015), then clearly the expectation (deaths) and the variance (deaths²) will have different units and therefore cannot be equated.

Example 3.

Consider a random variable Y representing length in millimetres, normally distributed with mean μ and variance σ², and independently measured n times to yield data y_1, …, y_n. Assume, as is common, that μ is so large that there is a negligible chance that any of the y_i’s are negative (we return to this common assumption in Section 4).

Then the maximum likelihood estimate (MLE) of μ is easily shown to be the sample average ȳ, and the MLE of σ² is then the maximizer of the profile likelihood

L(σ²) = (σ²)^{−n/2} exp{−n σ̃²/(2σ²)}   (2.1)

where σ̃² = Σ_{i=1}^n (y_i − ȳ)²/n, which has units of mm². That estimate for σ² is the MLE of σ², which is easily shown by differentiating L(σ²) with respect to σ² and setting the result equal to zero. We note that, by any sensible definition of unit arithmetic, σ̃²/σ² is unitless and so the units of L(σ²) are mm^{−n}.
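As a numerical check on this claim, the following sketch (with illustrative simulated data, not from the paper) maximizes the profile likelihood (2.1) over a grid of numeric values {σ²} and confirms that the maximizer is σ̃²:

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(100.0, 7.0, size=50)        # n = 50 lengths, in mm (illustrative)
n, ybar = y.size, y.mean()
sig2_tilde = np.sum((y - ybar) ** 2) / n   # numeric part; its units are mm^2

def profile_L(sig2):
    """Profile likelihood of Equation (2.1), applied to the numeric part {sigma^2}."""
    return sig2 ** (-n / 2) * np.exp(-n * sig2_tilde / (2 * sig2))

grid = np.linspace(10.0, 200.0, 200001)    # candidate {sigma^2} values
sig2_hat = grid[np.argmax(profile_L(grid))]
assert abs(sig2_hat - sig2_tilde) < 1e-2   # maximizer agrees with sigma-tilde^2
```

Of course the grid search only manipulates the numeric parts; the units mm² must be reattached to σ̂² by hand, which is precisely the bookkeeping the Unconscious Statistician neglects.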

The Unconscious Statistician simplifies the maximization of L by maximizing, instead, its logarithm, believing that this alternative approach yields the same result. The statistician finds the log of L to be

l(σ²) = ln[(σ²)^{−n/2} exp{−n σ̃²/(2σ²)}] = −(n/2)[ln(σ²) + σ̃²/σ²].

Since the second term is unitless, dimensional homogeneity implies that the first term ln(σ²) must also be unitless. So where did the units go? Analyses in Subsection 4.4 suggest the units reduce to a unitless 1 by the constructive processes that define the logarithm. The result is ln(σ²) = ln({σ²}), the curly brackets demarcating the numerical part of σ², obtained by dropping the units of measurement. But σ² itself has units mm², and it seems unsettling to have them disappear simply by taking the logarithm.

However, the Unconscious Statistician ultimately gets the correct answer by failing to recognize the distinction between the scales of {σ²} and σ². So the derivative, which represents the relative rate of change between quantities on different scales, is computed as d ln({σ²})/dσ² = mm^{−2} · d ln({σ²})/d{σ²} rather than d ln({σ²})/d{σ²}. This then restores the missing units in the final result. As a fringe benefit, the second derivative d² ln({σ²})/d(σ²)², which (through its negative expectation) defines Fisher’s information, also turns out to have the appropriate units.

However, the story does not end there. The problem of logarithms and their units warrants further discussion such as that in Subsection 4.4. That discussion indicates that calculating the logarithm of the likelihood is, in general, not sensible.

Remark 1.

In the frequentist paradigm for statistical modelling, the likelihood is defined by the sampling distribution, which depends on the stopping rule employed in collecting the sample. The likelihood function then becomes an equivalence class. The likelihood ratio can then be used to specify a member of that class. In Example 3 a reference normal likelihood could be used with σ² set to a substantively meaningful σ₀². The MLE of μ and σ² would then maximize this relative likelihood. This leads again to μ̂ = ȳ, but now the MLE of σ² is found by maximizing the unitless L(σ²)/L(σ₀²):

L(σ²)/L(σ₀²) = (σ²/σ₀²)^{−n/2} exp{ −(n σ̃²/(2σ₀²)) [(σ₀²/σ²) − 1] }.

We can now maximize this ratio as a function of the unitless t = σ²/σ₀², by taking logarithms, differentiating with respect to t, setting the result equal to 0 and solving for t̂ = σ̃²/σ₀², and so finding that σ̂² = σ̃², in units of mm².
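The same maximization can be checked numerically in terms of the unitless t alone. In this sketch the values σ̃² = 44.2 mm² and σ₀² = 25 mm² are assumed purely for illustration:

```python
import numpy as np

n = 50
sig2_tilde, sig2_0 = 44.2, 25.0   # illustrative values, both in mm^2

def log_ratio(t):
    """Log of the unitless L(sigma^2)/L(sigma_0^2) as a function of t = sigma^2/sigma_0^2."""
    return -(n / 2) * np.log(t) - (n * sig2_tilde / (2 * sig2_0)) * (1 / t - 1)

grid = np.linspace(0.1, 10.0, 1000001)
t_hat = grid[np.argmax(log_ratio(grid))]
assert abs(t_hat - sig2_tilde / sig2_0) < 1e-4   # t-hat = sigma-tilde^2 / sigma_0^2
sig2_hat = t_hat * sig2_0                        # restore units via sigma_0^2, in mm^2
```

Here the logarithm is applied only to the unitless ratio, so no units go missing; they are restored at the end by multiplying t̂ by σ₀².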

Two complementary, unconscious choices in Example 3 lead ultimately to a correct MLE. Things don’t go so well for the two unconscious statisticians seen in the next example.

Example 4.

Here, the data are assumed to follow the model that relates Y_i, a length, to t_i, a time:

Y_i = 1 + θ t_i + ε_i,   i = 1, …, 2n.

Here the ε_i’s are independent and identically distributed as N(0, σ²) for a known σ. Suppose that t_1 = ⋯ = t_n = 1 hour while t_{n+1} = ⋯ = t_{2n} = 2 hours. Let Ȳ_1 = Σ_{i=1}^n Y_i/n and Ȳ_2 = Σ_{i=n+1}^{2n} Y_i/n. An analysis might go as follows when two statisticians A and B get involved.

First they both compute the likelihood and learn that the MLE is found by minimizing the sum of squared residuals SSR(θ):

SSR(θ) = Σ_{i=1}^{2n} [Y_i − 1 − θ t_i]²

which gives the MLE of θ,

θ̂ = Σ_{i=1}^{2n} t_i (Y_i − 1) / Σ_{i=1}^{2n} t_i² = (n Ȳ_1 + 2n Ȳ_2 − 3n)/(5n) = (Ȳ_1 + 2Ȳ_2 − 3)/5.

Then for prediction at time t = 1 hour, they get

Ŷ = 1 + θ̂ × 1 = 1 + (Ȳ_1 + 2Ȳ_2 − 3)/5.

Suppose that Ȳ_1 = 1 foot, or 12 inches, and Ȳ_2 = 3 feet, or 36 inches. Statistician A uses feet and predicts Y at time t = 1 hour to be

Ŷ_A = 1 + (1 + 2×3 − 3)/5 = 1.8 feet = 21.6 inches.

But Statistician B uses inches and predicts Y at t = 1 hour to be

Ŷ_B = 1 + (12 + 2×36 − 3)/5 = 17.2 inches.

What has gone wrong here? The problem is that the stated model implicitly depends on the units of measurement. For instance, the numerical value of the expectation of Y_i when t_i = 0 is equal to 1, no matter what the units of Y_i. When t_i = 0, Statistician A expects Y_i to equal 1 foot and Statistician B expects Y_i to equal 1 inch. We can see that the problem arises because the equation defining the model does not satisfy DH, since the “1” is unitless. In technical terms, we would say that this model is not invariant under scalar transformations. Invariance is important when defining a model that involves units. However, one could simply avoid the whole problem of units in model formulation by constructing the relationship between Y_i and t_i so that there are no units. This is exactly the goal of the Buckingham Pi-theorem, presented in Subsection 5.1.
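The discrepancy is easy to reproduce; this sketch simply re-runs the two statisticians’ arithmetic:

```python
# The non-DH model Y = 1 + theta*t gives unit-dependent predictions
# because the "1" in the model is unitless.

def predict(ybar1, ybar2, t=1.0):
    """MLE prediction 1 + theta_hat * t, with theta_hat = (Ybar1 + 2*Ybar2 - 3)/5."""
    theta_hat = (ybar1 + 2.0 * ybar2 - 3.0) / 5.0
    return 1.0 + theta_hat * t

y_feet = predict(1.0, 3.0)        # Statistician A works in feet
y_inches = predict(12.0, 36.0)    # Statistician B works in inches

assert abs(y_feet - 1.8) < 1e-12      # 1.8 feet = 21.6 inches
assert abs(y_inches - 17.2) < 1e-12   # not 21.6 inches: the answers disagree
```

The identical formula, fed the identical physical data in two unit systems, returns physically different predictions, which is the failure of invariance described above.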

3 Dimensional analysis

Key to unifying the work on scales of measurement and the statistical invariance principle is DA. DA has a long history, beginning with the discussion of dimension and measurement (Fourier, 1822). Since DA is key to the description of a natural phenomenon, it lies at the root of physical modelling. A phenomenon’s description begins with the phenomenon’s features, each of which has a dimension, e.g. ‘mass’ (M) in physics or ‘utility’ (U) in economics. Each dimension is assigned a scale, e.g. ‘categorical’, ‘ordinal’, ‘ratio’, or ‘interval’, a choice that might be dictated by practical as well as intrinsic considerations. Once the scales are chosen, each feature is mapped into a point on its scale. For a quantitative scale, the mapping will be made by measurement or counting; for a qualitative scale, by assignment of classification. Units of measurement may be assigned as appropriate for quantitative scales, depending on the metric chosen. For example, temperature might be measured on the Fahrenheit scale, the Kelvin scale or the Celsius scale. This paper will be restricted to quantitative features, more specifically those on ratio- and interval-scales.

3.1 Foundations

One tenet of working with measured quantities is that units in an expression or equation must “match up”; relationships among measurable quantities require dimensional homogeneity. To check the validity of comparative statements about two quantities, X_1 and X_2, such as X_1 = X_2, X_1 < X_2 or X_1 > X_2, X_1 and X_2 must have the same dimension, such as time. To add X_1 to X_2, the two must not only have the same dimension but must also be on the same scale and expressed in the same units of measurement.

To discuss this explicitly, we use a standard notation (Joint Committee on Guides in Metrology, 2012) and write a measured quantity X as X = {X}[X], where {X} is the numerical part of X. [X] may be read as the dimension of X, e.g. [X] = L for length, or the units of X on the chosen scale of measurement, e.g. [X] = cm. The latter by its nature means that the dimension is L. If [X] = [1], then we say that X is unitless or dimensionless. We define 1, or any number, to be unitless, i.e., 1 = {1}[1], unless stated otherwise.

To develop an algebra for measured quantities, for a function f we must say what we mean by {f(X)} (usually easy) and [f(X)] (sometimes challenging). The path is clear for f a simple function. For example, consider f(X) = X². Clearly we must have X² = {X}²[X]², yielding, say, (3 inches)² = 9 inches². But what if f is a more complex function? This issue will be discussed in general in Subsection 4.2 and in detail for f(x) = ln(x) in Subsection 4.4.

For simple functions, the manipulation of both numbers and units is governed by an algebra of rules referred to as quantity calculus (Taylor, 2018). This set of rules states that x and y

  • can be added, subtracted or compared if and only if [x] = [y];

  • can always be multiplied to get xy = {xy}[xy] where {xy} = {x}{y} and [xy] = [x][y];

  • can always be divided when {x} ≠ 0 to get y/x = {y/x}[y/x] where {y/x} = {y}/{x} and [y/x] = [y]/[x];

and that

  • x can be raised to a power that is a rational fraction γ, provided that the result is not an imaginary number, to get x^γ = {x}^γ[x]^γ.

Thus it makes sense to square-root transform ozone O₃ = {O₃} parts per million (ppm) as {O₃}^{1/2} ppm^{1/2}, since ozone is measured on a ratio-scale with a true origin of 0 and hence must be non-negative (Dou, Le and Zidek, 2007). These rules can be applied iteratively a finite number of times to get expressions that are combinations of products of quantities raised to powers, along with sums and rational functions of such expressions.
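These rules are mechanical enough to encode. The following minimal Python sketch (our illustration, not a full units library) enforces the addition rule and implements the product and power rules of quantity calculus:

```python
from fractions import Fraction

class Quantity:
    """A number {x} together with unit exponents [x], e.g. {"mm": 2} for mm^2."""
    def __init__(self, value, units=None):
        self.value = value
        self.units = {u: Fraction(p) for u, p in (units or {}).items() if p != 0}

    def __add__(self, other):          # allowed iff [x] = [y]
        if self.units != other.units:
            raise ValueError("cannot add quantities with different units")
        return Quantity(self.value + other.value, self.units)

    def __mul__(self, other):          # {xy} = {x}{y}, [xy] = [x][y]
        units = dict(self.units)
        for u, p in other.units.items():
            units[u] = units.get(u, Fraction(0)) + p
        return Quantity(self.value * other.value, units)

    def __pow__(self, gamma):          # x^gamma = {x}^gamma [x]^gamma, gamma rational
        g = Fraction(gamma)
        return Quantity(self.value ** float(g), {u: p * g for u, p in self.units.items()})

ozone = Quantity(0.04, {"ppm": 1})     # an ozone reading, in ppm (illustrative value)
root = ozone ** Fraction(1, 2)         # {O3}^(1/2) ppm^(1/2), as in the text
assert root.units == {"ppm": Fraction(1, 2)}
```

Multiplying a quantity by one with reciprocal units cancels the exponents to zero, mirroring the division rule; a real analysis would of course use a mature units library rather than this sketch.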

This subsection concludes with an example that demonstrates the use of DH  and quantity calculus.

Example 5.

This example concerns a structural engineering model for lumber strength now called the “Canadian model” (Foschi and Yao, 1986). Here α(t) is dimensionless and represents the somewhat abstract quantity of the damage accumulated to a piece of lumber by time t. When α(t) = 1, the piece of lumber breaks. This is the only time when α(t) is observed. The Canadian model posits that α̇, the derivative of α with respect to time, satisfies

α̇(t) = a[τ(t) − σ₀τ_s]₊^b + c[τ(t) − σ₀τ_s]₊^n α(t),   (3.1)

where a, b, c, n and σ₀ are log-normally distributed random effects for an individual specimen of lumber; τ(t), measured in pounds per square inch (psi), is the stress applied to the specimen cumulative to time t; τ_s (in psi) is the specimen’s short term breaking strength if it had experienced the stress pattern τ(t) = kt for a fixed known k (in psi per unit of time); and σ₀ is the unitless stress ratio threshold. The expression [t]₊ is equal to t if t is non-negative and is equal to 0 otherwise. Let T_F denote the random time to failure for the specimen, under the specified stress history curve, meaning α(T_F) = 1.

As has been noted (Köhler and Svensson, 2002; Hoffmeyer and Sørensen, 2007; Zhai et al., 2012a; Wong and Zidek, 2018), this model is not dimensionally homogeneous. In particular, the units associated with both terms on the right hand side of Equation (3.1) involve random powers, b and n, leading to random units, respectively (psi)^b and (psi)^n. As noted by Wong and Zidek (2018), the coefficients a and c in Equation (3.1) cannot involve these random powers and so cannot compensate to make the model dimensionally homogeneous.

Rescaling is a formal way of addressing this problem. Zhai et al. (2012a) rescale by setting π(t) = τ(t)/τ_s. They let μ denote the population mean of τ_s and write a modified version of Equation (3.1) as the dimensionally homogeneous model

μα̇(t) = a*[π(t) − σ₀]₊^b + c*[π(t) − σ₀]₊^n α(t).

In contrast, Wong and Zidek (2018) propose another dimensionally homogeneous model

μα̇(t) = [(ã τ_s)(π(t) − σ₀)₊]^b + [(c̃ τ_s)(π(t) − σ₀)₊]^n α(t),

where ã and c̃ are now random effects with dimensions Force^{−1} · Length².

We see that there may be several ways to nondimensionalize a model. Another method, widely used in the physical sciences, involves always normalizing by the standard units specified by the Système International d’Unités (SIU), units such as metres or kilograms. So when the dimensions of a non-negative quantity X like absolute temperature have an associated SIU of Q₀ = {1}[Q₀], X can be converted to a unitless quantity by first expressing X in SIUs and then by using quantity calculus to rescale it as X/Q₀. The next example provides an important illustration of the application of the standardized unit approach.

Example 6.

Liquids contain both hydrogen and hydroxide ions. In pure water these ions appear in equal numbers. But the water becomes acidic when there are more hydrogen ions and basic when there are more hydroxide ions. Thus acidity is measured by the concentration of these ions. The customary measurement is in terms of the hydrogen ion concentration, denoted H⁺ and measured in the SIU of one mole of ions per litre of liquid. These units are denoted c^o and thus, in our notation, [H⁺] = c^o. However, for substantive reasons, the pH index for the acidity of a liquid is now used to characterize acidity. The index is defined by pH = −log₁₀(H⁺/c^o). Distilled water has a pH = 7 while lemon juice has a pH level of about 3. Note that {H⁺} ∈ (0, ∞) lies on a ratio-scale while pH lies on an interval-scale (−∞, ∞) – the transformation has changed the scale of measurement.
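The pH computation illustrates the standardized-unit approach: the argument of the logarithm is made unitless by first dividing by c^o. A small sketch using the well-known values quoted above:

```python
import math

c_std = 1.0        # the standard concentration c^o: 1 mole of ions per litre
h_water = 1.0e-7   # hydrogen ion concentration of pure water, in mol/L
h_lemon = 1.0e-3   # roughly that of lemon juice, in mol/L

def pH(h_plus):
    """pH = -log10(H+ / c^o); the ratio is unitless, so the logarithm is legitimate."""
    return -math.log10(h_plus / c_std)

assert abs(pH(h_water) - 7.0) < 1e-9   # distilled water: pH 7
assert abs(pH(h_lemon) - 3.0) < 1e-9   # lemon juice: pH about 3
```

Note that the function maps the ratio-scale input (0, ∞) onto the interval-scale output (−∞, ∞), exactly the change of scale remarked on in the example.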

3.2 The problem of scales

The choice of scale restricts the choice of units of measurement, and these units dictate the type of model that may be used. Note that comparing the size of two quantities on a ratio-scale must be done using their ratio, not their difference, whereas the opposite is true on an interval-scale, where differences must be used.

Thus we need to study scales in the context of model building and hence in the context of quantity calculus. In his celebrated paper, Stevens (1946) starts by proposing four major scales for measurements or observations: categorical, ordinal, interval and ratio. This taxonomy is based on the notion of permissible transformations, as is the work in our Section 5. However, our work is aimed at modelling while Stevens’s work is aimed at statistical analysis. Stevens defines the permissible transformations as follows: arbitrary one-to-one transformations (permutations) for data on the categorical scale, strictly increasing transformations for data on the ordinal scale, linear transformations (f(x) = ax + b) for data on the interval-scale, and scalar transformations (f(x) = ax) for data on the ratio-scale.

Stevens created his taxonomy as a basis for classifying the family of all statistical procedures for their applicability in any given situation (Stevens, 1951). For instance, Luce (1959) points out that, for measurements made on a ratio-scale, the geometric mean would be appropriate for estimating the central tendency of a population distribution (Velleman and Wilkinson, 1993). In contrast, when measurements are made on an interval-scale the arithmetic mean would be appropriate. The work of Stevens seems to be well-accepted in the social sciences, with Ward (2017) calling his work monumental. But Stevens's work is not widely recognized in statistics. Velleman and Wilkinson (1993) review the work of Stevens with an eye on potential applications in the then emerging area in statistics of artificial intelligence (AI), hoping to automate data analysis. They claim that "Unfortunately, the use of Stevens's categories in selecting or recommending statistical analysis methods is inappropriate and can often be wrong." They describe alternative scale taxonomies for statistics that have been proposed, notably by Mosteller and Tukey (1977). A common concern centers on the inadequacy of an automaton to select the statistical method for an AI application. Even the choice of scale itself will depend on the nature of the inquiry and thus is something to be determined by humans. For example, length might be observed on the relatively uninformative ordinal scale $\{\text{short}, \text{medium}, \text{long}\}$, were it sufficient for the intended goal of a scientific inquiry, rather than on the seemingly more natural ratio-scale $(0,\infty)$.
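Luce's point about central tendency can be illustrated with a small Python sketch (the data are hypothetical). On a ratio-scale the geometric mean respects the multiplicative structure of the data, while the arithmetic mean reflects the additive structure of an interval-scale:

```python
import math

def arithmetic_mean(xs):
    """Central tendency suited to an interval-scale."""
    return sum(xs) / len(xs)

def geometric_mean(xs):
    """Central tendency suited to a ratio-scale; defined for positive values."""
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

# Hypothetical ratio-scale data spanning two orders of magnitude:
data = [1.0, 10.0, 100.0]
print(arithmetic_mean(data))  # 37.0, dominated by the largest value
print(geometric_mean(data))   # ~10, the multiplicative centre of the data
```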

3.3 Why the origin matters

The interval-scale of real numbers allows for the taking of sums, differences, products, ratios, and integer powers of values observed on that scale. Rational powers of nonnegative values are also allowed although irrational powers lead into the domain of transcendental functions and difficulties of interpretation. The same operations are allowed for a ratio-scale of real numbers provided that the differences are non-negative. So superficially, these two scales seem nearly the same.

But there is a substantial qualitative difference between ratio- and interval-scales, so ignoring the importance of scale when building models can result in challenges in interpretation. The issue has to do with the meaning of the 0 on a ratio-scale. The next hypothetical example illustrates the point.

Example 7.

When the material in the storage cabinet at a manufacturing facility has been depleted, the amount left is 0. To understand the usefulness of this origin, consider if the facility's inventory monitoring program recorded a drop of $100\,kg$ during the past month. Without the knowledge of the origin, of where the amount of inventory lies on the scale, the implications of this drop are unclear. If the amount left in the facility is $99{,}900\,kg$ the drop means one thing, while if the amount left is $50\,kg$, the interpretation would be completely different.

Since the amount of inventory lies on the ratio-scale, these changes should instead be reported using ratios. The recorder in the facility in the first case would report a decline in inventory of $100/100{,}000 = 0.001$ or $0.1\%$. In the second case, the recorder would report a decline of $100/150 = 2/3$ or $66.7\%$, the same drop but with a totally different meaning. This example explains why stock price changes are reported on a ratio-scale, as a percentage, and not on an interval-scale.
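A minimal Python sketch of this reporting convention, using the hypothetical inventory figures above:

```python
def percent_decline(before, after):
    """Report a drop on the ratio-scale: relative to the amount before,
    not as an absolute difference."""
    return 100.0 * (before - after) / before

# The same 100 kg drop at the two hypothetical facilities:
print(percent_decline(100_000, 99_900))  # 0.1 (%)
print(percent_decline(150, 50))          # ~66.7 (%)
```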

4 Transforming quantities

The scale of the measurement $X$ may be transformed in a variety of ways. No change of scale occurs when the transformation is a rescaling, where we know how to transform both the numerical part of $X$ and $X$'s units of measurement. When the transformation is complex, the scale itself might change. For instance, if $X$ is measured on a ratio-scale, then the logarithm of $X$ will be on an interval-scale. Observe that in Example 6, the units of measurement in $H^{+}$ were eliminated before transforming by the transcendental function $\log_{10}$. That raises the question: do we need to eliminate units before applying the logarithm? This question and the logarithmic transformation in science have led to vigorous debate for over six decades (Matta et al., 2010). We highlight and resolve some of that debate below in Section 4.4.

However, we begin with an even simpler situation seen in the next subsection, where we study the issues that may arise when interval-scales are superimposed on ratio-scales.

4.1 Switching scales

This subsection concerns a perhaps unconscious switch in a statistical analysis from a ratio-scale, which lies on $[0,\infty)$, to an interval-scale, which lies on $(-\infty,\infty)$.

The bell curve approximation.

Despite the fundamental difference between ratio- and interval-scales, the normal approximation is often used to approximate the sampling distribution for a ratio-valued response quantity. This in effect replaces the ratio-scale with an interval-scale. In this situation, what should be used is the truncated normal distribution approximation, although this introduces undesired complexity. For example, if the approximation for the cumulative distribution function (CDF) were $P(X \leq x \mid X > 0)$ where $X \sim N(\mu, \sigma^{2})$, we would have

E(Z \mid X > 0) = \frac{\phi(\mu/\sigma)}{\Phi(\mu/\sigma)} \qquad (4.1)

where $Z = (X - \mu)/\sigma$, while $\phi$ and $\Phi$ denote, respectively, the standardized Gaussian distribution's probability density function and CDF. Observe that Equation (4.1) is invariant under changes of the units in which $X$ is measured, as it should be. Furthermore, the expectation of $Z$ would be approximately $0$ if $\mu/\sigma$ were large compared to $0$, as it would be were the non-truncated Gaussian distribution for an interval-scale imposed on this ratio-scale. This would occur if the mean $\mu$ were much larger than the standard deviation $\sigma$. That suggests the bell curve approximation would not work well were the population under investigation widely dispersed. For example, it might be satisfactory if $X$ represented the height of a randomly selected adult woman, but not if it were the height of a randomly selected human female.
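Equation (4.1) can be checked numerically. The Python sketch below (standard library only; the parameter values are illustrative) compares the closed form with a Monte Carlo estimate, and shows that the truncation correction effectively vanishes when $\mu/\sigma$ is large:

```python
import math
import random

def phi(t):
    """Standard Gaussian probability density function."""
    return math.exp(-t * t / 2) / math.sqrt(2 * math.pi)

def Phi(t):
    """Standard Gaussian CDF, computed via the error function."""
    return 0.5 * (1 + math.erf(t / math.sqrt(2)))

def truncated_mean(mu, sigma):
    """E(Z | X > 0) of Equation (4.1), X ~ N(mu, sigma^2), Z = (X - mu)/sigma."""
    return phi(mu / sigma) / Phi(mu / sigma)

def monte_carlo(mu, sigma, n=200_000, seed=1):
    """Monte Carlo estimate of E(Z | X > 0), for comparison."""
    rng = random.Random(seed)
    zs = [(x - mu) / sigma
          for x in (rng.gauss(mu, sigma) for _ in range(n)) if x > 0]
    return sum(zs) / len(zs)

print(truncated_mean(1.0, 2.0))    # mu/sigma = 0.5: noticeably above 0
print(truncated_mean(160.0, 7.0))  # mu/sigma large: essentially 0
```

With $\mu/\sigma = 0.5$ the truncation matters; with $\mu/\sigma \approx 23$ (roughly, adult heights in cm) the non-truncated bell curve is effectively harmless, matching the discussion above.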

As mentioned at the beginning of this section, this switch occurs when approximating a distribution. This switch is ubiquitous and seen in most elementary statistics textbooks. The assumed Gaussian distribution model leads to the sample average as a measurement of the population average instead of the geometric mean, which should have been used (Luce, 1959). That same switch is made in such things as regression analysis and the design of experiments. The seductive simplicity has also led to the widespread use of the Gaussian process in spatial statistics and machine learning.

The justification of the widespread use of the Gaussian approximation may well lie in the belief that the natural origin 0 of the ratio-scale lies well below the range of values of $X$ likely to be found in a scientific study. This may well be the explanation of the reliance on interval-scales for temperature in Celsius and Fahrenheit, on planet Earth at least, since one would not expect to see temperatures anywhere near the true origin of temperature, $0^{\circ}K$ on the Kelvin scale (ratio), which corresponds to $-273^{\circ}C$ on the Celsius scale (interval). We would note in passing that these two interval-scales for temperature also illustrate the statistical invariance principle (see Subsection 6.4); each scale is an affine transformation of the other.

We illustrate next, in a hypothetical setting where measurements are made on a ratio-scale, how the switch to an interval-scale can be made explicit through a simple approximation.

Example 8.

The justification above for the switch from a ratio- to an interval-scale can be turned into a simple approximation that may help with the interpretation of the data. To elaborate, suppose interest lies in comparing two values of $X$, $x_{1}$ and $x_{2}$, that lie in a ratio-scale with $0 < a < x_{1} < x_{2}$ for a known $a$. Interest lies in the relative size of these quantities, i.e. on $r = x_{2}/x_{1}$. An approximation to $r$ through the first order Taylor expansion of $x_{2}/x_{1}$ at $(x_{1}, x_{2}) = (a, a)$ yields $r \approx 1 + (x_{2} - x_{1})/a$, thus providing an approximation to $r$ on an interval-scale. For instance, with $a = 120\,cm$, $x_{1} = 150\,cm$, and $x_{2} = 180\,cm$, the ratio is $r = 1.20$ and the approximation, $1.25$. Both are unitless. This points to the potential value of rescaled ratio data when a Gaussian approximation is to be used for the sampling distribution of a quantity on a ratio-scale.
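A Python sketch of the computation in Example 8, using the same illustrative heights (the units cancel, so only numerical values appear):

```python
def ratio_exact(x1, x2):
    """The relative size r = x2/x1 on the ratio-scale."""
    return x2 / x1

def ratio_approx(x1, x2, a):
    """First order Taylor approximation of x2/x1 about (a, a),
    which lives on an interval-scale."""
    return 1 + (x2 - x1) / a

# Heights in cm, as in the example:
print(ratio_exact(150, 180))        # 1.2
print(ratio_approx(150, 180, 120))  # 1.25
```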

4.2 Algebraic versus transcendental functions

A function $u$, which describes the relationship among quantities $X_{1},\dots,X_{p}$ as

u(X_{1},\dots,X_{p}) = 0,

may be a sequence of transformations or operations involving the $X_{i}$'s, possibly combined with parameters. We know how to calculate the resulting units of measurement when $u$ consists of a finite sequence of permissible algebraic operations. The function consisting of the concatenation of such a sequence may formally be defined as a root of a polynomial equation that must satisfy the requirement of dimensional homogeneity (other desirable properties of $u$ along with methods for determining an allowable $u$ are discussed in Section 5). Such a function is called algebraic.

But $u$ may also involve non-algebraic operations leading to non-algebraic functions called transcendental (because they "transcend" an algebraic construction). Examples in the univariate case ($p = 1$) are $\sin(X)$ and $\cosh(X)$ and, for a given nonnegative constant $\alpha$, $\alpha^{X}$ and $\log_{\alpha}(X)$. The formal definition of a non-algebraic function does not explicitly say whether or not such a function can be applied to quantities with units of measurement. Bridgman (1931) sidesteps this issue by arguing that it is moot, since valid representations of natural phenomena can always be nondimensionalized (see Subsection 5.1). But the current Wikipedia entry on the subject states "transcendental functions are notable because they make sense only when their argument is dimensionless" (Wikipedia, 2020). The next subsection explores the much used Box-Cox family of transformations (Box and Cox, 1964), which includes transcendental functions.

4.3 The Box-Cox transformation

Frequently in statistical modelling, a transformation is used to extend the domain of applicability of a procedure that assumes normally distributed measurements (De Oliveira, Kedem and Short, 1997). That transformation may also be seen as a formal part of statistical model building that facilitates maximum likelihood estimation of a single parameter (Draper and Cox, 1969). The Box-Cox (BC) transformations constitute an important class of such transformations and are therefore a standard tool in the statistical scientist's toolbox.

In its simplest form, a member of this family of transformations has the form of a function $bc(X) = X^{\lambda}$ for a real-valued parameter $\lambda \in (-\infty,\infty)$. Here $X$ would need to lie in $[0,\infty)$, a ratio-scale, to avoid potential imaginary numbers. However, in practice interval-scales are sometimes allowed, a positive constant being added to avoid negative realizations of $X$. This ad hoc procedure thus validates the use of a Gaussian distribution to approximate the sampling distribution for $X$.

Since $X$ is measured on a ratio-scale, for any two points on that scale, $bc(x_{2}/x_{1}) = bc(x_{2})/bc(x_{1})$, while the scale is equivariant under multiplicative transformations, i.e., $bc(ax) = a^{\lambda}\, bc(x)$ for any point on that scale. Finally $bc(X) > 0$, so that the result of the transformation also lies on a ratio-scale, even when its intended goal is an approximately Gaussian distribution for the (transformed) response.

Box and Cox (1964) actually state their transformation as

bc_{\lambda}(X) = \frac{X^{\lambda} - 1}{\lambda}, \quad (\lambda \neq 0), \qquad (4.2)

that moves the origin of the ratio-scale from $0$ to $-1$. It is readily seen that unless $\lambda$ is a rational number, $bc_{\lambda}$, $\lambda \neq 0$, will be transcendental. That fact would be inconsequential in practice, inasmuch as a modeller would only ever use a rational number for $\lambda$. Or at least that would be the case except that the BC class has been extended to include $\lambda = 0$ by admitting for membership $\lim_{\lambda \rightarrow 0} bc_{\lambda}(X) = \ln(X)$, $X > 0$.
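A brief numerical sketch in Python of the family in Equation (4.2) and its $\lambda \rightarrow 0$ limit (the value of $x$ is illustrative and taken to be dimensionless, in keeping with the discussion that follows):

```python
import math

def box_cox(x, lam):
    """Box-Cox transform of Equation (4.2), with the lambda = 0 member
    defined as the limiting value ln(x)."""
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1) / lam

x = 3.7  # an illustrative dimensionless positive value
for lam in (1.0, 0.1, 0.001):
    print(lam, box_cox(x, lam))  # approaches ln(3.7) as lam shrinks
print(0, box_cox(x, 0))
```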

On closer inspection, we see that for validity, in Equation (4.2), the $1$ needs to be replaced by $(1[X])^{\lambda}$ to include units of measurement. Then for $\lambda \neq 0$ the transformation becomes

X_{\lambda} \doteq bc(X) = \frac{X^{\lambda} - 1}{\lambda} = \frac{\{X\}^{\lambda} - 1^{\lambda}}{\lambda}[X]^{\lambda}. \qquad (4.3)

As λ0\lambda\rightarrow 0, the only tenable limit would seem to be

X_{0} = \ln(\{X\})

not $\ln(X)$. In other words, in taking logarithms in the above example, the authors may have unconsciously nondimensionalized the measurements. Taking antilogarithms would then return $\{X\}$, not $X$.

Equation (4.3) thus tells us the Box-Cox transformation may have the unintended consequence of transforming not only the numerical values of the measurements but also their units of measurement, which become $[X]^{\lambda}$. This makes a statistical model difficult to interpret. For example, imagine the challenge of a model with a random response in units of $mm^{1/100}$. And since the transformation is nonlinear, returning to the original scales of the data would be difficult. For example, unbiased estimates would become biased and their standard errors could be hard to approximate.

Remark 2.

Box and Cox (1964) do not discuss the issue of scales in relation to the transformation they introduce. In that paper's second example the logarithmic transformation is applied to $X$, the number of cycles to failure, which may be regarded as unitless.

Remark 3.

The mathematical foundation of the Box-Cox family is quite complicated. Observe that in Equation (4.2), if $\lambda$ is a rational number $m/n$ for some nonnegative integers $m$ and $n$, $bc_{\lambda}$ will be an algebraic function of $X$. So as $\lambda$ varies over its domain, $(-\infty,\infty)$, the function flips back and forth from an algebraic to a transcendental function. For any fixed $m$ and point $x$, as $n$ approaches infinity, the trajectory of

\{bc_{m/n}(x) : n = 1, 2, \dots\}

converges to $\ln x$, so the family now includes the logarithmic transformation as another transformation in the statistical analyst's toolbox, which is used when the response distribution appears to have a long right tail. Thus a transcendental function has been added to the family of algebraic transformations obtained when $\lambda$ is chosen to be a positive rational number. It does not seem to be known if all transcendental transformations lie in the closure of the class of algebraic functions under the topology of pointwise convergence. However, when this family is shifted from the domain of mathematical statistics in the human brain to that of computational statistics in the computer's processor, this complexity disappears. In the computational process, all functions are algebraic and neither the logarithm nor infinity exists.

The importance of the logarithmic transformation in statistical and scientific modelling, and issues that have arisen about its lack of units, leads next to a special subsection devoted to it.

4.4 The logarithm: a transcendental function

Does the logarithm have units?

We have argued (see Example 3) that the answer is "no." First consider applying the logarithm to a unitless quantity $x$. It is sensible to think that its value will have no units, and so we take this as fact. But what happens if we apply the logarithm to a quantity with units? For instance, is log(12 inches) = log(12) + log(inches)? This issue has been debated for decades across different scientific disciplines; we summarize recent debates in Appendix A.

We now discuss this issue in more detail and argue that the result must be a unitless quantity. We use the definition of the natural logarithm of $x$ as the area under the curve of the function $f(u) = 1/u$ (Molyneux, 1991). We follow the notation defined in Section 3.1 and, for clarification, we write "1" as $y \equiv 1[x]$ and $u = \{u\}[x]$. We then make the change of variables $v = u/y$ so that $v$ is unitless, and get

\ln(x) = \int_{y}^{x} \frac{1}{u}\, d(u) = \int_{1}^{x/y} \frac{1}{yv}\, d(yv) = \int_{1}^{x/y} \frac{1}{v}\, d(v), \qquad (4.4)

which is a unitless quantity, as claimed.

We now derive the more specific result, that $\ln(x) = \ln(\{x\})$. In other words, applying this transcendental function to a dimensional quantity $x$ simply causes the units to be lost. We show below, from first principles, that for $v$ unitless,

\frac{d\ln(v)}{dv} = \frac{1}{v}.

This implies that

\ln(w) = \int_{1}^{w} \frac{1}{v}\, dv,

which, combined with Equation (4.4), implies that $\ln(x) = \ln(\{x\})$.

To show that the derivative of $\ln(v)$ is $1/v$, we turn to the original definition of the natural logarithm as the inverse of another transcendental function, $\exp(v)$, at least if $v > 0$. In other words, $v = \exp(\ln v)$, $v > 0$. The chain rule now tells us that

1 = \frac{d\ln(v)}{dv} \exp(\ln v).

Thus

\frac{d\ln(v)}{dv} = \exp(-\ln v) = \frac{1}{v}

for any real $v > 0$.

Can we take the logarithm of a dimensional quantity with units?

We argue that the answer is "no." We reason along the lines of Molyneux (1991), who sensibly argues that, since $\ln x$ has no units even when $x$ has units, the result is meaningless. In other words, since the value of the function is unitless no matter what the argument, we should not take the logarithm of a dimensional quantity with units.

This view agrees with that of Meinsma (2019). He notes that the Shannon entropy of a probability density function $f$, which has units, is defined in terms of $\ln f(x)$. Shannon found this to be an objectionable feature of his entropy but rationalized its use nevertheless. But not Meinsma, who concludes "To me it still does feel right…."

To consider the ramifications of ignoring this in a statistical model, suppose that $z$ is some measure of particulate air pollution on the logarithmic scale, with $z = \ln x$ where $x$ is a measurement with units. This measurement appears as $\beta z$ in a scientific model of the impact of particulate air pollution on health (Cohen et al., 2004). In this model, even though $z$ is unitless, its numerical value depends on the numerical value of $x$, via $\{z\} = \ln\{x\}$. Thus the numerical value of $z$ depends on the units of measurement of $x$. But, since $z$ itself is unitless, we cannot adjust $\beta$ to reflect changes in the units of $x$. To make this point explicit, suppose that experimental data pointed to the value $\beta = 1{,}101{,}231.52$. We have no idea if air pollution was a serious health problem. Thus, we see the problem that arises with a model that involves the logarithm of a measurement with units. This property of the logarithm points to the need to nondimensionalize $x$ before applying the logarithmic transformation in scientific and statistical modelling, in keeping with the theories of Buckingham, Bridgman and Luce.
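The dependence of $\{z\} = \ln\{x\}$ on the units of $x$ is easy to exhibit. In the Python sketch below, the pollution reading and the coefficient $\beta$ are hypothetical; the same physical measurement expressed in two unit systems gives values of $z$ that differ by the constant $\ln(1000)$, a shift that $\beta$ cannot absorb:

```python
import math

x_ug_per_m3 = 35.0                # hypothetical particulate reading, ug/m^3
x_mg_per_m3 = x_ug_per_m3 / 1000  # the same physical measurement in mg/m^3

z1 = math.log(x_ug_per_m3)        # {z} depends on the units chosen for x ...
z2 = math.log(x_mg_per_m3)
print(z1 - z2)                    # ... the gap is ln(1000), an artifact of units

beta = 2.0                        # hypothetical model coefficient
print(beta * z1, beta * z2)       # the model term beta*z shifts with the units
```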

One of the major routes taken in debates about the validity of applying the natural logarithm to a dimensional quantity involves arguments based one way or another on a Taylor expansion. A key feature of these debates involves the claim that the expansion is impossible, since the terms in the expansion have different units and so cannot be summed (Mayumi and Giampietro, 2010). We show below that this claim is incorrect by showing that all of the terms in the expansion have no units (see Appendix A for more details).

Key to the Taylor expansion argument of validity is how to take the derivative of $\ln x$ when $x$ has units. Recall that above, we calculated the derivative of $\ln v$ for $v$ unitless. To define the derivative of $\ln x$ when $x$ has units, we proceed from first principles. Suppose we have a function $f$ with argument $x = \{x\}[x]$. We define the derivative of $f$ with respect to $x$ as follows. Let $\Delta = \{\Delta\}[\Delta]$ and $x = \{x\}[x]$, and suppose that $\Delta$ and $x$ have the same units, that is, that $[\Delta] = [x]$. Otherwise, we would not be able to add $x$ and $\Delta$ in what follows. Then we define

f^{\prime}(x) \equiv \lim_{\{\Delta\}\to 0} \frac{f(x+\Delta) - f(x)}{\Delta}
= \lim_{\{\Delta\}\to 0} \frac{f(\{x+\Delta\}[x+\Delta]) - f(\{x\}[x])}{\{\Delta\}[\Delta]}
= \lim_{\{\Delta\}\to 0} \frac{f(\{x+\Delta\}[x]) - f(\{x\}[x])}{\{\Delta\}[x]}.

For instance, for $f(x) = x^{2}$,

\frac{d}{dx}x^{2} = \lim_{\{\Delta\}\to 0} \frac{\{x+\Delta\}^{2}[x]^{2} - \{x\}^{2}[x]^{2}}{\{\Delta\}[x]} = \lim_{\{\Delta\}\to 0} \frac{\{x+\Delta\}^{2} - \{x\}^{2}}{\{\Delta\}} \times [x] = 2\{x\}[x] = 2x.

To use Equation (4.4) to differentiate $f(x) = \ln(x)$, we first write

\ln(x+\Delta) - \ln x = \ln\{x+\Delta\} - \ln\{x\}.

So

\frac{d}{dx}\ln x = \lim_{\{\Delta\}\to 0} \frac{\ln\{x+\Delta\} - \ln\{x\}}{\{\Delta\}[x]} = \frac{d\ln\{x\}}{d\{x\}} \times \frac{1}{[x]} = \frac{1}{\{x\}}\, \frac{1}{[x]} = \frac{1}{x}.

Using this definition of the derivative we can carry out a Taylor series expansion about $x = a > 0$ to obtain

\log(x) = \log(a) + \sum_{k=1}^{\infty} g^{(k)}(a) \frac{(x-a)^{k}}{k!}, \qquad (4.6)

where

g^{(k)}(a) = d^{k}\log(x)/dx^{k}\big|_{x=a}.

As $g^{\prime}(x) = 1/x$, the first term, $g^{\prime}(a)(x-a)$, in the infinite summation is unitless. Differentiating $g^{\prime}(x)$ yields $g^{\prime\prime}(x) = -1/x^{2}$ and once again, we see that the term $g^{\prime\prime}(a)(x-a)^{2}/2$ is unitless. Continuing in this way, we see that the summation on the right side of Equation (4.6) is unitless, and so the equation satisfies dimensional homogeneity. This reasoning differs from the incorrect reasoning of Mayumi and Giampietro (2010) in their argument that the logarithm cannot be applied to quantities with units because the terms in the Taylor expansion would have different units. Our reasoning also differs from that of Baiocchi (2012), who uses a different expansion to show that the logarithm cannot be applied to measurements with units, albeit without explicitly recognizing the need for $\ln x$ to be unitless. The expansion in Equation (4.6) is the same as that given in Matta et al. (2010), albeit not in an explicit form for $\ln x$. Like us, they discredit the Taylor expansion argument against applying $\ln x$ to quantities with units.
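The expansion in Equation (4.6) can be verified numerically. In the Python sketch below (the evaluation point and truncation order are illustrative), we use the standard closed form $g^{(k)}(a) = (-1)^{k-1}(k-1)!/a^{k}$, under which each term reduces to a power of the unitless ratio $(x-a)/a$:

```python
import math

def log_taylor(x, a, terms=60):
    """Partial sum of Equation (4.6) about a > 0, using
    d^k ln(x)/dx^k |_{x=a} = (-1)**(k-1) * (k-1)! / a**k,
    so the k-th term is (-1)**(k-1) * ((x - a)/a)**k / k."""
    r = (x - a) / a  # unitless, whatever the units of x
    return math.log(a) + sum((-1) ** (k - 1) * r ** k / k
                             for k in range(1, terms + 1))

print(log_taylor(1.3, 1.0))  # agrees with ln(1.3) for |x - a| < a
print(math.log(1.3))
```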

5 Allowable relationships among quantities

Having explored dimensional analysis and the kinds of difficulties that can arise when scales or units are ignored, we turn to a key step in unifying physical and statistical modelling. We now determine how to relate quantities and hence how to specify the ‘law’ that characterizes the phenomenon which is being modelled.

But what models may be considered legitimate? Answers for the sciences, given long ago, were based on the principle that for a model to completely describe a natural phenomenon, it cannot depend on the units of measurement that might be chosen to implement it. This answer was interpreted in two different ways. In the first interpretation, the model must be nondimensionalizable, i.e., it cannot have scales of measurement and hence cannot depend on units. In the second interpretation, the model must be invariant under all allowable transformations of scales. Both of these interpretations reduce the class of allowable relationships that describe the phenomenon being modelled and place restrictions on the complexity of any experiment that might be needed to implement that relationship.

5.1 Buckingham’s Pi-theorem

The section begins with Buckingham’s simple motivating example.

Example 9.

This example is a characterization of properties of gas in a container, namely, a characterization of the relationship amongst the pressure ($p$), the volume ($v$), the number of moles of gas ($N$) and the absolute temperature ($\theta$) of the gas. The absolute temperature reflects the kinetic energy of the system and is measured in degrees Kelvin ($^{\circ}K$), the SIUs for temperature. A fundamental relationship amongst these quantities is given by

\frac{pv}{\theta N} - D = 0 \qquad (5.1)

for some constant $D$ that does not depend on the gas. Since the dimension of $pv/(N\theta)$ is (force $\times$ length)/(# moles $\times$ temperature), as expressed, the relationship in Equation (5.1) depends on the units associated with $p$, $v$ and $\theta$, whereas the physical phenomenon underlying the relationship does not. Buckingham gets around this by invoking a parameter $R$ ($\equiv D$) with units (force $\times$ length)/(# moles $\times$ temperature). He rewrites Equation (5.1) as

\frac{pv}{R\theta N} - 1 = 0.

Thus $\pi = pv/(R\theta N)$ has no units. Buckingham calls this equation complete and hence nondimensionalizable. This equation is known as the Ideal Gas Law, with $R$ denoting the ideal gas constant (LibreTexts, 2019).
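The completeness of the nondimensionalized relation can be checked with SI values. In the Python sketch below, the state values are illustrative textbook numbers (roughly one mole of gas near standard conditions) and $R = 8.314$ J/(mol K):

```python
# pi = p*v/(R*theta*N) is dimensionless; with consistent SI units it is ~1.
R = 8.314          # ideal gas constant, J/(mol K) = (force x length)/(mol x K)
p = 101_325.0      # pressure, Pa
v = 0.0224         # volume, m^3: roughly one mole of gas at 0 C and 1 atm
N = 1.0            # number of moles
theta = 273.15     # absolute temperature, K

pi_value = p * v / (R * theta * N)
print(pi_value)    # close to 1, whatever consistent unit system is used
```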

This example of nondimensionalizing by finding one expression, $\pi$, as in Equation (5.1), can be extended to cases where we must nondimensionalize by finding several $\pi$ quantities. This extension is formalized in Buckingham's Pi-theorem. Here is a formal statement (in slightly simplified form) as stated by Buckingham (1914) and discussed in a modern style in Bluman and Cole (1974).

Theorem 1.

Suppose $X_{1},\dots,X_{p}$ are $p$ measurable quantities satisfying a defining relation

u(X_{1},\dots,X_{p}) = 0 \qquad (5.2)

that is dimensionally homogeneous. In addition, suppose that there are $m$ dimensions appearing in this equation, denoted $L_{1},\ldots,L_{m}$, that the dimension of $u$ can be expressed as $[u] = L_{1}^{\alpha_{1}} \times \cdots \times L_{m}^{\alpha_{m}}$, and that the dimension of each $X_{j}$ can be expressed as $[X_{j}] = L_{1}^{\alpha_{j1}} \times \cdots \times L_{m}^{\alpha_{jm}}$. Then Equation (5.2) implies the existence of $q \geq p - m$ dimensionless quantities $\pi_{1},\dots,\pi_{q}$, with $\pi_{i} = \Pi_{j=1}^{p} X_{j}^{a_{ji}}$, $i = 1,\dots,q$, and a function $U$ such that

U(\pi_{1},\dots,\pi_{q}) = 0.

In this way $u$ has been nondimensionalized. The choice of $\pi_{1},\ldots,\pi_{q}$ in general is not unique.

The theorem is proven constructively, so we can find $\pi_{1},\ldots,\pi_{q}$ and $U$. We first determine the $m$ fundamental dimensions used in $X_{1},\ldots,X_{p}$. We then use the quantities $X_{1},\ldots,X_{p}$ to construct two sets of variables: a set of $m$ primary variables, also called repeating variables, and a set of $q$ secondary variables, which are nondimensional. For example, if $X_{1}$ is the length of a box, $X_{2}$ is the height and $X_{3}$ is the width, then there is $m = 1$ fundamental dimension, the generic length denoted $L$. We can choose $X_{1}$ as the primary variable and use $X_{1}$ to define two new variables $\pi_{1} = X_{2}/X_{1}$ and $\pi_{2} = X_{3}/X_{1}$. These new variables, called secondary variables, are dimensionless. Buckingham's theorem states that the algebraic equation relating $X_{1}$, $X_{2}$ and $X_{3}$ can be re-written as an equation involving only $\pi_{1}$ and $\pi_{2}$. Note that we could have also chosen either $X_{2}$ or $X_{3}$ as the primary variable.
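The construction for the box example can be sketched in a few lines of Python (the box dimensions are hypothetical); changing the unit of length leaves the secondary variables untouched:

```python
def pi_functions(x1, x2, x3):
    """Secondary (dimensionless) variables for the box example,
    with x1 (length) chosen as the repeating variable."""
    return x2 / x1, x3 / x1

# A hypothetical box in metres, then the same box measured in centimetres:
m = pi_functions(2.0, 1.0, 0.5)
cm = pi_functions(200.0, 100.0, 50.0)
print(m, cm)  # identical pairs: the pi's do not depend on the unit of length
```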

A famous application of Buckingham's theorem concerns the discovery of the Reynolds number in fluid dynamics, which is discussed in Gibbings (2011). For brevity we include that example in Appendix B.

A link between Buckingham's approach and statistical modelling was recognized in the paper of Albrecht et al. (2013) and commented on in Lin and Shen (2013). But its link with the statistical invariance principle seems to have been first identified in the thesis of Shen (2015). This connection provides a valuable approach for the statistical modelling of scientific phenomena. Shen builds a regression model starting with Buckingham's approach and thereby a nondimensionalized relationship amongst the variables of interest. We propose a different approach in Section 6. We present Shen's illustrative example next.

Example 10.

This example, from Shen (2015), concerns a model for the predictive relationship between the volume $X_{3}$ of wood in a pine tree and its height $X_{1}$ and diameter $X_{2}$. The dimensions are $[X_{1}] = L$, $[X_{2}] = L$ and $[X_{3}] = L^{3}$. Shen chooses $X_{1}$ as the repeating variable and calculates the $\pi$-functions $\pi_{1} = X_{2}X_{1}^{-1}$ and $\pi_{2} = X_{3}X_{1}^{-3}$. He then applies the Pi-theorem to get the dimensionless version of the relationship amongst the variables:

\pi_{2} = g(\pi_{1})

for some function $g$. He correctly recognizes that $(\pi_{1}, \pi_{2})$ is the maximal invariant under the scale transformation group, although the connection to the ratio-scale of Stevens is not made explicitly. He somewhat arbitrarily chooses the class of relationships given by

\pi_{2} = k\pi_{1}^{\gamma}. \qquad (5.3)

He linearizes the model in Equation (5.3) by taking the logarithm and adds a residual to get a standard regression model, susceptible to standard methods of analysis. In particular the least squares estimate $\hat{\gamma} = 1.942$ turns out to provide a good fit, judging by a scatterplot.

Note that application of the logarithmic transformation is justified since the $\pi$-functions are dimensionless.
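Shen's fitting step can be sketched with ordinary least squares on the log scale. The Python code below is illustrative only: the dimensionless data are simulated from the model in Equation (5.3) with $\gamma = 2$ (they are not Shen's tree data, so the fit does not reproduce his $\hat{\gamma} = 1.942$):

```python
import math
import random

def fit_power_law(pi1s, pi2s):
    """Least squares fit of ln(pi2) = ln(k) + gamma * ln(pi1),
    the linearized form of Equation (5.3)."""
    xs = [math.log(p) for p in pi1s]
    ys = [math.log(p) for p in pi2s]
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    gamma = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
             / sum((x - xbar) ** 2 for x in xs))
    k = math.exp(ybar - gamma * xbar)
    return k, gamma

# Simulated (hypothetical) dimensionless data: pi2 = 0.002 * pi1**2 with noise.
rng = random.Random(0)
pi1s = [0.01 + 0.09 * rng.random() for _ in range(50)]
pi2s = [0.002 * p ** 2 * math.exp(0.05 * rng.gauss(0, 1)) for p in pi1s]
k_hat, gamma_hat = fit_power_law(pi1s, pi2s)
print(k_hat, gamma_hat)  # gamma_hat recovers roughly 2.0 on these data
```

Since both $\pi$-functions are dimensionless, the logarithms above raise none of the unit difficulties discussed in Section 4.4.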

5.2 Bridgman’s alternative

We now describe an alternative to the approach of Buckingham (1914) due to Bridgman (1931). At around the same time that Edgar Buckingham was working on his Pi-theorem, Percy William Bridgman was giving lectures at Harvard on the topic of nondimensionalization that were incorporated in a book whose first edition was published by Yale University Press in 1922. The second edition came out in 1931 (Gibbings, 2011). Bridgman thanks Buckingham for his papers but notes their approaches differ. And so they do. For a start, Bridgman asserts his disagreement with the position that seems to underlie Buckingham's work that "a dimensional formula has some esoteric significance connected with the 'ultimate nature' of things." Thus those that espouse that point of view must "find the true dimensions and when they are found, it is expected that something new will be suggested about the physical properties of the system." Instead, Bridgman takes measurement itself as the starting point in modelling and even the collection of data: "Having obtained a sufficient array of numbers by which the different quantities are measured, we search for relations between these numbers, and if we are skillful and fortunate, we find relations which can be expressed in mathematical form." He then seeks to characterize a measured quantity as either primary, that is, the product of direct measurement, or secondary, that is, computed from the measurements of primary quantities, as, for instance, velocity is computed from the primary quantities of length and time. Finally he sees the basic scientific issue as that of characterizing one quantity in terms of the others, as in our explication of Buckingham's work above in terms of the function $u$.

Bridgman considers the functional relationship between secondary and primary measurements under what statistical scientists might call “equivariance” under multiplicative changes of scale in the primary units. He proves that the functional relationship must be based on monomials with possibly fractional exponents, not unlike the form of the \pi-functions above. Thus Bridgman is able to re-derive Buckingham’s \pi formula, albeit with the added assumption that u is differentiable with respect to its arguments.

5.3 Beyond ratio-scales

Nondimensionalization seems more difficult outside of the domain of the physical sciences. For example, the dimensions of quantities such as utility cannot be characterized by a ratio-scale. And the choice of the primary dimensions is not generally so clear, although Baiocchi (2012) does provide an example in macroeconomics where time [T], money [\$], goods [R] and utility [U] may together be sufficient to characterize all other quantities.

Bridgman’s results on allowable laws were limited to laws involving quantities measured on ratio-scales. A substantial body of work has been devoted to extending these results to laws involving quantities measured on nonratio-scales, beginning with the seminal paper of Luce (1959). To quote the paper by Aczél, Roberts and Rosenbaum (1986), which contains an extensive review of that work, “Luce shows that the general form of a ‘scientific law’ is greatly restricted by knowledge of the ‘admissible transformations’ of the dependent and independent variables.” It seems puzzling that this principle has received little if any recognition in statistical science. This may be due to the fact that little attention is paid to such things as dimensions and units of measurement.

The substantial body of research that followed Luce’s publication covers a variety of scales, e.g. ordinal. Curiously, that body of work largely ignores the work of Buckingham in favor of Bridgman, even though the former preceded the latter. Also ignored is the work on statistical invariance described below, which goes back to G. Hunt and C. Stein in 1946 in unpublished but well-known work that led to optimum statistical tests of hypotheses.

To describe this important work by Luce, we re-express Equation (5.2) as

X_{p}=u^{*}(X_{1},\dots,X_{p-1}) (5.4)

for some function u^{*}, and thereby define a class of all possible laws that could relate X_{p} to the predictors X_{1},\dots,X_{p-1}, before turning to a purely data-based empirical assessment of the possible u^{*}’s. Luce requires that u^{*} satisfy an invariance condition. Specifically, he makes the strong assumption that the scale of each X_{i}, i=1,\dots,p-1, is susceptible to a transformation T_{i}\in{\cal F}_{i}, i.e. X_{i}\rightarrow T_{i}(X_{i}), for some sets of possible transformations {\cal F}_{1},\dots,{\cal F}_{p-1}. Furthermore he assumes that the X_{i}’s are transformed independently of one another; no structural constraints are imposed. Luce assumes the existence of a function D such that

u^{*}\left(T_{1}(X_{1}),\dots,T_{p-1}(X_{p-1})\right)=D(T_{1},\dots,T_{p-1})\,u^{*}(X_{1},\dots,X_{p-1}) (5.5)

for all possible transformations and choices of X_{i}, i=1,\dots,p-1. He determines that under these conditions, if each X_{i}, i=1,\dots,p, lies on a ratio-scale, then

u^{*}(X_{1},\dots,X_{p-1})\propto\prod_{i=1}^{p-1}X_{i}^{\alpha_{i}},

where the \alpha_{i}’s are nondimensional constants. This is Bridgman’s result, albeit proved by Luce without assuming differentiability of u^{*}. If on the other hand some of the X_{i}’s, i=1,\dots,p-1, are on a ratio-scale while others are on an interval-scale and X_{p} is on an interval-scale, then Luce proves that u^{*} cannot exist except in the case where p=2 and X_{1} is on an interval-scale.
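Luce’s monomial characterization is easy to check numerically. The following sketch (the exponents, constant, and rescalings are our own illustrative values, not from the paper) verifies that a monomial law satisfies the invariance condition of Equation (5.5) with D(T_{1},\dots,T_{p-1})=\prod_i c_i^{\alpha_i} when each T_{i} is a similarity transformation x\rightarrow c_{i}x:

```python
import math

# Hypothetical monomial law with illustrative exponents and constant.
alpha = [0.5, -1.0, 2.0]
C = 3.7  # nondimensional constant

def u_star(x):
    """Monomial law: u*(x) = C * prod(x_i ** alpha_i)."""
    out = C
    for xi, ai in zip(x, alpha):
        out *= xi ** ai
    return out

# Ratio-scale changes act multiplicatively: T_i(x) = c_i * x.
c = [2.0, 10.0, 0.3]   # arbitrary positive rescalings (changes of units)
x = [1.5, 4.0, 2.5]    # arbitrary positive measurements

lhs = u_star([ci * xi for ci, xi in zip(c, x)])
D = math.prod(ci ** ai for ci, ai in zip(c, alpha))  # D(T_1,...,T_{p-1})
rhs = D * u_star(x)

assert math.isclose(lhs, rhs)  # Equation (5.5) holds for the monomial law
```

Any other positive choice of the c_{i}’s gives the same agreement, which is the sense in which the monomial form is equivariant under changes of ratio-scale units.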

However, as noted by Aczél, Roberts and Rosenbaum (1986), the assumption in Equation (5.5) of the independence of the transformations T_{i} seems unduly strong for many situations, and weakening that assumption expands the number of possibilities for the form of u^{*}. Further work culminated in that of Paganoni (1987). While this work was for X_{i}’s in a general vector space, for simplicity we present it here in our context, where X_{i}\in{\rm I\!R}, i=1,\dots,p. Let {\cal X} and {\cal P} be nonempty subsets of {\rm I\!R}^{p-1} and {\cal R} a set of (p-1) by (p-1) real-valued matrices. Suppose that

  1. \textbf{x}+\textbf{p}\in{\cal X} for all \textbf{x}\in{\cal X} and \textbf{p}\in{\cal P};

  2. the identity matrix is in {\cal R} and, for all R\in{\cal R} and all \textbf{x}\in{\cal X}, R\textbf{x}\in{\cal X};

  3. if {\cal P}\neq\{0\}, then \lambda R\in{\cal R} for all R\in{\cal R} and all \lambda>0.

Suppose also that the function u^{*} in Equation (5.4) satisfies

u^{*}(\textbf{R}\,\textbf{x}+\textbf{p})=\alpha(\textbf{R},\textbf{p})\,u^{*}(\textbf{x})+\beta(\textbf{R},\textbf{p})

for all \textbf{R}\in{\cal R}, \textbf{x}\in{\cal X} and \textbf{p}\in{\cal P}, for some positive-valued function \alpha and real-valued function \beta. Paganoni then determines the possible forms of \alpha and \beta.
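To make the functional equation concrete, here is a minimal sketch under assumptions of our own choosing: u^{*} affine (coefficients a and b are illustrative, not from the paper) and R a positive scalar multiple \lambda of the identity. The equation then holds with \alpha(R,\textbf{p})=\lambda and \beta(R,\textbf{p})=a\cdot\textbf{p}+b(1-\lambda):

```python
# Illustrative affine law: u*(x) = a . x + b (hypothetical coefficients).
a = [2.0, -1.5]
b = 0.7

def u_star(x):
    return sum(ai * xi for ai, xi in zip(a, x)) + b

lam = 3.0         # R = lam * I, a ratio-scale stretch
p = [0.4, -2.0]   # translation, i.e. an interval-scale shift
x = [1.2, 5.0]    # arbitrary measurements

# Left-hand side: u*(R x + p).
lhs = u_star([lam * xi + pi for xi, pi in zip(x, p)])

# Right-hand side with alpha(R, p) = lam, beta(R, p) = a.p + b(1 - lam).
alpha_Rp = lam
beta_Rp = sum(ai * pi for ai, pi in zip(a, p)) + b * (1 - lam)
assert abs(lhs - (alpha_Rp * u_star(x) + beta_Rp)) < 1e-9
```

This is only one admissible pair (\alpha,\beta); Paganoni’s theorem characterizes the full family of solutions.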

6 Statistical invariance

Having covered some important issues at the foundations of modelling in previous sections, we now turn to the modelling itself. It usually starts with a study question about the relationship among a set of specified, observable or measurable attributes of members \omega\in\Omega of a population. A random sample of its members is to be collected to address the study question.

A fundamental principle (Principle 1) for constructing a model for natural phenomena, which is embraced by Buckingham’s Pi-theory, asserts that the model cannot depend on the scales and consequent units in which the attributes are to be measured. That principle can be extended to cover other features deemed to be irrelevant.

Principle 2 for constructing a model calls for judicious attribute choices and transformations to reduce the sample size needed to fit the model. The specified attributes could be design variables selected in advance of sampling to maximize the value of the study. A notable example comes from computationally expensive computer simulators. These are run at a selected set of input attributes to develop computationally cheap emulators. This in turn leads to a need to reduce the number of inputs in the predictive model that would need to be fitted empirically. A classical example is given in Appendix B where Buckingham’s theorem leads to a model with just a single predictand and a single predictor, the latter being derived from the original set of five predictors.

Finally, we dichotomize general approaches to modelling. Approach 1, i.e. scientific modelling, leads to what Meinsma (2019) calls “physical models.” The models are generally deterministic, with attributes measured on a ratio-scale. The u in Equation (5.2) is commonly known, at least up to unknown parameters (e.g., Example 1), before any sampling is done.

Approach 2 leads to a second type of model commonly seen in the social and medical sciences. There, we have a sample of n attribute-vectors, each of dimension p, to which the invariance principle is applied. That application can lead to a nondimensionalization of the data with a consequent reduction in the number of attributes, all based on the aggregated sample of n attribute-vectors. But beyond eliminating irrelevant units of measurement, applying the principle can eliminate other irrelevant features of the data, such as angle of rotation. In our approach, to be described, the entire sample is holistically incorporated into model development and implementation, and a single maximal invariant is used to summarize the sample.

In keeping with the goal of generalizing Buckingham’s theory, our approach will focus on the construction of a predictive distribution model. Model uncertainties can then be characterized through such things as conditional variances and residual analysis. Furthermore, principled empirical assessments of the validity of u can be made given the replicate samples.

Scales play a prominent role in modelling as well. So for categorical attributes, e.g. R red, Y yellow, G green, the model should be invariant under permutations of the code by which the attributes are recorded. Models with ordinal attributes, e.g. small, medium, large, should be invariant under positive monotone transformations. But, as noted in Section 1, this paper will focus mainly on ratio-scales and interval-scales. For all scales, and both approaches to modelling, the transformation groups to which we now turn play a key role.

6.1 Transformation groups

This subsection reviews the theory of transformation groups and the statistical invariance principle, a topic that has a rich history (Eaton, 1983). These are needed for extending the Buckingham Pi-theory. That need is recognized by Shen and Lin (2019), although their applications concern ratio-scales and physical models. To introduce these groups, for simplicity, in both this section and the next, we will focus on how groups transform the sample space. Later, in Sections 6.3 and 6.5, we will use the same concepts for the full general theory of statistical invariance and generalized statistical invariance.

Each \omega\in\Omega has a vector of measurable attributes X=X(\omega)\in{\cal X}:

X=\{X\}[X]=\left(\begin{array}{c}X_{1}\\ \vdots\\ X_{p}\end{array}\right).

A sample of \omega’s is to be drawn according to a probability distribution P on \Omega, with P inducing a probability distribution on X. Buckingham’s theory (see Subsection 5.1) aims at relating \omega’s attributes through a model like that in Equation (5.4). Our extension of that theory below will be stochastic in nature and assign X_{p} the special role of predictand.

A sample of size n yields a sample of observations X_{ij} represented by {\bf X}^{p\times n}\in{\cal X}^{n}. The statistical invariance principle posits that randomized statistical decision rules that determine actions should be invariant under 1:1 transformations by members g of an algebraic group of transformations G. That is, any pair of points {\bf x},{\bf x}^{\prime}\in{\cal X}^{n} are considered equivalent for statistical inference if and only if {\bf x}=g({\bf x}^{\prime}) for some g\in G. This equivalence is denoted {\bf x}\sim{\bf x}^{\prime}. By definition, the equivalence classes formed by G are disjoint and exhaustive, so we can index them by a parameter \gamma\in\Gamma and let {\cal X}^{n}_{\gamma}, \gamma\in\Gamma, represent an equivalence class. The {\cal X}^{n}_{\gamma}, \gamma\in\Gamma, are referred to as orbits, which could be indexed by a set of points \{{\bf x}_{\gamma}, \gamma\in\Gamma\}. If the set of points satisfies some regularity conditions, then it is called a cross-section, denoted {\cal X}^{n}/G; its existence is studied by Wijsman (1967). Assuming a cross-section does exist, we may write

{\cal X}^{n}=G\times{\cal X}^{n}/G.

In other words, any point {\bf x}\in{\cal X}^{n} is represented by (g,{\bf x}_{\gamma}) for appropriately chosen g and {\bf x}_{\gamma}.

The statistical invariance principle states that a statistical decision rule must be invariant, that is, the rule must take the same value for all points in a single orbit {\cal X}^{n}_{\gamma}. Maximal invariant functions play a special role in statistics. The function M is invariant if its value is constant on each orbit. Further, M is a maximal invariant if it takes different values on different orbits.

The following example shows the statistical invariance principle in action.

Example 11.

A hard-to-make, short-lived product has an exponentially distributed random time X\ hr to failure. A process-capability analysis led to a published value of \lambda_{0} for that product’s average time-to-failure. The need to assure that standard remains valid has led to a periodic sample of size n=2, resulting in a sample vector {\bf X}=(X_{1},X_{2}). To make inference about \lambda, the expected value of X, following Remark 1 the analyst relies on the (log) relative likelihood, i.e., ignoring irrelevant quantities,

\tilde{L}(\lambda)\doteq\ln\frac{L(\lambda)}{L(\lambda_{0})}=-2[\ln\tilde{\lambda}+\tilde{\bar{x}}\,\tilde{\lambda}^{-1}] (6.1)

where, in general, for any quantity u with the same units as \lambda_{0}, \tilde{u}=u/\lambda_{0} is unitless. Differentiating the relative likelihood in Equation (6.1) yields the maximum likelihood estimate (MLE)

\hat{\tilde{\lambda}}_{MLE}=\tilde{\bar{x}}.

Using the relative likelihood thus leads to any change in \lambda relative to the published value being expressed by their ratio, as mandated by their lying on a ratio-scale. The same is true of the relative change estimated by the MLE.

The group G=\{g_{c},\ c>0\} transforms any realization {\bf X}={\bf x} as follows:

g_{c}({\bf x})=(c\,x_{1},\,c\,x_{2}).

As a maximal invariant, i.e. \pi-function, we may take

\pi=M(x_{1},x_{2})=(x_{1}/x_{\cdot},\,x_{2}/x_{\cdot})=M(\tilde{x}_{1},\tilde{x}_{2}),

where x_{\cdot}=x_{1}+x_{2}. The range of M in (-\infty,\infty)^{2} is given by

{\cal M}=\{(m_{1},m_{2}):m_{2}=1-m_{1},\ m_{1},m_{2}>0\}.

Points in {\cal M} index the orbits of the group G. Locating a point {\bf x} on its orbit entails taking {\bf m}=(x_{1}/x_{\cdot},x_{2}/x_{\cdot}) and applying the transformation g_{c}, c=x_{\cdot}, to {\bf m}. Thus, the orbits created by G are rays in the positive quadrant, emanating from, but not including, the point (0\ hr,0\ hr), and {\cal X}^{2} is the union of these rays. Finally, we may let {\cal X}^{2}/G={\cal M}.

M, as a \pi-function, plays a key role in developing the (randomized, if necessary) statistical procedures that are invariant under transformations of {\cal X}^{2}. For example,

\hat{\tilde{\lambda}}_{MLE}=\tilde{\bar{x}}=c\times\upsilon[M(\tilde{x}_{1},\tilde{x}_{2})]

where c=\tilde{x}_{\cdot} and \upsilon[M(\tilde{x}_{1},\tilde{x}_{2})]\equiv 1/2. But better choices of \upsilon may be dictated by the manufacturer’s loss function.

Note that here c, \upsilon, and M are all unitless.
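The calculations in Example 11 are easily sketched in code. The numbers below (\lambda_{0} and the sample) are hypothetical; the point is that the maximal invariant M is unchanged by any rescaling g_{c}, i.e. by any change of time units, while the MLE of \lambda/\lambda_{0} is the unitless sample mean:

```python
# Hypothetical values for Example 11.
lambda0 = 100.0      # published mean time-to-failure (hr)
x = (80.0, 140.0)    # one periodic sample of size n = 2 (hr)

def M(x1, x2):
    """Maximal invariant: shares of the total x. = x1 + x2."""
    s = x1 + x2
    return (x1 / s, x2 / s)

# Invariance of M under a change of time units, e.g. hours -> minutes (c = 60):
c = 60.0
assert M(*x) == M(c * x[0], c * x[1])

# MLE of the unitless ratio lambda/lambda0 is the unitless sample mean:
mle = (x[0] / lambda0 + x[1] / lambda0) / 2
assert abs(mle - 1.1) < 1e-12
```

The same invariance holds for any c > 0, which is exactly the statement that inference based on M cannot depend on the units chosen for time.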

Invariance of statistical procedures under the action of transformation groups may be a necessity of modelling. For instance, consider the extension of Newton’s second law (Example 1) to the case of vector fields where velocity replaces speed and direction now plays a role. The statistical model for this extension may need to be invariant under changes of direction. In other cases, invariance may be required under permutations and monotone transformations. So in summary transformation groups may play an important role in both scientific and statistical modelling.

6.2 Nondimensionalization

This section presents a novel feature of this paper, the need for dimensional consistency combined with the nondimensionalization principle, that no model should depend on the units in which the data have been measured. Of particular note is the comparison of the strict application of Buckingham’s \pi-theory as described in Shen and Lin (2019) (Approach 1) with the one we are proposing (Approach 2). The comparison is best described in terms of a hypothetical example.

Example 12.

In a study of the magnitude of rainfall, the primary (repeating) variables are X_{1} and X_{2}, denoting the depth of the rain collected in a standardized cylinder and the duration of the rainfall, respectively. The third quantity, X_{3}, represents the magnitude of the rainfall as measured by an electronic sensor that computes a weighted average of X_{1} as a process over the continuous time period ending at time X_{2}. The dimensions of the three measurable quantities are [X_{1}]=L, [X_{2}]=T and [X_{3}]=LT^{-1}, the last being secondary. Thus the attribute-vector is the column vector

X=\left(\begin{array}{c}X_{1}\\ X_{2}\\ X_{3}\end{array}\right).

The attribute-space {\cal X} is the set of all possible values of X.

The scales and units of measurement are selected by the investigators. These could be changed by an arbitrary linear transformation

X\rightarrow{\bf C}^{3\times 3}X=[\mbox{diag}\{c_{1},c_{2},c_{3}\}]\,X,\quad c_{i}>0. (6.2)

But the dimension of X_{3} (L/T) is related to those of X_{1} (L) and X_{2} (T). This relationship must be taken into account when the scales of these dimensions are selected with their associated units of measurement.

To begin, an experiment is to be performed twice and all three attributes measured each time. The result will be a 3\times 2 data matrix

{\bf X}=\left(\begin{array}{cc}X_{11}&X_{12}\\ X_{21}&X_{22}\\ X_{31}&X_{32}\end{array}\right).

Thus, the sample space {\cal X}^{2} will be the set of all possible realizations of {\bf X}.

Now a predictive model is to be constructed in accordance with Buckingham’s desideratum that the model should not depend on the measurement system the experimenters happen to choose. Furthermore, dimensional consistency dictates that any changes in the measurements must be applied consistently to all the attributes. More precisely, the scales of measurement would require that c_{3}=c_{1}/c_{2} in the transformation matrix of Equation (6.2).

Approach 1 focuses on X, not {\bf X}, and is based on considering length and time as fundamental quantities; the primary attributes are X_{1} and X_{2}, with respective dimensions L and T. The predictand, X_{3}, must be nondimensionalized as a Buckingham \pi-function. Thus, we get

\pi_{3}=\frac{X_{2}X_{3}}{X_{1}}

for each one of the two sampled vectors. The primary variables are labelled \pi_{1}=\pi_{2}=1. We define the nondimensionalized attributes vector as

\pi({\bf X})=\left[\begin{array}{cc}1&1\\ 1&1\\ X_{21}X_{31}/X_{11}&X_{22}X_{32}/X_{12}\end{array}\right].

In other words, in Buckingham’s theory, the function that expresses the relationship among the variables for each of the two samples is

\pi_{3j}=u^{*}(\pi_{1j},\pi_{2j})\equiv K.

But the right hand side of this equation is a constant, which is not unreasonable since Buckingham’s model was intended to be deterministic. To deal with that issue we might adopt the ad hoc solution proposed by Shen in a different example (Shen, 2015, p. 17) by introducing a further variable, namely a model error \epsilon:

\pi_{3j}=K\exp(\epsilon_{3j}). (6.3)

Taking logarithms and fitting the resulting model yields an estimate \hat{K} of K.

In predictive form, for a future \omega without an electronic sensor for measuring rainfall, Equation (6.3) yields, after estimating K,

X_{3f}=\hat{K}\frac{X_{1f}}{X_{2f}}

where X_{1f} and X_{2f} are the depth and duration measurements. On the other hand, there are technical advantages to ignoring units of measurement, as is commonly done in developing and validating statistical models, as noted by the anonymous reviewer quoted in Section 1. In that case we would obtain

\{X_{3f}\}=\hat{K}\frac{\{X_{1f}\}}{\{X_{2f}\}}.
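The Approach 1 fit can be sketched on simulated data (all numbers below are assumed for illustration, not from the paper): since \pi_{3}=X_{2}X_{3}/X_{1} is modelled as K\exp(\epsilon), taking logarithms reduces estimation of K to averaging \log\pi_{3}, i.e. \hat{K} is the geometric mean of the observed \pi_{3}’s.

```python
import math
import random

# Simulate replicate (depth, duration, magnitude) triples under model (6.3).
random.seed(1)
K_true = 2.5
data = []
for _ in range(50):
    x1 = random.uniform(1.0, 10.0)           # depth (e.g. mm)
    x2 = random.uniform(0.5, 5.0)            # duration (e.g. hr)
    eps = random.gauss(0.0, 0.1)             # model error on the log scale
    x3 = K_true * (x1 / x2) * math.exp(eps)  # magnitude (mm/hr)
    data.append((x1, x2, x3))

# pi_3 = X2 * X3 / X1 is dimensionless; K-hat is its geometric mean.
pi3 = [x2 * x3 / x1 for x1, x2, x3 in data]
K_hat = math.exp(sum(math.log(p) for p in pi3) / len(pi3))

# Predict the magnitude of a future rainfall from depth and duration alone.
x1f, x2f = 6.0, 2.0
x3f_pred = K_hat * x1f / x2f
assert abs(K_hat - K_true) < 0.2  # K-hat recovers K up to simulation noise
```

Because \pi_{3} is unitless, \hat{K} is the same no matter what units the simulated depths and durations are recorded in, provided all three attributes are rescaled consistently.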
Remark 4.

A more formal approach would include the model error X_{0} in a nondimensional form so that \pi_{0}=X_{0}. Going through the steps above yields

\pi_{3j}=u^{*}(\pi_{0j},\pi_{1j},\pi_{2j}),\ j=1,2.

In contrast to Approach 1, our approach, Approach 2, treats the sample holistically, considering the whole data matrix {\bf X}, not just X. We nondimensionalize the problem by choosing as the primary variables X_{1j} and X_{2j}, j=1,2, although other choices are available. Let \hat{X}_{i}=(X_{i1}X_{i2})^{1/2}. We then form the \pi-functions

\pi_{1j}=\frac{X_{1j}}{\hat{X}_{1}},\quad\pi_{2j}=\frac{X_{2j}}{\hat{X}_{2}},\quad\pi_{3j}=\frac{X_{3j}\hat{X}_{2}}{\hat{X}_{1}},\quad j=1,2.

Then for each of the two samples we obtain

\pi_{3j}=u^{*}(\pi_{1j},\pi_{2j}),\ j=1,2.

In predictive form this result becomes

X_{3j}=\frac{\hat{X}_{1}}{\hat{X}_{2}}u^{*}(\pi_{1j},\pi_{2j}),\ j=1,2.

Suppose we take

u^{*}(\pi_{1j},\pi_{2j})=K\frac{\pi_{1j}}{\pi_{2j}},\ j=1,2

for some positive K. Then

X_{3j}=\frac{\hat{X}_{1}}{\hat{X}_{2}}u^{*}(\pi_{1j},\pi_{2j})=\frac{\hat{X}_{1}}{\hat{X}_{2}}\left(K\frac{\pi_{1j}}{\pi_{2j}}\right)=K\frac{X_{1j}}{X_{2j}}.

From the last result we obtain the model of Shen and Lin (2019)

\pi_{3j}=K,\ j=1,2.

However, the final choice for u^{*} could be dictated by an analysis of the data, an advantage of our holistic approach.

Finally we summarize our choice of \pi-functions as a maximal invariant

M({\bf X})=\left(\begin{array}{cc}\pi_{11}&\pi_{12}\\ \pi_{21}&\pi_{22}\\ \pi_{31}&\pi_{32}\end{array}\right).
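The Approach 2 \pi-functions can be checked numerically. In this sketch (the entries of {\bf X} are our own illustrative numbers), the maximal invariant is unchanged under any transformation with c_{3}=c_{1}/c_{2}, exactly as dimensional consistency requires:

```python
import math

# Hypothetical 3 x 2 sample matrix: depths, durations, magnitudes.
X = [[3.0, 12.0],   # X_{1j}, depth
     [2.0, 8.0],    # X_{2j}, duration
     [4.5, 4.0]]    # X_{3j}, magnitude

def pis(X):
    """Approach 2 pi-functions built from the geometric means X-hat_i."""
    g1 = math.sqrt(X[0][0] * X[0][1])  # X-hat_1
    g2 = math.sqrt(X[1][0] * X[1][1])  # X-hat_2
    return [[X[0][j] / g1 for j in range(2)],
            [X[1][j] / g2 for j in range(2)],
            [X[2][j] * g2 / g1 for j in range(2)]]

# A change of units, e.g. inches -> mm and hours -> minutes, with c3 = c1/c2.
c1, c2 = 25.4, 60.0
Xc = [[c1 * x for x in X[0]],
      [c2 * x for x in X[1]],
      [(c1 / c2) * x for x in X[2]]]

for row, row_c in zip(pis(X), pis(Xc)):
    for a, b in zip(row, row_c):
        assert math.isclose(a, b)  # the maximal invariant is unchanged
```

Any other positive pair (c_{1},c_{2}) gives the same agreement; a transformation with c_{3}\neq c_{1}/c_{2} would break it, which is why only the subgroup with c_{3}=c_{1}/c_{2} is admissible.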
Remark 5.

This example shows that Approach 2 can yield the same model as Approach 1, even though Approach 1 is designed for a single 3\times 1 attribute vector, unlike Approach 2, which starts with the entire 3\times n sample matrix (with n=2). This phenomenon will be investigated in future work.

Following Example 11, we can formalize the creation of orbits, \pi-functions and so on in terms of a transformation group G=\{g_{c_{1},c_{2},c_{3}}:\ c_{i}>0,\ i=1,2,3\} acting on the attribute-vector X. A subgroup G^{*} obtains by restricting c_{3}=c_{1}/c_{2}.

We are now prepared to move to the general case and a generalization of the concepts seen in this example.

6.3 Invariant statistical models

This subsection builds on Subsections 6.1 and 6.2 to obtain a generalized version of Buckingham’s Pi-theory. This means transforming the scales of each of the so-called primary attributes X_{1},\dots,X_{q}, which leads ineluctably to transforming the scales of the remaining, secondary attributes X_{q+1},\dots,X_{p}. Models like that in Equation (5.2) must reflect that link. In this subsection, we extend Buckingham’s idea beyond changes of scale by considering the application of a general transformation of the attribute-scales. Our approach assumes a sample of attribute vectors. When prediction is the ultimate goal of inference, as in Example 1, our inferential aim is to construct a model as expressed in Equation (5.4).

Sample space.

We consider a sample of n possibly dependent attribute-vectors collected from a sample of \omega’s from the population \Omega. The sample matrix, {\bf X}^{p\times n}, is partitioned to reflect the primary and secondary attributes as follows.

{\bf X}=\left(\begin{array}{ccc}X_{11}&\dots&X_{1n}\\ \vdots&\ddots&\vdots\\ X_{p1}&\dots&X_{pn}\end{array}\right)\equiv\left(\begin{array}{c}{\bf X}_{1}^{q\times n}\\ {\bf X}_{2}^{(p-q)\times n}\end{array}\right)\equiv\left(\begin{array}{c}{\bf X}_{1}\\ {\bf X}_{2}\end{array}\right). (6.4)

Let {\cal X}_{j} denote the set of all possible values of {\bf X}_{j}, j=1,2. We define a group G^{*} of transformations on {\cal X}_{1}\times{\cal X}_{2} through the following theorem. Each transformation is first defined on {\cal X}_{1}, with an extension to {\cal X}_{2} that yields unit consistency.

Theorem 2.

Let G_{1} be a group of transformations on {\cal X}_{1} with identity element e_{1}. Assume the following.

  1. There exists a function H defined on {\cal X}_{1}\times{\cal X}_{2} so that H({\bf X}_{1},{\bf X}_{2}) is always unitless.

  2. For each g\in G_{1}, there exists a \tilde{g}_{g}:{\cal X}_{2}\to{\cal X}_{2} with H(g({\bf X}_{1}),\tilde{g}_{g}({\bf X}_{2}))=H({\bf X}_{1},{\bf X}_{2}) for all {\bf X}_{1}\in{\cal X}_{1} and {\bf X}_{2}\in{\cal X}_{2}.

  3. For all {\bf X}_{2}\in{\cal X}_{2}, \tilde{g}_{e_{1}}({\bf X}_{2})={\bf X}_{2}.

  4. For all g_{1},g_{2}\in G_{1}, \tilde{g}_{g_{1}\circ g_{2}}=\tilde{g}_{g_{1}}\circ\tilde{g}_{g_{2}}.

Let G^{*} be the set of all transformations from {\cal X}_{1}\times{\cal X}_{2} to {\cal X}_{1}\times{\cal X}_{2} of the form

g^{*}({\bf X}_{1},{\bf X}_{2})=\left(g({\bf X}_{1}),\tilde{g}_{g}({\bf X}_{2})\right),\quad g\in G_{1}.

Then G^{*} is a group under composition.

Proof.

To show that G^{*} is closed under composition, let g_{1}^{*} and g_{2}^{*}, both in G^{*}, be associated with, respectively, g_{1} and g_{2}, both in G_{1}. Then

(g_{1}^{*}\circ g_{2}^{*})({\bf X}_{1},{\bf X}_{2}) = g_{1}^{*}\left(g_{2}^{*}({\bf X}_{1},{\bf X}_{2})\right)=g_{1}^{*}\left(g_{2}({\bf X}_{1}),\tilde{g}_{g_{2}}({\bf X}_{2})\right)
= \left(g_{1}(g_{2}({\bf X}_{1})),\tilde{g}_{g_{1}}(\tilde{g}_{g_{2}}({\bf X}_{2}))\right)=\left((g_{1}\circ g_{2})({\bf X}_{1}),(\tilde{g}_{g_{1}}\circ\tilde{g}_{g_{2}})({\bf X}_{2})\right)
= \left((g_{1}\circ g_{2})({\bf X}_{1}),\tilde{g}_{g_{1}\circ g_{2}}({\bf X}_{2})\right)

by Assumption 4. So g_{1}^{*}\circ g_{2}^{*} is associated with g_{1}\circ g_{2}\in G_{1}. We easily see that H((g_{1}\circ g_{2})({\bf X}_{1}),\tilde{g}_{g_{1}\circ g_{2}}({\bf X}_{2}))=H({\bf X}_{1},{\bf X}_{2}), and so g_{1}^{*}\circ g_{2}^{*} is in G^{*}. Clearly, the identity element of G^{*} is given by e({\bf X}_{1},{\bf X}_{2})\equiv(e_{1}({\bf X}_{1}),\tilde{g}_{e_{1}}({\bf X}_{2})), which equals ({\bf X}_{1},{\bf X}_{2}) by the definition of e_{1} and by Assumption 3. The inverse of g^{*}\in G^{*} is easily found: if g^{*}({\bf X}_{1},{\bf X}_{2})=(g({\bf X}_{1}),\tilde{g}_{g}({\bf X}_{2})), then (g^{*})^{-1}({\bf X}_{1},{\bf X}_{2})=(g^{-1}({\bf X}_{1}),\tilde{g}_{g^{-1}}({\bf X}_{2})). ∎

Illustrating the Theorem via Example 12, we have

{\bf X}_{1}=\left(\begin{array}{cc}X_{11}&X_{12}\\ X_{21}&X_{22}\end{array}\right)\quad{\rm and}\quad{\bf X}_{2}=\left(\begin{array}{cc}X_{31}&X_{32}\end{array}\right).

G_{1} has members

g_{c_{1},c_{2}}({\bf X}_{1})=\left(\begin{array}{cc}c_{1}X_{11}&c_{1}X_{12}\\ c_{2}X_{21}&c_{2}X_{22}\end{array}\right).

One choice for the function H is

H({\bf X}_{1},{\bf X}_{2})=\left(X_{31}X_{21}/X_{11},\ X_{32}X_{22}/X_{12}\right).

For each g_{c_{1},c_{2}}\in G_{1}, we see that \tilde{g}_{g_{c_{1},c_{2}}}({\bf X}_{2})=(c_{1}/c_{2}){\bf X}_{2}. We also see that \tilde{g}_{e_{1}}({\bf X}_{2})={\bf X}_{2}. The set G^{*} consists of transformations of the form

g^{*}_{c_{1},c_{2}}({\bf X}_{1},{\bf X}_{2})=\left(\begin{array}{cc}c_{1}X_{11}&c_{1}X_{12}\\ c_{2}X_{21}&c_{2}X_{22}\\ (c_{1}/c_{2})X_{31}&(c_{1}/c_{2})X_{32}\end{array}\right).

We easily see that \tilde{g}_{g_{1}\circ g_{2}}=\tilde{g}_{g_{1}}\circ\tilde{g}_{g_{2}}. Therefore, G^{*} is a group.
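For the rainfall illustration, the group properties asserted by the Theorem can also be verified directly in code; the following sketch (with arbitrary positive constants of our own choosing) checks closure under composition and the invariance of H along one column of the sample:

```python
import math

def g_star(c1, c2, X):
    """Action of g*_{c1,c2} on one attribute column (x1, x2, x3)."""
    x1, x2, x3 = X
    return (c1 * x1, c2 * x2, (c1 / c2) * x3)

def H(X):
    """Unitless by construction: [x3][x2]/[x1] = (L/T) T / L = 1."""
    x1, x2, x3 = X
    return x3 * x2 / x1

X = (3.0, 2.0, 4.5)                  # illustrative depth, duration, magnitude
c1, c2, d1, d2 = 2.0, 5.0, 0.5, 4.0  # arbitrary positive rescalings

# Closure: composing g*_{c1,c2} with g*_{d1,d2} equals g*_{c1 d1, c2 d2}.
composed = g_star(c1, c2, g_star(d1, d2, X))
direct = g_star(c1 * d1, c2 * d2, X)
assert all(math.isclose(a, b) for a, b in zip(composed, direct))

# H is invariant under every element of G*.
assert math.isclose(H(composed), H(X))
```

The identity (c_{1}=c_{2}=1) and inverse (1/c_{1}, 1/c_{2}) checks work the same way, confirming the group structure in this special case.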


Thus, by the Theorem, given a group of transformations G_{1} on the primary attributes in the sample, we can construct a group G^{*} of transformations on all attributes, and we can write

{\cal X}_{1}\times{\cal X}_{2}=G^{*}\times({\cal X}_{1}\times{\cal X}_{2})/G^{*}.

Orbits will be indexed by \gamma\in\Gamma and \pi({\bf X}) will denote a maximal invariant under the action of G^{*}. Let

\pi({\bf X})=(\pi_{ij}({\bf X}))^{p\times n}.

Therefore, by the statistical invariance principle, acceptable randomized decision rules, which include equivariant estimators as a special case, depend on {\bf X} only through the maximal invariant. We obtain the Buckingham \pi-functions as a special case where, in particular, the attributes are assessed on ratio-scales. Note that the \pi-functions obtained in this way are not unique.

The maximal invariant’s distribution.

Suppose that the distribution of 𝐗{\bf X} is in the collection of probability distributions, 𝒫={Pn,𝝀,𝝀Λ}{\cal P}=\{P_{n,\bm{\lambda}},\leavevmode\nobreak\ \bm{\lambda}\in{\Lambda}\}. Assume, for all gGg\in G^{*}, the distribution of g(𝐗)g({\bf X}) is also contained in 𝒫{\cal P}. More precisely assume that for each gGg\in G^{*}, there is a one-to-one transformation g¯\bar{g} of Λ{\Lambda} onto Λ{\Lambda} such that 𝐗{\bf X} has distribution Pn,𝝀P_{n,\bm{\lambda}} if and only if g(𝐗)g({\bf X}) has distribution Pn,g¯(𝝀)P_{n,\bar{g}(\bm{\lambda})}. Assume further that the set G¯\bar{G}^{*} of all g¯\bar{g} is a transformation group under composition, with identity e¯\bar{e}. Assume also that G¯\bar{G}^{*} is homomorphic to GG^{*}, i.e. that there exists a one-to-one mapping hh from GG^{*} onto G¯\bar{G}^{*} such that, for all g,gGg,g^{*}\in G^{*}, h(gg)=h(g)h(g)h(g\circ g^{*})=h(g)\circ h(g^{*}); h(e)=e¯h(e)=\bar{e}, and h(g1)={h(g)}1h(g^{-1})=\{h(g)\}^{-1}.

Let π1\pi^{-1} denote the set inverse, that is, π1(C)={X𝒳n\pi^{-1}(C)=\{{\textbf{X}}\in{\cal X}^{n} : π(X)C}\pi({\textbf{X}})\in C\}. Then since π(X)=π(g(X))\pi({\textbf{X}})=\pi(g({\textbf{X}})) for any gGg\in G^{*}, for all gGg\in G^{*} and 𝝀Λ\bm{\lambda}\in{\Lambda},

Pn,𝝀[π(X)B]\displaystyle P_{n,\bm{\lambda}}[\pi({\textbf{X}})\in B] =\displaystyle= Pn,𝝀[π(g(X))B]\displaystyle P_{n,\bm{\lambda}}[\pi(g({\textbf{X}}))\in B]
=\displaystyle= Pn,𝝀[g(X)π1(B)]\displaystyle P_{n,\bm{\lambda}}[g(\textbf{X})\in\pi^{-1}(B)]
=\displaystyle= Pn,g¯(𝝀)[Xπ1(B)]\displaystyle P_{n,\bar{g}(\bm{\lambda})}[\textbf{X}\in\pi^{-1}(B)]
=\displaystyle= Pn,g¯(𝝀)[π(X)B].\displaystyle P_{n,\bar{g}(\bm{\lambda})}[\pi({\textbf{X}})\in B].

Thus, any $\bm{\lambda}^{*}$ ``connected to'' $\bm{\lambda}$ via some $\bar{g}\in\bar{G}^{*}$ induces the same distribution on $\pi(\textbf{X})$. This implies that $\upsilon(\bm{\lambda})\doteq P_{\bm{\lambda}}[\pi(\textbf{X})\in B]$ is invariant under transformations in $\bar{G}^{*}$ and hence that $\upsilon(\bm{\lambda})$ depends on $\bm{\lambda}$ only through a maximal invariant on $\Lambda$. We denote that maximal invariant by $\bm{\pi}_{\bm{\lambda}}$. Finally, we relabel the distribution of $\pi(\textbf{X})$ under $\bm{\lambda}$ (and under all of the associated $\bm{\lambda}^{*}$'s) by $P_{\bm{\pi}_{\bm{\lambda}}}$.

The actions of the group $G^{*}$ have nondimensionalized $\mathbf{X}$ as $\mathbf{X}\rightarrow\pi(\textbf{X})$. Thus we obtain a stochastic version of the Pi-theorem. More precisely, using the general notation $[\mathbf{U}]$ to represent ``the distribution of'' any random object $\mathbf{U}$, we have the nondimensionalized conditional distribution of the nondimensionalized predictand from sample $j$, given the transformed predictors of all samples, as

$[\pi_{pj}\mid\pi_{1:(p-1),1:n},\ \bm{\pi}_{\bm{\lambda}}].$ (6.5)

More specifically, we have derived the result seen in Equation (6.5), which is the conditional distribution assumed in a special case by Shen (2015) in his Assumption 2. Furthermore, we predict $X_{p}$ by its conditional expectation, using the distribution in Equation (6.5), which can be derived once the joint distribution of the attributes has been specified. The conditional variance would express the predictor's uncertainty. Hence statistical invariance implies that information about the variables can be summarized by maximal invariants in the sample space and in the parameter space.
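To fix ideas, the reduction of $\mathbf{X}$ to $\pi(\mathbf{X})$ for ratio-scale attributes can be illustrated with a small numerical sketch (all names and data below are hypothetical): a $\pi$-function built from ratios to a chosen primary variable is unchanged by the action of the scale group, which is the defining property of the maximal invariant used above.

```python
# Sketch: for ratio-scale data, coordinate-wise ratios to a chosen
# "primary" variable are invariant under the scale group g_c(x) = c*x.
def pi(x):
    # pi-function: nondimensionalize by the first (primary) coordinate
    return tuple(xi / x[0] for xi in x[1:])

def g(c, x):
    # action of the scale group: rescale every coordinate by c > 0
    return [c * xi for xi in x]

x = [2.0, 6.0, 10.0]            # one hypothetical attribute vector
assert pi(x) == (3.0, 5.0)      # the dimensionless pi-functions
assert pi(g(4.0, x)) == pi(x)   # pi is invariant under the group action
```

The same check fails for an arbitrary affine map, which is why interval-scales require the separate treatment given below.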

6.4 Interval-scales

Returning to Equation (5.2), recall that underlying the Buckingham Pi-theorem are $p$ variables that together describe a natural phenomenon through the relationship expressed in that equation. The Pi-theorem assumes that $q$ of these variables are designated as the repeating or primary variables, while the remainder, which are secondary, have scales of measurement that involve the dimensions of the primary variables. It is the latter that are converted to the $\pi$-functions in the Buckingham theorem. But as we have seen in Subsection 6.3, it is these same variables that together yield the maximal invariant under the actions of a suitably chosen group, which was fairly easily identified in the case of ratio-scales.

Subsection 6.3 provides the bridge between the statistical invariance principle and the deterministic modelling theories described in Section 5, i.e., those developed in the physical sciences, where ratio-scales are appropriate. Appendix C develops a similar bridge to such models in the social sciences, where quantities on interval-scales are involved. For such quantities, the allowable transformations extend from simple scale transformations to affine transformations. Examples of such quantities can be found in Kovera (2010): the Intelligence Quotient (IQ), Scholastic Assessment Test (SAT), Graduate Record Examination (GRE), Graduate Management Admission Test (GMAT), and Miller Analogies Test (MAT). Models for such quantities might involve variables measured on a ratio-scale as well. Since much of the development parallels that in Subsection 6.3, we omit many of the details for brevity; those that we do provide are in Appendix C.

6.5 Extending invariance to random effects and the Bayesian paradigm

This section extends the previous sections to incorporate random effects and the Bayesian paradigm. Its foundations lie in statistical decision theory as sketched in Appendix D. Here the action that is a component of decision theory is prediction based on a prediction model as in Equation (5.4). A training sample of $n$ attribute vectors of length $p$ provides data for building the prediction model. Thus the predictors and predictand are observed for each of $n$ sampled $\omega$'s to yield the random sample's $p\times n$ matrix $\mathbf{X}$ seen in Equation (6.4), which we denote $\mathbf{X}^{training}$. Given a future, $(n+1)$st attribute $p$-vector $X^{future}$, the goal is the prediction of its $p$th component, $X^{future}_{p}$, based on observations of its first $p-1$ components $X^{future}_{1:(p-1)}$, all within a Bayesian framework with an appropriate extension of the framework presented in earlier sections. The situation is the one confronting the analyst who must fit a regression model based on $n$ data points and then predict a response given only the future predictors. We let $\mathbf{X}^{sample}$ denote the combination of the current data matrix $\mathbf{X}^{training}$ and the future data vector $X^{future}$, making $\mathbf{X}^{sample}$ a $p\times(n+1)$ matrix in ${\cal X}$.

The sampling distribution of $\mathbf{X}^{sample}$ is determined conditional on the random parameters $\bm{\lambda}\in\Lambda$. That means specifying $\bm{\lambda}$'s prior distribution, which in turn is conditional on the set of (specified) hyperparameters $\bm{\phi}\in\Phi$.

To extend the invariance principle requires, in addition to the structures described above, an action space ${\cal A}$, that is, the space of possible predictions of the future missing observation, a prior distribution on the parameter space, and a loss function, all of which remain to be specified. We also require transformation groups for ${\cal A}$ and $\Phi$, in addition to the specified transformation groups for ${\cal X}$ and $\Lambda$. In summary, we have the homomorphically related transformation groups $G^{*},\bar{G},\hat{G}$ and $\tilde{G}$ acting on, respectively, ${\cal X},\Lambda,{\cal A}$ and $\Phi$. The extended invariance principle then reduces points in these four spaces to their maximal invariants, i.e. $\pi$-functions, which can be used to index the orbits induced by their respective groups. Assuming a convex loss function, the Bayes predictor in this reduced problem is a nonrandomized decision rule leading to an action in ${\cal A}$. Each of the spaces ${\cal X},\Lambda,{\cal A},\Phi$ can (subject to regularity conditions) be represented in the form

$W=H_{g}\times W/H_{g}$

for the appropriate transformation group $H_{g}$ (Zidek, 1969). The corresponding maximal invariants can be expressed as matrices:

$\pi^{sample},\ \pi^{parameter},\ \pi^{action}\ {\rm and}\ \pi^{hyperparameter}.$

Finally using square brackets to represent the distributions involved, we get the predictive distribution of interest conditional on quantities we know:

$[\pi_{p,n+1}^{sample}\ \mid\ \pi_{1:(p-1),n+1}^{sample},\ \pi_{1:p,1:n}^{sample},\ \pi^{hyperparameter}].$ (6.6)

To fix ideas, we sketch an application in the following example, where we take advantage of the sufficiency and ancillarity principles to simplify the application of the invariance principle.

Example 13.

Assume the vector of observable attributes, $X^{5\times 1}$, is normally distributed, conditional on the mean $\mu$ and covariance matrix $\Sigma$. We will sometimes parameterize $\Sigma$ in terms of the diagonal matrix of standard deviations $\sigma=\mbox{diag}\{\sigma_{1},\dots,\sigma_{5}\}$ and the correlation matrix $\rho$, with $\Sigma=\sigma\rho\sigma$. Therefore, the parameters are $\bm{\lambda}=\{\mu,\rho,\sigma\}$ and, conditional on $\bm{\lambda}$, $X^{5\times 1}\sim N_{5}(\mu,\sigma\rho\sigma)$. In practice, $X_{5}$, the fifth of these observable attributes, is difficult to assess, leading to the idea of making $X_{5}$ a predictand and the remaining four attributes predictors. All attributes lie on an interval-scale, so a conventional approach would seem to be straightforward: multivariate regression analysis. Simply collect a training sample of $n$ independent vectors and fit a regression model for the intended purpose.

Complications arise due to the varying dimensions on which these attributes are to be measured. That in turn leads to different scales and different units of measurement, depending on how they are to be measured. That would not pose a problem for the unconscious statistician, who might simply ignore the units. A better approach would be that suggested by Faraway (Faraway (2015), p. 103), namely to rescale the measurements in a thoughtful way to eliminate those units (see also Section 7). However, neither of those approaches deals with the rigid structural issue imposed by the need for dimensional consistency. That is, the units of measurement for the $X_{i}$'s are respectively $u_{i},\ i=1,\dots,5$, with $u_{4}$ and $u_{5}$ constrained to be $u_{2}u_{1}^{-1}$ and $u_{3}u_{2}^{-1}$, respectively. To overcome the problem, Buckingham's Pi-theorem suggests itself. Thus we might use $X_{1:3}$ as primary variables to nondimensionalize $X_{4:5}$. But that does not work either, since our variables lie on interval-scales with 0's as conceptually possible values in the appropriate units. That is, $\pi$-functions, as simple ratios of these variables, cannot be constructed directly to nondimensionalize the attribute measurements. So ultimately we turn to the statistical invariance principle to solve the problem. The relevant transformation groups are described in what follows.

The first step creates the training set of $n$ random vectors of attribute measurements, recorded in the $5\times n$ matrix $\mathbf{X}^{training}$. Letting $x_{\cdot j}$ denote the $j$th column of a realization of $\mathbf{X}^{training}$, $\bar{x}=n^{-1}\sum_{j=1}^{n}x_{\cdot j}$, the sample mean, and $\mathbf{S}=\sum_{j=1}^{n}(x_{\cdot j}-\bar{x})(x_{\cdot j}-\bar{x})^{T}$, the sample sum of squares, the likelihood function is

\begin{eqnarray*}
L(\bm{\lambda}) &\propto& \mid\Sigma\mid^{-n/2}\exp\left(-\sum_{j=1}^{n}(x_{\cdot j}-\mu)^{T}\Sigma^{-1}(x_{\cdot j}-\mu)/2\right)\\
&=& \mid\Sigma\mid^{-n/2}\exp\left(-{\rm tr}\,(\Sigma^{-1}\mathbf{S})/2\right)\\
&&\times\exp\left(-(\mu-\bar{x})^{T}(\Sigma/n)^{-1}(\mu-\bar{x})/2\right).
\end{eqnarray*}

Conditional on $\mu$ and $\Sigma$, we may invoke the sufficiency principle and replace the training matrix $\mathbf{X}^{training}$ with its sufficient statistics

$\mathbf{X}^{suff}=(\bar{X}^{5\times 1},\ \mathbf{S}^{5\times 5}),$ (6.7)

i.e., the matrix whose first column consists of the sample (row) means of $\mathbf{X}^{training}$ and the last five columns contain $\mathbf{S}$. Thus we may estimate the covariance by $\hat{\Sigma}$, factored as

$\hat{\Sigma}=\hat{\sigma}\hat{\rho}\hat{\sigma}.$

Here $\hat{\sigma}$ denotes the diagonal matrix of estimates of the population standard deviations of the five attributes. Furthermore, $\hat{\rho}$ denotes the estimate of the matrix of correlations between the random attributes. It is invariant under changes of scale and transformations of their origins. Furthermore, these quantities would be independent given the parameters.
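As a numerical sketch of this sufficiency reduction (hypothetical data throughout; numpy is assumed available), the factorization $\hat{\Sigma}=\hat{\sigma}\hat{\rho}\hat{\sigma}$ and the scale-and-origin invariance of $\hat{\rho}$ just noted can be checked as follows:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 5
X = rng.normal(size=(p, n))        # hypothetical training matrix, rows = attributes

xbar = X.mean(axis=1)              # sample mean vector
D = X - xbar[:, None]
S = D @ D.T                        # sample sum of squares
Sigma_hat = S / n                  # covariance estimate
sd = np.sqrt(np.diag(Sigma_hat))
sigma_hat = np.diag(sd)            # diagonal matrix of estimated sd's
rho_hat = np.diag(1 / sd) @ Sigma_hat @ np.diag(1 / sd)  # correlation estimate

# The factorization recovers Sigma_hat:
assert np.allclose(sigma_hat @ rho_hat @ sigma_hat, Sigma_hat)

# rho_hat is invariant under x -> C(x + b); here C respects the
# dimensional-consistency constraints c4 = c2/c1 and c5 = c3/c2,
# though the correlation is invariant for any positive diagonal C.
C = np.diag([2.0, 3.0, 5.0, 3.0 / 2.0, 5.0 / 3.0])
b = rng.normal(size=p)
Y = C @ (X + b[:, None])
Dy = Y - Y.mean(axis=1)[:, None]
Sy = Dy @ Dy.T / n
sdy = np.sqrt(np.diag(Sy))
rho_y = np.diag(1 / sdy) @ Sy @ np.diag(1 / sdy)
assert np.allclose(rho_y, rho_hat)
```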

Turning to the Bayesian layer, we will adopt a conjugate prior (Gelman et al., 2014) for the illustrative purposes of this example, with hyperparameters $\bm{\phi}=\{\mu_{H},\Sigma_{H},B\}$:

\begin{eqnarray}
[\,\mu\mid\Sigma\,] &=& N_{p}(\mu_{H},\Sigma/B) \qquad (6.8)\\
{[}\,\Sigma\,] &=& Inv\text{-}Wishart_{B}((B\Sigma_{H})^{-1}). \qquad (6.9)
\end{eqnarray}

We specify the hyperparameters by equating prior knowledge with a hypothetical sample of $\omega$'s and their associated attribute vectors. We will add a superscript $h$ on quantities below to indicate their hypothetical nature. Thus the hypothetical sample is of size $n^{h}$, with a likelihood derived from a prior sample with $p\times n^{h}$ matrix $\mathbf{X}^{h}$, sample mean $\bar{x}^{h}$ and sample sum of squares $\mathbf{S}^{h}$. Thus we obtain the hypothetical likelihood for $\mu$ and $\Sigma$, given the independence of $\mathbf{S}^{h}$ and $\bar{x}^{h}$:

\begin{eqnarray*}
L_{n^{h}}(\mu,\Sigma) &\propto& \mid\Sigma\mid^{-n^{h}/2}\exp\left(-{\rm tr}\,\left(\Sigma^{-1}\mathbf{S}^{h}\right)/2\right)\\
&&\times\exp\left(-(\mu-\bar{x}^{h})^{T}(\Sigma/n^{h})^{-1}(\mu-\bar{x}^{h})/2\right).
\end{eqnarray*}

Finally, complement the hypothetical likelihood with a noninformative improper prior on $\Sigma$ with density proportional to $\mid\Sigma\mid^{-(d+1)/2}$, where $d>p$, to obtain the specification of the prior distribution. We take the hyperparameters for the prior distributions in Equations (6.8) and (6.9) to be $\mu_{H}=\bar{x}^{h}$, $\Sigma_{H}=\mathbf{S}^{h}/B$ and $B=n^{h}$. That completes the construction of the prior.

For our prediction problem involving a future $X^{future}$, we will use the posterior distribution of $\mu$ and $\Sigma$ based on the training data $\mathbf{X}^{training}$ via the sufficient statistics $\mathbf{X}^{suff}$. To get the posterior distributions for $\mu$ and $\Sigma$ entails taking the product of the prior density as determined above, based on the hypothetical sample, with the actual likelihood.

This determines a posterior density of $\bm{\lambda}$. Thus the covariance matrix associated with the posterior distribution of $\mu$ would depend on both the sample's sum of squares and the hypothetical sample's sum of squares matrix as

$$\mathbf{S}^{posterior}=\mathbf{S}+\mathbf{S}^{h}+\frac{(\bar{x}-\bar{x}^{h})(\bar{x}-\bar{x}^{h})^{T}}{1/n+1/n^{h}}.$$

In other words, $\mathbf{S}^{posterior}$ would replace $\mathbf{S}^{h}$ to get us from the inverted Wishart prior distribution of $\Sigma$ to its posterior distribution. Moreover, the degrees of freedom would increase to $n+n^{h}$ to reflect the larger sample size. We omit further details since our primary interest lies in the reduced model obtained by applying the invariance principle, a reduction to which we now turn.
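A sketch of this conjugate update (the function name and all numbers below are hypothetical; numpy is assumed) shows how the training and hypothetical sums of squares combine:

```python
import numpy as np

def S_posterior(S, S_h, xbar, xbar_h, n, n_h):
    """Posterior sum-of-squares matrix, combining the training sample's
    (S, xbar, n) with the hypothetical prior sample's (S_h, xbar_h, n_h)."""
    d = (xbar - xbar_h).reshape(-1, 1)
    return S + S_h + (d @ d.T) / (1.0 / n + 1.0 / n_h)

# Toy 2x2 illustration:
S = np.array([[4.0, 1.0], [1.0, 3.0]])
S_h = np.array([[2.0, 0.5], [0.5, 2.0]])
xbar, xbar_h = np.array([1.0, 2.0]), np.array([1.0, 2.0])

# When the two sample means agree, the mean-discrepancy term vanishes:
assert np.allclose(S_posterior(S, S_h, xbar, xbar_h, n=10, n_h=5), S + S_h)

# In general the result stays symmetric, as a sum-of-squares matrix must:
out = S_posterior(S, S_h, xbar, np.array([0.0, 1.0]), n=10, n_h=5)
assert np.allclose(out, out.T)
```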

We now turn to the construction of the transformation groups needed to implement the invariance principle. All transformations of data derive from transforming an attribute vector $X$ as follows:

$x\rightarrow\mathbf{C}\,(x+b)$

for any possible realization $x$ of $X$, where the coordinates of $b^{5\times 1}$ have the same units as the corresponding coordinates of $x$, and $\mathbf{C}=\mbox{diag}\{c_{1},c_{2},c_{3},c_{4}^{*},c_{5}^{*}\}$, where $c_{4}^{*}=c_{2}c_{1}^{-1}$ and $c_{5}^{*}=c_{3}c_{2}^{-1}$ to ensure dimensional consistency. The same $\mathbf{C}$ and $b$ are used to transform all data vectors, and so the transformation of the sufficiency-reduced matrix $\mathbf{X}^{suff}$ in Equation (6.7) and $X^{future}$ is as follows:

$$g_{b,\mathbf{C}}(\bar{x},\mathbf{s},x^{future})=\left(\mathbf{C}\,(\bar{x}+b),\ \mathbf{C}\,\mathbf{s}\,\mathbf{C},\ \mathbf{C}\,(x^{future}+b)\right). \qquad (6.10)$$

We can study orbits and maximal invariants by considering the transformation of $\mathbf{X}^{suff}$ separately from the transformation of $X^{future}$.

Consider the decomposition of $\mathbf{X}^{suff}$ into orbits. To index those orbits we first determine a maximal invariant. We do this in two ways, to make a point: first, we ignore dimensional consistency and then we include it. To begin, for both ways we set $b=-\bar{x}$ to transform $\bar{x}$ to $\mathbf{0}\,[\bar{x}_{n}]$, where $[\bar{x}_{n}]$ denotes the vector of units of $\bar{x}_{n}$. This means that all the points in an orbit can be reached from the new origin by choosing $b$ to be the sample average. Next we observe that we may estimate the population covariance for the attribute vector by $\mathbf{s}/n=\hat{\Sigma}=\hat{\sigma}\hat{\rho}\hat{\sigma}$. So the matrix $\mathbf{C}$ in Equation (6.10), in effect, acts on the diagonal matrix $\hat{\sigma}$. If no restrictions were placed on $\mathbf{C}$'s fourth and fifth diagonal elements, then the orbits could simply be indexed by $\hat{\rho}$. In short, the maximal invariant could be defined by $M(\bar{x},\mathbf{s})=(\mathbf{0},\hat{\rho})$. But dimensional consistency does not allow that choice of $\mathbf{C}$. Instead it imposes the structural requirement that we use the $c_{4}^{*}$ and $c_{5}^{*}$ specified above. The modified transformation would then act on $\hat{\sigma}$ as follows:

$$\hat{\sigma}\rightarrow\hat{\sigma}^{*}=\mbox{diag}\{1,1,1,\hat{\sigma}_{1}\hat{\sigma}_{4}\hat{\sigma}_{2}^{-1},\hat{\sigma}_{2}\hat{\sigma}_{5}\hat{\sigma}_{3}^{-1}\}.$$

The result would mean that the changes in $\sigma_{4}$ and $\sigma_{5}$ would be cancelled out by the changes in the transformations of the first three $\sigma$'s. The maximal invariant would then be dimensionless, as required. It would make the maximal invariant for the sufficiency-reduced sample space

$$\pi^{suff}=(\mathbf{0},\hat{\sigma}^{*}\hat{\rho}\hat{\sigma}^{*}), \qquad (6.11)$$

not $(\mathbf{0},\hat{\rho})$.
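The role of the dimensional-consistency constraint can be verified numerically in a small sketch (hypothetical values; numpy assumed): under the constrained subgroup with $c_{4}^{*}=c_{2}c_{1}^{-1}$ and $c_{5}^{*}=c_{3}c_{2}^{-1}$, the matrix $\hat{\sigma}^{*}\hat{\rho}\hat{\sigma}^{*}$ of Equation (6.11) is unchanged, whereas an unconstrained rescaling of the last two coordinates would alter it.

```python
import numpy as np

def sigma_star(s):
    # s: the five estimated standard deviations; returns the diagonal
    # matrix diag{1, 1, 1, s1*s4/s2, s2*s5/s3} of Equation (6.11)
    return np.diag([1.0, 1.0, 1.0, s[0] * s[3] / s[1], s[1] * s[4] / s[2]])

def pi_suff(s, rho):
    return sigma_star(s) @ rho @ sigma_star(s)

s = np.array([1.2, 0.7, 3.0, 5.0, 0.4])   # hypothetical sd estimates
M = np.arange(25.0).reshape(5, 5)
rho = (M + M.T) / 50.0                    # stand-in for rho_hat
np.fill_diagonal(rho, 1.0)                # (rho itself is scale-free)

c1, c2, c3 = 2.0, 8.0, 0.5
C = np.array([c1, c2, c3, c2 / c1, c3 / c2])   # dimensionally consistent
# Under x -> C(x + b) the sd's transform as s -> C*s; pi_suff is invariant:
assert np.allclose(pi_suff(C * s, rho), pi_suff(s, rho))

# An unconstrained choice of the last two multipliers breaks invariance:
C_bad = np.array([c1, c2, c3, 7.0, 7.0])
assert not np.allclose(pi_suff(C_bad * s, rho), pi_suff(s, rho))
```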

Remark 6.

The Buckingham theory concerned attributes measured on a ratio-scale. Were that the case in this example, we could have used the primary and secondary attributes differently. More precisely, we could have let $c_{i}=X_{i}^{-1},\ i=1,2,3$, $c_{4}=X_{1}X_{4}X_{2}^{-1}$ and $c_{5}=X_{2}X_{5}X_{3}^{-1}$. The result would eliminate the first three variables while achieving the primary objective of nondimensionalizing the model. We have achieved this goal using the standard deviations instead. But the method suggested here could also be used for scales other than interval, a subject of current research.

The corresponding maximal invariant in the parameter space $\Lambda=\{(\mu,\Sigma)\}$ would be identical to that in Equation (6.11), albeit with the hats removed, to get

$$\pi^{parameter}=(\mathbf{0},\sigma^{*}\rho\sigma^{*}).$$

Observe that the ratios of the $\sigma$'s with and without hats would be unitless and hence ancillary quantities, thus independent of the sufficient statistics by Basu's theorem (Basu, 1958); hence the maximal invariants can be constructed from them. Finally, for the hyperparameter space we would obtain the analogous result:

$$\pi^{hyperparameter}=(\mathbf{0},\sigma_{0}^{*}\rho_{0}\sigma_{0}^{*}).$$

We can now compute the posterior distribution

$$[\,\pi^{parameter}\ \mid\ \pi^{suff},\pi^{hyperparameter}\,]$$

but we skip the details for brevity.

We now come to the principal objective of this example, namely a model for predicting a future but as yet unobserved value of the predictand, $X^{future}_{5}$, based on the future covariate vector $X^{future}_{1:4}$. The data in $\mathbf{X}^{suff}$ are used as the sufficiency-reduced training sample. As well, we assume that, given the parameters of the sampling distribution, a future attribute vector $X^{future}$ is normal with mean $\mu$ and covariance matrix $\Sigma$, and is conditionally independent of $\mathbf{X}^{suff}$, given $\mu$ and $\Sigma$.

In conformity with the modelling above, which through application of the invariance principle led to the $\pi$-functions required to nondimensionalize the problem, we transform the predictand using statistics computed from the data in $\mathbf{X}^{suff}$: $(X^{future}_{5}-\bar{X}_{5})/\hat{\sigma}_{5}$. Normalized in this way, the predictand becomes invariant with respect to those population parameters. But one more step is necessary to ensure that we have nondimensionalized the predictand in its $\pi$-function, namely to align the dimensions of the predictand with those of the predictors to ensure dimensional consistency. The result is

$$\pi^{future}_{5}={\hat{\sigma}_{2}\hat{\sigma}_{5}\over\hat{\sigma}_{3}}\,{(X^{future}_{5}-\bar{X}_{5})\over\hat{\sigma}_{5}}. \qquad (6.12)$$

We would also need to convert the four predictors into their $\pi$-functions, and that would be done as in Equation (6.12). The result will be $\pi^{future}_{1:4,(n+1)}$.

That predictor is found using Equation (6.6), with modified notation. It is given by

\begin{eqnarray*}
&&E[\,\pi^{future}_{5}\ \mid\ \pi^{future}_{1:4},\pi^{suff},\pi^{hyperparameter}\,]\\
&=& E\{E[\,\pi^{future}_{5}\ \mid\ \pi^{future}_{1:4},\pi^{parameter}\,]\mid\pi^{future}_{1:4},\pi^{suff},\pi^{hyperparameter}\}.
\end{eqnarray*}

7 Discussion

The roots of statistical science in mathematical statistics, together with the formalisms of that field, long made dimensional analysis (DA) seem unnecessary in statistical science. In fact, Shen and Lin (2019) seem to have written the first paper in statistical science to recognize the need to incorporate units. For example, those authors propose what they call a "physical Lebesgue measure" that integrates the units of measurement into the usual Lebesgue measure. Yet application of Buckingham's desideratum eliminates those units. Paradoxically, it does so by exploiting the units it eliminates. That is, it exploits the intrinsic structural relationships among those units that dictate how the model must be set up. This vital implicit connection is recognized in this paper and earlier, in other papers, in more specialized contexts (Shen, 2015; Shen and Lin, 2019).

Remark 7.

The linear model with a Gaussian stochastic structure implicitly assumes data are measured on an interval-scale. But for physical quantities on a ratio-scale, that model would at best be an approximation. Shen and Lin (2018) in their Section 3.3 argue in favor of using a power model (meaning a product rather than a sum) in this context. We give arguments in our Subsection 4.1 to show why additive linear models are inappropriate in this context, other than as an approximation, in some sense, to a ratio.

That said, the simplest way of nondimensionalizing a model is by dividing each coordinate of $\mathbf{x}$ by a known constant with the same units of measurement as the coordinate itself, thereby removing its units of measurement. Then $k=p$ in Buckingham's notation, and $\pi_{1}(\mathbf{x})=(x_{1}/c_{1},x_{2}/c_{2},\ldots,x_{p}/c_{p})$ where $c_{i}=\{1\}[x_{i}]$. This is in effect the approach used by Zhai et al. (2012a) and Zhai et al. (2012b) to resolve dimensional inconsistencies in models. It is also the approach generally implicit in regression analysis where e.g.

$$X_{5}=\beta_{1}X_{1}+\beta_{2}X_{2}+\beta_{3}X_{3}+X_{4}$$

with $X_{1}=1$ being unitless and $X_{4}$ representing a combination of measurement and modelling error. The $\beta_{i}$'s play the key role of forcing the model to adhere to the principle of dimensional homogeneity when the $X_{i}$'s have different units of measurement. A preferable approach would be to nondimensionalize the $X_{i}$'s themselves in some meaningful way. For example, if $X_{2}$ were the air pollution level at a specific site at a specific time, it might be divided by its long-term average over the population of sites and times. The relative sizes of the now dimensionless $\beta_{i}$'s are then readily interpretable: a large $\beta$ would mean the associated $X$ contributes a lot to the overall mean effect.
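A minimal sketch of this style of nondimensionalization (the data and reference level below are hypothetical): each measurement is divided by a constant carrying the same units, leaving dimensionless quantities whose regression coefficients are then directly comparable.

```python
# Sketch: nondimensionalize a predictor by a meaningful reference level,
# here its (hypothetical) long-term average, which carries the same units.
def nondimensionalize(column, reference):
    # each x and reference share units, so each ratio is dimensionless
    return [x / reference for x in column]

pollution = [30.0, 45.0, 60.0]   # e.g. micrograms per cubic metre
long_term_average = 50.0         # same units as the measurements
pi_pollution = nondimensionalize(pollution, long_term_average)
assert pi_pollution == [0.6, 0.9, 1.2]
```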

Buckingham's theory does not specify the partition of attributes into the primary and secondary categories, as is needed when deriving Buckingham's $\pi$-functions. That topic is a current area of active research. Recently, two approaches have been proposed in the context of the optimal design of computer experiments. Arelis (2020) suggests using functional analysis of variance to choose as the base quantities those that contribute most to the variation in the output of interest. Yang and Lin (2021), on the other hand, propose a criterion based on measurement errors and choose the quantities that best minimize the measurement errors.

That optimal design is an ideal application for the Buckingham theory. There, the true model, called a simulator, is a deterministic numerical computer model of some phenomenon. But it is computationally intensive. So a stochastic alternative called an emulator is fitted to a sample of outputs from the simulator at a judiciously selected set of input vectors called design points, although they represent covariates in the reality the simulator purports to represent. The Buckingham Pi-theorem simplifies the model by reducing the dimension of the input vector and hence the number of design points. It also simplifies the form of the emulator in the process. That kind of application is discussed in Shen (2015), Arelis (2020) and Adragni and Cook (2009).

Our approach to extending Buckingham's work differs from that in Shen (2015). Shen restricts the quantities to lie on ratio-scales so that he can base his theory directly on Buckingham's Theorem. His starting point is the application of that theorem and the dimensionless $\pi$-functions it generates. In contrast, our theory allows a fully general group of transformations and arbitrary scales. Like Buckingham, we designate certain dimensions, such as length $L$, as primary (or fundamental) while the others are secondary. We require that a transformation of any primary scale must be made simultaneously to all scales involving that primary scale, including secondary scales. That requirement ensures consistency of change across all the quantities and leads to our version of the $\pi$-functions. However, that leaves open the issue of which variables are to serve as the primary and which as the secondary variables, a topic under active investigation.

The paper has explored the nature and possible application of DA with the aim of integrating physical and statistical modelling. The result has been an extension of the statistical invariance principle as a way of embracing the principles that lay behind Buckingham’s development of his Pi-theory. The result is a restriction on the class of allowable models and resulting optimal statistical procedures based on those models. How does the performance of these procedures compare with the general class of unrestricted procedures? Would a minimax or Bayesian procedure in the restricted class of allowable procedures have these same performance properties if they were thrown in with the entire set of decision rules? Under certain conditions, the answer is affirmative in the minimax case (Kiefer et al., 1957) and in the Bayesian case (Zidek, 1969).

8 Concluding remarks

This paper has given a comprehensive overview of DA and its importance in statistical modelling. Dimensions have long been known to lie at the foundations of deterministic modelling, with each dimension requiring the specification of a scale and each scale requiring the specification of units of measurement. Dimensions, their scales, and the associated units of measurement lie at the heart of empirical science.

However, statistical scientists regularly ignore their importance. We have demonstrated with the examples presented in Section 2 that ignoring scales and units of measurement can lead to results that are either wrong or meaningless. This points to the need for statistics education to incorporate some basic training on quantity calculus and the importance of scales, along with the impact at a fundamental level of transforming data. Statistics textbooks should reflect these needs. Going beyond training is the whole process of disseminating statistical research. There again the importance of these concepts should be recognized by authors and reviewers to ensure quality.

But does any of this really matter? We assert that it does. First we have described in Example 5, the genesis of this paper, an important example of dimensional inconsistency. An application of Buckingham’s theory showed that an additional quantity needs to be added to complete the model in the sense of Buckingham to make it nondimensionalizable (Wong and Zidek, 2018). The importance of this famous model, a model which was subsequently revised, lay in its use in assessing the reliability of lumber as quantified in terms of its return period, an important component in the calculation of quality standards for lumber when placed in service as a building material. Papers flowing from that discovery soon followed (Wong and Zidek, 2018; Yang, Zidek and Wong, 2018). The work reported in this paper led us to a deeper level than mere dimensional consistency, namely the discovery that the units impose important intrinsic structural links among the various quantities involved in the model. These links lead in Example 13 to a new version of transformation groups usually adopted in invariant Bayesian multivariate regression models. This new version requires use of a subgroup dictated by those links.

At a still deeper level, we are confronted by structural constraints imposed by the scales. For example, the artificial origin $0[u]$, where $[u]$ denotes the units of $0$, on the interval-scale rules out use of Buckingham's Pi-theory. Furthermore, it leads to a new calculus for ratio-scales, a topic under active investigation.

Further, we have shown that, surprisingly, not all functions are candidates for use in formulating relationships among attribute variables. Thus functions like $g(x)=\ln(x)$ are transcendental and hence inadmissible for that role. This eliminates from consideration in relationships not only the natural logarithm but also, for example, the hyperbolic trigonometric functions. This knowledge would be useful to statistical scientists in developing statistical models.

On the other hand, the paper reveals an important deficiency of deterministic physical models of natural phenomena: their failure to reflect uncertainty about those phenomena. An approach to doing so is presented in the paper, along with an extension of the classical theory to incorporate the Bayesian approach. That union of the different frameworks is reached via the statistical invariance principle, yielding a generalization of the famous theories of Buckingham, Bridgman and Luce.

In summary, each of the two approaches to modelling, physical and statistical, has valuable aspects that can inform the other. This paper provides the groundwork for the unification of these approaches, setting the stage for future research.

Appendix A Validity of using $\ln(x)$ when $x$ has units of measurement: The debate goes on.

Whether the transcendental function $\ln(x)$ may be applied to measurements $x$ with units of measurement has been much discussed in other scientific disciplines, and we now present some of that discussion for illustrative purposes. Molyneux (1991) points out that both affirmative and negative views had been expressed on this issue. He argues in favor of a compromise, namely defining the logarithm by exploiting one of its most fundamental properties as $\ln(X)=\ln(\{X\}[X])=\ln(\{X\})+\ln([X])$. He finds support for his proposal by noting that the derivative of the constant term, $\ln([X])$, would be zero. It follows that

\frac{d\ln(X)}{dX}=\frac{d\ln(\{X\})}{d\{X\}}.

To see this, note that under his definition of the logarithm,

\begin{aligned}
\frac{\ln(x+\Delta x)-\ln(x)}{\Delta\{x\}} &= \frac{\ln\{x+\Delta x\}+\ln[x]-\ln\{x\}-\ln[x]}{\Delta\{x\}}\\
&= \ln\left(1+\frac{\Delta\{x\}}{\{x\}}\right)^{1/\Delta\{x\}}\\
&\rightarrow \frac{1}{\{x\}}
\end{aligned}

as $\Delta\{x\}\rightarrow 0$. Note that his definition of the derivative differs from ours, given in Equation (4.4); we include the units of $x$ in the denominator, as the derivative is taken with respect to $x$, units and all. Furthermore, Molyneux (1991) argues that the proposal makes explicit the units that are sometimes hidden, pointing to the same example, Example 6, that we have used to make the point. It is unitless because the logarithm is applied to a count, not a measurement, that count being the number of SIUs. Molyneux (1991) gives other such examples. The proposal not only makes the units explicit; on taking the antilog of the result, you recover the original value of $X$ on the raw scale with the units $[X]$ correctly attached.
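Molyneux's claim that the slope depends only on the numerical value $\{x\}$, never on the unit $[x]$, is easy to probe numerically. The sketch below is our illustration, not Molyneux's; the function name and values are ours.

```python
import math

def slope_of_log(x_num, h=1e-6):
    """Finite-difference slope of ln at the numerical value x_num."""
    return (math.log(x_num + h) - math.log(x_num)) / h

# The same length recorded as 2.0 (metres) or 200.0 (centimetres):
slope_m = slope_of_log(2.0)
slope_cm = slope_of_log(200.0)

# In each case the slope is 1/{x}, the reciprocal of the numerical value,
# so the derivative computation never involves the unit [x].
assert abs(slope_m - 1 / 2.0) < 1e-5
assert abs(slope_cm - 1 / 200.0) < 1e-7
```

Of course this only restates Molyneux's algebra; the dispute is over whether $\ln(\{x\})+\ln([x])$ is meaningful in the first place.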

But, in a letter to the Journal Editor (Mills, 1995), Ian Mills quotes Molyneux (1991), in which Molyneux himself says that his proposal “has no meaning.” Furthermore, Mills says he is “inclined to agree with him.” Mills also argues, like Bridgman, that the issue is moot in practice since the logarithm is applied in the context of the difference of two logarithms, leading to $\ln(u/v)=\ln(\{u\}/\{v\})$, a unitless quantity. In the same issue of the journal, Molyneux publishes a lengthy rejoinder saying, amongst other things, that Mills misquoted him.

However, insofar as the authors of this paper are aware, Molyneux’s proposal was not accepted by the scientific community, leaving unresolved the issue of applying the natural logarithm to a dimensional quantity. In particular, Matta et al. (2010) also reject it in a totally different context. Mayumi and Giampietro (2010) pick up on this discussion in a recent paper regarding dimensional analysis in economics and the frequent application of logarithmic specifications. Their approach is based on Taylor expansion arguments showing that application of the logarithm to dimensional quantities $X$ is fallacious since, in the expansion

\ln(1+X)=X-\frac{X^{2}}{2}+\dots, \qquad (A.1)

the terms on the right hand side would then have different units of measurement.
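To spell out the dimensional objection (our rendering of their argument): if $X$ carried, say, the dimension $L$, the expansion would equate terms of different dimensions,

```latex
[X] = L, \qquad \left[\frac{X^{2}}{2}\right] = L^{2}, \qquad \left[\frac{X^{3}}{3}\right] = L^{3}, \qquad \dots
```

so no assignment of a dimension to $\ln(1+X)$ could make the equation dimensionally homogeneous.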

Mayumi and Giampietro then go on to describe a number of findings that are erroneous due to the misapplication of the logarithm. They also cite a “famous controversy” between A.C. Pigou and Milton Friedman that according to the authors, revolved around dimensional homogeneity (Pigou, Friedman and Georgescu-Roegen, 1936; Arrow et al., 1961), although not specifically involving the logarithm. One of the findings criticized in Mayumi and Giampietro (2010) is subsequently defended by Chilarescu and Viasu (2012). But Mayumi and Giampietro (2012) publish a rejoinder in which they reassert the violation of the principle of dimensional homogeneity in that finding and declare that the claim in Chilarescu and Viasu (2012) “is completely wrong. So, contrary to Chilarescu and Viasu’s claim, log(V/ L) or log W in Arrow et al. (1961) can never be used as a scientific representation.”

Although agreeing with the conclusion that the logarithm cannot be applied to a dimensional $X$, Matta et al. (2010) state that the Taylor expansion argument above, which the authors attribute to a Wikipedia article of September 2010, is fallacious. [The Wikipedia article actually misstates the Taylor expansion as

\ln X=X+X^{2}/2+\dots, \qquad (A.2)

but that does not negate the thrust of their argument.] They argue that the Taylor expansion should be

g(X)=g(X_{o})+(X-X_{o})\left.\frac{dg}{dX}\right|_{X_{o}}+\dots \qquad (A.3)

so that if $X$ had units of measurement, they would cancel out. But the authors do not state that expansion for the logarithm. If they did, they would have had to deal with the issue of the units of the $g(X_{o})=\ln(X_{o})$ term, while the remainder of the expansion is unitless (see our comments on this issue in Subsection 4.4).

Baiocchi (2012) points out that if the claims of Mayumi and Giampietro (2010) were valid, they would make most “applications of statistics, economics, …unacceptable” for statistical inference based on the use of the exponential and logarithmic transformations. Baiocchi then attempts a rescue operation by arguing that the views of Mayumi and Giampietro (2010) go “against well established theory and practice of many disciplines including …statistics,…and that it rests on an inadequate understanding of dimensional homogeneity and the nature of empirical modeling.” The paper invokes the dominant theory of measurement in the social sciences, which the author claims makes a numerical statement meaningful if it is invariant under legitimate scale transformations of the underlying variables. That idea of meaningfulness can then be applied to the logarithmic transformation of dimensional quantities in some situations.

To explain this idea, Baiocchi first gives the following analysis involving quantities on the ratio-scale. Start with the model $\ln X_{2}=\alpha X_{1}$. Let us rescale $X_{2}$ as $kX_{2}$. That leads to the need to appropriately rescale $X_{1}$, say as $mX_{1}$. Consequently our model becomes $\ln(kX_{2})=\alpha mX_{1}$, or $\ln X_{2}=\hat{\alpha}X_{1}-\ln k$ with $\hat{\alpha}=m\alpha$. But then this model for $X_{2}$ cannot be reduced to its original form because of its second log term. Thus the model would be considered empirically meaningless.

On the other hand, if $X_{2}$ were unique up to a power transformation, we would get $\ln X_{2}^{k}=\alpha(mX_{1})$, or $\ln X_{2}=\hat{\alpha}X_{1}$ with $\hat{\alpha}=m\alpha/k$. Therefore the model would be invariant under admissible transformations and hence empirically meaningful. So the situation is more complex than the paper of Mayumi and Giampietro (2010) would suggest.
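The two cases rest on elementary log identities and can be checked mechanically; the following sketch is our illustration, with arbitrary numerical values.

```python
import math

X2, k = 3.7, 10.0   # an arbitrary response value and rescaling constant

# Ratio-scale case: rescaling X2 -> k*X2 injects an additive ln k, so
# ln(k*X2) = alpha*m*X1 cannot be rearranged back into the original form
# ln(X2) = (const)*X1 without the extra -ln k term.
assert abs(math.log(k * X2) - (math.log(X2) + math.log(k))) < 1e-12

# Power-transformation case: X2 -> X2**k only rescales the coefficient,
# since ln(X2**k) = k*ln(X2); the model keeps its original form.
assert abs(math.log(X2 ** k) - k * math.log(X2)) < 1e-12
```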

Baiocchi (2012) also addresses other arguments given by Mayumi and Giampietro (2010). In particular, he is concerned with their Taylor expansion argument $\ln(1+X)=X-X^{2}/2+\cdots$. They point out that for $1+X$ to make sense, the $1$ would have to have the same units as $X$. They use the expansion $\ln(a+X)=\ln a+X/a-(X/a)^{2}/2+\cdots$ to make the point that when $a=1$ carries the same units as $X$, the expansion is valid. However, this argument ignores the fact that in $\ln a$, $a$ has units, so the argument seems tenuous and therefore leaves doubt about their success in discrediting the arguments in Mayumi and Giampietro (2010). For brevity we terminate our review of Baiocchi (2012) on that note. It is a lengthy paper with further discussion of the Mayumi and Giampietro (2010) arguments and a very lengthy bibliography of relatively recent relevant work on this issue.

Appendix B Application of Buckingham’s theorem and the discovery of the Reynolds number

This section provides a well-known example from fluid dynamics.

Example 14.

The example is a model for fluid flow around a sphere for the calculation of the drag force $F$. It turns out that the model depends only on something called the coefficient of drag and on a single, complicated dimensionless number called the Reynolds number that incorporates all the relevant quantities.

To begin, we list all the relevant quantities: the drag force ($F$), velocity ($V$), fluid density ($\rho$), viscosity ($\mu$) and sphere diameter ($D$). We thus have $p=5$ $X$s in the notation of Buckingham’s theorem. The dimensions of these five quantities can be expressed in terms of the three dimensions length ($L$), mass ($M$) and time ($T$). We treat these as the three primary dimensions, which tells us that we need at most $5-3=2$ dimensionless $\pi$ functions to define our model.

We first write down the dimensions of each of the five quantities in terms of $L$, $M$ and $T$:

[F]=MLT^{-2};\quad [V]=LT^{-1};\quad [\rho]=ML^{-3};\quad [\mu]=ML^{-1}T^{-1};\quad [D]=L. \qquad (B.1)

We now proceed to sequentially eliminate the dimensions $L$, $M$ and $T$ in all five equations. First we use $[D]=L$ to eliminate $L$. The first four equations become

[FD^{-1}]=MT^{-2};\quad [VD^{-1}]=T^{-1};\quad [D^{3}\rho]=M;\quad [D\mu]=MT^{-1}.

We next eliminate $M$ via $[D^{3}\rho]$, yielding

[FD^{-1}D^{-3}\rho^{-1}]=T^{-2};\quad [VD^{-1}]=T^{-1};\quad [D\mu D^{-3}\rho^{-1}]=T^{-1},

that is

[FD^{-4}\rho^{-1}]=T^{-2};\quad [VD^{-1}]=T^{-1};\quad [\mu D^{-2}\rho^{-1}]=T^{-1}.

To eliminate $T$, we could use $[VD^{-1}]$ or $[\mu D^{-2}\rho^{-1}]$ or even, with a bit more work, $[FD^{-4}\rho^{-1}]$. We use $[VD^{-1}]$, yielding

[FD^{-4}\rho^{-1}V^{-2}D^{2}]=1\quad {\rm and}\quad [\mu D^{-2}\rho^{-1}V^{-1}D]=1,

that is

[FD^{-2}\rho^{-1}V^{-2}]=1\quad {\rm and}\quad [\mu D^{-1}\rho^{-1}V^{-1}]=1.

All the dimensions are now gone, so we have nondimensionalized the problem and in the process found the $\pi_{1}$ and $\pi_{2}$ implied by Buckingham’s theorem:

\pi_{1}(F,V,\mu,\rho,D)=\frac{F}{\rho D^{2}V^{2}}\quad {\rm and}\quad \pi_{2}(F,V,\mu,\rho,D)=\frac{\mu}{\rho DV}.

Therefore, for some function UU,

U\left(\frac{F}{\rho D^{2}V^{2}},\frac{\mu}{\rho DV}\right)=0.

Remarkably, we have also found the famous Reynolds number, $\rho DV/\mu$ (Friedmann, Gillis and Liron, 1968). The Reynolds number determines the coefficient of drag, $F/(\rho D^{2}V^{2})$, according to a fundamental law of fluid mechanics.
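The hand elimination above can be mirrored with linear algebra: the exponent vectors of dimensionless products are exactly the null space of the dimension matrix whose columns hold the $(M,L,T)$ exponents of $F$, $V$, $\rho$, $\mu$ and $D$. The following is a minimal sketch of that check (our illustration): the matrix has rank 3, so $5-3=2$ independent Pi groups exist, and the exponent vectors of $\pi_{1}=F/(\rho D^{2}V^{2})$ and $\pi_{2}=\mu/(\rho DV)$ both lie in the null space.

```python
import numpy as np

# Columns: F, V, rho, mu, D; rows: exponents of M, L, T (cf. Equation (B.1)).
A = np.array([
    [ 1,  0,  1,  1, 0],   # M
    [ 1,  1, -3, -1, 1],   # L
    [-2, -1,  0, -1, 0],   # T
])

# Rank 3 => the null space has dimension 5 - 3 = 2: two Pi groups.
assert np.linalg.matrix_rank(A) == 3

# Exponents of (F, V, rho, mu, D) in pi_1 = F/(rho*D^2*V^2) ...
pi1 = np.array([1, -2, -1, 0, -2])
# ... and in pi_2 = mu/(rho*D*V), the reciprocal Reynolds number.
pi2 = np.array([0, -1, -1, 1, -1])

# Both products are dimensionless: A @ pi = 0 in every primary dimension.
assert np.all(A @ pi1 == 0) and np.all(A @ pi2 == 0)
```

Any other Pi group for this problem is a product of powers of these two, since its exponent vector must lie in the same two-dimensional null space.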

If we knew $u$ in equation (5.2) to begin with, we could track the series of transformations starting at (B.1) to find $U$. If, however, we had no specified $u$ to begin with, we could use $\pi_{1}$ and $\pi_{2}$ to determine a model, that is, to find $U$. For instance, we could carry out experiments, make measurements and determine $U$ from the data. In either case, we can use $U$ to determine the coefficient of drag from the Reynolds number and in turn calculate the drag force.

Invoking the principle of invariance enables us to embed Example 14 in a stochastic framework using Approach 1 as follows.

Example 15 (continues=ex:reynolds).

We continue to use the same notation. In this example, the random variable to be replicated in independent experiments is ${\bf X}=(V,\rho,D,\mu,N)\in{\cal X}={\rm I\!R}^{5}_{+}$.

The sample space.

The creation of the transformation group and relevant subgroup follows the lines of Example 12. We choose $L$, $M$, $T$ as the primary dimensions. Then with ${\bf c}=(c_{1},c_{2},c_{3})$, the corresponding group of transformations is

g_{\bf c}(V,\rho,D,\mu,N)=\bigg(\frac{c_{1}}{c_{3}}V,\frac{c_{2}}{c_{1}^{3}}\rho,c_{1}D,\frac{c_{2}}{c_{1}c_{3}}\mu,\frac{c_{1}c_{2}}{c_{3}^{2}}N\bigg).

For indexing the cross sections of 𝒳{\cal X} we have the maximal invariant

M({\bf X})=\bigg(\frac{V}{V},\frac{\rho}{\rho},\frac{D}{D},\frac{\mu}{\rho VD},\frac{N}{\rho V^{2}D^{2}}\bigg)=(1,1,1,\pi_{\mu},\pi_{N}) \qquad (B.2)

where $\pi_{\mu}=\mu/(\rho VD)$ and $\pi_{N}=N/(\rho V^{2}D^{2})$. Let $\bm{\pi}_{\bf X}=(\pi_{\mu},\pi_{N})$. To show that $M$ is a maximal invariant, first observe that $M({\bf X})$ is invariant since each term is dimensionless. Thus showing $M({\bf X})$ is a maximal invariant reduces to finding a group element ${\bf c}^{*}$ for which $X^{*}=g_{{\bf c}^{*}}(X)$ when $M(X)=M(X^{*})$. For $N$,

\begin{aligned}
g_{{\bf c}^{*}}(N) &= \frac{c_{1}^{*}c_{2}^{*}}{(c_{3}^{*})^{2}}N\\
&= \frac{D^{*}}{D}\,\frac{\rho^{*}(D^{*})^{3}}{D^{3}\rho}\,\frac{D^{2}}{V^{2}}\,\frac{(V^{*})^{2}}{(D^{*})^{2}}\,N\\
&= \frac{D^{*}}{D}\,\frac{\rho^{*}(D^{*})^{3}}{D^{3}\rho}\,\frac{D^{2}}{V^{2}}\,\frac{(V^{*})^{2}}{(D^{*})^{2}}\,\rho D^{2}V^{2}\,\frac{N}{\rho D^{2}V^{2}}\\
&= \rho^{*}(D^{*})^{2}(V^{*})^{2}\,\pi_{N}\\
&= \rho^{*}(D^{*})^{2}(V^{*})^{2}\,\pi_{N^{*}}\mbox{ using the assumption that $M(X)=M(X^{*})$}\\
&= \rho^{*}(D^{*})^{2}(V^{*})^{2}\,\frac{N^{*}}{\rho^{*}(D^{*})^{2}(V^{*})^{2}}\\
&= N^{*}.
\end{aligned}

Similarly, we get that $\mu\rightarrow\mu^{*}$. Relating these results to Shen’s thesis, this is essentially his Lemma 5.5. However, Shen does not derive the maximal invariant; he simply uses the Pi quantities derived from Buckingham’s Pi-theorem as the maximal invariant. In contrast, for us the maximal invariant emerges in $M({\bf X})$ purely as an artifact of the need for dimensional consistency as expressed through the application of the invariance principle.
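The invariance of $M({\bf X})$ in Equation (B.2) under the group can also be confirmed numerically. The sketch below is our illustration (the numerical values are arbitrary): it applies $g_{\bf c}$ for many random positive ${\bf c}$ and checks that $\pi_{\mu}$ and $\pi_{N}$ are unchanged.

```python
import random

def g(c1, c2, c3, V, rho, D, mu, N):
    """The group action g_c of Example 15 on (V, rho, D, mu, N)."""
    return (c1 / c3 * V, c2 / c1**3 * rho, c1 * D,
            c2 / (c1 * c3) * mu, c1 * c2 / c3**2 * N)

def pis(V, rho, D, mu, N):
    """The dimensionless coordinates (pi_mu, pi_N) of the maximal invariant."""
    return mu / (rho * V * D), N / (rho * V**2 * D**2)

random.seed(1)
x = (2.0, 1.2, 0.5, 1.8e-3, 4.0)          # arbitrary (V, rho, D, mu, N)
for _ in range(100):
    c = [random.uniform(0.1, 10) for _ in range(3)]
    before, after = pis(*x), pis(*g(*c, *x))
    # Each pi coordinate is unchanged up to floating-point roundoff.
    assert all(abs(a - b) / abs(b) < 1e-9 for a, b in zip(after, before))
```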

Observe that all points in ${\cal X}$ obtain from the cross section in Equation (B.2) by application of the appropriate element of the group of transformations. To see this, let us first choose $c_{1}=D^{-1}$. Then we have

g_{\bf c}(\textbf{x})=\bigg(\frac{1}{Dc_{3}}V,c_{2}D^{3}\rho,1,\frac{c_{2}D}{c_{3}}\mu,\frac{c_{2}}{Dc_{3}^{2}}N\bigg).

Next let $c_{2}=(D^{3}\rho)^{-1}$ and get

g_{\bf c}(\textbf{x})=\bigg(\frac{1}{Dc_{3}}V,1,1,\frac{1}{D^{2}\rho c_{3}}\mu,\frac{1}{D^{4}\rho c_{3}^{2}}N\bigg).

Finally choose $c_{3}=VD^{-1}$, which yields

g_{\bf c}(\textbf{x})=(1,1,1,\bm{\pi}_{\bf x})=M(\textbf{x}).

Inverting this transformation takes us from the cross section to the point x.

The sampling distribution

The analysis above naturally suggests the transformation group $\bar{G}$ and its cross section for the parameter space

\Lambda=\{(\lambda_{V},\lambda_{\rho},\lambda_{D},\lambda_{\mu},\lambda_{N}):\ \lambda_{i}>0,\ i=V,\dots,N\},

namely

M(\lambda)=(1,1,1,\bm{\pi}_{\lambda})

where $\pi_{\lambda_{\mu}}=\lambda_{\mu}/(\lambda_{\rho}\lambda_{D}\lambda_{V})$, $\pi_{\lambda_{N}}=\lambda_{N}/(\lambda_{\rho}\lambda_{D}^{2}\lambda_{V}^{2})$ and $\bm{\pi}_{\lambda}=(\pi_{\lambda_{\mu}},\pi_{\lambda_{N}})$ characterizes the maximal invariant over the parameter space. It follows that for any $\lambda\in\Lambda$,

\lambda=\bar{g}_{{\bf c}}^{-1}(M(\lambda))

where

c_{1}=1/\lambda_{D},\quad c_{2}=1/(\lambda_{\rho}\lambda_{D}^{3}),\quad c_{3}=\lambda_{V}/\lambda_{D}.

The statistical invariance implies that

F(x|\lambda)=P(X\leq x|\lambda)=P(g_{c}(X)\leq x|\bar{g}_{c}(\lambda)) \qquad (B.3)

for any $c_{i}>0$, $i=1,2,3$. Notice that

\begin{aligned}
P(g_{c}(X)\leq x|\bar{g}_{c}(\lambda)) &= P(X\leq g_{c}^{-1}(x)|\bar{g}_{c}(\lambda))\\
&= F(g_{c}^{-1}(x)|\bar{g}_{c}(\lambda)).
\end{aligned}

Now by taking the partial derivatives with respect to the variables, we obtain

f(x|\lambda)=f(g_{c}^{-1}(x)|\bar{g}_{c}(\lambda))\,\frac{c_{3}}{c_{1}}\,\frac{c_{1}^{3}}{c_{2}}\,\frac{1}{c_{1}}\,\frac{c_{1}c_{3}}{c_{2}}\,\frac{c_{3}^{2}}{c_{1}c_{2}}.

Since this must hold for any $c_{i}>0$, we may choose $c_{1}=\lambda_{D}$, $c_{2}=\lambda_{D}^{3}\lambda_{\rho}$, $c_{3}=\lambda_{D}/\lambda_{V}$. Then

\begin{aligned}
f(g_{c}^{-1}(x)|\bar{g}_{c}^{-1}(\lambda)) &= f\bigg(\frac{V}{\lambda_{V}},\frac{\rho}{\lambda_{\rho}},\frac{D}{\lambda_{D}},\frac{\mu}{\lambda_{\rho}\lambda_{D}\lambda_{V}},\frac{N}{\lambda_{\rho}\lambda_{D}^{2}\lambda_{V}^{2}}\bigg|\bm{\pi}_{\lambda}\bigg)\\
&= f\bigg(\frac{V}{\lambda_{V}},\frac{\rho}{\lambda_{\rho}},\frac{D}{\lambda_{D}},\frac{\lambda_{\mu}}{\lambda_{\rho}\lambda_{D}\lambda_{V}}\frac{\mu}{\lambda_{\mu}},\frac{\lambda_{N}}{\lambda_{\rho}\lambda_{D}^{2}\lambda_{V}^{2}}\frac{N}{\lambda_{N}}\bigg|\bm{\pi}_{\lambda}\bigg).
\end{aligned}

Thus the joint PDF is proportional to

f\bigg(\frac{V}{\lambda_{V}},\frac{\rho}{\lambda_{\rho}},\frac{D}{\lambda_{D}},\frac{\pi_{\lambda_{\mu}}\mu}{\lambda_{\mu}},\frac{\pi_{\lambda_{N}}N}{\lambda_{N}}\bigg|\bm{\pi}_{\lambda}\bigg). \qquad (B.4)

Hence the statistical invariance implies that information about the variables can be summarized by maximal invariants in the sample space and in the parameter space.

The sample

Now suppose $n$ independent experiments are performed and that they yield data $\textbf{x}_{1},\dots,\textbf{x}_{n}$. Further suppose, for this illustrative example, that for the model in Equation (B.3) and the resulting likelihood derived from Equation (B.4), the sufficiency principle implies that ${\bf S}=\Sigma_{j}{\bf x}_{j}=(S_{V},S_{\rho},S_{D},S_{\mu},S_{N})$ is a sufficient statistic. Then a maximal invariant for the transformation group is

M(V,\rho,D,\mu,N)=\bigg(\frac{V}{S_{V}},\frac{\rho}{S_{\rho}},\frac{D}{S_{D}},\frac{\mu}{S_{\rho}S_{V}S_{D}},\frac{N}{S_{\rho}S_{V}^{2}S_{D}^{2}}\bigg).

To see this, observe that each term is dimensionless, so $M$ is certainly invariant. Now suppose $M(V,\rho,D,\mu,N)=M(V^{*},\rho^{*},D^{*},\mu^{*},N^{*})$. Then we need to show that there exist $\{c_{i}^{*}\}$ such that $(V^{*},\rho^{*},D^{*},\mu^{*},N^{*})=g_{c_{1}^{*},c_{2}^{*},c_{3}^{*}}(V,\rho,D,\mu,N)$. These do exist, and they are

c_{1}^{*}=\frac{S_{D^{*}}}{S_{D}},\quad c_{2}^{*}=\frac{S_{\rho^{*}}S_{D^{*}}^{3}}{S_{\rho}S_{D}^{3}},\quad c_{3}^{*}=\frac{S_{D^{*}}/S_{V^{*}}}{S_{D}/S_{V}}.

We conclude our discussion of this example. Proceeding further would entail the specification of the sampling distribution and that in turn would depend on contextual details.

Appendix C Invariance models for interval-scales

This section develops the theory for the interval case, which parallels that for ratio-scales seen in Subsection 6.3.

The sample space

We first partition the response vector $\textbf{x}$ as in Equation (6.4). These partitions correspond to the primary and secondary quantities as in the Buckingham theory, although that distinction was not, as far as we know, made in the Luce work and its successors. Of particular interest again is $X_{p}$ in the model of Equation (5.4). The first step in our construction entails a choice of the transformation group $G^{*}$. That choice will depend on the dimensions involved. However, given that we are assuming in this subsection that quantities lie in an affine space, we will in the sequel rely on Paganoni (1987), as described in Subsection 5.3, for illustration.

We begin with a setup more general than that of Paganoni (1987), one that would include, for example, the discrete seven-point Semantic Differential Scale. So we extend Equation (5.3) as follows:

g_{1}(\textbf{x}_{1})=\textbf{R}_{1}\textbf{x}_{1}+\textbf{P}_{1} \qquad (C.1)
g_{2}(\textbf{x}_{2})=\textbf{R}_{2}\textbf{x}_{2}+\textbf{P}_{2}, \qquad (C.2)

where now $x_{p}$ is the final coordinate of $\textbf{x}_{2}$ when $p-k+1>1$. Note that in the univariate version of the model proposed by Paganoni (1987), Equation 5.3 has the vector ${\bf x}_{2}$ replaced with $x_{p}$. Here both the rescaling matrix $\textbf{R}_{2}$ and the translation vector $\textbf{P}_{2}$ depend on the pair $\textbf{R}_{1}$ and $\textbf{P}_{1}$, i.e. $\textbf{R}_{2}=R(\textbf{R}_{1},\textbf{P}_{1})$ and $\textbf{P}_{2}=P(\textbf{R}_{1},\textbf{P}_{1})$. Note that ratio-scales are formally incorporated in this extension simply by setting the relevant coordinates of $\textbf{P}_{1}$ and $\textbf{P}_{2}$ to $0$.

Conditions are needed to ensure that $G^{*}$ is a transformation group. For definiteness we choose $\textbf{R}_{2}=M(\textbf{R}_{1})$ and $\textbf{P}_{2}=\psi(\textbf{R}_{1})+A(\textbf{P}_{1})$, where in general $M(\textbf{S}\textbf{R})=M(\textbf{S})M(\textbf{R})$, $\psi(\textbf{S}\textbf{R})=M(\textbf{S})\psi(\textbf{R})+\psi(\textbf{S})$, and $A$ is additive with $M(\textbf{S})A(\textbf{P})=A(\textbf{S}\textbf{P})$. The objects $\textbf{R}_{1}$ and $\textbf{P}_{1}$ lie in the subspaces described in Subsection 5.3, while $\textbf{R}_{2}$ and $\textbf{P}_{2}$ lie in multidimensional rather than one-dimensional spaces as before. We omit details for brevity.

Finally, we index the transformation group $G_{0}$ acting on $\textbf{x}$ by $(\textbf{R}_{1},\textbf{P}_{1})$ and define the associated transformation by

g_{(\textbf{R}_{1},\textbf{P}_{1})}(\textbf{x})=[g_{1}(\textbf{x}_{1}),g_{2}(\textbf{x}_{2})]. \qquad (C.3)

It remains to show that in this case $G_{0}$ is a transformation group, and for this we need the conditions presented by Paganoni (1987).

Theorem 3.

The set $G_{0}$ of transformations defined by Equation (C.3) is a transformation group acting on the sample space.

Proof. First we show that $G_{0}$ possesses an identity transformation. This is found simply by taking $\textbf{R}_{1}=\textbf{I}_{k}$ and $\textbf{P}_{1}=\textbf{0}_{k}$ and invoking the definitions of $M$, $\psi$ and $A$:

\begin{aligned}
g_{1}(\mathbf{x}_{1}) &= \textbf{R}_{1}\mathbf{x}_{1}+\textbf{P}_{1}=\mathbf{I}_{k}\mathbf{x}_{1}+\mathbf{0}_{k}=\mathbf{x}_{1}.\\
g_{2}(\mathbf{x}_{2}) &= \textbf{R}_{2}\mathbf{x}_{2}+\textbf{P}_{2}\\
&= M(\textbf{R}_{1})\mathbf{x}_{2}+\psi(\textbf{R}_{1})+A(\textbf{P}_{1})\\
&= \mathbf{x}_{2}+0+0\\
&= \mathbf{x}_{2}.
\end{aligned}

Next we show that the composition of two transformations indexed by $(\textbf{S}_{1},\textbf{Q}_{1})$ and $(\textbf{R}_{1},\textbf{P}_{1})$ yields a transformation in $G_{0}$. First we obtain $g_{(\textbf{R}_{1},\textbf{P}_{1})}(\textbf{x})=(\textbf{x}_{1}^{1},\textbf{x}_{2}^{1})$, where

\textbf{x}_{1}^{1}=\textbf{R}_{1}\textbf{x}_{1}+\textbf{P}_{1},\quad {\rm and} \qquad (C.4)
\textbf{x}_{2}^{1}=\textbf{R}_{2}\textbf{x}_{2}+\textbf{P}_{2}=M(\textbf{R}_{1})\textbf{x}_{2}+\psi(\textbf{R}_{1})+A(\textbf{P}_{1}). \qquad (C.5)

Next we compute

\begin{aligned}
g_{(\textbf{S}_{1},\textbf{Q}_{1})}(\textbf{x}^{1}) &= (\textbf{S}_{1}\textbf{x}_{1}^{1}+\textbf{Q}_{1},\ \textbf{S}_{2}\textbf{x}_{2}^{1}+\textbf{Q}_{2})\\
&= (\textbf{S}_{1}\textbf{x}_{1}^{1}+\textbf{Q}_{1},\ M(\textbf{S}_{1})\textbf{x}_{2}^{1}+\psi(\textbf{S}_{1})+A(\textbf{Q}_{1})).
\end{aligned}

But

\begin{aligned}
M(\textbf{S}_{1})\textbf{x}_{2}^{1}+\psi(\textbf{S}_{1})+A(\textbf{Q}_{1}) &= M(\textbf{S}_{1}\textbf{R}_{1})\textbf{x}_{2}+M(\textbf{S}_{1})\psi(\textbf{R}_{1})+M(\textbf{S}_{1})A(\textbf{P}_{1})+\psi(\textbf{S}_{1})+A(\textbf{Q}_{1})\\
&= M(\textbf{S}_{1}\textbf{R}_{1})\textbf{x}_{2}+\psi(\textbf{S}_{1}\textbf{R}_{1})+A(\textbf{S}_{1}\textbf{P}_{1})+A(\textbf{Q}_{1})\\
&= M(\textbf{S}_{1}\textbf{R}_{1})\textbf{x}_{2}+\psi(\textbf{S}_{1}\textbf{R}_{1})+A(\textbf{S}_{1}\textbf{P}_{1}+\textbf{Q}_{1}),
\end{aligned}

which proves that the composition is an element of G0G_{0}.

Finally we need to show that for any member of $G_{0}$ indexed by $(\textbf{R}_{1},\textbf{P}_{1})$ there exists an inverse. Starting with the transformed quantities in Equations (C.4) and (C.5), let $(\textbf{S}_{1},\textbf{Q}_{1})=(\textbf{R}_{1}^{-1},-\textbf{R}_{1}^{-1}\textbf{P}_{1})$. Then we find that

\begin{aligned}
g_{(\textbf{S}_{1},\textbf{Q}_{1})}(\textbf{R}_{1}\textbf{x}_{1}+\textbf{P}_{1},\textbf{R}_{2}\textbf{x}_{2}+\textbf{P}_{2}) &= (\textbf{S}_{1}(\textbf{R}_{1}\mathbf{x}_{1}+\textbf{P}_{1})+\textbf{Q}_{1},\ \textbf{S}_{2}(\textbf{R}_{2}\mathbf{x}_{2}+\textbf{P}_{2})+\textbf{Q}_{2})\\
&= (\textbf{S}_{1}\textbf{R}_{1}\mathbf{x}_{1}+\textbf{S}_{1}\textbf{P}_{1}+\textbf{Q}_{1},\ \textbf{S}_{2}\textbf{R}_{2}\mathbf{x}_{2}+\textbf{S}_{2}\textbf{P}_{2}+\textbf{Q}_{2}).
\end{aligned}

But

\begin{aligned}
\textbf{S}_{1}\textbf{R}_{1}\mathbf{x}_{1}+\textbf{S}_{1}\textbf{P}_{1}+\textbf{Q}_{1} &= \textbf{R}_{1}^{-1}\textbf{R}_{1}\mathbf{x}_{1}+\textbf{R}_{1}^{-1}\textbf{P}_{1}+(-\textbf{R}_{1}^{-1}\textbf{P}_{1})\\
&= \mathbf{x}_{1}+0=\mathbf{x}_{1},
\end{aligned}

and

\begin{aligned}
\textbf{S}_{2}\textbf{R}_{2}\mathbf{x}_{2}+\textbf{S}_{2}\textbf{P}_{2}+\textbf{Q}_{2} &= M(\textbf{S}_{1})M(\textbf{R}_{1})\mathbf{x}_{2}+M(\textbf{S}_{1})(\psi(\textbf{R}_{1})+A(\textbf{P}_{1}))+(\psi(\textbf{S}_{1})+A(\textbf{Q}_{1}))\\
&= M(\textbf{R}_{1}^{-1})M(\textbf{R}_{1})\mathbf{x}_{2}+M(\textbf{R}_{1}^{-1})\psi(\textbf{R}_{1})+M(\textbf{R}_{1}^{-1})A(\textbf{P}_{1})+\psi(\textbf{R}_{1}^{-1})+A(-\textbf{R}_{1}^{-1}\textbf{P}_{1})\\
&= M(\textbf{R}_{1}^{-1}\textbf{R}_{1})\mathbf{x}_{2}+\psi(\textbf{R}_{1}^{-1}\textbf{R}_{1})+A(\textbf{R}_{1}^{-1}\textbf{P}_{1})+A(-\textbf{R}_{1}^{-1}\textbf{P}_{1})\\
&= M(\mathbf{I}_{k})\mathbf{x}_{2}+\psi(\mathbf{I}_{k})+A(\textbf{R}_{1}^{-1}\textbf{P}_{1}-\textbf{R}_{1}^{-1}\textbf{P}_{1})\\
&= \mathbf{x}_{2}+\mathbf{0}_{p-k+1}+A(\mathbf{0}_{k})\\
&= \mathbf{x}_{2}+\mathbf{0}_{p-k+1}+\mathbf{0}_{p-k+1}\\
&= \mathbf{x}_{2}.
\end{aligned}

Thus the transformation indexed by $(\textbf{R}_{1}^{-1},-\textbf{R}_{1}^{-1}\textbf{P}_{1})$ is the inverse of that indexed by $(\textbf{R}_{1},\textbf{P}_{1})$. That concludes the proof that $G_{0}$ is a transformation group.
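A concrete scalar instance of Theorem 3 can be exercised numerically. In the sketch below (our illustration, not drawn from Paganoni (1987)) we take $k=1$ and choose $M(r)=r$, $\psi(r)=c(r-1)$ and $A(P)=bP$ for constants $b$, $c$; these choices satisfy $M(sr)=M(s)M(r)$, $\psi(sr)=M(s)\psi(r)+\psi(s)$ and $A(sP)=M(s)A(P)$, and we verify that composing a transformation with the proposed inverse $(r^{-1},-r^{-1}P)$ returns the original point.

```python
# A scalar (k = 1) instance of the group G0: g_{(r,P)}(x1, x2) =
# (r*x1 + P, M(r)*x2 + psi(r) + A(P)) with M(r) = r, psi(r) = C*(r - 1)
# and A(P) = B*P. These particular M, psi, A are our illustrative choices.
B, C = 0.7, -1.3          # the constants b and c above (arbitrary)

def M(r): return r
def psi(r): return C * (r - 1)
def A(p): return B * p

def g(r, P, x1, x2):
    """Apply g_{(r,P)} to the point (x1, x2)."""
    return r * x1 + P, M(r) * x2 + psi(r) + A(P)

# Compose a transformation with the claimed inverse (1/r, -P/r):
r, P, x1, x2 = 2.5, 0.8, 3.0, -4.2
y1, y2 = g(r, P, x1, x2)
z1, z2 = g(1 / r, -P / r, y1, y2)
assert abs(z1 - x1) < 1e-12 and abs(z2 - x2) < 1e-12
```

The same check with the roles of the two transformations swapped confirms that the inverse works on either side.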

We now proceed, as outlined in Subsection 6.3, to find the analogues of the Pi function in Buckingham’s theory, which in our extension of that theory are coordinates of the maximal invariant under the transformation group $G_{0}$. To that end we seek the transformation for which $g_{1\textbf{x}_{1}}(\textbf{x}_{1})=\textbf{P}_{10}$, i.e. $\textbf{x}_{1}=g_{1\textbf{x}_{1}}^{-1}(\textbf{P}_{10})=\textbf{S}_{1\textbf{x}_{1}}\textbf{P}_{10}+\textbf{Q}_{1\textbf{x}_{1}}$ for an appropriate $\textbf{S}_{1\textbf{x}_{1}}$ and $\textbf{Q}_{1\textbf{x}_{1}}$, where $\textbf{S}_{1g_{(R,P)}(\textbf{x}_{1})}=R\textbf{S}_{1\textbf{x}_{1}}$ and $\textbf{Q}_{1g_{(R,P)}(\textbf{x}_{1})}=R\textbf{Q}_{1\textbf{x}_{1}}+P$. It follows that $\textbf{P}_{10}=\textbf{S}_{1\textbf{x}_{1}}^{-1}(\textbf{x}_{1}-\textbf{Q}_{1\textbf{x}_{1}})$ for a designated fixed origin $\textbf{P}_{10}$ in the range of $\textbf{X}_{1}$. Dimensional consistency calls for the transformation of $\textbf{x}_{2}$ by the $g_{2}$ that complements the $g_{1}$ just found, the one indexed by $(\textbf{S}_{1\textbf{x}_{1}}^{-1},-\textbf{S}_{1\textbf{x}_{1}}^{-1}\textbf{Q}_{1\textbf{x}_{1}})$. If we invoke the invariance principle, we may thus transform $\textbf{x}=(\textbf{x}_{1},\textbf{x}_{2})$ to

(𝝅1x,𝝅2x),(\bm{\pi}_{1x},\bm{\pi}_{2x}),

where $\bm{\pi}_{1x}=\textbf{P}_{10}$ and $\bm{\pi}_{2x}=M(\textbf{S}_{1\textbf{x}_{1}}^{-1})\textbf{x}_{2}+\psi(\textbf{S}_{1\textbf{x}_{1}}^{-1})$ is the maximal invariant. Certainly it is invariant. Now we need to show there exists $(S^{*},Q^{*})$ such that $\textbf{x}=g_{(S^{*},Q^{*})}(\textbf{y})$ when $(\bm{\pi}_{1x},\bm{\pi}_{2x})=(\bm{\pi}_{1y},\bm{\pi}_{2y})$. So suppose $(\bm{\pi}_{1x},\bm{\pi}_{2x})=(\bm{\pi}_{1y},\bm{\pi}_{2y})$. We claim that $\textbf{x}_{1}=g_{(S^{*},Q^{*})}\textbf{y}_{1}$, and hence $\textbf{x}_{2}=g_{(M(S^{*}),\psi(S^{*}))}\textbf{y}_{2}$, where $S^{*}=S_{1\textbf{x}_{1}}S_{1\textbf{y}_{1}}^{-1}$ and $Q^{*}=Q_{1\textbf{x}_{1}}-S_{1\textbf{x}_{1}}S_{1\textbf{y}_{1}}^{-1}Q_{1\textbf{y}_{1}}$.

Proof. Assume below that $M^{-1}(X)=M(X^{-1})$.

\begin{aligned}
\pi_{1x}=\pi_{1y} &\iff S_{1\textbf{x}_{1}}^{-1}(\textbf{x}_{1}-Q_{1\textbf{x}_{1}})=S_{1\textbf{y}_{1}}^{-1}(\textbf{y}_{1}-Q_{1\textbf{y}_{1}})\\
&\iff \textbf{x}_{1}=S_{1\textbf{x}_{1}}S_{1\textbf{y}_{1}}^{-1}(\textbf{y}_{1}-Q_{1\textbf{y}_{1}})+Q_{1\textbf{x}_{1}}\\
&\qquad\quad=S_{1\textbf{x}_{1}}S_{1\textbf{y}_{1}}^{-1}\textbf{y}_{1}-S_{1\textbf{x}_{1}}S_{1\textbf{y}_{1}}^{-1}Q_{1\textbf{y}_{1}}+Q_{1\textbf{x}_{1}}\\
&\qquad\quad=S^{*}\textbf{y}_{1}+Q^{*}\\
&\qquad\quad=g_{(S^{*},Q^{*})}\textbf{y}_{1}.\\
\pi_{2x}=\pi_{2y} &\iff M(S_{1\textbf{x}_{1}}^{-1})\textbf{x}_{2}+\psi(S_{1\textbf{x}_{1}}^{-1})=M(S_{1\textbf{y}_{1}}^{-1})\textbf{y}_{2}+\psi(S_{1\textbf{y}_{1}}^{-1})\\
&\iff \textbf{x}_{2}=M^{-1}(S_{1\textbf{x}_{1}}^{-1})M(S_{1\textbf{y}_{1}}^{-1})\textbf{y}_{2}+M^{-1}(S_{1\textbf{x}_{1}}^{-1})(\psi(S_{1\textbf{y}_{1}}^{-1})-\psi(S_{1\textbf{x}_{1}}^{-1}))\\
&\qquad\quad=M(S_{1\textbf{x}_{1}})M(S_{1\textbf{y}_{1}}^{-1})\textbf{y}_{2}+M(S_{1\textbf{x}_{1}})(\psi(S_{1\textbf{y}_{1}}^{-1})-\psi(S_{1\textbf{x}_{1}}^{-1}))\\
&\qquad\quad=M(S_{1\textbf{x}_{1}}S_{1\textbf{y}_{1}}^{-1})\textbf{y}_{2}+M(S_{1\textbf{x}_{1}})\psi(S_{1\textbf{y}_{1}}^{-1})-M(S_{1\textbf{x}_{1}})\psi(S_{1\textbf{x}_{1}}^{-1})\\
&\qquad\quad=M(S_{1\textbf{x}_{1}}S_{1\textbf{y}_{1}}^{-1})\textbf{y}_{2}+M(S_{1\textbf{x}_{1}})\psi(S_{1\textbf{y}_{1}}^{-1})-(\psi(S_{1\textbf{x}_{1}}S_{1\textbf{x}_{1}}^{-1})-\psi(S_{1\textbf{x}_{1}}))\\
&\qquad\quad=M(S_{1\textbf{x}_{1}}S_{1\textbf{y}_{1}}^{-1})\textbf{y}_{2}+M(S_{1\textbf{x}_{1}})\psi(S_{1\textbf{y}_{1}}^{-1})-(\psi(I)-\psi(S_{1\textbf{x}_{1}}))\\
&\qquad\quad=M(S_{1\textbf{x}_{1}}S_{1\textbf{y}_{1}}^{-1})\textbf{y}_{2}+M(S_{1\textbf{x}_{1}})\psi(S_{1\textbf{y}_{1}}^{-1})-0+\psi(S_{1\textbf{x}_{1}})\\
&\qquad\quad=M(S_{1\textbf{x}_{1}}S_{1\textbf{y}_{1}}^{-1})\textbf{y}_{2}+\psi(S_{1\textbf{x}_{1}}S_{1\textbf{y}_{1}}^{-1})\\
&\qquad\quad=g_{(M(S^{*}),\psi(S^{*}))}\textbf{y}_{2}.
\end{aligned}

Thus the proof is complete.
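As a concrete check of the argument above, the following minimal numerical sketch (our own illustration, not part of the paper's development) verifies both invariance and maximality in the one-dimensional case, taking $S_{1\textbf{x}_{1}}$ to be the sample standard deviation and $Q_{1\textbf{x}_{1}}$ the sample mean:

```python
import numpy as np

# Illustrative 1-D choice: S_{1x} = sd(x), Q_{1x} = mean(x), so that
# pi(x) = S_{1x}^{-1}(x - Q_{1x}) standardizes the sample.
def pi(x):
    return (x - x.mean()) / x.std()

rng = np.random.default_rng(0)
x = rng.normal(size=20)

# Invariance: applying any g_{(S,Q)}(x) = S x + Q with S > 0 leaves pi fixed.
for S, Q in [(2.5, -1.0), (0.3, 7.0)]:
    assert np.allclose(pi(S * x + Q), pi(x))

# Maximality: if pi(x) = pi(y), then x = g_{(S*,Q*)}(y) with
# S* = sd(x)/sd(y) and Q* = mean(x) - S* mean(y), as in the proof.
y = 4.0 * x - 3.0  # so pi(y) = pi(x)
S_star = x.std() / y.std()
Q_star = x.mean() - S_star * y.mean()
assert np.allclose(x, S_star * y + Q_star)
```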

Example 16.

The linear regression model is one of the most famous models in statistics: $y^{1\times 1}=\beta\textbf{x}^{(p-1)\times 1}$. Shen (2015) shows, using dimensional analysis, that this model is inappropriate when all the variables are on a ratio scale; in that case the right-hand side should instead be a product of powers of Pi-functions of the coordinates of $\textbf{x}$. This section shows how to handle the case where the variables are regarded as interval-scaled. The Pi-functions would then be combinations of the coordinates of $\textbf{x}$, depending on the units of measurement of $y$ and those of $\textbf{x}$.

To be more specific, we begin by defining, for every $\textbf{x}\in{\cal X}$, a $g_{\textbf{x}}\in G_{0}$ such that $g_{\textbf{x}}(\textbf{x})=(g_{1\textbf{x}}(\textbf{x}_{1}),g_{2\textbf{x}}(\textbf{x}_{2}))=(\bm{\pi}_{\textbf{x}1}^{1\times k},\bm{\pi}_{\textbf{x}2}^{1\times(p-k+1)})$, where $[\bm{\pi}_{\textbf{x}1}]=[\textbf{1}_{k}]$, $[\bm{\pi}_{\textbf{x}2}]=[\textbf{1}_{(p-k+1)}]$, and in general $\textbf{1}_{r}$ denotes the vector of dimension $r$, all of whose elements are $1$, representing generically the unit on the coordinate's scale. For the regression example the final coordinate in $\textbf{x}_{2}$ is $x_{p}=Y$. It then follows from the above analysis, in the notation used there, that

\[
\bm{\pi}_{1x}=\textbf{P}_{10}
\]
and
\[
\bm{\pi}_{2x}=M(\textbf{S}_{1\textbf{x}_{1}}^{-1})\textbf{x}_{2}+\psi(\textbf{S}_{1\textbf{x}_{1}}^{-1})
\]
is the non-dimensionalized maximal invariant. The distribution of $\bm{\pi}_{2X}$ then determines the non-dimensionalized regression model. We omit the details for brevity.
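For intuition, here is a minimal numerical sketch (our own illustration; the helper names `standardize` and `slope` are ours, not the paper's) of why a regression built from interval-scale Pi-functions cannot depend on the units of measurement chosen:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x = rng.normal(20.0, 5.0, size=n)            # e.g. a temperature in Celsius
y = 1.5 * x + rng.normal(0.0, 1.0, size=n)   # interval-scale response

def standardize(v):
    # Non-dimensionalize an interval-scale variable with its sample
    # location and scale, mirroring the construction of the Pi-functions.
    return (v - v.mean()) / v.std()

def slope(u, v):
    # OLS slope of standardized v on standardized u (a dimensionless number).
    return np.polyfit(standardize(u), standardize(v), 1)[0]

b_C = slope(x, y)
# Changing units, e.g. Celsius -> Fahrenheit, is an affine (interval-scale)
# transformation; the non-dimensionalized model is unaffected.
b_F = slope(9.0 / 5.0 * x + 32.0, y)
assert np.allclose(b_C, b_F)
```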

Appendix D Foundations of statistical modelling

After a sample ${\bf X}^{p\times n}={\bf x}$ is selected, a statistical inquiry is expected to lead to a decision $d({\bf x})=a$ chosen from an action space ${\cal A}$. That action may be chosen by a randomized decision rule, i.e. a probability distribution $\delta(D;{\bf x})$ for events $D\subset{\cal A}$. A nonrandomized rule $d({\bf x})$ would then correspond to the degenerate probability distribution with $\delta(\{d({\bf x})\};{\bf x})=1$.

The decision would be based on the loss function $L(a,\lambda)$, or rather on the expected loss, called the risk function

\[
r(\delta,\lambda)=\int L(a,\lambda)\,\delta(da;{\bf x})\,P_{\lambda}(d{\bf x}).
\]

The minimax criterion is commonly used to determine the optimal decision rule as

\[
\delta_{\mathrm{minimax}}=\mathop{\mathrm{argmin}}_{\delta}\max_{\lambda}r(\delta,\lambda).
\]
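The minimax criterion can be illustrated with a small toy problem (our own example, not from the paper): estimating a binomial success probability $\lambda$ under squared-error loss, comparing two nonrandomized rules by their worst-case risk over a grid of $\lambda$ values. The rule $(X+\sqrt{n}/2)/(n+\sqrt{n})$ is known to have constant risk and to be minimax for this loss.

```python
import numpy as np
from math import comb

n = 5
lam_grid = np.linspace(0.01, 0.99, 99)

def risk(d, lam):
    # r(d, lambda) = E_lambda[(d(X) - lambda)^2] for X ~ Binomial(n, lambda).
    return sum(comb(n, k) * lam**k * (1 - lam)**(n - k) * (d(k) - lam)**2
               for k in range(n + 1))

mle = lambda k: k / n
# Constant-risk rule, known to be minimax under squared-error loss.
minimax_rule = lambda k: (k + np.sqrt(n) / 2) / (n + np.sqrt(n))

max_risk = {name: max(risk(d, lam) for lam in lam_grid)
            for name, d in [("mle", mle), ("minimax", minimax_rule)]}
# The minimax rule has smaller worst-case risk than the MLE.
assert max_risk["minimax"] < max_risk["mle"]
```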

An alternative is the Bayes rule where, given a prior distribution Π\Pi for λ\lambda, the Bayesian decision rule is found by minimizing

\[
R(\delta)=\int L(a,\lambda)\,\delta(da;{\bf x})\,P_{\lambda}(d{\bf x})\,\Pi(d\lambda).
\]

That prior distribution will usually be indexed by a hyperparameter vector $\upsilon$ lying in a hyperparameter space $\Upsilon$.
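A toy conjugate example (ours, not the paper's) makes the Bayes rule concrete: with $X\sim\mathrm{Binomial}(n,\lambda)$, a $\mathrm{Beta}(a,b)$ prior indexed by the hyperparameters $\upsilon=(a,b)$, and squared-error loss, the rule minimizing $R(\delta)$ is the posterior mean, which can be checked against the MLE by Monte Carlo:

```python
import numpy as np

def bayes_rule(k, n, a, b):
    # Posterior mean of lambda under the Beta(a, b) prior: the Bayes rule
    # for squared-error loss.
    return (a + k) / (a + b + n)

# Monte Carlo estimate of the Bayes risk R(delta) for each rule:
# draw lambda from the prior, then X given lambda, then average the loss.
rng = np.random.default_rng(2)
n, a, b = 5, 2.0, 2.0
lam = rng.beta(a, b, size=20000)
x = rng.binomial(n, lam)
bayes_loss = np.mean((bayes_rule(x, n, a, b) - lam) ** 2)
mle_loss = np.mean((x / n - lam) ** 2)
assert bayes_loss < mle_loss  # the Bayes rule minimizes the Bayes risk
```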

Acknowledgements

We are indebted to Professor George Bluman of the Department of Mathematics at the University of British Columbia (UBC) for his helpful discussions on dimensional analysis. Our gratitude goes to Professor Douw Steyn of the Earth and Ocean Sciences Department, also at UBC, for introducing the second author to the Unconscious Statistician. We also thank Yongliang (Vincent) Zhai, former Master's student of the last two authors of this paper, for contributing to Example 4 and for his work in his Master's thesis that inspired much of this research. Finally, we acknowledge the key role played by the Forest Products Stochastic Modelling Group centered at UBC and funded by a combination of FPInnovations and the Natural Sciences and Engineering Research Council of Canada through a Collaborative Research and Development Grant. It was the work of that Group that sparked our interest in the problems addressed in this paper. The research of the last two authors was also supported via the Discovery Grant program of the Natural Sciences and Engineering Research Council of Canada.

References

  • Aczél, J., Roberts, F. S. and Rosenbaum, Z. (1986). On scientific laws without dimensional constants. Journal of Mathematical Analysis and Applications 119 389–416.
  • Adragni, K. P. and Cook, R. D. (2009). Sufficient dimension reduction and prediction in regression. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 367 4385–4405.
  • Albrecht, M. C., Nachtsheim, C. J., Albrecht, T. A. and Cook, R. D. (2013). Experimental design for engineering dimensional analysis. Technometrics 55 257–270.
  • Arelis, R. A. G. (2020). How to improve prediction accuracy in the analysis of computer experiments, exploitation of low-order effects and dimensional analysis. PhD thesis, University of British Columbia.
  • Arrow, K. J., Chenery, H. B., Minhas, B. S. and Solow, R. M. (1961). Capital-labor substitution and economic efficiency. The Review of Economics and Statistics 225–250.
  • Baiocchi, G. (2012). On dimensions of ecological economics. Ecological Economics 75 1–9.
  • Basu, D. (1958). On statistics independent of sufficient statistics. Sankhyā: The Indian Journal of Statistics 223–226.
  • Bluman, G. W. and Cole, J. D. (1974). Similarity Methods for Differential Equations. Applied Mathematical Sciences. Springer-Verlag.
  • Box, G. E. P. and Cox, D. R. (1964). An analysis of transformations. Journal of the Royal Statistical Society. Series B (Methodological) 26 211–252.
  • Bridgman, P. W. (1931). Dimensional Analysis, Revised Edition. Yale University Press, New Haven.
  • Buckingham, E. (1914). On physically similar systems; illustrations of the use of dimensional equations. Physical Review 4 345–376.
  • Chilarescu, C. and Viasu, I. (2012). Dimensions and logarithmic function in economics: A comment. Ecological Economics 75 10–11.
  • Cohen, A. J., Anderson, H. R., Ostro, B., Pandey, K. D., Krzyzanowski, M., Künzli, N., Gutschmidt, K., Pope III, C. A., Romieu, I., Samet, J. M. et al. (2004). Urban air pollution. Comparative quantification of health risks: global and regional burden of disease attributable to selected major risk factors 2 1353–1433.
  • De Oliveira, V., Kedem, B. and Short, D. A. (1997). Bayesian prediction of transformed Gaussian random fields. Journal of the American Statistical Association 92 1422–1433.
  • Dou, Y. P., Le, N. D. and Zidek, J. V. (2007). A dynamic linear model for hourly ozone concentrations. Technical Report No. 228, Statistics Department, University of British Columbia.
  • Draper, N. R. and Cox, D. R. (1969). On distributions and their transformation to normality. Journal of the Royal Statistical Society: Series B (Methodological) 31 472–476.
  • Eaton, M. L. (1983). Multivariate Statistics: a Vector Space Approach. John Wiley & Sons, New York.
  • Faraway, J. J. (2015). Linear Models with R: Second Edition. Chapman and Hall/CRC.
  • Finney, D. J. (1977). Dimensions of statistics. Journal of the Royal Statistical Society, Series C (Applied Statistics) 26 285–289.
  • Foschi, R. O. and Yao, F. Z. (1986). Another look at three duration of load models. In Proceedings, XVII IUFRO Congress.
  • Fourier, J. (1822). Théorie Analytique de la Chaleur, par M. Fourier. Chez Firmin Didot, Père et Fils.
  • Friedmann, M., Gillis, J. and Liron, N. (1968). Laminar flow in a pipe at low and moderate Reynolds numbers. Applied Scientific Research 19 426–438.
  • Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A. and Rubin, D. B. (2014). Bayesian Data Analysis, Third Edition. CRC Press.
  • Gibbings, J. C. (2011). Dimensional Analysis. Springer-Verlag, London.
  • Hand, D. J. (1996). Statistics and the theory of measurement. Journal of the Royal Statistical Society. Series A (Statistics in Society) 159 445–492.
  • Härdle, W. K. and Vogt, A. B. (2015). Ladislaus von Bortkiewicz—Statistician, Economist and a European Intellectual. International Statistical Review 83 17–35.
  • Hoffmeyer, P. and Sørensen, J. D. (2007). Duration of load revisited. Wood Science and Technology 41 687–711.
  • Kiefer, J. et al. (1957). Invariance, minimax sequential estimation, and continuous time processes. The Annals of Mathematical Statistics 28 573–601.
  • Köhler, J. and Svensson, S. (2002). Probabilistic modelling of duration of load effects in timber structures. In Proceedings of the 35th Meeting, International Council for Research and Innovation in Building and Construction, Working Commission W18—Timber Structures, CIB-W18, Paper 35-17 1.
  • Kovera, M. B. (2010). Encyclopedia of Research Design.
  • Lehmann, E. L. and Romano, J. P. (2010). Testing Statistical Hypotheses. Springer.
  • LibreTexts (2019). The Ideal Gas Law. https://chem.libretexts.org/Bookshelves/Physical_and_Theoretical_Chemistry_Textbook_Maps/Supplemental_Modules_(Physical_and_Theoretical_Chemistry)/Physical_Properties_of_Matter/States_of_Matter/Properties_of_Gases/Gas_Laws/The_Ideal_Gas_Law. Accessed 01/04/2020.
  • Lin, D. K. J. and Shen, W. (2013). Comment: some statistical concerns on dimensional analysis. Technometrics 55 281–285.
  • Luce, R. D. (1959). On the possible psychophysical laws. Psychological Review 66 81.
  • Luce, R. D. (1964). A generalization of a theorem of dimensional analysis. Journal of Mathematical Psychology 1 278–284.
  • Magnello, M. E. (2009). Karl Pearson and the establishment of mathematical statistics. International Statistical Review 77 3–29.
  • Matta, C. F., Massa, L., Gubskaya, A. V. and Knoll, E. (2010). Can one take the logarithm or the sine of a dimensioned quantity or a unit? Dimensional analysis involving transcendental functions. Journal of Chemical Education 88 67–70.
  • Mayumi, K. and Giampietro, M. (2010). Dimensions and logarithmic function in economics: A short critical analysis. Ecological Economics 69 1604–1609.
  • Mayumi, K. and Giampietro, M. (2012). Response to dimensions and logarithmic function in economics: A comment. Ecological Economics 75 12–14.
  • Meinsma, G. (2019). Dimensional and scaling analysis. SIAM Review 61 159–184.
  • Mills, I. M. (1995). Dimensions of logarithmic quantities. Journal of Chemical Education 72 954.
  • Molyneux, P. (1991). The dimensions of logarithmic quantities: implications for the hidden concentration and pressure units in pH values, acidity constants, standard thermodynamic functions, and standard electrode potentials. Journal of Chemical Education 68 467.
  • Mosteller, F. and Tukey, J. W. (1977). Data Analysis and Regression: A Second Course in Statistics. Addison-Wesley Series in Behavioral Science: Quantitative Methods.
  • Joint Committee on Guides in Metrology (2012). 200:2012 — International Vocabulary of Metrology — Basic and General Concepts and Associated Terms (VIM).
  • Paganoni, L. (1987). On a functional equation concerning affine transformations. Journal of Mathematical Analysis and Applications 127 475–491.
  • Pigou, A. C., Friedman, M. and Georgescu-Roegen, N. (1936). Marginal utility of money and elasticities of demand. The Quarterly Journal of Economics 50 532–539.
  • Shen, W. (2015). Dimensional analysis in statistics: theories, methodologies and applications. PhD thesis, The Pennsylvania State University.
  • Shen, W. and Lin, D. K. J. (2018). A conjugate model for dimensional analysis. Technometrics 60 79–89.
  • Shen, W. and Lin, D. K. J. (2019). Statistical theories for dimensional analysis. Statistica Sinica 29 527–550.
  • Shen, W., Davis, T., Lin, D. K. J. and Nachtsheim, C. J. (2014). Dimensional analysis and its applications in statistics. Journal of Quality Technology 46 185–198.
  • Stevens, S. S. (1946). On the theory of scales of measurement. Science 103 677–680.
  • Stevens, S. S. (1951). Mathematics, measurement, and psychophysics. Handbook of Experimental Psychology 1–49.
  • Taylor, B. N. (2018). Quantity calculus, fundamental constants, and SI units. Journal of Research of the National Institute of Standards and Technology 123 123008.
  • Velleman, P. F. and Wilkinson, L. (1993). Nominal, ordinal, interval, and ratio typologies are misleading. The American Statistician 47 65–72.
  • Vignaux, V. A. and Scott, J. L. (1999). Theory & methods: Simplifying regression models using dimensional analysis. Australian & New Zealand Journal of Statistics 41 31–41.
  • Ward, L. M. (2017). S. S. Stevens's invariant legacy: scale types and the power law. American Journal of Psychology 130 401–412.
  • Wijsman, R. A. (1967). Cross-sections of orbits and their application to densities of maximal invariants. In Proc. Fifth Berkeley Symp. on Math. Stat. and Prob. 1 389–400.
  • Wikipedia (2020). Transcendental function. https://en.wikipedia.org/wiki/Transcendental_function. Accessed 2020/02/24.
  • Wong, S. W. K. and Zidek, J. V. (2018). Dimensional and statistical foundations for accumulated damage models. Wood Science and Technology 52 45–65.
  • Yang, C.-C. and Lin, D. K. J. (2021). A note on selection of basis quantities for dimensional analysis. Quality Engineering 33 240–251.
  • Yang, C.-H., Zidek, J. V. and Wong, S. W. K. (2018). Bayesian analysis of accumulated damage models in lumber reliability. Technometrics.
  • Zhai, Y., Pirvu, C., Heckman, N., Lum, C., Wu, L. and Zidek, J. V. (2012a). A review of dynamic duration of load models for lumber strength. Technical Report No. 270, Department of Statistics, University of British Columbia.
  • Zhai, Y., Heckman, N., Lum, C., Pirvu, C., Wu, L. and Zidek, J. V. (2012b). Stochastic models for the effects of duration of load on lumber properties. Technical Report No. 271, Department of Statistics, University of British Columbia.
  • Zidek, J. V. (1969). A representation of Bayes invariant procedures in terms of Haar measure. Annals of the Institute of Statistical Mathematics 21 291–308.