
Dimensional Analysis in Statistical Modelling

Tae Yoon Lee (dlxodbs@student.ubc.ca), James V. Zidek (jim@stat.ubc.ca, www.stat.ubc.ca/~jim), and Nancy Heckman (nancy@stat.ubc.ca, www.stat.ubc.ca/~nancy)
Department of Statistics, University of British Columbia
2207 Main Mall
Vancouver, BC
Canada V6T 1Z4
(2020)
Abstract

Building on recent work in statistical science, the paper presents a theory for modelling natural phenomena that unifies the physical and statistical paradigms, based on the underlying principle that a model must be non-dimensionalizable. After all, such phenomena cannot depend on how the experimenter chooses to assess them. Yet the model itself must be composed of quantities that can be determined theoretically or empirically. Hence, the underlying principle requires that the model represent these natural processes correctly no matter what scales and units of measurement are selected. This goal was realized for physical modelling through the celebrated theories of Buckingham and Bridgman, and for statistical modellers through the invariance principle of Hunt and Stein. The paper shows how the latter can embrace and extend the former. The invariance principle is extended to encompass the Bayesian paradigm, thereby enabling an assessment of model uncertainty. The paper covers topics not ordinarily seen in statistical science regarding the dimensions, scales, and units of quantities in statistical modelling. It shows the special difficulties that can arise when models involve transcendental functions, such as the logarithm, which is used e.g. in likelihood analysis and is a singularity in the Box–Cox family of transformations. Further, it demonstrates the importance of the scale of measurement, in particular how differently modellers must handle ratio- and interval-scales.

Keywords: Buckingham Pi-theorem, statistical invariance principle, Box–Cox transformation, logarithmic transformation, nondimensionalization, dimensional analysis

MSC2020 subject classifications: 62A01, 00A71, 97F70

The research reported in this paper was supported by grants from the Natural Sciences and Engineering Research Council of Canada.

1 Introduction

The importance of dimension, scale and units of measurement is largely ignored in statistical modelling. In fact, an anonymous reviewer stated:

“Generally speaking, statisticians treat data as already dimensionless by taking the numeric part, which is equivalent to dividing them by their own units…”

Others have long recognized the role of dimensions, and hence of their scales and units of measurement, in modelling; dimensions can be used to simplify model fitting by reducing the number of quantities involved to a few non-dimensionalized ones. A principal contribution of this paper is to make clear to statisticians the importance of dimensions, scales and units in statistical modelling. We also develop a framework that incorporates these important ideas via a statistical invariance approach. This allows us to extend the existing work’s focus on physical quantities, which by their nature must lie on ratio-scales, to other scales and to vector spaces for multivariate responses. Yet another contribution is addressing a number of issues that are crucial in laying the foundation for the extension. These include: adopting a sampling distribution supported on an interval-scale when the actual support for the sampling distribution is a subset of a ratio-scale; and the meaninglessness of applying transcendental transformations such as the logarithm to quantities with units.

This paper, which is partly expository, describes and contributes to the unification of two overlapping paradigms for modelling natural phenomena. For simplicity we will refer to these as statistical and what Meinsma (2019) calls physical. Physical models are usually deterministic and developed for a specific phenomenon.

In this approach, model development cannot ignore the dimensions, scales and units of measurement on which empirical implementation and assessment will be based. Indeed, Buckingham (1914) believed that a valid model cannot depend on how an investigator chooses to measure the quantities that describe the phenomena of interest. After all, nature cannot know what the measuring methods of science are. Consequently, Buckingham suggested that any valid model must be nondimensionalizable, leading him to his celebrated Pi-theorem. In contrast, Bridgman (1931) argued that models must depend on the measurements but must be invariant under rescaling. Based on the latter premise he was able to derive the invariant “π\pi-functions” of the measurements that were central to Buckingham’s theory. The work of these pioneers spurred the development of what is now known as dimensional analysis (DA), its notions of dimensional homogeneity (DH) and its quantity calculus, explored in depth in Section 3.

The following example renders the ideas above in a more concrete form.

Example 1.

Newton’s second law of motion exemplifies the physical approach to modelling:

a = F M^{−1}.   (1.1)

Here a denotes acceleration, the second derivative with respect to time of the location of an object, computed with respect to the starting point of its motion. F and M are, respectively, the force acting on the object and its mass. The model in Equation (1.1) satisfies the fundamental requirement of DH – the units on the left hand side are the same as the units on the right hand side. Moreover, all three of the quantities in the model lie on a ratio-scale i.e. they are inherently positive, having a structural 0 for an origin when and where the motion began.

The work of Buckingham and Bridgman cited above implies the quantities in a valid model have to be transformable to dimensionless alternatives called π-functions. In the case of Newton’s law, we can use M and F to transform a into a dimensionless quantity to get the simpler but mathematically equivalent model involving a single π-function:

π(a, F, M) ≡ a F^{−1} M = 1.   (1.2)
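To make the nondimensionalization concrete, here is a minimal sketch (our illustration, not from the paper) that tracks dimension exponents over (length, mass, time) and checks that the π-function a F^{−1} M is dimensionless:

```python
# Dimensions as (length, mass, time) exponent tuples.
ACCEL = (1, 0, -2)   # a: L T^-2
FORCE = (1, 1, -2)   # F: M L T^-2
MASS  = (0, 1, 0)    # M

def combine(*terms):
    """Sum the exponent tuples of (dimension, power) pairs."""
    return tuple(sum(d[i] * p for d, p in terms) for i in range(3))

# pi(a, F, M) = a * F^-1 * M^1
pi_dim = combine((ACCEL, 1), (FORCE, -1), (MASS, 1))
assert pi_dim == (0, 0, 0)  # all exponents vanish: dimensionless
```

The assertion holds because the dimension exponents of a, F^{−1} and M cancel exactly, which is what makes (1.2) a valid π-function.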

In contrast to physical modelling, which commonly takes a bottom-up approach, that of statistics as a discipline was top-down (Magnello, 2009) when, in the latter part of the nineteenth century, Karl Pearson established mathematical statistics with its focus on abstract classes of statistical models. Pearson’s view freed the statistician from dealing with the demanding contextual complexities of specific applications. In his abstract formulation, Pearson was able to: incorporate model uncertainty expressed probabilistically; define desirable model properties; determine conditions under which these occur; and develop practical tools to implement models that possess those properties for a specific application. The emphasis on mathematical foundations led inevitably to an almost total disregard of the constraints brought on by dimension, scale and units of measurement. Thus statisticians often use a symbol like X to mean a number to be manipulated in a formal analysis in equations, models and transformations. On the other hand, scientists use such a symbol to represent some specific aspect of a natural phenomenon or process. The scientist’s goal is to characterize X through a combination of basic principles and empirical analysis. This leads to the need to specify one or more “dimensions” of X, e.g. length L. That in turn leads to the need to specify an appropriate “scale” for X, e.g. categorical, ordinal, interval or ratio. For interval- and ratio-scales, X would have some associated units of measurement depending on the nature and resolution of the device making the measurement. How all of these parts of X fit together is the subject addressed in the realms of measurement theory and DA.

In recent years, the importance of dimensions, scales, and units of measurement has been progressively recognized in statistics. At a deep theoretical level, Hand (1996) considers different interpretations of measurement, studying what things can be measured and how numbers can be assigned to measurements. On a more practical side, inspired by applied scientists, Finney (1977) demonstrates how the principle of DH can be used to assess the validity of a statistical model. The first application of DA in statistical modelling appears in the work of Vignaux and Scott (1999), who develop a framework for applying DA to linear regression models. The practicality of DA in the design of experiments is illustrated to a great extent by Albrecht et al. (2013). While much has been written in this area by nonstatisticians, such as Luce (1964), surprisingly little has been written by statisticians.

A natural statistical approach to these ideas is via the statistical invariance principle due to Hunt and Stein in unpublished work (Lehmann and Romano, 2010, Chapter 6). Despite the abstraction of Pearson’s approach, they articulated an important principle of modelling: when a test of a hypothesis about a natural phenomenon based on a sample of measurements rejects the null hypothesis, that decision should not change if the data were transformed to a different scale, e.g. from degrees Celsius to degrees Fahrenheit. That led them to the statistical invariance principle: methods and models must transform coherently under measurement scale transformations.

However, the link between DA and the statistical invariance principle does not seem to have been recognized until the work of Shen, Lin, and their co-investigators (Lin and Shen, 2013; Shen et al., 2014; Shen, 2015; Shen and Lin, 2018, 2019). They develop a theoretical framework for applying DA to statistical analysis and the design of experiments while showcasing the practical benefits through numerous examples involving physical quantities. In their framework, Buckingham’s Pi-theorem plays the key role in unifying DA and statistical invariance. In our paper, we extend their work in two ways: (1) elucidating the link between DA and statistical invariance by removing the dependency on Buckingham’s Pi-theorem; and (2) in doing so, freeing the theory from its restriction to physical quantities and ultimately embedding scientific modelling within a stochastic modelling framework in a general setting.

This paper considers issues that arise when X lies on an interval-scale with values on the entire real line and when X lies on a ratio-scale, that is, with non-negative values and a real origin 0 having a meaning of “nothingness”. A good example to keep in mind is the possible scales of temperature; it can be measured on the ratio-scale in units of degrees Kelvin (°K), where 0°K means all molecular momentum is lost. Alternatively, it can be measured on the interval-scale in units of degrees Celsius (°C), where 0°C means water freezes. The same probabilistic model cannot be used for both scales although often, in practice, the Gaussian model for the interval-scale is used as an approximation for the ratio-scale.

We conclude this Introduction with a summary of the paper’s contents and findings. Section 2 introduces us to the Unconscious Statistician through examples that illustrate the importance of units of measurement. That demonstrated importance of units then leads us into Section 3, which is largely a review of the basic elements of DA, a subject taught in the physical sciences but rarely in statistics. We describe a key tool, quantity calculus (the algebra of units of measurement), along with DH.

In Section 4, we discuss problems that might arise when statisticians transform variables. Sometimes the transformation leads to an unintended change of scale, e.g. when a Gaussian distribution on (−∞, ∞) is adopted as an approximation to a distribution on (0, ∞). This can matter a lot when populations with responses on a ratio-scale are being compared. We discuss when such an approximation, and hence transformation, may be justified. Even using the famous family of Box–Cox transformations can cause problems, in particular with its limiting case, the logarithmic transformation.

Having investigated units and scales, we then turn to the models themselves. It turns out that when restricted by the need for DH, the class of models is also restricted; that topic is explored in Section 5 where we review the famous Buckingham Pi-theorem. We also see for the first time the ratio-scale’s cousin, the interval-scale. All this points to the need for a general approach to scientific modelling that was foreseen in Hunt and Stein’s unpublished work on the famous invariance principle.

Section 6 gets us to the major contribution of the paper, namely extending the work of recent vintage by statistical scientists (Shen and his co-investigators) on the invariance principle as developed in its classical setting, the frequentist paradigm, and applied principally to variables on a ratio-scale. Our work constitutes a major extension of that scientific modelling paradigm.

In particular, Section 6.5 extends the invariance principle and moves our new approach to scientific modelling to the Bayesian paradigm. This major extension of both the scientific and statistical modelling approaches allows for quantities that could represent uncertain parameters, thereby embedding uncertainty quantification, including model uncertainty itself, directly into the modelling paradigm.

The paper wraps up with discussion in Section 7 and concluding remarks in Section 8. The supplementary material includes additional discussion, in particular a summary of the controversy of ancient vintage about whether or not taking the logarithm of measurements with units is valid, how Buckingham’s theory leads us to the famous Reynolds number, a general theory for handling data on an interval-scale, and finally a brief review of statistical decision analysis for Section 6.5.

2 The unconscious statistician

We start by critically examining key issues surrounding the topics of dimension and measurement scales through the Unconscious Statistician. We present three examples that illustrate some of the issues we’ll be exploring.

Example 2.

Ignoring scale and units of measurement when creating models can lead to difficulties; we cannot ignore the distinction between numbers and measurements. Consider the Poisson random variable X. The claim is often made that the expected value and variance of X are equal. But if X has units, as it did in the distribution’s famous application in 1898 to the number of horse kick deaths in a year in the Prussian army (Härdle and Vogt, 2015), then clearly the expectation (deaths) and the variance (deaths²) will have different units and therefore cannot be equated.

Example 3.

Consider a random variable Y representing length in millimetres, normally distributed with mean μ and variance σ², and independently measured n times to yield data y_1, …, y_n. Assume, as is common, that μ is so large that there is a negligible chance that any of the y_i’s are negative (we return to this common assumption in Section 4).

Then the maximum likelihood estimate (MLE) of μ is easily shown to be the sample average ȳ, and the MLE of σ² is then the maximizer of the profile likelihood

L(σ²) = (σ²)^{−n/2} exp{−n σ̃²/(2σ²)}   (2.1)

where σ̃² = Σ_{i=1}^n (y_i − ȳ)²/n, which has units of mm². That estimate for σ² is the MLE of σ², which is easily shown by differentiating L(σ²) with respect to σ² and setting the result equal to zero. We note that, by any sensible definition of unit arithmetic, σ̃²/σ² is unitless and so the units of L(σ²) are mm^{−n}.
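As a numerical check on this claim, the following sketch (with illustrative simulated data, not from the paper) maximizes the profile likelihood (2.1) over a grid of numeric values {σ²} and confirms that the maximizer is σ̃²:

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(100.0, 7.0, size=50)        # n = 50 lengths, in mm (illustrative)
n, ybar = y.size, y.mean()
sig2_tilde = np.sum((y - ybar) ** 2) / n   # numeric part; its units are mm^2

def profile_L(sig2):
    """Profile likelihood of Equation (2.1), applied to the numeric part {sigma^2}."""
    return sig2 ** (-n / 2) * np.exp(-n * sig2_tilde / (2 * sig2))

grid = np.linspace(10.0, 200.0, 200001)    # candidate {sigma^2} values
sig2_hat = grid[np.argmax(profile_L(grid))]
assert abs(sig2_hat - sig2_tilde) < 1e-2   # maximizer agrees with sigma-tilde^2
```

Of course the grid search only manipulates the numeric parts; the units mm² must be reattached to σ̂² by hand, which is precisely the bookkeeping the Unconscious Statistician neglects.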

The Unconscious Statistician simplifies the maximization of L by maximizing, instead, its logarithm, believing that this alternative approach yields the same result. The statistician finds the log of L to be

l(σ²) = ln[(σ²)^{−n/2} exp{−n σ̃²/(2σ²)}] = −(n/2)[ln(σ²) + σ̃²/σ²].

Since the second term is unitless, dimensional homogeneity implies that the first term ln(σ²) must also be unitless. So where did the units go? Analyses in Subsection 4.4 suggest the units reduce to a unitless 1 by the constructive processes that define the logarithm. The result is ln(σ²) = ln({σ²}), the curly brackets demarcating the numerical part of σ², obtained by dropping the units of measurement. But σ² itself has units mm², and it seems unsettling to have them disappear simply by taking the logarithm.

However, the Unconscious Statistician ultimately gets the correct answer by failing to recognize the distinction between the scales of {σ²} and σ². So the derivative, which represents the relative rate of change between quantities on different scales, is computed as d ln({σ²})/dσ² = mm^{−2} · d ln({σ²})/d{σ²} rather than d ln({σ²})/d{σ²}. This then restores the missing units in the final result. As a fringe benefit, the second derivative d² ln({σ²})/d(σ²)², which (through its negative expectation) defines Fisher’s information, also turns out to have the appropriate units.

However, the story does not end there. The problem of logarithms and their units warrants further discussion such as that in Subsection 4.4. That discussion indicates that calculating the logarithm of the likelihood is, in general, not sensible.

Remark 1.

In the frequentist paradigm for statistical modelling, the likelihood is defined by the sampling distribution, which depends on the stopping rule employed in collecting the sample. The likelihood function then becomes an equivalence class. The likelihood ratio can then be used to specify a member of that class. In Example 3 a reference normal likelihood could be used with σ² set to a substantively meaningful σ₀². The MLE of μ and σ² would then maximize this relative likelihood. This leads again to μ̂ = ȳ, but now the MLE of σ² is found by maximizing the unitless L(σ²)/L(σ₀²):

L(σ²)/L(σ₀²) = (σ²/σ₀²)^{−n/2} exp{ −(n σ̃²/(2σ₀²)) [(σ₀²/σ²) − 1] }.

We can now maximize this ratio as a function of the unitless t = σ²/σ₀², by taking logarithms, differentiating with respect to t, setting the result equal to 0 and solving for t̂ = σ̃²/σ₀², and so finding that σ̂² = σ̃², in units of mm².
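The same maximization can be checked numerically in terms of the unitless t alone. In this sketch the values σ̃² = 44.2 mm² and σ₀² = 25 mm² are assumed purely for illustration:

```python
import numpy as np

n = 50
sig2_tilde, sig2_0 = 44.2, 25.0   # illustrative values, both in mm^2

def log_ratio(t):
    """Log of the unitless L(sigma^2)/L(sigma_0^2) as a function of t = sigma^2/sigma_0^2."""
    return -(n / 2) * np.log(t) - (n * sig2_tilde / (2 * sig2_0)) * (1 / t - 1)

grid = np.linspace(0.1, 10.0, 1000001)
t_hat = grid[np.argmax(log_ratio(grid))]
assert abs(t_hat - sig2_tilde / sig2_0) < 1e-4   # t-hat = sigma-tilde^2 / sigma_0^2
sig2_hat = t_hat * sig2_0                        # restore units via sigma_0^2, in mm^2
```

Here the logarithm is applied only to the unitless ratio, so no units go missing; they are restored at the end by multiplying t̂ by σ₀².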

Two complementary, unconscious choices in Example 3 lead ultimately to a correct MLE. Things don’t go so well for the two unconscious statisticians seen in the next example.

Example 4.

Here, the data are assumed to follow the model that relates Y_i, a length, to t_i, a time:

Y_i = 1 + θ t_i + ε_i,   i = 1, …, 2n.

Here the ε_i’s are independent and identically distributed as N(0, σ²) for a known σ. Suppose that t_1 = ⋯ = t_n = 1 hour while t_{n+1} = ⋯ = t_{2n} = 2 hours. Let Ȳ_1 = Σ_{i=1}^n Y_i/n and Ȳ_2 = Σ_{i=n+1}^{2n} Y_i/n. An analysis might go as follows when two statisticians A and B get involved.

First they both compute the likelihood and learn that the MLE is found by minimizing the sum of squared residuals SSR(θ):

SSR(θ) = Σ_{i=1}^{2n} [Y_i − 1 − θ t_i]²

which gives the MLE of θ,

θ̂ = Σ_{i=1}^{2n} t_i (Y_i − 1) / Σ_{i=1}^{2n} t_i² = (n Ȳ_1 + 2n Ȳ_2 − 3n)/(5n) = (Ȳ_1 + 2Ȳ_2 − 3)/5.

Then for prediction at time t = 1 hour, they get

Ŷ = 1 + θ̂ × 1 = 1 + (Ȳ_1 + 2Ȳ_2 − 3)/5.

Suppose that Ȳ_1 = 1 foot, or 12 inches, and Ȳ_2 = 3 feet, or 36 inches. Statistician A uses feet and predicts Y at time t = 1 hour to be

Ŷ_A = 1 + (1 + 2×3 − 3)/5 = 1.8 feet = 21.6 inches.

But Statistician B uses inches and predicts Y at t = 1 hour to be

Ŷ_B = 1 + (12 + 2×36 − 3)/5 = 17.2 inches.

What has gone wrong here? The problem is that the stated model implicitly depends on the units of measurement. For instance, the numerical value of the expectation of Y_i when t_i = 0 is equal to 1, no matter what the units of Y_i. When t_i = 0, Statistician A expects Y_i to equal 1 foot and Statistician B expects Y_i to equal 1 inch. We can see that the problem arises because the equation defining the model does not satisfy DH, since the “1” is unitless. In technical terms, we would say that this model is not invariant under scalar transformations. Invariance is important when defining a model that involves units. However, one could simply avoid the whole problem of units in model formulation by constructing the relationship between Y_i and t_i so that there are no units. This is exactly the goal of the Buckingham Pi-theorem, presented in Subsection 5.1.
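The discrepancy is easy to reproduce; this sketch simply re-runs the two statisticians’ arithmetic:

```python
# The non-DH model Y = 1 + theta*t gives unit-dependent predictions
# because the "1" in the model is unitless.

def predict(ybar1, ybar2, t=1.0):
    """MLE prediction 1 + theta_hat * t, with theta_hat = (Ybar1 + 2*Ybar2 - 3)/5."""
    theta_hat = (ybar1 + 2.0 * ybar2 - 3.0) / 5.0
    return 1.0 + theta_hat * t

y_feet = predict(1.0, 3.0)        # Statistician A works in feet
y_inches = predict(12.0, 36.0)    # Statistician B works in inches

assert abs(y_feet - 1.8) < 1e-12      # 1.8 feet = 21.6 inches
assert abs(y_inches - 17.2) < 1e-12   # not 21.6 inches: the answers disagree
```

The identical formula, fed the identical physical data in two unit systems, returns physically different predictions, which is the failure of invariance described above.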

3 Dimensional analysis

Key to unifying the work on scales of measurement and the statistical invariance principle is DA. DA has a long history, beginning with the discussion of dimension and measurement (Fourier, 1822). Since DA is key to the description of a natural phenomenon, it lies at the root of physical modelling. A phenomenon’s description begins with the phenomenon’s features, each of which has a dimension, e.g. ‘mass’ (M) in physics or ‘utility’ (U) in economics. Each dimension is assigned a scale, e.g. ‘categorical’, ‘ordinal’, ‘ratio’, or ‘interval’, a choice that might be dictated by practical as well as intrinsic considerations. Once the scales are chosen, each feature is mapped into a point on its scale. For a quantitative scale, the mapping will be made by measurement or counting; for a qualitative scale, by assignment of classification. Units of measurement may be assigned as appropriate for quantitative scales, depending on the metric chosen. For example, temperature might be measured on the Fahrenheit scale, the Kelvin scale or the Celsius scale. This paper will be restricted to quantitative features, more specifically those on ratio- and interval-scales.

3.1 Foundations

One tenet of working with measured quantities is that units in an expression or equation must “match up”; relationships among measurable quantities require dimensional homogeneity. To check the validity of comparative statements about two quantities, X_1 and X_2, such as X_1 = X_2, X_1 < X_2 or X_1 > X_2, X_1 and X_2 must have the same dimension, such as time. To add X_1 to X_2, the two must not only have the same dimension but must also be on the same scale and expressed in the same units of measurement.

To discuss this explicitly, we use a standard notation (Joint Committee on Guides in Metrology, 2012) and write a measured quantity X as X = {X}[X], where {X} is the numerical part of X. [X] may be read as the dimension of X, e.g. [X] = L for length, or the units of X on the chosen scale of measurement, e.g. [X] = cm. The latter by its nature means that the dimension is L. If [X] = [1], then we say that X is unitless or dimensionless. We define 1, or any number, to be unitless, i.e., 1 = {1}[1], unless stated otherwise.

To develop an algebra for measured quantities, for a function f we must say what we mean by {f(X)} (usually easy) and [f(X)] (sometimes challenging). The path is clear for f a simple function. For example, consider f(X) = X². Clearly we must have X² = {X}²[X]², yielding, say, (3 inches)² = 9 inches². But what if f is a more complex function? This issue will be discussed in general in Subsection 4.2 and in detail for f(x) = ln(x) in Subsection 4.4.

For simple functions, the manipulation of both numbers and units is governed by an algebra of rules referred to as quantity calculus (Taylor, 2018). This set of rules states that x and y

  • can be added, subtracted or compared if and only if [x] = [y];

  • can always be multiplied to get xy = {xy}[xy] where {xy} = {x}{y} and [xy] = [x][y];

  • can always be divided when {x} ≠ 0 to get y/x = {y/x}[y/x] where {y/x} = {y}/{x} and [y/x] = [y]/[x];

and that

  • x can be raised to a power that is a rational fraction γ, provided that the result is not an imaginary number, to get x^γ = {x}^γ[x]^γ.

Thus it makes sense to square-root transform ozone O₃ = {O₃} parts per million (ppm) as {O₃}^{1/2} ppm^{1/2}, since ozone is measured on a ratio-scale with a true origin of 0 and hence must be non-negative (Dou, Le and Zidek, 2007). These rules can be applied iteratively a finite number of times to get expressions that are combinations of products of quantities raised to powers, along with sums and rational functions of such expressions.
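These rules are mechanical enough to encode. The following minimal Python sketch (our illustration, not a full units library) enforces the addition rule and implements the product and power rules of quantity calculus:

```python
from fractions import Fraction

class Quantity:
    """A number {x} together with unit exponents [x], e.g. {"mm": 2} for mm^2."""
    def __init__(self, value, units=None):
        self.value = value
        self.units = {u: Fraction(p) for u, p in (units or {}).items() if p != 0}

    def __add__(self, other):          # allowed iff [x] = [y]
        if self.units != other.units:
            raise ValueError("cannot add quantities with different units")
        return Quantity(self.value + other.value, self.units)

    def __mul__(self, other):          # {xy} = {x}{y}, [xy] = [x][y]
        units = dict(self.units)
        for u, p in other.units.items():
            units[u] = units.get(u, Fraction(0)) + p
        return Quantity(self.value * other.value, units)

    def __pow__(self, gamma):          # x^gamma = {x}^gamma [x]^gamma, gamma rational
        g = Fraction(gamma)
        return Quantity(self.value ** float(g), {u: p * g for u, p in self.units.items()})

ozone = Quantity(0.04, {"ppm": 1})     # an ozone reading, in ppm (illustrative value)
root = ozone ** Fraction(1, 2)         # {O3}^(1/2) ppm^(1/2), as in the text
assert root.units == {"ppm": Fraction(1, 2)}
```

Multiplying a quantity by one with reciprocal units cancels the exponents to zero, mirroring the division rule; a real analysis would of course use a mature units library rather than this sketch.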

This subsection concludes with an example that demonstrates the use of DH  and quantity calculus.

Example 5.

This example concerns a structural engineering model for lumber strength now called the “Canadian model” (Foschi and Yao, 1986). Here α(t) is dimensionless and represents the somewhat abstract quantity of the damage accumulated to a piece of lumber by time t. When α(t) = 1, the piece of lumber breaks. This is the only time when α(t) is observed. The Canadian model posits that α̇, the derivative of α with respect to time, satisfies

α̇(t) = a[τ(t) − σ₀τ_s]₊^b + c[τ(t) − σ₀τ_s]₊^n α(t),   (3.1)

where a, b, c, n and σ₀ are log-normally distributed random effects for an individual specimen of lumber; τ(t), measured in pounds per square inch (psi), is the stress applied to the specimen cumulative to time t; τ_s (in psi) is the specimen’s short term breaking strength if it had experienced the stress pattern τ(t) = kt for a fixed known k (in psi per unit of time); and σ₀ is the unitless stress ratio threshold. The expression [t]₊ is equal to t if t is non-negative and is equal to 0 otherwise. Let T_F denote the random time to failure for the specimen, under the specified stress history curve, meaning α(T_F) = 1.

As has been noted (Köhler and Svensson, 2002; Hoffmeyer and Sørensen, 2007; Zhai et al., 2012a; Wong and Zidek, 2018), this model is not dimensionally homogeneous. In particular, the units associated with both terms on the right hand side of Equation (3.1) involve random powers, b and n, leading to random units, respectively (psi)^b and (psi)^n. As noted by Wong and Zidek (2018), the coefficients a and c in Equation (3.1) cannot involve these random powers and so cannot compensate to make the model dimensionally homogeneous.

Rescaling is a formal way of addressing this problem. Zhai et al. (2012a) rescale by setting π(t) = τ(t)/τ_s. They let μ denote the population mean of τ_s and write a modified version of Equation (3.1) as the dimensionally homogeneous model

μα̇(t) = a*[π(t) − σ₀]₊^b + c*[π(t) − σ₀]₊^n α(t).

In contrast, Wong and Zidek (2018) propose another dimensionally homogeneous model

μα̇(t) = [(ã τ_s)(π(t) − σ₀)₊]^b + [(c̃ τ_s)(π(t) − σ₀)₊]^n α(t),

where ã and c̃ are now random effects with dimensions Force^{−1} · Length².

We see that there may be several ways to nondimensionalize a model. Another method, widely used in the physical sciences, involves always normalizing by the standard units specified by the Système International d’Unités (SIU), units such as metres or kilograms. So when the dimensions of a non-negative quantity X like absolute temperature have an associated SIU of Q₀ = {1}[Q₀], X can be converted to a unitless quantity by first expressing X in SIUs and then by using quantity calculus to rescale it as X/Q₀. The next example provides an important illustration of the application of the standardized unit approach.

Example 6.

Liquids contain both hydrogen and hydroxide ions. In pure water these ions appear in equal numbers. But the water becomes acidic when there are more hydrogen ions and basic when there are more hydroxide ions. Thus acidity is measured by the concentration of these ions. The customary measurement is in terms of the hydrogen ion concentration, denoted H⁺ and measured in the SIU of one mole of ions per litre of liquid. These units are denoted c^o and thus, in our notation, [H⁺] = c^o. However, for substantive reasons, the pH index for the acidity of a liquid is now used to characterize acidity. The index is defined by pH = −log₁₀(H⁺/c^o). Distilled water has a pH = 7 while lemon juice has a pH level of about 3. Note that {H⁺} ∈ (0, ∞) lies on a ratio-scale while pH lies on an interval-scale (−∞, ∞) – the transformation has changed the scale of measurement.
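The pH computation illustrates the standardized-unit approach: the argument of the logarithm is made unitless by first dividing by c^o. A small sketch using the well-known values quoted above:

```python
import math

c_std = 1.0        # the standard concentration c^o: 1 mole of ions per litre
h_water = 1.0e-7   # hydrogen ion concentration of pure water, in mol/L
h_lemon = 1.0e-3   # roughly that of lemon juice, in mol/L

def pH(h_plus):
    """pH = -log10(H+ / c^o); the ratio is unitless, so the logarithm is legitimate."""
    return -math.log10(h_plus / c_std)

assert abs(pH(h_water) - 7.0) < 1e-9   # distilled water: pH 7
assert abs(pH(h_lemon) - 3.0) < 1e-9   # lemon juice: pH about 3
```

Note that the function maps the ratio-scale input (0, ∞) onto the interval-scale output (−∞, ∞), exactly the change of scale remarked on in the example.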

3.2 The problem of scales

The choice of scale restricts the choice of units of measurement, and these units dictate the type of model that may be used. Note that comparing the size of two quantities on a ratio-scale must be done using their ratio, not their difference, whereas the opposite is true on an interval-scale, where differences must be used.

Thus we need to study scales in the context of model building and hence in the context of quantity calculus. In his celebrated paper, Stevens (1946) starts by proposing four major scales for measurements or observations: categorical, ordinal, interval and ratio. This taxonomy is based on the notion of permissible transformations, as is the work in our Section 5. However, our work is aimed at modelling while Stevens’s work is aimed at statistical analysis. Stevens defines the permissible transformations as follows: arbitrary one-to-one transformations (permutations) for data on the categorical scale, strictly increasing transformations for data on the ordinal scale, linear transformations (f(x) = ax + b) for data on the interval-scale, and scalar transformations (f(x) = ax) for data on the ratio-scale.

Stevens created his taxonomy as a basis for classifying the family of all statistical procedures for their applicability in any given situation (Stevens, 1951). For instance, Luce (1959) points out that, for measurements made on a ratio-scale, the geometric mean would be appropriate for estimating the central tendency of a population distribution (Velleman and Wilkinson, 1993). In contrast, when measurements are made on an interval-scale the arithmetic mean would be appropriate. The work of Stevens seems to be well-accepted in the social sciences, with Ward (2017) calling his work monumental. But Stevens's work is not widely recognized in statistics. Velleman and Wilkinson (1993) review the work of Stevens with an eye on potential applications in the then emerging area in statistics of artificial intelligence (AI), hoping to automate data analysis. They claim that "Unfortunately, the use of Stevens's categories in selecting or recommending statistical analysis methods is inappropriate and can often be wrong." They describe alternative scale taxonomies for statistics that have been proposed, notably by Mosteller and Tukey (1977). A common concern centers on the inadequacy of an automaton to select the statistical method for an AI application. Even the choice of scale itself will depend on the nature of the inquiry and thus is something to be determined by humans. For example, length might be observed on the relatively uninformative ordinal scale $\{\text{short}, \text{medium}, \text{long}\}$, were it sufficient for the intended goal of a scientific inquiry, rather than on the seemingly more natural ratio-scale $(0,\infty)$.
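Luce's point about central tendency can be illustrated with a small Python sketch (the data are hypothetical). On a ratio-scale the geometric mean respects the multiplicative structure of the data, while the arithmetic mean reflects the additive structure of an interval-scale:

```python
import math

def arithmetic_mean(xs):
    """Central tendency suited to an interval-scale."""
    return sum(xs) / len(xs)

def geometric_mean(xs):
    """Central tendency suited to a ratio-scale; defined for positive values."""
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

# Hypothetical ratio-scale data spanning two orders of magnitude:
data = [1.0, 10.0, 100.0]
print(arithmetic_mean(data))  # 37.0, dominated by the largest value
print(geometric_mean(data))   # ~10, the multiplicative centre of the data
```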

3.3 Why the origin matters

The interval-scale of real numbers allows for the taking of sums, differences, products, ratios, and integer powers of values observed on that scale. Rational powers of nonnegative values are also allowed although irrational powers lead into the domain of transcendental functions and difficulties of interpretation. The same operations are allowed for a ratio-scale of real numbers provided that the differences are non-negative. So superficially, these two scales seem nearly the same.

But there is a substantial qualitative difference between ratio- and interval-scales, so ignoring the importance of scale when building models can result in challenges in interpretation. The issue has to do with the meaning of the 0 on a ratio-scale. The next hypothetical example illustrates the point.

Example 7.

When the material in the storage cabinet at a manufacturing facility has been depleted, the amount left is 0. To understand the usefulness of this origin, consider if the facility's inventory monitoring program recorded a drop of $100\,kg$ during the past month. Without the knowledge of the origin, of where the amount of inventory lies on the scale, the implications of this drop are unclear. If the amount left in the facility is $99{,}900\,kg$ the drop means one thing, while if the amount left is $50\,kg$, the interpretation would be completely different.

Since the amount of inventory lies on the ratio-scale, these changes should instead be reported using ratios. The recorder in the facility in the first case would report a decline in inventory of $100/100{,}000 = 0.001$ or $0.1\%$. In the second case, the recorder would report a decline of $100/150 = 2/3$ or $66.7\%$, the same drop but with a totally different meaning. This example explains why stock price changes are reported on a ratio-scale, as a percentage, and not on an interval-scale.
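A minimal Python sketch of this reporting convention, using the hypothetical inventory figures above:

```python
def percent_decline(before, after):
    """Report a drop on the ratio-scale: relative to the amount before,
    not as an absolute difference."""
    return 100.0 * (before - after) / before

# The same 100 kg drop at the two hypothetical facilities:
print(percent_decline(100_000, 99_900))  # 0.1 (%)
print(percent_decline(150, 50))          # ~66.7 (%)
```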

4 Transforming quantities

The scale of the measurement $X$ may be transformed in a variety of ways. No change of scale occurs when the transformation is a rescaling, where we know how to transform both the numerical part of $X$ and $X$'s units of measurement. When the transformation is complex, the scale itself might change. For instance, if $X$ is measured on a ratio-scale, then the logarithm of $X$ will be on an interval-scale. Observe that in Example 6, the units of measurement in $H^{+}$ were eliminated before transforming by the transcendental function $\log_{10}$. That raises the question: do we need to eliminate units before applying the logarithm? This question and the logarithmic transformation in science have led to vigorous debate for over six decades (Matta et al., 2010). We highlight and resolve some of that debate below in Section 4.4.

However, we begin with an even simpler situation seen in the next subsection, where we study the issues that may arise when interval-scales are superimposed on ratio-scales.

4.1 Switching scales

This subsection concerns a perhaps unconscious switch in a statistical analysis from a ratio-scale, which lies on $[0,\infty)$, to an interval-scale, which lies on $(-\infty,\infty)$.

The bell curve approximation.

Despite the fundamental difference between ratio- and interval-scales, the normal approximation is often used to approximate the sampling distribution for a ratio-valued response quantity. This in effect replaces the ratio-scale with an interval-scale. In this situation, what should be used is the truncated normal distribution approximation, although this introduces undesired complexity. For example, if the approximation for the cumulative distribution function (CDF) were $P(X \leq x \mid X > 0)$ where $X \sim N(\mu, \sigma^{2})$, we would have

E(Z \mid X > 0) = \frac{\phi(\mu/\sigma)}{\Phi(\mu/\sigma)} \qquad (4.1)

where $Z = (X - \mu)/\sigma$, while $\phi$ and $\Phi$ denote, respectively, the standardized Gaussian distribution's probability density function and CDF. Observe that Equation (4.1) is invariant under changes of the units in which $X$ is measured, as it should be. Furthermore, the expectation of $Z$ would be approximately $0$ if $\mu/\sigma$ were large compared to $0$, as it would be were the non-truncated Gaussian distribution for an interval-scale imposed on this ratio-scale. This would occur if the mean $\mu$ were much larger than the standard deviation $\sigma$. That suggests the bell curve approximation would not work well were the population under investigation widely dispersed. For example, it might be satisfactory if $X$ represented the height of a randomly selected adult woman, but not if it were the height of a randomly selected human female.
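Equation (4.1) can be checked numerically. The Python sketch below (standard library only; the parameter values are illustrative) compares the closed form with a Monte Carlo estimate, and shows that the truncation correction effectively vanishes when $\mu/\sigma$ is large:

```python
import math
import random

def phi(t):
    """Standard Gaussian probability density function."""
    return math.exp(-t * t / 2) / math.sqrt(2 * math.pi)

def Phi(t):
    """Standard Gaussian CDF, computed via the error function."""
    return 0.5 * (1 + math.erf(t / math.sqrt(2)))

def truncated_mean(mu, sigma):
    """E(Z | X > 0) of Equation (4.1), X ~ N(mu, sigma^2), Z = (X - mu)/sigma."""
    return phi(mu / sigma) / Phi(mu / sigma)

def monte_carlo(mu, sigma, n=200_000, seed=1):
    """Monte Carlo estimate of E(Z | X > 0), for comparison."""
    rng = random.Random(seed)
    zs = [(x - mu) / sigma
          for x in (rng.gauss(mu, sigma) for _ in range(n)) if x > 0]
    return sum(zs) / len(zs)

print(truncated_mean(1.0, 2.0))    # mu/sigma = 0.5: noticeably above 0
print(truncated_mean(160.0, 7.0))  # mu/sigma large: essentially 0
```

With $\mu/\sigma = 0.5$ the truncation matters; with $\mu/\sigma \approx 23$ (roughly, adult heights in cm) the non-truncated bell curve is effectively harmless, matching the discussion above.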

As mentioned at the beginning of this section, this switch occurs when approximating a distribution. This switch is ubiquitous and seen in most elementary statistics textbooks. The assumed Gaussian distribution model leads to the sample average as a measurement of the population average instead of the geometric mean, which should have been used (Luce, 1959). That same switch is made in such things as regression analysis and the design of experiments. The seductive simplicity has also led to the widespread use of the Gaussian process in spatial statistics and machine learning.

The justification of the widespread use of the Gaussian approximation may well lie in the belief that the natural origin 0 of the ratio-scale lies well below the range of values of $X$ likely to be found in a scientific study. This may well be the explanation of the reliance on interval-scales for temperature in Celsius and Fahrenheit, on planet Earth at least, since one would not expect to see temperatures anywhere near the true origin of temperature, $0^{\circ}K$ on the Kelvin scale (ratio), which corresponds to $-273^{\circ}C$ on the Celsius scale (interval). We would note in passing that these two interval-scales for temperature also illustrate the statistical invariance principle (see Subsection 6.4); each scale is an affine transformation of the other.

We illustrate next, in a hypothetical setting where measurements are made on a ratio-scale, how the switch to an interval-scale can be made explicit through a simple approximation.

Example 8.

The justification above for the switch from a ratio- to an interval-scale can be turned into a simple approximation that may help with the interpretation of the data. To elaborate, suppose interest lies in comparing two values of $X$, $x_{1}$ and $x_{2}$, that lie in a ratio-scale with $0 < a < x_{1} < x_{2}$ for a known $a$. Interest lies in the relative size of these quantities, i.e. on $r = x_{2}/x_{1}$. An approximation to $r$ through the first order Taylor expansion of $x_{2}/x_{1}$ at $(x_{1}, x_{2}) = (a, a)$ yields $r \approx 1 + (x_{2} - x_{1})/a$, thus providing an approximation to $r$ on an interval-scale. For instance, with $a = 120\,cm$, $x_{1} = 150\,cm$, and $x_{2} = 180\,cm$, the ratio is $r = 1.20$ and the approximation, $1.25$. Both are unitless. This points to the potential value of rescaled ratio data when a Gaussian approximation is to be used for the sampling distribution of a quantity on a ratio-scale.
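A Python sketch of the computation in Example 8, using the same illustrative heights (the units cancel, so only numerical values appear):

```python
def ratio_exact(x1, x2):
    """The relative size r = x2/x1 on the ratio-scale."""
    return x2 / x1

def ratio_approx(x1, x2, a):
    """First order Taylor approximation of x2/x1 about (a, a),
    which lives on an interval-scale."""
    return 1 + (x2 - x1) / a

# Heights in cm, as in the example:
print(ratio_exact(150, 180))        # 1.2
print(ratio_approx(150, 180, 120))  # 1.25
```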

4.2 Algebraic versus transcendental functions

A function $u$, which describes the relationship among quantities $X_{1},\dots,X_{p}$ as

u(X_{1},\dots,X_{p}) = 0,

may be a sequence of transformations or operations involving the $X_{i}$'s, possibly combined with parameters. We know how to calculate the resulting units of measurement when $u$ consists of a finite sequence of permissible algebraic operations. The function consisting of the concatenation of such a sequence may formally be defined as a root of a polynomial equation that must satisfy the requirement of dimensional homogeneity (other desirable properties of $u$ along with methods for determining an allowable $u$ are discussed in Section 5). Such a function is called algebraic.

But $u$ may also involve non-algebraic operations leading to non-algebraic functions called transcendental (because they "transcend" an algebraic construction). Examples in the univariate case ($p = 1$) are $\sin(X)$ and $\cosh(X)$ and, for a given nonnegative constant $\alpha$, $\alpha^{X}$ and $\log_{\alpha}(X)$. The formal definition of a non-algebraic function does not explicitly say whether or not such a function can be applied to quantities with units of measurement. Bridgman (1931) sidesteps this issue by arguing that it is moot, since valid representations of natural phenomena can always be nondimensionalized (see Subsection 5.1). But the current Wikipedia entry on the subject states "transcendental functions are notable because they make sense only when their argument is dimensionless" (Wikipedia, 2020). The next subsection explores the much used Box-Cox family of transformations (Box and Cox, 1964), which includes transcendental functions.

4.3 The Box-Cox transformation

Frequently in statistical modelling, a transformation is used to extend the domain of applicability of a procedure that assumes normally distributed measurements (De Oliveira, Kedem and Short, 1997). That transformation may also be seen as a formal part of statistical model building that facilitates maximum likelihood estimation of a single parameter (Draper and Cox, 1969). The Box-Cox (BC) transformations constitute an important class of such transformations and are therefore a standard tool in the statistical scientist's toolbox.

In its simplest form, a member of this family of transformations has the form of a function $bc(X) = X^{\lambda}$ for a real-valued parameter $\lambda \in (-\infty,\infty)$. Here $X$ would need to lie in $[0,\infty)$, a ratio-scale, to avoid potential imaginary numbers. However, in practice interval-scales are sometimes allowed, a positive constant being added to avoid negative realizations of $X$. This ad hoc procedure thus validates the use of a Gaussian distribution to approximate the sampling distribution for $X$.

Since $X$ is measured on a ratio-scale, for any two points on that scale, $bc(x_{2}/x_{1}) = bc(x_{2})/bc(x_{1})$, while the scale is equivariant under multiplicative transformations, i.e., $bc(ax) = a^{\lambda}\, bc(x)$ for any point on that scale. Finally $bc(X) > 0$, so that the result of the transformation also lies on a ratio-scale, even when its intended goal is an approximately Gaussian distribution for the (transformed) response.

Box and Cox (1964) actually state their transformation as

bc_{\lambda}(X) = \frac{X^{\lambda} - 1}{\lambda}, \quad (\lambda \neq 0), \qquad (4.2)

that moves the origin of the ratio-scale from $0$ to $-1$. It is readily seen that unless $\lambda$ is a rational number, $bc_{\lambda}$, $\lambda \neq 0$, will be transcendental. That fact would be inconsequential in practice, inasmuch as a modeller would only ever use a rational number for $\lambda$. Or at least that would be the case except that the BC class has been extended to include $\lambda = 0$ by admitting for membership $\lim_{\lambda \rightarrow 0} bc_{\lambda}(X) = \ln(X)$, $X > 0$.
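A brief numerical sketch in Python of the family in Equation (4.2) and its $\lambda \rightarrow 0$ limit (the value of $x$ is illustrative and taken to be dimensionless, in keeping with the discussion that follows):

```python
import math

def box_cox(x, lam):
    """Box-Cox transform of Equation (4.2), with the lambda = 0 member
    defined as the limiting value ln(x)."""
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1) / lam

x = 3.7  # an illustrative dimensionless positive value
for lam in (1.0, 0.1, 0.001):
    print(lam, box_cox(x, lam))  # approaches ln(3.7) as lam shrinks
print(0, box_cox(x, 0))
```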

On closer inspection, we see that for validity, in Equation (4.2), the $1$ needs to be replaced by $(1[X])^{\lambda}$ to include units of measurement. Then for $\lambda \neq 0$ the transformation becomes

X_{\lambda} \doteq bc(X) = \frac{X^{\lambda} - 1}{\lambda} = \frac{\{X\}^{\lambda} - 1^{\lambda}}{\lambda}[X]^{\lambda}. \qquad (4.3)

As λ0\lambda\rightarrow 0, the only tenable limit would seem to be

X_{0} = \ln(\{X\})

not $\ln(X)$. In other words, in taking logarithms in the above example, the authors may have unconsciously nondimensionalized the measurements. Taking antilogarithms would then return $\{X\}$, not $X$.

Equation (4.3) thus tells us the Box-Cox transformation may have the unintended consequence of transforming not only the numerical values of the measurements but also their units of measurement, which become $[X]^{\lambda}$. This makes a statistical model difficult to interpret. For example, imagine the challenge of a model with a random response in units of $mm^{1/100}$. And since the transformation is nonlinear, returning to the original scales of the data would be difficult. For example, unbiased estimates would become biased and their standard errors could be hard to approximate.

Remark 2.

Box and Cox (1964) do not discuss the issue of scales in relation to the transformation they introduce. In that paper's second example the logarithmic transformation is applied to $X$, the number of cycles to failure, which may be regarded as unitless.

Remark 3.

The mathematical foundation of the Box-Cox family is quite complicated. Observe that in Equation (4.2), if $\lambda$ is a rational number $m/n$ for some nonnegative integers $m$ and $n$, $bc_{\lambda}$ will be an algebraic function of $X$. So as $\lambda$ varies over its domain, $(-\infty,\infty)$, the function flips back and forth from an algebraic to a transcendental function. For any fixed $m$ and point $x$, as $n$ approaches infinity, the trajectory of

\{bc_{m/n}(x) : n = 1, 2, \dots\}

converges to $\ln x$, so the family now includes the logarithmic transformation as another transformation in the statistical analyst's toolbox, which is used when the response distribution appears to have a long right tail. Thus a transcendental function has been added to the family of algebraic transformations obtained when $\lambda$ is chosen to be a positive rational number. It does not seem to be known if all transcendental transformations lie in the closure of the class of algebraic functions under the topology of pointwise convergence. However, when this family is shifted from the domain of mathematical statistics in the human brain to that of computational statistics in the computer's processor, this complexity disappears. In the computational process, all functions are algebraic and neither the logarithm nor infinity exists.

The importance of the logarithmic transformation in statistical and scientific modelling, and issues that have arisen about its lack of units, leads next to a special subsection devoted to it.

4.4 The logarithm: a transcendental function

Does the logarithm have units?

We have argued (see Example 3) that the answer is "no." First consider applying the logarithm to a unitless quantity $x$. It is sensible to think that its value will have no units, and so we take this as fact. But what happens if we apply the logarithm to a quantity with units? For instance, is log(12 inches) = log(12) + log(inches)? This issue has been debated for decades across different scientific disciplines; we summarize recent debates in Appendix A.

We now discuss this issue in more detail and argue that the result must be a unitless quantity. We use the definition of the natural logarithm of $x$ as the area under the curve of the function $f(u) = 1/u$ (Molyneux, 1991). We follow the notation defined in Section 3.1 and, for clarification, we write "1" as $y \equiv 1[x]$ and $u = \{u\}[x]$. We then make the change of variables $v = u/y$ so that $v$ is unitless, and get

\ln(x) = \int_{y}^{x} \frac{1}{u}\, d(u) = \int_{1}^{x/y} \frac{1}{yv}\, d(yv) = \int_{1}^{x/y} \frac{1}{v}\, d(v), \qquad (4.4)

which is a unitless quantity, as claimed.

We now derive the more specific result, that $\ln(x) = \ln(\{x\})$. In other words, applying this transcendental function to a dimensional quantity $x$ simply causes the units to be lost. We show below, from first principles, that for $v$ unitless,

\frac{d\ln(v)}{dv} = \frac{1}{v}.

This implies that

\ln(w) = \int_{1}^{w} \frac{1}{v}\, dv,

which, combined with Equation (4.4), implies that $\ln(x) = \ln(\{x\})$.

To show that the derivative of $\ln(v)$ is $1/v$, we turn to the original definition of the natural logarithm as the inverse of another transcendental function, $\exp(v)$, at least if $v > 0$. In other words, $v = \exp(\ln v)$, $v > 0$. The chain rule now tells us that

1 = \frac{d\ln(v)}{dv} \exp(\ln v).

Thus

\frac{d\ln(v)}{dv} = \exp(-\ln v) = \frac{1}{v}

for any real $v > 0$.

Can we take the logarithm of a dimensional quantity with units?

We argue that the answer is "no." We reason along the lines of Molyneux (1991), who sensibly argues that, since $\ln x$ has no units even when $x$ has units, the result is meaningless. In other words, since the value of the function is unitless no matter what the argument, we should not take the logarithm of a dimensional quantity with units.

This view agrees with that of Meinsma (2019). He notes that the Shannon entropy of a probability density function $f$, which has units, is defined in terms of $\ln f(x)$. Shannon found this to be an objectionable feature of his entropy but rationalized its use nevertheless. But not Meinsma, who concludes "To me it still does feel right…."

To consider the ramifications of ignoring this in a statistical model, suppose that $z$ is some measure of particulate air pollution on the logarithmic scale, with $z = \ln x$ where $x$ is a measurement with units. This measurement appears as $\beta z$ in a scientific model of the impact of particulate air pollution on health (Cohen et al., 2004). In this model, even though $z$ is unitless, its numerical value depends on the numerical value of $x$, via $\{z\} = \ln\{x\}$. Thus the numerical value of $z$ depends on the units of measurement of $x$. But, since $z$ itself is unitless, we cannot adjust $\beta$ to reflect changes in the units of $x$. To make this point explicit, suppose that experimental data pointed to the value $\beta = 1{,}101{,}231.52$. We have no idea if air pollution was a serious health problem. Thus, we see the problem that arises with a model that involves the logarithm of a measurement with units. This property of the logarithm points to the need to nondimensionalize $x$ before applying the logarithmic transformation in scientific and statistical modelling, in keeping with the theories of Buckingham, Bridgman and Luce.
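The dependence of $\{z\} = \ln\{x\}$ on the units of $x$ is easy to exhibit. In the Python sketch below, the pollution reading and the coefficient $\beta$ are hypothetical; the same physical measurement expressed in two unit systems gives values of $z$ that differ by the constant $\ln(1000)$, a shift that $\beta$ cannot absorb:

```python
import math

x_ug_per_m3 = 35.0                # hypothetical particulate reading, ug/m^3
x_mg_per_m3 = x_ug_per_m3 / 1000  # the same physical measurement in mg/m^3

z1 = math.log(x_ug_per_m3)        # {z} depends on the units chosen for x ...
z2 = math.log(x_mg_per_m3)
print(z1 - z2)                    # ... the gap is ln(1000), an artifact of units

beta = 2.0                        # hypothetical model coefficient
print(beta * z1, beta * z2)       # the model term beta*z shifts with the units
```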

One of the major routes taken in debates about the validity of applying the natural logarithm to a dimensional quantity involves arguments based one way or another on a Taylor expansion. A key feature of these debates involves the claim that the expansion is impossible, since the terms in the expansion have different units and so cannot be summed (Mayumi and Giampietro, 2010). We show below that this claim is incorrect by showing that all of the terms in the expansion have no units (see Appendix A for more details).

Key to the Taylor expansion argument of validity is how to take the derivative of $\ln x$ when $x$ has units. Recall that above, we calculated the derivative of $\ln v$ for $v$ unitless. To define the derivative of $\ln x$ when $x$ has units, we proceed from first principles. Suppose we have a function $f$ with argument $x = \{x\}[x]$. We define the derivative of $f$ with respect to $x$ as follows. Let $\Delta = \{\Delta\}[\Delta]$ and $x = \{x\}[x]$, and suppose that $\Delta$ and $x$ have the same units, that is, that $[\Delta] = [x]$. Otherwise, we would not be able to add $x$ and $\Delta$ in what follows. Then we define

f^{\prime}(x) \equiv \lim_{\{\Delta\}\to 0} \frac{f(x+\Delta) - f(x)}{\Delta}
= \lim_{\{\Delta\}\to 0} \frac{f(\{x+\Delta\}[x+\Delta]) - f(\{x\}[x])}{\{\Delta\}[\Delta]}
= \lim_{\{\Delta\}\to 0} \frac{f(\{x+\Delta\}[x]) - f(\{x\}[x])}{\{\Delta\}[x]}.

For instance, for $f(x) = x^{2}$,

\frac{d}{dx}x^{2} = \lim_{\{\Delta\}\to 0} \frac{\{x+\Delta\}^{2}[x]^{2} - \{x\}^{2}[x]^{2}}{\{\Delta\}[x]} = \lim_{\{\Delta\}\to 0} \frac{\{x+\Delta\}^{2} - \{x\}^{2}}{\{\Delta\}} \times [x] = 2\{x\}[x] = 2x.

To use Equation (4.4) to differentiate $f(x) = \ln(x)$, we first write

\ln(x+\Delta) - \ln x = \ln\{x+\Delta\} - \ln\{x\}.

So

\frac{d}{dx}\ln x = \lim_{\{\Delta\}\to 0} \frac{\ln\{x+\Delta\} - \ln\{x\}}{\{\Delta\}[x]} = \frac{d\ln\{x\}}{d\{x\}} \times \frac{1}{[x]} = \frac{1}{\{x\}}\, \frac{1}{[x]} = \frac{1}{x}.

Using this definition of the derivative we can carry out a Taylor series expansion about $x = a > 0$ to obtain

\log(x) = \log(a) + \sum_{k=1}^{\infty} g^{(k)}(a) \frac{(x-a)^{k}}{k!}, \qquad (4.6)

where

g^{(k)}(a) = d^{k}\log(x)/dx^{k}\big|_{x=a}.

As $g^{\prime}(x) = 1/x$, the first term, $g^{\prime}(a)(x-a)$, in the infinite summation is unitless. Differentiating $g^{\prime}(x)$ yields $g^{\prime\prime}(x) = -1/x^{2}$ and once again, we see that the term $g^{\prime\prime}(a)(x-a)^{2}/2$ is unitless. Continuing in this way, we see that the summation on the right side of Equation (4.6) is unitless, and so the equation satisfies dimensional homogeneity. This reasoning differs from the incorrect reasoning of Mayumi and Giampietro (2010) in their argument that the logarithm cannot be applied to quantities with units because the terms in the Taylor expansion would have different units. Our reasoning also differs from that of Baiocchi (2012), who uses a different expansion to show that the logarithm cannot be applied to measurements with units, albeit without explicitly recognizing the need for $\ln x$ to be unitless. The expansion in Equation (4.6) is the same as that given in Matta et al. (2010), albeit not in an explicit form for $\ln x$. Like us, they discredit the Taylor expansion argument against applying $\ln x$ to quantities with units.
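The expansion in Equation (4.6) can be verified numerically. In the Python sketch below (the evaluation point and truncation order are illustrative), we use the standard closed form $g^{(k)}(a) = (-1)^{k-1}(k-1)!/a^{k}$, under which each term reduces to a power of the unitless ratio $(x-a)/a$:

```python
import math

def log_taylor(x, a, terms=60):
    """Partial sum of Equation (4.6) about a > 0, using
    d^k ln(x)/dx^k |_{x=a} = (-1)**(k-1) * (k-1)! / a**k,
    so the k-th term is (-1)**(k-1) * ((x - a)/a)**k / k."""
    r = (x - a) / a  # unitless, whatever the units of x
    return math.log(a) + sum((-1) ** (k - 1) * r ** k / k
                             for k in range(1, terms + 1))

print(log_taylor(1.3, 1.0))  # agrees with ln(1.3) for |x - a| < a
print(math.log(1.3))
```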

5 Allowable relationships among quantities

Having explored dimensional analysis and the kinds of difficulties that can arise when scales or units are ignored, we turn to a key step in unifying physical and statistical modelling. We now determine how to relate quantities and hence how to specify the ‘law’ that characterizes the phenomenon which is being modelled.

But what models may be considered legitimate? Answers for the sciences, given long ago, were based on the principle that for a model to completely describe a natural phenomenon, it cannot depend on the units of measurement that might be chosen to implement it. This answer was interpreted in two different ways. In the first interpretation, the model must be nondimensionalizable, i.e., it cannot have scales of measurement and hence cannot depend on units. In the second interpretation, the model must be invariant under all allowable transformations of scales. Both of these interpretations reduce the class of allowable relationships that describe the phenomenon being modelled and place restrictions on the complexity of any experiment that might be needed to implement that relationship.

5.1 Buckingham’s Pi-theorem

The section begins with Buckingham’s simple motivating example.

Example 9.

This example is a characterization of properties of gas in a container, namely, a characterization of the relationship amongst the pressure ($p$), the volume ($v$), the number of moles of gas ($N$) and the absolute temperature ($\theta$) of the gas. The absolute temperature reflects the kinetic energy of the system and is measured in degrees Kelvin ($^{\circ}K$), the SIUs for temperature. A fundamental relationship amongst these quantities is given by

\frac{pv}{\theta N} - D = 0 \qquad (5.1)

for some constant $D$ that does not depend on the gas. Since the dimension of $pv/(N\theta)$ is (force $\times$ length)/(# moles $\times$ temperature), as expressed, the relationship in Equation (5.1) depends on the units associated with $p$, $v$ and $\theta$, whereas the physical phenomenon underlying the relationship does not. Buckingham gets around this by invoking a parameter $R$ ($\equiv D$) with units (force $\times$ length)/(# moles $\times$ temperature). He rewrites Equation (5.1) as

\frac{pv}{R\theta N} - 1 = 0.

Thus $\pi = pv/(R\theta N)$ has no units. Buckingham calls this equation complete and hence nondimensionalizable. This equation is known as the Ideal Gas Law, with $R$ denoting the ideal gas constant (LibreTexts, 2019).
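The completeness of the nondimensionalized relation can be checked with SI values. In the Python sketch below, the state values are illustrative textbook numbers (roughly one mole of gas near standard conditions) and $R = 8.314$ J/(mol K):

```python
# pi = p*v/(R*theta*N) is dimensionless; with consistent SI units it is ~1.
R = 8.314          # ideal gas constant, J/(mol K) = (force x length)/(mol x K)
p = 101_325.0      # pressure, Pa
v = 0.0224         # volume, m^3: roughly one mole of gas at 0 C and 1 atm
N = 1.0            # number of moles
theta = 273.15     # absolute temperature, K

pi_value = p * v / (R * theta * N)
print(pi_value)    # close to 1, whatever consistent unit system is used
```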

This example of nondimensionalizing by finding one expression, $\pi$, as in Equation (5.1), can be extended to cases where we must nondimensionalize by finding several $\pi$ quantities. This extension is formalized in Buckingham's Pi-theorem. Here is a formal statement (in slightly simplified form) as stated by Buckingham (1914) and discussed in a modern style in Bluman and Cole (1974).

Theorem 1.

Suppose $X_{1},\dots,X_{p}$ are $p$ measurable quantities satisfying a defining relation

u(X_{1},\dots,X_{p}) = 0 \qquad (5.2)

that is dimensionally homogeneous. In addition, suppose that there are $m$ dimensions appearing in this equation, denoted $L_{1},\ldots,L_{m}$, that the dimension of $u$ can be expressed as $[u] = L_{1}^{\alpha_{1}} \times \cdots \times L_{m}^{\alpha_{m}}$, and that the dimension of each $X_{j}$ can be expressed as $[X_{j}] = L_{1}^{\alpha_{j1}} \times \cdots \times L_{m}^{\alpha_{jm}}$. Then Equation (5.2) implies the existence of $q \geq p - m$ dimensionless quantities $\pi_{1},\dots,\pi_{q}$, with $\pi_{i} = \Pi_{j=1}^{p} X_{j}^{a_{ji}}$, $i = 1,\dots,q$, and a function $U$ such that

U(\pi_{1},\dots,\pi_{q}) = 0.

In this way $u$ has been nondimensionalized. The choice of $\pi_{1},\ldots,\pi_{q}$ in general is not unique.

The theorem is proven constructively, so we can find $\pi_{1},\ldots,\pi_{q}$ and $U$. We first determine the $m$ fundamental dimensions used in $X_{1},\ldots,X_{p}$. We then use the quantities $X_{1},\ldots,X_{p}$ to construct two sets of variables: a set of $m$ primary variables, also called repeating variables, and a set of $q$ secondary variables, which are nondimensional. For example, if $X_{1}$ is the length of a box, $X_{2}$ is the height and $X_{3}$ is the width, then there is $m = 1$ fundamental dimension, the generic length denoted $L$. We can choose $X_{1}$ as the primary variable and use $X_{1}$ to define two new variables $\pi_{1} = X_{2}/X_{1}$ and $\pi_{2} = X_{3}/X_{1}$. These new variables, called secondary variables, are dimensionless. Buckingham's theorem states that the algebraic equation relating $X_{1}$, $X_{2}$ and $X_{3}$ can be re-written as an equation involving only $\pi_{1}$ and $\pi_{2}$. Note that we could have also chosen either $X_{2}$ or $X_{3}$ as the primary variable.
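The construction for the box example can be sketched in a few lines of Python (the box dimensions are hypothetical); changing the unit of length leaves the secondary variables untouched:

```python
def pi_functions(x1, x2, x3):
    """Secondary (dimensionless) variables for the box example,
    with x1 (length) chosen as the repeating variable."""
    return x2 / x1, x3 / x1

# A hypothetical box in metres, then the same box measured in centimetres:
m = pi_functions(2.0, 1.0, 0.5)
cm = pi_functions(200.0, 100.0, 50.0)
print(m, cm)  # identical pairs: the pi's do not depend on the unit of length
```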

A famous application of Buckingham's theorem concerns the discovery of the Reynolds number in fluid dynamics, which is discussed in Gibbings (2011). For brevity we include that example in Appendix B.

A link between Buckingham's approach and statistical modelling was recognized in the paper of Albrecht et al. (2013) and commented on in Lin and Shen (2013). But its link with the statistical invariance principle seems to have been first identified in the thesis of Shen (2015). This connection provides a valuable approach for the statistical modelling of scientific phenomena. Shen builds a regression model starting with Buckingham's approach and thereby a nondimensionalized relationship amongst the variables of interest. We propose a different approach in Section 6. We present Shen's illustrative example next.

Example 10.

This example, from Shen (2015), concerns a model for the predictive relationship between the volume $X_{3}$ of wood in a pine tree and its height $X_{1}$ and diameter $X_{2}$. The dimensions are $[X_{1}] = L$, $[X_{2}] = L$ and $[X_{3}] = L^{3}$. Shen chooses $X_{1}$ as the repeating variable and calculates the $\pi$-functions $\pi_{1} = X_{2}X_{1}^{-1}$ and $\pi_{2} = X_{3}X_{1}^{-3}$. He then applies the Pi-theorem to get the dimensionless version of the relationship amongst the variables:

\pi_{2} = g(\pi_{1})

for some function $g$. He correctly recognizes that $(\pi_{1}, \pi_{2})$ is the maximal invariant under the scale transformation group, although the connection to the ratio-scale of Stevens is not made explicitly. He somewhat arbitrarily chooses the class of relationships given by

\pi_{2} = k\pi_{1}^{\gamma}. \qquad (5.3)

He linearizes the model in Equation (5.3) by taking the logarithm and adds a residual to get a standard regression model, susceptible to standard methods of analysis. In particular the least squares estimate $\hat{\gamma} = 1.942$ turns out to provide a good fit, judging by a scatterplot.

Note that application of the logarithmic transformation is justified since the $\pi$-functions are dimensionless.
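Shen's fitting step can be sketched with ordinary least squares on the log scale. The Python code below is illustrative only: the dimensionless data are simulated from the model in Equation (5.3) with $\gamma = 2$ (they are not Shen's tree data, so the fit does not reproduce his $\hat{\gamma} = 1.942$):

```python
import math
import random

def fit_power_law(pi1s, pi2s):
    """Least squares fit of ln(pi2) = ln(k) + gamma * ln(pi1),
    the linearized form of Equation (5.3)."""
    xs = [math.log(p) for p in pi1s]
    ys = [math.log(p) for p in pi2s]
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    gamma = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
             / sum((x - xbar) ** 2 for x in xs))
    k = math.exp(ybar - gamma * xbar)
    return k, gamma

# Simulated (hypothetical) dimensionless data: pi2 = 0.002 * pi1**2 with noise.
rng = random.Random(0)
pi1s = [0.01 + 0.09 * rng.random() for _ in range(50)]
pi2s = [0.002 * p ** 2 * math.exp(0.05 * rng.gauss(0, 1)) for p in pi1s]
k_hat, gamma_hat = fit_power_law(pi1s, pi2s)
print(k_hat, gamma_hat)  # gamma_hat recovers roughly 2.0 on these data
```

Since both $\pi$-functions are dimensionless, the logarithms above raise none of the unit difficulties discussed in Section 4.4.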

5.2 Bridgman’s alternative

We now describe an alternative to the approach of Buckingham (1914) due to Bridgman (1931). At around the same time that Edgar Buckingham was working on his Pi-theorem, Percy William Bridgman was giving lectures at Harvard on the topic of nondimensionalization that were incorporated in a book whose first edition was published by Yale University Press in 1922. The second edition came out in 1931 (Gibbings, 2011). Bridgman thanks Buckingham for his papers but notes their approaches differ. And so they do. For a start, Bridgman asserts his disagreement with the position that seems to underlie Buckingham's work that "a dimensional formula has some esoteric significance connected with the 'ultimate nature' of things." Thus those that espouse that point of view must "find the true dimensions and when they are found, it is expected that something new will be suggested about the physical properties of the system." Instead, Bridgman takes measurement itself as the starting point in modelling and even the collection of data: "Having obtained a sufficient array of numbers by which the different quantities are measured, we search for relations between these numbers, and if we are skillful and fortunate, we find relations which can be expressed in mathematical form." He then seeks to characterize a measured quantity as either primary, that is, the product of direct measurement, or secondary, that is, computed from the measurements of primary quantities, as, for instance, velocity is computed from the primary quantities of length and time. Finally he sees the basic scientific issue as that of characterizing one quantity in terms of the others, as in our explication of Buckingham's work above in terms of the function $u$.

Bridgman considers the functional relationship between secondary and primary measurements under what statistical scientists might call “equivariance” under multiplicative changes of scale in the primary units. He proves that the functional relationship must be based on monomials with possibly fractional exponents, not unlike the form of the \pi-functions above. Thus Bridgman is able to re-derive Buckingham’s \pi formula, albeit with the added assumption that u is differentiable with respect to its arguments.

5.3 Beyond ratio-scales

Nondimensionalization seems more difficult outside of the domain of the physical sciences. For example, the dimensions of quantities such as utility cannot be characterized by a ratio-scale. And the choice of the primary dimensions is not generally so clear, although Baiocchi (2012) does provide an example in macroeconomics where time [T], money [\$], goods [R] and utility [U] may together be sufficient to characterize all other quantities.

Bridgman’s results on allowable laws were limited to laws involving quantities measured on ratio-scales. A substantial body of work has been devoted to extending these results to laws involving quantities measured on nonratio-scales, beginning with the seminal paper of Luce (1959). To quote the paper by Aczél, Roberts and Rosenbaum (1986), which contains an extensive review of that work, “Luce shows that the general form of a ‘scientific law’ is greatly restricted by knowledge of the ‘admissible transformations’ of the dependent and independent variables.” It seems puzzling that this principle has received little if any recognition in statistical science. This may be due to the fact that little attention is paid to such things as dimensions and units of measurement.

The substantial body of research that followed Luce’s publication covers a variety of scales, e.g. ordinal. Curiously, that body of work largely ignores the work of Buckingham in favor of Bridgman, even though the former preceded the latter. Also ignored is the work on statistical invariance described below, which goes back to G. Hunt and C. Stein in 1946 in unpublished but well-known work that led to optimum statistical tests of hypotheses.

To describe this important work by Luce, we re-express Equation (5.2) as

X_{p}=u^{*}(X_{1},\dots,X_{p-1}) (5.4)

for some function u^{*}, and thereby define a class of all possible laws that could relate X_{p} to the predictors X_{1},\dots,X_{p-1}, before turning to a purely data-based empirical assessment of the possible u^{*}’s. Luce requires that u^{*} satisfy an invariance condition. Specifically, he makes the strong assumption that the scale of each X_{i}, i=1,\dots,p-1, is susceptible to a transformation T_{i}\in{\cal F}_{i}, i.e. X_{i}\rightarrow T_{i}(X_{i}), for some sets of possible transformations {\cal F}_{1},\dots,{\cal F}_{p-1}. Furthermore he assumes that the X_{i}’s are transformed independently of one another; no structural constraints are imposed. Luce assumes the existence of a function D such that

u^{*}\left(T_{1}(X_{1}),\dots,T_{p-1}(X_{p-1})\right)=D(T_{1},\dots,T_{p-1})\,u^{*}(X_{1},\dots,X_{p-1}) (5.5)

for all possible transformations and choices of X_{i}, i=1,\dots,p-1. He determines that under these conditions, if each X_{i}, i=1,\dots,p, lies on a ratio-scale, then

u^{*}(X_{1},\dots,X_{p-1})\propto\prod_{i=1}^{p-1}X_{i}^{\alpha_{i}},

where the \alpha_{i}’s are nondimensional constants. This is Bridgman’s result, albeit proved by Luce without assuming differentiability of u^{*}. If on the other hand some of the X_{i}’s, i=1,\dots,p-1, are on a ratio-scale while others are on an interval-scale and X_{p} is on an interval-scale, then Luce proves that u^{*} cannot exist except in the case where p=2 and X_{1} is on an interval-scale.
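Luce’s monomial characterization is easy to check numerically. The following sketch (the exponents, constant, and rescalings are our own illustrative values, not from the paper) verifies that a monomial law satisfies the invariance condition of Equation (5.5) with D(T_{1},\dots,T_{p-1})=\prod_i c_i^{\alpha_i} when each T_{i} is a similarity transformation x\rightarrow c_{i}x:

```python
import math

# Hypothetical monomial law with illustrative exponents and constant.
alpha = [0.5, -1.0, 2.0]
C = 3.7  # nondimensional constant

def u_star(x):
    """Monomial law: u*(x) = C * prod(x_i ** alpha_i)."""
    out = C
    for xi, ai in zip(x, alpha):
        out *= xi ** ai
    return out

# Ratio-scale changes act multiplicatively: T_i(x) = c_i * x.
c = [2.0, 10.0, 0.3]   # arbitrary positive rescalings (changes of units)
x = [1.5, 4.0, 2.5]    # arbitrary positive measurements

lhs = u_star([ci * xi for ci, xi in zip(c, x)])
D = math.prod(ci ** ai for ci, ai in zip(c, alpha))  # D(T_1,...,T_{p-1})
rhs = D * u_star(x)

assert math.isclose(lhs, rhs)  # Equation (5.5) holds for the monomial law
```

Any other positive choice of the c_{i}’s gives the same agreement, which is the sense in which the monomial form is equivariant under changes of ratio-scale units.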

However, as noted by Aczél, Roberts and Rosenbaum (1986), the assumption in Equation (5.5) of the independence of the transformations T_{i} seems unduly strong for many situations, and weakening that assumption expands the number of possibilities for the form of u^{*}. Further work culminated in that of Paganoni (1987). While this work was for X_{i}’s in a general vector space, for simplicity we present it here in our context, where X_{i}\in{\rm I\!R}, i=1,\dots,p. Let {\cal X} and {\cal P} be nonempty subsets of {\rm I\!R}^{p-1} and {\cal R} a set of (p-1) by (p-1) real-valued matrices. Suppose that

  1. \textbf{x}+\textbf{p}\in{\cal X} for all \textbf{x}\in{\cal X} and \textbf{p}\in{\cal P};

  2. the identity matrix is in {\cal R} and, for all R\in{\cal R} and all \textbf{x}\in{\cal X}, R\textbf{x}\in{\cal X};

  3. if {\cal P}\neq\{0\}, then \lambda R\in{\cal R} for all R\in{\cal R} and all \lambda>0.

Suppose also that the function u^{*} in Equation (5.4) satisfies

u^{*}(\textbf{R}\,\textbf{x}+\textbf{p})=\alpha(\textbf{R},\textbf{p})\,u^{*}(\textbf{x})+\beta(\textbf{R},\textbf{p})

for all \textbf{R}\in{\cal R}, \textbf{x}\in{\cal X} and \textbf{p}\in{\cal P}, for some positive-valued function \alpha and real-valued function \beta. Paganoni then determines the possible forms of \alpha and \beta.
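To make the functional equation concrete, here is a minimal sketch under assumptions of our own choosing: u^{*} affine (coefficients a and b are illustrative, not from the paper) and R a positive scalar multiple \lambda of the identity. The equation then holds with \alpha(R,\textbf{p})=\lambda and \beta(R,\textbf{p})=a\cdot\textbf{p}+b(1-\lambda):

```python
# Illustrative affine law: u*(x) = a . x + b (hypothetical coefficients).
a = [2.0, -1.5]
b = 0.7

def u_star(x):
    return sum(ai * xi for ai, xi in zip(a, x)) + b

lam = 3.0         # R = lam * I, a ratio-scale stretch
p = [0.4, -2.0]   # translation, i.e. an interval-scale shift
x = [1.2, 5.0]    # arbitrary measurements

# Left-hand side: u*(R x + p).
lhs = u_star([lam * xi + pi for xi, pi in zip(x, p)])

# Right-hand side with alpha(R, p) = lam, beta(R, p) = a.p + b(1 - lam).
alpha_Rp = lam
beta_Rp = sum(ai * pi for ai, pi in zip(a, p)) + b * (1 - lam)
assert abs(lhs - (alpha_Rp * u_star(x) + beta_Rp)) < 1e-9
```

This is only one admissible pair (\alpha,\beta); Paganoni’s theorem characterizes the full family of solutions.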

6 Statistical invariance

Having covered some important issues at the foundations of modelling in previous sections, we now turn to the modelling itself. It usually starts with a study question about the relationship among a set of specified, observable or measurable attributes of members \omega\in\Omega of a population. A random sample of its members is to be collected to address the study question.

A fundamental principle (Principle 1) for constructing a model for natural phenomena, which is embraced by Buckingham’s Pi-theory, asserts that the model cannot depend on the scales and consequent units in which the attributes are to be measured. That principle can be extended to cover other features deemed to be irrelevant.

Principle 2 for constructing a model calls for judicious attribute choices and transformations to reduce the sample size needed to fit the model. The specified attributes could be design variables selected in advance of sampling to maximize the value of the study. A notable example comes from computationally expensive computer simulators. These are run at a selected set of input attributes to develop computationally cheap emulators. This in turn leads to a need to reduce the number of inputs in the predictive model that would need to be fitted empirically. A classical example is given in Appendix B where Buckingham’s theorem leads to a model with just a single predictand and a single predictor, the latter being derived from the original set of five predictors.

Finally, we dichotomize general approaches to modelling. Approach 1, i.e. scientific modelling, leads to what Meinsma (2019) calls “physical models.” The models are generally deterministic, with attributes measured on a ratio-scale. The u in Equation (5.2) is commonly known, at least up to unknown parameters (e.g., Example 1), before any sampling is done.

Approach 2 leads to a second type of model commonly seen in the social and medical sciences. There, we have a sample of n attribute-vectors, each of dimension p, to which the invariance principle is applied. That application can lead to a nondimensionalization of the data with a consequent reduction in the number of attributes, all based on the aggregated sample of n attribute-vectors. But beyond eliminating irrelevant units of measurement, applying the principle can eliminate other irrelevant features of the data, such as angle of rotation. In our approach, to be described, the entire sample is holistically incorporated into model development and implementation, and a single maximal invariant is used to summarize the sample.

In keeping with the goal of generalizing Buckingham’s theory, our approach will focus on the construction of a predictive distribution model. Model uncertainties can then be characterized through such things as conditional variances and residual analysis. Furthermore, principled empirical assessments of the validity of u can be made given the replicate samples.

Scales play a prominent role in modelling as well. So for categorical attributes, e.g. R red, Y yellow, G green, the model should be invariant under permutations of the code by which the attributes are recorded. Models with ordinal attributes, e.g. small, medium, large, should be invariant under positive monotone transformations. But, as noted in Section 1, this paper will focus mainly on ratio-scales and interval-scales. For all scales, and both approaches to modelling, the transformation groups to which we now turn play a key role.

6.1 Transformation groups

This subsection reviews the theory of transformation groups and the statistical invariance principle, a topic that has a rich history (Eaton, 1983). These are needed for extending the Buckingham Pi-theory. That need is recognized by Shen and Lin (2019), although their applications concern ratio-scales and physical models. To introduce these groups, for simplicity, in both this section and the next, we will focus on how groups transform the sample space. Later, in Sections 6.3 and 6.5, we will use the same concepts for the full general theory of statistical invariance and generalized statistical invariance.

Each \omega\in\Omega has a vector of measurable attributes X=X(\omega)\in{\cal X}:

X=\{X\}[X]=\left(\begin{array}{c}X_{1}\\ \vdots\\ X_{p}\end{array}\right).

A sample of \omega’s is to be drawn according to a probability distribution P on \Omega, with P inducing a probability distribution on X. Buckingham’s theory (see Subsection 5.1) aims at relating \omega’s attributes through a model like that in Equation (5.4). Our extension of that theory below will be stochastic in nature and assign X_{p} the special role of predictand.

A sample of size n yields a sample of observations X_{ij} represented by {\bf X}^{p\times n}\in{\cal X}^{n}. The statistical invariance principle posits that randomized statistical decision rules that determine actions should be invariant under 1:1 transformations by members g of an algebraic group of transformations G. That is, any pair of points {\bf x},{\bf x}^{\prime}\in{\cal X}^{n} are considered equivalent for statistical inference if and only if {\bf x}=g({\bf x}^{\prime}) for some g\in G. This equivalence is denoted {\bf x}\sim{\bf x}^{\prime}. By definition, the equivalence classes formed by G are disjoint and exhaustive, so we can index them by a parameter \gamma\in\Gamma and let {\cal X}^{n}_{\gamma}, \gamma\in\Gamma, represent an equivalence class. The {\cal X}^{n}_{\gamma}, \gamma\in\Gamma, are referred to as orbits, which could be indexed by a set of points \{{\bf x}_{\gamma}, \gamma\in\Gamma\}. If the set of points satisfies some regularity conditions, then it is called a cross-section, denoted {\cal X}^{n}/G; its existence is studied by Wijsman (1967). Assuming a cross-section does exist, we may write

{\cal X}^{n}=G\times{\cal X}^{n}/G.

In other words, any point {\bf x}\in{\cal X}^{n} is represented by (g,{\bf x}_{\gamma}) for appropriately chosen g and {\bf x}_{\gamma}.

The statistical invariance principle states that a statistical decision rule must be invariant, that is, the rule must take the same value for all points in a single orbit {\cal X}^{n}_{\gamma}. Maximal invariant functions play a special role in statistics. The function M is invariant if its value is constant on each orbit. Further, M is a maximal invariant if it takes different values on different orbits.

The following example shows the statistical invariance principle in action.

Example 11.

A hard-to-make, short-lived product has an exponentially distributed random time X\ hr to failure. A process-capability analysis led to a published value of \lambda_{0} for that product’s average time-to-failure. The need to assure that standard remains valid has led to a periodic sample of size n=2, resulting in a sample vector {\bf X}=(X_{1},X_{2}). To make inference about \lambda, the expected value of X, following Remark 1 the analyst relies on the (log) relative likelihood, i.e., ignoring irrelevant quantities,

\tilde{L}(\lambda)\doteq\ln\frac{L(\lambda)}{L(\lambda_{0})}=-2[\ln\tilde{\lambda}+\tilde{\bar{x}}\,\tilde{\lambda}^{-1}] (6.1)

where, in general, for any quantity u with the same units as \lambda_{0}, \tilde{u}=u/\lambda_{0} is unitless. Differentiating the relative likelihood in Equation (6.1) yields the maximum likelihood estimate (MLE)

\hat{\tilde{\lambda}}_{MLE}=\tilde{\bar{x}}.

Using the relative likelihood thus leads to any change in \lambda relative to the published value being expressed by their ratio, as mandated by their lying on a ratio-scale. The same is true of the relative change estimated by the MLE.

The group G=\{g_{c},\ c>0\} transforms any realization {\bf X}={\bf x} as follows:

g_{c}({\bf x})=(c\,x_{1},\,c\,x_{2}).

As a maximal invariant, i.e. \pi-function, we may take

\pi=M(x_{1},x_{2})=(x_{1}/x_{\cdot},\,x_{2}/x_{\cdot})=M(\tilde{x}_{1},\tilde{x}_{2}),

where x_{\cdot}=x_{1}+x_{2}. The range of M in (-\infty,\infty)^{2} is given by

{\cal M}=\{(m_{1},m_{2}):m_{2}=1-m_{1},\ m_{1},m_{2}>0\}.

Points in {\cal M} index the orbits of the group G. Locating a point {\bf x} on its orbit entails taking {\bf m}=(x_{1}/x_{\cdot},x_{2}/x_{\cdot}) and applying the transformation g_{c}, c=x_{\cdot}, to {\bf m}. Thus, the orbits created by G are rays in the positive quadrant, emanating from, but not including, the point (0\ hr,0\ hr), and {\cal X}^{2} is the union of these rays. Finally, we may let {\cal X}^{2}/G={\cal M}.

M, as a \pi-function, plays a key role in developing the (randomized, if necessary) statistical procedures that are invariant under transformations of {\cal X}^{2}. For example,

\hat{\tilde{\lambda}}_{MLE}=\tilde{\bar{x}}=c\times\upsilon[M(\tilde{x}_{1},\tilde{x}_{2})]

where c=\tilde{x}_{\cdot} and \upsilon[M(\tilde{x}_{1},\tilde{x}_{2})]\equiv 1/2. But better choices of \upsilon may be dictated by the manufacturer’s loss function.

Note that here c, \upsilon, and M are all unitless.
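The calculations in Example 11 are easily sketched in code. The numbers below (\lambda_{0} and the sample) are hypothetical; the point is that the maximal invariant M is unchanged by any rescaling g_{c}, i.e. by any change of time units, while the MLE of \lambda/\lambda_{0} is the unitless sample mean:

```python
# Hypothetical values for Example 11.
lambda0 = 100.0      # published mean time-to-failure (hr)
x = (80.0, 140.0)    # one periodic sample of size n = 2 (hr)

def M(x1, x2):
    """Maximal invariant: shares of the total x. = x1 + x2."""
    s = x1 + x2
    return (x1 / s, x2 / s)

# Invariance of M under a change of time units, e.g. hours -> minutes (c = 60):
c = 60.0
assert M(*x) == M(c * x[0], c * x[1])

# MLE of the unitless ratio lambda/lambda0 is the unitless sample mean:
mle = (x[0] / lambda0 + x[1] / lambda0) / 2
assert abs(mle - 1.1) < 1e-12
```

The same invariance holds for any c > 0, which is exactly the statement that inference based on M cannot depend on the units chosen for time.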

Invariance of statistical procedures under the action of transformation groups may be a necessity of modelling. For instance, consider the extension of Newton’s second law (Example 1) to the case of vector fields where velocity replaces speed and direction now plays a role. The statistical model for this extension may need to be invariant under changes of direction. In other cases, invariance may be required under permutations and monotone transformations. So in summary transformation groups may play an important role in both scientific and statistical modelling.

6.2 Nondimensionalization

This section presents a novel feature of this paper, the need for dimensional consistency combined with the nondimensionalization principle, that no model should depend on the units in which the data have been measured. Of particular note is the comparison of the strict application of Buckingham’s \pi-theory as described in Shen and Lin (2019) (Approach 1) with the one we are proposing (Approach 2). The comparison is best described in terms of a hypothetical example.

Example 12.

In a study of the magnitude of rainfall, the primary (repeating) variables are X_{1} and X_{2}, denoting the depth of the rain collected in a standardized cylinder and the duration of the rainfall, respectively. The third quantity, X_{3}, represents the magnitude of the rainfall as measured by an electronic sensor that computes a weighted average of X_{1} as a process over the continuous time period ending at time X_{2}. The dimensions of the three measurable quantities are [X_{1}]=L, [X_{2}]=T and [X_{3}]=LT^{-1}, the last being secondary. Thus the attribute-vector is the column vector

X=\left(\begin{array}{c}X_{1}\\ X_{2}\\ X_{3}\end{array}\right).

The attribute-space {\cal X} is the set of all possible values of X.

The scales and units of measurement are selected by the investigators. These could be changed by an arbitrary linear transformation

X\rightarrow{\bf C}^{3\times 3}X=[\mbox{diag}\{c_{1},c_{2},c_{3}\}]\,X,\quad c_{i}>0. (6.2)

But the dimension of X_{3} (L/T) is related to those of X_{1} (L) and X_{2} (T). This relationship must be taken into account when the scales of these dimensions are selected with their associated units of measurement.

To begin, an experiment is to be performed twice and all three attributes measured each time. The result will be a 3\times 2 data matrix

{\bf X}=\left(\begin{array}{cc}X_{11}&X_{12}\\ X_{21}&X_{22}\\ X_{31}&X_{32}\end{array}\right).

Thus, the sample space {\cal X}^{2} will be the set of all possible realizations of {\bf X}.

Now a predictive model is to be constructed in accordance with Buckingham’s desideratum that the model should not depend on the measurement system the experimenters happen to choose. Furthermore, dimensional consistency dictates that any changes in the measurements must be applied consistently to all the attributes. More precisely, the scales of measurement would require that c_{3}=c_{1}/c_{2} in the transformation matrix of Equation (6.2).

Approach 1 focuses on X, not {\bf X}, and is based on considering length and time as fundamental quantities; the primary attributes are X_{1} and X_{2}, with respective dimensions L and T. The predictand, X_{3}, must be nondimensionalized as a Buckingham \pi-function. Thus, we get

\pi_{3}=\frac{X_{2}X_{3}}{X_{1}}

for each one of the two sampled vectors. The primary variables are labelled \pi_{1}=\pi_{2}=1. We define the nondimensionalized attributes vector as

\pi({\bf X})=\left[\begin{array}{cc}1&1\\ 1&1\\ X_{21}X_{31}/X_{11}&X_{22}X_{32}/X_{12}\end{array}\right].

In other words, in Buckingham’s theory, the function that expresses the relationship among the variables for each of the two samples is

\pi_{3j}=u^{*}(\pi_{1j},\pi_{2j})\equiv K.

But the right hand side of this equation is a constant, which is not unreasonable since Buckingham’s model was intended to be deterministic. To deal with that issue we might adopt the ad hoc solution proposed by Shen in a different example (Shen, 2015, p. 17) by introducing a further variable, namely a model error \epsilon:

\pi_{3j}=K\exp(\epsilon_{3j}). (6.3)

Taking logarithms and fitting the resulting model yields an estimate \hat{K} of K.

In predictive form, for a future \omega without an electronic sensor for measuring rainfall, Equation (6.3) yields, after estimating K,

X_{3f}=\hat{K}\frac{X_{1f}}{X_{2f}}

where X_{1f} and X_{2f} are the depth and duration measurements. On the other hand, there are technical advantages to ignoring units of measurement, as is commonly done in developing and validating statistical models, as noted by the anonymous reviewer quoted in Section 1. In that case we would obtain

\{X_{3f}\}=\hat{K}\frac{\{X_{1f}\}}{\{X_{2f}\}}.
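The Approach 1 fit can be sketched on simulated data (all numbers below are assumed for illustration, not from the paper): since \pi_{3}=X_{2}X_{3}/X_{1} is modelled as K\exp(\epsilon), taking logarithms reduces estimation of K to averaging \log\pi_{3}, i.e. \hat{K} is the geometric mean of the observed \pi_{3}’s.

```python
import math
import random

# Simulate replicate (depth, duration, magnitude) triples under model (6.3).
random.seed(1)
K_true = 2.5
data = []
for _ in range(50):
    x1 = random.uniform(1.0, 10.0)           # depth (e.g. mm)
    x2 = random.uniform(0.5, 5.0)            # duration (e.g. hr)
    eps = random.gauss(0.0, 0.1)             # model error on the log scale
    x3 = K_true * (x1 / x2) * math.exp(eps)  # magnitude (mm/hr)
    data.append((x1, x2, x3))

# pi_3 = X2 * X3 / X1 is dimensionless; K-hat is its geometric mean.
pi3 = [x2 * x3 / x1 for x1, x2, x3 in data]
K_hat = math.exp(sum(math.log(p) for p in pi3) / len(pi3))

# Predict the magnitude of a future rainfall from depth and duration alone.
x1f, x2f = 6.0, 2.0
x3f_pred = K_hat * x1f / x2f
assert abs(K_hat - K_true) < 0.2  # K-hat recovers K up to simulation noise
```

Because \pi_{3} is unitless, \hat{K} is the same no matter what units the simulated depths and durations are recorded in, provided all three attributes are rescaled consistently.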
Remark 4.

A more formal approach would include the model error X_{0} in a nondimensional form so that \pi_{0}=X_{0}. Going through the steps above yields

\pi_{3j}=u^{*}(\pi_{0j},\pi_{1j},\pi_{2j}),\ j=1,2.

In contrast to Approach 1, our approach, Approach 2, treats the sample holistically, considering the whole data matrix {\bf X}, not just X. We nondimensionalize the problem by choosing as the primary variables X_{1j} and X_{2j}, j=1,2, although other choices are available. Let \hat{X}_{i}=(X_{i1}X_{i2})^{1/2}. We then form the \pi-functions

\pi_{1j}=\frac{X_{1j}}{\hat{X}_{1}},\quad\pi_{2j}=\frac{X_{2j}}{\hat{X}_{2}},\quad\pi_{3j}=\frac{X_{3j}\hat{X}_{2}}{\hat{X}_{1}},\quad j=1,2.

Then for each of the two samples we obtain

\pi_{3j}=u^{*}(\pi_{1j},\pi_{2j}),\ j=1,2.

In predictive form this result becomes

X_{3j}=\frac{\hat{X}_{1}}{\hat{X}_{2}}u^{*}(\pi_{1j},\pi_{2j}),\ j=1,2.

Suppose we take

u^{*}(\pi_{1j},\pi_{2j})=K\frac{\pi_{1j}}{\pi_{2j}},\ j=1,2

for some positive K. Then

X_{3j}=\frac{\hat{X}_{1}}{\hat{X}_{2}}u^{*}(\pi_{1j},\pi_{2j})=\frac{\hat{X}_{1}}{\hat{X}_{2}}\left(K\frac{\pi_{1j}}{\pi_{2j}}\right)=K\frac{X_{1j}}{X_{2j}}.

From the last result we obtain the model of Shen and Lin (2019)

\pi_{3j}=K,\ j=1,2.

However, the final choice for u^{*} could be dictated by an analysis of the data, an advantage of our holistic approach.

Finally we summarize our choice of \pi-functions as a maximal invariant

M({\bf X})=\left(\begin{array}{cc}\pi_{11}&\pi_{12}\\ \pi_{21}&\pi_{22}\\ \pi_{31}&\pi_{32}\end{array}\right).
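The Approach 2 \pi-functions can be checked numerically. In this sketch (the entries of {\bf X} are our own illustrative numbers), the maximal invariant is unchanged under any transformation with c_{3}=c_{1}/c_{2}, exactly as dimensional consistency requires:

```python
import math

# Hypothetical 3 x 2 sample matrix: depths, durations, magnitudes.
X = [[3.0, 12.0],   # X_{1j}, depth
     [2.0, 8.0],    # X_{2j}, duration
     [4.5, 4.0]]    # X_{3j}, magnitude

def pis(X):
    """Approach 2 pi-functions built from the geometric means X-hat_i."""
    g1 = math.sqrt(X[0][0] * X[0][1])  # X-hat_1
    g2 = math.sqrt(X[1][0] * X[1][1])  # X-hat_2
    return [[X[0][j] / g1 for j in range(2)],
            [X[1][j] / g2 for j in range(2)],
            [X[2][j] * g2 / g1 for j in range(2)]]

# A change of units, e.g. inches -> mm and hours -> minutes, with c3 = c1/c2.
c1, c2 = 25.4, 60.0
Xc = [[c1 * x for x in X[0]],
      [c2 * x for x in X[1]],
      [(c1 / c2) * x for x in X[2]]]

for row, row_c in zip(pis(X), pis(Xc)):
    for a, b in zip(row, row_c):
        assert math.isclose(a, b)  # the maximal invariant is unchanged
```

Any other positive pair (c_{1},c_{2}) gives the same agreement; a transformation with c_{3}\neq c_{1}/c_{2} would break it, which is why only the subgroup with c_{3}=c_{1}/c_{2} is admissible.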
Remark 5.

This example shows that Approach 2 can yield the same model as Approach 1, even though Approach 1 is designed for a single 3\times 1 attribute vector, unlike Approach 2, which starts with the entire 3\times n sample matrix (with n=2). This phenomenon will be investigated in future work.

Following Example 11, we can formalize the creation of orbits, \pi-functions and so on in terms of a transformation group G=\{g_{c_{1},c_{2},c_{3}}:\ c_{i}>0,\ i=1,2,3\} acting on the attribute-vector X. A subgroup G^{*} obtains by restricting c_{3}=c_{1}/c_{2}.

We are now prepared to move to the general case and a generalization of the concepts seen in this example.

6.3 Invariant statistical models

This subsection builds on Subsections 6.1 and 6.2 to obtain a generalized version of Buckingham’s Pi-theory. This means transforming the scales of each of the so-called primary attributes X_{1},\dots,X_{q}, which leads ineluctably to transforming the scales of the remaining, secondary attributes X_{q+1},\dots,X_{p}. Models like that in Equation (5.2) must reflect that link. In this subsection, we extend Buckingham’s idea beyond changes of scale by considering the application of a general transformation of the attribute-scales. Our approach assumes a sample of attribute vectors. When prediction is the ultimate goal of inference, as in Example 1, our inferential aim is to construct a model as expressed in Equation (5.4).

Sample space.

We consider a sample of n possibly dependent attribute-vectors collected from a sample of \omega’s from the population \Omega. The sample matrix, {\bf X}^{p\times n}, is partitioned to reflect the primary and secondary attributes as follows.

{\bf X}=\left(\begin{array}{ccc}X_{11}&\dots&X_{1n}\\ \vdots&\ddots&\vdots\\ X_{p1}&\dots&X_{pn}\end{array}\right)\equiv\left(\begin{array}{c}{\bf X}_{1}^{q\times n}\\ {\bf X}_{2}^{(p-q)\times n}\end{array}\right)\equiv\left(\begin{array}{c}{\bf X}_{1}\\ {\bf X}_{2}\end{array}\right). (6.4)

Let {\cal X}_{j} denote the set of all possible values of {\bf X}_{j}, j=1,2. We define a group G^{*} of transformations on {\cal X}_{1}\times{\cal X}_{2} through the following theorem. Each transformation is first defined on {\cal X}_{1}, with an extension to {\cal X}_{2} that yields unit consistency.

Theorem 2.

Let G_{1} be a group of transformations on {\cal X}_{1} with identity element e_{1}. Assume the following.

  1. There exists a function H defined on {\cal X}_{1}\times{\cal X}_{2} so that H({\bf X}_{1},{\bf X}_{2}) is always unitless.

  2. For each g\in G_{1}, there exists a \tilde{g}_{g}:{\cal X}_{2}\to{\cal X}_{2} with H(g({\bf X}_{1}),\tilde{g}_{g}({\bf X}_{2}))=H({\bf X}_{1},{\bf X}_{2}) for all {\bf X}_{1}\in{\cal X}_{1} and {\bf X}_{2}\in{\cal X}_{2}.

  3. For all {\bf X}_{2}\in{\cal X}_{2}, \tilde{g}_{e_{1}}({\bf X}_{2})={\bf X}_{2}.

  4. For all g_{1},g_{2}\in G_{1}, \tilde{g}_{g_{1}\circ g_{2}}=\tilde{g}_{g_{1}}\circ\tilde{g}_{g_{2}}.

Let G^{*} be the set of all transformations from {\cal X}_{1}\times{\cal X}_{2} to {\cal X}_{1}\times{\cal X}_{2} of the form

g^{*}({\bf X}_{1},{\bf X}_{2})=\left(g({\bf X}_{1}),\tilde{g}_{g}({\bf X}_{2})\right),\quad g\in G_{1}.

Then G^{*} is a group under composition.

Proof.

To show that G^{*} is closed under composition, let g_{1}^{*} and g_{2}^{*}, both in G^{*}, be associated with, respectively, g_{1} and g_{2}, both in G_{1}. Then

(g_{1}^{*}\circ g_{2}^{*})({\bf X}_{1},{\bf X}_{2}) = g_{1}^{*}\left(g_{2}^{*}({\bf X}_{1},{\bf X}_{2})\right)=g_{1}^{*}\left(g_{2}({\bf X}_{1}),\tilde{g}_{g_{2}}({\bf X}_{2})\right)
= \left(g_{1}(g_{2}({\bf X}_{1})),\tilde{g}_{g_{1}}(\tilde{g}_{g_{2}}({\bf X}_{2}))\right)=\left((g_{1}\circ g_{2})({\bf X}_{1}),(\tilde{g}_{g_{1}}\circ\tilde{g}_{g_{2}})({\bf X}_{2})\right)
= \left((g_{1}\circ g_{2})({\bf X}_{1}),\tilde{g}_{g_{1}\circ g_{2}}({\bf X}_{2})\right)

by Assumption 4. So g_{1}^{*}\circ g_{2}^{*} is associated with g_{1}\circ g_{2}\in G_{1}. We easily see that H((g_{1}\circ g_{2})({\bf X}_{1}),\tilde{g}_{g_{1}\circ g_{2}}({\bf X}_{2}))=H({\bf X}_{1},{\bf X}_{2}), and so g_{1}^{*}\circ g_{2}^{*} is in G^{*}. Clearly, the identity element of G^{*} is given by e({\bf X}_{1},{\bf X}_{2})\equiv(e_{1}({\bf X}_{1}),\tilde{g}_{e_{1}}({\bf X}_{2})), which equals ({\bf X}_{1},{\bf X}_{2}) by the definition of e_{1} and by Assumption 3. The inverse of g^{*}\in G^{*} is easily found: if g^{*}({\bf X}_{1},{\bf X}_{2})=(g({\bf X}_{1}),\tilde{g}_{g}({\bf X}_{2})), then (g^{*})^{-1}({\bf X}_{1},{\bf X}_{2})=(g^{-1}({\bf X}_{1}),\tilde{g}_{g^{-1}}({\bf X}_{2})). ∎

Illustrating the Theorem via Example 12, we have

{\bf X}_{1}=\left(\begin{array}{cc}X_{11}&X_{12}\\ X_{21}&X_{22}\end{array}\right)\quad{\rm and}\quad{\bf X}_{2}=\left(\begin{array}{cc}X_{31}&X_{32}\end{array}\right).

G_{1} has members

g_{c_{1},c_{2}}({\bf X}_{1})=\left(\begin{array}{cc}c_{1}X_{11}&c_{1}X_{12}\\ c_{2}X_{21}&c_{2}X_{22}\end{array}\right).

One choice for the function H is

H({\bf X}_{1},{\bf X}_{2})=\left(X_{31}X_{21}/X_{11},\ X_{32}X_{22}/X_{12}\right).

For each g_{c_{1},c_{2}}\in G_{1}, we see that \tilde{g}_{g_{c_{1},c_{2}}}({\bf X}_{2})=(c_{1}/c_{2}){\bf X}_{2}. We also see that \tilde{g}_{e_{1}}({\bf X}_{2})={\bf X}_{2}. The set G^{*} consists of transformations of the form

g^{*}_{c_{1},c_{2}}({\bf X}_{1},{\bf X}_{2})=\left(\begin{array}{cc}c_{1}X_{11}&c_{1}X_{12}\\ c_{2}X_{21}&c_{2}X_{22}\\ (c_{1}/c_{2})X_{31}&(c_{1}/c_{2})X_{32}\end{array}\right).

We easily see that \tilde{g}_{g_{1}\circ g_{2}}=\tilde{g}_{g_{1}}\circ\tilde{g}_{g_{2}}. Therefore, G^{*} is a group.
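For the rainfall illustration, the group properties asserted by the Theorem can also be verified directly in code; the following sketch (with arbitrary positive constants of our own choosing) checks closure under composition and the invariance of H along one column of the sample:

```python
import math

def g_star(c1, c2, X):
    """Action of g*_{c1,c2} on one attribute column (x1, x2, x3)."""
    x1, x2, x3 = X
    return (c1 * x1, c2 * x2, (c1 / c2) * x3)

def H(X):
    """Unitless by construction: [x3][x2]/[x1] = (L/T) T / L = 1."""
    x1, x2, x3 = X
    return x3 * x2 / x1

X = (3.0, 2.0, 4.5)                  # illustrative depth, duration, magnitude
c1, c2, d1, d2 = 2.0, 5.0, 0.5, 4.0  # arbitrary positive rescalings

# Closure: composing g*_{c1,c2} with g*_{d1,d2} equals g*_{c1 d1, c2 d2}.
composed = g_star(c1, c2, g_star(d1, d2, X))
direct = g_star(c1 * d1, c2 * d2, X)
assert all(math.isclose(a, b) for a, b in zip(composed, direct))

# H is invariant under every element of G*.
assert math.isclose(H(composed), H(X))
```

The identity (c_{1}=c_{2}=1) and inverse (1/c_{1}, 1/c_{2}) checks work the same way, confirming the group structure in this special case.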


Thus, by the Theorem, given a group of transformations G_{1} on the primary attributes in the sample, we can construct a group G^{*} of transformations on all attributes, and we can write

{\cal X}_{1}\times{\cal X}_{2}=G^{*}\times({\cal X}_{1}\times{\cal X}_{2})/G^{*}.

Orbits will be indexed by \gamma\in\Gamma and \pi({\bf X}) will denote a maximal invariant under the action of G^{*}. Let

\pi({\bf X})=(\pi_{ij}({\bf X}))^{p\times n}.

Therefore, by the statistical invariance principle, acceptable randomized decision rules, which include equivariant estimators as a special case, depend on {\bf X} only through the maximal invariant. We obtain the Buckingham \pi-functions as a special case where, in particular, the attributes are assessed on ratio-scales. Note that the \pi-functions obtained in this way are not unique.

The maximal invariant’s distribution.

Suppose that the distribution of 𝐗{\bf X} is in the collection of probability distributions, 𝒫={Pn,𝝀,𝝀Λ}{\cal P}=\{P_{n,\bm{\lambda}},\leavevmode\nobreak\ \bm{\lambda}\in{\Lambda}\}. Assume, for all gGg\in G^{*}, the distribution of g(𝐗)g({\bf X}) is also contained in 𝒫{\cal P}. More precisely assume that for each gGg\in G^{*}, there is a one-to-one transformation g¯\bar{g} of Λ{\Lambda} onto Λ{\Lambda} such that 𝐗{\bf X} has distribution Pn,𝝀P_{n,\bm{\lambda}} if and only if g(𝐗)g({\bf X}) has distribution Pn,g¯(𝝀)P_{n,\bar{g}(\bm{\lambda})}. Assume further that the set G¯\bar{G}^{*} of all g¯\bar{g} is a transformation group under composition, with identity e¯\bar{e}. Assume also that G¯\bar{G}^{*} is homomorphic to GG^{*}, i.e. that there exists a one-to-one mapping hh from GG^{*} onto G¯\bar{G}^{*} such that, for all g,gGg,g^{*}\in G^{*}, h(gg)=h(g)h(g)h(g\circ g^{*})=h(g)\circ h(g^{*}); h(e)=e¯h(e)=\bar{e}, and h(g1)={h(g)}1h(g^{-1})=\{h(g)\}^{-1}.

Let π1\pi^{-1} denote the set inverse, that is, π1(C)={X𝒳n\pi^{-1}(C)=\{{\textbf{X}}\in{\cal X}^{n} : π(X)C}\pi({\textbf{X}})\in C\}. Then since π(X)=π(g(X))\pi({\textbf{X}})=\pi(g({\textbf{X}})) for any gGg\in G^{*}, for all gGg\in G^{*} and 𝝀Λ\bm{\lambda}\in{\Lambda},

Pn,𝝀[π(X)B]\displaystyle P_{n,\bm{\lambda}}[\pi({\textbf{X}})\in B] =\displaystyle= Pn,𝝀[π(g(X))B]\displaystyle P_{n,\bm{\lambda}}[\pi(g({\textbf{X}}))\in B]
=\displaystyle= Pn,𝝀[g(X)π1(B)]\displaystyle P_{n,\bm{\lambda}}[g(\textbf{X})\in\pi^{-1}(B)]
=\displaystyle= Pn,g¯(𝝀)[Xπ1(B)]\displaystyle P_{n,\bar{g}(\bm{\lambda})}[\textbf{X}\in\pi^{-1}(B)]
=\displaystyle= Pn,g¯(𝝀)[π(X)B].\displaystyle P_{n,\bar{g}(\bm{\lambda})}[\pi({\textbf{X}})\in B].

Thus, any $\bm{\lambda}^{*}$ ``connected to'' $\bm{\lambda}$ via some $\bar{g}\in\bar{G}^{*}$ induces the same distribution on $\pi(\textbf{X})$. This implies that $\upsilon(\bm{\lambda})\doteq P_{\bm{\lambda}}[\pi(\textbf{X})\in B]$ is invariant under transformations in $\bar{G}^{*}$ and hence that $\upsilon(\bm{\lambda})$ depends on $\bm{\lambda}$ only through a maximal invariant on $\Lambda$. We denote that maximal invariant by $\bm{\pi}_{\bm{\lambda}}$. Finally, we relabel the distribution of $\pi(\textbf{X})$ under $\bm{\lambda}$ (and under all of the associated $\bm{\lambda}^{*}$'s) by $P_{\bm{\pi}_{\bm{\lambda}}}$.

The actions of the group $G^{*}$ have nondimensionalized $\mathbf{X}$ as $\mathbf{X}\rightarrow\pi(\textbf{X})$. Thus we obtain a stochastic version of the Pi-theorem. More precisely, using the general notation $[\mathbf{U}]$ to represent ``the distribution of'' any random object $\mathbf{U}$, we have the nondimensionalized conditional distribution of the nondimensionalized predictand from sample $j$, given the transformed predictors of all samples, as

$[\pi_{pj}\mid\pi_{1:(p-1),1:n},\ \bm{\pi}_{\bm{\lambda}}].$ (6.5)

More specifically, we have derived the result seen in Equation (6.5), which is the conditional distribution assumed in a special case by Shen (2015) in his Assumption 2. Furthermore, we predict $X_{p}$ by its conditional expectation, using the distribution in Equation (6.5), which can be derived once the joint distribution of the attributes has been specified. The conditional variance would express the predictor's uncertainty. Hence statistical invariance implies that information about the variables can be summarized by maximal invariants in the sample space and in the parameter space.
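To fix ideas, the reduction of $\mathbf{X}$ to $\pi(\mathbf{X})$ for ratio-scale attributes can be illustrated with a small numerical sketch (all names and data below are hypothetical): a $\pi$-function built from ratios to a chosen primary variable is unchanged by the action of the scale group, which is the defining property of the maximal invariant used above.

```python
# Sketch: for ratio-scale data, coordinate-wise ratios to a chosen
# "primary" variable are invariant under the scale group g_c(x) = c*x.
def pi(x):
    # pi-function: nondimensionalize by the first (primary) coordinate
    return tuple(xi / x[0] for xi in x[1:])

def g(c, x):
    # action of the scale group: rescale every coordinate by c > 0
    return [c * xi for xi in x]

x = [2.0, 6.0, 10.0]            # one hypothetical attribute vector
assert pi(x) == (3.0, 5.0)      # the dimensionless pi-functions
assert pi(g(4.0, x)) == pi(x)   # pi is invariant under the group action
```

The same check fails for an arbitrary affine map, which is why interval-scales require the separate treatment given below.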

6.4 Interval-scales

Returning to Equation (5.2), recall that underlying the Buckingham Pi-theorem are $p$ variables that together describe a natural phenomenon through the relationship expressed in that equation. The Pi-theorem assumes that $q$ of these variables are designated as the repeating or primary variables, while the remainder, which are secondary, have scales of measurement that involve the dimensions of the primary variables. It is the latter that are converted to the $\pi$-functions in the Buckingham theorem. But as we have seen in Subsection 6.3, it is these same variables that together yield the maximal invariant under the actions of a suitably chosen group, which was fairly easily identified in the case of ratio-scales.

Subsection 6.3 provides the bridge between the statistical invariance principle and the deterministic modelling theories described in Section 5, i.e., those developed in the physical sciences, where ratio-scales are appropriate. Appendix C develops a similar bridge to such models in the social sciences, where quantities on interval-scales are involved. For such quantities, the allowable transformations extend from simple scale transformations to affine transformations. Examples of such quantities can be found in Kovera (2010): the Intelligence Quotient (IQ), Scholastic Assessment Test (SAT), Graduate Record Examination (GRE), Graduate Management Admission Test (GMAT), and Miller Analogies Test (MAT). Models for such quantities might involve variables measured on a ratio-scale as well. Since much of the development parallels that in Subsection 6.3, we omit many of the details for brevity; those that we do provide are in Appendix C.

6.5 Extending invariance to random effects and the Bayesian paradigm

This section extends the previous sections to incorporate random effects and the Bayesian paradigm. Its foundations lie in statistical decision theory as sketched in Appendix D. Here the action that is a component of decision theory is prediction based on a prediction model as in Equation (5.4). A training sample of $n$ attribute vectors of length $p$ provides data for building the prediction model. Thus the predictors and predictand are observed for each of $n$ sampled $\omega$'s to yield the random sample's $p\times n$ matrix $\mathbf{X}$ seen in Equation (6.4), which we denote $\mathbf{X}^{training}$. Given a future, $(n+1)$st attribute $p$-vector $X^{future}$, the goal is the prediction of its $p$th component, $X^{future}_{p}$, based on observations of its first $p-1$ components $X^{future}_{1:(p-1)}$, all within a Bayesian framework with an appropriate extension of the framework presented in earlier sections. The situation is the one confronting the analyst who must fit a regression model based on $n$ data points and then predict a response given only the future predictors. We let $\mathbf{X}^{sample}$ denote the combination of the current data matrix $\mathbf{X}^{training}$ and the future data vector $X^{future}$, making $\mathbf{X}^{sample}$ a $p\times(n+1)$ matrix in ${\cal X}$.

The sampling distribution of $\mathbf{X}^{sample}$ is determined conditional on the random parameters $\bm{\lambda}\in\Lambda$. That means specifying $\bm{\lambda}$'s prior distribution, which in turn is conditional on the set of (specified) hyperparameters $\bm{\phi}\in\Phi$.

To extend the invariance principle requires, in addition to the structures described above, an action space ${\cal A}$, that is, the space of possible predictions of the future missing observation, a prior distribution on the parameter space, and a loss function, all of which remain to be specified. We also require transformation groups for ${\cal A}$ and $\Phi$, in addition to the specified transformation groups for ${\cal X}$ and $\Lambda$. In summary, we have the homomorphically related transformation groups $G^{*},\bar{G},\hat{G}$ and $\tilde{G}$ acting on, respectively, ${\cal X},\Lambda,{\cal A}$ and $\Phi$. The extended invariance principle then reduces points in these four spaces to their maximal invariants, i.e. $\pi$-functions, which can be used to index the orbits induced by their respective groups. Assuming a convex loss function, the Bayes predictor in this reduced problem is a nonrandomized decision rule leading to an action in ${\cal A}$. Each of the spaces ${\cal X},\Lambda,{\cal A},\Phi$ can (subject to regularity conditions) be represented in the form

$W=H_{g}\times W/H_{g}$

for the appropriate transformation group $H_{g}$ (Zidek, 1969). The corresponding maximal invariants can be expressed as matrices:

$\pi^{sample},\ \pi^{parameter},\ \pi^{action}\ {\rm and}\ \pi^{hyperparameter}.$

Finally using square brackets to represent the distributions involved, we get the predictive distribution of interest conditional on quantities we know:

$[\pi_{p,n+1}^{sample}\ \mid\ \pi_{1:(p-1),n+1}^{sample},\ \pi_{1:p,1:n}^{sample},\ \pi^{hyperparameter}].$ (6.6)

To fix ideas, we sketch an application in the following example, where we take advantage of the sufficiency and ancillarity principles to simplify the application of the invariance principle.

Example 13.

Assume the vector of observable attributes, $X^{5\times 1}$, is normally distributed, conditional on the mean $\mu$ and covariance matrix $\Sigma$. We will sometimes parameterize $\Sigma$ in terms of the diagonal matrix of standard deviations $\sigma=\mbox{diag}\{\sigma_{1},\dots,\sigma_{5}\}$ and the correlation matrix $\rho$, with $\Sigma=\sigma\rho\sigma$. Therefore, the parameters are $\bm{\lambda}=\{\mu,\rho,\sigma\}$ and, conditional on $\bm{\lambda}$, $X^{5\times 1}\sim N_{5}(\mu,\sigma\rho\sigma)$. In practice, $X_{5}$, the fifth of these observable attributes, is difficult to assess, leading to the idea of making $X_{5}$ a predictand and the remaining four attributes predictors. All attributes lie on an interval-scale, so a conventional approach would seem to be straightforward: multivariate regression analysis. Simply collect a training sample of $n$ independent vectors and fit a regression model for the intended purpose.

Complications arise due to the varying dimensions on which these attributes are to be measured. That in turn leads to different scales and different units of measurement, depending on how they are to be measured. That would not pose a problem for the unconscious statistician, who might simply ignore the units. A better approach would be that suggested by Faraway (Faraway (2015), p. 103), namely to rescale the measurements in a thoughtful way to eliminate those units (see also Section 7). However, neither of those approaches deals with the rigid structural issue imposed by the need for dimensional consistency. That is, the units of measurement for the $X_{i}$'s are respectively $u_{i},\ i=1,\dots,5$, with $u_{4}$ and $u_{5}$ constrained to be $u_{2}u_{1}^{-1}$ and $u_{3}u_{2}^{-1}$, respectively. To overcome the problem, Buckingham's Pi-theorem suggests itself. Thus we might use $X_{1:3}$ as primary variables to nondimensionalize $X_{4:5}$. But that does not work either, since our variables lie on interval-scales with 0's as conceptually possible values in the appropriate units. That is, $\pi$-functions, as simple ratios of these variables, cannot be constructed directly to nondimensionalize the attribute measurements. So ultimately we turn to the statistical invariance principle to solve the problem. The relevant transformation groups are described in what follows.

The first step creates the training set of $n$ random vectors of attribute measurements, recorded in the $5\times n$ matrix $\mathbf{X}^{training}$. Letting $x_{\cdot j}$ denote the $j$th column of a realization of $\mathbf{X}^{training}$, $\bar{x}=n^{-1}\sum_{j=1}^{n}x_{\cdot j}$, the sample mean, and $\mathbf{S}=\sum_{j=1}^{n}(x_{\cdot j}-\bar{x})(x_{\cdot j}-\bar{x})^{T}$, the sample sum of squares, the likelihood function is

\begin{eqnarray*}
L(\bm{\lambda}) &\propto& \mid\Sigma\mid^{-n/2}\exp\left(-\sum_{j=1}^{n}(x_{\cdot j}-\mu)^{T}\Sigma^{-1}(x_{\cdot j}-\mu)/2\right)\\
&=& \mid\Sigma\mid^{-n/2}\exp\left(-{\rm tr}\,(\Sigma^{-1}\mathbf{S})/2\right)\\
&&\times\exp\left(-(\mu-\bar{x})^{T}(\Sigma/n)^{-1}(\mu-\bar{x})/2\right).
\end{eqnarray*}

Conditional on $\mu$ and $\Sigma$, we may invoke the sufficiency principle and replace the training matrix $\mathbf{X}^{training}$ with its sufficient statistics

$\mathbf{X}^{suff}=(\bar{X}^{5\times 1},\ \mathbf{S}^{5\times 5}),$ (6.7)

i.e., the matrix whose first column consists of the sample (row) means of $\mathbf{X}^{training}$ and the last five columns contain $\mathbf{S}$. Thus we may estimate the covariance by $\hat{\Sigma}$, factored as

$\hat{\Sigma}=\hat{\sigma}\hat{\rho}\hat{\sigma}.$

Here $\hat{\sigma}$ denotes the diagonal matrix of estimates of the population standard deviations of the five attributes. Furthermore, $\hat{\rho}$ denotes the estimate of the matrix of correlations between the random attributes. It is invariant under changes of scale and transformations of their origins. Furthermore, these quantities would be independent given the parameters.
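As a numerical sketch of this sufficiency reduction (hypothetical data throughout; numpy is assumed available), the factorization $\hat{\Sigma}=\hat{\sigma}\hat{\rho}\hat{\sigma}$ and the scale-and-origin invariance of $\hat{\rho}$ just noted can be checked as follows:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 5
X = rng.normal(size=(p, n))        # hypothetical training matrix, rows = attributes

xbar = X.mean(axis=1)              # sample mean vector
D = X - xbar[:, None]
S = D @ D.T                        # sample sum of squares
Sigma_hat = S / n                  # covariance estimate
sd = np.sqrt(np.diag(Sigma_hat))
sigma_hat = np.diag(sd)            # diagonal matrix of estimated sd's
rho_hat = np.diag(1 / sd) @ Sigma_hat @ np.diag(1 / sd)  # correlation estimate

# The factorization recovers Sigma_hat:
assert np.allclose(sigma_hat @ rho_hat @ sigma_hat, Sigma_hat)

# rho_hat is invariant under x -> C(x + b); here C respects the
# dimensional-consistency constraints c4 = c2/c1 and c5 = c3/c2,
# though the correlation is invariant for any positive diagonal C.
C = np.diag([2.0, 3.0, 5.0, 3.0 / 2.0, 5.0 / 3.0])
b = rng.normal(size=p)
Y = C @ (X + b[:, None])
Dy = Y - Y.mean(axis=1)[:, None]
Sy = Dy @ Dy.T / n
sdy = np.sqrt(np.diag(Sy))
rho_y = np.diag(1 / sdy) @ Sy @ np.diag(1 / sdy)
assert np.allclose(rho_y, rho_hat)
```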

Turning to the Bayesian layer, we will adopt a conjugate prior (Gelman et al., 2014) for the illustrative purposes of this example, with hyperparameters $\bm{\phi}=\{\mu_{H},\Sigma_{H},B\}$:

\begin{eqnarray}
[\,\mu\mid\Sigma\,] &=& N_{p}(\mu_{H},\Sigma/B) \qquad (6.8)\\
{[}\,\Sigma\,] &=& Inv\text{-}Wishart_{B}((B\Sigma_{H})^{-1}). \qquad (6.9)
\end{eqnarray}

We specify the hyperparameters by equating prior knowledge with a hypothetical sample of $\omega$'s and their associated attribute vectors. We will add a superscript $h$ on quantities below to indicate their hypothetical nature. Thus the hypothetical sample is of size $n^{h}$, with a likelihood derived from a prior sample with $p\times n^{h}$ matrix $\mathbf{X}^{h}$, sample mean $\bar{x}^{h}$ and sample sum of squares $\mathbf{S}^{h}$. Thus we obtain the hypothetical likelihood for $\mu$ and $\Sigma$, given the independence of $\mathbf{S}^{h}$ and $\bar{x}^{h}$:

\begin{eqnarray*}
L_{n^{h}}(\mu,\Sigma) &\propto& \mid\Sigma\mid^{-n^{h}/2}\exp\left(-{\rm tr}\,\left(\Sigma^{-1}\mathbf{S}^{h}\right)/2\right)\\
&&\times\exp\left(-(\mu-\bar{x}^{h})^{T}(\Sigma/n^{h})^{-1}(\mu-\bar{x}^{h})/2\right).
\end{eqnarray*}

Finally, complement the hypothetical likelihood with a noninformative improper prior on $\Sigma$ with density proportional to $\mid\Sigma\mid^{-(d+1)/2}$, where $d>p$, to obtain the specification of the prior distribution. We take the hyperparameters for the prior distributions in Equations (6.8) and (6.9) to be $\mu_{H}=\bar{x}^{h}$, $\Sigma_{H}=\mathbf{S}^{h}/B$ and $B=n^{h}$. That completes the construction of the prior.

For our prediction problem involving a future $X^{future}$, we will use the posterior distribution of $\mu$ and $\Sigma$ based on the training data $\mathbf{X}^{training}$ via the sufficient statistics $\mathbf{X}^{suff}$. To get the posterior distributions for $\mu$ and $\Sigma$ entails taking the product of the prior density as determined above, based on the hypothetical sample, with the actual likelihood.

This determines a posterior density of $\bm{\lambda}$. Thus the covariance matrix associated with the posterior distribution of $\mu$ would depend on both the sample's sum of squares and the hypothetical sample's sum of squares matrix as

$$\mathbf{S}^{posterior}=\mathbf{S}+\mathbf{S}^{h}+\frac{(\bar{x}-\bar{x}^{h})(\bar{x}-\bar{x}^{h})^{T}}{1/n+1/n^{h}}.$$

In other words, $\mathbf{S}^{posterior}$ would replace $\mathbf{S}^{h}$ to get us from the inverted Wishart prior distribution of $\Sigma$ to its posterior distribution. Moreover, the degrees of freedom would increase to $n+n^{h}$ to reflect the larger sample size. We omit further details since our primary interest lies in the reduced model obtained by applying the invariance principle, a reduction to which we now turn.
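A sketch of this conjugate update (the function name and all numbers below are hypothetical; numpy is assumed) shows how the training and hypothetical sums of squares combine:

```python
import numpy as np

def S_posterior(S, S_h, xbar, xbar_h, n, n_h):
    """Posterior sum-of-squares matrix, combining the training sample's
    (S, xbar, n) with the hypothetical prior sample's (S_h, xbar_h, n_h)."""
    d = (xbar - xbar_h).reshape(-1, 1)
    return S + S_h + (d @ d.T) / (1.0 / n + 1.0 / n_h)

# Toy 2x2 illustration:
S = np.array([[4.0, 1.0], [1.0, 3.0]])
S_h = np.array([[2.0, 0.5], [0.5, 2.0]])
xbar, xbar_h = np.array([1.0, 2.0]), np.array([1.0, 2.0])

# When the two sample means agree, the mean-discrepancy term vanishes:
assert np.allclose(S_posterior(S, S_h, xbar, xbar_h, n=10, n_h=5), S + S_h)

# In general the result stays symmetric, as a sum-of-squares matrix must:
out = S_posterior(S, S_h, xbar, np.array([0.0, 1.0]), n=10, n_h=5)
assert np.allclose(out, out.T)
```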

We now turn to the construction of the transformation groups needed to implement the invariance principle. All transformations of data derive from transforming an attribute vector $X$ as follows:

$x\rightarrow\mathbf{C}\,(x+b)$

for any possible realization $x$ of $X$, where the coordinates of $b^{5\times 1}$ have the same units as the corresponding coordinates of $x$, and $\mathbf{C}=\mbox{diag}\{c_{1},c_{2},c_{3},c_{4}^{*},c_{5}^{*}\}$, where $c_{4}^{*}=c_{2}c_{1}^{-1}$ and $c_{5}^{*}=c_{3}c_{2}^{-1}$ to ensure dimensional consistency. The same $\mathbf{C}$ and $b$ are used to transform all data vectors, and so the transformation of the sufficiency-reduced matrix $\mathbf{X}^{suff}$ in Equation (6.7) and $X^{future}$ is as follows:

$$g_{b,\mathbf{C}}(\bar{x},\mathbf{s},x^{future})=\left(\mathbf{C}\,(\bar{x}+b),\ \mathbf{C}\,\mathbf{s}\,\mathbf{C},\ \mathbf{C}\,(x^{future}+b)\right). \qquad (6.10)$$

We can study orbits and maximal invariants by considering the transformation of $\mathbf{X}^{suff}$ separately from the transformation of $X^{future}$.

Consider the decomposition of $\mathbf{X}^{suff}$ into orbits. To index those orbits we first determine a maximal invariant. We do this in two ways, to make a point: first, we ignore dimensional consistency and then we include it. To begin, for both ways we set $b=-\bar{x}$ to transform $\bar{x}$ to $\mathbf{0}\,[\bar{x}_{n}]$, where $[\bar{x}_{n}]$ denotes the vector of units of $\bar{x}_{n}$. This means that all the points in an orbit can be reached from the new origin by choosing $b$ to be the sample average. Next we observe that we may estimate the population covariance for the attribute vector by $\mathbf{s}/n=\hat{\Sigma}=\hat{\sigma}\hat{\rho}\hat{\sigma}$. So the matrix $\mathbf{C}$ in Equation (6.10), in effect, acts on the diagonal matrix $\hat{\sigma}$. If no restrictions were placed on $\mathbf{C}$'s fourth and fifth diagonal elements, then the orbits could simply be indexed by $\hat{\rho}$. In short, the maximal invariant could be defined by $M(\bar{x},\mathbf{s})=(\mathbf{0},\hat{\rho})$. But dimensional consistency does not allow that choice of $\mathbf{C}$. Instead it imposes the structural requirement that we use the $c_{4}^{*}$ and $c_{5}^{*}$ specified above. The modified transformation would then act on $\hat{\sigma}$ as follows:

$$\hat{\sigma}\rightarrow\hat{\sigma}^{*}=\mbox{diag}\{1,1,1,\hat{\sigma}_{1}\hat{\sigma}_{4}\hat{\sigma}_{2}^{-1},\hat{\sigma}_{2}\hat{\sigma}_{5}\hat{\sigma}_{3}^{-1}\}.$$

The result would mean that the changes in $\sigma_{4}$ and $\sigma_{5}$ would be cancelled out by the changes in the transformations of the first three $\sigma$'s. The maximal invariant would then be dimensionless, as required. It would make the maximal invariant for the sufficiency-reduced sample space

$$\pi^{suff}=(\mathbf{0},\hat{\sigma}^{*}\hat{\rho}\hat{\sigma}^{*}), \qquad (6.11)$$

not $(\mathbf{0},\hat{\rho})$.
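The role of the dimensional-consistency constraint can be verified numerically in a small sketch (hypothetical values; numpy assumed): under the constrained subgroup with $c_{4}^{*}=c_{2}c_{1}^{-1}$ and $c_{5}^{*}=c_{3}c_{2}^{-1}$, the matrix $\hat{\sigma}^{*}\hat{\rho}\hat{\sigma}^{*}$ of Equation (6.11) is unchanged, whereas an unconstrained rescaling of the last two coordinates would alter it.

```python
import numpy as np

def sigma_star(s):
    # s: the five estimated standard deviations; returns the diagonal
    # matrix diag{1, 1, 1, s1*s4/s2, s2*s5/s3} of Equation (6.11)
    return np.diag([1.0, 1.0, 1.0, s[0] * s[3] / s[1], s[1] * s[4] / s[2]])

def pi_suff(s, rho):
    return sigma_star(s) @ rho @ sigma_star(s)

s = np.array([1.2, 0.7, 3.0, 5.0, 0.4])   # hypothetical sd estimates
M = np.arange(25.0).reshape(5, 5)
rho = (M + M.T) / 50.0                    # stand-in for rho_hat
np.fill_diagonal(rho, 1.0)                # (rho itself is scale-free)

c1, c2, c3 = 2.0, 8.0, 0.5
C = np.array([c1, c2, c3, c2 / c1, c3 / c2])   # dimensionally consistent
# Under x -> C(x + b) the sd's transform as s -> C*s; pi_suff is invariant:
assert np.allclose(pi_suff(C * s, rho), pi_suff(s, rho))

# An unconstrained choice of the last two multipliers breaks invariance:
C_bad = np.array([c1, c2, c3, 7.0, 7.0])
assert not np.allclose(pi_suff(C_bad * s, rho), pi_suff(s, rho))
```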

Remark 6.

The Buckingham theory concerned attributes measured on a ratio-scale. Were that the case in this example, we could have used the primary and secondary attributes differently. More precisely, we could have let $c_{i}=X_{i}^{-1},\ i=1,2,3$, $c_{4}=X_{1}X_{4}X_{2}^{-1}$ and $c_{5}=X_{2}X_{5}X_{3}^{-1}$. The result would eliminate the first three variables while achieving the primary objective of nondimensionalizing the model. We have achieved this goal using the standard deviations instead. But the method suggested here could also be used for scales other than interval, a subject of current research.

The corresponding maximal invariant in the parameter space $\Lambda=\{(\mu,\Sigma)\}$ would be identical to that in Equation (6.11), albeit with the hats removed, to get

$$\pi^{parameter}=(\mathbf{0},\sigma^{*}\rho\sigma^{*}).$$

Observe that the ratios of the $\sigma$'s with and without hats would be unitless and hence ancillary quantities, thus independent of the sufficient statistics by Basu's theorem (Basu, 1958); hence the maximal invariants can be constructed from them. Finally, for the hyperparameter space we would obtain the analogous result:

$$\pi^{hyperparameter}=(\mathbf{0},\sigma_{0}^{*}\rho_{0}\sigma_{0}^{*}).$$

We can now compute the posterior distribution

$$[\,\pi^{parameter}\ \mid\ \pi^{suff},\pi^{hyperparameter}\,]$$

but we skip the details for brevity.

We now come to the principal objective of this example, namely a model for predicting a future but as yet unobserved value of the predictand, $X^{future}_{5}$, based on the future covariate vector $X^{future}_{1:4}$. The data in $\mathbf{X}^{suff}$ are used as the sufficiency-reduced training sample. As well, we assume that, given the parameters of the sampling distribution, a future attribute vector $X^{future}$ is normal with mean $\mu$ and covariance matrix $\Sigma$, and is conditionally independent of $\mathbf{X}^{suff}$, given $\mu$ and $\Sigma$.

In conformity with the modelling above, which through application of the invariance principle led to the $\pi$-functions required to nondimensionalize the problem, we transform the predictand using statistics computed from the data in $\mathbf{X}^{suff}$: $(X^{future}_{5}-\bar{X}_{5})/\hat{\sigma}_{5}$. Normalized in this way, the predictand becomes invariant with respect to those population parameters. But one more step is necessary to ensure that we have nondimensionalized the predictand in its $\pi$-function, namely to align the dimensions of the predictand with those of the predictors to ensure dimensional consistency. The result is

$$\pi^{future}_{5}={\hat{\sigma}_{2}\hat{\sigma}_{5}\over\hat{\sigma}_{3}}\,{(X^{future}_{5}-\bar{X}_{5})\over\hat{\sigma}_{5}}. \qquad (6.12)$$

We would also need to convert the four predictors into their $\pi$-functions, and that would be done as in Equation (6.12). The result will be $\pi^{future}_{1:4,(n+1)}$.

That predictor is found using Equation (6.6), with modified notation. It is given by

\begin{eqnarray*}
&&E[\,\pi^{future}_{5}\ \mid\ \pi^{future}_{1:4},\pi^{suff},\pi^{hyperparameter}\,]\\
&=& E\{E[\,\pi^{future}_{5}\ \mid\ \pi^{future}_{1:4},\pi^{parameter}\,]\mid\pi^{future}_{1:4},\pi^{suff},\pi^{hyperparameter}\}.
\end{eqnarray*}

7 Discussion

The roots of statistical science in mathematical statistics, together with the formalisms of that field, long made dimensional analysis (DA) seem unnecessary in statistical science. In fact, Shen and Lin (2019) seem to have written the first paper in statistical science to recognize the need to incorporate units. For example, those authors propose what they call a "physical Lebesgue measure" that integrates the units of measurement into the usual Lebesgue measure. Yet application of Buckingham's desideratum eliminates those units. Paradoxically, it does so by exploiting the units it eliminates. That is, it exploits the intrinsic structural relationships among those units that dictate how the model must be set up. This vital implicit connection is recognized in this paper and earlier, in other papers, in more specialized contexts (Shen, 2015; Shen and Lin, 2019).

Remark 7.

The linear model with a Gaussian stochastic structure implicitly assumes data are measured on an interval-scale. But for physical quantities on a ratio-scale, that model would at best be an approximation. Shen and Lin (2018) in their Section 3.3 argue in favor of using a power model (meaning a product rather than a sum) in this context. We give arguments in our Subsection 4.1 to show why additive linear models are inappropriate in this context, other than as an approximation, in some sense, to a ratio.

That said, the simplest way of nondimensionalizing a model is by dividing each coordinate of $\mathbf{x}$ by a known constant with the same units of measurement as the coordinate itself, thereby removing its units of measurement. Then $k=p$ in Buckingham's notation, and $\pi_{1}(\mathbf{x})=(x_{1}/c_{1},x_{2}/c_{2},\ldots,x_{p}/c_{p})$ where $c_{i}=\{1\}[x_{i}]$. This is in effect the approach used by Zhai et al. (2012a) and Zhai et al. (2012b) to resolve dimensional inconsistencies in models. It is also the approach generally implicit in regression analysis where e.g.

$$X_{5}=\beta_{1}X_{1}+\beta_{2}X_{2}+\beta_{3}X_{3}+X_{4}$$

with $X_{1}=1$ being unitless and $X_{4}$ representing a combination of measurement and modelling error. The $\beta_{i}$'s play the key role of forcing the model to adhere to the principle of dimensional homogeneity when the $X_{i}$'s have different units of measurement. A preferable approach would be to nondimensionalize the $X_{i}$'s themselves in some meaningful way. For example, if $X_{2}$ were the air pollution level at a specific site at a specific time, it might be divided by its long-term average over the population of sites and times. The relative sizes of the now dimensionless $\beta_{i}$'s are then readily interpretable: a large $\beta$ would mean the associated $X$ contributes a lot to the overall mean effect.
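A minimal sketch of this style of nondimensionalization (the data and reference level below are hypothetical): each measurement is divided by a constant carrying the same units, leaving dimensionless quantities whose regression coefficients are then directly comparable.

```python
# Sketch: nondimensionalize a predictor by a meaningful reference level,
# here its (hypothetical) long-term average, which carries the same units.
def nondimensionalize(column, reference):
    # each x and reference share units, so each ratio is dimensionless
    return [x / reference for x in column]

pollution = [30.0, 45.0, 60.0]   # e.g. micrograms per cubic metre
long_term_average = 50.0         # same units as the measurements
pi_pollution = nondimensionalize(pollution, long_term_average)
assert pi_pollution == [0.6, 0.9, 1.2]
```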

Buckingham's theory does not specify the partition of attributes into the primary and secondary categories, as is needed when deriving Buckingham's $\pi$-functions. That topic is a current area of active research. Recently, two approaches have been proposed in the context of the optimal design of computer experiments. Arelis (2020) suggests using functional analysis of variance to choose as the base quantities those that contribute most to the variation in the output of interest. Yang and Lin (2021), on the other hand, propose a criterion based on measurement errors and choose the quantities that best minimize the measurement errors.

That optimal design is an ideal application for the Buckingham theory. There, the true model, called a simulator, is a deterministic numerical computer model of some phenomenon. But it is computationally intensive. So a stochastic alternative called an emulator is fitted to a sample of outputs from the simulator at a judiciously selected set of input vectors called design points, although they represent covariates in the reality the simulator purports to represent. The Buckingham Pi-theorem simplifies the model by reducing the dimension of the input vector and hence the number of design points. It also simplifies the form of the emulator in the process. That kind of application is discussed in Shen (2015), Arelis (2020) and Adragni and Cook (2009).

Our approach to extending Buckingham's work differs from that in Shen (2015). Shen restricts the quantities to lie on ratio-scales so that he can base his theory directly on Buckingham's Theorem. His starting point is the application of that theorem and the dimensionless $\pi$-functions it generates. In contrast, our theory allows a fully general group of transformations and arbitrary scales. Like Buckingham, we designate certain dimensions, such as length $L$, as primary (or fundamental) while the others are secondary. We require that a transformation of any primary scale must be made simultaneously to all scales involving that primary scale, including secondary scales. That requirement ensures consistency of change across all the quantities and leads to our version of the $\pi$-functions. However, that leaves open the issue of which variables are to serve as the primary and which as the secondary variables, a topic under active investigation.

The paper has explored the nature and possible application of DA with the aim of integrating physical and statistical modelling. The result has been an extension of the statistical invariance principle as a way of embracing the principles that lay behind Buckingham’s development of his Pi-theory. The result is a restriction on the class of allowable models and resulting optimal statistical procedures based on those models. How does the performance of these procedures compare with the general class of unrestricted procedures? Would a minimax or Bayesian procedure in the restricted class of allowable procedures have these same performance properties if they were thrown in with the entire set of decision rules? Under certain conditions, the answer is affirmative in the minimax case (Kiefer et al., 1957) and in the Bayesian case (Zidek, 1969).

8 Concluding remarks

This paper has given a comprehensive overview of DA and its importance in statistical modelling. Dimensions have long been known to lie at the foundations of deterministic modelling, with each dimension requiring the specification of a scale and each scale requiring the specification of units of measurement. Dimensions, their scales, and the associated units of measurement lie at the heart of empirical science.

However, statistical scientists regularly ignore their importance. We have demonstrated with the examples presented in Section 2 that ignoring scales and units of measurement can lead to results that are either wrong or meaningless. This points to the need for statistics education to incorporate some basic training on quantity calculus and the importance of scales, along with the impact at a fundamental level of transforming data. Statistics textbooks should reflect these needs. Going beyond training is the whole process of disseminating statistical research. There again the importance of these concepts should be recognized by authors and reviewers to ensure quality.

But does any of this really matter? We assert that it does. First we have described in Example 5, the genesis of this paper, an important example of dimensional inconsistency. An application of Buckingham’s theory showed that an additional quantity needs to be added to complete the model in the sense of Buckingham to make it nondimensionalizable (Wong and Zidek, 2018). The importance of this famous model, a model which was subsequently revised, lay in its use in assessing the reliability of lumber as quantified in terms of its return period, an important component in the calculation of quality standards for lumber when placed in service as a building material. Papers flowing from that discovery soon followed (Wong and Zidek, 2018; Yang, Zidek and Wong, 2018). The work reported in this paper led us to a deeper level than mere dimensional consistency, namely the discovery that the units impose important intrinsic structural links among the various quantities involved in the model. These links lead in Example 13 to a new version of transformation groups usually adopted in invariant Bayesian multivariate regression models. This new version requires use of a subgroup dictated by those links.

At a still deeper level, we are confronted by structural constraints imposed by the scales. For example, the artificial origin $0[u]$, where $[u]$ denotes the units of $0$, on the interval-scale rules out use of Buckingham's Pi-theory. Furthermore, it leads to a new calculus for ratio-scales, a topic under active investigation.

Further, we have shown that, surprisingly, not all functions are candidates for use in formulating relationships among attribute variables. Thus functions like $g(x)=\ln(x)$ are transcendental and hence inadmissible for that role. This eliminates from consideration in relationships not only the natural logarithm but also, for example, the hyperbolic trigonometric functions. This knowledge would be useful to statistical scientists in developing statistical models.

On the other hand, the paper reveals an important deficiency of deterministic physical models of natural phenomena: their failure to reflect uncertainty about those phenomena. An approach to doing so is presented in the paper, along with an extension of the classical theory to incorporate the Bayesian approach. That union of the different frameworks is reached via the statistical invariance principle, yielding a generalization of the famous theories of Buckingham, Bridgman and Luce.

In summary, each of the two approaches to modelling, physical and statistical, has valuable aspects that can inform the other. This paper provides the groundwork for the unification of these approaches, setting the stage for future research.

Appendix A Validity of using $\ln(x)$ when $x$ has units of measurement: The debate goes on.

Whether the transcendental function $\ln(x)$ may be applied to measurements $x$ with units of measurement has been much discussed in other scientific disciplines, and we now present some of that discussion for illustrative purposes. Molyneux (1991) points out that both affirmative and negative views had been expressed on this issue. He argues in favor of a compromise, namely defining the logarithm by exploiting one of its most fundamental properties as $\ln(X)=\ln(\{X\}[X])=\ln(\{X\})+\ln([X])$. He finds support for his proposal by noting that the derivative of the constant term, $\ln([X])$, would be zero. It follows that

\frac{d\ln(X)}{dX}=\frac{d\ln(\{X\})}{d\{X\}}.

To see this, note that under his definition of the logarithm,

\begin{aligned}
\frac{\ln(x+\Delta x)-\ln(x)}{\Delta\{x\}} &= \frac{\ln\{x+\Delta x\}+\ln[x]-\ln\{x\}-\ln[x]}{\Delta\{x\}}\\
&= \ln\left(1+\frac{\Delta\{x\}}{\{x\}}\right)^{1/\Delta\{x\}}\\
&\rightarrow \frac{1}{\{x\}}
\end{aligned}

as $\Delta\{x\}\rightarrow 0$. Note that his definition of the derivative differs from ours, given in Equation (4.4); we include the units of $x$ in the denominator, as the derivative is taken with respect to $x$, units and all. Furthermore, Molyneux (1991) argues that the proposal makes explicit the units that are sometimes hidden, pointing to the same example, Example 6, that we have used to make the point. It is unitless because the logarithm is applied to a count, not a measurement, that count being the number of SIUs. Molyneux (1991) gives other such examples. The proposal not only makes the units explicit; on taking the antilog of the result, you recover the original value of $X$ on the raw scale with the units $[X]$ correctly attached.
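Molyneux's claim that the slope depends only on the numerical value $\{x\}$, never on the unit $[x]$, is easy to probe numerically. The sketch below is our illustration, not Molyneux's; the function name and values are ours.

```python
import math

def slope_of_log(x_num, h=1e-6):
    """Finite-difference slope of ln at the numerical value x_num."""
    return (math.log(x_num + h) - math.log(x_num)) / h

# The same length recorded as 2.0 (metres) or 200.0 (centimetres):
slope_m = slope_of_log(2.0)
slope_cm = slope_of_log(200.0)

# In each case the slope is 1/{x}, the reciprocal of the numerical value,
# so the derivative computation never involves the unit [x].
assert abs(slope_m - 1 / 2.0) < 1e-5
assert abs(slope_cm - 1 / 200.0) < 1e-7
```

Of course this only restates Molyneux's algebra; the dispute is over whether $\ln(\{x\})+\ln([x])$ is meaningful in the first place.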

But, in a letter to the Journal Editor (Mills, 1995), Ian Mills quotes Molyneux (1991), in which Molyneux himself says that his proposal “has no meaning.” Furthermore, Mills says he is “inclined to agree with him.” Mills also argues, like Bridgman, that the issue is moot in practice since the logarithm is applied in the context of the difference of two logarithms, leading to $\ln(u/v)=\ln(\{u\}/\{v\})$, a unitless quantity. In the same issue of the journal, Molyneux publishes a lengthy rejoinder saying, amongst other things, that Mills misquoted him.

However, insofar as the authors of this paper are aware, Molyneux’s proposal was not accepted by the scientific community, leaving unresolved the issue of applying the natural logarithm to a dimensional quantity. In particular, Matta et al. (2010) also reject it in a totally different context. Mayumi and Giampietro (2010) pick up on this discussion in a recent paper regarding dimensional analysis in economics and the frequent application of logarithmic specifications. Their approach is based on Taylor expansion arguments showing that application of the logarithm to dimensional quantities $X$ is fallacious since, in the expansion

\ln(1+X)=X-\frac{X^{2}}{2}+\dots, \qquad (A.1)

the terms on the right hand side would then have different units of measurement.
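To spell out the dimensional objection (our rendering of their argument): if $X$ carried, say, the dimension $L$, the expansion would equate terms of different dimensions,

```latex
[X] = L, \qquad \left[\frac{X^{2}}{2}\right] = L^{2}, \qquad \left[\frac{X^{3}}{3}\right] = L^{3}, \qquad \dots
```

so no assignment of a dimension to $\ln(1+X)$ could make the equation dimensionally homogeneous.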

Mayumi and Giampietro then go on to describe a number of findings that are erroneous due to the misapplication of the logarithm. They also cite a “famous controversy” between A.C. Pigou and Milton Friedman that according to the authors, revolved around dimensional homogeneity (Pigou, Friedman and Georgescu-Roegen, 1936; Arrow et al., 1961), although not specifically involving the logarithm. One of the findings criticized in Mayumi and Giampietro (2010) is subsequently defended by Chilarescu and Viasu (2012). But Mayumi and Giampietro (2012) publish a rejoinder in which they reassert the violation of the principle of dimensional homogeneity in that finding and declare that the claim in Chilarescu and Viasu (2012) “is completely wrong. So, contrary to Chilarescu and Viasu’s claim, log(V/ L) or log W in Arrow et al. (1961) can never be used as a scientific representation.”

Although agreeing with the conclusion that the logarithm cannot be applied to a dimensional $X$, Matta et al. (2010) state that the Taylor expansion argument above, which the authors attribute to a Wikipedia article of September 2010, is fallacious. [The Wikipedia article actually misstates the Taylor expansion as

\ln X=X+X^{2}/2+\dots, \qquad (A.2)

but that does not negate the thrust of their argument.] They argue that the Taylor expansion should be

g(X)=g(X_{o})+(X-X_{o})\left.\frac{dg}{dX}\right|_{X_{o}}+\dots \qquad (A.3)

so that if $X$ had units of measurement, they would cancel out. But the authors do not state that expansion for the logarithm. If they did, they would have had to deal with the issue of the units of the $g(X_{o})=\ln(X_{o})$ term, while the remainder of the expansion is unitless (see our comments on this issue in Subsection 4.4).

Baiocchi (2012) points out that if the claims of Mayumi and Giampietro (2010) were valid, they would make most “applications of statistics, economics, …unacceptable” for statistical inference based on the use of the exponential and logarithmic transformations. Baiocchi then attempts a rescue operation by arguing that the views of Mayumi and Giampietro (2010) go “against well established theory and practice of many disciplines including …statistics,…and that it rests on an inadequate understanding of dimensional homogeneity and the nature of empirical modeling.” The paper invokes the dominant theory of measurement in the social sciences, which the author claims makes a numerical statement meaningful if it is invariant under legitimate scale transformations of the underlying variables. That idea of meaningfulness can then be applied to the logarithmic transformation of dimensional quantities in some situations.

To explain this idea, Baiocchi first gives the following analysis involving quantities on the ratio-scale. Start with the model $\ln X_{2}=\alpha X_{1}$. Let us rescale $X_{2}$ as $kX_{2}$. That leads to the need to appropriately rescale $X_{1}$, say as $mX_{1}$. Consequently our model becomes $\ln(kX_{2})=\alpha mX_{1}$, or $\ln X_{2}=\hat{\alpha}X_{1}-\ln k$ with $\hat{\alpha}=m\alpha$. But then this model for $X_{2}$ cannot be reduced to its original form because of its second log term. Thus the model would be considered empirically meaningless.

On the other hand, if $X_{2}$ were unique up to a power transformation, we would get $\ln X_{2}^{k}=\alpha(mX_{1})$, or $\ln X_{2}=\hat{\alpha}X_{1}$ with $\hat{\alpha}=m\alpha/k$. Therefore the model would be invariant under admissible transformations and hence empirically meaningful. So the situation is more complex than the paper of Mayumi and Giampietro (2010) would suggest.
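The two cases rest on elementary log identities and can be checked mechanically; the following sketch is our illustration, with arbitrary numerical values.

```python
import math

X2, k = 3.7, 10.0   # an arbitrary response value and rescaling constant

# Ratio-scale case: rescaling X2 -> k*X2 injects an additive ln k, so
# ln(k*X2) = alpha*m*X1 cannot be rearranged back into the original form
# ln(X2) = (const)*X1 without the extra -ln k term.
assert abs(math.log(k * X2) - (math.log(X2) + math.log(k))) < 1e-12

# Power-transformation case: X2 -> X2**k only rescales the coefficient,
# since ln(X2**k) = k*ln(X2); the model keeps its original form.
assert abs(math.log(X2 ** k) - k * math.log(X2)) < 1e-12
```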

Baiocchi (2012) also addresses other arguments given by Mayumi and Giampietro (2010). In particular, he is concerned with their Taylor expansion argument $\ln(1+X)=X-X^{2}/2+\cdots$. They point out that for $1+X$ to make sense, the $1$ would have to have the same units as $X$. They use the expansion $\ln(a+X)=\ln a+X/a-(X/a)^{2}/2+\cdots$ to make the point that when $a=1$ carries the same units as $X$, the expansion is valid. However, this argument ignores the fact that in $\ln a$, $a$ has units, so the argument seems tenuous and therefore leaves doubt about their success in discrediting the arguments in Mayumi and Giampietro (2010). For brevity we terminate our review of Baiocchi (2012) on that note. It is a lengthy paper with further discussion of the Mayumi and Giampietro (2010) arguments and a very lengthy bibliography of relatively recent relevant work on this issue.

Appendix B Application of Buckingham’s theorem and the discovery of the Reynolds number

This section provides a well-known example from fluid dynamics.

Example 14.

The example is a model for fluid flow around a sphere for the calculation of the drag force $F$. It turns out that the model depends only on something called the coefficient of drag and on a single, complicated dimensionless number called the Reynolds number that incorporates all the relevant quantities.

To begin, we list all the relevant quantities: the drag force ($F$), velocity ($V$), fluid density ($\rho$), viscosity ($\mu$) and sphere diameter ($D$). We thus have $p=5$ $X$s in the notation of Buckingham’s theorem. The dimensions of these five quantities can be expressed in terms of the three dimensions length ($L$), mass ($M$) and time ($T$). We treat these as the three primary dimensions, which tells us that we need at most $5-3=2$ dimensionless $\pi$ functions to define our model.

We first write down the dimensions of each of the five quantities in terms of $L$, $M$ and $T$:

[F]=MLT^{-2};\quad [V]=LT^{-1};\quad [\rho]=ML^{-3};\quad [\mu]=ML^{-1}T^{-1};\quad [D]=L. \qquad (B.1)

We now proceed to sequentially eliminate the dimensions $L$, $M$ and $T$ in all five equations. First we use $[D]=L$ to eliminate $L$. The first four equations become

[FD^{-1}]=MT^{-2};\quad [VD^{-1}]=T^{-1};\quad [D^{3}\rho]=M;\quad [D\mu]=MT^{-1}.

We next eliminate $M$ via $[D^{3}\rho]$, yielding

[FD^{-1}D^{-3}\rho^{-1}]=T^{-2};\quad [VD^{-1}]=T^{-1};\quad [D\mu D^{-3}\rho^{-1}]=T^{-1},

that is

[FD^{-4}\rho^{-1}]=T^{-2};\quad [VD^{-1}]=T^{-1};\quad [\mu D^{-2}\rho^{-1}]=T^{-1}.

To eliminate $T$, we could use $[VD^{-1}]$ or $[\mu D^{-2}\rho^{-1}]$ or even, with a bit more work, $[FD^{-4}\rho^{-1}]$. We use $[VD^{-1}]$, yielding

[FD^{-4}\rho^{-1}V^{-2}D^{2}]=1\quad {\rm and}\quad [\mu D^{-2}\rho^{-1}V^{-1}D]=1,

that is

[FD^{-2}\rho^{-1}V^{-2}]=1\quad {\rm and}\quad [\mu D^{-1}\rho^{-1}V^{-1}]=1.

All the dimensions are now gone, so we have nondimensionalized the problem and in the process found the $\pi_{1}$ and $\pi_{2}$ implied by Buckingham’s theorem:

\pi_{1}(F,V,\mu,\rho,D)=\frac{F}{\rho D^{2}V^{2}}\quad {\rm and}\quad \pi_{2}(F,V,\mu,\rho,D)=\frac{\mu}{\rho DV}.

Therefore, for some function UU,

U\left(\frac{F}{\rho D^{2}V^{2}},\frac{\mu}{\rho DV}\right)=0.

Remarkably, we have also found the famous Reynolds number, $\rho DV/\mu$ (Friedmann, Gillis and Liron, 1968). The Reynolds number determines the coefficient of drag, $F/(\rho D^{2}V^{2})$, according to a fundamental law of fluid mechanics.
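The hand elimination above can be mirrored with linear algebra: the exponent vectors of dimensionless products are exactly the null space of the dimension matrix whose columns hold the $(M,L,T)$ exponents of $F$, $V$, $\rho$, $\mu$ and $D$. The following is a minimal sketch of that check (our illustration): the matrix has rank 3, so $5-3=2$ independent Pi groups exist, and the exponent vectors of $\pi_{1}=F/(\rho D^{2}V^{2})$ and $\pi_{2}=\mu/(\rho DV)$ both lie in the null space.

```python
import numpy as np

# Columns: F, V, rho, mu, D; rows: exponents of M, L, T (cf. Equation (B.1)).
A = np.array([
    [ 1,  0,  1,  1, 0],   # M
    [ 1,  1, -3, -1, 1],   # L
    [-2, -1,  0, -1, 0],   # T
])

# Rank 3 => the null space has dimension 5 - 3 = 2: two Pi groups.
assert np.linalg.matrix_rank(A) == 3

# Exponents of (F, V, rho, mu, D) in pi_1 = F/(rho*D^2*V^2) ...
pi1 = np.array([1, -2, -1, 0, -2])
# ... and in pi_2 = mu/(rho*D*V), the reciprocal Reynolds number.
pi2 = np.array([0, -1, -1, 1, -1])

# Both products are dimensionless: A @ pi = 0 in every primary dimension.
assert np.all(A @ pi1 == 0) and np.all(A @ pi2 == 0)
```

Any other Pi group for this problem is a product of powers of these two, since its exponent vector must lie in the same two-dimensional null space.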

If we knew $u$ in equation (5.2) to begin with, we could track the series of transformations starting at (B.1) to find $U$. If, however, we had no specified $u$ to begin with, we could use $\pi_{1}$ and $\pi_{2}$ to determine a model, that is, to find $U$. For instance, we could carry out experiments, make measurements and determine $U$ from the data. In either case, we can use $U$ to determine the coefficient of drag from the Reynolds number and in turn calculate the drag force.

Invoking the principle of invariance enables us to embed Example 14 in a stochastic framework using Approach 1 as follows.

Example 15 (continues=ex:reynolds).

We continue to use the same notation. In this example, the random variable to be replicated in independent experiments is ${\bf X}=(V,\rho,D,\mu,N)\in{\cal X}={\rm I\!R}^{5}_{+}$.

The sample space.

The creation of the transformation group and relevant subgroup follows the lines of Example 12. We choose $L$, $M$, $T$ as the primary dimensions. Then with ${\bf c}=(c_{1},c_{2},c_{3})$, the corresponding group of transformations is

g_{\bf c}(V,\rho,D,\mu,N)=\bigg(\frac{c_{1}}{c_{3}}V,\frac{c_{2}}{c_{1}^{3}}\rho,c_{1}D,\frac{c_{2}}{c_{1}c_{3}}\mu,\frac{c_{1}c_{2}}{c_{3}^{2}}N\bigg).

For indexing the cross sections of 𝒳{\cal X} we have the maximal invariant

M({\bf X})=\bigg(\frac{V}{V},\frac{\rho}{\rho},\frac{D}{D},\frac{\mu}{\rho VD},\frac{N}{\rho V^{2}D^{2}}\bigg)=(1,1,1,\pi_{\mu},\pi_{N}) \qquad (B.2)

where $\pi_{\mu}=\mu/(\rho VD)$ and $\pi_{N}=N/(\rho V^{2}D^{2})$. Let $\bm{\pi}_{\bf X}=(\pi_{\mu},\pi_{N})$. To show that $M$ is a maximal invariant, first observe that $M({\bf X})$ is invariant since each term is dimensionless. Thus showing $M({\bf X})$ is a maximal invariant reduces to finding a group element ${\bf c}^{*}$ for which $X^{*}=g_{{\bf c}^{*}}(X)$ when $M(X)=M(X^{*})$. For $N$,

\begin{aligned}
g_{{\bf c}^{*}}(N) &= \frac{c_{1}^{*}c_{2}^{*}}{(c_{3}^{*})^{2}}N\\
&= \frac{D^{*}}{D}\,\frac{\rho^{*}(D^{*})^{3}}{D^{3}\rho}\,\frac{D^{2}}{V^{2}}\,\frac{(V^{*})^{2}}{(D^{*})^{2}}\,N\\
&= \frac{D^{*}}{D}\,\frac{\rho^{*}(D^{*})^{3}}{D^{3}\rho}\,\frac{D^{2}}{V^{2}}\,\frac{(V^{*})^{2}}{(D^{*})^{2}}\,\rho D^{2}V^{2}\,\frac{N}{\rho D^{2}V^{2}}\\
&= \rho^{*}(D^{*})^{2}(V^{*})^{2}\,\pi_{N}\\
&= \rho^{*}(D^{*})^{2}(V^{*})^{2}\,\pi_{N^{*}}\mbox{ using the assumption that $M(X)=M(X^{*})$}\\
&= \rho^{*}(D^{*})^{2}(V^{*})^{2}\,\frac{N^{*}}{\rho^{*}(D^{*})^{2}(V^{*})^{2}}\\
&= N^{*}.
\end{aligned}

Similarly, we get that $\mu\rightarrow\mu^{*}$. Relating these results to Shen’s thesis, this is essentially his Lemma 5.5. However, Shen does not derive the maximal invariant; he simply uses the Pi quantities derived from Buckingham’s Pi-theorem as the maximal invariant. In contrast, for us the maximal invariant emerges in $M({\bf X})$ purely as an artifact of the need for dimensional consistency as expressed through the application of the invariance principle.
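The invariance of $M({\bf X})$ in Equation (B.2) under the group can also be confirmed numerically. The sketch below is our illustration (the numerical values are arbitrary): it applies $g_{\bf c}$ for many random positive ${\bf c}$ and checks that $\pi_{\mu}$ and $\pi_{N}$ are unchanged.

```python
import random

def g(c1, c2, c3, V, rho, D, mu, N):
    """The group action g_c of Example 15 on (V, rho, D, mu, N)."""
    return (c1 / c3 * V, c2 / c1**3 * rho, c1 * D,
            c2 / (c1 * c3) * mu, c1 * c2 / c3**2 * N)

def pis(V, rho, D, mu, N):
    """The dimensionless coordinates (pi_mu, pi_N) of the maximal invariant."""
    return mu / (rho * V * D), N / (rho * V**2 * D**2)

random.seed(1)
x = (2.0, 1.2, 0.5, 1.8e-3, 4.0)          # arbitrary (V, rho, D, mu, N)
for _ in range(100):
    c = [random.uniform(0.1, 10) for _ in range(3)]
    before, after = pis(*x), pis(*g(*c, *x))
    # Each pi coordinate is unchanged up to floating-point roundoff.
    assert all(abs(a - b) / abs(b) < 1e-9 for a, b in zip(after, before))
```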

Observe that all points in ${\cal X}$ obtain from the cross section in Equation (B.2) by application of the appropriate element of the group of transformations. To see this, let us first choose $c_{1}=D^{-1}$. Then we have

g_{\bf c}(\textbf{x})=\bigg(\frac{1}{Dc_{3}}V,c_{2}D^{3}\rho,1,\frac{c_{2}D}{c_{3}}\mu,\frac{c_{2}}{Dc_{3}^{2}}N\bigg).

Next let $c_{2}=(D^{3}\rho)^{-1}$ and get

g_{\bf c}(\textbf{x})=\bigg(\frac{1}{Dc_{3}}V,1,1,\frac{1}{D^{2}\rho c_{3}}\mu,\frac{1}{D^{4}\rho c_{3}^{2}}N\bigg).

Finally choose $c_{3}=VD^{-1}$, which yields

g_{\bf c}(\textbf{x})=(1,1,1,\bm{\pi}_{\bf x})=M(\textbf{x}).

Inverting this transformation takes us from the cross section to the point x.

The sampling distribution

The analysis above naturally suggests the transformation group $\bar{G}$ and its cross section for the parameter space

\Lambda=\{(\lambda_{V},\lambda_{\rho},\lambda_{D},\lambda_{\mu},\lambda_{N}):\ \lambda_{i}>0,\ i=V,\dots,N\},

namely

M(\lambda)=(1,1,1,\bm{\pi}_{\lambda})

where $\pi_{\lambda_{\mu}}=\lambda_{\mu}/(\lambda_{\rho}\lambda_{D}\lambda_{V})$, $\pi_{\lambda_{N}}=\lambda_{N}/(\lambda_{\rho}\lambda_{D}^{2}\lambda_{V}^{2})$ and $\bm{\pi}_{\lambda}=(\pi_{\lambda_{\mu}},\pi_{\lambda_{N}})$ characterizes the maximal invariant over the parameter space. It follows that for any $\lambda\in\Lambda$,

\lambda=\bar{g}_{{\bf c}}^{-1}(M(\lambda))

where

c_{1}=1/\lambda_{D},\quad c_{2}=1/(\lambda_{\rho}\lambda_{D}^{3}),\quad c_{3}=\lambda_{V}/\lambda_{D}.

The statistical invariance implies that

F(x|\lambda)=P(X\leq x|\lambda)=P(g_{c}(X)\leq x|\bar{g}_{c}(\lambda)) \qquad (B.3)

for any $c_{i}>0$, $i=1,2,3$. Notice that

\begin{aligned}
P(g_{c}(X)\leq x|\bar{g}_{c}(\lambda)) &= P(X\leq g_{c}^{-1}(x)|\bar{g}_{c}(\lambda))\\
&= F(g_{c}^{-1}(x)|\bar{g}_{c}(\lambda)).
\end{aligned}

Now by taking the partial derivatives with respect to the variables, we obtain

f(x|\lambda)=f(g_{c}^{-1}(x)|\bar{g}_{c}(\lambda))\,\frac{c_{3}}{c_{1}}\,\frac{c_{1}^{3}}{c_{2}}\,\frac{1}{c_{1}}\,\frac{c_{1}c_{3}}{c_{2}}\,\frac{c_{3}^{2}}{c_{1}c_{2}}.

Since this must hold for any $c_{i}>0$, we may choose $c_{1}=\lambda_{D}$, $c_{2}=\lambda_{D}^{3}\lambda_{\rho}$, $c_{3}=\lambda_{D}/\lambda_{V}$. Then

\begin{aligned}
f(g_{c}^{-1}(x)|\bar{g}_{c}^{-1}(\lambda)) &= f\bigg(\frac{V}{\lambda_{V}},\frac{\rho}{\lambda_{\rho}},\frac{D}{\lambda_{D}},\frac{\mu}{\lambda_{\rho}\lambda_{D}\lambda_{V}},\frac{N}{\lambda_{\rho}\lambda_{D}^{2}\lambda_{V}^{2}}\bigg|\bm{\pi}_{\lambda}\bigg)\\
&= f\bigg(\frac{V}{\lambda_{V}},\frac{\rho}{\lambda_{\rho}},\frac{D}{\lambda_{D}},\frac{\lambda_{\mu}}{\lambda_{\rho}\lambda_{D}\lambda_{V}}\frac{\mu}{\lambda_{\mu}},\frac{\lambda_{N}}{\lambda_{\rho}\lambda_{D}^{2}\lambda_{V}^{2}}\frac{N}{\lambda_{N}}\bigg|\bm{\pi}_{\lambda}\bigg).
\end{aligned}

Thus the joint PDF is proportional to

f\bigg(\frac{V}{\lambda_{V}},\frac{\rho}{\lambda_{\rho}},\frac{D}{\lambda_{D}},\frac{\pi_{\lambda_{\mu}}\mu}{\lambda_{\mu}},\frac{\pi_{\lambda_{N}}N}{\lambda_{N}}\bigg|\bm{\pi}_{\lambda}\bigg). \qquad (B.4)

Hence the statistical invariance implies that information about the variables can be summarized by maximal invariants in the sample space and in the parameter space.

The sample

Now suppose $n$ independent experiments are performed and that they yield data $\textbf{x}_{1},\dots,\textbf{x}_{n}$. Further suppose, for this illustrative example, that for the model in Equation (B.3) and the resulting likelihood derived from Equation (B.4), the sufficiency principle implies that ${\bf S}=\Sigma_{j}{\bf x}_{j}=(S_{V},S_{\rho},S_{D},S_{\mu},S_{N})$ is a sufficient statistic. Then a maximal invariant for the transformation group is

M(V,\rho,D,\mu,N)=\bigg(\frac{V}{S_{V}},\frac{\rho}{S_{\rho}},\frac{D}{S_{D}},\frac{\mu}{S_{\rho}S_{V}S_{D}},\frac{N}{S_{\rho}S_{V}^{2}S_{D}^{2}}\bigg).

To see this, observe that each term is dimensionless, so $M$ is certainly invariant. Now suppose $M(V,\rho,D,\mu,N)=M(V^{*},\rho^{*},D^{*},\mu^{*},N^{*})$. Then we need to show that there exist $\{c_{i}^{*}\}$ such that $(V^{*},\rho^{*},D^{*},\mu^{*},N^{*})=g_{c_{1}^{*},c_{2}^{*},c_{3}^{*}}(V,\rho,D,\mu,N)$. These do exist, and they are

c_{1}^{*}=\frac{S_{D^{*}}}{S_{D}},\quad c_{2}^{*}=\frac{S_{\rho^{*}}S_{D^{*}}^{3}}{S_{\rho}S_{D}^{3}},\quad c_{3}^{*}=\frac{S_{D^{*}}/S_{V^{*}}}{S_{D}/S_{V}}.

We conclude our discussion of this example. Proceeding further would entail the specification of the sampling distribution and that in turn would depend on contextual details.

Appendix C Invariance models for interval-scales

This section develops the theory for the interval case, which parallels that for ratio-scales seen in Subsection 6.3.

The sample space

We first partition the response vector $\textbf{x}$ as in Equation (6.4). These partitions correspond to the primary and secondary quantities as in the Buckingham theory, although that distinction was not, as far as we know, made in the Luce work and its successors. Of particular interest again is $X_{p}$ in the model of Equation (5.4). The first step in our construction entails a choice of the transformation group $G^{*}$. That choice will depend on the dimensions involved. However, given that we are assuming in this subsection that quantities lie in an affine space, we will in the sequel rely on Paganoni (1987), as described in Subsection 5.3, for illustration.

We begin with a setup more general than that of Paganoni (1987), one that would include, for example, the discrete seven-point Semantic Differential Scale. So we extend Equation (5.3) as follows:

g_{1}(\textbf{x}_{1})=\textbf{R}_{1}\textbf{x}_{1}+\textbf{P}_{1} \qquad (C.1)
g_{2}(\textbf{x}_{2})=\textbf{R}_{2}\textbf{x}_{2}+\textbf{P}_{2}, \qquad (C.2)

where now $x_{p}$ is the final coordinate of $\textbf{x}_{2}$ when $p-k+1>1$. Note that in the univariate version of the model proposed by Paganoni (1987), Equation 5.3 has the vector ${\bf x}_{2}$ replaced with $x_{p}$. Here both the rescaling matrix $\textbf{R}_{2}$ and the translation vector $\textbf{P}_{2}$ depend on the pair $\textbf{R}_{1}$ and $\textbf{P}_{1}$, i.e. $\textbf{R}_{2}=R(\textbf{R}_{1},\textbf{P}_{1})$ and $\textbf{P}_{2}=P(\textbf{R}_{1},\textbf{P}_{1})$. Note that ratio-scales are formally incorporated in this extension simply by setting the relevant coordinates of $\textbf{P}_{1}$ and $\textbf{P}_{2}$ to $0$.

Conditions are needed to ensure that $G^{*}$ is a transformation group. For definiteness we choose $\textbf{R}_{2}=M(\textbf{R}_{1})$ and $\textbf{P}_{2}=\psi(\textbf{R}_{1})+A(\textbf{P}_{1})$, where in general $M(\textbf{S}\textbf{R})=M(\textbf{S})M(\textbf{R})$, $\psi(\textbf{S}\textbf{R})=M(\textbf{S})\psi(\textbf{R})+\psi(\textbf{S})$, and $A$ is additive with $M(\textbf{S})A(\textbf{P})=A(\textbf{S}\textbf{P})$. The objects $\textbf{R}_{1}$ and $\textbf{P}_{1}$ lie in the subspaces described in Subsection 5.3, while $\textbf{R}_{2}$ and $\textbf{P}_{2}$ lie in multidimensional rather than one-dimensional spaces as before. We omit details for brevity.

Finally, we index the transformation group $G_{0}$ acting on $\textbf{x}$ by $(\textbf{R}_{1},\textbf{P}_{1})$ and define the associated transformation by

g_{(\textbf{R}_{1},\textbf{P}_{1})}(\textbf{x})=[g_{1}(\textbf{x}_{1}),g_{2}(\textbf{x}_{2})]. \qquad (C.3)

It remains to show that in this case $G_{0}$ is a transformation group, and for this we need the conditions presented by Paganoni (1987).

Theorem 3.

The set $G_{0}$ of transformations defined by Equation (C.3) is a transformation group acting on the sample space.

Proof. First we show that $G_{0}$ possesses an identity transformation. This is found simply by taking $\textbf{R}_{1}=\textbf{I}_{k}$ and $\textbf{P}_{1}=\textbf{0}_{k}$ and invoking the definitions of $M$, $\psi$ and $A$:

\begin{aligned}
g_{1}(\mathbf{x}_{1}) &= \textbf{R}_{1}\mathbf{x}_{1}+\textbf{P}_{1}=\mathbf{I}_{k}\mathbf{x}_{1}+\mathbf{0}_{k}=\mathbf{x}_{1}.\\
g_{2}(\mathbf{x}_{2}) &= \textbf{R}_{2}\mathbf{x}_{2}+\textbf{P}_{2}\\
&= M(\textbf{R}_{1})\mathbf{x}_{2}+\psi(\textbf{R}_{1})+A(\textbf{P}_{1})\\
&= \mathbf{x}_{2}+0+0\\
&= \mathbf{x}_{2}.
\end{aligned}

Next we show that the composition of two transformations indexed by $(\textbf{S}_{1},\textbf{Q}_{1})$ and $(\textbf{R}_{1},\textbf{P}_{1})$ yields a transformation in $G_{0}$. First we obtain $g_{(\textbf{R}_{1},\textbf{P}_{1})}(\textbf{x})=(\textbf{x}_{1}^{1},\textbf{x}_{2}^{1})$, where

\textbf{x}_{1}^{1}=\textbf{R}_{1}\textbf{x}_{1}+\textbf{P}_{1},\quad {\rm and} \qquad (C.4)
\textbf{x}_{2}^{1}=\textbf{R}_{2}\textbf{x}_{2}+\textbf{P}_{2}=M(\textbf{R}_{1})\textbf{x}_{2}+\psi(\textbf{R}_{1})+A(\textbf{P}_{1}). \qquad (C.5)

Next we compute

\begin{aligned}
g_{(\textbf{S}_{1},\textbf{Q}_{1})}(\textbf{x}^{1}) &= (\textbf{S}_{1}\textbf{x}_{1}^{1}+\textbf{Q}_{1},\ \textbf{S}_{2}\textbf{x}_{2}^{1}+\textbf{Q}_{2})\\
&= (\textbf{S}_{1}\textbf{x}_{1}^{1}+\textbf{Q}_{1},\ M(\textbf{S}_{1})\textbf{x}_{2}^{1}+\psi(\textbf{S}_{1})+A(\textbf{Q}_{1})).
\end{aligned}

But

\begin{aligned}
M(\textbf{S}_{1})\textbf{x}_{2}^{1}+\psi(\textbf{S}_{1})+A(\textbf{Q}_{1}) &= M(\textbf{S}_{1}\textbf{R}_{1})\textbf{x}_{2}+M(\textbf{S}_{1})\psi(\textbf{R}_{1})+M(\textbf{S}_{1})A(\textbf{P}_{1})+\psi(\textbf{S}_{1})+A(\textbf{Q}_{1})\\
&= M(\textbf{S}_{1}\textbf{R}_{1})\textbf{x}_{2}+\psi(\textbf{S}_{1}\textbf{R}_{1})+A(\textbf{S}_{1}\textbf{P}_{1})+A(\textbf{Q}_{1})\\
&= M(\textbf{S}_{1}\textbf{R}_{1})\textbf{x}_{2}+\psi(\textbf{S}_{1}\textbf{R}_{1})+A(\textbf{S}_{1}\textbf{P}_{1}+\textbf{Q}_{1}),
\end{aligned}

which proves that the composition is an element of G0G_{0}.

Finally we need to show that for any member of $G_{0}$ indexed by $(\textbf{R}_{1},\textbf{P}_{1})$ there exists an inverse. Starting with the transformed quantities in Equations (C.4) and (C.5), let $(\textbf{S}_{1},\textbf{Q}_{1})=(\textbf{R}_{1}^{-1},-\textbf{R}_{1}^{-1}\textbf{P}_{1})$. Then we find that

\begin{aligned}
g_{(\textbf{S}_{1},\textbf{Q}_{1})}(\textbf{R}_{1}\textbf{x}_{1}+\textbf{P}_{1},\textbf{R}_{2}\textbf{x}_{2}+\textbf{P}_{2}) &= (\textbf{S}_{1}(\textbf{R}_{1}\mathbf{x}_{1}+\textbf{P}_{1})+\textbf{Q}_{1},\ \textbf{S}_{2}(\textbf{R}_{2}\mathbf{x}_{2}+\textbf{P}_{2})+\textbf{Q}_{2})\\
&= (\textbf{S}_{1}\textbf{R}_{1}\mathbf{x}_{1}+\textbf{S}_{1}\textbf{P}_{1}+\textbf{Q}_{1},\ \textbf{S}_{2}\textbf{R}_{2}\mathbf{x}_{2}+\textbf{S}_{2}\textbf{P}_{2}+\textbf{Q}_{2}).
\end{aligned}

But

\begin{aligned}
\textbf{S}_{1}\textbf{R}_{1}\mathbf{x}_{1}+\textbf{S}_{1}\textbf{P}_{1}+\textbf{Q}_{1} &= \textbf{R}_{1}^{-1}\textbf{R}_{1}\mathbf{x}_{1}+\textbf{R}_{1}^{-1}\textbf{P}_{1}+(-\textbf{R}_{1}^{-1}\textbf{P}_{1})\\
&= \mathbf{x}_{1}+0=\mathbf{x}_{1},
\end{aligned}

and

\begin{aligned}
\textbf{S}_{2}\textbf{R}_{2}\mathbf{x}_{2}+\textbf{S}_{2}\textbf{P}_{2}+\textbf{Q}_{2} &= M(\textbf{S}_{1})M(\textbf{R}_{1})\mathbf{x}_{2}+M(\textbf{S}_{1})(\psi(\textbf{R}_{1})+A(\textbf{P}_{1}))+(\psi(\textbf{S}_{1})+A(\textbf{Q}_{1}))\\
&= M(\textbf{R}_{1}^{-1})M(\textbf{R}_{1})\mathbf{x}_{2}+M(\textbf{R}_{1}^{-1})\psi(\textbf{R}_{1})+M(\textbf{R}_{1}^{-1})A(\textbf{P}_{1})+\psi(\textbf{R}_{1}^{-1})+A(-\textbf{R}_{1}^{-1}\textbf{P}_{1})\\
&= M(\textbf{R}_{1}^{-1}\textbf{R}_{1})\mathbf{x}_{2}+\psi(\textbf{R}_{1}^{-1}\textbf{R}_{1})+A(\textbf{R}_{1}^{-1}\textbf{P}_{1})+A(-\textbf{R}_{1}^{-1}\textbf{P}_{1})\\
&= M(\mathbf{I}_{k})\mathbf{x}_{2}+\psi(\mathbf{I}_{k})+A(\textbf{R}_{1}^{-1}\textbf{P}_{1}-\textbf{R}_{1}^{-1}\textbf{P}_{1})\\
&= \mathbf{x}_{2}+\mathbf{0}_{p-k+1}+A(\mathbf{0}_{k})\\
&= \mathbf{x}_{2}+\mathbf{0}_{p-k+1}+\mathbf{0}_{p-k+1}\\
&= \mathbf{x}_{2}.
\end{aligned}

Thus the transformation indexed by $(\textbf{R}_{1}^{-1},-\textbf{R}_{1}^{-1}\textbf{P}_{1})$ is the inverse of that indexed by $(\textbf{R}_{1},\textbf{P}_{1})$. That concludes the proof that $G_{0}$ is a transformation group.
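A concrete scalar instance of Theorem 3 can be exercised numerically. In the sketch below (our illustration, not drawn from Paganoni (1987)) we take $k=1$ and choose $M(r)=r$, $\psi(r)=c(r-1)$ and $A(P)=bP$ for constants $b$, $c$; these choices satisfy $M(sr)=M(s)M(r)$, $\psi(sr)=M(s)\psi(r)+\psi(s)$ and $A(sP)=M(s)A(P)$, and we verify that composing a transformation with the proposed inverse $(r^{-1},-r^{-1}P)$ returns the original point.

```python
# A scalar (k = 1) instance of the group G0: g_{(r,P)}(x1, x2) =
# (r*x1 + P, M(r)*x2 + psi(r) + A(P)) with M(r) = r, psi(r) = C*(r - 1)
# and A(P) = B*P. These particular M, psi, A are our illustrative choices.
B, C = 0.7, -1.3          # the constants b and c above (arbitrary)

def M(r): return r
def psi(r): return C * (r - 1)
def A(p): return B * p

def g(r, P, x1, x2):
    """Apply g_{(r,P)} to the point (x1, x2)."""
    return r * x1 + P, M(r) * x2 + psi(r) + A(P)

# Compose a transformation with the claimed inverse (1/r, -P/r):
r, P, x1, x2 = 2.5, 0.8, 3.0, -4.2
y1, y2 = g(r, P, x1, x2)
z1, z2 = g(1 / r, -P / r, y1, y2)
assert abs(z1 - x1) < 1e-12 and abs(z2 - x2) < 1e-12
```

The same check with the roles of the two transformations swapped confirms that the inverse works on either side.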

We now proceed, as outlined in Subsection 6.3, to find the analogues of the Pi function in Buckingham’s theory, which in our extension of that theory are coordinates of the maximal invariant under the transformation group $G_{0}$. To that end we seek the transformation for which $g_{1\textbf{x}_{1}}(\textbf{x}_{1})=\textbf{P}_{10}$, i.e. $\textbf{x}_{1}=g_{1\textbf{x}_{1}}^{-1}(\textbf{P}_{10})=\textbf{S}_{1\textbf{x}_{1}}\textbf{P}_{10}+\textbf{Q}_{1\textbf{x}_{1}}$ for an appropriate $\textbf{S}_{1\textbf{x}_{1}}$ and $\textbf{Q}_{1\textbf{x}_{1}}$, where $\textbf{S}_{1g_{(R,P)}(\textbf{x}_{1})}=R\textbf{S}_{1\textbf{x}_{1}}$ and $\textbf{Q}_{1g_{(R,P)}(\textbf{x}_{1})}=R\textbf{Q}_{1\textbf{x}_{1}}+P$. It follows that $\textbf{P}_{10}=\textbf{S}_{1\textbf{x}_{1}}^{-1}(\textbf{x}_{1}-\textbf{Q}_{1\textbf{x}_{1}})$ for a designated fixed origin $\textbf{P}_{10}$ in the range of $\textbf{X}_{1}$. Dimensional consistency calls for the transformation of $\textbf{x}_{2}$ by the $g_{2}$ that complements the $g_{1}$ just found, the one indexed by $(\textbf{S}_{1\textbf{x}_{1}}^{-1},-\textbf{S}_{1\textbf{x}_{1}}^{-1}\textbf{Q}_{1\textbf{x}_{1}})$. If we invoke the invariance principle, we may thus transform $\textbf{x}=(\textbf{x}_{1},\textbf{x}_{2})$ to

(𝝅1x,𝝅2x),(\bm{\pi}_{1x},\bm{\pi}_{2x}),

where $\bm{\pi}_{1x}=\textbf{P}_{10}$ and $\bm{\pi}_{2x}=M(\textbf{S}_{1\textbf{x}_{1}}^{-1})\textbf{x}_{2}+\psi(\textbf{S}_{1\textbf{x}_{1}}^{-1})$ is the maximal invariant. Certainly it is invariant. Now we need to show there exists $(S^{*},Q^{*})$ such that $\textbf{x}=g_{(S^{*},Q^{*})}(\textbf{y})$ when $(\bm{\pi}_{1x},\bm{\pi}_{2x})=(\bm{\pi}_{1y},\bm{\pi}_{2y})$. So suppose $(\bm{\pi}_{1x},\bm{\pi}_{2x})=(\bm{\pi}_{1y},\bm{\pi}_{2y})$. We claim that $\textbf{x}_{1}=g_{(S^{*},Q^{*})}\textbf{y}_{1}$, and hence $\textbf{x}_{2}=g_{(M(S^{*}),\psi(S^{*}))}\textbf{y}_{2}$, where $S^{*}=S_{1\textbf{x}_{1}}S_{1\textbf{y}_{1}}^{-1}$ and $Q^{*}=Q_{1\textbf{x}_{1}}-S_{1\textbf{x}_{1}}S_{1\textbf{y}_{1}}^{-1}Q_{1\textbf{y}_{1}}$.

Proof. Assume below that $M^{-1}(X)=M(X^{-1})$.

\begin{aligned}
\pi_{1x}=\pi_{1y} &\iff S_{1\textbf{x}_{1}}^{-1}(\textbf{x}_{1}-Q_{1\textbf{x}_{1}})=S_{1\textbf{y}_{1}}^{-1}(\textbf{y}_{1}-Q_{1\textbf{y}_{1}})\\
&\iff \textbf{x}_{1}=S_{1\textbf{x}_{1}}S_{1\textbf{y}_{1}}^{-1}(\textbf{y}_{1}-Q_{1\textbf{y}_{1}})+Q_{1\textbf{x}_{1}}\\
&\qquad\quad=S_{1\textbf{x}_{1}}S_{1\textbf{y}_{1}}^{-1}\textbf{y}_{1}-S_{1\textbf{x}_{1}}S_{1\textbf{y}_{1}}^{-1}Q_{1\textbf{y}_{1}}+Q_{1\textbf{x}_{1}}\\
&\qquad\quad=S^{*}\textbf{y}_{1}+Q^{*}\\
&\qquad\quad=g_{(S^{*},Q^{*})}\textbf{y}_{1}.\\
\pi_{2x}=\pi_{2y} &\iff M(S_{1\textbf{x}_{1}}^{-1})\textbf{x}_{2}+\psi(S_{1\textbf{x}_{1}}^{-1})=M(S_{1\textbf{y}_{1}}^{-1})\textbf{y}_{2}+\psi(S_{1\textbf{y}_{1}}^{-1})\\
&\iff \textbf{x}_{2}=M^{-1}(S_{1\textbf{x}_{1}}^{-1})M(S_{1\textbf{y}_{1}}^{-1})\textbf{y}_{2}+M^{-1}(S_{1\textbf{x}_{1}}^{-1})(\psi(S_{1\textbf{y}_{1}}^{-1})-\psi(S_{1\textbf{x}_{1}}^{-1}))\\
&\qquad\quad=M(S_{1\textbf{x}_{1}})M(S_{1\textbf{y}_{1}}^{-1})\textbf{y}_{2}+M(S_{1\textbf{x}_{1}})(\psi(S_{1\textbf{y}_{1}}^{-1})-\psi(S_{1\textbf{x}_{1}}^{-1}))\\
&\qquad\quad=M(S_{1\textbf{x}_{1}}S_{1\textbf{y}_{1}}^{-1})\textbf{y}_{2}+M(S_{1\textbf{x}_{1}})\psi(S_{1\textbf{y}_{1}}^{-1})-M(S_{1\textbf{x}_{1}})\psi(S_{1\textbf{x}_{1}}^{-1})\\
&\qquad\quad=M(S_{1\textbf{x}_{1}}S_{1\textbf{y}_{1}}^{-1})\textbf{y}_{2}+M(S_{1\textbf{x}_{1}})\psi(S_{1\textbf{y}_{1}}^{-1})-(\psi(S_{1\textbf{x}_{1}}S_{1\textbf{x}_{1}}^{-1})-\psi(S_{1\textbf{x}_{1}}))\\
&\qquad\quad=M(S_{1\textbf{x}_{1}}S_{1\textbf{y}_{1}}^{-1})\textbf{y}_{2}+M(S_{1\textbf{x}_{1}})\psi(S_{1\textbf{y}_{1}}^{-1})-(\psi(I)-\psi(S_{1\textbf{x}_{1}}))\\
&\qquad\quad=M(S_{1\textbf{x}_{1}}S_{1\textbf{y}_{1}}^{-1})\textbf{y}_{2}+M(S_{1\textbf{x}_{1}})\psi(S_{1\textbf{y}_{1}}^{-1})-0+\psi(S_{1\textbf{x}_{1}})\\
&\qquad\quad=M(S_{1\textbf{x}_{1}}S_{1\textbf{y}_{1}}^{-1})\textbf{y}_{2}+\psi(S_{1\textbf{x}_{1}}S_{1\textbf{y}_{1}}^{-1})\\
&\qquad\quad=g_{(M(S^{*}),\psi(S^{*}))}\textbf{y}_{2}.
\end{aligned}

Thus the proof is complete.
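As a concrete check of the argument above, the following minimal numerical sketch (our own illustration, not part of the paper's development) verifies both invariance and maximality in the one-dimensional case, taking $S_{1\textbf{x}_{1}}$ to be the sample standard deviation and $Q_{1\textbf{x}_{1}}$ the sample mean:

```python
import numpy as np

# Illustrative 1-D choice: S_{1x} = sd(x), Q_{1x} = mean(x), so that
# pi(x) = S_{1x}^{-1}(x - Q_{1x}) standardizes the sample.
def pi(x):
    return (x - x.mean()) / x.std()

rng = np.random.default_rng(0)
x = rng.normal(size=20)

# Invariance: applying any g_{(S,Q)}(x) = S x + Q with S > 0 leaves pi fixed.
for S, Q in [(2.5, -1.0), (0.3, 7.0)]:
    assert np.allclose(pi(S * x + Q), pi(x))

# Maximality: if pi(x) = pi(y), then x = g_{(S*,Q*)}(y) with
# S* = sd(x)/sd(y) and Q* = mean(x) - S* mean(y), as in the proof.
y = 4.0 * x - 3.0  # so pi(y) = pi(x)
S_star = x.std() / y.std()
Q_star = x.mean() - S_star * y.mean()
assert np.allclose(x, S_star * y + Q_star)
```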

Example 16.

The linear regression model is one of the most famous models in statistics: $y^{1\times 1}=\beta\textbf{x}^{(p-1)\times 1}$. Shen (2015) shows, using dimensional analysis, that this model is inappropriate when all the variables are on a ratio scale; in that case the right-hand side should instead be a product of powers of Pi-functions of the coordinates of $\textbf{x}$. This section shows how to handle the case where the variables are regarded as interval-scaled. The Pi-functions would then be combinations of the coordinates of $\textbf{x}$, depending on the units of measurement of $y$ and those of $\textbf{x}$.

To be more specific, we begin by defining, for every $\textbf{x}\in{\cal X}$, a $g_{\textbf{x}}\in G_{0}$ such that $g_{\textbf{x}}(\textbf{x})=(g_{1\textbf{x}}(\textbf{x}_{1}),g_{2\textbf{x}}(\textbf{x}_{2}))=(\bm{\pi}_{\textbf{x}1}^{1\times k},\bm{\pi}_{\textbf{x}2}^{1\times(p-k+1)})$, where $[\bm{\pi}_{\textbf{x}1}]=[\textbf{1}_{k}]$, $[\bm{\pi}_{\textbf{x}2}]=[\textbf{1}_{(p-k+1)}]$, and in general $\textbf{1}_{r}$ denotes the vector of dimension $r$, all of whose elements are $1$, representing generically the unit on the coordinate's scale. For the regression example the final coordinate in $\textbf{x}_{2}$ is $x_{p}=Y$. It then follows from the above analysis, in the notation used there, that

\[
\bm{\pi}_{1x}=\textbf{P}_{10}
\]
and
\[
\bm{\pi}_{2x}=M(\textbf{S}_{1\textbf{x}_{1}}^{-1})\textbf{x}_{2}+\psi(\textbf{S}_{1\textbf{x}_{1}}^{-1})
\]
is the non-dimensionalized maximal invariant. The distribution of $\bm{\pi}_{2X}$ then determines the non-dimensionalized regression model. We omit the details for brevity.
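For intuition, here is a minimal numerical sketch (our own illustration; the helper names `standardize` and `slope` are ours, not the paper's) of why a regression built from interval-scale Pi-functions cannot depend on the units of measurement chosen:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x = rng.normal(20.0, 5.0, size=n)            # e.g. a temperature in Celsius
y = 1.5 * x + rng.normal(0.0, 1.0, size=n)   # interval-scale response

def standardize(v):
    # Non-dimensionalize an interval-scale variable with its sample
    # location and scale, mirroring the construction of the Pi-functions.
    return (v - v.mean()) / v.std()

def slope(u, v):
    # OLS slope of standardized v on standardized u (a dimensionless number).
    return np.polyfit(standardize(u), standardize(v), 1)[0]

b_C = slope(x, y)
# Changing units, e.g. Celsius -> Fahrenheit, is an affine (interval-scale)
# transformation; the non-dimensionalized model is unaffected.
b_F = slope(9.0 / 5.0 * x + 32.0, y)
assert np.allclose(b_C, b_F)
```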

Appendix D Foundations of statistical modelling

After a sample ${\bf X}^{p\times n}={\bf x}$ is selected, a statistical inquiry is expected to lead to a decision $d({\bf x})=a$ chosen from an action space ${\cal A}$. That action may be chosen by a randomized decision rule, i.e. a probability distribution $\delta(D;{\bf x})$ for events $D\subset{\cal A}$. A nonrandomized rule $d({\bf x})$ would then correspond to the degenerate probability distribution with $\delta(\{d({\bf x})\};{\bf x})=1$.

The decision would be based on the loss function $L(a,\lambda)$, or rather on the expected loss, called the risk function

\[
r(\delta,\lambda)=\int L(a,\lambda)\,\delta(da;{\bf x})\,P_{\lambda}(d{\bf x}).
\]

The minimax criterion is commonly used to determine the optimal decision rule as

\[
\delta_{\mathrm{minimax}}=\mathop{\mathrm{argmin}}_{\delta}\max_{\lambda}r(\delta,\lambda).
\]
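The minimax criterion can be illustrated with a small toy problem (our own example, not from the paper): estimating a binomial success probability $\lambda$ under squared-error loss, comparing two nonrandomized rules by their worst-case risk over a grid of $\lambda$ values. The rule $(X+\sqrt{n}/2)/(n+\sqrt{n})$ is known to have constant risk and to be minimax for this loss.

```python
import numpy as np
from math import comb

n = 5
lam_grid = np.linspace(0.01, 0.99, 99)

def risk(d, lam):
    # r(d, lambda) = E_lambda[(d(X) - lambda)^2] for X ~ Binomial(n, lambda).
    return sum(comb(n, k) * lam**k * (1 - lam)**(n - k) * (d(k) - lam)**2
               for k in range(n + 1))

mle = lambda k: k / n
# Constant-risk rule, known to be minimax under squared-error loss.
minimax_rule = lambda k: (k + np.sqrt(n) / 2) / (n + np.sqrt(n))

max_risk = {name: max(risk(d, lam) for lam in lam_grid)
            for name, d in [("mle", mle), ("minimax", minimax_rule)]}
# The minimax rule has smaller worst-case risk than the MLE.
assert max_risk["minimax"] < max_risk["mle"]
```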

An alternative is the Bayes rule where, given a prior distribution Π\Pi for λ\lambda, the Bayesian decision rule is found by minimizing

\[
R(\delta)=\int L(a,\lambda)\,\delta(da;{\bf x})\,P_{\lambda}(d{\bf x})\,\Pi(d\lambda).
\]

That prior distribution will usually be indexed by a hyperparameter vector $\upsilon$ lying in a hyperparameter space $\Upsilon$.
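A toy conjugate example (ours, not the paper's) makes the Bayes rule concrete: with $X\sim\mathrm{Binomial}(n,\lambda)$, a $\mathrm{Beta}(a,b)$ prior indexed by the hyperparameters $\upsilon=(a,b)$, and squared-error loss, the rule minimizing $R(\delta)$ is the posterior mean, which can be checked against the MLE by Monte Carlo:

```python
import numpy as np

def bayes_rule(k, n, a, b):
    # Posterior mean of lambda under the Beta(a, b) prior: the Bayes rule
    # for squared-error loss.
    return (a + k) / (a + b + n)

# Monte Carlo estimate of the Bayes risk R(delta) for each rule:
# draw lambda from the prior, then X given lambda, then average the loss.
rng = np.random.default_rng(2)
n, a, b = 5, 2.0, 2.0
lam = rng.beta(a, b, size=20000)
x = rng.binomial(n, lam)
bayes_loss = np.mean((bayes_rule(x, n, a, b) - lam) ** 2)
mle_loss = np.mean((x / n - lam) ** 2)
assert bayes_loss < mle_loss  # the Bayes rule minimizes the Bayes risk
```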

Acknowledgements

We are indebted to Professor George Bluman of the Department of Mathematics at the University of British Columbia (UBC) for his helpful discussions on dimensional analysis. Our gratitude goes to Professor Douw Steyn of the Earth and Ocean Sciences Department, also at UBC, for introducing the second author to the Unconscious Statistician. We also thank Yongliang (Vincent) Zhai, former Master's student of the last two authors of this paper, for contributing to Example 4 and for his work in his Master's thesis that inspired much of this research. Finally, we acknowledge the key role played by the Forest Products Stochastic Modelling Group centered at UBC and funded by a combination of FPInnovations and the Natural Sciences and Engineering Research Council of Canada through a Collaborative Research and Development Grant. It was the work of that Group that sparked our interest in the problems addressed in this paper. The research of the last two authors was also supported via the Discovery Grant program of the Natural Sciences and Engineering Research Council of Canada.

References

  • Aczél, J., Roberts, F. S. and Rosenbaum, Z. (1986). On scientific laws without dimensional constants. Journal of Mathematical Analysis and Applications 119 389–416.
  • Adragni, K. P. and Cook, R. D. (2009). Sufficient dimension reduction and prediction in regression. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 367 4385–4405.
  • Albrecht, M. C., Nachtsheim, C. J., Albrecht, T. A. and Cook, R. D. (2013). Experimental design for engineering dimensional analysis. Technometrics 55 257–270.
  • Arelis, R. A. G. (2020). How to improve prediction accuracy in the analysis of computer experiments, exploitation of low-order effects and dimensional analysis. PhD thesis, University of British Columbia.
  • Arrow, K. J., Chenery, H. B., Minhas, B. S. and Solow, R. M. (1961). Capital-labor substitution and economic efficiency. The Review of Economics and Statistics 225–250.
  • Baiocchi, G. (2012). On dimensions of ecological economics. Ecological Economics 75 1–9.
  • Basu, D. (1958). On statistics independent of sufficient statistics. Sankhyā: The Indian Journal of Statistics 223–226.
  • Bluman, G. W. and Cole, J. D. (1974). Similarity Methods for Differential Equations. Applied Mathematical Sciences. Springer-Verlag.
  • Box, G. E. P. and Cox, D. R. (1964). An analysis of transformations. Journal of the Royal Statistical Society. Series B (Methodological) 26 211–252.
  • Bridgman, P. W. (1931). Dimensional Analysis, Revised Edition. Yale University Press, New Haven.
  • Buckingham, E. (1914). On physically similar systems; illustrations of the use of dimensional equations. Physical Review 4 345–376.
  • Chilarescu, C. and Viasu, I. (2012). Dimensions and logarithmic function in economics: A comment. Ecological Economics 75 10–11.
  • Cohen, A. J., Anderson, H. R., Ostro, B., Pandey, K. D., Krzyzanowski, M., Künzli, N., Gutschmidt, K., Pope III, C. A., Romieu, I., Samet, J. M. et al. (2004). Urban air pollution. Comparative quantification of health risks: global and regional burden of disease attributable to selected major risk factors 2 1353–1433.
  • De Oliveira, V., Kedem, B. and Short, D. A. (1997). Bayesian prediction of transformed Gaussian random fields. Journal of the American Statistical Association 92 1422–1433.
  • Dou, Y. P., Le, N. D. and Zidek, J. V. (2007). A dynamic linear model for hourly ozone concentrations. Technical Report No. 228, Statistics Department, University of British Columbia.
  • Draper, N. R. and Cox, D. R. (1969). On distributions and their transformation to normality. Journal of the Royal Statistical Society: Series B (Methodological) 31 472–476.
  • Eaton, M. L. (1983). Multivariate Statistics: a Vector Space Approach. John Wiley & Sons, New York.
  • Faraway, J. J. (2015). Linear Models with R: Second Edition. Chapman and Hall/CRC.
  • Finney, D. J. (1977). Dimensions of statistics. Journal of the Royal Statistical Society, Series C (Applied Statistics) 26 285–289.
  • Foschi, R. O. and Yao, F. Z. (1986). Another look at three duration of load models. In Proceedings, XVII IUFRO Congress.
  • Fourier, J. (1822). Théorie Analytique de la Chaleur, par M. Fourier. Chez Firmin Didot, Père et Fils.
  • Friedmann, M., Gillis, J. and Liron, N. (1968). Laminar flow in a pipe at low and moderate Reynolds numbers. Applied Scientific Research 19 426–438.
  • Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A. and Rubin, D. B. (2014). Bayesian Data Analysis, Third Edition. CRC Press.
  • Gibbings, J. C. (2011). Dimensional Analysis. Springer-Verlag, London.
  • Hand, D. J. (1996). Statistics and the theory of measurement. Journal of the Royal Statistical Society. Series A (Statistics in Society) 159 445–492.
  • Härdle, W. K. and Vogt, A. B. (2015). Ladislaus von Bortkiewicz—Statistician, Economist and a European Intellectual. International Statistical Review 83 17–35.
  • Hoffmeyer, P. and Sørensen, J. D. (2007). Duration of load revisited. Wood Science and Technology 41 687–711.
  • Kiefer, J. et al. (1957). Invariance, minimax sequential estimation, and continuous time processes. The Annals of Mathematical Statistics 28 573–601.
  • Köhler, J. and Svensson, S. (2002). Probabilistic modelling of duration of load effects in timber structures. In Proceedings of the 35th Meeting, International Council for Research and Innovation in Building and Construction, Working Commission W18—Timber Structures, CIB-W18, Paper 35-17 1.
  • Kovera, M. B. (2010). Encyclopedia of Research Design.
  • Lehmann, E. L. and Romano, J. P. (2010). Testing Statistical Hypotheses. Springer.
  • LibreTexts (2019). The Ideal Gas Law. https://chem.libretexts.org/Bookshelves/Physical_and_Theoretical_Chemistry_Textbook_Maps/Supplemental_Modules_(Physical_and_Theoretical_Chemistry)/Physical_Properties_of_Matter/States_of_Matter/Properties_of_Gases/Gas_Laws/The_Ideal_Gas_Law. Accessed 01/04/2020.
  • Lin, D. K. J. and Shen, W. (2013). Comment: some statistical concerns on dimensional analysis. Technometrics 55 281–285.
  • Luce, R. D. (1959). On the possible psychophysical laws. Psychological Review 66 81.
  • Luce, R. D. (1964). A generalization of a theorem of dimensional analysis. Journal of Mathematical Psychology 1 278–284.
  • Magnello, M. E. (2009). Karl Pearson and the establishment of mathematical statistics. International Statistical Review 77 3–29.
  • Matta, C. F., Massa, L., Gubskaya, A. V. and Knoll, E. (2010). Can one take the logarithm or the sine of a dimensioned quantity or a unit? Dimensional analysis involving transcendental functions. Journal of Chemical Education 88 67–70.
  • Mayumi, K. and Giampietro, M. (2010). Dimensions and logarithmic function in economics: A short critical analysis. Ecological Economics 69 1604–1609.
  • Mayumi, K. and Giampietro, M. (2012). Response to dimensions and logarithmic function in economics: A comment. Ecological Economics 75 12–14.
  • Meinsma, G. (2019). Dimensional and scaling analysis. SIAM Review 61 159–184.
  • Mills, I. M. (1995). Dimensions of logarithmic quantities. Journal of Chemical Education 72 954.
  • Molyneux, P. (1991). The dimensions of logarithmic quantities: implications for the hidden concentration and pressure units in pH values, acidity constants, standard thermodynamic functions, and standard electrode potentials. Journal of Chemical Education 68 467.
  • Mosteller, F. and Tukey, J. W. (1977). Data Analysis and Regression: A Second Course in Statistics. Addison-Wesley Series in Behavioral Science: Quantitative Methods.
  • Joint Committee on Guides in Metrology (2012). 200:2012 — International Vocabulary of Metrology — Basic and General Concepts and Associated Terms (VIM).
  • Paganoni, L. (1987). On a functional equation concerning affine transformations. Journal of Mathematical Analysis and Applications 127 475–491.
  • Pigou, A. C., Friedman, M. and Georgescu-Roegen, N. (1936). Marginal utility of money and elasticities of demand. The Quarterly Journal of Economics 50 532–539.
  • Shen, W. (2015). Dimensional analysis in statistics: theories, methodologies and applications. PhD thesis, The Pennsylvania State University.
  • Shen, W. and Lin, D. K. J. (2018). A conjugate model for dimensional analysis. Technometrics 60 79–89.
  • Shen, W. and Lin, D. K. J. (2019). Statistical theories for dimensional analysis. Statistica Sinica 29 527–550.
  • Shen, W., Davis, T., Lin, D. K. J. and Nachtsheim, C. J. (2014). Dimensional analysis and its applications in statistics. Journal of Quality Technology 46 185–198.
  • Stevens, S. S. (1946). On the theory of scales of measurement. Science 103 677–680.
  • Stevens, S. S. (1951). Mathematics, measurement, and psychophysics. Handbook of Experimental Psychology 1–49.
  • Taylor, B. N. (2018). Quantity calculus, fundamental constants, and SI units. Journal of Research of the National Institute of Standards and Technology 123 123008.
  • Velleman, P. F. and Wilkinson, L. (1993). Nominal, ordinal, interval, and ratio typologies are misleading. The American Statistician 47 65–72.
  • Vignaux, V. A. and Scott, J. L. (1999). Theory & methods: Simplifying regression models using dimensional analysis. Australian & New Zealand Journal of Statistics 41 31–41.
  • Ward, L. M. (2017). S. S. Stevens's invariant legacy: scale types and the power law. American Journal of Psychology 130 401–412.
  • Wijsman, R. A. (1967). Cross-sections of orbits and their application to densities of maximal invariants. In Proc. Fifth Berkeley Symp. on Math. Stat. and Prob. 1 389–400.
  • Wikipedia (2020). Transcendental function. https://en.wikipedia.org/wiki/Transcendental_function. Accessed 2020/02/24.
  • Wong, S. W. K. and Zidek, J. V. (2018). Dimensional and statistical foundations for accumulated damage models. Wood Science and Technology 52 45–65.
  • Yang, C.-C. and Lin, D. K. J. (2021). A note on selection of basis quantities for dimensional analysis. Quality Engineering 33 240–251.
  • Yang, C.-H., Zidek, J. V. and Wong, S. W. K. (2018). Bayesian analysis of accumulated damage models in lumber reliability. Technometrics.
  • Zhai, Y., Pirvu, C., Heckman, N., Lum, C., Wu, L. and Zidek, J. V. (2012a). A review of dynamic duration of load models for lumber strength. Technical Report No. 270, Department of Statistics, University of British Columbia.
  • Zhai, Y., Heckman, N., Lum, C., Pirvu, C., Wu, L. and Zidek, J. V. (2012b). Stochastic models for the effects of duration of load on lumber properties. Technical Report No. 271, Department of Statistics, University of British Columbia.
  • Zidek, J. V. (1969). A representation of Bayes invariant procedures in terms of Haar measure. Annals of the Institute of Statistical Mathematics 21 291–308.