Dimensional Analysis in Statistical Modelling
Abstract
Building on recent work in statistical science, the paper presents a theory for modelling natural phenomena that unifies the physical and statistical paradigms, based on the underlying principle that a model must be non-dimensionalizable. After all, such phenomena cannot depend on how the experimenter chooses to assess them. Yet the model itself must be composed of quantities that can be determined theoretically or empirically. Hence, the underlying principle requires that the model represent these natural processes correctly no matter what scales and units of measurement are selected. This goal was realized for physical modelling through the celebrated theories of Buckingham and Bridgman, and for statistical modellers through the invariance principle of Hunt and Stein. The paper shows how the latter can embrace and extend the former. The invariance principle is extended to encompass the Bayesian paradigm, thereby enabling an assessment of model uncertainty. The paper covers topics not ordinarily seen in statistical science regarding the dimensions, scales, and units of quantities in statistical modelling. It shows the special difficulties that can arise when models involve transcendental functions, such as the logarithm, which is used e.g. in likelihood analysis and is a singularity in the Box-Cox family of transformations. Further, it demonstrates the importance of the scale of measurement, in particular how differently modellers must handle ratio- and interval-scales.
The research reported in this paper was supported by grants from the Natural Sciences and Engineering Research Council of Canada.
1 Introduction
The importance of dimension, scale and units of measurement is largely ignored in statistical modelling. In fact, an anonymous reviewer stated:
“Generally speaking, statisticians treat data as already dimensionless by taking the numeric part, which is equivalent to dividing them by their own units…”
Others have long recognized the role of dimensions, and hence of their scales and units of measurement, in modelling; dimensions can be used to simplify model fitting by reducing the number of quantities involved to a few non-dimensionalized ones. A principal contribution of this paper is to make clear to statisticians the importance of dimensions, scales and units in statistical modelling. We also develop a framework that incorporates these important ideas via a statistical invariance approach. This allows us to extend the existing work’s focus on physical quantities, which by their nature must lie on ratio-scales, to other scales and to vector spaces for multivariate responses. Yet another contribution is addressing a number of issues that are crucial in laying the foundation for that extension. These include: adopting a sampling distribution supported on an interval-scale when the actual support for the sampling distribution is a subset of a ratio-scale; and the meaninglessness of applying transcendental transformations such as the logarithm to quantities with units.
This paper, which is partly expository, describes and contributes to the unification of two overlapping paradigms for modelling natural phenomena. For simplicity we will refer to these as statistical and what Meinsma (2019) calls physical. Physical models are usually deterministic and developed for a specific phenomenon.
In this approach, model development cannot ignore the dimensions, scales and units of measurement on which empirical implementation and assessment will be based. Indeed, Buckingham (1914) believed that a valid model cannot depend on how an investigator chooses to measure the quantities that describe the phenomena of interest. After all, nature cannot know what the measuring methods of science are. Consequently, Buckingham suggested that any valid model must be nondimensionalizable, leading him to his celebrated Pi-theorem. In contrast, Bridgman (1931) argued that models must depend on the measurements but must be invariant under rescaling. Based on the latter premise he was able to derive the invariant “π-functions” of the measurements that were central to Buckingham’s theory. The work of these pioneers spurred the development of what is now known as dimensional analysis (DA), its notion of dimensional homogeneity (DH) and its quantity calculus, explored in depth in Section 3.
The following example renders the ideas above in a more concrete form.
Example 1.
Newton’s second law of motion exemplifies the physical approach to modelling:
(1.1)   F = ma
Here a denotes acceleration, the second derivative with respect to time of the location of an object computed with respect to the starting point of its motion. F and m are, respectively, the force acting on the object and its mass. The model in Equation (1.1) satisfies the fundamental requirement of DH – the units on the left hand side are the same as the units on the right hand side. Moreover, all three of the quantities in the model lie on a ratio-scale i.e. they are inherently positive, having a structural zero for an origin when and where the motion began.
The work of Buckingham and Bridgman cited above implies the quantities in a valid model have to be transformable to dimensionless alternatives called π-functions. In the case of Newton’s law, we can use F and m to transform a into a dimensionless quantity π = ma/F, to get the simpler but mathematically equivalent model involving a single π-function:
(1.2)   π = ma/F = 1.
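To make the nondimensionalization concrete, the following minimal sketch (Python, assuming the third-party pint package for unit bookkeeping; the numerical values are arbitrary illustrations) checks DH for Equation (1.1) and evaluates the π-function of Equation (1.2):

    import pint

    ureg = pint.UnitRegistry()
    F = 12.0 * ureg.newton                    # force
    m = 3.0 * ureg.kilogram                   # mass
    a = 4.0 * ureg.meter / ureg.second ** 2   # acceleration

    # Dimensional homogeneity: both sides of F = ma carry the same units.
    assert (m * a).to(ureg.newton).units == F.units

    # The pi-function is dimensionless, whatever units were chosen above.
    pi_1 = (m * a / F).to_base_units()
    print(pi_1)                 # 1.0 dimensionless
    print(pi_1.dimensionless)   # True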
In contrast to physical modelling, which commonly takes a bottom-up approach, that of statistics as a discipline was top-down (Magnello, 2009) when, in the latter part of the nineteenth century, Karl Pearson established mathematical statistics with its focus on abstract classes of statistical models. Pearson’s view freed the statistician from dealing with the demanding contextual complexities of specific applications. In his abstract formulation, Pearson was able to: incorporate model uncertainty expressed probabilistically; define desirable model properties; determine conditions under which these occur; and develop practical tools to implement models that possess those properties for a specific application. The emphasis on mathematical foundations led inevitably to an almost total disregard of the constraints brought on by dimension, scale and units of measurement. Thus statisticians often use a symbol to mean simply a number to be manipulated in a formal analysis in equations, models and transformations. On the other hand, scientists use such a symbol to represent some specific aspect of a natural phenomenon or process. The scientist’s goal is to characterize that quantity through a combination of basic principles and empirical analysis. This leads to the need to specify one or more “dimensions” of the quantity, e.g. length. That in turn leads to the need to specify an appropriate “scale”, e.g. categorical, ordinal, interval or ratio. For interval- and ratio-scales, the quantity would have some associated units of measurement depending on the nature and resolution of the device making the measurement. How all of these parts fit together is the subject addressed in the realms of measurement theory and DA.
In recent years, the importance of dimensions, scales, and units of measurement has been progressively recognized in statistics. At a deep theoretical level, Hand (1996) considers different interpretations of measurement, studying what things can be measured and how numbers can be assigned to measurements. On a more practical side, inspired by applied scientists, Finney (1977) demonstrates how the principle of DH can be used to assess the validity of a statistical model. The first application of DA in statistical modelling appears in the work of Vignaux and Scott (1999), who develop a framework for applying DA to linear regression models. The practicality of DA in the design of experiments is illustrated at length by Albrecht et al. (2013). While much has been written in this area by nonstatisticians, such as Luce (1964), surprisingly little has been written by statisticians.
A natural statistical approach to these ideas is via the statistical invariance principle due to Hunt and Stein in unpublished work (Lehmann and Romano, 2010, Chapter 6). Despite the abstraction of Pearson’s approach they articulated an important principle of modelling – that when a test of a hypothesis about a natural phenomenon based on a sample of measurements rejects the null hypothesis, that decision should not change if the data were transformed to a different scale, e.g. from degrees Celsius to degrees Fahrenheit (both interval-scales). That led them to the statistical invariance principle: methods and models must transform coherently under measurement scale transformations.
However, the link between DA and statistical invariance does not seem to have been recognized until the work of Shen, Lin, and their co-investigators (Lin and Shen, 2013; Shen et al., 2014; Shen, 2015; Shen and Lin, 2018, 2019). They develop a theoretical framework for applying DA to statistical analysis and the design of experiments while showcasing the practical benefits through numerous examples involving physical quantities. In their framework, Buckingham’s Pi-theorem plays the key role in unifying DA and statistical invariance. In our paper, we extend their work in two ways: (1) elucidating the link between DA and statistical invariance by removing the dependency on Buckingham’s Pi-theorem and (2) in doing so, freeing ourselves from the restriction to modelling physical quantities, ultimately embedding scientific modelling within a stochastic modelling framework in a general setting.
This paper considers issues that arise when a quantity lies on an interval-scale, with values on the entire real line, and when it lies on a ratio-scale, that is, with non-negative values and a real origin having the meaning of “nothingness”. A good example to keep in mind is the possible scales of temperature; it can be measured on the ratio-scale in kelvins (K), where 0 K means all molecular momentum is lost. Alternatively, it can be measured on the interval-scale in degrees Celsius (°C), where 0 °C means water freezes. The same probabilistic model cannot be used for both scales although often, in practice, the Gaussian model for the interval-scale is used as an approximation for the ratio-scale.
We conclude this Introduction with a summary of the paper’s contents and findings. Section 2 introduces us to the Unconscious Statistician through examples that illustrate the importance of units of measurement. That demonstrated importance of units then leads us into Section 3, which is largely a review of the basic elements of DA, a subject taught in the physical sciences but rarely in statistics. We describe a key tool, quantity calculus, which is the algebra of units of measurement and DH.
In Section 4, we discuss problems that might arise when statisticians transform variables. Sometimes the transformation leads to an unintended change of scale, e.g. when a Gaussian distribution, supported on the whole real line, is adopted as an approximation to a distribution supported on the non-negative half-line. This can matter a lot when populations with responses on a ratio-scale are being compared. We discuss when such an approximation, and hence transformation, may be justified. Even the famous family of Box-Cox transformations can cause problems, in particular the logarithmic transformation that arises as its limiting case.
Having investigated units and scales, we then turn to the models themselves. It turns out that when restricted by the need for DH, the class of models is also restricted; that topic is explored in Section 5 where we review the famous Buckingham Pi-theorem. We also see for the first time the ratio-scale’s cousin, the interval-scale. All this points to the need for a general approach to scientific modelling that was foreseen in Hunt and Stein’s unpublished work on the famous invariance principle.
Section 6 gets us to the major contribution of the paper, namely extending the work of recent vintage by statistical scientists (Shen and his co-investigators) on the invariance principle as developed in its classical setting, the frequentist paradigm, and applied principally to variables on a ratio-scale. Our work constitutes a major extension of that scientific modelling paradigm.
In particular, Section 6.5 extends the invariance principle and moves our new approach to scientific modelling to the Bayesian paradigm. This major extension of both the scientific and statistical modelling approaches allows for quantities that could represent uncertain parameters, thereby embedding uncertainty quantification directly into the modelling paradigm. Indeed our new paradigm enables model uncertainty itself to be incorporated.
The paper wraps up with discussion in Section 7 and concluding remarks in Section 8. The supplementary material includes additional discussion, in particular a summary of the controversy of ancient vintage about whether or not taking the logarithm of measurements with units is valid, how Buckingham’s theory leads us to the famous Reynolds number, a general theory for handling data on an interval-scale, and finally a brief review of statistical decision analysis for Section 6.5.
2 The unconscious statistician
We start by critically examining key issues surrounding the topics of dimension and measurement scales through the Unconscious Statistician. We present three examples that illustrate some of the issues we will be exploring.
Example 2.
Ignoring scale and units of measurement when creating models can lead to difficulties; we cannot ignore the distinction between numbers and measurements. Consider a Poisson random variable. The claim is often made that its expected value and variance are equal. But if the random variable has units, as it did when the distribution was first introduced in 1898 as the number of horse kick deaths in a year in the Prussian army (Härdle and Vogt, 2015), then clearly the expectation and variance will have different units and therefore cannot be equated.
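A hedged illustration of the point (Python with the pint package; the unit "death" and the yearly counts are invented for illustration and are not the Prussian data):

    import pint

    ureg = pint.UnitRegistry()
    ureg.define('death = [casualty]')      # a made-up unit carrying a made-up dimension

    counts = [3.0, 5.0, 4.0, 7.0]          # illustrative yearly counts
    y = [c * ureg.death for c in counts]
    n = len(y)

    mean = sum(y, 0 * ureg.death) / n
    var = sum(((yi - mean) ** 2 for yi in y), 0 * ureg.death ** 2) / n
    print(mean.units)   # death
    print(var.units)    # death ** 2 -- different units, so the two cannot be equated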
Example 3.
Consider a random variable representing length in millimetres, normally distributed with mean μ and variance σ², independently measured n times to yield data y₁, …, yₙ. Assume, as is common, that μ is so large that there is a negligible chance that any of the yᵢ’s are negative (we return to this common assumption in Section 4).
Then the maximum likelihood estimate (MLE) of μ is easily shown to be the sample average ȳ, and the MLE of σ² is then the maximizer of the profile likelihood
(2.1)   L_p(σ²) = (2πσ²)^(−n/2) exp{−nS²/(2σ²)},
where S² = n⁻¹ Σᵢ (yᵢ − ȳ)², which has units of mm². That maximizer is S² itself, as is easily shown by differentiating L_p with respect to σ² and setting the result equal to zero. We note that, by any sensible definition of unit arithmetic, the exponential factor is unitless and so the units of L_p are mm⁻ⁿ.
The Unconscious Statistician simplifies the maximization of L_p by maximizing its logarithm instead, believing that this alternative approach yields the same result. The statistician finds the log of L_p to be
log L_p(σ²) = −(n/2) log(2πσ²) − nS²/(2σ²).
Since the second term is unitless, dimensional homogeneity implies that the first term must also be unitless. So where did the units go? Analyses in Subsection 4.4 suggest the units reduce to a unitless 1 by the constructive processes that define the logarithm. The result is a logarithm of {σ²}, the curly brackets demarcating the numerical part of σ², gotten by dropping the units of measurement. But σ² itself has units mm² and it seems unsettling to have them disappear simply by taking the logarithm.
However, the Unconscious Statistician ultimately gets the correct answer by failing to recognize the distinction between the scales of {σ²} and σ². The derivative of the log likelihood, which represents the relative rate of change between quantities on different scales, is computed with units of mm⁻² rather than as a unitless rate. This then restores the missing units in the final result. As a fringe benefit, the second derivative, which yields Fisher’s information, also turns out to have the appropriate units.
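A small numerical sketch (Python/NumPy, on simulated data) confirms that the maximizer of the profile likelihood in Equation (2.1) is S², whichever route is taken:

    import numpy as np

    rng = np.random.default_rng(1)
    y_mm = rng.normal(100.0, 5.0, size=50)       # simulated lengths, in millimetres
    n = len(y_mm)

    ybar = y_mm.mean()                           # MLE of mu, in mm
    S2 = ((y_mm - ybar) ** 2).mean()             # in mm^2

    def profile_lik(sigma2):                     # numerical part of L_p(sigma^2)
        return (2 * np.pi * sigma2) ** (-n / 2) * np.exp(-n * S2 / (2 * sigma2))

    grid = np.linspace(0.5 * S2, 2.0 * S2, 2001)
    print(S2, grid[np.argmax(profile_lik(grid))])   # agree to within the grid resolution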
However, the story does not end there. The problem of logarithms and their units warrants further discussion such as that in Subsection 4.4. That discussion indicates that calculating the logarithm of the likelihood is, in general, not sensible.
Remark 1.
In the frequentist paradigm for statistical modelling, the likelihood is defined by the sampling distribution, which depends on the stopping rule employed in collecting the sample. The likelihood function then becomes an equivalence class. The likelihood ratio can then be used to specify a member of that class. In Example 3 a reference normal likelihood could be used with the variance set to a substantively meaningful value σ₀². The MLEs of μ and σ² would then maximize this relative likelihood. This leads again to ȳ, but now the MLE of σ² is found by maximizing the unitless ratio L_p(σ²)/L_p(σ₀²).
We can now maximize this ratio as a function of the unitless σ²/σ₀², by taking logarithms, differentiating, setting the result equal to 0 and solving, and so finding that the MLE of σ² is again S² mm².
Two complementary, unconscious choices in Example 3 lead ultimately to a correct MLE. Things don’t go so well for the two unconscious statisticians seen in the next example.
Example 4.
Here, the data are assumed to follow a model that relates y, a length, to t, a time, through an equation whose right hand side reduces to the unitless constant 1 when t = 0.
Here the error terms are independent and identically distributed Gaussians with a known standard deviation. Suppose that t₁ = 1 hour while t₂ = 2 hours, with corresponding responses y₁ and y₂. An analysis might go as follows when two statisticians A and B get involved.
First they both compute the likelihood and learn that the MLE is found by minimizing the sum of squared residuals:
which gives the MLE of the model’s unknown coefficient.
Then, for prediction at a new time (in hours), suppose that y₁ = 1 foot, or 12 inches, and y₂ = 3 feet, or 36 inches. Statistician A works in feet and obtains one predicted length. But Statistician B works in inches and obtains a prediction that, converted back to feet, disagrees with Statistician A’s.
What has gone wrong here? The problem is that the stated model implicitly depends on the units of measurement. For instance, the numerical value of the expectation of y when t = 0 is equal to 1, no matter what the units of y. When t = 0, Statistician A expects y to equal 1 foot and Statistician B expects y to equal 1 inch. We can see that the problem arises because the equation defining the model does not satisfy DH, since the “1” is unitless. In technical terms, we would say that this model is not invariant under scalar transformations. Invariance is important when defining a model that involves units. However, one could simply avoid the whole problem of units in model formulation by constructing the relationship between y and t so that there are no units. This is exactly the goal of the Buckingham Pi-theorem, presented in Subsection 5.1.
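The disagreement can be reproduced in a few lines (Python/SciPy), here using an assumed illustrative mean function E[y] = exp(βt) (a stand-in, not necessarily the model above), which likewise equals the unitless 1 at t = 0 and so fails DH:

    import numpy as np
    from scipy.optimize import minimize_scalar

    t = np.array([1.0, 2.0])                    # times, in hours
    y_feet = np.array([1.0, 3.0])               # responses, in feet
    y_inch = 12.0 * y_feet                      # the same responses, in inches

    def beta_hat(y):                            # least squares fit of the assumed model
        sse = lambda b: np.sum((y - np.exp(b * t)) ** 2)
        return minimize_scalar(sse, bounds=(-5.0, 5.0), method='bounded').x

    t_new = 0.5                                 # an illustrative prediction time, in hours
    pred_A = np.exp(beta_hat(y_feet) * t_new)   # Statistician A's prediction, in feet
    pred_B = np.exp(beta_hat(y_inch) * t_new)   # Statistician B's prediction, in inches
    print(pred_A * 12.0, pred_B)                # converted to inches, the two disagree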
3 Dimensional analysis
Key to unifying the work on scales of measurement and the statistical invariance principle is DA. DA has a long history, beginning with the discussion of dimension and measurement (Fourier, 1822). Since DA is key to the description of a natural phenomenon, DA lies at the root of physical modelling. A phenomenon’s description begins with the phenomenon’s features, each of which has a dimension, e.g. ‘mass’ in physics or ‘utility’ in economics. Each dimension is assigned a scale, e.g. ‘categorical’, ‘ordinal’, ‘ratio’ or ‘interval’, a choice that might be dictated by practical as well as intrinsic considerations. Once the scales are chosen, each feature is mapped into a point on its scale. For a quantitative scale, the mapping will be made by measurement or counting; for a qualitative scale, by assignment of a classification. Units of measurement may be assigned as appropriate for quantitative scales, depending on the metric chosen. For example, temperature might be measured on the Fahrenheit scale, the Kelvin scale or the Celsius scale. This paper will be restricted to quantitative features, more specifically those features on ratio- and interval-scales.
3.1 Foundations
One tenet of working with measured quantities is that units in an expression or equation must “match up”; relationships among measurable quantities require dimensional homogeneity. To check the validity of a comparative statement about two quantities, such as equality or an inequality, the two quantities must have the same dimension, such as time. To add one quantity to another, the two must not only have the same dimension but must also be on the same scale and expressed in the same units of measurement.
To discuss this explicitly, we use a standard notation (Joint Committee on Guides in Metrology, 2012) and write a measured quantity as q = {q}[q], where {q} is the numerical part of q. [q] may be read as the dimension of q, e.g. length, or as the units of q on the chosen scale of measurement, e.g. mm; the latter by its nature means that the dimension is length. If [q] = 1, then we say that q is unitless or dimensionless. We define 1, or any other bare number, to be unitless, i.e., [1] = 1, unless stated explicitly otherwise.
To develop an algebra for measured quantities, for a function f we must say what we mean by {f(q)} (usually easy) and [f(q)] (sometimes challenging). The path is clear for a simple function. For example, consider f(q) = q². Clearly we must have [f(q)] = [q]², yielding, say, (3 inches)² = 9 inches². But what if f is a more complex function? This issue will be discussed in general in Subsection 4.2 and in detail for the logarithm in Subsection 4.4.
For simple functions, the manipulation of both numbers and units is governed by an algebra of rules referred to as quantity calculus (Taylor, 2018). For two quantities q₁ = {q₁}[q₁] and q₂ = {q₂}[q₂], this set of rules states that
• q₁ and q₂ can be added, subtracted or compared if and only if [q₁] = [q₂];
• q₁ and q₂ can always be multiplied to get q₁q₂, where {q₁q₂} = {q₁}{q₂} and [q₁q₂] = [q₁][q₂];
• q₁ can always be divided by q₂ when {q₂} ≠ 0 to get q₁/q₂, where {q₁/q₂} = {q₁}/{q₂} and [q₁/q₂] = [q₁]/[q₂];
and that
• a quantity q can be raised to a power that is a rational fraction r = a/b, provided that the result is not an imaginary number, to get q^r = {q}^r [q]^r.
Thus it makes sense to square-root transform ozone parts per million (ppm) as ppm^(1/2), since ozone is measured on a ratio-scale with a true origin of 0 ppm and hence must be non-negative (Dou, Le and Zidek, 2007). These rules can be applied iteratively a finite number of times to get expressions that are combinations of products of quantities raised to powers, along with sums and rational functions of such expressions.
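These rules are mechanical enough to be enforced in software. The following toy Python class is only a sketch of quantity calculus, not a full implementation, but it checks the addition, multiplication and rational-power rules above automatically:

    from fractions import Fraction

    class Quantity:
        # A toy quantity q = {q}[q]: a number plus a dict of unit exponents.
        def __init__(self, value, units=None):
            self.value = float(value)
            self.units = dict(units or {})      # e.g. {'mm': 2} means mm^2

        def __add__(self, other):               # add only when units match
            if self.units != other.units:
                raise ValueError('cannot add quantities with different units')
            return Quantity(self.value + other.value, self.units)

        def __mul__(self, other):               # multiplication is always allowed
            units = dict(self.units)
            for u, p in other.units.items():
                units[u] = units.get(u, 0) + p
                if units[u] == 0:
                    del units[u]
            return Quantity(self.value * other.value, units)

        def __pow__(self, r):                   # rational powers are allowed
            r = Fraction(r)
            return Quantity(self.value ** float(r),
                            {u: p * r for u, p in self.units.items()})

        def __repr__(self):
            return f'{self.value} {self.units}'

    ozone = Quantity(0.04, {'ppm': 1})
    print(ozone ** Fraction(1, 2))              # about 0.2, with units ppm^(1/2)
    length = Quantity(3, {'inch': 1})
    print(length * length)                      # 9.0, with units inch^2
    # length + ozone                            # would raise ValueError: units differ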
This subsection concludes with an example that demonstrates the use of DH and quantity calculus.
Example 5.
This example concerns a structural engineering model for lumber strength now called the “Canadian model” (Foschi and Yao, 1986). Here α(t) is dimensionless and represents the somewhat abstract quantity of the damage accumulated in a piece of lumber by time t. When α(t) = 1, the piece of lumber breaks. This is the only time at which α is observed. The Canadian model posits that dα/dt, the derivative of α with respect to time, satisfies
(3.1)   dα(t)/dt = a [τ(t) − σ₀τₛ]₊^b + c [τ(t) − σ₀τₛ]₊^n α(t),
where a, b, c and n are log-normally distributed random effects for an individual specimen of lumber, τ(t), measured in pounds per square inch (psi), is the stress applied to the specimen cumulative to time t, τₛ (in psi) is the specimen’s short term breaking strength had it experienced the stress pattern τ(t) = kt for a fixed known k (in psi per unit of time), and σ₀ is the unitless stress ratio threshold. The expression [x]₊ is equal to x if x is non-negative and is equal to 0 otherwise. Let T_f denote the random time to failure for the specimen, under the specified stress history curve, meaning α(T_f) = 1.
As has been noted (Köhler and Svensson, 2002; Hoffmeyer and Sørensen, 2007; Zhai et al., 2012a; Wong and Zidek, 2018), this model is not dimensionally homogeneous. In particular, the units associated with the two terms on the right hand side of Equation (3.1) involve random powers, b and n, leading to the random units (psi)^b and (psi)^n respectively. As noted by Wong and Zidek (2018), the coefficients a and c in Equation (3.1) cannot involve these random powers and so cannot compensate to make the model dimensionally homogeneous.
Rescaling is a formal way of addressing this problem. Zhai et al. (2012a) rescale the applied stress by the population mean of the short term breaking strength and rewrite Equation (3.1) as the dimensionally homogeneous model
In contrast, Wong and Zidek (2018) propose another dimensionally homogeneous model
where the coefficients are now random effects with dimensions of force/length².
We see that there may be several ways to nondimensionalize a model. Another method, widely used in the physical sciences, involves always normalizing by the standard units specified by the Système International d’Unités (SIU), units such as meters or kilograms. So when a non-negative quantity like absolute temperature has an associated SIU, the kelvin (K), it can be converted to a unitless quantity by first expressing it in SIUs and then using quantity calculus to rescale it, dividing by 1 K. The next example provides an important illustration of the application of the standardized unit approach.
Example 6.
Liquids contain both hydrogen and hydroxide ions. In pure water these ions appear in equal numbers. But the water becomes acidic when there are more hydrogen ions and basic when there are more hydroxide ions. Thus acidity is measured by the concentration of these ions. The customary measurement is in terms of the hydrogen ion concentration, denoted [H⁺] and measured in the SIU of one mole of ions per litre of liquid; these units are denoted c⁰. However, for substantive reasons, the pH index is now used to characterize the acidity of a liquid. The index is defined by pH = −log₁₀([H⁺]/c⁰). Distilled water has a pH of 7 while lemon juice has a level of about 2. Note that the concentration [H⁺] lies on a ratio-scale while pH lies on an interval-scale – the transformation has changed the scale of measurement.
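A one-line numerical sketch of the pH computation (Python; the hydrogen-ion concentration is an invented illustrative value). The concentration is divided by the SIU c⁰ = 1 mol/L first, and only then is the transcendental log₁₀ applied:

    import math

    c0 = 1.0                    # SIU: one mole of hydrogen ions per litre
    cH = 2.5e-7                 # an illustrative concentration, in mol/L

    pH = -math.log10(cH / c0)   # nondimensionalize first, then take the logarithm
    print(round(pH, 2))         # 6.6, slightly acidic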
3.2 The problem of scales
The choice of scale restricts the choice of units of measurement, and these units dictate the type of model that may be used. However, comparing the size of two quantities on a ratio-scale must be made using their ratio, not their difference, whereas the opposite is true on an interval-scale where differences must be used.
Thus we need to study scales in the context of model building and hence in the context of quantity calculus. In his celebrated paper, Stevens (1946) starts by proposing four major scales for measurements or observations: categorical, ordinal, interval and ratio. This taxonomy is based on the notion of permissible transformations, as is the work of our Section 5. However, our work is aimed at modelling while Stevens’s work is aimed at statistical analysis. Stevens defines the permissible transformations scale by scale: permutations for data on a categorical scale; strictly increasing transformations for data on an ordinal scale; linear transformations (x → ax + b with a > 0) for data on an interval-scale; and scalar transformations (x → cx with c > 0) for data on a ratio-scale.
Stevens created his taxonomy as a basis for classifying the family of all statistical procedures for their applicability in any given situation (Stevens, 1951). For instance, Luce (1959) points out that, for measurements made on a ratio-scale, the geometric mean would be appropriate for estimating the central tendency of a population distribution (Velleman and Wilkinson, 1993). In contrast, when measurements are made on an interval-scale the arithmetic mean would be appropriate. The work of Stevens seems to be well-accepted in the social sciences, with Ward (2017) calling his work monumental. But Stevens’s work is not widely recognized in statistics. Velleman and Wilkinson (1993) review the work of Stevens with an eye on potential applications in the then emerging area in statistics of artificial intelligence (AI), hoping to automate data analysis. They claim that “Unfortunately, the use of Stevens’s categories in selecting or recommending statistical analysis methods is inappropriate and can often be wrong.” They describe alternative scale taxonomies for statistics that have been proposed, notably by Mosteller and Tukey (1977). A common concern centers on the inadequacy of an automaton selecting the statistical method for an AI application. Even the choice of scale itself will depend on the nature of the inquiry and thus is something to be determined by humans. For example, length might be observed on a relatively uninformative ordinal scale, were it sufficient for the intended goal of a scientific inquiry, rather than on the seemingly more natural ratio-scale.
3.3 Why the origin matters
The interval-scale of real numbers allows for the taking of sums, differences, products, ratios, and integer powers of values observed on that scale. Rational powers of nonnegative values are also allowed although irrational powers lead into the domain of transcendental functions and difficulties of interpretation. The same operations are allowed for a ratio-scale of real numbers provided that the differences are non-negative. So superficially, these two scales seem nearly the same.
But there is a substantial qualitative difference between ratio- and interval-scales, so ignoring the importance of scale when building models can result in challenges in interpretation. The issue has to do with the meaning of the 0 on a ratio-scale. The next hypothetical example illustrates the point.
Example 7.
When the material in the storage cabinet at a manufacturing facility has been depleted, the amount left is 0. To understand the usefulness of this origin, suppose the facility’s inventory monitoring program recorded a drop of a given amount during the past month. Without knowledge of the origin – of where the amount of inventory lies on the scale – the implications of this drop are unclear. If the amount left in the facility is large the drop means one thing, while if the facility is nearly empty, the interpretation would be completely different.
Since the amount of inventory lies on a ratio-scale, these changes should instead be reported using ratios. The recorder in the first case would report a modest percentage decline in inventory, while in the second case the recorder would report a decline approaching 100%: the same drop but with a totally different meaning. This example explains why stock price changes are reported on a ratio-scale, as a percentage, and not on an interval-scale.
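A tiny numerical sketch (Python; the quantities are invented for illustration) of how the same absolute drop carries very different meanings on a ratio-scale:

    drop = 100.0                       # inventory drop over the month, arbitrary units
    for remaining in (900.0, 10.0):    # two possible amounts left after the drop
        start = remaining + drop
        print(f'absolute drop {drop}, relative drop {drop / start:.1%}')
    # absolute drop 100.0, relative drop 10.0%
    # absolute drop 100.0, relative drop 90.9%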
4 Transforming quantities
The scale of a measurement may be transformed in a variety of ways. No change of scale occurs when the transformation is a rescaling, where we know how to transform both the numerical part of a quantity and its units of measurement. When the transformation is more complex, the scale itself might change. For instance, if a quantity is measured on a ratio-scale, then its logarithm will lie on an interval-scale. Observe that in Example 6, the units of measurement of the concentration were eliminated before transforming by the transcendental function log₁₀. That raises the question: do we need to eliminate units before applying the logarithm? This question, and the logarithmic transformation in science more generally, has led to vigorous debate for over six decades (Matta et al., 2010). We highlight and resolve some of that debate below in Section 4.4.
However, we begin with an even simpler situation seen in the next subsection, where we study the issues that may arise when interval-scales are superimposed on ratio-scales.
4.1 Switching scales
This subsection concerns a perhaps unconscious switch in a statistical analysis from a ratio-scale, whose values lie on the non-negative half-line, to an interval-scale, whose values lie on the whole real line.
The bell curve approximation.
Despite the fundamental difference between the ratio- and interval-scales, the normal approximation is often used to approximate the sampling distribution for a ratio-valued response quantity. This in effect replaces the ratio-scale with an interval-scale. In this situation, what should be used is the truncated normal distribution approximation, although this introduces undesired complexity. For example, if a normal distribution with mean μ and standard deviation σ were used to approximate the distribution of a non-negative quantity Y, the approximation should be based on the CDF of the normal truncated at 0,
(4.1)   P(Y ≤ y) = [Φ((y − μ)/σ) − Φ(−μ/σ)] / [1 − Φ(−μ/σ)],   y ≥ 0,
where φ and Φ denote, respectively, the standardized Gaussian distribution’s probability density function and CDF. Observe that the probability in Equation (4.1) is invariant under changes of the units in which Y is measured, as it should be. Furthermore the expectation of Y under the truncated distribution would be approximately μ if μ were large compared to σ, as it would be were the non-truncated Gaussian distribution for an interval-scale imposed on this ratio-scale. That suggests the bell curve approximation would not work well were the population under investigation widely dispersed relative to its mean. For example it might be satisfactory if Y represented the height of a randomly selected adult woman, but not if it were the height of a randomly selected human female.
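The point is easy to check numerically (Python/SciPy; the two parameter settings are invented illustrations): when the mean is many standard deviations above the origin, the truncated and untruncated means essentially coincide, but for a widely dispersed positive quantity they do not:

    from scipy.stats import norm, truncnorm

    def truncated_at_zero(mu, sigma):
        a = (0.0 - mu) / sigma                   # standardized truncation point
        return truncnorm(a, float('inf'), loc=mu, scale=sigma)

    for mu, sigma in [(165.0, 7.0),              # e.g. adult women's heights, in cm
                      (80.0, 60.0)]:             # a widely dispersed positive quantity
        print(mu, sigma, truncated_at_zero(mu, sigma).mean(), norm(mu, sigma).mean())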
As mentioned at the beginning of this section, this switch occurs when approximating a distribution. This switch is ubiquitous and seen in most elementary statistics textbooks. The assumed Gaussian distribution model leads to the sample average as a measurement of the population average instead of the geometric mean, which should have been used (Luce, 1959). That same switch is made in such things as regression analysis and the design of experiments. The seductive simplicity has also led to the widespread use of the Gaussian process in spatial statistics and machine learning.
The justification of the widespread use of the Gaussian approximation may well lie in the belief that the natural origin of the ratio-scale lies well below the range of values likely to be found in a scientific study. This may well be the explanation of the reliance on interval-scales for temperature in Celsius and Fahrenheit, on planet Earth at least, since one would not expect to see temperatures anywhere near the true origin of temperature on the Kelvin scale (ratio), which corresponds to −273.15 °C on the Celsius scale (interval). We would note in passing that these two interval-scales for temperature also illustrate the statistical invariance principle (see Subsection 6.4); each scale is an affine transformation of the other.
We illustrate the difficulties that can arise when an interval-scale is misused in a hypothetical experiment where measurements are made on a ratio-scale, with serious consequences.
Example 8.
The justification above for the switch from a ratio- to an interval-scale can be turned into a simple approximation that may help with the interpretation of the data. To elaborate, suppose interest lies in comparing two values y₁ and y₂ that lie on a ratio-scale near a known reference value y₀. Interest lies in the relative size of these quantities, i.e. in the ratio y₁/y₂. A first-order Taylor expansion of that ratio at (y₀, y₀) yields the approximation 1 + (y₁ − y₂)/y₀, thus providing an approximation on an interval-scale, one based on a difference. Both the ratio and its approximation are unitless. This points to the potential value of rescaled ratio data when a Gaussian approximation is to be used for the sampling distribution of a quantity on a ratio-scale.
4.2 Algebraic versus transcendental functions
A function f, which describes the relationship among quantities as
y = f(x₁, …, xₙ),
may be a sequence of transformations or operations involving the xᵢ’s, possibly combined with parameters. We know how to calculate the resulting units of measurement when f consists of a finite sequence of permissible algebraic operations. The function consisting of the concatenation of such a sequence may formally be defined as a root of a polynomial equation that must satisfy the requirement of dimensional homogeneity (other desirable properties of f along with methods for determining an allowable f are discussed in Section 5). Such a function is called algebraic.
But f may also involve non-algebraic operations leading to non-algebraic functions called transcendental (because they “transcend” an algebraic construction). Examples in the univariate case (n = 1) include the exponential and the logarithm and, for a given nonnegative constant c, functions such as c^x. The formal definition of a non-algebraic function does not explicitly say whether or not such a function can be applied to quantities with units of measurement. Bridgman (1931) sidesteps this issue by arguing that it is moot since valid representations of natural phenomena can always be nondimensionalized (see Subsection 5.1). But the current Wikipedia entry on the subject states “transcendental functions are notable because they make sense only when their argument is dimensionless” (Wikipedia, 2020). The next subsection explores the much-used Box-Cox family (Box and Cox, 1964), which includes transcendental functions.
4.3 The Box-Cox transformation
Frequently in statistical modelling, a transformation is used to extend the domain of applicability of a procedure that assumes normally distributed measurements (De Oliveira, Kedem and Short, 1997). That transformation may also be seen as a formal part of statistical model building that facilitates maximum likelihood estimation of a single parameter (Draper and Cox, 1969). The Box-Cox (BC) transformations constitute an important class of such transformations and therefore a standard tool in the statistical scientist’s toolbox.
In its simplest form, a member of this family of transformations has the form y^λ for a real-valued parameter λ. Here y would need to lie in [0, ∞), a ratio-scale, to avoid potential imaginary numbers. However, in practice interval-scales are sometimes allowed, a positive constant being added to y to avoid negative realizations. This ad hoc procedure thus validates the use of a Gaussian distribution to approximate the sampling distribution for the response.
Since y is measured on a ratio-scale, (y₁/y₂)^λ = y₁^λ/y₂^λ for any two points on that scale, while the scale is equivariant under multiplicative transformations, i.e., (cy)^λ = c^λ y^λ for any point y on that scale. Finally 0^λ = 0, so that the result of the transformation also lies on a ratio-scale, even when its intended goal is an approximately Gaussian distribution for the (transformed) response.
Box and Cox (1964) actually state their transformation as
(4.2)   y^(λ) = (y^λ − 1)/λ,   λ ≠ 0,
that moves the origin of the ratio-scale from 0 to −1/λ. It is readily seen that unless λ is a rational number, y^(λ) will be transcendental. That fact would be inconsequential in practice inasmuch as a modeller would only ever use a rational number for λ. Or at least that would be the case except that the BC class has been extended to include λ = 0 by admitting log y for membership.
On closer inspection, we see that for validity, the 1 in Equation (4.2) needs to be replaced by 1[y], a 1 carrying the units of y, to include units of measurement. Then for λ ≠ 0 the transformation becomes
(4.3)   y^(λ) = (y^λ − (1[y])^λ)/λ.
As λ → 0, the only tenable limit would seem to be
log{y},
not log y. In other words, in taking logarithms in the above example, the authors may have unconsciously nondimensionalized the measurements. Taking antilogarithms would then return {y}, not y.
Equation (4.3) thus tells us the Box-Cox transformation may have the unintended consequence of transforming not only the numerical value of the measurements but also their units of measurement, which become [y]^λ. This makes a statistical model difficult to interpret. For example, imagine the challenge of a model with a random response in units such as mm raised to a fractional power. And since the transformation is nonlinear, returning to the original scales of the data would be difficult. For example, unbiased estimates would become biased and their standard errors could be hard to approximate.
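The unit dependence of the limiting case is easy to see numerically (Python; the length and the sequence of λ values are illustrative). The same physical length, recorded as 1 foot or as 12 inches, has small-λ Box-Cox transforms approaching the logarithm of the numerical part, and hence approaching different limits in the two unit systems:

    import math

    def box_cox(y, lam):
        return (y ** lam - 1.0) / lam

    length_ft, length_in = 1.0, 12.0     # the same physical length, in two unit systems
    for lam in (0.5, 0.1, 0.01, 0.001):
        print(lam, box_cox(length_ft, lam), box_cox(length_in, lam))
    print('limits:', math.log(length_ft), math.log(length_in))   # 0.0 versus about 2.485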
Remark 2.
Box and Cox (1964) do not discuss the issue of scales in relation to the transformation they introduce. In that paper’s second example the logarithmic transformation is applied to the number of cycles to failure, which may be regarded as unitless.
Remark 3.
The mathematical foundation of the Box-Cox family is quite complicated. Observe that in Equation (4.2), if λ = a/b is a rational number for some nonnegative integers a and b, y^(λ) will be an algebraic function of y. So as λ varies over its domain, the function flips back and forth from an algebraic to a transcendental function. For any fixed point y, as λ approaches 0 through rational values, the trajectory of y^(λ) converges to log y, so the family now includes the logarithmic transformation as another transformation in the statistical analyst’s toolbox, which is used when the response distribution appears to have a long right tail. Thus a transcendental function has been added to the family of algebraic transformations obtained when λ is chosen to be a positive rational number. It does not seem to be known whether all transcendental transformations lie in the closure of the class of algebraic functions under the topology of pointwise convergence. However, when this family is shifted from the domain of mathematical statistics in the human brain to that of computational statistics in the computer’s processor, this complexity disappears. In the computational process, all functions are algebraic and neither the logarithm nor infinity exists.
The importance of the logarithmic transformation in statistical and scientific modelling, and the issues that have arisen about its lack of units, lead next to a special subsection devoted to it.
4.4 The logarithm: a transcendental function
Does the logarithm have units?
We have argued (see Example 3) that the answer is “no.” First consider applying the logarithm to a unitless quantity. It is sensible to think that its value will have no units, and so we take this as fact. But what happens if we apply the logarithm to a quantity with units? For instance, is log(12 inches) = log(12) + log(inches)? This issue has been debated for decades across different scientific disciplines; we summarize recent debates in Appendix A.
We now discuss this issue in more detail and argue that the result must be a unitless quantity. We use the definition of the natural logarithm of a quantity q as the area under the curve of the function 1/t for t between 1 and q (Molyneux, 1991). We follow the notation defined in Section 3.1, writing q = {q}[q], and for clarification, we write the lower limit “1” as 1[q], i.e. one unit of q. We then make the change of variables t = s[q], so that s is unitless, and get
(4.4)   log q = ∫_{1[q]}^{q} (1/t) dt = ∫_{1}^{{q}} (1/s) ds = log{q},
which is a unitless quantity, as claimed.
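A quick numerical check of Equation (4.4) (Python/SciPy; the value 12 stands for the numerical part of an illustrative quantity such as 12 inches):

    import math
    from scipy.integrate import quad

    q_num = 12.0                                    # numerical part {q} of q = 12 inches
    area, _ = quad(lambda s: 1.0 / s, 1.0, q_num)   # unitless after the change of variables
    print(area, math.log(q_num))                    # both approximately 2.4849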
We now derive the more specific result that log q = log{q}. In other words, applying this transcendental function to a dimensional quantity simply causes the units to be lost. We show below, from first principles, that for a unitless s,
d log(s)/ds = 1/s.
This implies that
log{q} = ∫_{1}^{{q}} (1/s) ds,
which, combined with Equation (4.4), implies that log q = log{q}.
To show that the derivative of log s is 1/s, we turn to the original definition of the natural logarithm as the inverse of another transcendental function, the exponential, at least for s > 0. In other words exp(log s) = s. The chain rule now tells us that
exp(log s) × d log(s)/ds = 1.
Thus
d log(s)/ds = 1/exp(log s) = 1/s
for any real s > 0.
Can we take the logarithm of a dimensional quantity with units?
We argue that the answer is “no.” We reason along the lines of Molyneux (1991), who sensibly argues that, since log q has no units even when q has units, the result is meaningless. In other words, since it is disturbing that the value of the function is unitless no matter what the argument, we should not take the logarithm of a dimensional quantity with units.
This view agrees with that of Meinsma (2019). He notes that the Shannon entropy of a probability density function, which has units, is defined in terms of the logarithm of that density. He notes that Shannon found this to be an objectionable feature of his entropy but rationalized its use nevertheless. But not Meinsma, who concludes that to him it still does not feel right.
To consider the ramifications of ignoring this in a statistical model, suppose that X is some measure of particulate air pollution on a logarithmic scale, X = log PM, where PM is a measurement with units. This measure appears in a scientific model of the impact of particulate air pollution on health (Cohen et al., 2004). In this model, even though X is unitless, its numerical value depends on the numerical value of PM, and hence on the units in which PM is measured. But, since X itself is unitless, we cannot adjust X to reflect changes in the units of PM. To make this point explicit, suppose that experimental data pointed to a particular numerical value of X. Without knowing the units behind it, we would have no idea whether air pollution was a serious health problem. Thus, we see the problem that arises with a model that involves the logarithm of a measurement with units. This property of the logarithm points to the need to nondimensionalize before applying the logarithmic transformation in scientific and statistical modelling, in keeping with the theories of Buckingham, Bridgman and Luce.
One of the major routes taken in debates about the validity of applying the natural logarithm to a dimensional quantity involves arguments based one way or another on a Taylor expansion. A key feature of these debates involves the claim that the expansion is impossible, since the terms in the expansion have different units and so cannot be summed (Mayumi and Giampietro, 2010). We show below that this claim is incorrect by showing that all of the terms in the expansion have no units (see Appendix A for more details).
Key to the Taylor expansion argument is how to take the derivative of log q when q has units. Recall that above we calculated the derivative of log s for unitless s. To define the derivative of a function of q when q has units, we proceed from first principles. Suppose we have a function f with argument q. We define the derivative of f with respect to q as follows. Let q = {q}[q] and h = {h}[h], and suppose that q and h have the same units, that is, that [h] = [q]. Otherwise, we would not be able to add q and h in what follows. Then we define
df(q)/dq = lim as {h} → 0 of [f(q + h) − f(q)]/h.
For instance, for f(q) = log q this definition yields d log(q)/dq = 1/({q}[q]) = 1/q, a quantity with units [q]⁻¹.
Using this definition of the derivative we can carry out a Taylor series expansion of log q about a point q₀ carrying the same units to obtain
(4.6)   log q = log q₀ + Σ_{k=1}^{∞} (−1)^{k+1} (q − q₀)^k/(k q₀^k),
where q and q₀ have the same units. As log q₀ = log{q₀}, the first term in the expansion is unitless. Differentiating once yields the k = 1 term, (q − q₀)/q₀, and once again we see that the term is unitless. Continuing in this way, we see that the summation on the right side of Equation (4.6) is unitless, and so the equation satisfies dimensional homogeneity. This reasoning differs from the incorrect reasoning of Mayumi and Giampietro (2010) in their argument that the logarithm cannot be applied to quantities with units because the terms in the Taylor expansion would have different units. Our reasoning also differs from that of Baiocchi (2012) who uses a different expansion to show that the logarithm cannot be applied to measurements with units, albeit without explicitly recognizing the need for the terms to be unitless. The expansion in Equation (4.6) is the same as that given in Matta et al. (2010), albeit not in an explicit form for log q. Like us, they discredit the Taylor expansion argument against applying the logarithm to quantities with units.
5 Allowable relationships among quantities
Having explored dimensional analysis and the kinds of difficulties that can arise when scales or units are ignored, we turn to a key step in unifying physical and statistical modelling. We now determine how to relate quantities and hence how to specify the ‘law’ that characterizes the phenomenon which is being modelled.
But what models may be considered legitimate? Answers for the sciences, given long ago, were based on the principle that for a model to completely describe a natural phenomenon, it cannot depend on the units of measurement that might be chosen to implement it. This answer was interpreted in two different ways. In the first interpretation, the model must be nondimensionalizable, i.e., it cannot have scales of measurement and hence cannot depend on units. In the second interpretation, the model must be invariant under all allowable transformations of scales. Both of these interpretations reduce the class of allowable relationships that describe the phenomenon being modelled and place restrictions on the complexity of any experiment that might be needed to implement that relationship.
5.1 Buckingham’s Pi-theorem
The section begins with Buckingham’s simple motivating example.
Example 9.
This example is a characterization of properties of a gas in a container, namely, a characterization of the relationship amongst the pressure (P), the volume (V), the number of moles of gas (n) and the absolute temperature (T) of the gas. The absolute temperature reflects the kinetic energy of the system and is measured in kelvins (K), the SIU for temperature. A fundamental relationship amongst these quantities is given by
(5.1)   PV = CnT
for some constant C that does not depend on the gas. Since the dimension of C involves force, length, the number of moles and temperature, the relationship in Equation (5.1), as expressed, depends on the units associated with P, V, n and T, whereas the physical phenomenon underlying the relationship does not. Buckingham gets around this by invoking a dimensional parameter R and rewriting Equation (5.1) as
PV = nRT.
Thus the ratio PV/(nRT) has no units. Buckingham calls this equation complete and hence nondimensionalizable. This equation is known as the Ideal Gas Law, with R denoting the ideal gas constant (LibreTexts, 2019).
This example of nondimensionalizing by finding one π expression, as in Equation (5.1), can be extended to cases where we must nondimensionalize by finding several such quantities. This extension is formalized in Buckingham’s Pi-theorem. Here is a formal statement (in slightly simplified form) as stated by Buckingham (1914) and discussed in a modern style in Bluman and Cole (1974).
Theorem 1.
Suppose q₁, …, qₙ are measurable quantities satisfying a defining relation
(5.2)   f(q₁, …, qₙ) = 0
that is dimensionally homogeneous. In addition, suppose that there are m fundamental dimensions appearing in this equation, denoted D₁, …, D_m, and that the dimension of each qᵢ can be expressed as a product of powers of D₁, …, D_m. Then Equation (5.2) implies the existence of p = n − m dimensionless quantities π₁, …, π_p, each expressible as a product of powers of the qᵢ’s, and a function g such that
g(π₁, …, π_p) = 0.
In this way f has been nondimensionalized. The choice of the πⱼ’s in general is not unique.
The theorem is proven constructively, so we can find the πⱼ’s and g. We first determine the fundamental dimensions used in f. We then use the quantities to construct two sets of variables: a set of primary variables, also called repeating variables, and a set of secondary variables, which are nondimensional. For example, if ℓ is the length of a box, h is its height and w is its width, then there is one fundamental dimension, the generic length denoted L. We can choose ℓ as the primary variable and use it to define two new variables π₁ = h/ℓ and π₂ = w/ℓ. These new variables, called secondary variables, are dimensionless. Buckingham’s theorem states that any dimensionally homogeneous algebraic equation relating ℓ, h and w can be re-written as an equation involving only π₁ and π₂. Note that we could have also chosen either h or w as the primary variable.
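The constructive step amounts to linear algebra: the exponent vectors of the dimensionless π-functions span the null space of the dimension matrix. The following sketch (Python/SymPy) uses the simple pendulum, with period T, length l, mass m and gravitational acceleration g, as an assumed illustration (it is not an example from this paper):

    from sympy import Matrix

    # Columns: T (period), l (length), m (mass), g (gravity); rows: dimensions M, L, T.
    dim_matrix = Matrix([[0, 0, 1, 0],     # mass
                         [0, 1, 0, 1],     # length
                         [1, 0, 0, -2]])   # time

    for v in dim_matrix.nullspace():
        print(v.T)    # Matrix([[2, -1, 0, 1]]): the dimensionless group g*T**2/l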
A famous application of Buckingham’s theorem concerns the discovery of the Reynold’s number in fluid dynamics, which is discussed in Gibbings (2011). For brevity we include that example in Appendix B.
A link between Buckingham’s approach and statistical modelling was recognized in the paper of Albrecht et al. (2013) and commented on in Lin and Shen (2013). But its link with the statistical invariance principal seems to have been first identified in the thesis of Shen (2015). This connection provides a valuable approach for the statistical modelling of scientific phenomena. Shen builds a regression model starting with Buckingham’s approach and thereby a nondimensionalized relationship amongst the variables of interest. We propose a different approach in Section 6. We present Shen’s illustrative example next.
Example 10.
This example, from Shen (2015), concerns a model for the predictive relationship between the volume V of wood in a pine tree and its height H and diameter D. The dimensions are [V] = L³, [H] = L and [D] = L. Shen chooses D as the repeating variable and calculates the π-functions π₁ = V/D³ and π₂ = H/D. He then applies the Pi-theorem to get the dimensionless version of the relationship amongst the variables:
π₁ = g(π₂)
for some function g. He correctly recognizes that the pair (π₁, π₂) is the maximal invariant under the scale transformation group, although the connection to the ratio-scale of Stevens is not made explicitly. He somewhat arbitrarily chooses the class of relationships given by
(5.3)   π₁ = β₀ π₂^{β₁}.
He linearizes the model in Equation (5.3) by taking the logarithm and adds a residual to get a standard regression model, susceptible to standard methods of analysis. In particular the least squares estimate turns out to provide a good fit judging by a scatterplot.
Note that application of the logarithmic transformation is justified since the π-functions are dimensionless.
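A minimal sketch of Shen’s strategy (Python/NumPy), fitted to simulated data generated from an assumed allometric law rather than to Shen’s pine tree data: form the dimensionless π-functions, take logarithms (now legitimate), and fit by least squares:

    import numpy as np

    rng = np.random.default_rng(2)
    D = rng.uniform(0.2, 0.6, size=40)                              # diameters, in metres
    H = rng.uniform(15.0, 30.0, size=40)                            # heights, in metres
    V = 0.4 * D ** 2 * H * np.exp(rng.normal(0.0, 0.05, size=40))   # volumes, assumed law

    pi1, pi2 = V / D ** 3, H / D                                    # dimensionless
    slope, intercept = np.polyfit(np.log(pi2), np.log(pi1), 1)
    print(slope, intercept)    # slope near 1 recovers V proportional to D^2 * H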
5.2 Bridgman’s alternative
We now describe an alternative to the approach of Buckingham (1914) due to Bridgman (1931). At around the same time that Edgar Buckingham was working on his Pi-theorem, Percy William Bridgman was giving lectures at Harvard on the topic of nondimensionalization that were incorporated in a book whose first edition was published by Yale University Press in 1922. The second edition came out in 1931 (Gibbings, 2011). Bridgman thanks Buckingham for his papers but notes their approaches differ. And so they do. For a start, Bridgman asserts his disagreement with the position that seems to underlie Buckingham’s work that “a dimensional formula has some esoteric significance connected with the ‘ultimate nature’ of things.” Thus those who espouse that point of view must “find the true dimensions and when they are found, it is expected that something new will be suggested about the physical properties of the system.” Instead, Bridgman takes measurement itself as the starting point in modelling and even the collection of data: “Having obtained a sufficient array of numbers by which the different quantities are measured, we search for relations between these numbers, and if we are skillful and fortunate, we find relations which can be expressed in mathematical form.” He then seeks to characterize a measured quantity as either primary, that is, the product of direct measurement, or secondary, that is, computed from the measurements of primary quantities, as, for instance, velocity is computed from the primary quantities of length and time. Finally he sees the basic scientific issue as that of characterizing one quantity in terms of the others, as in our explication of Buckingham’s work above in terms of the function g.
Bridgman considers the functional relationship between secondary and primary measurements under what statistical scientists might call “equivariance” under multiplicative changes of scale in the primary units. He proves that the functional relationship must be based on monomials with possible fractional exponents, not unlike the form of the π-functions above. Thus Bridgman is able to re-derive Buckingham’s formula, albeit with the added assumption that the relationship is differentiable with respect to its arguments.
5.3 Beyond ratio-scales
Nondimensionalization seems more difficult outside of the domain of the physical sciences. For example, the dimensions of quantities such as utility cannot be characterized by a ratio-scale. And the choice of the primary dimensions is not generally so clear, although Baiocchi (2012) does provide an example in macroeconomics where time, money, goods and utility may together be sufficient to characterize all other quantities.
Bridgman’s results on allowable laws were limited to laws involving quantities measured on ratio-scales. A substantial body of work has been devoted to extending these results to laws involving quantities measured on nonratio-scales, beginning with the seminal paper of Luce (1959). To quote the paper by Aczél, Roberts and Rosenbaum (1986), which contains an extensive review of that work, “Luce shows that the general form of a ‘scientific law’ is greatly restricted by knowledge of the ‘admissible transformations’ of the dependent and independent variables.” It seems puzzling that this principle has been recognized little if at all in statistical science. This may be due to the fact fact that little attention is paid to such things as dimensions and units of measurement.
The substantial body of research that followed Luce’s publication covers a variety of scales, e.g. ordinal, among other things. Curiously that body of work largely ignores the work of Buckingham in favor of Bridgman even though the former preceded the latter. Also ignored is the work on statistical invariance described above, which goes back to G. Hunt and C. Stein in 1946 in unpublished but well-known work that led to optimum statistical tests of hypotheses.
To describe this important work by Luce, we re-express Equation (5.2) as
(5.4)   y = u(x₁, …, xₙ)
for some function u, and thereby define a class of all possible laws that could relate y to the predictors x₁, …, xₙ, before turning to a purely data-based empirical assessment of the possible u’s. Luce requires that u satisfy an invariance condition. Specifically, he makes the strong assumption that the scale of each xᵢ, i = 1, …, n, is susceptible to a transformation Tᵢ belonging to some set of possible transformations. Furthermore he assumes that the xᵢ’s are transformed independently of one another; no structural constraints are imposed. Luce assumes the existence of a function D such that
(5.5)   u(T₁(x₁), …, Tₙ(xₙ)) = D(T₁, …, Tₙ) u(x₁, …, xₙ)
for all possible transformations and choices of the xᵢ’s. He determines that under these conditions, if each xᵢ, i = 1, …, n, lies on a ratio-scale then
u(x₁, …, xₙ) = α x₁^{c₁} ⋯ xₙ^{cₙ},
where the cᵢ’s are nondimensional constants. This is Bridgman’s result, albeit proved by Luce without assuming differentiability of u. If on the other hand some of the xᵢ’s are on a ratio-scale while others are on an interval-scale and y is on an interval-scale, then Luce proves u cannot exist except in the case where n = 1 and x₁ is on an interval-scale.
However, as noted by Aczél, Roberts and Rosenbaum (1986), the assumption in Equation (5.5) of the independence of the transformations seems unduly strong for many situations, and weakening that assumption expands the number of possibilities for the form of . Further work culminated in that of Paganoni (1987). While this work was for ’s in a general vector space, for simplicity we present it here in our context, where , . Let and be nonempty subsets of and a set of by real-valued matrices. Suppose that
1. for all and ;
2. the identity matrix is in and, for all and all , ;
3. if , then for all and all .
Suppose also that the function in Equation (5.4) satisfies
for all , and for some positive-valued function and real-valued function . Paganoni then determines the possible forms of and .
6 Statistical invariance
Having covered some important issues at the foundations of modelling in previous sections, we now turn to the modelling itself. It usually starts with a study question on the relationship among a set of specified, observable or measurable attributes of members of a population, . A random sample of its members is to be collected to address the study question.
A fundamental principle (Principle 1) for constructing a model for natural phenomena, which is embraced by Buckingham’s Pi-theory, asserts that the model cannot depend on the scales and consequent units in which the attributes are to be measured. That principle can be extended to cover other features deemed to be irrelevant.
Principle 2 for constructing a model calls for judicious attribute choices and transformations to reduce the sample size needed to fit the model. The specified attributes could be design variables selected in advance of sampling to maximize the value of the study. A notable example comes from computationally expensive computer simulators. These are run at a selected set of input attributes to develop computationally cheap emulators. This in turn leads to a need to reduce the number of inputs in the predictive model that would need to be fitted empirically. A classical example is given in Appendix B where Buckingham’s theorem leads to a model with just a single predictand and a single predictor, the latter being derived from the original set of five predictors.
Finally, we dichotomize general approaches to modelling. Approach 1, i.e. scientific modelling, leads to what Meinsma (2019) calls “physical models.” The models are generally deterministic with attributes measured on a ratio-scale. The in Equation (5.2) is commonly known at least up to unknown parameters (e.g., Example 1), before any sampling is done.
Approach 2 leads to a second type of model commonly seen in the social and medical sciences. There, we have a sample of attribute-vectors, each of dimension to which the invariance principle is applied. And that application can lead to a nondimensionalization of the data with a consequent reduction in the number of attributes, all based on the aggregated sample of attribute-vectors. But beyond eliminating irrelevant units of measurement, applying the principle can eliminate other irrelevant features of the data, such as angle of rotation. In our approach to be described, the entire sample is holistically incorporated into model development and implementation. Now a single maximal invariant is used to summarize the sample.
In keeping with the goal of generalizing Buckingham’s theory, our approach will focus on the construction of a predictive distribution model. Model uncertainties can then be characterized through such things as conditional variances and residual analysis. Furthermore principled empirical assessments of the validity of can be made given the replicate samples.
Scales play a prominent role in modelling as well. So for categorical attributes, e.g. R = red, Y = yellow, G = green, the model should be invariant under permutations of the code by which the attributes are recorded. Models with ordinal attributes, e.g. small, medium, large, should be invariant under positive monotone transformations. But, as noted in Section 1, this paper will focus mainly on ratio-scales and interval-scales. For all scales, and both approaches to modelling, transformation groups, to which we now turn, play a key role.
6.1 Transformation groups
This subsection reviews the theory of transformation groups and the statistical invariance principle, a topic that has a rich history (Eaton, 1983). These are needed for extending the Buckingham Pi-theory. That need is recognized by Shen and Lin (2019), although their applications concern ratio-scales and physical models. To introduce these groups, for simplicity, in both this section and the next, we will focus on how groups transform the sample space. Later, in Sections 6.3 and 6.5, we will use the same concepts for the full general theory of statistical invariance and generalized statistical invariance.
Each has a vector of measurable attributes :
A sample of ’s is to be drawn according to a probability distribution on , with inducing a probability distribution on . Buckingham’s theory (see Subsection 5.1) aims at relating ’s attributes through a model like that in Equation (5.4). Our extension of that theory below will be stochastic in nature and assign the special role of predictand.
A sample of size yields a sample of observations represented by . The statistical invariance principle posits that randomized statistical decision rules that determine actions should be invariant under transformations by members of an algebraic group of transformations . That is, any pair of points are considered equivalent for statistical inference if and only if for some . This equivalence is denoted . By definition, the equivalence classes formed by are disjoint and exhaustive so we can index them by a parameter and let , represent an equivalence class. The , are referred to as orbits, which could be indexed by a set of points, . If the set of points satisfies some regularity conditions, then it is called a cross-section, denoted , and its existence is studied by Wijsman (1967). Assuming a cross-section does exist, we may write
In other words, any point is represented by for appropriately chosen and .
The statistical invariance principle states that a statistical decision rule must be invariant, that is, the rule must take the same value for all points in a single orbit . Maximal invariant functions play a special role in statistics. The function is invariant if its value is constant on each orbit. Further, is a maximal invariant if it takes different values on each orbit.
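As a minimal numerical illustration of these definitions (a toy example of our own, with an assumed common-rescaling group, not taken from the paper): for positive data acted on by x ↦ cx with c > 0, the vector of ratios to the last coordinate is constant on orbits, and it is maximal because equality of those ratios forces two points to differ only by such a rescaling.

```python
import numpy as np

def maximal_invariant(x):
    """Ratios to the last coordinate: constant on orbits of x -> c*x, c > 0."""
    x = np.asarray(x, dtype=float)
    return x[:-1] / x[-1]

rng = np.random.default_rng(0)
x = rng.uniform(1.0, 5.0, size=4)
c = 3.7                                    # an arbitrary change of scale
assert np.allclose(maximal_invariant(x), maximal_invariant(c * x))   # invariance

y = 0.25 * x                               # a point on the same orbit as x
assert np.allclose(maximal_invariant(x), maximal_invariant(y))       # same value on the orbit
```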
The following example shows the statistical invariance principle in action.
Example 11.
A hard-to-make, short-lived product has an exponentially distributed random time to failure. A process-capability-analysis led to a published value of for that product’s average time-to-failure. The need to assure that standard is valid has led to a periodic sample of size resulting in a sample vector . To make inference about , the expected value of , following Remark 1, the analyst relies on the relative likelihood, i.e., ignoring irrelevant quantities
(6.1)
where in general for any quantity with the same units as , is unitless. Differentiating the relative likelihood in Equation (6.1) yields the maximum likelihood estimate (MLE)
Using the relative likelihood thus leads to any change in relative to the published value being expressed by their ratio, as mandated by their lying on a ratio-scale. The same is true of the relative change estimated by the MLE.
The group transforms any realization as follows:
As a maximal invariant i.e. -function we may take,
where . The range of in is given by
Points in index the orbits of the group . To locate a point on the orbit, it entails taking and applying the transformation , to . Thus, the orbits created by are rays in the positive quadrant, emanating from, but not including, the point . Thus is the union of these rays. Finally, we may let .
, as a function, plays a key role in developing the randomized (if necessary) statistical procedures that are invariant under transformations of . For example
where and . But better choices of may be dictated by the manufacturer’s loss function.
Note that here , , and are all unitless.
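A small numerical sketch of this example (the published mean, sample size and data below are hypothetical; the exponential relative likelihood and its maximizer are standard):

```python
import numpy as np

mu0 = 2.0                                   # published mean time-to-failure (hypothetical units)
rng = np.random.default_rng(1)
x = rng.exponential(scale=2.4, size=25)     # periodic sample, same units as mu0

theta_hat = x.mean()                        # MLE of the exponential mean

def relative_likelihood(theta):
    """L(theta) / L(theta_hat) for an exponential sample."""
    n, s = x.size, x.sum()
    return (theta_hat / theta) ** n * np.exp(s / theta_hat - s / theta)

# Any change relative to the published value is expressed as a unitless ratio.
print("theta_hat / mu0 =", theta_hat / mu0)
print("relative likelihood at mu0:", relative_likelihood(mu0))
```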
Invariance of statistical procedures under the action of transformation groups may be a necessity of modelling. For instance, consider the extension of Newton’s second law (Example 1) to the case of vector fields where velocity replaces speed and direction now plays a role. The statistical model for this extension may need to be invariant under changes of direction. In other cases, invariance may be required under permutations and monotone transformations. So in summary transformation groups may play an important role in both scientific and statistical modelling.
6.2 Nondimensionalization
This section presents a novel feature of this paper, the need for dimensional consistency combined with the nondimensionalization principle, that no model should depend on the units in which the data have been measured. Of particular note is the comparison of the strict application of Buckingham’s -theory as described in Shen and Lin (2019) (Approach 1) and the one we are proposing (Approach 2). The comparison is best described in terms of a hypothetical example.
Example 12.
In a study of the magnitude of rainfall, the primary (repeating) variables are and , denoting the depth of the rain collected in a standardized cylinder and the duration of the rainfall, respectively. The third quantity represents the magnitude of the rainfall as measured by an electronic sensor that computes a weighted average of as a process over the continuous time period ending at time . The dimensions of the three measurable quantities are , and , which is secondary. Thus the attribute-vector is the column vector
The attribute-space is all possible values of .
The scales and units of measurement are selected by the investigators. These could be changed by an arbitrary linear transformation
(6.2)
But the dimension of is related to those of (), and (). This relationship must be taken into account when the scales of these dimensions are selected with their associated units of measurement.
To begin, an experiment is to be performed twice and all three attributes measured each time. The result will be a dimensional matrix
Thus, the sample space will be the set of all possible realizations of .
Now a predictive model is to be constructed in accordance with Buckingham’s desideratum that the model should not depend on the measurement system the experimenters happen to choose. Furthermore, dimensional consistency dictates that any changes in the measurements must be consistently applied to all the attributes. More precisely the scales of measurement would require that in the transformation matrix of Equation (6.2).
Approach 1 focuses on , not , and is based on considering length and time as fundamental quantities; the primary attributes are and , with respective dimensions and . The predictand, , must be nondimensionalized as a Buckingham -function. Thus, we get
for each one of the two sampled vectors. The primary variables are labelled . We define the nondimensionalized attribute vector as
In other words, in Buckingham’s theory, the function that expresses the relationship among the variables for each of the two samples is
But the right hand side of this equation is a constant, which is not unreasonable since Buckingham’s model was intended to be deterministic. To deal with that issue we might adopt the ad hoc solution proposed by Shen in a different example (Shen, 2015, p. 17) by introducing a further variable, namely a model error :
(6.3)
Taking logarithms and fitting the resulting model yields an estimate for .
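A sketch of this fitting step with synthetic dimensionless quantities (the Π values, the exponent and the error level below are hypothetical): on the log scale the power-law relation becomes a linear regression. Note that the logarithms here are applied only to unitless Π quantities, consistent with the concerns about transcendental functions discussed elsewhere in the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
pi2 = rng.uniform(0.5, 5.0, size=50)                               # dimensionless predictor values
beta_true = 0.8
pi1 = pi2 ** beta_true * np.exp(rng.normal(0.0, 0.05, size=50))    # Pi1 = Pi2^beta * model error

# log Pi1 = beta * log Pi2 + log(error): ordinary least squares on the logs
beta_hat, intercept = np.polyfit(np.log(pi2), np.log(pi1), deg=1)
print("estimated exponent:", round(beta_hat, 3))
```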
In predictive form, for a future , without an electronic sensor for measuring rainfall, Equation (6.3) yields, after estimating ,
where and are the depth and duration measurements. On the other hand, there are technical advantages to ignoring units of measurement as is commonly done in developing and validating statistical models, as noted by the anonymous reviewer quoted in Section 1. In that case we would obtain
Remark 4.
A more formal approach would include the model error in a nondimensional form so that . Going through the steps above yields
In contrast to Approach 1, our approach, Approach 2, treats the sample holistically, considering the whole data matrix , not just . We nondimensionalize the problem by choosing as the primary variables and , , although other choices are available. Let . We then form the -functions
Then for each of the two samples we obtain
In predictive form this result becomes
Suppose we take
for some positive . Then
From the last result we obtain the model of Shen and Lin (2019)
However, the final choice for could be dictated by an analysis of the data, an advantage of our holistic approach.
Finally we summarize our choice of -functions as a maximal invariant
Remark 5.
This example shows that Approach 2 can yield the same model as Approach 1 even though Approach 1 is designed for a single dimensional attribute vector unlike Approach 2, which starts with the entire sample matrix (with ). This phenomenon will be investigated in future work.
Following Example 11, we can formalize the creation of orbits, -functions and so on in terms of a transformation group acting on the attribute-vector . A subgroup is obtained by restricting .
We are now prepared to move to the general case and a generalization of the concepts seen in this example.
6.3 Invariant statistical models
This subsection builds on Subsections 6.1 and 6.2 to obtain a generalized version of Buckingham’s Pi-theory. This means transforming the scales of each of the so-called primary attributes , which leads ineluctably to transforming the scales of the remaining, secondary attributes . Models like that in Equation (5.2) must reflect that link. In this subsection, we extend Buckingham’s idea beyond changes of scale by considering the application of a transformation of the attribute-scales. Our approach assumes a sample of attribute vectors. When prediction is the ultimate goal of inference, as in Example (1), our inferential aim is to construct a model as expressed in Equation (5.4).
Sample space.
We consider a sample of possibly dependent attribute-vectors collected from a sample of ’s from the population . The sample matrix, , is partitioned to reflect the primary and secondary attributes as follows.
(6.4)
Let denote all possible values of , . We define a group of transformations on through the following theorem. Each transformation is first defined on , with an extension to that yields unit consistency.
Theorem 2.
Let be a group of transformations on with identity element . Assume the following.
1. There exists a function defined on so that is always unitless.
2. For each , there exists a with for all and .
3. For all , .
4. For all , .
Let be the set of all transformations from to of the form
Then is a group under composition.
Proof.
To show that is closed under composition, let and , both in , be associated with, respectively, and , both in . Then
by Assumption 4. So is associated with . We easily see that , and so is in . Clearly, the identity element of is given by , which equals by the definition of and by Assumption 3. The inverse of is easily found: if , then . ∎
Illustrating the Theorem via Example 11, we have
has members
One choice for the function is
For each , we see that . We also see that . The set consists of transformations of the form
We easily see that . Therefore, is a group.
Thus, by the Theorem, given a group of transformations on the primary attributes in the sample, we can construct a group of transformations on all attributes and we can write
Orbits will be indexed by and will denote a maximal invariant under the action of . Let
Therefore, by the statistical invariance principle, acceptable randomized decision rules, which include equivariant estimators as a special case, depend on only through the maximal invariant. We obtain the Buckingham -functions as a special case where, in particular, the attributes are assessed on ratio-scales. Note that the -functions obtained in this way are not unique.
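The construction in the Theorem can be checked numerically in a small ratio-scale special case (the attribute layout and the dimensional exponents below are our own assumptions): two primary attributes are rescaled freely, unit consistency forces the secondary attribute to be rescaled by the corresponding monomial in the two scale factors, and closure and inverses then hold as the proof asserts.

```python
import numpy as np

p, q = 1.0, -1.0          # assumed dimensional exponents: secondary ~ primary1^p * primary2^q

def act(scales, x):
    """Scale the two primary attributes; the secondary attribute is rescaled consistently."""
    a, b = scales
    x1, x2, y = x
    return np.array([a * x1, b * x2, (a ** p) * (b ** q) * y])

def compose(s, t):
    """Composition of two changes of the primary scales."""
    return (s[0] * t[0], s[1] * t[1])

x = np.array([2.0, 3.0, 2.0 / 3.0])     # toy attribute vector (secondary = x1^p * x2^q)
g, h = (1.5, 0.25), (2.0, 4.0)

# Closure: acting by g then h equals acting by their composition.
assert np.allclose(act(h, act(g, x)), act(compose(h, g), x))
# Inverse: the reciprocal scales undo g.
g_inv = (1.0 / g[0], 1.0 / g[1])
assert np.allclose(act(g_inv, act(g, x)), x)
```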
The maximal invariant’s distribution.
Suppose that the distribution of is in the collection of probability distributions, . Assume, for all , the distribution of is also contained in . More precisely assume that for each , there is a one-to-one transformation of onto such that has distribution if and only if has distribution . Assume further that the set of all is a transformation group under composition, with identity . Assume also that is homomorphic to , i.e. that there exists a one-to-one mapping from onto such that, for all , ; , and .
Let denote the set inverse, that is, : . Then since for any , for all and ,
Thus, any “connected to” via some induces the same distribution on . This implies that is invariant under transformations in and hence that depends on only through a maximal invariant on . We denote that maximal invariant by . Finally, we relabel the distribution of under (and under all of the associated ’s) by .
The actions of the group have nondimensionalized as . Thus we obtain a stochastic version of the Pi-theorem. More precisely using the general notation to represent “the distribution of” for any random object we have a nondimensionalized conditional distribution of the nondimensionalized predictand from sample given the transformed predictors of all samples as
(6.5)
More specifically, we have derived the result seen in Equation (6.5), which is the conditional distribution assumed in a special case by Shen (2015) in his Assumption 2. Furthermore we predict for by its conditional expectation, using the distribution in equation (6.5), which can be derived once the joint distribution of the attributes has been specified. The conditional variance would express the predictor’s uncertainty. Hence statistical invariance implies that information about the variables can be summarized by maximal invariants in the sample space and in the parameter space.
6.4 Interval-scales
Returning to Equation (5.2), recall that underlying the Buckingham Pi-theorem are variables that together describe a natural phenomenon through the relationship expressed in that equation. The Pi-theorem assumes that of these variables are designated as the repeating or primary variables, while the remainder, which are secondary, have scales of measurement that involve the dimensions of the primary variables. It is the latter that are converted to the -functions in the Buckingham theorem. But as we have seen in Subsection 6.3, it is these same variables that together yield the maximal invariant under the actions of a suitably chosen group, which was fairly easily identified in the case of ratio-scales.
Subsection 6.3 provides the bridge between the statistical invariance principle and the deterministic modeling theories described in Section 5 (i.e., the deterministic modeling frameworks developed in the physical sciences where ratio–scales are appropriate). Appendix C develops a similar bridge with such models in the social sciences where quantities on interval-scales are involved. For such quantities, allowable transformations extend from simple scale transformations to affine transformations. Examples of such quantities can be found in Kovera (2010): the Intelligence Quotient (IQ), Scholastic Assessment Test (SAT), Graduate Record Examination (GRE), Graduate Management Admission Test (GMAT), and Miller Analogies Test (MAT). Models for such quantities might involve variables measured on a ratio–scale as well. Since much of the development parallels that in Subsection 6.3, we omit a lot of the details for brevity and those that we do provide are in Appendix C.
6.5 Extending invariance to random effects and the Bayesian paradigm
This section extends the previous sections to incorporate random effects and the Bayesian paradigm. Its foundations lie in statistical decision theory as sketched in Appendix D. Here the action that is a component of decision theory is prediction based on a prediction model as in Equation (5.4). A training sample of attribute vectors of length provides data for building the prediction model. Thus the predictors and predictand are observed for each of sampled ’s to yield the random sample’s matrix seen in Equation (6.4), which we denote . Given a future (st attribute -vector , the goal is the prediction of its th component, , based on observations of its first components , all within a Bayesian framework with an appropriate extension of the framework presented in earlier sections. The situation is the one confronting the analyst who must fit a regression model based on data points and then predict a response given only the future predictors. We let denote the current data matrix and the future data vector , with , a matrix in .
The sampling distribution of is determined conditional on the random parameters . That means specifying ’s prior distribution, which in turn is conditional on the set of (specified) hyperparameters .
To extend the invariance principle requires, in addition to the structures described above, an action space, , that is the space of possible future predictions of the missing observation, a prior distribution on the parameter space and a loss function, which remain to be specified. We also require the specified transformation groups for and , in addition to the transformation groups for and . In summary, we have the homomorphically related transformation groups and acting on, respectively, and . The extended invariance principle then reduces points in these four spaces to their maximal invariants i.e. -functions, that can be used to index the orbits induced by their respective groups. Assuming a convex loss function, the Bayes predictor in this reduced problem is a nonrandomized decision rule leading to an action in . Each of the spaces can (subject to regularity conditions) be represented in the form
for the appropriate transformation group (Zidek, 1969). The corresponding maximal invariants can be expressed as matrices:
Finally using square brackets to represent the distributions involved, we get the predictive distribution of interest conditional on quantities we know:
(6.6)
To fix ideas, we sketch an application in the following example, where we take advantage of the sufficiency and ancillarity principles to simplify the construction.
Example 13.
Assume the vector of observable attributes, , is normally distributed, conditional on the mean and covariance matrix . We will sometimes parameterize in terms of the diagonal matrix of standard deviations and the correlation matrix , with . Therefore, the parameters are and, conditional on , . In practice, , the fifth of these observable attributes is difficult to assess, leading to the idea of making a predictand and the remaining four attributes predictors. All attributes lie on an interval-scale, so a conventional approach would seem to be straightforward: multivariate regression analysis. Simply collect a training sample of independent vectors and fit a regression model for the intended purpose
Complications arise due to the varying dimensions on which these attributes are to be measured. That in turn leads to different scales and different units of measurement, depending on how they are to be measured. That would not pose a problem for the unconscious statistician who might simply ignore the units. A better approach would be that suggested by Faraway (2015, p. 103), namely to rescale the measurements in a thoughtful way to eliminate those units (see also Section 7). However, neither of those approaches deals with the rigid structural issue imposed by the need for dimensional consistency. That is, the units of measurement for the ’s are respectively , with and constrained to be and , respectively. To overcome the problem, Buckingham’s Pi-theorem suggests itself. Thus we might use as primary variables to nondimensionalize . But that does not work either since our variables lie on interval-scales with ’s as conceptually possible values in the appropriate units. That is, -functions, as simple ratios of these variables, cannot be constructed directly to nondimensionalize the attribute measurements. So ultimately we turn to the statistical invariance principle to solve the problem. The relevant transformation groups are described in what follows.
The first step creates the training set of random vectors of attribute measurements, recorded in the matrix . Letting denote the th column of a realization of , and , the sample sum of squares, the likelihood function is
Conditional on and , we may invoke the sufficiency principle and replace the training matrix with its sufficient statistics
(6.7)
i.e., the matrix whose first column consists of the sample (row) means of and the last five columns contain . Thus we may estimate the covariance by , factored as
Here denotes the diagonal matrix of estimates of the population standard deviations of the five attributes. Furthermore, denotes the estimate of the matrix of correlations between the random attributes. It is invariant under changes of scale and shifts of the origins. Moreover, these quantities would be independent given the parameters.
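A small numerical sketch of this sufficiency reduction (synthetic data with five attributes; the units are ignored here since only the algebra is being illustrated, and the n − 1 divisor is our own choice):

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 30, 5
X = rng.normal(size=(n, k)) @ np.diag([1.0, 2.0, 0.5, 3.0, 1.5]) + 10.0  # synthetic sample

xbar = X.mean(axis=0)                          # sample means
S = (X - xbar).T @ (X - xbar)                  # sample sum-of-squares matrix
Sigma_hat = S / (n - 1)                        # covariance estimate

D_hat = np.diag(np.sqrt(np.diag(Sigma_hat)))   # estimated standard deviations
R_hat = np.linalg.inv(D_hat) @ Sigma_hat @ np.linalg.inv(D_hat)   # correlation estimate

# R_hat is unchanged by rescaling and shifting the attributes, i.e. it is unit-free.
X2 = X * np.array([10.0, 0.1, 1.0, 2.0, 5.0]) + 7.0
S2 = (X2 - X2.mean(axis=0)).T @ (X2 - X2.mean(axis=0))
Sigma2 = S2 / (n - 1)
D2 = np.diag(np.sqrt(np.diag(Sigma2)))
assert np.allclose(R_hat, np.linalg.inv(D2) @ Sigma2 @ np.linalg.inv(D2))
```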
Turning to the Bayesian layer, we will adopt a conjugate prior (Gelman et al., 2014) for the illustrative purpose of this example, with hyperparameters :
(6.8)
(6.9)
We specify the hyperparameters by equating prior knowledge with a hypothetical sample of ’s and their associated attribute vectors. We will add a superscript on quantities below to indicate their hypothetical nature. Thus the hypothetical sample is of size with a likelihood derived from a prior sample with matrix , with sample mean and sample sum of squares . We thus obtain the hypothetical likelihood for and given the independence of and :
Finally, we complement the hypothetical likelihood with a noninformative improper prior on with density where and obtain the specification of the prior distribution. We take the hyperparameters for the prior distributions in Equations (6.8) and (6.9) to get , and . That completes the construction of the prior.
For our prediction problem involving a future , we will use the posterior distribution of and based on the training data via the sufficient statistics . Obtaining the posterior distributions for and entails taking the product of the prior density determined above from the hypothetical sample with the actual likelihood.
This determines a posterior density of . Thus the covariance matrix associated with the posterior distribution of would depend on both the sample’s sum of squares and the hypothetical sample’s sum of squares matrix as
In other words, would replace to get us from its inverted Wishart prior distribution of to its posterior distribution. Moreover, its degrees of freedom would move to to reflect the larger sample size. We omit further details since our primary interest lies in the reduced model obtained by applying the invariance principle, a reduction to which we now turn.
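For orientation only, the standard conjugate update has the following textbook form (the notation here is ours and need not match the paper's exact parameterization; in particular we include the mean-adjustment term that a full normal-inverted-Wishart update carries): with prior \(\Sigma \sim \mathrm{IW}(\Psi^{*}, \nu^{*})\) and \(\mu \mid \Sigma \sim \mathrm{N}(m^{*}, \Sigma/\kappa^{*})\), a sample of size \(n\) with mean \(\bar{x}\) and sum-of-squares matrix \(S\) yields
\[
\Sigma \mid \text{data} \;\sim\; \mathrm{IW}\!\left(\Psi^{*} + S + \frac{\kappa^{*} n}{\kappa^{*}+n}\,(\bar{x}-m^{*})(\bar{x}-m^{*})^{\top},\; \nu^{*}+n\right),
\]
so that the prior sum of squares is augmented by the sample's and the degrees of freedom grow from \(\nu^{*}\) to \(\nu^{*}+n\), as described above.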
We now turn to the construction of the transformation groups needed to implement the invariance principle. All transformations of data derive from transforming an attribute vector as follows:
for any possible realization of , where the coordinates of have the same units as the corresponding coordinates of , where and to ensure dimensional consistency. The same and are used to transform all data vectors, and so the transformation of the sufficiency-reduced matrix in Equation (6.7) is as follows:
(6.10)
We can study orbits and maximal invariants by considering the transformation of separately from the transformation of .
Consider the decomposition of into orbits. To index those orbits we first determine a maximal invariant. We do this in two ways, to make a point: first, we ignore dimensional consistency and then we include it. To begin, for both ways we set to transform to , where denotes the vector of units of . This means that all the points in an orbit can be reached from the new origin by choosing to be the sample average. Next we observe that we may estimate the population covariance for the attributes vector by . So the matrix in Equation (6.10), in effect, acts on the diagonal matrix . If no restrictions were placed on ’s fourth and fifth diagonal elements, then the orbits could simply be indexed by . In short the maximal invariant could be defined by . But dimensional consistency does not allow that choice of . Instead it demands the structural requirement that we use the and specified above. The modified transformation would then act on as follows:
The result would mean the changes in and would be cancelled out by the changes in the transformations of the first three ’s. The maximal invariant would then be dimensionless as required. It would make the maximal invariant for the sufficiency-reduced sample space
(6.11)
not .
Remark 6.
The Buckingham theory concerned attributes measured on a ratio-scale. Were that the case in this example, we could have used the primary and secondary attributes differently. More precisely we could have let , and . The result would eliminate the first three variables while achieving the primary objective of non-dimensionalizing the model. We have achieved this goal using the standard deviations instead. But the method suggested here could also be used for scales other than interval, a subject of current research.
The corresponding maximal invariant in the parameter space for would be identical to that in Equation (6.11), albeit with the hats removed, to get
Observe that the ratios of the ’s with and without hats would be unitless and hence ancillary quantities, thus independent of the sufficient statistics. Hence the maximal invariants can be constructed from them by Basu’s theorem (Basu, 1958). Finally for the hyperparameter space we would obtain the analogous result:
We can now compute the posterior distribution
but we skip the details for brevity.
We now come to the principal objective of this example, namely a model for predicting a future but as yet unobserved value of the predictand, , based on the future covariate vector . The data in is used as the sufficiency-reduced training sample. As well we assume that, given the parameters of the sampling distribution, a future attribute vector is normal with mean and covariance matrix , and is conditionally independent of , given and .
In conformity with the modelling above, which through application of the invariance principle led to the -functions required to nondimensionalize the problem, we transform the predictand using statistics computed from the data in : . Furthermore normalized in this way, the predictand becomes invariant of those population parameters. But one more step is necessary to ensure that we have nondimensionalized the predictand in its -function, namely to align the dimensions of the predictand with those in the predictors to ensure dimensional consistency. The result is
(6.12)
We would also need to convert the four predictors into their -functions, and that would be done as in Equation (6.12). The result will be .
That predictor is found using Equation (6.6), with modified notation. It is given by
7 Discussion
Its roots in mathematical statistics, along with its formalisms, made dimensional analysis (DA) seem unnecessary in statistical science. In fact, Shen and Lin (2019) seem to have written the first paper in statistical science to recognize the need to incorporate units. For example, the authors propose what they call “physical Lebesgue measure” that integrates the units of measurement into the usual Lebesgue measure. Yet application of Buckingham’s desideratum eliminates those units. Paradoxically, it does so by exploiting the units it eliminates. That is, it exploits the intrinsic structural relationships among those units that dictate how the model must be set up. This vital implicit connection is recognized in this paper and earlier, in other papers, in more specialized contexts (Shen, 2015; Shen and Lin, 2019).
Remark 7.
The linear model with a Gaussian stochastic structure implicitly assumes data are measured on an interval-scale. But for physical quantities on a ratio-scale, that model would at best be an approximation. Shen and Lin (2018) in their Section 3.3 argue in favor of using a power (meaning product rather than sum) model in this context. We give arguments in our Subsection 4.1 to show why additive linear models are inappropriate in this context other than as an approximation to a ratio in some sense.
That said, the simplest way of nondimensionalizing a model is by dividing each coordinate of by a known constant with the same units of measurement as the coordinate itself, thereby removing its units of measurement. Then in Buckingham’s notation, and where . This is in effect the approach used by Zhai et al. (2012a) and Zhai et al. (2012b) to resolve dimensional inconsistencies in models. It is also the approach generally implicit in regression analysis where e.g.
with being unitless and representing a combination of measurement and modelling error. The ’s play the key role of forcing the model to adhere to the principle of dimensional homogeneity when the ’s have different units of measurement. A preferable approach would be to nondimensionalize the ’s themselves in some meaningful way. For example if were the air pollution level at a specific site at a specific time, it might be divided by its long-term average over the population of sites and times. The relative sizes of the now dimensionless ’s are then readily interpretable – a large would mean the associated contributes a lot to the overall mean effect.
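A sketch of this suggested nondimensionalization (the pollutant values, units and long-term average below are hypothetical):

```python
import numpy as np

pollution = np.array([42.0, 55.0, 31.0, 60.0, 48.0])   # site-level means, in ug/m^3 (hypothetical)
long_term_average = 45.0                                # same units, over the population of sites and times

x_dimensionless = pollution / long_term_average         # unit-free and directly interpretable
print(np.round(x_dimensionless, 2))   # values above 1 indicate sites contributing above the overall mean
```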
Buckingham’s theory does not specify the partition of attributes into the primary and secondary categories as is needed when deriving Buckingham’s -functions. That topic is a current area of active research. Recently, two approaches have been proposed in the context of the optimal design of computer experiments. Arelis (2020) suggests using functional analysis of variance to choose the quantities that contribute most to the variation in the output of interest as the base quantities. Yang and Lin (2021), on the other hand, propose a criterion based on measurement errors and choose the quantities that best minimize the measurement errors.
That optimal design is an ideal application for the Buckingham theory. There, the true model, called a simulator, is a deterministic numerical computer model of some phenomenon. But it is computationally intensive. So a stochastic alternative called an emulator is fitted to a sample of outputs from the simulator at a judiciously selected set of input vectors called design points, although they represent covariates in the reality the simulator purports to represent. The Buckingham Pi-theorem simplifies the model by reducing the dimension of the input vector and hence the number of design points. It also simplifies the form of the emulator in the process. That kind of application is discussed in Shen (2015), Arelis (2020) and Adragni and Cook (2009).
Our approach to extending Buckingham’s work differs from that in Shen (2015). Shen restricts the quantities to lie on ratio-scales so he can base his theory directly on Buckingham’s Theorem. His starting point is the application of that theorem and the dimensionless - functions it generates. In contrast, our theory allows a fully general group of transformations and arbitrary scales. Like Buckingham, we designate certain dimensions such as length as primary (or fundamental) while the others are secondary. We require that a transformation of any primary scale must be made simultaneously to all scales involving that primary scale including secondary scales. That requirement ensures consistency of change across all the quantities and leads to our version of the -functions. However, that leaves open the issue of which variables to serve as the primary and which the secondary variables, a topic under active investigation.
The paper has explored the nature and possible application of DA with the aim of integrating physical and statistical modelling. The result has been an extension of the statistical invariance principle as a way of embracing the principles that lay behind Buckingham’s development of his Pi-theory. The result is a restriction on the class of allowable models and resulting optimal statistical procedures based on those models. How does the performance of these procedures compare with the general class of unrestricted procedures? Would a minimax or Bayesian procedure in the restricted class of allowable procedures have these same performance properties if they were thrown in with the entire set of decision rules? Under certain conditions, the answer is affirmative in the minimax case (Kiefer et al., 1957) and in the Bayesian case (Zidek, 1969).
8 Concluding remarks
This paper has given a comprehensive overview of DA and its importance in statistical modelling. Dimensions have long been known to lie at the foundations of deterministic modelling, with each dimension requiring the specification of a scale and each scale requiring the specification of units of measurement. Dimensions, their scales, and the associated units of measurement lie at the heart of empirical science.
However, statistical scientists regularly ignore their importance. We have demonstrated with the examples presented in Section 2 that ignoring scales and units of measurement can lead to results that are either wrong or meaningless. This points to the need for statistics education to incorporate some basic training on quantity calculus and the importance of scales, along with the impact at a fundamental level of transforming data. Statistics textbooks should reflect these needs. Going beyond training is the whole process of disseminating statistical research. There again the importance of these concepts should be recognized by authors and reviewers to ensure quality.
But does any of this really matter? We assert that it does. First we have described in Example 5, the genesis of this paper, an important example of dimensional inconsistency. An application of Buckingham’s theory showed that an additional quantity needs to be added to complete the model in the sense of Buckingham to make it nondimensionalizable (Wong and Zidek, 2018). The importance of this famous model, a model which was subsequently revised, lay in its use in assessing the reliability of lumber as quantified in terms of its return period, an important component in the calculation of quality standards for lumber when placed in service as a building material. Papers flowing from that discovery soon followed (Wong and Zidek, 2018; Yang, Zidek and Wong, 2018). The work reported in this paper led us to a deeper level than mere dimensional consistency, namely the discovery that the units impose important intrinsic structural links among the various quantities involved in the model. These links lead in Example 13 to a new version of transformation groups usually adopted in invariant Bayesian multivariate regression models. This new version requires use of a subgroup dictated by those links.
At a still deeper level, we are confronted by structural constraints imposed by the scales. For example, the artificial origin , where denotes the units of , in the interval-scale rules out use of Buckingham’s Pi-theory. Furthermore, it leads to a new calculus for ratio-scales, a topic under active investigation.
Further, we have shown that, surprisingly, not all functions are candidates for use in formulating relationships among attribute variables. Thus functions like are transcendental and hence inadmissible for that role. This eliminates from consideration in relationships not only the natural logarithm but also, for example, the hyperbolic trigonometric functions. This knowledge would be useful to statistical scientists in developing statistical models.
On the other hand the paper reveals an important deficiency of deterministic physical models of natural phenomena in their failure to reflect their uncertainty about these phenomena. An approach to doing so is presented in the paper along with an extension of the classical theory to incorporate the Bayesian approach. That approach to this union of the different frameworks is reached via the statistical invariance principle, yielding a generalization of the famous theories of Buckingham, Bridgman and Luce.
In summary, each of the two approaches to modelling, physical and statistical, has valuable aspects that can inform the other. This paper provides the groundwork for the unification of these approaches, setting the stage for future research.
Appendix A Validity of using when has units of measurement: The debate goes on.
Whether, as a transcendental function, the function may be applied to measurements with units of measurement has been much discussed in other scientific disciplines, and we now present some of that discussion for illustrative purposes. Molyneux (1991) points out that both affirmative and negative views had been expressed on this issue. He argues in favor of a compromise, namely defining the logarithm by exploiting one of its most fundamental properties as . He finds support for his proposal by noting that the derivative of the constant term, , would be zero. It follows that
To see this under his definition of the logarithm
where . Note that his definition of the derivative differs from ours, given in Equation (4.4); we include the units of in the denominator, as the derivative is taken with respect to , units and all. Furthermore, Molyneux (1991) argues that the proposal makes explicit the units that are sometimes hidden, pointing to the same example, Example 6, that we have used to make the point. It is unitless because the logarithm is applied to a count, not a measurement, that count being the number of SIUs. Molyneux (1991) gives other such examples. The proposal not only makes the units explicit, but on taking the antilog of the result, one recovers the original value of on the raw scale with the units correctly attached.
But, in a letter to the Journal Editor (Mills, 1995), Ian Mills quotes Molyneux (1991), in which Molyneux himself says that his proposal “has no meaning.” Furthermore, Mills says he is “inclined to agree with him.” Moreover, Mills argues, like Bridgman, that the issue is moot since in practice the logarithm is applied in the context of the difference of two logarithms, leading to , a unitless quantity. In the same issue of the journal, Molyneux publishes a lengthy rejoinder saying, amongst other things, that Mills misquoted him.
However, insofar as the authors of this paper are aware, Molyneux’s proposal was not accepted by the scientific community, leaving unresolved the issue of applying the natural logarithm to a dimensional quantity. In particular, Matta et al. (2010) also reject it in a totally different context. Mayumi and Giampietro (2010) pick up on this discussion in a recent paper regarding dimensional analysis in economics and the frequent application of logarithmic specifications. Their approach is based on Taylor expansion arguments that show that application of the logarithm to dimensional quantities is fallacious since in the expansion
(A.1)
the terms on the right hand side would then have different units of measurement.
Mayumi and Giampietro then go on to describe a number of findings that are erroneous due to the misapplication of the logarithm. They also cite a “famous controversy” between A.C. Pigou and Milton Friedman that according to the authors, revolved around dimensional homogeneity (Pigou, Friedman and Georgescu-Roegen, 1936; Arrow et al., 1961), although not specifically involving the logarithm. One of the findings criticized in Mayumi and Giampietro (2010) is subsequently defended by Chilarescu and Viasu (2012). But Mayumi and Giampietro (2012) publish a rejoinder in which they reassert the violation of the principle of dimensional homogeneity in that finding and declare that the claim in Chilarescu and Viasu (2012) “is completely wrong. So, contrary to Chilarescu and Viasu’s claim, log(V/ L) or log W in Arrow et al. (1961) can never be used as a scientific representation.”
Although agreeing with the conclusion that the logarithm cannot be applied to a dimensional , Matta et al. (2010) state that the Taylor expansion argument above, which the authors attribute to a Wikipedia article in September 2010, is fallacious. [The Wikipedia article actually misstates the Taylor expansion as
(A.2)
but that does not negate the thrust of their argument.] They argue that the Taylor expansion should be
(A.3)
so that if had units of measurement, they would cancel out. But then the authors don’t state that expansion for the logarithm. If they did, they would have had to deal with the issue of the units of , term, while the remainder of the expansion is unitless (see our comments on this issue in Subsection 4.4).
Baiocchi (2012) points out that if the claims of Mayumi and Giampietro (2010) were valid, they would make most “applications of statistics, economics, … unacceptable” for statistical inference based on the use of the exponential and logarithmic transformations. Baiocchi then tries a rescue operation by arguing that the views of Mayumi and Giampietro (2010) go “against well established theory and practice of many disciplines including …statistics,…and that it rests on an inadequate understanding of dimensional homogeneity and the nature of empirical modeling.” The paper invokes the dominant theory of measurement in the social sciences that the author claims makes a numerical statement meaningful if it is invariant under legitimate scale transformations of the underlying variables. That idea of meaningfulness can then be applied to the logarithmic transformation of dimensional quantities in some situations.
To explain this idea, Baiocchi first gives the following analysis involving quantities on the ratio-scale. Start with the model . Let us rescale as . That leads to the need to appropriately rescale say as . Consequently our model becomes or with . But then this model for cannot be reduced to its original form because of its second log term. Thus the model would be considered empirically meaningless.
On the other hand if were unique up to a power transformation we would get or with . Therefore the model would be invariant under admissible transformations and hence empirically meaningful. So the situation is more complex than the paper of Mayumi and Giampietro (2010) would suggest.
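A simplified numeric check of the flavor of this argument (a toy version of our own, rescaling only the predictor): under a pure change of units x ↦ kx the slope of the log-log fit is unchanged and only the intercept absorbs log k, whereas under a power transformation x ↦ x^c the slope rescales to b/c while the functional form is preserved.

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(1.0, 10.0, size=200)
y = 3.0 * x ** 0.7 * np.exp(rng.normal(0, 0.01, size=200))   # y ~ a * x^b with small noise

def fit_loglog(xv, yv):
    b, a = np.polyfit(np.log(xv), np.log(yv), 1)   # slope, intercept
    return a, b

a0, b0 = fit_loglog(x, y)
a1, b1 = fit_loglog(5.0 * x, y)       # rescale x by k = 5 (change of units)
a2, b2 = fit_loglog(x ** 2.0, y)      # power transformation x -> x^c with c = 2

print(np.isclose(b1, b0), np.isclose(a1, a0 - b0 * np.log(5.0)))  # slope unchanged, intercept shifts
print(np.isclose(b2, b0 / 2.0))                                   # slope rescales under the power transform
```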
Baiocchi (2012) also addresses other arguments given by Mayumi and Giampietro (2010). In particular, he is concerned with their Taylor expansion argument . They point out that for to make sense, the would have to have the same units as . They use the expansion to make the point that when has the same units as , the expansion is valid. However, this argument ignores the fact that in , has units, so it seems tenuous and therefore leaves doubt about the success in discrediting the arguments in Mayumi and Giampietro (2010). For brevity, we will terminate our review of Baiocchi (2012) on that note. It is a lengthy paper with further discussion of the Mayumi and Giampietro (2010) arguments and a very lengthy bibliography of relatively recent relevant work on this issue.
Appendix B Application of Buckingham’s theorem and the discovery of the Reynolds number
This section provides a well-known example from fluid dynamics.
Example 14.
The example is a model for fluid flow around a sphere for the calculation of the drag force . It turns out that the model depends only on something called the coefficient of drag and on a complicated, single dimensionless number called the Reynolds number that incorporates all the relevant quantities.
To begin, we list all the relevant quantities: the drag force (), velocity (), fluid density (), viscosity () and sphere diameter (). We thus have five quantities in the notation of Buckingham’s theorem. We first note that the dimensions of these five quantities can be expressed in terms of the three dimensions, length (), mass () and time (). We treat these as the three primary dimensions, and this tells us that we need at most two dimensionless functions to define for our model.
We first write down the dimensions of each of the five quantities in terms of , and :
(B.1)
We now proceed to sequentially eliminate the dimensions , and in all five equations. First we use to eliminate . The first four equations become
We next eliminate via , yielding
that is
To eliminate , we could use or or even, with a bit more work, . We use , yielding
that is
All the dimensions are now gone so we have nondimensionalized the problem and in the process found and as implied by Buckingham’s theorem:
Therefore, for some function ,
Remarkably we have also found the famous Reynolds number, (Friedmann, Gillis and Liron, 1968). The Reynolds number determines the coefficient of drag, , according to a fundamental law of fluid mechanics.
If we knew in equation (5.2) to begin with, we could track the series of transformations starting at (B.1) to find . If, however, we had no specified to begin with, we could use and to determine a model, that is, to find . For instance, we could carry out experiments, make measurements and determine from the data. In either case, we can use to determine the coefficient of drag from the Reynolds number and in turn calculate the drag force.
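A minimal numerical sketch (illustrative values of our own; the two dimensionless groups are written in one conventional form, Π1 = F/(ρv²D²) and Re = ρvD/μ, up to the constant factors that different texts attach to the drag coefficient, and Stokes' drag law is used only to supply a value of F at low Reynolds number):

```python
import numpy as np

rho = 1000.0    # fluid density, kg/m^3 (water, illustrative)
v   = 0.001     # flow speed, m/s
D   = 0.0005    # sphere diameter, m
mu  = 1.0e-3    # dynamic viscosity, Pa*s

Re = rho * v * D / mu                      # Reynolds number, dimensionless (here 0.5)
F_stokes = 3.0 * np.pi * mu * v * D        # Stokes drag for a sphere of diameter D, valid at small Re
pi1 = F_stokes / (rho * v ** 2 * D ** 2)   # nondimensionalized drag

print("Re  =", Re)
print("Pi1 =", pi1)                        # with Stokes drag, Pi1 = 3*pi/Re
assert np.isclose(pi1, 3.0 * np.pi / Re)
```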
Invoking the principle of invariance enables us to embed Example 14 in a stochastic framework using Approach 1 as follows.
Example 15 (Example 14, continued).
We continue to use the same notation. In this example, the random variable to be replicated in independent experiments is .
The sample space.
The creation of the transformation group and relevant subgroup follow the lines of Example 12. We choose as the primary dimensions. Then with the corresponding group of transformations is
For indexing the cross sections of we have the maximal invariant
(B.2)
where and . Let . To show that is a maximal invariant, first observe that is invariant since each term is dimensionless. Thus showing is a maximal invariant reduces to finding a subgroup element for which when . For ,
Similarly, we get that . Relating these results to Shen’s thesis, this is essentially his Lemma 5.5. However, Shen does not derive the maximal invariant; he simply uses the Pi quantities derived from Buckingham’s Pi-theorem as the maximal invariant. In contrast, for us the maximal invariant emerges purely as an artifact of the need for dimensional consistency as expressed through the application of the invariance principle.
Observe that all points in obtain from the cross section in Equation (B.2) by application of the appropriate element of the group of transformations. To see this let us first choose . Then we have
Next let and get
Finally choose , which yields
Inverting this transformation takes us from the cross section to the point x.
The sampling distribution.
The analysis above naturally suggests the transformation group and its cross section for the parameter space,
namely
where with and and characterizes the maximal invariant over the parameter space. It follows that for any ,
where
The statistical invariance implies that
(B.3)
for any . Notice that
Now by taking the partial derivatives with respect to the variables, we obtain
Since this must hold for any , we may choose . Then
Thus the joint PDF is proportional to
(B.4)
Hence the statistical invariance implies that information about the variables can be summarized by maximal invariants in the sample space and in the parameter space.
The sample.
Now suppose that independent experiments are performed and that they yield data . Further suppose, for this illustrative example, that under the model in Equation (B.3) and the resulting likelihood derived from Equation (B.4), the sufficiency principle implies that is a sufficient statistic. Then a maximal invariant for the transformation group is
To see this, observe that each term is dimensionless, so is certainly invariant. Now suppose . Then we need to show that there exists such that . These do exist and they are
We conclude our discussion of this example. Proceeding further would entail the specification of the sampling distribution and that in turn would depend on contextual details.
Appendix C Invariance models for interval-scales
This section develops the theory for the interval case, which parallels that for ratio-scales seen in Subsection 6.3.
The sample space
We first partition the response vector X as in Equation (6.4). These partitions correspond to the primary and secondary quantities as in the Buckingham theory, although that distinction was not, as far as we know, made in the Luce work and its successors. Of particular interest again is in the model of Equation (5.4). The first step in our construction entails a choice of the transformation group . That choice will depend on the dimensions involved. However, given that we are assuming in this subsection that quantities lie in an affine space, we will rely in the sequel on Paganoni (1987), as described in Subsection 5.3, for an illustration.
We begin with a setup more general than that of Paganoni (1987), one that would include, for example, the discrete seven-point Semantic Differential Scale (SDM). So we extend Equation (5.3) as follows
(C.1)
(C.2)
where now is the final coordinate of when . Note that in the univariate version of the model proposed by Paganoni (1987), Equation (5.3) has the vector replaced with . Here both the rescaling matrix and the translation vector depend on the pair and , i.e. and . Note that the ratio–scales are formally incorporated in this extension simply by setting to , the relevant coordinates of and .
Conditions are needed to ensure that is a transformation group. For definiteness we choose and where in general and . The objects and lie in the subspaces described in Subsection 5.3 while and lie in multidimensional rather than one dimensional spaces as before. We omit details for brevity.
Finally, we index the transformation group acting on x by and define the associated transformation by
(C.3)
It remains to show that in this case, is a transformation group and for this we need the conditions presented by Paganoni (1987).
Theorem 3.
The set of transformations defined by Equation (C.3)
is a transformation group acting on the sample space.
Proof. First we show that possesses an identity transformation. This is found simply by taking and and invoking the definitions of , and :
Next we show that the composition of two transformations indexed by and yields a transformation in . First we obtain where
(C.4)
(C.5)
Next we compute
But
which proves that the composition is an element of .
Finally we need to show that for any member of indexed by there exists an inverse. Starting with the transformed quantities in Equations (C.4) and (C.5), let . Then we find that
But
and
Thus the transformation indexed by is the inverse of that indexed by . That concludes the proof that is a transformation group.
We now proceed, as outlined in Subsection 6.3, to find the analogues of the Pi function in Buckingham’s theory, which in our extension of that theory are coordinates of the maximal invariant under the transformation group . To that end we seek that transformation for which i.e. for an appropriate and , where and . It follows that for a designated fixed origin in the range of . Dimensional consistency calls for the transformation of by the that complements the found in the previous paragraph, the one indexed by . If we invoke the invariance principle, we may thus transform to
where and is the maximal invariant. Certainly it is invariant. Now we need to show there exists such that when . So suppose . We claim that , and hence , where and .
Proof. Assume below that .
Thus the proof is complete.
Example 16.
The linear regression model is one of the most famous models in statistics: . Shen (2015) shows, using dimensional analysis, that this model is inappropriate when all the variables are on a ratio–scale. Instead, in that case, the right hand side should be the product of powers of Pi-functions of the coordinates of x. But this section shows how to handle the case where the variables are regarded as interval–valued. The Pi functions would then be combinations of the coordinates depending on the units of measurement of and those.
To be more specific, we begin by defining for every , a such that , where , and in general denotes the vector of dimension , all of whose elements are , representing generically the unit on the coordinate’s scale. For the regression example, the final coordinate in is . It then follows from the above analysis in the notation used there that where
and
is the nondimensionalized maximal invariant. The distribution of then determines the nondimensionalized regression model. We omit the details for brevity.
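As a purely illustrative contrast (the variables and data below are hypothetical and the construction only schematic), dimensionless regression variables can be formed in both of the cases discussed in this example: by ratios of products of powers on ratio–scales, and by ratios of differences on interval–scales.

import numpy as np

rng = np.random.default_rng(1)
y  = rng.gamma(2.0, 3.0, size=100)     # positive response on a ratio scale
x1 = rng.gamma(2.0, 2.0, size=100)     # positive covariates on ratio scales
x2 = rng.gamma(3.0, 1.0, size=100)

# Ratio scale: Pi groups are products of powers of the quantities, e.g. y/x1 and
# x2/x1; a regression is then fit between these dimensionless quantities.
pi_y, pi_x = y / x1, x2 / x1

# Interval scale: only differences are meaningful, so dimensionless quantities are
# ratios of differences relative to a reference pair of observations.
t = rng.normal(20.0, 5.0, size=100)    # e.g. a covariate recorded in degrees Celsius
pi_t = (t - t[0]) / (t[1] - t[0])      # unchanged under t -> a*t + b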
Appendix D Foundations of statistical modelling
After a sample is selected, a statistical inquiry is expected to lead to a decision chosen from an action space . That action may be chosen by a randomized decision rule, i.e. a probability distribution for events . A nonrandomized rule would then correspond to a degenerate probability distribution for .
The decision would be based on the loss function , or rather on the expected loss, called the risk function
The minimax criterion is commonly used to determine the optimal decision rule as
An alternative is the Bayes rule: given a prior distribution for , the Bayes decision rule is found by minimizing
That prior distribution will usually be indexed by a hyperparameter vector lying in a hyperparameter space .
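A toy numerical illustration (a hypothetical example, restricted to the no-data case in which a nonrandomized rule is simply a choice of action) shows how the minimax and Bayes criteria can select different rules from the same risk function.

import numpy as np

# loss[theta, action]: loss from taking `action` when the true parameter is `theta`
loss = np.array([[0.0, 2.0, 4.0],
                 [3.0, 1.0, 0.0]])

# With no data, each nonrandomized rule picks one action, so its risk function is
# the corresponding column of the loss matrix, viewed as a function of theta.
risk = loss

minimax_rule = int(np.argmin(risk.max(axis=0)))   # smallest worst-case risk: action 1

prior = np.array([0.7, 0.3])                      # prior over the two parameter values
bayes_rule = int(np.argmin(prior @ risk))         # smallest prior-averaged risk: action 0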
Acknowledgements
We are indebted to Professor George Bluman of the Department of Mathematics at the University of British Columbia (UBC) for his helpful discussions on dimensional analysis. Our gratitude goes to Professor Douw Steyn of the Earth and Ocean Sciences Department, also at UBC, for introducing the second author to the Unconscious Statistician. We also thank Yongliang (Vincent) Zhai, a former Master's student of the last two authors of this paper, for contributing to Example 4 and for the work in his Master's thesis that inspired much of this research. Finally, we acknowledge the key role played by the Forest Products Stochastic Modelling Group centered at UBC and funded by a combination of FPInnovations and the Natural Sciences and Engineering Research Council of Canada through a Collaborative Research and Development Grant. It was the work of that Group that sparked our interest in the problems addressed in this paper. The research of the last two authors was also supported via the Discovery Grant program of the Natural Sciences and Engineering Research Council of Canada.
References
- Aczél, J., Roberts, F. S. and Rosenbaum, Z. (1986). On scientific laws without dimensional constants. Journal of Mathematical Analysis and Applications 119, 389–416.
- Adragni, K. P. and Cook, R. D. (2009). Sufficient dimension reduction and prediction in regression. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 367, 4385–4405.
- Albrecht, M. C., Nachtsheim, C. J., Albrecht, T. A. and Cook, R. D. (2013). Experimental design for engineering dimensional analysis. Technometrics 55, 257–270.
- Rodríguez Arelis, A. G. (2020). How to improve prediction accuracy in the analysis of computer experiments, exploitation of low-order effects and dimensional analysis. PhD thesis, University of British Columbia.
- Arrow, K. J., Chenery, H. B., Minhas, B. S. and Solow, R. M. (1961). Capital-labor substitution and economic efficiency. The Review of Economics and Statistics, 225–250.
- Baiocchi, G. (2012). On dimensions of ecological economics. Ecological Economics 75, 1–9.
- Basu, D. (1958). On statistics independent of sufficient statistics. Sankhyā: The Indian Journal of Statistics, 223–226.
- Bluman, G. W. and Cole, J. D. (1974). Similarity Methods for Differential Equations. Applied Mathematical Sciences. Springer-Verlag.
- Box, G. E. P. and Cox, D. R. (1964). An analysis of transformations. Journal of the Royal Statistical Society, Series B (Methodological) 26, 211–252.
- Bridgman, P. W. (1931). Dimensional Analysis, Revised Edition. Yale University Press, New Haven.
- Buckingham, E. (1914). On physically similar systems; illustrations of the use of dimensional equations. Physical Review 4, 345–376.
- Chilarescu, C. and Viasu, I. (2012). Dimensions and logarithmic function in economics: A comment. Ecological Economics 75, 10–11.
- Cohen, A. J., Anderson, H. R., Ostro, B., Pandey, K. D., Krzyzanowski, M., Künzli, N., Gutschmidt, K., Pope III, C. A., Romieu, I., Samet, J. M. et al. (2004). Urban air pollution. Comparative Quantification of Health Risks: Global and Regional Burden of Disease Attributable to Selected Major Risk Factors 2, 1353–1433.
- De Oliveira, V., Kedem, B. and Short, D. A. (1997). Bayesian prediction of transformed Gaussian random fields. Journal of the American Statistical Association 92, 1422–1433.
- Dou, Y. P., Le, N. D. and Zidek, J. V. (2007). A dynamic linear model for hourly ozone concentrations. Technical Report No. 228, Statistics Department, University of British Columbia.
- Draper, N. R. and Cox, D. R. (1969). On distributions and their transformation to normality. Journal of the Royal Statistical Society, Series B (Methodological) 31, 472–476.
- Eaton, M. L. (1983). Multivariate Statistics: a Vector Space Approach. John Wiley & Sons, New York.
- Faraway, J. J. (2015). Linear Models with R, Second Edition. Chapman and Hall/CRC.
- Finney, D. J. (1977). Dimensions of statistics. Journal of the Royal Statistical Society, Series C (Applied Statistics) 26, 285–289.
- Foschi, R. O. and Yao, F. Z. (1986). Another look at three duration of load models. In Proceedings, XVII IUFRO Congress.
- Fourier, J. (1822). Théorie Analytique de la Chaleur, par M. Fourier. Chez Firmin Didot, Père et Fils.
- Friedmann, M., Gillis, J. and Liron, N. (1968). Laminar flow in a pipe at low and moderate Reynolds numbers. Applied Scientific Research 19, 426–438.
- Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A. and Rubin, D. B. (2014). Bayesian Data Analysis, Third Edition. CRC Press.
- Gibbings, J. C. (2011). Dimensional Analysis. Springer-Verlag, London.
- Hand, D. J. (1996). Statistics and the theory of measurement. Journal of the Royal Statistical Society, Series A (Statistics in Society) 159, 445–492.
- Härdle, W. K. and Vogt, A. B. (2015). Ladislaus von Bortkiewicz—Statistician, Economist and a European Intellectual. International Statistical Review 83, 17–35. doi:10.1111/insr.12083.
- Hoffmeyer, P. and Sørensen, J. D. (2007). Duration of load revisited. Wood Science and Technology 41, 687–711.
- Kiefer, J. et al. (1957). Invariance, minimax sequential estimation, and continuous time processes. The Annals of Mathematical Statistics 28, 573–601.
- Köhler, J. and Svensson, S. (2002). Probabilistic modelling of duration of load effects in timber structures. In Proceedings of the 35th Meeting, International Council for Research and Innovation in Building and Construction, Working Commission W18–Timber Structures, CIB-W18, Paper 35-17, 1.
- Kovera, M. B. (2010). Encyclopedia of Research Design.
- Lehmann, E. L. and Romano, J. P. (2010). Testing Statistical Hypotheses. Springer.
- LibreTexts (2019). The Ideal Gas Law. https://chem.libretexts.org/Bookshelves/Physical_and_Theoretical_Chemistry_Textbook_Maps/Supplemental_Modules_(Physical_and_Theoretical_Chemistry)/Physical_Properties_of_Matter/States_of_Matter/Properties_of_Gases/Gas_Laws/The_Ideal_Gas_Law. Accessed 01/04/2020.
- Lin, D. K. J. and Shen, W. (2013). Comment: some statistical concerns on dimensional analysis. Technometrics 55, 281–285.
- Luce, R. D. (1959). On the possible psychophysical laws. Psychological Review 66, 81.
- Luce, R. D. (1964). A generalization of a theorem of dimensional analysis. Journal of Mathematical Psychology 1, 278–284.
- Magnello, M. E. (2009). Karl Pearson and the establishment of mathematical statistics. International Statistical Review 77, 3–29.
- Matta, C. F., Massa, L., Gubskaya, A. V. and Knoll, E. (2010). Can one take the logarithm or the sine of a dimensioned quantity or a unit? Dimensional analysis involving transcendental functions. Journal of Chemical Education 88, 67–70.
- Mayumi, K. and Giampietro, M. (2010). Dimensions and logarithmic function in economics: A short critical analysis. Ecological Economics 69, 1604–1609.
- Mayumi, K. and Giampietro, M. (2012). Response to dimensions and logarithmic function in economics: A comment. Ecological Economics 75, 12–14.
- Meinsma, G. (2019). Dimensional and scaling analysis. SIAM Review 61, 159–184.
- Mills, I. M. (1995). Dimensions of logarithmic quantities. Journal of Chemical Education 72, 954.
- Molyneux, P. (1991). The dimensions of logarithmic quantities: implications for the hidden concentration and pressure units in pH values, acidity constants, standard thermodynamic functions, and standard electrode potentials. Journal of Chemical Education 68, 467.
- Mosteller, F. and Tukey, J. W. (1977). Data Analysis and Regression: A Second Course in Statistics. Addison-Wesley Series in Behavioral Science: Quantitative Methods.
- Joint Committee on Guides in Metrology (2012). 200:2012 — International Vocabulary of Metrology: Basic and General Concepts and Associated Terms (VIM).
- Paganoni, L. (1987). On a functional equation concerning affine transformations. Journal of Mathematical Analysis and Applications 127, 475–491.
- Pigou, A. C., Friedman, M. and Georgescu-Roegen, N. (1936). Marginal utility of money and elasticities of demand. The Quarterly Journal of Economics 50, 532–539.
- Shen, W. (2015). Dimensional analysis in statistics: theories, methodologies and applications. PhD thesis, The Pennsylvania State University.
- Shen, W. and Lin, D. K. J. (2018). A conjugate model for dimensional analysis. Technometrics 60, 79–89.
- Shen, W. and Lin, D. K. J. (2019). Statistical theories for dimensional analysis. Statistica Sinica 29, 527–550.
- Shen, W., Davis, T., Lin, D. K. J. and Nachtsheim, C. J. (2014). Dimensional analysis and its applications in statistics. Journal of Quality Technology 46, 185–198.
- Stevens, S. S. (1946). On the theory of scales of measurement. Science 103, 677–680.
- Stevens, S. S. (1951). Mathematics, measurement, and psychophysics. Handbook of Experimental Psychology, 1–49.
- Taylor, B. N. (2018). Quantity calculus, fundamental constants, and SI units. Journal of Research of the National Institute of Standards and Technology 123, 123008.
- Velleman, P. F. and Wilkinson, L. (1993). Nominal, ordinal, interval, and ratio typologies are misleading. The American Statistician 47, 65–72.
- Vignaux, V. A. and Scott, J. L. (1999). Theory & methods: Simplifying regression models using dimensional analysis. Australian & New Zealand Journal of Statistics 41, 31–41.
- Ward, L. M. (2017). SS Stevens's invariant legacy: scale types and the power law. American Journal of Psychology 130, 401–412.
- Wijsman, R. A. (1967). Cross-sections of orbits and their application to densities of maximal invariants. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability 1, 389–400.
- Wikipedia (2020). Transcendental function. https://en.wikipedia.org/wiki/Transcendental_function. Accessed 2020/02/24.
- Wong, S. W. K. and Zidek, J. V. (2018). Dimensional and statistical foundations for accumulated damage models. Wood Science and Technology 52, 45–65.
- Yang, C.-C. and Lin, D. K. J. (2021). A note on selection of basis quantities for dimensional analysis. Quality Engineering 33, 240–251.
- Yang, C.-H., Zidek, J. V. and Wong, S. W. K. (2018). Bayesian analysis of accumulated damage models in lumber reliability. Technometrics.
- Zhai, Y., Pirvu, C., Heckman, N., Lum, C., Wu, L. and Zidek, J. V. (2012a). A review of dynamic duration of load models for lumber strength. Technical Report No. 270, Department of Statistics, University of British Columbia.
- Zhai, Y., Heckman, N., Lum, C., Pirvu, C., Wu, L. and Zidek, J. V. (2012b). Stochastic models for the effects of duration of load on lumber properties. Technical Report No. 271, Department of Statistics, University of British Columbia.
- Zidek, J. V. (1969). A representation of Bayes invariant procedures in terms of Haar measure. Annals of the Institute of Statistical Mathematics 21, 291–308.