
Detection and Amelioration of Social Engineering Vulnerability in Contingency Table Data using an Orthogonalised Log-linear Analysis

Glynn Rogers, Malcolm Crompton, Gaurav Sapre, and Jonathan Chan. Glynn Rogers, Independent Researcher, glynn.rogers@internode.on.net (corresponding author); Malcolm Crompton, Information Integrity Solutions Pty Ltd, mcrompton@iispartners.com; Gaurav Sapre, Data61, CSIRO, Australia, Gaurav.Sapre@hotmail.com; Jonathan Chan, Data61, CSIRO, Australia, Jonathan.Chan@data61.csiro.au
Abstract

Social Engineering has emerged as a significant threat in cyber security. In a dialog based attack, by having enough of a potential victim’s personal data to be convincing, a social engineer impersonates the victim in order to manipulate the attack’s target into revealing sufficient information for accessing the victim’s accounts etc. We utilise the developing understanding of human information processing in the Information Sciences to characterise the vulnerability of the target to manipulation and to propose a form of countermeasure. Our focus is on the possibility of the social engineer being able to build the victim’s profile by, in part, inferring personal attribute values from statistical information available either informally, from general knowledge, or, more formally, from some public database. We use an orthogonalised log linear analysis of data in the form of a contingency table to develop a measure of how susceptible particular subtables are to probabilistic inference as the basis for our proposed countermeasure. This is based on the observation that inference relies on a high degree of non-uniformity and exploits the orthogonality of the analysis to define the measure in terms of subspace projections.

Index Terms:
cyber security, social engineering, data privacy, identity, personal attributes, identity impersonation, contingency table, log linear, geometric marginalisation, de-personalisation.

I Introduction

I-A The Social Engineering Problem

Social engineering attacks on cyber infrastructure constitute a large and growing economic problem. As the comprehensive literature review in [1] reveals, the term ‘social engineering’ has been applied to a wide variety of illegitimate methods of gaining advantage ranging from simple breaches of physical security to sophisticated phishing campaigns [2]. However the central feature of social engineering that emerges is the exploitation of human vulnerability through some form of interaction between attacker and the attack target.

This is the case even where an unauthorised person gains physical entry to secure facilities by following legitimate personnel through a controlled access point (‘tailgating’) or by enlisting the aid of an authorised person while pretending to struggle with a heavy load [4] [3]. Here, the attacker exploits the reluctance of people to confront a potentially awkward situation, or their desire to be helpful, which illustrates the general observation that humans are the weakest link in any security system [4].

Phishing is also becoming a major threat to cyber security. From the discussion in [2] it is clear that there is a close relationship with social engineering, with some researchers describing phishing as a form of social engineering whereas others see phishing as employing social engineering. Regardless of the details, a unifying thread is the exploitation of human vulnerability for the creation of some form of deception for nefarious purposes.

Methods of defending against social engineering attacks are discussed in [4], [8] and [6]. A clearly defined and well documented security policy, including robust user authentication, is regarded as essential in protecting information infrastructure [4]. Two factor authentication is widely deployed and believed to be much more resistant to attack than simple passwords and PINs. However two factor authentication is itself vulnerable to social engineering [9] [10]: the attacker essentially tricks the victim into providing the verification code, using the types of technique described in [9] and [10].

This illustrates the inescapable fact that the central goal of social engineering is to evade whatever security mechanisms are deployed to protect the target system by exploiting human vulnerabilities [8] [6] [5]. Grimes [9] observes, from an industry perspective, that “… even the very best technical controls will allow some amount of social engineering to get by your defenses. What will save you then is good security awareness training.” Indeed [8] cites case studies which “… demonstrate that security awareness is the crucial and most effective tool in the fight against social engineering attacks …”.

Security awareness training is discussed from various points of view in [8], [6], [11] and [9], with the caution that it needs to be ongoing to overcome the natural tendency to forget or habituate [8]. Resistance training, “… aimed at making employees resilient against persuasion techniques that a social engineer may employ” [4], is also advocated by some, particularly where the interaction takes the form of a dialog.

At the extreme end of security awareness are proposals, discussed in [6] and [11], for a system to monitor the emotional state of, for example, call centre operators to ensure they are in the correct frame of mind to resist social engineering. However we regard this type of intrusive approach as draconian and likely to be ineffective: it would generate resentment and encourage evasion by the operator, as well as being disruptive whenever the monitor decreed the operator to be in the wrong frame of mind.

A more benign and, we suggest, more effective approach is to provide potential victims with an automated assistant which monitors the interaction with an attacker to detect indications that a social engineering attempt may be underway. For example, Bhakta and Harris [11] describe a system for analysing each line in a text based interaction to detect instances of a predefined ‘topic black list’, warning the potential victim and perhaps taking preemptive action to prevent a security violation. Similarly, in this paper we accept that the vulnerability of human operators to social engineering is inevitable and describe a system to assist them in assessing the overall reliability of the information they are being presented with.

I-B The Nature of Social Engineering

The centrality of human vulnerability in social engineering is emphasised by the philosophically well grounded definition proposed in [1] Section IV:

Definition I.1.

“In the context of cybersecurity, social engineering is a type of attack wherein the attacker(s) exploit human vulnerabilities by means of social interaction to breach cyber security, with or without the use of technical means and technical vulnerabilities.”

Continuing research in social engineering requires a sound intellectual foundation, beginning with a consistent ontology which then provides the basis for the underlying conceptual scheme. This, in turn, enables the empirical knowledge base to be ordered by constructing more specialised taxonomies.

Previous work on developing conceptual models and taxonomies is reviewed in [6]. Recently, Wang et al. [12] have proposed a comprehensive foundation for social engineering research comprising an ontology and conceptual model with consequent taxonomies. Their conceptual model has eleven components ([12] Fig. 5), each with its own taxonomy populated by the ontology.

Defining a set of formal relations between the components of the conceptual model leads to a knowledge graph in which the vertices represent the elements of the ontology and the edges represent the relations between the conceptual model components whose taxonomies include the elements. All of this is machine readable so that the knowledge graph can be constructed computationally from empirical data on actual social engineering attack scenarios and then analysed to reveal general features of attacks.

For example the vertex degree, i.e. the number of edges associated with a vertex, is an indicator of how frequently an ontological element features in attacks. A relatively high degree can reveal a particular human or system vulnerability so that an alarm can be generated when it occurs during an attack or special attention can be paid to it in security awareness training [12].
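The degree computation described above can be sketched directly. The edge list and element names below are invented for illustration; they are not drawn from the ontology of [12].

```python
from collections import defaultdict

# Hypothetical edges of a social-engineering knowledge graph: each
# edge links two ontological elements that co-occur in a recorded
# attack scenario (names invented for illustration).
edges = [
    ("phishing_email", "urgency_cue"),
    ("phishing_email", "spoofed_sender"),
    ("urgency_cue", "time_pressure"),
    ("urgency_cue", "authority_claim"),
    ("spoofed_sender", "authority_claim"),
]

# Vertex degree: the number of edges incident on each vertex.
degree = defaultdict(int)
for u, v in edges:
    degree[u] += 1
    degree[v] += 1

# Elements with the highest degree feature most often in attacks and
# so merit alarms or emphasis in security awareness training.
ranked = sorted(degree.items(), key=lambda kv: -kv[1])
print(ranked[0])  # -> ('urgency_cue', 3)
```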

Human vulnerability is examined in detail in [6] which proposes an “I-E Model of Human Weakness” as a basis for investigating social engineering attacks and potential defences. Here, ‘I’ denotes internal, referring to the psychological characteristics of potential victims, whereas ‘E’ denotes external, referring to the social environment in which manipulation of the victim occurs. Whereas [6] uses the term ‘socio-psychology’ as a general description of this class of model, we prefer the term ‘psycho-social’ because it is consistent with the description of similar issues in the health sciences [7], where the internal-external dichotomy is also employed, although the social environment there is somewhat different.

Because “… the success of social engineering relies heavily on the information gathered …” [12], one of the conceptual model components is ‘social engineering information’. The importance of information is also stressed in [4] and [2], where ‘information gathering’ is listed as the first step in the attack cycle, and in [5] where ‘information requirements analysis’ is described as one of the prerequisites of an attack.

II Method

II-A The Dialog Based Social Engineering Context

While we agree with the generality of Definition I.1, in this paper we focus on that class of cyber security breach where the social engineer manipulates the target of an attack into revealing confidential information to be used in perpetrating some type of fraud. We distinguish between the target of the social engineering attack, on the one hand, and the victim of the subsequent fraud on the other. Whereas the target is a single human entity, the victim may be a corporate entity or the many customers of that corporate entity.

In a typical phishing attack the target is also the victim [2]. However in another class of attack, to which call centres in particular are susceptible, the social engineer attempts to persuade the target that the social engineer is, or represents, the intended victim by entering into some form of dialog. The immediate goal is to convince the target that the attacker is entitled to privileged information, information which can range from a simple account password to commercial or national secrets. If convinced, the target reveals the information to the attacker who then uses it to perpetrate some form of fraud against the victim. Identity theft, a growing problem, is a particular case of this form of impersonation.

Clearly the social engineer has to create a false identity based on some form of profile which will establish the credibility of the social engineer as the source of the information on which the attack is based. The wider question of assessing credibility in social networks has been surveyed in [13] which asserts that credibility is a synonym for believability and trustworthiness, although the authors concede the multidimensional nature of credibility in the social network context. Multidimensionality is also stressed in [15] where trustworthiness and expertise are described as primary markers of source credibility. The basis of judgements of source credibility made by Facebook users is examined in [20] in four dimensions, one of which is sincerity. Judgements in this dimension were made in part on the basis of the source’s profile, indicating the importance of that profile being sufficiently detailed to be convincing.

Source credibility has been examined in a variety of contexts, e.g. investigations of the evaluation of real and fake news articles [21] and the adoption of information in on-line communities [22]. In both cases, the theoretical context is provided by Dual-Process models of human information processing. Whereas the Heuristic-Systematic variant of the Dual-Process model is employed in [22], the work in [21] is based on another variant, the Elaboration Likelihood Model, originally developed to investigate persuasion.

The investigation of victimisation in phishing attacks in [23] is also based on the Heuristic-Systematic Model (HSM) where information processing occurs in two modes. In the heuristic mode a message is evaluated quickly on the basis of a set of available cues, one of which is source credibility, whereas in the systematic mode the evaluation involves cognitive processes and is consequently much slower and more deliberative.

These two modes are not mutually exclusive and have been demonstrated to operate concurrently [22] [23] such that one can moderate the other i.e. one can reinforce the other (referred to as additivity) or one can attenuate the effects of the other. In particular, source credibility has been demonstrated to reinforce the results of systematic processing [22] and, conversely, systematic processing can attenuate the effect of heuristic cues such as source credibility.

In the type of phishing attack considered here, the social engineer presents an argument aimed at persuading the target to perform some action - supplying confidential information or logging into a fake website for example. The target invokes some form of heuristic-systematic process to evaluate the argument. However if systematic processing dominates, the flaws in the argument are likely to be exposed [23] so it is in the social engineer’s interest to suppress systematic processing in favour of heuristic processing by manipulation of the target. One method of achieving this is to place the target under time pressure which suppresses systematic processing [23] [22].

An important aspect of the HSM is that it recognises that people will terminate the process when they feel comfortable with the judgement that has been evolving. In the model this is referred to as reaching the individual’s sufficiency threshold which depends on the prevailing circumstances. Lowering the sufficiency threshold will result in the individual favouring the heuristic over the systematic mode.

In [23], it is proposed that the success of an attack depends, at least in part, on using an attack pretext to suppress the target’s sufficiency threshold. Pretexting [23] [1] is where the social engineer, in preparing the attack, creates a scenario which provides the informational context within which the attack takes place, a context which can also be designed to reinforce the perception of source credibility. Indeed we regard the pretext as a core component of the type of dialog based phishing attack considered in this paper.

II-B Information Insufficiency and the Socially Engineered Phishing Attack Model

However we believe that a more detailed account of the relationship between the pretext and the sufficiency threshold can be developed from the Risk Information Seeking and Processing (RISP) model of information science, which incorporates the Heuristic-Systematic Model. The RISP model has been widely researched in the information sciences as an explanation for how and why people seek additional information in the context of risk (see [24] for a review and discussion). While developing the full relationship between social engineering and RISP is beyond the scope of this paper, the essential idea for our purposes is as follows.

Firstly we observe that risk is inherent in a dialog based phishing attack and that risk is perceived at some level by the target. One of the main components of RISP is information insufficiency which is defined as the difference between an individual’s current knowledge level and their sufficiency threshold. A higher level of information insufficiency, all else being equal, will drive the individual towards higher levels of information seeking and processing [25] and, consequently, to bias their processing mode towards systematic processing.

In light of this we propose that pretexting, rather than lowering the sufficiency threshold as hypothesised in [23], instead raises the perceived level of current knowledge relevant to the attack. The pretext thus decreases the level of information insufficiency thereby reducing the perceived need to seek additional information by interaction with the attacker. Furthermore a carefully crafted pretext can lower the level to the point where the heuristics will override any doubts that may arise from systematic processing of the pretext. Consequently pretexting complements source credibility by ensuring the target is more strongly influenced by source credibility and other heuristics than by the merits of the arguments presented by the social engineer.

All of these factors lead us to define a model of a socially engineered phishing (SEP) attack, which we refer to as a SEP attack model, as:

Definition II.1.

The SEP attack model has the following three components:

  1. the source, i.e. the attacker, with an associated source credibility,

  2. the pretext, providing the informational context in which the attack occurs, and

  3. the set of demands and the argument for why the target should conform to them.

Note that all three components are fake but presumably sufficiently plausible that the target is willing to engage with the attacker.

Credibility generally increases with familiarity [15], which implies that a completely unfamiliar pretext will be regarded with considerable skepticism by the target, suggesting the engagement of some level of systematic processing. Provisionally accepting the pretext implies that at least some of the components of the pretext are held by the target to be true. Because a high level of systematic processing would require that a large proportion of the components be held to be true, suppressing the sufficiency threshold reduces the proportion of components that the target needs to recognise as true in order to accept the pretext.

The social engineer is then free to invent the remaining components but with the constraint that consistency is important for credibility [15] so that inconsistency in the information presented to the target results in a greater reliance on systematic processing [22]. This clearly conflicts with the original aim of ensuring the dominance of heuristic processing. Consequently, the fake components of the pretext or message must be chosen to appear consistent with the components that the social engineer can assume the target will regard as true on the basis of the social engineer’s research into the target’s background.

All of this emphasises the critical role played by information as was stressed in Section I-B. It becomes clear that a social engineer needs to mount a significant effort to painstakingly acquire the information required for a successful attack. The nature of the information is such that the acquisition process can be indirect and complex so that social engineers “… will tie little pieces of information they have acquired over time, decipher cues and signals given to them by multiple employees, and then connect the pieces of the jigsaw puzzle to unearth the information they have been after” [8].

Previous work has focussed on the exploitation of whatever factual information the social engineer can extract directly from available sources. However we argue that the ‘pieces of the jigsaw puzzle’ acquired by a social engineer will in general be incomplete and perhaps inconsistent, so that ‘unearth(ing) the information’ will require significantly more than merely assembling a collection of facts. This is particularly the case with the type of personal detail that will lend a strong sense of authenticity to the pretext.

Specifically, we base this paper on our assertion that information acquisition can be enhanced by inferring additional information not only from that already acquired but, importantly, from contextual information as well. Rarely, however, will this be a straightforward matter of deductive inference leading to certainty. Instead the inferences will be probabilistic in nature.

II-C Probabilistic Inference, Analogical Inference, and Social Engineering

Probabilistic inference techniques are finding increasing application in database security research, particularly in characterising and detecting network based attacks. In fact in a database inference attack, inference procedures are an intrinsic component of the attack methodology itself, and developing effective countermeasures against inference attacks is an established aspect of database management research [16]. The underlying concept is that of an ‘inference channel’ which is generated by the patterns of association in the data which support inferences.

Of greater relevance to this paper are a number of techniques which have been developed to infer unknown (latent) attribute or demographic values from a known set derived from social network data. For example a machine learning method based on feature vectors is described in [26] whereas probabilistic graph theoretic techniques based on Markov Random Field concepts are proposed in [27] and [28].

Underlying the probabilistic approaches is the characterisation of the set of attributes as random variables described by a multivariate probability distribution which is estimated, at least implicitly, from the known values. However in both [27] and [28] the computational problem is made tractable by using binary variables and by adopting a pairwise Markov Random Field Model. The latter places quite strong constraints on the form of the multivariate distribution restricting it to a product of univariate and bivariate ‘potential’ functions [29]. In addition, the assortativity [27] of the associated social network defines a neighbourhood structure which further restricts the form of distribution with consequent restrictions on the conditional independencies between the attributes [29].
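As a minimal sketch of this restriction, the unnormalised joint distribution of a pairwise Markov Random Field over binary variables is simply a product of univariate and bivariate potential functions on the neighbourhood edges. The attribute names and potential values below are invented for illustration, not taken from [27] or [28].

```python
import itertools

# Univariate potentials phi_i(x_i) for three hypothetical binary
# attributes (values invented for illustration).
phi = {
    "employed":  {0: 1.0, 1: 2.0},
    "urban":     {0: 1.5, 1: 1.0},
    "homeowner": {0: 1.0, 1: 1.2},
}
# Bivariate potentials psi_ij on the neighbourhood edges only; the
# pairwise restriction means no higher-order potentials appear.
psi = {
    ("employed", "homeowner"): {(0, 0): 2.0, (0, 1): 0.5, (1, 0): 0.8, (1, 1): 2.5},
    ("urban", "employed"):     {(0, 0): 1.0, (0, 1): 0.6, (1, 0): 0.7, (1, 1): 1.8},
}
names = list(phi)

def unnorm(assign):
    """Unnormalised probability: the product of all potentials."""
    p = 1.0
    for n in names:
        p *= phi[n][assign[n]]
    for (a, b), table in psi.items():
        p *= table[(assign[a], assign[b])]
    return p

# The normalising constant Z sums over all 2^3 assignments.
Z = sum(unnorm(dict(zip(names, xs)))
        for xs in itertools.product([0, 1], repeat=len(names)))

def prob(assign):
    return unnorm(assign) / Z
```

Conditional queries such as P(employed = 1 | urban = 1, homeowner = 1) then reduce to ratios of sums of these products, which is what makes the pairwise form computationally tractable.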

Even with their simplifying assumptions these are not simple, straightforward procedures either conceptually or computationally. To be clear, we are not suggesting that social engineers in general have the capacity to utilise these techniques to infer missing information even where the data was available. Instead we develop the argument below that the social engineer will make these inferences analogically based on a broad understanding of the socioeconomic context and human nature.

To formalise these ideas we use the terminology of Clarke [30], which distinguishes between an entity and an identity. An entity is something in the world, not necessarily human, which can present many different identities depending on the particular role that entity is playing in the relevant context. Whereas the social engineer’s intended victim is an entity, it is that entity’s role in transacting with the organisation represented by the human target that is the social engineer’s focus.

Both the social engineer and the target are dealing with the particular identity presented by the entity, comprising that subset of the entity’s attributes required for the role. Identities are distinguished by the values that those attributes have in each particular case, those values providing the basis for identification.

For identification purposes, the target will have access to a record associated with the victim in the form of a digital persona which Clarke defines as “(t)he collection of data stored in a record (which) is designed to be rich enough to provide the record-holder with an adequate image of the represented entity or identity” [30]. Bearing in mind that the digital persona is, by design, a limited view of the victim, the key aspect of this is the social engineer having a sufficiently plausible profile of the victim to convince the target that the social engineer is indeed the victim.

Unfortunately this is made increasingly easy by the eagerness of people in general to place personal details in specific profiles on the various forms of social media. However, even hiding identities behind handles is an increasingly flimsy defence [1] so that a social engineer is increasingly able to acquire key elements of the identity relevant to the attack.

An important component of the social engineer’s skill set is collecting these identity elements from a variety of public sources. Nevertheless, in general, publicly available information will be insufficient for plausibility, requiring supplementary information to flesh out the persona giving the target the sense that they are indeed communicating with the victim. For this to succeed the supplementary information need not be accurate but merely consistent with the digital persona.

One of the key proposals of this paper is that this supplementary information can be derived by the social engineer, in part by inference from already obtained information, perhaps by formal statistical techniques or, more likely, by using general background knowledge of existing socio-economic patterns. Indeed we assume that astute social engineers are adept at observing and exploiting patterns in the socio-economic events surrounding them. Experimental psychology has demonstrated a clear connection between explanation, on the one hand, and learning and inference on the other [31][32], so we assume, in addition, that those patterns which have an explanatory context will predominate.

The implication is that the social engineer makes inferences in an explanatory framework. Based on the close psychological connection between explanation and analogy [33], we believe that this is best categorised as a form of analogical inference. For example, a young person living in a country town is in an analogous situation to many young people living in many country towns. It can then be inferred that a particular young person is very likely to be unemployed because, as is well known, the economies of many country towns are contracting, offering limited employment opportunities to young people.
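The probabilistic counterpart of such an analogical inference is a simple conditional-probability calculation over a contingency table. The counts below are invented for illustration.

```python
import numpy as np

# Toy contingency table indexed as [location, age, employment]:
# location: 0=city, 1=country; age: 0=young, 1=older;
# employment: 0=employed, 1=unemployed. Counts are invented.
T = np.array([
    [[400,  60], [500, 40]],   # city
    [[ 30, 120], [150, 50]],   # country
])

# Conditional distribution of employment given (country, young):
sub = T[1, 0, :].astype(float)
cond = sub / sub.sum()
print(cond)  # [0.2 0.8] -- dominated by 'unemployed'
```

The dominance of one cell in the conditional distribution is the statistical pattern that the informal, explanatory inference exploits.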

The essential point is that it is this analogous explanatory framework which lends plausibility, in the mind of the target, to the inferred information used in perpetrating the fraud rather than the formal statistical properties of the publicly available information. Consequently we do not require social engineers to have access to a formal database, nor to have the skills to analyse the data even if they did. Nevertheless the social engineer will be well aware informally of those statistical properties which will form part of the motivation for drawing the inferences depending on the perceived level of statistical relevance.

The notion of statistical relevance in its various forms has played a role in the philosophical analysis of explanation in a scientific context since [34]. In experimental psychology the relationship between explanatory power and statistical relevance has been demonstrated using simple measures [35]. At a more theoretical level, Schupbach and Sprenger [36] characterise statistical relevance in terms of the degree to which one proposition H (e.g. a hypothesis) makes a second proposition E (e.g. empirical evidence) less surprising. They then use this in a Bayesian context as part of their definition of a measure of explanatory power which Schupbach [37] has demonstrated successfully accounts for the judgements of explanatory power made by subjects in a series of psychological experiments.
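The Schupbach-Sprenger measure has a simple closed form, (P(H|E) − P(H|¬E)) / (P(H|E) + P(H|¬E)); the sketch below computes it from a joint distribution over two binary propositions, with the example probabilities invented for illustration.

```python
# Schupbach-Sprenger measure of the explanatory power of H over E:
# ranges over [-1, 1]; positive values indicate that H makes E less
# surprising, i.e. H is statistically relevant to E.
def explanatory_power(joint):
    """joint[(h, e)] holds P(H=h, E=e) for h, e in {0, 1}."""
    p_e = joint[(0, 1)] + joint[(1, 1)]
    p_h_given_e = joint[(1, 1)] / p_e
    p_h_given_not_e = joint[(1, 0)] / (1.0 - p_e)
    return (p_h_given_e - p_h_given_not_e) / (p_h_given_e + p_h_given_not_e)

# Invented joint distribution in which H and E are strongly associated:
joint = {(1, 1): 0.40, (1, 0): 0.05, (0, 1): 0.10, (0, 0): 0.45}
print(round(explanatory_power(joint), 3))  # 0.778
```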

All of this suggests that astute social engineers will generally base their analogical inferences on informal assessments of statistical relevance derived from observation of socio-economic patterns. However, instead of attempting to describe and analyse analogical inferences, we focus on the more easily defined task of analysing the underlying probabilistic structure that the psychological evidence suggests reflects and supports analogical inference. While the explanatory aspects of the type of analogical inference referred to here have a strong causal flavour, we will regard probabilistic information as purely associative having no sense of causality [38]. However we recognise that this creates the potential for false positives.

By analysing the probabilistic structure of the publicly available information we can anticipate the drawing of analogical inferences, and a key outcome of our research is an indicator of the degree to which a dataset is susceptible to inference. This indicator can be exploited to design countermeasures that reveal, and warn of, the patterns of association that underlie the analogical inference, alerting potential targets to the possibility that information that they might be presented with is unreliable.

II-D Contributions of the Paper

We have discussed the critical role played by information in Sections I-B and II-B, with an essential part of the attack preparation being the careful assembly of as much detail about the attack victim and context as is feasible [12] [4] [2] [5]. This information is used to construct the pretext which is an essential component of a social engineering phishing attack. The Heuristic-Systematic (HS) model of information processing developed in the Information Sciences has previously been used in [23] as a basis for analysing the interaction between attacker and target.

However, because the attack occurs in the context of risk, in this paper we extend that analysis by adopting the Risk Information Seeking and Processing (RISP) model [25] which incorporates the HS model. An important component of the RISP model is information insufficiency which acts as the primary motivator for seeking and processing further information. In Section II-B we invoke the RISP model to argue the importance of a sufficiently convincing pretext to reduce the target’s information insufficiency level to the point where the target avoids the systematic mode of processing. The target then does not subject the available information to the more detailed analysis which might raise doubts in the target’s mind about the veracity of the attacker’s claims.

Our underlying thesis is that the details revealed by the attacker’s careful assembly of existing information are generally insufficient for building a convincing pretext. However they can be sufficient for the social engineer to infer probabilistically enough additional details to persuade the target that the claims being made in the pretext are valid, provided the target processes the claims heuristically. Consequently a seemingly convincing pretext may be less substantial than it appears because it is based partly on inferred attribute values. This raises the question of whether this situation can be detected and, if so, whether the degree to which a pretext is dependent on inference can be estimated. We seek to answer this question in this paper.

Our primary goal, then, is to devise a means of advising potential targets, particularly in informal database environments, that certain combinations of attributes might be unreliable for identification because the values of some are inferable from others. In other words the social engineer is likely to have inferred some attribute values thus appearing to be the victim who of course would know those values. We believe that our approach of using indications of inference as the basis for advising targets of the need for caution has not previously been reported in the literature.

Note that this is quite distinct from describing methods for actually drawing inferences, using, for example, the conceptually and computationally complex processes described in [26], [27], and [28]. Instead, in Section III-A below we develop a method of estimating the potential of a data set for inference to be performed without actually computing the inferences thus avoiding much of the complexity.

In the following we proceed from the observation that inference relies on relationships between the attributes where, in this case, the relationships are not only explanatory, thus supporting analogical inference, but are also manifested statistically enabling complementary probabilistic inference. The principal theoretical focus of the paper, then, is on the identification of attributes where unknown attribute values can be estimated by probabilistic inference from some database using the known values of other attributes.

Like [17] and [19], we assume the existence of a database derived from some defined population from which can be extracted a multidimensional contingency table $\mathcal{T}$ in which the cells of the table are indexed by an ordered index vector $\mathbf{I}$. Each cell contains the number of members of the population with the combination of attribute values referenced by the index, so the table can be conceptualised as an unnormalised discrete multivariate probability distribution. However, instead of using the posterior joint distribution [19] directly or the posterior marginal distribution [17], we employ a log linear transformation of the prior distribution to analyse the probabilistic relations between subsets of attributes which give rise to the structure in the table.

Because inferences are drawn from prior information we focus on the contingency table equivalent of a conditional probability distribution which we refer to as a conditional subtable. We will assume inference is immediate in the sense that the conditional distribution is dominated by, i.e. concentrated on, a particular combination of attribute values. These can then be taken as the unknown values with high likelihood.

We develop a novel computational procedure for detecting particular subsets of attributes whose conditional subtables are dominated by a small set of values, without having to analyse each individual subtable. The basis of this procedure is the observation that a uniform distribution is completely uninformative and that, in order for probabilistic inference to be viable, a distribution must exhibit significant deviation from uniformity. Such deviation arises from some property of the mechanism generating the cell contents, a property that causes some cells to be significantly more probable than others.

In other words, these cells are salient, and we measure salience as deviation from uniformity. Roughly speaking, we argue that a high level of salience in a subset of attributes results in the multivariate conditional subtable of those attributes being concentrated on just a small number of the attribute levels in the subtable. This does not require that the dominant cells have any topological relationship. In particular, it does not require that they be neighbours, so standard measures based, for example, on second moments are not suitable for our purpose.
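To make the idea concrete with a toy measure (not the Probabilistic Salience indicator defined in Section III), a conditional subtable concentrated on one cell is informative for inference while a uniform one is not, and such a concentration measure is indifferent to where in the table the dominant cell sits:

```python
import numpy as np

def concentration(counts):
    """Fraction of the subtable's total mass carried by its largest cell.

    A uniform table of n cells gives 1/n; values near 1 indicate that one
    attribute combination dominates, so inference is viable.  Illustrative
    only: the paper's actual indicator is defined via subspace projections.
    """
    counts = np.asarray(counts, dtype=float)
    return counts.max() / counts.sum()

uniform = [25, 25, 25, 25]     # completely uninformative
dominated = [94, 2, 2, 2]      # one cell dominates, wherever it sits

assert np.isclose(concentration(uniform), 0.25)
assert np.isclose(concentration(dominated), 0.94)
```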

In Section III-B we develop the vector space theory underlying our analysis, utilising the orthogonal log linear transformation introduced by Dahinden et al. [39]. Importantly, this is based on projecting the vector representing the logarithm of the table onto a subspace orthogonal to the uniform vector. After analysing the orthogonalised log linear design matrix, we state and prove Theorem III.1 in Section III-D. This novel result enables us to reduce a large contingency table to one involving a subset of attributes by geometric means, as a form of marginalisation, while preserving salience as much as possible.

Theorem III.1 also provides the basis for our definition of a novel indicator, Probabilistic Salience, of the degree to which statistical data can support an inference that particular attributes have specific values (Definition III.2). We describe a means of evaluating that indicator from the data and demonstrate how it can identify those subsets of attributes in a contingency table which are most vulnerable to inference. Then, in Section III-E, we show why Probabilistic Salience does indeed measure the degree to which a subset of attributes is dominated by a relatively small number of members.

Most of Section III-B1 as well as the rest of Section III-B report the results of research which, to the best of our knowledge, is original. We believe that this work also makes a significant contribution to the contingency table literature because we were unable to find anything there that investigated this type of problem.

We envisage a novel system which employs Probabilistic Salience to generate a warning when some of the attribute values being claimed by a social engineer might have been inferred from others without having to classify them. The warning would take account of the level of any formal authentication provided and could be as simple as a red, amber, green traffic light indication. While this system would be most easily implemented in a text based dialog we see nothing in principle which would prevent speech recognition being used in a verbal dialog. Such a system could complement the topic blacklist system described in [11] as well as that referred to in [12] and might form a component of a more comprehensive intelligent assistant for combatting social engineering attacks in real time.

The central objective of our proposed probabilistic salience based warning system is to increase the target’s information insufficiency level by inducing the target to derate the inferred attributes thereby reducing the target’s current knowledge level. If the inferred content is large enough, the information insufficiency level will increase to the point where the target transitions to an information processing mode dominated by systematic processing.

In this mode the target could be expected to take a more deliberative and analytic approach to evaluating the attacker’s claims and expose the inevitable flaws and inconsistencies leading to a rejection of the attack. It is not intended that the system directly intervene in the attack by, for example, preemptively terminating the contact.

Investigating the conditions under which probabilistic inferencing can be performed, and assessing the potential for probabilistic inferences to be drawn is, to the best of our knowledge, a novel approach which this paper brings to the understanding of social engineering. Furthermore this novel approach is able to draw upon the understanding of human information processing being developed in the information sciences to begin constructing techniques for actively counteracting social engineering. We suggest that this is an example of a potentially fertile area of research applying the characteristics of human information processing in the context of risk to the development of effective countermeasures to social engineering.

Finally in Section V we propose a potential application of Probabilistic Salience and Theorem III.1 in what we refer to as ‘de-personalisation’ of a data set by analogy with the de-identification techniques used in data privacy. The objective here is to inhibit probabilistic inference by selectively reducing the data’s Probabilistic Salience without excessively compromising its utility.

III Theory

III-A The Multidimensional Contingency Table

The general problem area is that of $N$-dimensional contingency tables [40] formed from an ordered set of categorical variables, i.e. attributes, $C=\{C_{N-1},\cdots C_{0}\}$. Each variable consists of a set of values or levels such that, for example, the $k$th variable, $C_{k}$, comprises the $M_{k}$ levels $\{a_{k1},a_{k2}\cdots a_{kM_{k}}\}$. There are $M_{T}=\prod_{k=0}^{N-1}M_{k}$ cells in the table, each being identified by an ordered index set

$$\boldsymbol{\mathfrak{i}}=\langle\mathfrak{i}_{N-1},\mathfrak{i}_{N-2}\cdots\mathfrak{i}_{0}\rangle$$

where $\mathfrak{i}_{k}\in\{0\cdots M_{k}-1\}$, so that each attribute $C_{k}$ is directly associated with the index $\mathfrak{i}_{k}$. Each cell contains the number of members of the population which exhibit the joint occurrence of the attribute levels. However, for notational convenience in what follows, each attribute will have the same number of levels, $M$, so that $M_{T}=M^{N}$.

We regard the table as a geometric entity, in this case a hypercube of cells. To provide flexibility in associating an $N$ element vector of attribute values with the indices of the corresponding cell, we adopt the convention of representing the $k$th 'axis' of the hypercube by the $N$ element standard basis vector $\mathbf{e}_{k}$, in which the $k$th element is '1' with the remainder zero. This enables the set of attributes to be referenced by the index matrix

$$\mathbf{I}=\left|\mathbf{e}_{N-1}\cdots\mathbf{e}_{0}\right|. \tag{1}$$

If the levels of the attributes designating the $\ell$th cell are collected in the vector $\mathbf{a}_{\ell}=\langle a_{\ell_{N-1}}\cdots a_{\ell_{0}}\rangle$, $a_{\ell_{j}}\in\{0\cdots M-1\}$, the indices of the cell are given by

$$\boldsymbol{\mathfrak{i}}_{\ell}=\mathbf{I}\,\mathbf{a}_{\ell}. \tag{2}$$

The index matrix construct (1) also provides a mechanism for dealing with subtables. If $C_{\ell_{k}}\cdots C_{\ell_{1}}$ is some subset $\mathsf{c}^{\ell}$ of $k$ attributes, the attribute index vector $\mathbf{i}^{s}_{\mathsf{c}}$ is

$$\mathbf{i}^{s}_{\mathsf{c}}=\langle i_{\ell_{k}}\cdots i_{\ell_{1}}\rangle$$

where there is no implication that the integers $\{\ell_{k}\cdots\ell_{1}\}$ are contiguous. Using $\mathbf{i}_{\mathsf{c}}$ as a key provides the subtable index matrix

$$\mathbf{I}^{s}=\left|\mathbf{e}_{\ell_{k}}\cdots\mathbf{e}_{\ell_{1}}\right|. \tag{3}$$

The contingency table is assumed to be a member of an ensemble of similar tables with the common characteristic that they represent the same population, so that their respective entries sum to the population size, say $N_{T}$. More formally, this is equivalent to assuming that each table is drawn from a multinomial distribution, although we will not make use of this fact. These tables can be represented as vectors by ordering the table indices according to some rule. Here we choose a lexicographic ordering in which the index $\mathfrak{i}_{0}$ varies the most rapidly, followed by $\mathfrak{i}_{1}$ and so on, resulting in a table vector $\mathcal{T}\in\mathfrak{R}^{M_{T}}$.
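This lexicographic vectorisation can be sketched with numpy, whose C-order ravel makes the last axis vary fastest; listing the table axes as $C_{N-1},\ldots,C_{0}$ therefore reproduces the ordering above in which $\mathfrak{i}_{0}$ varies most rapidly (a minimal illustration; the variable names are ours, not the paper's):

```python
import numpy as np

M, N = 3, 2                            # M levels per attribute, N attributes
T = np.arange(M**N).reshape((M,) * N)  # axes ordered C_1, C_0

# C-order ravel: the last axis (i_0) varies most rapidly, as required.
vec = T.ravel(order="C")

# Cell (i_1, i_0) lands at position r = i_1*M + i_0 in the table vector.
assert vec[1 * M + 2] == T[1, 2]
```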

Much of the literature on contingency tables is concerned with the estimation of the cell contents from incomplete data. However, to avoid being distracted by estimation procedures, important though they are, we will assume that the data in the cells are a reliable representation of the population. The only complication is presented by zero entries in one or more cells, but we take a pragmatic approach here and replace zeros by ones by applying an affine transformation to the table, with minimal impact on the conclusions for reasons that will become apparent below.

III-B The Log Linear Transformation

The log linear transformation is based on the cell by cell logarithmic transformation of a contingency table vector $\mathcal{T}=[\rho_{0},\cdots\rho_{M_{T}}]^{\mathsf{T}}$ to generate a new table $\mathbf{T}=[\gamma_{0}\cdots\gamma_{M_{T}}]^{\mathsf{T}}$ where $\gamma=\log\rho$. In its general form the log linear transformation is ([41])

$$\mathbf{T}(\boldsymbol{\mathfrak{i}})=\sum_{\mathsf{c}\subseteq C}\xi_{\mathsf{c}}(\mathbf{i}_{\mathsf{c}})=\xi_{0}+\sum_{\ell=1}^{N}\xi_{\ell}+\sum_{\substack{\mathsf{c}\subseteq C\\ |\mathsf{c}|=2}}\xi_{\mathsf{c}}(\mathbf{i}_{\mathsf{c}})+\sum_{\substack{\mathsf{c}\subseteq C\\ |\mathsf{c}|=3}}\xi_{\mathsf{c}}(\mathbf{i}_{\mathsf{c}})+\cdots \tag{4}$$

where the $\xi$ functions are parameterised by a subset of the attributes.

In what follows it is essential to impose some form of ordering on the set of subsets $\mathsf{c}$, i.e. on the power set of $C$, $\mathbb{P}(C)$. In the ordering adopted here, the first term has $|\mathsf{c}|=0$, i.e. the null set representing the constant term. Next comes the group of subsets with $|\mathsf{c}|=1$ representing the main effects, then those with $|\mathsf{c}|=2$ and so on. The terms with $|\mathsf{c}|\geq 2$ are referred to as 'interaction terms' because they represent probabilistic interactions between the members of the subset [41][29], beginning with first order interactions between pairs of attributes. Within each group the subsets are ordered following the lexicographic principle applied to their members' indices. For example, where $|\mathsf{c}|=2$, i.e. first order interactions, the ordering is

$$C_{N-1}C_{N-2},\ C_{N-1}C_{N-3}\cdots C_{N-1}C_{0},\ C_{N-2}C_{N-3}\cdots C_{1}C_{0}. \tag{5}$$

Clearly, defining the $\xi$ functions, which are in the form of $M_{T}$ dimensional vectors, is a critical task in constructing the expansion. The key development on which the process described here is built was introduced by Dahinden et al. [39], who express each $\xi$ function as a linear combination of a set of basis vectors which span the subspace containing it. There is one basis vector for each particular combination of the levels associated with a subset of attributes. The number of such basis vectors, i.e. the dimension of the subspace, is determined by the number of possible configurations of the levels, which is $M^{|\mathsf{c}|}$. Each $\xi$ vector can then be obtained from the data itself by determining the coefficients of the linear combination.

In matrix form, the log-linear model is

$$\mathbf{T}=\mathbf{X}\boldsymbol{\alpha} \tag{6}$$

where $\mathbf{X}$ is a design matrix and $\boldsymbol{\alpha}$ is a vector of coefficients of the form

$$\langle\alpha_{0}\ \alpha_{1}\ \alpha_{2}\cdots\alpha_{N}\ \alpha_{11}\ \alpha_{12}\cdots\alpha_{NN}\ \alpha_{111}\ \alpha_{112}\cdots\alpha_{NNN}\cdots\rangle^{\mathsf{T}}$$

The basis vectors of a particular $\xi$ function then form a subset of the column vectors of $X$, i.e. a submatrix of $X$, where each submatrix represents one of the subsets, $\mathsf{c}$, of attributes involved in a particular level of interaction. Dahinden et al. label the submatrices so generated as $X^{C_{k1},C_{k2}\cdots C_{k|\mathsf{c}|}}=X^{\mathsf{c}_{k}}$, where the elements of each submatrix are either zero or one. The algorithm for generating the basis vectors is given in [39] but the essential idea is described below. However, because the detailed form of the design matrix is a critical part of our development, we examine its construction in more detail in Section III-C.

To proceed, it is necessary to introduce some additional notation. Firstly, let $k$ be the interaction index, $k=|\mathsf{c}|$, so that $k=0$ indexes the constant term in (4), $k=1$ the main effects and so on. For each value of $k$, the number of submatrices of the form $X^{\mathsf{c}_{i}}$ is $N_{k}=\binom{N}{k}$ and the number of columns in each submatrix is $M^{k}$, one column for each of the possible combinations of levels in the set of $k$ attributes. Keep in mind that each cell in the table $\mathcal{T}$ is denoted by a specific index vector $\boldsymbol{\mathfrak{i}}=\langle\mathfrak{i}_{N-1}\cdots\mathfrak{i}_{j}\cdots\mathfrak{i}_{0}\rangle,\;\mathfrak{i}_{j}\in\{0\cdots M-1\}$, so that the $r$th component of the table vector $\mathbf{T}$ is denoted by

$$r=\sum_{j=0}^{N-1}\mathfrak{i}_{j}M^{j}=(\mathbf{R})_{M} \tag{7}$$

where $(\mathbf{R})_{M}$ is the $M$-ary number $\boldsymbol{\mathfrak{i}}$ expressed in radix $M$ form. Note that this is simply the operation of counting through the elements of the table vector with radix $M$.
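Equation (7) is just positional notation in radix $M$; the following small sketch (names are ours, for illustration) confirms it agrees with numpy's own C-order linearisation of a multi-index:

```python
import numpy as np

def cell_position(idx, M):
    """idx = (i_{N-1}, ..., i_0); returns r = sum_j i_j * M**j, as in (7)."""
    return sum(i * M**j for j, i in enumerate(reversed(idx)))

M = 3
# For i = <2, 0, 1>: r = 2*M^2 + 0*M + 1 = 19.
assert cell_position((2, 0, 1), M) == 2 * M**2 + 0 * M + 1
# Agrees with numpy's C-order linearisation of the same index.
assert cell_position((2, 0, 1), M) == np.ravel_multi_index((2, 0, 1), (M, M, M))
```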

Consider one specific value of $k$ and one specific submatrix, $X^{\mathsf{c}^{\ell}_{k}}$, where $\mathsf{c}^{\ell}_{k}\in\mathsf{c}_{k}\subseteq\mathbb{P}(C)$ s.t. $|\mathsf{c}_{k}|=k$, and $\ell\in\boldsymbol{\ell}=\{1\cdots\binom{N}{k}\}$ specifies the subsets $\mathsf{c}^{\ell}_{k}$ of $\mathsf{c}_{k}$. For each value of $\ell$, the attributes are designated by the ordered vector of attribute numbers $\mathbf{i}^{s_{\ell}}_{k}=\langle i_{\ell_{k}}\cdots i_{\ell_{1}}\rangle$ so that, e.g., $C_{i_{\ell_{k}}}$ is the leftmost attribute in the subset. Then each column of the submatrix represents one particular combination of the levels of the subset of attributes designated by $\mathbf{i}^{s_{\ell}}_{k}$. Note that there is no implication that the $k$ indices are contiguous which, in general, they are not, as indicated by the example (5).

If the levels of the attributes are $a\in\{0\cdots M-1\}$, the vector $\mathbf{a}^{k,\ell}=\langle a_{\ell_{k}}\cdots a_{\ell_{1}}\rangle$ represents one combination of values of the attributes $\mathsf{c}^{\ell}_{k}$. The $\nu$th column of the submatrix is then associated with a particular $\mathbf{a}^{k,\ell}_{j}$ as well as $\mathbf{i}^{s_{\ell}}_{k}$. The remaining $(N-k)$ table axes form the vector $\mathbf{i}^{g_{\ell}}_{k}$ with corresponding attributes vector $\mathbf{g}^{k,\ell}=\langle g_{\ell_{N-k}}\cdots g_{\ell_{1}}\rangle$. Using (7), the elements of the $\nu$th column vector having $\mathbf{i}^{s_{\ell}}_{k}$ and $\mathbf{a}^{k,\ell}_{j}$ in common, $\{x\}_{g}$, are designated by the table index vector, from (2) and (3),

$$\boldsymbol{\mathfrak{i}}_{j}=\mathbf{I}^{s_{\ell}}_{k}\mathbf{a}^{k,\ell}_{j}+\mathbf{I}^{g_{\ell}}_{N-k}\mathbf{g}^{k,\ell} \tag{8}$$

where the elements of $\mathbf{g}^{k,\ell}$ range over all $M^{N-k}$ combinations of the attribute levels.

To define the basis vector, recognise that the available information does not allow the elements of $\{x\}_{g}$ to be distinguished. Consequently, the $\nu$th column vector becomes a basis vector by setting the elements of $\{x\}_{g}$ to unity with the remaining elements zero.

The lexicographic ordering is a bijective mapping, so that if there is a one at a particular position in the $\nu$th column of the submatrix $X^{\mathsf{c}^{\ell}_{k}}$, there cannot be a one in the corresponding position in any of the remaining $M^{k}-1$ columns. Consequently, the columns of the submatrix in question, i.e. the basis vectors, are orthogonal. Furthermore, because the total number of ones in the submatrix is $M^{k}M^{N-k}=M^{N}$, the sum $\sum_{j=1}^{M^{k}}X_{j}^{\mathsf{c}^{\ell}_{k}}$ is equal to the unit vector $\widehat{X}^{0}$.
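Both properties (orthogonal columns, columns summing to the all-ones vector) can be checked by brute force on a small table. The sketch below is our own illustration of the indicator construction, not the algorithm of [39]: for the subset $\{C_{1},C_{0}\}$ of a table with $N=3$ and $M=2$, each column has a one exactly at the cells whose indices agree with one particular level combination.

```python
import itertools
import numpy as np

M, N = 2, 3
subset = (1, 0)                    # attribute axes C_1 and C_0

cols = []
for levels in itertools.product(range(M), repeat=len(subset)):
    col = np.zeros(M**N)
    for idx in itertools.product(range(M), repeat=N):   # idx = (i_2, i_1, i_0)
        # a one wherever the subset indices equal this level combination
        if all(idx[N - 1 - ax] == a for ax, a in zip(subset, levels)):
            r = sum(i * M**j for j, i in enumerate(reversed(idx)))  # eq. (7)
            col[r] = 1
    cols.append(col)

X_sub = np.column_stack(cols)      # the submatrix X^{C_1 C_0}

# Columns are orthogonal (disjoint indicators) ...
assert np.allclose(X_sub.T @ X_sub, np.diag(X_sub.sum(axis=0)))
# ... and sum to the all-ones vector, as argued above.
assert np.allclose(X_sub.sum(axis=1), 1)
```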

The full matrix then has

$$1+NM+\cdots+\binom{N}{k}M^{k}+\cdots+M^{N}=(1+M)^{N}>M^{N}$$

columns, so that (6) is very underdetermined. Conventionally, the approach to this problem is to impose 'identifiability' by removing the null space of $X$ using, for example, the Moore-Penrose pseudo inverse. However, this approach results in losing the explicit connection between the coefficient vector and the interaction terms.

Fortunately, Dahinden et al. [39] have derived an orthogonal form of $X$ which leads to a coefficient vector that can be related to the interaction terms in (4). The essential idea is as follows. Using a modified version of the notation in [39], the vector space $\widehat{X}^{0}$ spanned by the constant vector ($k=0$) has dimension 1. Then the vector space $\widehat{X}^{C_{\ell}}$, spanned by the columns of $X^{C_{\ell}}$, is composed of $\widehat{X}^{C_{\ell}}_{-}=\widehat{X}^{0}$ and its orthogonal complement, $\widehat{X}^{C_{\ell}}_{\perp}$, in $\widehat{X}^{C_{\ell}}$, the orthogonal complement having dimension $M-1$. The space $\widehat{X}^{C_{\ell_{1}}C_{\ell_{2}}}$, spanned by $X^{C_{\ell_{1}}C_{\ell_{2}}}$, is composed of $\widehat{X}^{C_{\ell_{1}}C_{\ell_{2}}}_{-}=\{\widehat{X}^{C_{\ell_{1}}},\widehat{X}^{C_{\ell_{2}}}\}$ and its orthogonal complement, $\widehat{X}^{C_{\ell_{1}}C_{\ell_{2}}}_{\perp}$, in $\widehat{X}^{C_{\ell_{1}}C_{\ell_{2}}}$, which has dimension $M^{2}-2(M-1)-1=(M-1)^{2}$.

For the $k$th interaction index, there are $\binom{N}{k}$ spaces of the form $\widehat{X}^{C_{\ell_{1}}\cdots C_{\ell_{k}}}$, where each of the $\binom{N}{k}$ sets $\{\ell_{1}\cdots\ell_{k}\}$ is a distinct subset of the index set $\{1\cdots N\}$. The space spanned by the columns of $X^{\mathsf{c}^{\ell}_{k}}$, $\widehat{X}^{\mathsf{c}^{\ell}_{k}}$, is composed of the subspace

$$\widehat{X}^{\mathsf{c}^{\ell}_{k}}_{-}=\{\widehat{X}^{C_{\ell_{1}}\cdots C_{\ell_{k-1}}},\widehat{X}^{C_{\ell_{1}}\cdots C_{\ell_{k-2}}C_{\ell_{k}}},\cdots\widehat{X}^{C_{\ell_{2}}\cdots C_{\ell_{k}}}\}$$

and its orthogonal complement in $\widehat{X}^{C_{\ell_{1}}\cdots C_{\ell_{k}}}$, $\widehat{X}^{\mathsf{c}^{\ell}_{k}}_{\perp}$. This orthogonal complement has dimension

$$M^{k}-\left(1+\binom{k}{1}(M-1)+\cdots+\binom{k}{k-1}(M-1)^{k-1}\right)=(M-1)^{k}. \tag{9}$$

Beginning with the constant vector, the orthogonalised form of $\mathbf{X}$, $\overline{\mathbf{X}}$, is derived by successively projecting the columns of the submatrix $X^{\mathsf{c}^{\ell}_{k}}$ onto the orthogonal complement $\widehat{X}^{\mathsf{c}^{\ell}_{k}}_{\perp}$. Then $\overline{\mathbf{X}}$ is constructed from the constant vector and the columns of the submatrices following the ordering described above. Consequently, the number of columns is

$$1+N(M-1)+\cdots+\binom{N}{k}(M-1)^{k}+\cdots+(M-1)^{N}=M^{N} \tag{10}$$

so that $\overline{\mathbf{X}}$ is a square matrix.
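Both counting results follow from the binomial theorem and can be spot-checked numerically (an illustrative sketch with arbitrarily chosen $M$ and $N$):

```python
from math import comb

M, N = 3, 4
# Raw design matrix: sum_k C(N,k) M^k = (1+M)^N columns, so (6) is
# underdetermined relative to the M^N cells.
raw = sum(comb(N, k) * M**k for k in range(N + 1))
# Orthogonalised matrix: sum_k C(N,k) (M-1)^k = M^N columns, as in (10).
orth = sum(comb(N, k) * (M - 1)**k for k in range(N + 1))

assert raw == (1 + M)**N and raw > M**N
assert orth == M**N
```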

III-B1 Constructing the $\xi$ Functions

In general, if the orthogonalised matrix is $\overline{\mathbf{X}}$ then the log-linear model becomes, from (6),

$$\mathbf{T}=\overline{\mathbf{X}}\boldsymbol{\beta} \tag{11}$$

where $\boldsymbol{\beta}$ is the modified vector of structural parameters. The vector $\mathbf{T}$ therefore exists in a space

$$\widehat{\overline{X}}=\widehat{X}^{0}+\sum_{k=1}^{N}\sum_{\ell=1}^{\binom{N}{k}}\widehat{X}^{\mathsf{c}^{\ell}_{k}}_{\perp} \tag{12}$$

where $\mathsf{c}^{\ell}_{k}$ is the $\ell$th subset of attributes having cardinality $k$ in the lexicographic ordering specified above.

Unfortunately, whereas the elements of $\boldsymbol{\alpha}$ in (6) are associated directly with the corresponding interaction terms in $\mathbf{X}$, those in $\boldsymbol{\beta}$ are not. For example, the elements of $\boldsymbol{\alpha}$ corresponding to the first order interactions describe the structure of the interaction graph, while those associated with the first order interactions in $\boldsymbol{\beta}$ do not, at least not directly. This is because the coefficients associated with the $\overline{X}^{\mathsf{c}^{\ell}_{2}}$ submatrix of $\overline{\mathbf{X}}$ in (11) are the coefficients of the basis functions spanning the subspace $\widehat{X}^{\mathsf{c}^{\ell}_{2}}_{\perp}$.

However, because the matrix $\overline{\mathbf{X}}$ is partitioned into the submatrices $\overline{X}^{\mathsf{c}^{\ell}_{k}}$, whose columns are the orthogonal basis vectors of the subspaces $\widehat{X}^{\mathsf{c}^{\ell}_{k}}_{\perp}$, the vector $\boldsymbol{\beta}$ can be partitioned into subvectors as

$$\boldsymbol{\beta}=(\beta_{0}\;\boldsymbol{\beta}_{11}\cdots\boldsymbol{\beta}_{1N}\cdots\;\cdots\boldsymbol{\beta}_{k1}\cdots\boldsymbol{\beta}_{k\binom{N}{k}}\cdots\;\cdots) \tag{13}$$

with $\boldsymbol{\beta}_{k\ell}=(\beta_{k\ell 1}\;\cdots\;\beta_{k\ell(M-1)^{k}})$. Then (11) becomes

$$\mathbf{T}=\overline{X}^{0}\beta_{0}+\sum_{k=1}^{N}\sum_{\ell=1}^{\binom{N}{k}}\overline{X}^{\mathsf{c}^{\ell}_{k}}\boldsymbol{\beta}_{k\ell}. \tag{14}$$

Now each term of the summation in (14) does refer to a specific interaction, namely, the interaction between the members of $\mathsf{c}^{\ell}_{k}$.

To connect with the $\xi$ functions of (4), recognise that a vector $\boldsymbol{\chi}^{\perp}_{k\ell}$ representing the function $\xi_{\mathsf{c}^{\ell}_{k}}:|\mathsf{c}^{\ell}_{k}|=k,\ \ell\in\{1\cdots\binom{N}{k}\}$ exists in the subspace spanned by the basis vectors forming the submatrix $\overline{X}^{\mathsf{c}^{\ell}_{k}}$ of $\overline{\mathbf{X}}$ in (11). Furthermore, the subvector $\boldsymbol{\beta}_{k\ell}$ is associated with $\xi_{\mathsf{c}^{\ell}_{k}}$. In other words, the function $\xi_{\mathsf{c}^{\ell}_{k}}$ is represented by the linear combination

$$\boldsymbol{\chi}^{\perp}_{k\ell}=\overline{X}^{\mathsf{c}^{\ell}_{k}}\boldsymbol{\beta}_{k\ell}. \tag{15}$$

Because the underlying subspaces are orthogonal, $\boldsymbol{\chi}^{\perp}_{k\ell}$ is orthogonal to the vectors representing the other $\xi$ functions.

Exploiting the fact that the columns of $\overline{\mathbf{X}}$ are orthogonal, the vector $\boldsymbol{\beta}$ is obtained by premultiplying (11) by $\overline{\mathbf{X}}^{\mathsf{T}}$ to give

$$\overline{\mathbf{X}}^{\mathsf{T}}\mathbf{T}=\overline{\mathbf{X}}^{\mathsf{T}}\overline{\mathbf{X}}\boldsymbol{\beta}=D_{\overline{\mathbf{X}}^{\mathsf{T}}}\boldsymbol{\beta}$$

where $D_{\overline{\mathbf{X}}^{\mathsf{T}}}$ is a diagonal matrix containing the squared magnitudes of the basis vectors and is readily inverted. Then

$$\boldsymbol{\beta}=D^{-1}_{\overline{\mathbf{X}}^{\mathsf{T}}}\overline{\mathbf{X}}^{\mathsf{T}}\mathbf{T} \tag{16}$$

and

$$\boldsymbol{\beta}_{k\ell}=D^{-1}_{\mathsf{c}^{\ell}_{k}}\overline{X}^{{\mathsf{c}^{\ell}_{k}}^{\mathsf{T}}}\mathbf{T} \tag{17}$$

where $D_{\mathsf{c}^{\ell}_{k}}$ is a diagonal matrix containing the squared magnitudes of the basis vectors in the subspace $\widehat{X}^{\mathsf{c}^{\ell}_{k}}_{\perp}$. From (15),

$$\boldsymbol{\chi}^{\perp}_{k\ell}=\overline{X}^{\mathsf{c}^{\ell}_{k}}D^{-1}_{\mathsf{c}^{\ell}_{k}}\overline{X}^{{\mathsf{c}^{\ell}_{k}}^{\mathsf{T}}}\mathbf{T} \tag{18}$$

so that

$$\begin{aligned}|\boldsymbol{\chi}^{\perp}_{k\ell}|^{2}&=\mathbf{T}^{\mathsf{T}}\overline{X}^{\mathsf{c}^{\ell}_{k}}D^{-1}_{\mathsf{c}^{\ell}_{k}}\overline{X}^{{\mathsf{c}^{\ell}_{k}}^{\mathsf{T}}}\overline{X}^{\mathsf{c}^{\ell}_{k}}D^{-1}_{\mathsf{c}^{\ell}_{k}}\overline{X}^{{\mathsf{c}^{\ell}_{k}}^{\mathsf{T}}}\mathbf{T}\\ &=\mathbf{T}^{\mathsf{T}}\overline{X}^{\mathsf{c}^{\ell}_{k}}D^{-1}_{\mathsf{c}^{\ell}_{k}}\overline{X}^{{\mathsf{c}^{\ell}_{k}}^{\mathsf{T}}}\mathbf{T}\end{aligned} \tag{19}$$

because of orthogonality. This shows that the matrix in (18) is self adjoint and idempotent, so that it is in fact a projector onto the subspace $\widehat{X}^{\mathsf{c}^{\ell}_{k}}_{\perp}$. Consequently, the magnitude of the function $\xi_{\mathsf{c}^{\ell}_{k}}$ is the magnitude of the projection of $\mathbf{T}$ onto the subspace $\widehat{X}^{\mathsf{c}^{\ell}_{k}}_{\perp}$, i.e. the subspace in which the vector representation of the function $\xi_{\mathsf{c}^{\ell}_{k}}$ exists. In a sense it describes the magnitude of the contribution the population data in the table makes to the interaction represented by $\xi_{\mathsf{c}^{\ell}_{k}}$.
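The projector property can be illustrated generically: for any matrix $B$ with orthogonal (not necessarily orthonormal) columns, $BD^{-1}B^{\mathsf{T}}$, with $D$ the diagonal matrix of squared column norms, is self adjoint and idempotent. A minimal numpy sketch (the matrix here is random, not an actual design submatrix):

```python
import numpy as np

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((8, 3)))   # orthonormal columns
B = Q * np.array([2.0, 0.5, 3.0])                  # orthogonal, unequal norms

D_inv = np.diag(1.0 / np.sum(B * B, axis=0))       # inverse of squared norms
P = B @ D_inv @ B.T                                # as in (18)

assert np.allclose(P, P.T)                         # self-adjoint
assert np.allclose(P @ P, P)                       # idempotent: a projector

# |P t|^2 = t^T P t, matching the reduction from (19) to a single factor.
t = rng.standard_normal(8)
assert np.isclose(np.linalg.norm(P @ t)**2, t @ P @ t)
```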

III-C The Structure of the Design Matrix

To explore the overall form of the design matrix $\overline{X}$ we return to the construction of the original form $X$ and the discussion following (7). The columns of the submatrix $X^{C_{i_{k}},\cdots C_{i_{1}}}$ are numbered left to right by the radix $M$ number $\nu=(a_{k},\cdots a_{1})_{M}$, i.e. in the first column of the submatrix $a_{k}=0\cdots a_{1}=0$, whereas in the second column $a_{k}=0\cdots a_{2}=0$ and $a_{1}=1$. There are therefore $M^{k}$ columns in the submatrix, where each is generated from the preceding column by counting in radix $M$.

Lemma III.1.

Column $\nu=a_{k}M^{k-1}+\cdots+a_{2}M+a_{1}$ of the submatrix $X^{C_{i_{k}}\cdots C_{i_{1}}}$ has the form

$$\begin{array}{ll}\mathtt{X}^{i_{k}\cdots i_{1}}(a_{k}\cdots a_{1})=\\ \left[\underbrace{\overbrace{0\cdots 0}^{a_{k}M^{i_{k}}}\underbrace{\overbrace{0\cdots 0}^{a_{k-1}M^{i_{k-1}}}\cdots\underbrace{\overbrace{0\cdots 0}^{a_{1}M^{i_{1}}}\overbrace{1\cdots 1}^{M^{i_{1}}}\overbrace{0\cdots 0}^{(M-1-a_{1})M^{i_{1}}}}_{\times M^{i_{2}-i_{1}-1}}\cdots\overbrace{0\cdots 0}^{(M-1-a_{k-1})M^{i_{k-1}}}}_{\times M^{i_{k}-i_{k-1}-1}}\overbrace{0\cdots 0}^{(M-1-a_{k})M^{i_{k}}}}_{\times M^{N-i_{k}-1}}\right]^{\mathsf{T}}.\end{array} \tag{20}$$
Proof.

The column is initially all zeros, the table index is $\boldsymbol{\mathfrak{i}}=\langle 0\cdots 0\rangle$ and the index of the table vector $\mathcal{T}$ is $r=(0\cdots 0)_{M}=0$. Now begin incrementing the table index in a counting operation. A count of $a_{1}M^{i_{1}}$ results in $\mathfrak{i}_{i_{1}}=a_{1}$, a count of $a_{2}M^{i_{2}}$ sets $\mathfrak{i}_{i_{2}}=a_{2}$ and so on until a final count of $a_{k}M^{i_{k}}$ will set $\mathfrak{i}_{i_{k}}=a_{k}$. At this point the table index $\boldsymbol{\mathfrak{i}}$ points to the first cell where the attributes in $\mathsf{c}^{\ell}_{k}$ have the values given by $\langle a_{k}\cdots a_{1}\rangle$, and a one is entered in row $r=a_{k}M^{i_{k}}+\cdots+a_{1}M^{i_{1}}$ (remembering that the row numbering starts at zero) of column $\nu$. There are then a total of $r$ zeros preceding this one in the column.

A further count of $M^{i_{1}}$ is required before $\mathfrak{i}_{i_{1}}$ goes to $(a_{1}+1)_{M}$, so that there are $M^{i_{1}}$ consecutive ones in the column. This is followed by a sequence of $(M-1-a_{1})M^{i_{1}}$ zeros before all of the $(\mathfrak{i}_{i_{1}}\cdots\mathfrak{i}_{0})$ indices are reset to zero. This whole first level pattern is then repeated until $\mathfrak{i}_{i_{2}}$ goes to $(a_{2}+1)_{M}$, i.e. the pattern of $a_{1}M^{i_{1}}$ zeros, $M^{i_{1}}$ ones and $(M-1-a_{1})M^{i_{1}}$ zeros occurs $M^{i_{2}-i_{1}-1}$ times.

This set of repeated first level patterns extends the $a_{2}M^{i_{2}}$ zeros from the initial count, and this second level pattern is completed by the sequence of $(M-1-a_{2})M^{i_{2}}$ zeros needed to set all of the $(\mathfrak{i}_{i_{2}}\cdots\mathfrak{i}_{0})$ indices to zero.

Continuing the count keeps repeating the second level pattern until $\mathfrak{i}_{i_{3}}=a_{3}$, so that there are $M^{i_{3}-i_{2}-1}$ repetitions of the second level pattern. A third level pattern is then completed by the sequence of $(M-1-a_{3})M^{i_{3}}$ zeros needed to set all of the $(\mathfrak{i}_{i_{3}}\cdots\mathfrak{i}_{0})$ indices to zero. Finally there is a $k$th level pattern with $M^{N-i_{k}-1}$ repetitions of the $(k-1)$th level pattern until the count is complete. ∎
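The run structure established by the lemma can be verified by brute force on a small table. The sketch below (our own illustration, with arbitrarily chosen non-contiguous axes) builds a column by the counting argument of the proof and checks the position of the first one and the length of the first run:

```python
import itertools
import numpy as np

M, N = 3, 4
axes = (3, 1)                       # i_2 = 3, i_1 = 1 (non-contiguous)
levels = (2, 1)                     # a_2 = 2, a_1 = 1

# A one wherever the subset indices equal the chosen levels.
col = np.zeros(M**N, dtype=int)
for idx in itertools.product(range(M), repeat=N):   # idx = (i_3, ..., i_0)
    r = sum(i * M**j for j, i in enumerate(reversed(idx)))  # eq. (7)
    if all(idx[N - 1 - ax] == a for ax, a in zip(axes, levels)):
        col[r] = 1

# First one appears at row r = sum_j a_j M^{i_j}, as the proof argues.
first_one = int(np.argmax(col))
assert first_one == sum(a * M**ax for ax, a in zip(axes, levels))

# The first run of ones has length M^{i_1} and then stops.
i1 = min(axes)
assert col[first_one:first_one + M**i1].all()
assert col[first_one + M**i1] == 0
```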

Equation (20) can be expressed in the alternative form

$$\begin{array}{ll}\mathtt{X}^{i_{k}\cdots i_{1}}(a_{k}\cdots a_{1})=\frac{1}{M^{k}}\,\cdot\\ \left[\underbrace{\cdots\underbrace{\overbrace{0}^{a_{2}M^{i_{1}}}\overbrace{0}^{M^{i_{1}}}\overbrace{0}^{(M-1-a_{2})M^{i_{1}}}}_{\times a_{2}M^{i_{2}-i_{1}-1}}\underbrace{\overbrace{0}^{a_{1}M^{i_{1}}}\overbrace{1}^{M^{i_{1}}}\overbrace{0}^{(M-1-a_{1})M^{i_{1}}}}_{\times M^{i_{2}-i_{1}-1}}\underbrace{\overbrace{0}^{a_{2}M^{i_{1}}}\overbrace{0}^{M^{i_{1}}}\overbrace{0}^{(M-1-a_{2})M^{i_{1}}}}_{\times(M-1-a_{2})M^{i_{2}-i_{1}-1}}\cdots}_{\times M^{N-i_{k}-1}}\right]^{\mathsf{T}}.\end{array} \tag{21}$$

Now consider the orthogonalisation process. With $\mathtt{X}^{i}$ denoting a column of the submatrix $X^{C_{i}}$, the first step obtains the projection of the column vector onto the orthogonal complement of the normalised constant vector $\overline{X}^{0}$ as

\mathtt{X}^{i}_{\perp}=\mathtt{X}^{i}-(\overline{X}^{0}\cdot\mathtt{X}^{i})\overline{X}^{0}. (22)
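Concretely, (22) amounts to subtracting the component of the column along the normalised constant vector. A minimal numpy sketch (the function name is ours, for illustration):

```python
import numpy as np

def project_out_uniform(x):
    # Eq. (22): remove the component along the normalised constant
    # vector, leaving the projection onto its orthogonal complement.
    x = np.asarray(x, dtype=float)
    u = np.ones_like(x) / np.sqrt(x.size)
    return x - (u @ x) * u
```

Applied to a scaled indicator column like (21), the ones become large positive entries and the zeros become equal negative entries, reproducing the sign pattern of (23), and the result always sums to zero.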

Applying (22) to the form (21) gives

\begin{array}[]{ll}\mathtt{X}^{i_{k}\cdots i_{1}}_{\perp}(a_{k}\cdots a_{1})=\frac{1}{M^{k}}\>\huge{\cdot}\\ \left[\begin{array}[]{ll}\underbrace{\>\cdots\>\underbrace{\overbrace{-\!1}^{a_{2}M^{i_{1}}}\overbrace{\!-\!1}^{M^{i_{1}}}\overbrace{\;\;\;\;\;-\!1\;\;\;\;\;}^{(M\!-\!1\!-\!a_{2})M^{i_{1}}}}_{\times a_{2}M^{i_{2}-i_{1}-1}}\underbrace{\overbrace{-\!1}^{a_{1}M^{i_{1}}}\overbrace{M^{k}\!\!-\!\!1}^{M^{i_{1}}}\overbrace{\;\;\;\;\;-\!1\;\;\;\;\;}^{(M\!-\!1\!-\!a_{1})M^{i_{1}}}}_{\times M^{i_{2}-i_{1}-1}}\underbrace{\overbrace{-\!1}^{a_{2}M^{i_{1}}}\overbrace{\!-\!1}^{M^{i_{1}}}\overbrace{\;\;\;\;\;-\!1\;\;\;\;\;}^{(M\!-\!1\!-\!a_{2})M^{i_{1}}}}_{\times(M-1-a_{2})M^{i_{2}-i_{1}-1}}\cdots}_{\times M^{N-i_{k}-1}}\end{array}\right]^{\mathsf{T}}\!\!\!\!.\end{array} (23)

Note that because (22) operates on rows only and the projection is onto the uniform vector, the hierarchical sequence of patterns in (21) is preserved in (23).

Using the case $k=2$ with $a_{1}=a_{2}=0$ to illustrate the process, the orthogonalisation represented by (22) to (23) results in $\mathtt{X}^{i_{2}i_{1}}_{\perp}$. Then, the projection of this onto the orthogonal complement of the subspace formed by $\overline{\mathtt{X}}^{i_{2}}_{\perp}$ and $X^{0}$ is obtained by again applying (22), resulting in $\mathtt{X}^{i_{2}i_{1}}_{\perp\perp}$. Finally, projecting this onto the orthogonal complement of the subspace formed by $\overline{\mathtt{X}}^{i_{1}}_{\perp}$, $\overline{\mathtt{X}}^{i_{2}}_{\perp}$ and $X^{0}$ yields

\begin{array}[]{ll}\mathtt{X}^{i_{2}i_{1}}_{\perp\perp\perp}(0,0)=\frac{1}{M^{2}}\boldsymbol{\cdot}\left[\begin{array}[]{ll}\underbrace{\underbrace{\overbrace{(M\!\!-\!\!1)^{2}}^{M^{i_{1}}}\overbrace{-\!(M\!\!-\!\!1)}^{(M\!-\!1)M^{i_{1}}}}_{\times M^{i_{2}-i_{1}-1}}\underbrace{\overbrace{-\!(M\!\!-\!\!1)}^{M^{i_{1}}}\overbrace{1\cdots 1}^{(M\!-\!1)M^{i_{1}}}}_{\times(M\!-\!1)M^{i_{2}-i_{1}-1}}}_{\times M^{(N-i_{2}-1)}}\end{array}\right]^{\mathsf{T}}\!\!.\end{array} (24)

The form of (24) and numerical calculations suggest the $k$ attribute generalisation

\begin{array}[]{ll}\mathtt{X}^{i_{k}\cdots i_{2}i_{1}}_{\perp\cdots\perp}(0\cdots 0)=\frac{1}{M^{k}}\boldsymbol{\cdot}\\ \\ \!\!\!\!\!\!\!\!\left[\begin{array}[]{ll}\underbrace{\underbrace{\overbrace{(M\!\!-\!\!1)^{k}}^{\times M^{i_{1}}}\;\;\overbrace{-\!(M\!\!-\!\!1)^{k-1}}^{\times(M\!-\!1)M^{i_{1}}}\;\;}_{\times M^{i_{2}-i_{1}-1}}\underbrace{\overbrace{-\!(M\!\!-\!\!1)^{k-1}}^{\times M^{i_{1}}}\;\;\overbrace{(M\!\!-\!\!1)^{k-2}}^{\times(M\!-\!1)M^{i_{1}}}}_{\times(M\!-\!1)M^{i_{2}-i_{1}-1}}}_{\times M^{(i_{3}-i_{2}-1)}}\\ \underbrace{\qquad\underbrace{\underbrace{\overbrace{-(M\!\!-\!\!1)^{k-1}}^{\times M^{i_{1}}}\;\;\overbrace{\!(M\!\!-\!\!1)^{k-2}}^{\times(M\!-\!1)M^{i_{1}}}\;\;}_{\times M^{i_{2}-i_{1}-1}}\underbrace{\overbrace{\!(M\!\!-\!\!1)^{k-2}}^{\times M^{i_{1}}}\;\;\overbrace{-(M\!\!-\!\!1)^{k-3}}^{\times(M\!-\!1)M^{i_{1}}}}_{\times(M\!-\!1)M^{i_{2}-i_{1}-1}}}_{\times(M\!-\!1)M^{(i_{3}-i_{2}-1)}}}_{\times M^{(i_{4}-i_{3}-1)}}\\ \qquad\qquad\underbrace{\underbrace{\overbrace{-(M\!\!-\!\!1)^{k-1}}^{\times M^{i_{1}}}\;\;\overbrace{\!(M\!\!-\!\!1)^{k-2}}^{\times(M\!-\!1)M^{i_{1}}}\;\;}_{\times M^{i_{2}-i_{1}-1}}\underbrace{\overbrace{\!(M\!\!-\!\!1)^{k-2}}^{\times M^{i_{1}}}\;\;\overbrace{-(M\!\!-\!\!1)^{k-3}}^{\times(M\!-\!1)M^{i_{1}}}}_{\times(M\!-\!1)M^{i_{2}-i_{1}-1}}}_{\times M^{(i_{3}-i_{2}-1)}}\\ \underbrace{\qquad\qquad\underbrace{\qquad\underbrace{\underbrace{\overbrace{(M\!\!-\!\!1)^{k-2}}^{\times M^{i_{1}}}\;\;\overbrace{\!-(M\!\!-\!\!1)^{k-3}}^{\times(M\!-\!1)M^{i_{1}}}\;\;}_{\times M^{i_{2}-i_{1}-1}}\underbrace{\overbrace{\!-(M\!\!-\!\!1)^{k-3}}^{\times M^{i_{1}}}\;\;\overbrace{(M\!\!-\!\!1)^{k-4}}^{\times(M\!-\!1)M^{i_{1}}}}_{\times(M\!-\!1)M^{i_{2}-i_{1}-1}}}_{\times(M\!-\!1)M^{(i_{3}-i_{2}-1)}}}_{\times(M-1)M^{(i_{4}-i_{3}-1)}}\;\;\mathbf{\cdots}}_{\times M^{N-i_{k}-1}}\end{array}\right]^{\mathsf{T}}.\end{array} (25)
Lemma III.2.

Column vector $\mathtt{X}^{i_{k}\cdots i_{2}i_{1}}_{\perp\cdots\perp}(0\cdots 0)$ of the submatrix ${\overline{X}}^{\boldsymbol{\mathsf{c}}_{k}^{\ell}}$ is given by (25).

Proof.

Note that (25) is composed of basic units, each of length $M^{i_{1}}$, with each individual unit having identical components. These are grouped into a sequence of first level modules, each containing $M^{i_{1}+1}$ components, such that each of these modules sums to zero. In turn these first level modules are grouped into second level modules, each containing $M^{i_{2}+1}$ components, and so on. This hierarchical construction is continued, eventually resulting in $k$th level modules each containing $M^{i_{k}+1}$ components. There are then $M^{N-i_{k}-1}$ of these $k$th level modules, containing a total of $M^{N}$ components. Each module is composed of two submodules $\mathcal{S_{M}}_{1}$ and $\mathcal{S_{M}}_{2}$ such that

\mathcal{S_{M}}_{1}=-(M-1)\mathcal{S_{M}}_{2}. (26)

Orthogonality of (25) over the $\binom{N}{k}$ vectors $\mathbf{i}^{\mathbf{s}_{\ell}}_{k}$ can be seen by considering two values of $\ell$, say $\mathbf{i}^{\mathbf{s}_{\ell^{\prime}}}_{k}$ and $\mathbf{i}^{\mathbf{s}_{\ell^{\prime\prime}}}_{k}$. If $i_{1}^{\prime}<i_{1}^{\prime\prime}$ the length of the first level modules in vector $\mathtt{X}^{\prime\prime}$ is a multiple of the length of the first level modules in $\mathtt{X}^{\prime}$, so that the components of $\mathtt{X}^{\prime\prime}$ are uniform over every first level module of $\mathtt{X}^{\prime}$. Consequently, in the scalar product of the two vectors, every first level module in the vector $\mathtt{X}^{\prime}$ sums to zero, so that the vectors are orthogonal.

If $i_{1}^{\prime}=i_{1}^{\prime\prime}$, there must be some $j$ where $i_{j}^{\prime}\neq i_{j}^{\prime\prime}$. Suppose $i_{j}^{\prime}=i_{j}^{\prime\prime}-1$. For each $(j\!-\!1)$th module in the scalar product of $\mathtt{X}^{\prime}$ and $\mathtt{X}^{\prime\prime}$, the submodule $\mathcal{S_{M}}_{1}^{\prime}$ is multiplied component by component by the submodule $\mathcal{S_{M}}_{1}^{\prime\prime}$. However each of the $M-1$ submodules $\mathcal{S_{M}}_{2}^{\prime}$ is also multiplied by $\mathcal{S_{M}}_{1}^{\prime\prime}$. From (26),

\mathcal{S_{M}}_{1}^{\prime}\cdot\mathcal{S_{M}}_{1}^{\prime\prime}=-(M-1)\mathcal{S_{M}}_{2}^{\prime}\cdot\mathcal{S_{M}}_{1}^{\prime\prime}

but there are $M-1$ of the $\mathcal{S_{M}}_{2}^{\prime}\cdot\mathcal{S_{M}}_{1}^{\prime\prime}$ product components, so that the scalar product contribution of each $(j\!-\!1)$th module is zero and the vectors are orthogonal. Clearly this is also true when $i_{j}^{\prime\prime}$ is any multiple of $i_{j}^{\prime}$. In fact this argument is independent of $k$, so that all of the vectors $\mathtt{X}^{i_{k}\cdots i_{2}i_{1}}_{\perp\cdots\perp}(0\cdots 0)|_{k=1:N-1}$, including the main effects, are mutually orthogonal.

Because the orthogonalisation process of Section III-B begins with the leftmost column vectors of each submatrix, the remaining column vectors will be orthogonal to (25), so that (25) is the orthogonalised form of $\mathtt{X}^{i_{k}\cdots i_{1}}(a_{k}\cdots a_{1})$. ∎
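The mutual orthogonality asserted by Lemma III.2 is easy to confirm numerically for a small case: orthogonalise the indicator columns against the uniform vector and each other in the order of Section III-B, then inspect the Gram matrix. The sketch below (our construction, $N=4$, $M=2$, main effects followed by pairwise interactions) is for illustration only.

```python
import numpy as np

N, M = 4, 2
rows = np.arange(M**N)

def indicator(idx, vals):
    # Column X^{i_k...i_1}(a_k...a_1): 1 where digit i_j of the row index is a_j.
    col = np.ones(M**N)
    for i, a in zip(idx, vals):
        col = col * ((rows // M**i) % M == a)
    return col

# Orthogonalise each column in turn against the uniform vector and all
# previously processed columns, mirroring the order of Section III-B.
basis = [np.ones(M**N) / np.sqrt(M**N)]
for idx in [(0,), (1,), (2,), (3,),
            (1, 0), (2, 0), (2, 1), (3, 0), (3, 1), (3, 2)]:
    v = indicator(idx, (0,) * len(idx))
    for b in basis:
        v = v - (b @ v) * b
    basis.append(v / np.linalg.norm(v))

gram = np.array(basis) @ np.array(basis).T   # should be the identity
```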

For $\mathbf{a}_{k}=\langle 0,\cdots 1\rangle$ the column vector $\mathtt{X}^{i_{k}\cdots i_{1}}_{\perp\cdots\perp}(0,\cdots 1)$ lies in the orthogonal complement of the uniform vector and of $\mathtt{X}^{i_{k}\cdots i_{1}}_{\perp\cdots\perp}(0,\cdots 0)$. Explicit derivation of the $k=2$ case and numerical calculations suggest that the $j$th second level module has the form

\begin{array}[]{ll}\underbrace{\underbrace{\overbrace{0}^{\times M^{i_{1}}}\;\;\overbrace{(M\!\!-\!\!1)^{k\!-\!j}(M\!\!-\!\!2)}^{\times M^{i_{1}}}\;\overbrace{\;-(M\!\!-\!\!1)^{k\!-\!j}\;\;}^{\times(M\!-\!2)M^{i_{1}}}}_{\times M^{i_{2}-i_{1}-1}}\underbrace{\overbrace{0}^{\times M^{i_{1}}}\overbrace{-(M\!\!-\!\!1)^{k\!-\!j\!-\!1}(M\!\!-\!\!2)}^{\times M^{i_{1}}}\;\overbrace{(M\!\!-\!\!1)^{k\!-\!j\!-\!1}}^{\times(M\!-\!2)M^{i_{1}}}}_{\times(M\!-\!1)M^{i_{2}-i_{1}-1}}}_{\times M^{(i_{3}-i_{2}-1)}}.\end{array} (27)

Note that this has the same modular structure as (25) and again the first level modules sum to zero. Indeed the structures are sufficiently similar that the above orthogonality arguments can be invoked to show that (27) is orthogonal to (25) for all $k$ as well as to the corresponding $\mathtt{X}^{i_{k}^{\prime}\cdots i_{2}i_{1}}_{\perp\cdots\perp}(0\cdots 1)$ for $k^{\prime}\neq k$.

Because the origin of this critical modularity is (23), the orthogonalisation process applied to all of the remaining independent column vectors of $X^{\boldsymbol{\mathsf{c}}_{k}^{\ell}}$ will result in the orthogonalised submatrix ${\overline{X}}^{\boldsymbol{\mathsf{c}}_{k}^{\ell}}$ having the same modular structure as (25) and (27).

III-D Conditional Subtables and their Geometric Means

Each term of the log linear expansion (4), $\xi_{\boldsymbol{\mathsf{c}_{k}^{\ell}}}$, involves the subset of $k$ attributes $\mathsf{c}_{k}^{\ell}$, which raises the question of how to handle the remaining $N\!-\!k$. While marginalisation is the common approach, the nonlinear $\log$ transformation makes the result difficult to interpret. Instead we adopt the alternative approach in which the members of $\mathsf{c}_{k}^{\ell}$ determine a subtable conditioned on the remaining attributes.

Recall from Section III-B that the $k$ attributes associated with the submatrix ${\overline{X}}^{{\boldsymbol{\mathsf{c}}_{k}^{\ell}}}$ are designated by $\mathbf{i}^{s_{\ell}}_{k}$, with the remaining $N\!-\!k$ attributes designated by the vector $\mathbf{i}^{g_{\ell}}_{k}$. Via (7), the column vectors of the submatrix have exactly the same indexing structure as the contingency table $\mathbf{T}$, so, from (8), we can interpret an index vector of the form $\boldsymbol{\mathfrak{i}}$ as indexing a subtable designated by $\mathbf{g}^{k,\ell}$. This leads to:

Definition III.1.

Let the attribute vector $\mathbf{g}^{k,\ell}$ be fixed whereas the attribute vector $\mathbf{a}^{k,\ell}$ is allowed to range over the $M^{k}$ combinations of the attribute levels. Then the attribute index vector $\mathbf{i}^{g_{k,\ell}}=\langle i^{g}_{\ell_{k}}\cdots i^{g}_{\ell_{1}}\rangle$ specifies the conditioning attributes of the conditional subtable $\mathbf{T}_{\mathbf{a}^{k,\ell}}|_{\mathbf{g}^{k,\ell}}$ with indices given, from (8), by

\boldsymbol{\mathfrak{i}}^{s_{k,\ell}}_{j}=\mathbf{I}^{s_{\ell}}_{k}\mathbf{a}^{k,\ell}+\mathbf{I}^{g_{\ell}}_{N-k}\mathbf{g}^{k,\ell}_{j} (28)

where the subscript $j$ indicates that the index set is associated with the $j$th combination of conditioning attributes.
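For illustration, the rows of the table vector selected by a conditional subtable can be enumerated directly from the base-$M$ digits of the table index, as in (28). A short numpy sketch (the function name is ours):

```python
import numpy as np

def subtable_indices(N, M, cond_attrs, cond_vals):
    # Rows of the table vector belonging to the conditional subtable T_a|g:
    # each conditioning attribute (a digit position of the row index) is
    # pinned to its value, while the remaining k attributes range over
    # all M**k level combinations.
    rows = np.arange(M**N)
    mask = np.ones(M**N, dtype=bool)
    for i, g in zip(cond_attrs, cond_vals):
        mask &= ((rows // M**i) % M == g)
    return rows[mask]
```

For $N=3$, $M=2$, conditioning attribute 2 fixed at level 1, this selects the $M^{k}=4$ rows $\{4,5,6,7\}$, one for each combination of the two free attributes.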

Consequently, it is the conditioning attributes which are associated with the bottom level modules in the modular hierarchy of (25) and (27), with (21) ensuring that the elements in each of the bottom level modules are equal. Because of the row by row operation of the orthogonalisation process, the equality of the elements within these modules is preserved even if the values of the elements end up differing from module to module. This is evident from (25) and (27). Furthermore each bottom level module represents one particular combination of attribute values.

We then have the following Lemma.

Lemma III.3.

Let $\overline{\mathfrak{X}}$ be the orthogonalised $M^{k}\!\times\!M^{k}$ design matrix and let $\boldsymbol{\Gamma}^{\ell}_{k}$ be the table of log transformed geometric means, over the conditioning attributes $\mathbf{g}^{k,\ell}$, of corresponding entries in the $M^{N-k}$ conditional subtables $\mathbf{T}\!_{\mathbf{a}^{k,\ell}}|_{\mathbf{g}^{k,\ell}}$. Then the magnitude of the projection of $\boldsymbol{\Gamma}^{\ell}_{k}$ onto the subspace $\widehat{\mathfrak{X}}^{\boldsymbol{\mathsf{c}}^{\ell}_{k}}_{\perp}$, $\mathbf{\mathcal{X}}^{\perp}_{k\ell}$, is given by

|\mathbf{\mathcal{X}}^{\perp}_{k\ell}|=M^{-\frac{N-k}{2}}|\boldsymbol{\chi}^{\perp}_{k\ell}| (29)

where, from (19), $|\boldsymbol{\chi}^{\perp}_{k\ell}|$ is the magnitude of the projection of the originating table vector $\mathbf{T}$ onto the subspace $\widehat{X}^{\boldsymbol{\mathsf{c}}^{\ell}_{k}}_{\perp}$.

Proof.

It is clear from (8) and the construction of the basis vectors that the $1$'s in (20) constitute an indicator vector for the conditioning indices

\boldsymbol{\mathfrak{i}}^{g_{k,\ell}}_{j}=\mathbf{I}^{s_{\ell}}_{k}\mathbf{a}^{k,\ell}_{j}+\mathbf{I}^{g_{\ell}}_{N-k}\mathbf{g}^{k,\ell} (30)

associated with the $j$th combination of conditional attribute values as $\mathbf{g}^{k,\ell}$ ranges over the $M^{N-k}$ combinations of the conditioning attributes. The scalar product of the $\nu$th column of ${\overline{X}}^{{\boldsymbol{\mathsf{c}}_{k}^{\ell}}}$ and the table vector $\mathbf{T}$ is composed of the sums of the $M^{k}$ individual scalar products of the subtables $\mathbf{T}(\boldsymbol{\mathfrak{i}}^{g_{k,\ell}}_{j})$ and the corresponding $\overline{X}^{\boldsymbol{\mathsf{c}}^{\ell}_{k}}_{\nu}(\boldsymbol{\mathfrak{i}}^{g_{k,\ell}}_{j})$, $j\in\{1\cdots M^{k}\}$. However, because all of the elements of the latter are equal, each of the individual scalar products is simply the sum of the values in the particular subtable $\mathbf{T}(\boldsymbol{\mathfrak{i}}^{g_{k,\ell}}_{j})$ multiplied by one of the elements of $\overline{X}^{\boldsymbol{\mathsf{c}}^{\ell}_{k}}_{\nu}(\boldsymbol{\mathfrak{i}}^{g_{k,\ell}}_{j})$.

The $M^{k}$ values of $\mathbf{a}^{k,\ell}$ then generate the elements of an $M^{k}$ dimensional column vector $\overline{\mathfrak{X}}^{\boldsymbol{\mathsf{c}}^{\ell}_{k}}_{\nu}$ by sampling the column vector $\overline{X}^{\boldsymbol{\mathsf{c}}^{\ell}_{k}}_{\nu}$ at the entry indexed by (30) for every $\mathbf{a}^{k,\ell}_{j}$. This structure is shared by all $(M\!-\!1)^{k}$ columns of the submatrix $\overline{X}^{\boldsymbol{\mathsf{c}}^{\ell}_{k}}$, so the resultant vectors can be collected into the submatrix $\overline{\mathfrak{X}}^{\boldsymbol{\mathsf{c}}^{\ell}_{k}}$. In fact it is clear from the proof of Lemma III.1 and from (21) that the sampling is performed by eliminating the repetition as the row index count proceeds from $i_{j}$ to $i_{j+1}$, which requires the indices $i_{1}=0$ and $i_{j+1}=i_{j}+1$ in the reduced matrix, ensuring $i_{k}=k-1$.

With $\mathfrak{D}_{\boldsymbol{\mathsf{c}}_{k}^{\ell}}=\overline{\mathfrak{X}}^{{\boldsymbol{\mathsf{c}}_{k}^{\ell}}^{\mathsf{T}}}\overline{\mathfrak{X}}^{\boldsymbol{\mathsf{c}}_{k}^{\ell}}$, this enables the magnitude of the function $\xi_{\boldsymbol{\mathsf{c}_{k}^{\ell}}}$ to be expressed, using (19), as

|\boldsymbol{\chi}^{\perp}_{k\ell}|=M^{-\frac{(N-k)}{2}}\!\left|\mathfrak{D}^{-\frac{1}{2}}_{\boldsymbol{\mathsf{c}}_{k}^{\ell}}{\overline{\mathfrak{X}}}^{{\boldsymbol{\mathsf{c}}_{k}^{\ell}}^{\mathsf{T}}}\sum_{\mathbf{g}^{k,\ell}}\mathbf{T}\!_{\mathbf{a}^{k,\ell}}|_{\mathbf{g}^{k,\ell}}\right|=M^{\frac{(N-k)}{2}}\left|\mathfrak{D}^{-\frac{1}{2}}_{\boldsymbol{\mathsf{c}}_{k}^{\ell}}{\overline{\mathfrak{X}}^{\boldsymbol{\mathsf{c}}_{k}^{\ell}}}^{\mathsf{T}}\frac{\sum_{\mathbf{g}^{k,\ell}}\mathbf{T}\!_{\mathbf{a}^{k,\ell}}|_{\mathbf{g}^{k,\ell}}}{M^{(N-k)}}\right|. (31)

Inverting the $\log$ transform, the summation term in (31) becomes

\mathcal{T}_{k}^{\ell}=\left(\prod_{\mathbf{g}^{k,\ell}}\mathcal{T}_{\mathbf{a}^{k,\ell}}|_{\mathbf{g}^{k,\ell}}\right)^{\frac{1}{M^{(N-k)}}} (32)

which is the geometric mean of the conditional subtables $\mathcal{T}_{\mathbf{a}^{k,\ell}}|_{\mathbf{g}^{k,\ell}}$ over the conditioning attributes $\mathbf{g}^{k,\ell}$. ∎
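Relation (29) can be checked numerically for a small case. The sketch below assumes $M=2$, $N=3$, $k=1$, where the orthogonalised main-effect column reduces to a $\pm 1$ contrast, and compares the projection of the full log table with the projection of the log geometric mean over the conditioning attributes:

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, k = 2, 3, 1
T = rng.integers(1, 50, size=M**N).astype(float)    # strictly positive counts
L = np.log(T)

rows = np.arange(M**N)
a0 = rows % M                                       # level of attribute 0

# Orthonormal main-effect contrast for attribute 0 in the full M**N space.
v = np.where(a0 == 0, 1.0, -1.0) / np.sqrt(M**N)
chi = abs(v @ L)                                    # |chi| of (19)

# Log geometric mean over the M**(N-k) conditioning combinations.
Gamma = np.array([L[a0 == a].mean() for a in range(M)])
w = np.array([1.0, -1.0]) / np.sqrt(M)
proj_gamma = abs(w @ Gamma)                         # |X| of (29)
```

The two magnitudes agree up to the factor $M^{-(N-k)/2}$, as (29) asserts.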

This leads to our main theoretical result:

Theorem III.1.

Let $\overline{\mathfrak{X}}_{0}$ be the $M^{k_{0}}\times M^{k_{0}}$ design matrix and $\boldsymbol{\Gamma}^{\ell}_{k_{0}}$ be the log geometric mean of the conditional subtables $\mathbf{T}\!_{\mathbf{a}^{k_{0},\ell}}|_{\mathbf{g}^{k_{0},\ell}}$ over the conditioning attributes $\mathbf{g}^{k_{0},\ell}$. Then the magnitude of the projection of $\boldsymbol{\Gamma}^{\ell}_{k_{0}}$ onto the subspace ${\widehat{\mathfrak{X}}^{\boldsymbol{\mathsf{c}}_{k}^{\ell}}_{\perp}}$ for $k\leq k_{0}$ is given by

|\boldsymbol{\chi}^{\perp}_{k\ell}|=M^{\frac{(N-k_{0})}{2}}\left|\mathfrak{D}^{-\frac{1}{2}}_{\boldsymbol{\mathsf{c}}_{k}^{\ell}}{\overline{\mathfrak{X}}^{\boldsymbol{\mathsf{c}}_{k}^{\ell}}}^{\mathsf{T}}\boldsymbol{\Gamma}^{\ell}_{k_{0}}\right|. (33)

Furthermore the magnitude of the projection of the original table vector onto the $M^{k_{0}}$ dimensional subspace $\widehat{X}^{\boldsymbol{\mathsf{c}}^{\ell}_{k}}$, $\boldsymbol{\chi}_{k_{0}\ell}$, orthogonal to the uniform vector, is given by

|\boldsymbol{\chi}_{k_{0}\ell}|=M^{\frac{(N-k_{0})}{2}}\left|\mathfrak{T}_{k_{0}\ell}\right| (34)

where $\left|\mathfrak{T}_{k_{0}\ell}\right|$ is the magnitude of the projection of $\Gamma^{\ell}_{k_{0}}$ onto the $M^{k_{0}}$ dimensional subspace ${\widehat{\mathfrak{X}}^{\boldsymbol{\mathsf{c}}_{k}^{\ell}}}$, orthogonal to the uniform vector.

Proof.

From (20), the number of $1$s in a pre-orthogonalised column of ${\overline{X}}^{\boldsymbol{\mathsf{c}}_{k_{0}}^{\ell}}$ is $M^{N\!-\!k_{0}}$, which is the number of combinations of the conditioning attributes $\mathbf{g}^{k_{0},\ell}$. For $k=k_{0}\!-\!j$, $j$ attributes have been changed from conditional to conditioning, so there are $k$ sets of $k_{0}\!-\!j$ attributes and, for each set, $N\!-\!k_{0}\!+\!j$ conditioning attributes. Because of the additional conditioning attributes, the set of $M^{N\!-\!k_{0}}$ $1$s is expanded by a factor of $M^{j}$ in a pattern determined by the form of (20). These $1$s are transformed to equal elements in the orthogonalisation process.

As in the proof of Lemma III.3, the scalar product of $\boldsymbol{\Gamma}^{\ell}_{k_{0}}$ and a column of the submatrix ${\overline{\mathfrak{X}}^{\boldsymbol{\mathsf{c}}_{k}^{\ell}}}$ is composed of sums of $M^{j}$ terms from $\boldsymbol{\Gamma}^{\ell}_{k_{0}}$, each sum multiplied by an identical element of the submatrix column. Then

{\overline{\mathfrak{X}}^{\boldsymbol{\mathsf{c}}_{k}^{\ell}}}^{\mathsf{T}}\boldsymbol{\Gamma}^{\ell}_{k}={\overline{\mathfrak{X}}^{\boldsymbol{\mathsf{c}}_{k_{0}}^{\ell}}}^{\mathsf{T}}\boldsymbol{\Gamma}^{\ell}_{k_{0}}/M^{j}.

With $\mathfrak{D}^{-\frac{1}{2}}_{\boldsymbol{\mathsf{c}}_{k}^{\ell}}=\mathfrak{D}^{-\frac{1}{2}}_{\boldsymbol{\mathsf{c}}_{k_{0}}^{\ell}}M^{\frac{j}{2}}$ this gives (33). Equation (34) then follows because the orthogonality of the subspaces means that $|\boldsymbol{\chi}_{k_{0}\ell}|$ and $\left|\mathfrak{T}_{k_{0}\ell}\right|$ are each given by the square root of the sum of the squares of the magnitudes of the respective projections onto the individual subspaces. ∎

The immediate implication of Theorem III.1 is that ‘marginalisation’ of a contingency table using a geometric mean, to reduce the number of attributes, preserves the interaction structure of the table up to the interactions between all of the conditional attributes.

There are also implications for the collapsibility [42] of the table although investigation of this aspect is beyond the scope of the current paper. Further implications for cyber security are considered in the Discussion, Section V, below.

III-E Probabilistic Salience

That the projection of a log table vector onto the subspace ${\widehat{\mathfrak{X}}^{\boldsymbol{\mathsf{c}}_{k}^{\ell}}}$ is orthogonal to the corresponding uniform vector suggests that the larger the magnitude of this projection in proportion to the vector magnitude, the more the table entries are concentrated on a small number of cells; that is to say, the more salient those cells are. This motivates the following definition.

Definition III.2.

Let $C_{s}\subset C$ be a subset of the set of attributes $C$ and $\mathcal{T}_{s}$ be a conditional subtable in the attributes $C_{s}$.

a)

If $|\chi_{s}|$ is the magnitude of the projection of the log linear transform, $T_{s}$, of $\mathcal{T}_{s}$ onto the subspace orthogonal to the uniform subtable vector $T_{su}$, then the Probabilistic Salience of the conditional subtable $\mathcal{T}_{s}$ is

\psi_{s}=\frac{|\chi_{s}|}{|T_{s}|}. (35)
b)

Let $\mathcal{T}_{gm}$ be the geometric mean of all of the conditional subtables in $C_{s}$ generated by assigning values to the conditioning attributes $C\!\setminus\!C_{s}$, and let $T_{gm}$ be its log linear transform. If $|\chi_{gm}|$ is the magnitude of the projection of $T_{gm}$ onto the subspace orthogonal to the uniform subtable vector $T_{gmu}$, then the Probabilistic Salience Function of the attribute subset $C_{s}$ is

\Psi_{gm}=\frac{|\chi_{gm}|}{|T_{gm}|}. (36)

Our proposal is that Probabilistic Salience is a measure of the degree to which a contingency table is dominated by, or concentrated on, a small number of attribute values, a proposal which we will vindicate in Section III-G below.
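Definition III.2 a) translates directly into a few lines of numpy. The sketch below (our naming) removes the uniform component of the log table and takes the norm ratio of (35); strictly positive cells are assumed, since zero cells would first need the affine mapping of Section III-G.

```python
import numpy as np

def probabilistic_salience(table):
    # psi_s = |chi_s| / |T_s|, eq. (35): chi_s is the log table with its
    # uniform (mean) component removed; cells are assumed strictly positive.
    T = np.log(np.asarray(table, dtype=float))
    chi = T - T.mean()
    return np.linalg.norm(chi) / np.linalg.norm(T)
```

A uniform table gives a salience of zero, while a table concentrated on one cell gives a markedly larger value, matching the intended reading of (35).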

III-F Potential Application - Interaction Limiting

While the organisational countermeasures discussed in the Introduction provide one line of defence against social engineering attacks, we propose an alternative approach which employs the techniques developed in this paper in a different role to the one that has been the focus thus far. It has been demonstrated multiple times that releases of statistics from de-identified databases can be processed to re-identify individuals. Differential privacy techniques introduce controlled noise into the data such that the resulting statistics have a bounded error but the risk of re-identification is significantly reduced. The application of these techniques to contingency table data is described in [43], using a very different form of log linear model to that employed here.

By analogy, the objective of what we refer to as ‘de-personalisation’ is to inhibit the inference of attribute values in vulnerable cases while retaining the overall character of the data. De-personalisation is performed by first calculating the log linear expansion coefficients $\boldsymbol{\beta}$ using (16), then setting $\boldsymbol{\beta}_{k\ell}=0$ for $k>k_{\dagger}$, for some $k_{\dagger}$. Finally the modified data is reconstructed using (14). This limits the interactions to those between at most $k_{\dagger}$ attributes, in a sense blurring the cell contents. We refer to this as interaction limiting the data, by analogy with bandlimiting in Fourier transform based signal processing.
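A minimal sketch of interaction limiting for the binary case $M=2$: here a normalised Hadamard (tensor product) basis stands in for the paper's orthogonalised design matrix, which is an assumption made purely for illustration, and every coefficient whose basis column involves more than $k_{\dagger}$ attributes is zeroed before reconstruction.

```python
import numpy as np

def interaction_limit(table, k_dagger):
    # De-personalisation sketch for M = 2: expand log(table) in a
    # normalised Hadamard basis (an illustrative stand-in for the
    # orthogonalised design matrix), zero every coefficient whose column
    # involves more than k_dagger attributes, then reconstruct
    # (cf. (16) and (14)).
    t = np.log(np.asarray(table, dtype=float))
    N = int(np.log2(t.size))
    h = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)
    H = np.array([[1.0]])
    for _ in range(N):
        H = np.kron(H, h)                       # orthonormal, symmetric
    order = np.array([bin(n).count("1") for n in range(t.size)])
    beta = H.T @ t                              # expansion coefficients
    beta[order > k_dagger] = 0.0                # limit the interaction order
    return np.exp(H @ beta)
```

A table with no interactions above first order (one whose cells factor across attributes) passes through unchanged at $k_{\dagger}=1$, while $k_{\dagger}=0$ collapses every cell to the table's geometric mean.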

Suppose $k_{\dagger}<k_{0}$ and suppose the marginalised data is publicly released in the form of $k_{0}$ dimensional geometric mean subtables $\mathcal{T}^{\ell}_{k_{0}}$ involving subsets of $k_{0}$ attributes. Theorem III.1 ensures that the maximum interaction order of the released data is limited to $k_{\dagger}$. It then follows from Definition III.2 b), via (36), that the Probabilistic Salience Function of the released data is less than it would have been had the data not been de-personalised.

However, simple interaction limiting as described is a heavy handed method of de-personalisation. A more subtle approach would be to use the Probabilistic Salience Function to detect subsets of attributes with high interactivity via their geometric means, as demonstrated in the top left panel of Figure 1. The interaction structure of high interactivity subsets could then be investigated using the reduced design matrix $\mathfrak{X}$, with Theorem III.1 providing the link back to the full table. High value interaction coefficients $\boldsymbol{\beta}$ (Section III-B1) can be selectively set to zero, although some care is required to maintain the hierarchical nature of the expansion [41][42].

While this is necessarily a brief sketch of the proposal, we believe that the theory described here provides a solid foundation for a detailed investigation and development of de-personalisation techniques. However, this lies well beyond the scope of this paper.

III-G Probabilistic Salience as a Measure of Concentration

We make the basic assumption that the contingency table is a member of an ensemble of similar tables with the common characteristic that they represent the same population, so that their respective entries sum to the population size, say $N_{T}$. More formally, this is equivalent to assuming that each table is drawn from a multinomial distribution, although we will not make use of this fact. Each table vector $\mathcal{T}^{0}$ is then a point on a simplex, $\mathcal{S}^{0}$ [45], and can be expressed in the form

\mathcal{T}^{0}=\sum_{k=1}^{M_{T}}\lambda_{k}V^{0}_{k}\;:\;\sum_{k=1}^{M_{T}}\lambda_{k}=1

where $\lambda_{k},k\in\{1\cdots M_{T}\}$ are the components of a parameter vector $\Lambda$ and the $M_{T}$ dimensional vertex vector is $V^{0}_{k}=[\underbrace{0\cdots 0\;0}_{k-1}\;N_{T}\;0\cdots 0]^{\mathsf{T}}$, so that $\sum_{\ell=1}^{M_{T}}\tau^{0}_{\ell}=N_{T}$.

However, to avoid having to deal with $-\infty$ issues caused by zero entries in the table, we will work with the vector $\mathcal{T}$ derived from the affine mapping

V=\frac{N_{T}-M_{T}}{N_{T}}V^{0}+1

of $\mathcal{S}^{0}$, which results in a new simplex $\mathcal{S}$. This is the convex hull of its vertices $V_{k}\;;\;k=1:M_{T}$ where

V_{k}=[\underbrace{1\cdots 1\;\;1}_{k-1}\;(N_{T}-M_{T}+1)\;\;1\cdots 1]^{\mathsf{T}}

and

\mathcal{T}=\left[\lambda_{1}(N_{T}-M_{T})+1\;\cdots\;\lambda_{k}(N_{T}-M_{T})+1\;\cdots\;\lambda_{M_{T}}(N_{T}-M_{T})+1\right]^{\mathsf{T}}. (37)

It is easily verified that again $\sum_{\ell=1}^{M_{T}}\tau_{\ell}=N_{T}$.

Then the log transformed vector $\mathbf{T}$ has the form

\mathbf{T}=[\;\cdots\;\;\log(\lambda_{k}(N_{T}-M_{T})+1)\;\;\cdots\;]^{\mathsf{T}} (38)

with $0\leq t_{k}\leq\log(N_{T}-M_{T}+1)$.

The above proposal can be validated by finding a vector $\mathbf{T}$ such that the magnitude of the projection of $\mathbf{T}$ onto the subspace orthogonal to the uniform vector is maximised, subject to the constraints that the components of $\mathcal{T}$ sum to $N_{T}$ and the components of (38) are positive. However, rather than solve this constrained optimisation problem directly, we will tackle an approximation as defined below, an approximation which improves as $N_{T}$ increases.

The ensemble of vectors (38) can be represented in functional form by expressing, without loss of generality, the component $t_{1}$ as a function of the remaining components with the $\lambda$s as parameters. From (37),

\lambda_{k}=\frac{e^{t_{k}}-1}{N_{T}-M_{T}} (39)

which, with

\tau_{1}=\lambda_{1}(N_{T}-M_{T})+\sum_{k=1}^{M_{T}}\lambda_{k}=\left(1-\sum_{k=2}^{M_{T}}\lambda_{k}\right)\left(N_{T}-M_{T}\right)+1

leads to

t_{1}=\log\left(N_{T}-\sum_{k=2}^{M_{T}}e^{t_{k}}\right)\quad:\quad\sum_{k=2}^{M_{T}}e^{t_{k}}\leq N_{T}. (40)
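Equation (40) can be checked directly: given the remaining log coordinates, $t_{1}$ is forced by the population-size constraint. A small numpy sketch (the function name is ours):

```python
import numpy as np

def t1_of_rest(t_rest, N_T):
    # Eq. (40): the first log coordinate forced by the requirement that
    # the mapped table entries sum to N_T (t_rest holds t_2 ... t_{M_T}).
    s = np.exp(np.asarray(t_rest, dtype=float)).sum()
    assert s <= N_T, "outside the constraint region of (40)"
    return np.log(N_T - s)
```

Starting from any barycentric parameter vector $\Lambda$, mapping through (38) and recovering $t_{1}$ from the remaining components reproduces the first component exactly.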

Now the exponential function is convex, and a sum of exponential functions is convex, so the negative of a sum of exponential functions is concave [44]. Adding the constant $N_{T}$ to the argument preserves this concavity and, because the $\log$ function is itself concave and increasing, the composition (40) is a concave function. The significance of this here is that the function (40) is topographically simple, with no inconvenient local maxima and minima, no hidden valleys etc.

We can understand the ‘shape’ of this function by considering the transform of the vertices of the simplex $\mathcal{S}$, which are described by vectors of the $\lambda$ parameters having the form

[0\cdots\;0\;1\;0\;\cdots\;0]^{\mathsf{T}}.

Then, from (38), the transformed simplex vertices have the form

T_{v}=[0\cdots\;0\;\log(N_{T}-M_{T}+1)\;0\;\cdots\;0]^{\mathsf{T}}

suggesting that they be viewed as those vertices of a hypercube which are adjacent to the origin, with the hypercube having edges of length $\log(N_{T}-M_{T}+1)$. Each vertex of the hypercube is described by a vector $T^{h}_{v}$ which has some number, $r$, of components equal to $\log(N_{T}-M_{T}+1)$, with the remainder zero.

Furthermore, if we consider the full set of parameter vectors, $\Lambda_{r}$, having $r$ nonzero components all equal to $1/r$ for $r=1:M_{T}-1$, we can see from (38) that the corresponding vector $T^{pv}$ has the same number of zeros and that the nonzero components are

\log\left((N_{T}-M_{T})/r+1\right)\leq\log(N_{T}-M_{T}+1). (41)

We can then associate each of these vectors $T^{pv}$ with the vertex of the hypercube, $T^{h}$, which has corresponding nonzero components.

Again without loss of generality, select a vertex other than the origin, say $V^{h}_{r}$, and let the vector $V^{h}_{r}$ have $r$ nonzero components. Now select any two of those components, say the $\ell$th and $j$th, thereby defining a coordinate plane containing the $\ell$th and $j$th axes of the hypercube. Consider the curve formed by the intersection of the function (40) with a plane parallel to this coordinate plane and passing through the point $T^{pv}_{r}$, in which the $\ell$th and $j$th components are $t_{\ell}$ and $t_{j}$. We can assume that $\ell\neq 1$ and $j\neq 1$. If one of them is $=1$, or if $t_{1}=0$, the following argument is simplified slightly but comes to the same result.

Because at a pseudo vertex $\lambda_{k}$ is either $1/r$ or $0$, from (39) $e^{t_{k}}$ is either $\frac{N_{T}-M_{T}}{r}+1$, $e^{t_{\ell}}$, $e^{t_{j}}$ or $e^{0}$. The curve is then described, from (38) and (40), by

\begin{array}[]{l}\log\left(\frac{N_{T}-M_{T}}{r}+1\right)=\\ \hskip 42.67912pt\log\left[N_{T}-\left((r-3)\left(\frac{N_{T}-M_{T}}{r}+1\right)+e^{t_{\ell}}+e^{t_{j}}+(M_{T}-r)e^{0}\right)\right]\end{array}

which leads to

t_{\ell}=\log\left[2\left(\frac{N_{T}-M_{T}}{r}+1\right)-e^{t_{j}}\right]=\log\left(A-e^{t_{j}}\right). (42)

The standard formula for the curvature $\kappa$ of $y=f(x)$ is

\kappa=\frac{|y^{\prime\prime}|}{\left(1+(y^{\prime})^{2}\right)^{\frac{3}{2}}}.

Differentiating, assuming $y^{\prime\prime}\neq 0$, gives, up to the sign of $y^{\prime\prime}$,

\frac{d\kappa}{dx}=\frac{1}{\left(1+(y^{\prime})^{2}\right)^{\frac{3}{2}}}\left(y^{\prime\prime\prime}-\frac{3y^{\prime}y^{\prime\prime 2}}{1+(y^{\prime})^{2}}\right) (43)

Then, with $t_{\ell}=y$ and $t_{j}=x$ in (42),

y'=\frac{-e^{x}}{A-e^{x}},\qquad y''=-\left(\frac{e^{x}}{A-e^{x}}+\frac{e^{2x}}{(A-e^{x})^{2}}\right)

and

y'''=-\left(\frac{e^{x}}{A-e^{x}}+3\frac{e^{2x}}{(A-e^{x})^{2}}+2\frac{e^{3x}}{(A-e^{x})^{3}}\right).

At the point $T^{pv}_{r}$, from (38),

x=y=t_{j}=\log\left(\frac{N_{T}-M_{T}}{r}+1\right)=\log\frac{A}{2}

which results in $y'=-1$, $y''=-2$, and $y'''=-6$, so that the derivative of the curvature is, from (43),

\frac{d\kappa}{dx}=0. \qquad (44)

At $T^{pv}_{r}$, $\kappa=2^{-1/2}$. Setting $e^{t_{j}}=\lambda_{j}(N_{T}-M_{T})+1$ from (38), it is evident that as $\lambda_{j}$ decreases from $1/r$, the magnitude of the numerator of $y'$ decreases whereas the denominator increases, so that $|y'|$ decreases from unity. Now $y''=y'(1-y')$ and the curvature can be expressed as

\kappa=\frac{|y'(1-y')|}{(1+y'^{2})^{3/2}}.

For $|y'|\ll 1$, $\kappa\approx|y'|<1/\sqrt{2}$, which, together with (44), demonstrates that the point $T^{pv}_{r}$ is the point of maximum curvature of the curve (42).
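These values are easy to verify numerically. With an illustrative constant $A$, the curvature of $y=\log(A-e^{x})$ peaks at $x=\log(A/2)$ with value $2^{-1/2}$:

```python
import numpy as np

A = 38.0  # illustrative value of A = 2((N_T - M_T)/r + 1)

def curvature(x):
    """Curvature of y = log(A - e^x), using y'' = y'(1 - y')."""
    yp = -np.exp(x) / (A - np.exp(x))
    ypp = yp * (1 - yp)
    return abs(ypp) / (1 + yp**2) ** 1.5

x0 = np.log(A / 2)  # the pseudo vertex, where y' = -1
print(curvature(x0))                              # 1/sqrt(2) ~ 0.7071
print(curvature(x0 - 0.2), curvature(x0 + 0.2))   # both smaller
```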

This argument holds for any of the $\binom{r}{2}$ curves defined by a pair of nonzero components of the vector $\Lambda_{r}$, these curves representing mutually orthogonal two dimensional cross sections of the polytope through the vertex $T^{pv}_{r}$. The function (40) in the vicinity of $T^{pv}_{r}$ can therefore be visualised as a rounded version of the vertex $T^{h}_{r}$ into which it fits. Note that the curvature is independent of the scaling factors of the function, $N_{T}$ and $M_{T}$. Consequently, as the size of the hypercube increases with $N_{T}$, the curvature remains constant as the function (40) expands, implying that the function fits deeper into the corner formed by the vertex, i.e. that $T^{h}_{r}$ becomes a better approximation to $T^{pv}_{r}$. We will refer to the points $T^{pv}_{r}$ as pseudo vertices.

At the transformed vertices of the simplex, $r=1$, and we can see from (41) that the function coincides with the hypercube at those vertices, i.e. at $T^{h}_{1}$. All of this leads to the conclusion that the function (40) is well approximated by the relative boundary of the hypercube excluding those faces containing the origin.

However, as described in Section III-B, our measure of probabilistic structure is derived from the projection of a $\log$ transformed contingency table vector onto $\widehat{X}^{C_{0}}_{\perp}$, the orthogonal subspace of the constant vector $\widehat{X}^{0}$. We will now approximate that projection by first approximating $T$ with a point on the hypercube and then projecting that point onto $\widehat{X}^{C_{0}}_{\perp}$, noting that the projection of the entire hypercube is a convex polytope $P$. For example, in three dimensions the projection of a cube onto $\widehat{X}^{C_{0}}_{\perp}$ is a regular hexagon. To simplify the notation we will work with the unit hypercube, having vertices $T^{\bar{h}}_{r}$, with the understanding that the final results are to be scaled by $\log(N_{T}-M_{T}+1)$.
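The three dimensional case can be checked directly: projecting the unit cube's vertices onto the plane orthogonal to the constant vector (standing in for $\widehat{X}^{C_{0}}_{\perp}$) sends the six vertices other than the origin and the all-ones vertex to points of equal length, the corners of the regular hexagon. A minimal sketch, illustrative rather than part of the paper's pipeline:

```python
import itertools
import numpy as np

# Vertices of the unit cube in R^3.
vertices = [np.array(v, dtype=float) for v in itertools.product([0, 1], repeat=3)]

ones = np.ones(3) / np.sqrt(3)  # unit constant vector

# Project each vertex onto the subspace orthogonal to the constant vector.
proj = [v - (v @ ones) * ones for v in vertices]

# The all-zeros and all-ones vertices project to the origin; discard them.
hexagon = [p for p in proj if np.linalg.norm(p) > 1e-12]

lengths = [np.linalg.norm(p) for p in hexagon]
print(len(hexagon), min(lengths), max(lengths))  # 6 points, all of length sqrt(2/3)
```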

The projection of the vector $T^{\bar{h}}_{r}$ onto $\widehat{X}^{0}$ has length $r/\sqrt{M_{T}}$, and subtracting the projection $\hat{\mathbf{1}}\cdot r/M_{T}$ from $T^{\bar{h}}_{r}$ results in $T^{p}_{r}$, the projection onto $\widehat{X}^{C_{0}}_{\perp}$. This has $r$ components equal to $1-r/M_{T}$ and $M_{T}-r$ components equal to $-r/M_{T}$. Its magnitude squared is

\mathsf{L}^{2}(T^{p}_{r})=r(1-r/M_{T})^{2}+(M_{T}-r)r^{2}/M_{T}^{2}=r(1-r/M_{T}). \qquad (45)

The squared magnitude of $T^{\bar{h}}_{r}$ is $r$, so that the Probabilistic Salience is, from (35),

\psi=\sqrt{1-r/M_{T}}. \qquad (46)
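Equations (45) and (46) are straightforward to confirm numerically for the unit hypercube. The sketch below, with illustrative values $M_{T}=10$ and $r=4$, projects a vertex with $r$ unit components onto the orthogonal complement of the constant vector:

```python
import numpy as np

M_T, r = 10, 4  # illustrative cell count and number of nonzero components

T = np.zeros(M_T)
T[:r] = 1.0  # a unit-hypercube vertex with r ones

# Subtract the projection onto the constant direction.
T_p = T - np.ones(M_T) * (T.sum() / M_T)

L2 = T_p @ T_p                                # squared magnitude of the projection
psi = np.linalg.norm(T_p) / np.linalg.norm(T)  # Probabilistic Salience

print(L2, r * (1 - r / M_T))      # both 2.4, agreeing with (45)
print(psi, np.sqrt(1 - r / M_T))  # both sqrt(0.6), agreeing with (46)
```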

Now consider any two vertices of the hypercube other than the all-ones vertex and the all-zeros vertex. Let these be $T^{h0}_{r}$ and $T^{h1}_{s}$, with projections $T^{p0}_{r}$ and $T^{p1}_{s}$. There must be some component, say the $\ell$th, of $T^{p0}_{r}$ which is $1-r/M_{T}$ whereas the $\ell$th component of $T^{p1}_{s}$ is $-s/M_{T}$, so that the $\ell$th term in the scalar product of $T^{p0}_{r}$ and $T^{p1}_{s}$ is negative whereas the corresponding term in ${T^{p0}_{r}}^{\mathsf{T}}T^{p0}_{r}$ is positive. Consequently

{T^{p0}_{r}}^{\mathsf{T}}T^{p1}_{s}<\mathsf{L}^{2}(T^{p0}_{r}). \qquad (47)

The line from the origin through $T^{p0}_{r}$ is normal to a hyperplane $H^{0}$ such that $T^{p0}_{r}\in H^{0}$, and (47) shows that this hyperplane bounds a closed halfspace $k^{0}$ which contains all of the other vertex projections.

An $m$-dimensional hypercube is constructed from hypercubes of lower dimension, so its smallest faces are points (vertices), lines (edges) and squares. Select two of the hypercube vertices which are adjacent to some vertex $T^{h0}_{r}$, $0<r<M_{T}$, by taking the complements of, say, the $\ell$th and $j$th components respectively. Assume for definiteness that the $\ell$th component of $T^{h0}_{r}$ is a one and the $j$th component is a zero, so that complementing the $\ell$th component gives the vertex $T^{h1}_{r-1}$ and complementing the $j$th component gives the vertex $T^{h2}_{r+1}$. Then define a fourth vertex adjacent to both $T^{h1}_{r-1}$ and $T^{h2}_{r+1}$ by complementing the $j$th component of $T^{h1}_{r-1}$ and the $\ell$th component of $T^{h2}_{r+1}$, giving the vertex $T^{h3}_{r}$.

These four vertices define a 2-face of the hypercube, which is a two dimensional polytope, in fact a square, and, because the hypercube vertices are the extreme points of the hypercube, these four vertices are the extreme points of the face ([45], Theorem 7.3). The face is then the convex hull of these four vertices ([45], Theorem 7.2), i.e. the set of all convex combinations of these vertices taken three at a time ([45], Corollary 5.11).

Now let $T^{p0}_{r}$, $T^{p1}_{r-1}$, $T^{p2}_{r+1}$, and $T^{p3}_{r}$ be the projections of $T^{h0}_{r}$, $T^{h1}_{r-1}$, $T^{h2}_{r+1}$ and $T^{h3}_{r}$. Define $T^{pc}$ as the midpoint of the line segment joining $T^{p0}_{r}$ and $T^{p3}_{r}$. This is also the midpoint of the line segment joining $T^{p1}_{r-1}$ and $T^{p2}_{r+1}$, implying that the points $T^{p0}_{r}$, $T^{p1}_{r-1}$, $T^{p2}_{r+1}$, and $T^{p3}_{r}$ all lie on a plane which is, in fact, the projection of the corresponding hypercube face, as expected from the linearity of the projection operation. Furthermore, because the projection is a linear operation, any point $T^{v}$ on the projection of the face in question is a convex combination of $T^{p0}_{r}$, $T^{p1}_{r-1}$, $T^{p2}_{r+1}$, and $T^{p3}_{r}$, i.e., for $\lambda_{1},\lambda_{2},\lambda_{3},\lambda_{4}\geq 0$ and $\lambda_{1}+\lambda_{2}+\lambda_{3}+\lambda_{4}=1$,

T^{v}=\sum_{k=1}^{4}\lambda_{k}T^{pk}_{\centerdot}.

Then

{T^{p0}_{r}}^{\mathsf{T}}T^{v}=\sum_{k=1}^{4}\lambda_{k}{T^{p0}_{r}}^{\mathsf{T}}T^{pk}_{\centerdot}<\mathsf{L}^{2}(T^{p0}_{r})=\mathsf{L}^{2}(T^{p3}_{r}). \qquad (48)

Consequently the whole projected 2-face is contained in the halfspace $k^{0}$. This argument can be repeated for all of the other 2-faces of the hypercube of which $T^{h0}_{r}$ is a member, establishing that $k^{0}$ is a supporting halfspace and $H^{0}$ is a supporting hyperplane of $P$. Furthermore $H^{0}\cap P=\{T^{p0}_{r}\}$, demonstrating that $T^{p0}_{r}$ is an extreme point of $P$, i.e. a vertex. We can therefore conclude that the projections of the hypercube vertices, excluding the all-ones and all-zeros vertices, are the vertices of the polytope $P$.
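Inequality (47) and the extreme point property can be checked exhaustively for a small hypercube. The sketch below projects every vertex of the unit 4-cube, excluding the all-zeros and all-ones vertices, and confirms that each projected vertex strictly satisfies (47) against every other projection:

```python
import itertools
import numpy as np

M = 4  # illustrative hypercube dimension

verts = [np.array(v, dtype=float) for v in itertools.product([0, 1], repeat=M)
         if 0 < sum(v) < M]  # exclude the all-zeros and all-ones vertices

ones = np.ones(M)
proj = [v - ones * (v.sum() / M) for v in verts]

# Each projection p satisfies p.q < p.p for every other projection q, so the
# hyperplane through p normal to p bounds a halfspace containing all of them.
ok = all(p @ q < p @ p - 1e-12
         for p in proj for q in proj if not np.allclose(p, q))
print(len(proj), ok)  # 14 True
```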

In the context of an optimisation problem, $|T^{v}|^{2}$ is a convex function with the projected 2-face acting as the feasible set and, because there are no distinct local minima or maxima, it attains its global minimum and maximum on the relative boundary of the projected 2-face [44]. Indeed, it follows from equation (4.21) of [44] that the maximum must occur at some vertex of the projected 2-face and the minimum at another.

Let $T^{v\nu}$ be a point on projected 2-face $\nu$ and $T^{v\mu}$ be a point on a distinct projected 2-face $\mu$. Furthermore, let the maximum length vertices be the projections of hypercube vertices with $r^{+}_{\nu}$ and $r^{+}_{\mu}$ nonzero components respectively, and the minimum length vertices be the projections of hypercube vertices with $r^{-}_{\nu}$ and $r^{-}_{\mu}$ nonzero components respectively. Then if $r^{+}_{\nu}<r^{-}_{\mu}$, from (46), $\psi_{\nu}>\psi_{\mu}$. This demonstrates that Probabilistic Salience is a strong indication of the degree to which the table entries are concentrated on a few cells, vindicating our proposal above.

IV Computational Results

To investigate the type of result the process can produce, Australian Census data was obtained from the Australian Bureau of Statistics (ABS) using their DATALAB facilities. Security restrictions imposed by the ABS limited us to nine attributes, and then only with a high level of aggregation, the aggregation scheme being designed by us. We limited the analysis to seven attributes, resulting in 36 distinct cases, i.e. nine attributes taken seven at a time. Comparing the results of applying our algorithm to these 36 cases gave us a good sense of how it responded to changes in the attribute combination. One of the better cases is presented in Fig. 1, where we focus on bivariate conditional subtables.

While we acknowledge that this is a very small data set by data processing standards, we stress that the purpose is to provide a simple illustration of the operation of our process to enhance understanding rather than to explore large scale structure. The data set is bland in the sense that each attribute interacts strongly with every other attribute but we believe that this bland data set provides a sensitive test of the model’s ability to discriminate between similar levels of structure.

The top left hand panel in each figure provides a bar graph comparison of the Probabilistic Salience Function values, $\Psi$, for the geometric mean conditional subtables of each pair of attributes. In spite of the blandness of the data and the limitation to pairwise interactions, the bar graph indicates that Probabilistic Salience is able to clearly differentiate between subsets.

The top right panel displays, for the minimum and maximum values of $\Psi$, histograms of the Probabilistic Salience $\psi$ of individual conditional subtables over the set of conditioning attributes. In each case the $\psi$ values are reasonably well clustered around the $\Psi$ value of the respective geometric mean, although there is significant dispersion toward higher values. The reason for this is revealed by the remaining panels, which show three dimensional histograms of the bivariate conditional subtable entries for the conditioning attribute values listed. For the minimum and maximum $\Psi$ values these are the subtable closest to the geometric mean (middle) and the subtable with maximum $\psi$ (bottom).

Numerical checks confirm the visual impression that the maximum $\psi$ values are generated by subtables with (originally) many zero entries; indeed the bottom left subtable, with the highest $\psi$, has only a single nonzero entry. Although this is, somewhat ironically, associated with the minimum $\Psi$, it nevertheless demonstrates the effectiveness of Probabilistic Salience in detecting highly concentrated subtables.
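The way $\psi$ responds to concentration can be illustrated with a toy computation. The sketch below is our reading of the construction in Section III-E rather than the exact pipeline used for Fig. 1: a subtable with total $N_{T}$ and $M_{T}$ cells is mapped to $t_{k}=\log(\lambda_{k}(N_{T}-M_{T})+1)$ as in (38), and $\psi$ is taken as the ratio of the magnitude of the projection of $t$ onto the complement of the constant vector to the magnitude of $t$. A subtable concentrated on one cell scores high; a uniform subtable scores zero:

```python
import numpy as np

def salience(counts):
    """Sketch of Probabilistic Salience, assuming psi is the ratio
    |projection of t orthogonal to the constant vector| / |t|, with
    t_k = log(lambda_k (N_T - M_T) + 1) as in (38)."""
    counts = np.asarray(counts, dtype=float)
    N_T, M_T = counts.sum(), counts.size
    lam = counts / N_T                     # cell probabilities lambda_k
    t = np.log(lam * (N_T - M_T) + 1.0)   # transformed table vector
    t_perp = t - np.ones(M_T) * t.mean()  # remove the constant component
    norm = np.linalg.norm(t)
    return np.linalg.norm(t_perp) / norm if norm > 0 else 0.0

concentrated = [91, 1, 1, 1, 1, 1, 1, 1, 1, 1]  # mass on a single cell
uniform = [10] * 10                              # no probabilistic structure

print(salience(concentrated), salience(uniform))  # high versus 0
```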

Figure 1: The top left panel shows the Probabilistic Salience Function values, $\Psi$, of the geometric means of the conditional subtables for each attribute pair. At top right are histograms of the Probabilistic Salience values, $\psi$, of the individual conditional subtables for maximum and minimum $\Psi$ over the range of conditioning values for the remaining attributes. The remaining panels show three dimensional histograms of the subtables as indicated.

V Summary and Discussion

What emerges from the various perspectives discussed in the Introduction is that social engineering is, essentially, the extension into the cyber realm of the long standing crime of confidence trickery. Like the confidence trickster, the social engineer exploits psycho-social vulnerabilities in deceiving and manipulating victims. In light of the failure of the law to eliminate traditional confidence trickery, the expectation that security technology will defeat social engineering appears unreasonably optimistic. The underlying goal of social engineering, after all, is to bypass security measures such as authentication by focusing on the human element. Indeed, authentication mechanisms themselves are susceptible to exploitation [9], [10].

In response, defence strategies to counter social engineering attacks have concentrated on strengthening the human element by providing strong security policies backed up by ongoing security awareness and resistance training. A complementary approach is to increase the difficulty of social engineering by, for example, devising more robust SMS messages in two factor authentication [10], or by detecting exploitable personal information spread across social media and reducing its vulnerability. A third strategy, of particular interest to us, is developing technical aids to assist in detecting and rebuffing an attack, such as the Topic Blacklist system described in [11].

We focus on those areas of social engineering which involve impersonation based on pretexting, where a targeted employee, such as a call centre operator or IT support person, is persuaded that the social engineer is the intended victim in order to obtain elements of the victim's critical identifying information. This requires the social engineer to create a fictitious scenario, the pretext, based on sufficient personal information about the victim to convince the target of its veracity. Note that, because the target does not know everything about the victim, the information does not have to be entirely accurate, but the scenario as a whole does need to be coherent and plausible.

The underlying issue is that the social engineer is unlikely to be able to assemble enough factual information to construct a sufficiently complete profile of the victim to provide that sense of coherence. However, enough gaps can be filled by inferring missing information from general background data that the social engineer can acquire through thorough research. We have argued in Section II-C that the form of inference is analogical, taking place in an explanatory context. The psychological evidence suggests that this rests upon a sense of statistical relevance associated with the background data, meaning, roughly, that some things appear intuitively more likely than others. We argue that this sense of statistical relevance is grounded in a perhaps informal knowledge of statistical data.

Our central point is that some elements, i.e. attribute values, of a victim's profile are much more likely to have been inferred than others, and these attributes need to be identified as unreliable in the sense that they do not add significantly to the profile's veracity and so should be discounted. Further, an estimate of reliability can be obtained by analysing the formal statistical data relevant to the problem, based on identifying subsets of the profile which have high probability given the other elements of the profile.

We assume that the data is in the form of a multivariate contingency table (Section III-A) so that the mathematical problem is that of analysing conditional association in the table using a log linear technique [46]. In this case we use a very specific form of log linear model based on an orthogonal transformation of the logarithms of the table entries into an ‘interaction space’ which reveals the statistical interactions between the data elements. The magnitude of the transformed table vector is the basis of our definitions of Probabilistic Salience and the Probabilistic Salience Function (Section III-E).

Estimating the reliability of profile attributes is then the problem of identifying subsets of profile attributes having values which occur frequently in the population. Consequently these values can be inferred probabilistically rendering the attributes unreliable. This becomes the problem of identifying conditional subtables of the contingency table in which the subtable contents are concentrated in a small number of cells.

Because, in general, there is no simple topological relationship between these cells, conventional measures of centrality such as variance are not relevant, leading us to define our Probabilistic Salience measures. These, we show, do indeed provide an indication of concentration in this sense (Section III-G). Our computational results, while based on a limited data set, support these theoretical results.

The Probabilistic Salience Function, then, is an indicator of the potential for exploitation of informal statistical information by social engineers in impersonating an individual. This can be applied to various subsets of attributes to detect vulnerabilities and can be used to warn potential targets such as operators in call centres that particular attributes are unreliable indicators of identity.

The underlying theme of the literature review in Section 1.1 is that the most effective method of defending against social engineering is to maximise the support provided to the humans who face these attacks. We envisage that the Probabilistic Salience measures can be used not only as a stand-alone warning but also as a component of a more comprehensive warning system. This would include other forms of social engineering detector, such as the Topic Blacklist [11] or indicators extracted from the knowledge graph of [12], as discussed there. Indeed, our proposed Probabilistic Salience indicator could conceivably be used to enhance the automatic social engineering vulnerability scanner described in [14].

The principal objective of the Probabilistic Salience indicator is to alert the target to the presence, in the information presented by the attacker, of information that could have been inferred rather than being reliable factual information. In the context of the RISP model of information processing, this should increase the target's information insufficiency level to the point where the target is driven to process the attacker's claims systematically rather than heuristically, significantly increasing the chances of detecting the flaws and inconsistencies that would indicate the claims are false.

It is conceivable that RISP concepts could be incorporated into security awareness training [6], [8], [9], [11] in order to enhance a target's response to a Probabilistic Salience alert by increasing the target's information sufficiency threshold [23]. This would increase the target's information insufficiency level, thereby making the target more sensitive to the Probabilistic Salience alert as well as other forms of alert. Indeed, because we have focussed on only one aspect of RISP, it seems possible that other aspects could also be incorporated into security awareness training, suggesting that this could be a fruitful area of future research.

Finally, we recognise that our Probabilistic Salience measures are a double-edged sword in the sense that social engineers could use the same process to find inferable attribute values to exploit. This suggests that contingency table data of the type used here could be 'sanitised' in a manner similar to databases modified using differential privacy techniques. In this case we speculate in Section III-F that the sensitivity of contingency table data to inference could be reduced by limiting the magnitude of the transformed table vector in 'interaction space', thus reducing the Probabilistic Salience measures without significantly distorting the data.

VI Acknowledgment

The financial support for GS, as well as the provision of meeting facilities, by CSIRO Data61 is gratefully acknowledged. We also appreciate the comments of the anonymous reviewers, which resulted in a substantial improvement to the initial submission.

References

  • [1] Z. Wang , L. Sun, and H. Zhu, “Defining Social Engineering in Cybersecurity”, IEEE Access, Vol. 8, DOI: 10.1109/ACCESS.2020.2992807, (2020).
  • [2] Z. Alkhalil, C. Hewage, L. Nawaf, I. Khan, “Phishing Attacks: A Recent Comprehensive Study and a New Anatomy”, Frontiers in Computer Science, Vol. 3, 2021. https://www.frontiersin.org/articles/10.3389/fcomp.2021.563060, DOI=10.3389/fcomp.2021.563060, ISSN=2624-9898.
  • [3] Z. Wang, H. Zhu, L. Sun, “Social Engineering in Cybersecurity: Effect Mechanisms, Human Vulnerabilities and Attack Methods”, IEEE Access, Vol. 9, 2021, DOI: 10.1109/ACCESS.2021.3051633.
  • [4] I. Ghafir, V. Prenosil, A. Alhejailan, and M. Hammoudeh, “Social Engineering Attack Strategies and Defence Approaches,” in Proc. IEEE 4th Int. Conf. Future Internet Things Cloud (FiCloud), Aug. 2016, pp. 145–149.
  • [5] R. E. Indrajit, “Social Engineering Framework: Understanding the Deception Approach to Human Element of Security,” Int. J. Comput. Sci. Issues, vol. 14, no. 2, pp. 8–16, 2017.
  • [6] W. Fan, K. Lwakatare, and R. Rong, “Social Engineering: I-E Based Model of Human Weakness to Investigate Attack and Defense,” SCIREA J. Inf. Sci. Syst. Sci., vol. 1, no. 2, pp. 34–57, 2016.
  • [7] A. D. B. Vizzotto, A. M. de Oliveira, H. Elkis, Q. Cordeiro, and P. C. Buchain, “Psychosocial Characteristics”, in: M.D. Gellman, J.R. Turner, (eds) Encyclopedia of Behavioral Medicine. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-1005-9_918, 2013.
  • [8] J.Saleem and M. Hammoudeh, “Defense Methods Against Social Engineering Attacks”, in: K. Daimi (ed), Computer and Network Security Essentials, DOI:10.1007/978-3-319-58424-9, 2017.
  • [9] R. A. Grimes,“Social Engineering Attacks”, in; Hacking Multifactor Authentication, First Edition, John Wiley and Sons, 2021.
  • [10] H. Siadati, T. Nguyen, P. Gupta, M. Jakobsson, and N. Memon, "Mind your SMSes: Mitigating Social Engineering in Second Factor Authentication", Computers and Security, 68, September 2016.
  • [11] R. Bhakta and I. G. Harris, "Semantic Analysis of Dialogs to Detect Social Engineering Attacks", Proc. IEEE 9th International Conference on Semantic Computing (IEEE ICSC), Feb 7-9 2015, pp. 424-427.
  • [12] Z. Wang, H. Zhu, P. Liu, and L. Sun, “Social Engineering in Cybersecurity: a Domain Ontology and Knowledge Graph Application Examples”, Cybersecurity, 4:31, https://doi.org/10.1186/s42400-021-00094-6, 2021.
  • [13] M. Alrubaian et al., "Credibility in Online Social Networks: A Survey", IEEE Access, Vol. 7, January 2019, DOI 10.1109/ACCESS.2018.2886314.
  • [14] M. Edwards, R. Larson, B. Green, A. Rashid, A. Baron, “Panning for gold: Automatically analysing online social engineering attack surfaces”, Computers & Security 69, 18–34 (2017)
  • [15] C. N. Wathen, J. Burkell, “Believe It or Not: Factors Influencing Credibility on the Web”, J. American Society For Information Science And Technology, Vol. 53, No. 2, 2002, pp134-144.
  • [16] C. Farkas and S. Jajodia, "The Inference Problem: A Survey", SIGKDD Explorations, Vol. 4, Issue 2, 2002.
  • [17] M. Guarnieri, S. Marinovic, D. Basin, ”Securing Databases from Probabilistic Inference”, 2017 IEEE 30th Computer Security Foundations Symposium (CSF), Santa Barbara, CA, 2017, pp. 343-359, DOI: 10.1109/CSF.2017.30.
  • [18] K. Kenthapadi, N. Mishra, K. Nissim, ”Simulatable Auditing”, ACM PODS 2005 June 13-15, 2005, Baltimore, Maryland.
  • [19] A. Evfimievski, R. Fagin, and D. Woodruff, “Epistemic privacy”, J. ACM 58, 1, Article 2, (December 2010), 45 pages. DOI: 10.1145/1870103.1870105, http://doi.acm.org/10.1145/1870103.1870105.
  • [20] A. Algarni, Y. Xu, & T. Chan, “Social Engineering in Social Networking Sites: the Art of Impersonation”, In P. Hung, E. Ferrari, & R. Kaliappa, (Eds.) Proc. 11th IEEE International Conference on Services Computing, pp. 797-804, (2014).
  • [21] D. Pehlivanoglu et al., "The role of analytical reasoning and source credibility on the evaluation of real and fake full-length news articles", Cogn. Research, vol. 6, No. 24, 2021, https://doi.org/10.1186/s41235-021-00292-3.
  • [22] W. Zang and S.A. Watts, “Capitalizing on Content: Information Adoption in Two Online Communities”, JAIS, Vol. 9, Issue 2, February 2008, pp. 73-94.
  • [23] X. (Robert) Luo, W. Zhang, S. Burd, A. Seazzu, “Investigating phishing victimization with the Heuristice-Systematic Model: A theoretical framework and an exploration”, Computers & Security, Vol. 38, 2013, 28-38.
  • [24] Z. J. Yang, M. A. Ariel, & T. H. Feeley, “Risk Information Seeking and Processing Model: A Meta-Analysis”, Journal of Communication, Vol. 64, 2014, pp. 20–41.
  • [25] R.J. Griffin, S. Dunwoody, K. Neuwirth, “Proposed model of the relationship of risk information seeking and processing to the development of preventive behaviors”, Environ Res., Feb. 1999;80(2 Pt 2):S230-S245. doi: 10.1006/enrs.1998.3940. PMID: 10092438.
  • [26] F. Al Zamal et al., "Homophily and Latent Attribute Inference: Inferring Latent Attributes of Twitter Users from Neighbors", Proceedings of the International AAAI Conference on Web and Social Media (2012).
  • [27] D. Mulders, C. de Bodt, J. Bjelland, A. ’Sandy’ Pentland, M. Verleysen and Y-A.de Montjoye. “Inference of node attributes from social network assortativity.” Neural Computing and Applications 32 (2019): pp. 18023 - 18043.
  • [28] J. Jia, B. Wang, L. Zhang and N. Z. Gong. “AttriInfer: Inferring User Attributes in Online Social Networks Using Markov Random Fields.” Proceedings of the 26th International Conference on World Wide Web (2017).
  • [29] D. Koller, N. Friedman, L. Getoor, B. Taskar, ”Graphical models in a nutshell”, In L. Getoor and B. Taskar (Eds.), Introduction to Statistical Relational Learning, (pp. 13– 55). Cambridge, MA: MIT Press. (2007)
  • [30] R. Clarke, http://www.rogerclarke.com/ID/IdModel-1002.html, (2010)
  • [31] T. Lombrozo, “ Explanation and Abductive Inference”, In The Oxford Handbook of Thinking and Reasoning (Holyoak, K.J. and Morrison, R.G., eds), pp. 260–276, Oxford University Press, (2012)
  • [32] T. Lombrozo,“Explanatory Preferences Shape Learning and Inference”, Trends in Cognitive Sciences, Vol. 20, No. 10, October 2016, pp 748-759.
  • [33] J. E. Hummel, J. Licato and S. Bringsjord, “Analogy, Explanation, and Proof”, Frontiers in Human Neuroscience, Volume 8, Article 867, November 2014.
  • [34] Wesley C. Salmon, “Statistical Explanation”, in Salmon (ed.), Statistical Explanation and Statistical Relevance, Pittsburgh, PA: University of Pittsburgh Press, (1971), pp29-87.
  • [35] Matteo Colombo, Marie Postma and Jan Sprenger, “Explanatory Judgment, Probability, and Abductive Inference”, In A. Papafragou, D. Grodner, D. Mirman and J. C. Trueswell (eds.), Proceedings of the 38th Annual Conference of the Cognitive Science Society, Cognitive Science Society. Austin, TX: pp. 432-437 (2016)
  • [36] J. N. Schupbach and J. Sprenger, “The Logic of Explanatory Power”, Philosophy of Science, 78 (January 2011) pp. 105–127. 0031-8248/2011/7801-0006
  • [37] Jonah N. Schupbach, “Comparing Probabilistic Measures of Explanatory Power”, Philosophy of Science, Vol. 78, No. 5, December 2011
  • [38] J. Pearl, “Causal Inference in Statistics: An Overview”, Statistics Reviews, 8, 96 (2009)
  • [39] C. Dahinden, G. Parmigiani, M. C.  Emerick, P. Buhlmann, BMC Bioinformatics, 8, 476 (2007) (Additional File 1 Section 1)
  • [40] A. Dobra, S. E. Fienberg, A. Rinaldo, A. B. Slavkovic, Y. Zhou, “Algebraic statistics and contingency table problems: Log-linear models, likelihood estimation, and disclosure limitation”, In: Putinar, M., Sullivant, S. (eds.) Emerging Applications of Algebraic Geometry. IMA Series in Applied Mathematics, pp. 63–88. Springer, Heidelberg (2008)
  • [41] J. Darroch, S. Lauritzen, and T. Speed, “Markov fields and log-linear interaction models for contingency tables”. Annals of Statistics 8, 522–539.(1980).
  • [42] C. Dahinden, M. Kalisch, P. Buhlmann, ”Decomposition and Model Selection for Large Contingency Tables”, Biometrical Journal, 52, DOI: 10.1002/bimj.200900083, pp. 233-252 (2010)
  • [43] X. Yang, S. E. Fienberg, and A. Rinaldo. “Differential Privacy for Protecting Multi-Dimensional Contingency Table Data: Extensions and Application”. Journal of Privacy and Confidentiality, Vol. 4 (1). https://doi.org/10.29012/jpc.v4i1.613, 2012.
  • [44] S. Boyd and L. Vandenberghe, Convex Optimization, Cambridge University Press, Cambridge, (2004).
  • [45] A. Brondsted, An Introduction to Convex Polytopes. (Graduate Texts in Mathematics; 90), Springer-Verlag, New York, (1983).
  • [46] W. P. Bergsma and T. Rudas, “Modeling Conditional and Marginal Association in Contingency Tables”, Annales de la Faculté des Sciences de Toulouse 6e série, tome 11, no 4 (2002), p. 455-468.