Detection and Amelioration of Social Engineering Vulnerability in Contingency Table Data using an Orthogonalised Log-linear Analysis
Abstract
Social Engineering has emerged as a significant threat in cyber security. In a dialog based attack, by having enough of a potential victim’s personal data to be convincing, a social engineer impersonates the victim in order to manipulate the attack’s target into revealing sufficient information to access the victim’s accounts and the like. We utilise the developing understanding of human information processing in the Information Sciences to characterise the vulnerability of the target to manipulation and to propose a form of countermeasure. Our focus is on the possibility of the social engineer being able to build the victim’s profile by, in part, inferring personal attribute values from statistical information available either informally, from general knowledge, or, more formally, from some public database. We use an orthogonalised log linear analysis of data in the form of a contingency table to develop a measure of how susceptible particular subtables are to probabilistic inference as the basis for our proposed countermeasure. This is based on the observation that inference relies on a high degree of non-uniformity and exploits the orthogonality of the analysis to define the measure in terms of subspace projections.
Index Terms:
cyber security, social engineering, data privacy, identity, personal attributes, identity impersonation, contingency table, log linear, geometric marginalisation, de-personalisation.
I Introduction
I-A The Social Engineering Problem
Social engineering attacks on cyber infrastructure constitute a large and growing economic problem. As the comprehensive literature review in [1] reveals, the term ‘social engineering’ has been applied to a wide variety of illegitimate methods of gaining advantage ranging from simple breaches of physical security to sophisticated phishing campaigns [2]. However the central feature of social engineering that emerges is the exploitation of human vulnerability through some form of interaction between attacker and the attack target.
This is the case even where an unauthorised person gains physical entry to secure facilities by following legitimate personnel through a controlled access point (‘tailgating’) or by enlisting the aid of an authorised person while pretending to be struggling with a heavy load [4] [3]. Here, the attacker exploits the reluctance of people to confront a potentially awkward situation, or their desire to be helpful, illustrating the general observation that humans are the weakest link in any security system [4].
Phishing is also becoming a major threat to cyber security. From the discussion in [2] it is clear that there is a close relationship with social engineering, with some researchers describing phishing as a form of social engineering whereas others see phishing as employing social engineering. Regardless of the details, a unifying thread is the exploitation of human vulnerability for the creation of some form of deception for nefarious purposes.
Methods of defending against social engineering attacks are discussed in [4], [8] and [6]. A clearly defined and well documented security policy, including robust user authentication, is regarded as essential in protecting information infrastructure [4]. Two factor authentication is widely deployed and believed to be much more resistant to attack than simple passwords and PINs. However, two factor authentication is itself vulnerable to social engineering: the victim can be tricked into providing the verification code to the social engineer using the type of technique described in [9] and [10].
This illustrates the inescapable fact that the central goal of social engineering is to evade whatever security mechanisms are deployed to protect the target system by exploiting human vulnerabilities [8] [6] [5]. Grimes [9] observes, from an industry perspective, that “even the very best technical controls will allow some amount of social engineering to get by your defenses. What will save you then is good security awareness training.” Indeed [8] cites case studies which “demonstrate that security awareness is the crucial and most effective tool in the fight against social engineering attacks”.
Security awareness training is discussed from various points of view in [8], [6], [11] and [9], with the caution that it needs to be ongoing to overcome the natural tendency to forget or habituate [8]. Resistance training, “aimed at making employees resilient against persuasion techniques that a social engineer may employ” [4], is also advocated by some, particularly where the interaction is in the form of a dialog.
At the extreme end of security awareness are proposals, discussed in [6] and [11], for a system to monitor the emotional state of, for example, call centre operators, to ensure they are in the correct frame of mind to resist social engineering. However, we regard this type of intrusive approach as draconian and likely to be ineffective: it would generate resentment and encourage evasion by the operator, as well as being disruptive whenever the monitor decreed the operator to be in the wrong frame of mind.
A more benign and, we suggest, more effective approach is to provide potential victims with an automated assistant which monitors the interaction with an attacker to detect indications that a social engineering attempt may be underway. For example, Bhakta and Harris [11] describe a system for analysing each line in a text based interaction to detect instances of a predefined ‘topic blacklist’, warning the potential victim and perhaps taking preemptive action to prevent a security violation. Similarly, in this paper we accept that the vulnerability of human operators to social engineering is inevitable and describe a system to assist them in assessing the overall reliability of the information they are being presented with.
I-B The Nature of Social Engineering
The centrality of human vulnerability in social engineering is emphasised by the philosophically well grounded definition proposed in [1] Section IV:
Definition I.1.
“In the context of cybersecurity, social engineering is a type of attack wherein the attacker(s) exploit human vulnerabilities by means of social interaction to breach cyber security, with or without the use of technical means and technical vulnerabilities.”
Continuing research in social engineering requires a sound intellectual foundation, beginning with a consistent ontology which then provides the basis for the underlying conceptual scheme. This, in turn, enables the empirical knowledge base to be ordered by constructing more specialised taxonomies.
Previous work on developing conceptual models and taxonomies is reviewed in [6]. Recently, Wang et al. [12] have proposed a comprehensive foundation for social engineering research comprising an ontology and conceptual model with consequent taxonomies. Their conceptual model has eleven components ([12] Fig. 5), with each component having its own taxonomy populated by the ontology.
Defining a set of formal relations between the components of the conceptual model leads to a knowledge graph in which the vertices represent the elements of the ontology and the edges represent the relations between the conceptual model components whose taxonomies include the elements. All of this is machine readable so that the knowledge graph can be constructed computationally from empirical data on actual social engineering attack scenarios and then analysed to reveal general features of attacks.
For example the vertex degree, i.e. the number of edges associated with a vertex, is an indicator of how frequently an ontological element features in attacks. A relatively high degree can reveal a particular human or system vulnerability so that an alarm can be generated when it occurs during an attack or special attention can be paid to it in security awareness training [12].
Human vulnerability is examined in detail in [6] which proposes an “I-E Model of Human Weakness” as a basis for investigating social engineering attacks and potential defences. Here, ‘I’ denotes internal, which refers to the psychological characteristics of potential victims, whereas ‘E’ denotes external, referring to the social environment in which manipulation of the victim occurs. Whereas [6] uses the term ‘socio-psychology’ as a general description of this class of model, we prefer the term ‘psycho-social’ because it is consistent with the description of similar issues in the health sciences [7], where the internal-external dichotomy is also employed, although the social environment there is somewhat different.
Because “the success of social engineering relies heavily on the information gathered” [12], one of the conceptual model components is ‘social engineering information’. The importance of information is also stressed in [4] and [2], where ‘information gathering’ is listed as the first step in the attack cycle, and in [5] where ‘information requirements analysis’ is described as one of the prerequisites of an attack.
II Method
II-A The Dialog Based Social Engineering Context
While we agree with the generality of Definition I.1, in this paper we focus on that class of cyber security breach where the social engineer manipulates the target of an attack into revealing confidential information to be used in perpetrating some type of fraud. We distinguish between the target of the social engineering attack, on the one hand, and the victim of the subsequent fraud on the other. Whereas the target is a single human entity, the victim may be a corporate entity or the many customers of that corporate entity.
In a typical phishing attack the target is also the victim [2]. However in another class of attack, to which call centres in particular are susceptible, the social engineer attempts to persuade the target that the social engineer is, or represents, the intended victim by entering into some form of dialog. The immediate goal is to convince the target that the attacker is entitled to privileged information, information which can range from a simple account password to commercial or national secrets. If convinced, the target reveals the information to the attacker who then uses it to perpetrate some form of fraud against the victim. Identity theft, a growing problem, is a particular case of this form of impersonation.
Clearly the social engineer has to create a false identity based on some form of profile which will establish the credibility of the social engineer as the source of the information on which the attack is based. The wider question of assessing credibility in social networks has been surveyed in [13] which asserts that credibility is a synonym for believability and trustworthiness although the authors concede the multidimensional nature of credibility in the social network context. Multidimensionality is also stressed in [15] where trustworthiness and expertise are described as primary markers of source credibility. The basis of judgements of source credibility made by Facebook users is examined in [20] in four dimensions, one of which is sincerity. Judgements in this dimension were made in part on the basis of the source’s profile indicating the importance of that profile being sufficiently detailed to be convincing.
Source credibility has been examined in a variety of contexts, e.g. investigations of the evaluation of real and fake news articles [21] and the adoption of information in on-line communities [22]. In both cases, the theoretical context is provided by Dual-Process models of human information processing. Whereas the Heuristic-Systematic variant of the Dual-Process model is employed in [22], the work in [21] is based on another variant, the Elaboration Likelihood Model, originally developed to investigate persuasion.
The investigation of victimisation in phishing attacks in [23] is also based on the Heuristic-Systematic Model (HSM) where information processing occurs in two modes. In the heuristic mode a message is evaluated quickly on the basis of a set of available cues, one of which is source credibility, whereas in the systematic mode the evaluation involves cognitive processes and is consequently much slower and more deliberative.
These two modes are not mutually exclusive and have been demonstrated to operate concurrently [22] [23] such that one can moderate the other i.e. one can reinforce the other (referred to as additivity) or one can attenuate the effects of the other. In particular, source credibility has been demonstrated to reinforce the results of systematic processing [22] and, conversely, systematic processing can attenuate the effect of heuristic cues such as source credibility.
In the type of phishing attack considered here, the social engineer presents an argument aimed at persuading the target to perform some action, such as supplying confidential information or logging into a fake website. The target invokes some form of heuristic-systematic process to evaluate the argument. However, if systematic processing dominates, the flaws in the argument are likely to be exposed [23], so it is in the social engineer’s interest to suppress systematic processing in favour of heuristic processing by manipulating the target. One method of achieving this is to place the target under time pressure, which suppresses systematic processing [23] [22].
An important aspect of the HSM is that it recognises that people will terminate the process when they feel comfortable with the judgement that has been evolving. In the model this is referred to as reaching the individual’s sufficiency threshold which depends on the prevailing circumstances. Lowering the sufficiency threshold will result in the individual favouring the heuristic over the systematic mode.
In [23], it is proposed that the success of an attack depends, at least in part, on using an attack pretext to suppress the target’s sufficiency threshold. Pretexting [23] [1] is where the social engineer, in preparing the attack, creates a scenario which provides the informational context within which the attack takes place, a context which can also be designed to reinforce the perception of source credibility. Indeed we regard the pretext as a core component of the type of dialog based phishing attack considered in this paper.
II-B Information Insufficiency and the Socially Engineered Phishing Attack Model
However we believe that a more detailed account of the relationship between the pretext and the sufficiency threshold can be developed from the Risk Information Seeking and Processing (RISP) model of information science, which incorporates the Heuristic-Systematic Model. The RISP model has been widely researched in the information sciences as an explanation for how and why people seek additional information in the context of risk (see [24] for a review and discussion). While developing the full relationship between social engineering and RISP is beyond the scope of this paper, the essential idea for our purposes is as follows.
Firstly we observe that risk is inherent in a dialog based phishing attack and that risk is perceived at some level by the target. One of the main components of RISP is information insufficiency which is defined as the difference between an individual’s current knowledge level and their sufficiency threshold. A higher level of information insufficiency, all else being equal, will drive the individual towards higher levels of information seeking and processing [25] and, consequently, to bias their processing mode towards systematic processing.
In light of this we propose that pretexting, rather than lowering the sufficiency threshold as hypothesised in [23], instead raises the perceived level of current knowledge relevant to the attack. The pretext thus decreases the level of information insufficiency thereby reducing the perceived need to seek additional information by interaction with the attacker. Furthermore a carefully crafted pretext can lower the level to the point where the heuristics will override any doubts that may arise from systematic processing of the pretext. Consequently pretexting complements source credibility by ensuring the target is more strongly influenced by source credibility and other heuristics than by the merits of the arguments presented by the social engineer.
All of these factors lead us to define a model of a socially engineered phishing (SEP) attack, which we refer to as a SEP attack model, as:
Definition II.1.
The SEP attack model has the following three components:
1. the source, i.e. the attacker, with an associated source credibility,
2. the pretext, providing the informational context in which the attack occurs, and
3. the set of demands and the argument for why the target should conform to them.
Note that all three components are fake but presumably sufficiently plausible that the target is willing to engage with the attacker.
Credibility generally increases with familiarity [15], which implies that a completely unfamiliar pretext will be regarded with considerable skepticism by the target, suggesting the engagement of some level of systematic processing. Provisionally accepting the pretext implies that at least some of the components of the pretext are held by the target to be true. Because a high level of systematic processing would require that a large proportion of the components be held to be true, suppressing the sufficiency threshold reduces the proportion of components that the target needs to recognise as true in order to accept the pretext.
The social engineer is then free to invent the remaining components but with the constraint that consistency is important for credibility [15] so that inconsistency in the information presented to the target results in a greater reliance on systematic processing [22]. This clearly conflicts with the original aim of ensuring the dominance of heuristic processing. Consequently, the fake components of the pretext or message must be chosen to appear consistent with the components that the social engineer can assume the target will regard as true on the basis of the social engineer’s research into the target’s background.
All of this emphasises the critical role played by information, as was stressed in Section I-B. It becomes clear that a social engineer needs to mount a significant effort to painstakingly acquire the information required for a successful attack. The nature of the information is such that the acquisition process can be indirect and complex, so that social engineers “will tie little pieces of information they have acquired over time, decipher cues and signals given to them by multiple employees, and then connect the pieces of the jigsaw puzzle to unearth the information they have been after” [8].
Previous work has focussed on the exploitation of whatever factual information the social engineer can extract directly from available sources. However we argue that the ‘pieces of the jigsaw puzzle’ acquired by a social engineer will in general be incomplete and perhaps inconsistent, so that ‘unearth(ing) the information’ will require significantly more than merely assembling a collection of facts. This is particularly the case with the type of personal detail that will lend a strong sense of authenticity to the pretext.
Specifically, we base this paper on our assertion that information acquisition can be enhanced by inferring additional information not only from that already acquired but, importantly, from contextual information as well. Rarely, however, will this be a straightforward matter of deductive inference leading to certainty. Instead the inferences will be probabilistic in nature.
II-C Probabilistic Inference, Analogical Inference, and Social Engineering
Probabilistic inference techniques are finding increasing application in database security research particularly in characterising and detecting network based attacks. In fact in a database inference attack, inference procedures are an intrinsic component of the attack methodology itself and developing effective countermeasures against inference attacks is an established aspect of database management research [16]. The underlying concept is that of an ‘inference channel’ which is generated by the patterns of association in the data which support inferences.
Of greater relevance to this paper are a number of techniques which have been developed to infer unknown (latent) attribute or demographic values from a known set derived from social network data. For example a machine learning method based on feature vectors is described in [26] whereas probabilistic graph theoretic techniques based on Markov Random Field concepts are proposed in [27] and [28].
Underlying the probabilistic approaches is the characterisation of the set of attributes as random variables described by a multivariate probability distribution which is estimated, at least implicitly, from the known values. However in both [27] and [28] the computational problem is made tractable by using binary variables and by adopting a pairwise Markov Random Field Model. The latter places quite strong constraints on the form of the multivariate distribution restricting it to a product of univariate and bivariate ‘potential’ functions [29]. In addition, the assortativity [27] of the associated social network defines a neighbourhood structure which further restricts the form of distribution with consequent restrictions on the conditional independencies between the attributes [29].
Even with their simplifying assumptions these are not simple, straightforward procedures either conceptually or computationally. To be clear, we are not suggesting that social engineers in general have the capacity to utilise these techniques to infer missing information even where the data was available. Instead we develop the argument below that the social engineer will make these inferences analogically based on a broad understanding of the socioeconomic context and human nature.
To formalise these ideas we use the terminology of Clarke [30], which distinguishes between an entity and an identity. An entity is something in the world, not necessarily human, which can present many different identities depending on the particular role that entity is playing in the relevant context. Whereas the social engineer’s intended victim is an entity, it is that entity’s role in transacting with the organisation represented by the human target that is the social engineer’s focus.
Both the social engineer and the target are dealing with the particular identity presented by the entity, comprising that subset of the entity’s attributes required for the role. Identities are distinguished by the values that those attributes have in each particular case, those values providing the basis for identification.
For identification purposes, the target will have access to a record associated with the victim in the form of a digital persona which Clarke defines as “(t)he collection of data stored in a record (which) is designed to be rich enough to provide the record-holder with an adequate image of the represented entity or identity” [30]. Bearing in mind that the digital persona is, by design, a limited view of the victim, the key aspect of this is the social engineer having a sufficiently plausible profile of the victim to convince the target that the social engineer is indeed the victim.
Unfortunately this is made increasingly easy by the eagerness of people in general to place personal details in specific profiles on the various forms of social media. However, even hiding identities behind handles is an increasingly flimsy defence [1] so that a social engineer is increasingly able to acquire key elements of the identity relevant to the attack.
An important component of the social engineer’s skill set is collecting these identity elements from a variety of public sources. Nevertheless, in general, publicly available information will be insufficient for plausibility, requiring supplementary information to flesh out the persona giving the target the sense that they are indeed communicating with the victim. For this to succeed the supplementary information need not be accurate but merely consistent with the digital persona.
One of the key proposals of this paper is that this supplementary information can be derived by the social engineer, in part, by inference from already obtained information, perhaps by formal statistical techniques or, more than likely, by using general background knowledge of existing socio-economic patterns. Indeed we assume that astute social engineers are adept at observing and exploiting patterns in the socio-economic events surrounding them. Experimental psychology has demonstrated a clear connection between explanation, on the one hand, and learning and inference on the other [31][32], so we assume, in addition, that those patterns which have an explanatory context will predominate.
The implication is that the social engineer makes inferences in an explanatory framework. Based on the close psychological connection between explanation and analogy [33], we believe that this is best categorised as a form of analogical inference. For example a young person living in a country town is in an analogous situation to many young people living in many country towns. It can then be inferred that a particular young person is very likely to be unemployed because the economies of many country towns are contracting offering limited employment opportunities to young people as is well known.
The essential point is that it is this analogous explanatory framework which lends plausibility, in the mind of the target, to the inferred information used in perpetrating the fraud rather than the formal statistical properties of the publicly available information. Consequently we do not require social engineers to have access to a formal database, nor to have the skills to analyse the data even if they did. Nevertheless the social engineer will be well aware informally of those statistical properties which will form part of the motivation for drawing the inferences depending on the perceived level of statistical relevance.
The notion of statistical relevance in its various forms has played a role in the philosophical analysis of explanation in a scientific context since [34]. In experimental psychology the relationship between explanatory power and statistical relevance has been demonstrated using simple measures [35]. At a more theoretical level, Schupbach and Sprenger [36] characterise statistical relevance in terms of the degree to which one proposition H (e.g. a hypothesis) makes a second proposition E (e.g. empirical evidence) less surprising. They then use this in a Bayesian context as part of their definition of a measure of explanatory power which Schupbach [37] has demonstrated successfully accounts for the judgements of explanatory power made by subjects in a series of psychological experiments.
All of this suggests that astute social engineers will generally base their analogical inferences on informal assessments of statistical relevance derived from observation of socio-economic patterns. However, instead of attempting to describe and analyse analogical inferences, we focus on the more easily defined task of analysing the underlying probabilistic structure that the psychological evidence suggests reflects and supports analogical inference. While the explanatory aspects of the type of analogical inference referred to here have a strong causal flavour, we will regard probabilistic information as purely associative having no sense of causality [38]. However we recognise that this creates the potential for false positives.
By analysing the probabilistic structure of the publicly available information we can anticipate the drawing of analogical inferences, and a key outcome of our research is an indicator of the degree to which a dataset is susceptible to inference. This indicator can be exploited to design countermeasures that reveal, and warn of, the patterns of association that underlie the analogical inference, alerting potential targets to the possibility that information that they might be presented with is unreliable.
II-D Contributions of the Paper
We have discussed the critical role played by information in Sections I-B and II-B, with an essential part of the attack preparation being the careful assembly of as much detail about the attack victim and context as is feasible [12] [4] [2] [5]. This information is used to construct the pretext, which is an essential component of a social engineering phishing attack. The Heuristic-Systematic (HS) model of information processing developed in the Information Sciences has previously been used in [23] as a basis for analysing the interaction between attacker and target.
However, because the attack occurs in the context of risk, in this paper we extend that analysis by adopting the Risk Information Seeking and Processing (RISP) model [25] which incorporates the HS model. An important component of the RISP model is information insufficiency which acts as the primary motivator for seeking and processing further information. In Section II-B we invoke the RISP model to argue the importance of a sufficiently convincing pretext to reduce the target’s information insufficiency level to the point where the target avoids the systematic mode of processing. The target then does not subject the available information to the more detailed analysis which might raise doubts in the target’s mind about the veracity of the attacker’s claims.
Our underlying thesis is that the details revealed by the attacker’s careful assembly of existing information are generally insufficient for building a convincing pretext. However, they can be sufficient for the social engineer to infer probabilistically enough additional details to persuade the target that the claims being made in the pretext are valid, provided the target processes the claims heuristically. Consequently a seemingly convincing pretext may be less substantial than it appears because it is based partly on inferred attribute values. This raises the question of whether this situation can be detected and, if so, whether the degree to which a pretext depends on inference can be estimated. We seek to answer this question in this paper.
Our primary goal, then, is to devise a means of advising potential targets, particularly in informal database environments, that certain combinations of attributes might be unreliable for identification because the values of some are inferable from others. In other words the social engineer is likely to have inferred some attribute values thus appearing to be the victim who of course would know those values. We believe that our approach of using indications of inference as the basis for advising targets of the need for caution has not previously been reported in the literature.
Note that this is quite distinct from describing methods for actually drawing inferences, using, for example, the conceptually and computationally complex processes described in [26], [27], and [28]. Instead, in Section III-A below we develop a method of estimating the potential of a data set for inference to be performed without actually computing the inferences thus avoiding much of the complexity.
In the following we proceed from the observation that inference relies on relationships between the attributes where, in this case, the relationships are not only explanatory, thus supporting analogical inference, but are also manifested statistically enabling complementary probabilistic inference. The principal theoretical focus of the paper, then, is on the identification of attributes where unknown attribute values can be estimated by probabilistic inference from some database using the known values of other attributes.
Like [17] and [19], we assume the existence of a database derived from some defined population from which can be extracted a multidimensional contingency table in which the cells of the table are indexed by an ordered index vector. Each cell contains the number of members of the population with the combination of attribute values referenced by the index, so the table can be conceptualised as an unnormalised discrete multivariate probability distribution. However, instead of using the posterior joint distribution [19] directly or the posterior marginal distribution [17], we employ a log linear transformation of the prior distribution to analyse the probabilistic relations between subsets of attributes which give rise to the structure in the table.
Because inferences are drawn from prior information we focus on the contingency table equivalent of a conditional probability distribution which we refer to as a conditional subtable. We will assume inference is immediate in the sense that the conditional distribution is dominated by, i.e. concentrated on, a particular combination of attribute values. These can then be taken as the unknown values with high likelihood.
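To make this notion of immediate inference concrete, the following minimal sketch (in Python, with an invented three attribute table; all names and counts are illustrative assumptions, not data from the paper) slices the table on the known attribute values, normalises the slice into a conditional distribution, and reads off the dominant combination of the remaining attributes.

```python
# Minimal sketch with invented data: slice a contingency table on the known
# attribute values, normalise the slice, and read off the dominant combination
# of the unknown attributes (the 'immediate' inference described above).
import numpy as np

rng = np.random.default_rng(0)
L, d = 3, 3                                    # 3 attributes, 3 levels each (illustrative)
table = rng.integers(1, 5, size=(L,) * d).astype(float)
table[2, 1, 0] += 200                          # one combination dominates when attribute 0 = 2

known = {0: 2}                                 # attribute 0 is known to have level 2
idx = tuple(known.get(a, slice(None)) for a in range(d))
conditional = table[idx]                       # conditional subtable over the unknown attributes
conditional = conditional / conditional.sum()  # counts -> conditional probability distribution

dominant = np.unravel_index(conditional.argmax(), conditional.shape)
print("most likely unknown levels:", dominant, "with probability", conditional.max())
```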
We develop a novel computational procedure for detecting particular subsets of attributes which have conditional subtables dominated by a small set of values without having to analyse each individual subtable. The basis of this procedure is the observation that a uniform distribution is completely uninformative and that, in order for probabilistic inference to be viable, a distribution must exhibit significant deviation from uniformity. This is because of some property of whatever mechanism is generating the cell contents - a mechanism that causes some cells to be significantly more probable than others.
In other words these cells are salient and we measure salience as a deviation from uniformity. Roughly speaking we argue that a high level of salience in a subset of attributes results in the multivariate conditional subtable of those attributes being concentrated on just some small number of the attribute levels in the subtable. This does not require that the dominant cells have any topological relationship. In particular it does not require that they be neighbours so that standard measures based, for example, on second moments, are not suitable for our purpose.
In Section III-B we develop the vector space theory underlying our analysis, utilising the orthogonal log linear transformation introduced by Dahinden et al. [39]. Importantly, this is based on projecting the vector representing the logarithm of the table onto a subspace orthogonal to the uniform vector. After analysing the orthogonalised log linear design matrix we state and prove Theorem III.1 in Section III-D. This novel result enables us to reduce a large contingency table to one involving a subset of attributes, using geometric means as a form of marginalisation, while preserving salience as much as possible.
Theorem III.1 also provides the basis for our definition of a novel indicator, Probabilistic Salience, of the degree to which statistical data can support an inference that particular attributes have specific values (Definition III.2). We describe a means of evaluating that indicator from the data and demonstrate how it can identify those subsets of attributes in a contingency table which are most vulnerable to inferencing. Then, in Section III-E, we show why Probabilistic Salience does indeed measure the degree to which a subset of attributes is dominated by a relatively small number of members.
Most of Section III-B1 as well as the rest of Section III-B report the results of research which, to the best of our knowledge, is original. We believe that this work also makes a significant contribution to the contingency table literature because we were unable to find anything there that investigated this type of problem.
We envisage a novel system which employs Probabilistic Salience to generate a warning when some of the attribute values being claimed by a social engineer might have been inferred from others without having to classify them. The warning would take account of the level of any formal authentication provided and could be as simple as a red, amber, green traffic light indication. While this system would be most easily implemented in a text based dialog we see nothing in principle which would prevent speech recognition being used in a verbal dialog. Such a system could complement the topic blacklist system described in [11] as well as that referred to in [12] and might form a component of a more comprehensive intelligent assistant for combatting social engineering attacks in real time.
The central objective of our proposed probabilistic salience based warning system is to increase the target’s information insufficiency level by inducing the target to derate the inferred attributes thereby reducing the target’s current knowledge level. If the inferred content is large enough, the information insufficiency level will increase to the point where the target transitions to an information processing mode dominated by systematic processing.
In this mode the target could be expected to take a more deliberative and analytic approach to evaluating the attacker’s claims and expose the inevitable flaws and inconsistencies leading to a rejection of the attack. It is not intended that the system directly intervene in the attack by, for example, preemptively terminating the contact.
Investigating the conditions under which probabilistic inferencing can be performed, and assessing the potential for probabilistic inferences to be drawn is, to the best of our knowledge, a novel approach which this paper brings to the understanding of social engineering. Furthermore this novel approach is able to draw upon the understanding of human information processing being developed in the information sciences to begin constructing techniques for actively counteracting social engineering. We suggest that this is an example of a potentially fertile area of research applying the characteristics of human information processing in the context of risk to the development of effective countermeasures to social engineering.
Finally in Section V we propose a potential application of Probabilistic Salience and Theorem III.1 in what we refer to as ‘de-personalisation’ of a data set by analogy with the de-identification techniques used in data privacy. The objective here is to inhibit probabilistic inference by selectively reducing the data’s Probabilistic Salience without excessively compromising its utility.
III Theory
III-A The Multidimensional Contingency Table
The general problem area is that of d-dimensional contingency tables [40] formed from an ordered set of d categorical variables, i.e. attributes. Each variable consists of a set of values or levels such that, for example, the kth variable comprises L_k levels. There are then L_1 × … × L_d cells in the table, each being identified by an ordered index set
where each index ranges over the levels of the corresponding attribute, so that each attribute is directly associated with one index. Each cell contains the number of members of the population which exhibit the joint occurrence of the attribute levels. However, for notational convenience in what follows, there will be the same number of levels, L, in each attribute so that the table contains L^d cells.
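A minimal sketch of this construction, assuming the attribute values have already been coded as integer levels 0, …, L−1 (the records shown are invented), accumulates the L^d cell counts directly:

```python
# Minimal sketch: accumulate the L**d cell counts of a d-dimensional contingency
# table from records whose d attributes are coded as integer levels 0..L-1.
import numpy as np

d, L = 3, 3
records = np.array([[0, 1, 2],      # invented population records, one row per member
                    [0, 1, 2],
                    [1, 0, 2],
                    [2, 2, 0],
                    [0, 1, 1]])

table = np.zeros((L,) * d)
for row in records:
    table[tuple(row)] += 1          # joint occurrence of the d attribute levels
assert table.sum() == len(records)  # cell counts sum to the population size
```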
We regard the table as a geometric entity, in this case a hypercube of L^d cells. To provide flexibility in associating a vector of attribute values with the indices of the corresponding cell, we adopt the convention of representing the kth ‘axis’ of the hypercube by the standard basis vector in which the kth element is ‘1’ with the remainder zero. This enables the set of attributes to be referenced by the index matrix
(1)
If the levels of the attributes designating a particular cell are collected in a vector, the indices of the cell are given by
(2)
The index matrix construct (1) also provides a mechanism for dealing with subtables. If some subset of the attributes is selected, the corresponding attribute index vector collects their indices, where there is no implication that these integers are contiguous. Using this vector as a key provides the subtable index matrix
(3)
The contingency table is assumed to be a member of an ensemble of similar tables with the common characteristic that they represent the same population, so that their respective entries sum to the same population size. More formally, this is equivalent to assuming that each table is drawn from a multinomial distribution, although we will not make use of this fact. These tables can be represented as vectors by ordering the table indices according to some rule. Here we choose a lexicographic ordering in which one index varies the most rapidly, followed by the next, and so on, resulting in a table vector.
Much of the literature on contingency tables is concerned with the estimation of the cell contents from incomplete data. However, to avoid being distracted by estimation procedures, important though they are, we will assume that the data in the cells is a reliable representation of the population. The only complication is presented by zero entries in one or more cells but we will take a pragmatic approach here and replace zeros by ones by applying an affine transformation to the table with minimal impact on the conclusions for reasons that will become apparent below.
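A short sketch of this preprocessing follows; the choice of which index varies fastest is an assumption used consistently across these sketches.

```python
# Sketch of the preprocessing just described: flatten the table into a vector under
# a lexicographic ordering (here numpy's C order, last index fastest -- an assumed
# convention), replace empty cells by ones, and apply the cell-by-cell logarithm.
import numpy as np

L, d = 3, 3
rng = np.random.default_rng(1)
table = rng.integers(0, 10, size=(L,) * d).astype(float)  # hypothetical cell counts

t = table.ravel(order="C")           # the table vector
t = np.where(t == 0, 1.0, t)         # pragmatic handling of empty cells before the log
y = np.log(t)                        # the log table vector used in the analysis below
```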
III-B The Log Linear Transformation
The log linear transformation is based on the cell by cell logarithmic transformation of a contingency table vector to generate a new table of log counts. In its general form the log linear transformation is [41]
(4)
where each of the functions in the expansion is parameterised by a subset of the attributes.
In what follows it is essential to impose some form of ordering on the set of subsets, i.e. on the power set of the set of attributes. In the ordering adopted here, the first term corresponds to the null set, representing the constant term. Next comes the group of singleton subsets representing the main effects, then the subsets of two attributes and so on. The terms involving two or more attributes are referred to as ‘interaction terms’ because they represent probabilistic interactions between the members of the subset [41][29], beginning with first order interactions between pairs of attributes. Within each group the subsets are ordered following the lexicographic principle applied to their members’ indices. For example, for the first order interactions, the ordering is
(5)
Clearly, defining the functions, each of which takes the form of a vector over the table cells, is a critical task in constructing the expansion. The key development on which the process described here is constructed has been introduced by Dahinden et al. [39], where they express each function as a linear combination of a set of basis vectors which span the subspace containing it. There is one basis vector for each particular combination of the levels associated with a subset of attributes. The number of such basis vectors, i.e. the dimension of the subspace, is determined by the number of possible configurations of the levels, which is L raised to the power of the number of attributes in the subset. Each vector can then be obtained from the data itself by determining the coefficients of the linear combination.
In matrix form, the log-linear model is
(6)
where the design matrix collects the basis vectors and the coefficient vector collects the coefficients of the corresponding linear combinations.
The basis vectors of a particular function then form a subset of the column vectors of the design matrix, i.e. a submatrix, where each submatrix represents one of the subsets of attributes involved in a particular level of interaction. Dahinden et al. label the submatrices so generated according to their attribute subsets, where the elements of each submatrix are either zero or one. The algorithm for generating the basis vectors is given in [39] but the essential idea is described below. However, because the detailed form of the design matrix is a critical part of our development, we examine its construction in more detail in Section III-C.
To proceed, it is necessary to introduce some additional notation. Firstly, let c be the interaction index, 0 ≤ c ≤ d, so that c = 0 indexes the constant term in (4), c = 1 the main effects and so on. For each value of c, the number of submatrices is the binomial coefficient C(d, c) and the number of columns in each submatrix is L^c, one column for each of the possible combinations of levels in the set of c attributes. Keep in mind that each cell in the table is denoted by a specific index vector, so that the position of the corresponding component of the table vector is denoted by
(7)
where the position is the L-ary number formed from the index vector expressed in radix L form. Note that this is simply the operation of counting through the elements of the table vector with radix L.
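The radix correspondence in (7) can be sketched as follows; which index is taken as most significant is an assumption, chosen to match the flattening convention used in the other sketches.

```python
# Sketch of the radix-L correspondence in (7): the position of a cell in the table
# vector is the L-ary number formed from its index vector (leftmost index most
# significant here, matching the C-order flattening used in the other sketches).
import numpy as np

L, d = 3, 4

def to_position(iv):
    # iv = (i_1, ..., i_d), each entry in 0..L-1
    return int(np.polyval(iv, L))

def to_index_vector(j):
    digits = []
    for _ in range(d):
        j, r = divmod(j, L)
        digits.append(r)
    return digits[::-1]

assert to_position([2, 0, 1, 2]) == 2 * L**3 + 0 * L**2 + 1 * L + 2
assert to_index_vector(to_position([2, 0, 1, 2])) == [2, 0, 1, 2]
assert to_position(to_index_vector(59)) == 59      # the correspondence is a bijection
```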
Consider one specific value of c and one specific submatrix, corresponding to one particular subset of c attributes. The attributes in the subset are designated by an ordered vector of attribute numbers, so that, for example, the first entry designates the leftmost attribute in the subset. Then each column of the submatrix represents one particular combination of the levels of that subset of attributes. Note that there is no implication that the attribute numbers are contiguous which, in general, they are not, as indicated by the example (5).
Each column of the submatrix is therefore associated with one particular combination of levels of the attributes in the subset. The remaining table axes form a complementary vector of attributes. Using (7), the elements of the column vector which share that combination of levels are designated by the table index vector, from (2) and (3),
(8)
where the indices of the remaining attributes range over all combinations of their levels.
To define the basis vector, recognise that the available information does not allow the elements indexed in this way to be distinguished. Consequently, the column becomes a basis vector by setting those elements to unity with the remaining elements zero.
The lexicographic ordering is a bijective mapping, so that if there is a one at a particular position in one column of the submatrix, there cannot be a one in the corresponding position in any of the remaining columns. Consequently, the columns of the submatrix in question, i.e. the basis vectors, are orthogonal. Furthermore, because the total number of ones in the submatrix is L^d, the sum of its columns is equal to the unit vector.
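A sketch of this indicator construction (using the subset and cell orderings assumed in the earlier sketches) makes the orthogonality and sum-to-unit-vector properties easy to verify numerically:

```python
# Sketch of the indicator construction described above: for a chosen subset s of
# attributes, build one column per combination of levels of the attributes in s,
# with a one in every cell whose indices match that combination.
import itertools
import numpy as np

L, d = 3, 3
cells = list(itertools.product(range(L), repeat=d))          # lexicographic cell order

def submatrix(s):
    combos = list(itertools.product(range(L), repeat=len(s)))
    X_s = np.zeros((L**d, len(combos)))
    for col, combo in enumerate(combos):
        for row, cell in enumerate(cells):
            X_s[row, col] = all(cell[a] == v for a, v in zip(s, combo))
    return X_s

X_02 = submatrix((0, 2))                                     # a first order interaction
assert np.allclose(X_02.sum(axis=1), 1.0)                    # columns sum to the unit vector
assert np.allclose(X_02.T @ X_02, np.diag(X_02.sum(axis=0))) # columns are mutually orthogonal
```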
The full matrix then has
(1 + L)^d
columns, far more than its L^d rows, so that (6) is very underdetermined. Conventionally the approach to this problem is to impose ‘identifiability’ by removing the null space of the design matrix using, for example, the Moore-Penrose pseudo inverse. However this approach results in losing the explicit connection between the coefficient vector and the interaction terms.
Fortunately Dahinden et al. [39] have derived an orthogonal form of the design matrix which leads to a coefficient vector that can be related to the interaction terms in (4). The essential idea is as follows. Using a modified version of the notation in [39], the vector space spanned by the constant vector has dimension 1. Then the vector space spanned by the columns of a main effect submatrix is composed of this constant space and its orthogonal complement, the orthogonal complement having dimension L − 1. Similarly, the space spanned by a first order interaction submatrix is composed of the spaces associated with its constituent main effects and the constant term, together with an orthogonal complement which has dimension (L − 1)^2.
For interaction index c there are C(d, c) spaces of this form, one for each distinct subset of c attributes taken from the set of attribute indices. The space spanned by the columns of the corresponding submatrix is composed of the subspace spanned by the submatrices of all of its proper subsets and its orthogonal complement within the larger space. This orthogonal complement has dimension
(L − 1)^c (9)
Beginning with the constant vector, the orthogonalised form of the design matrix is derived by successively projecting the columns of each submatrix onto the orthogonal complements described above. The orthogonalised matrix is then constructed from the constant vector and the columns of the orthogonalised submatrices following the ordering described above. Consequently the number of columns is
1 + Σ_{c=1}^{d} C(d, c)(L − 1)^c = L^d (10)
so that the orthogonalised design matrix is a square matrix.
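One way to realise this construction numerically is sketched below. It is not claimed to be the exact algorithm of [39]; it is a variant that projects each submatrix onto the orthogonal complement of everything built so far and retains an orthonormal basis of each projected block (whereas the construction described above keeps unnormalised basis vectors, with their squared magnitudes collected later in a diagonal matrix).

```python
# Sketch of the orthogonalisation: starting from the constant vector, project each
# submatrix onto the orthogonal complement of the columns accumulated so far and
# retain an orthonormal basis of the projected block.  The column count comes out
# to L**d, i.e. the orthogonalised design matrix is square, as in (10).
import itertools
import numpy as np

L, d = 3, 3
cells = list(itertools.product(range(L), repeat=d))

def submatrix(s):
    combos = list(itertools.product(range(L), repeat=len(s)))
    X_s = np.zeros((L**d, len(combos)))
    for col, combo in enumerate(combos):
        for row, cell in enumerate(cells):
            X_s[row, col] = all(cell[a] == v for a, v in zip(s, combo))
    return X_s

def orthogonal_block(done, block, tol=1e-10):
    residual = block - done @ np.linalg.lstsq(done, block, rcond=None)[0]
    u, svals, _ = np.linalg.svd(residual, full_matrices=False)
    return u[:, svals > tol]                      # basis of the new orthogonal complement

X_tilde = np.ones((L**d, 1)) / np.sqrt(L**d)      # normalised constant (uniform) vector
subsets = [s for c in range(1, d + 1) for s in itertools.combinations(range(d), c)]
for s in subsets:                                 # ordered by interaction index, then lexicographically
    X_tilde = np.hstack([X_tilde, orthogonal_block(X_tilde, submatrix(s))])

assert X_tilde.shape == (L**d, L**d)              # square: 1 + sum_c C(d,c)(L-1)**c == L**d
assert np.allclose(X_tilde.T @ X_tilde, np.eye(L**d))
```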
III-B1 Constructing the Functions
In general, with the orthogonalised design matrix the log-linear model becomes, from (6),
(11)
where the coefficient vector is the modified vector of structural parameters. The log table vector therefore exists in a space which decomposes as the orthogonal direct sum
(12)
of the constant subspace and the subspaces associated with the subsets of attributes, taken in the lexicographic ordering specified above.
Unfortunately, whereas the elements of the coefficient vector in (6) are associated directly with the corresponding interaction terms in (4), those in (11) are not. For example, the elements corresponding to the first order interactions in (6) describe the structure of the interaction graph, while those associated with the first order interactions in (11) do not, at least not directly. This is because the coefficients associated with a given submatrix in (11) are the coefficients of the basis functions spanning the corresponding orthogonal complement subspace.
However, because the orthogonalised matrix is partitioned into submatrices whose columns are the orthogonal basis vectors of the subspaces, the coefficient vector can be partitioned into corresponding subvectors as
(13)
Then (11) becomes
(14)
Now each term of the outer summation in (14) does refer to a specific interaction, namely the interaction between the members of the associated subset of attributes.
To connect with the functions of (4), recognise that the vector representing each function exists in the subspace spanned by the basis vectors forming the corresponding submatrix in (11), and that the associated subvector of coefficients belongs to that function. In other words the function is represented by the linear combination
(15)
Because the underlying subspaces are orthogonal, this vector is orthogonal to the vectors representing the other functions.
Exploiting the fact that the columns of the orthogonalised design matrix are mutually orthogonal, the coefficient vector is obtained by premultiplying (11) by the transpose of the design matrix to give an expression involving a diagonal matrix containing the squared magnitudes of the basis vectors, which is readily inverted. Then
(16)
and
(17)
where the diagonal matrix in (17) contains the squared magnitudes of the basis vectors in the relevant subspace. From (15)
(18)
so that
(19)
because of orthogonality. This shows that the matrix in (18) is self adjoint and idempotent so that it is in fact a projector onto the subspace in question. Consequently the magnitude of the function is the magnitude of the projection of the log table vector onto the subspace in which the vector representation of the function exists. In a sense it describes the magnitude of the contribution the population data in the table makes to the interaction represented by that subset of attributes.
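Numerically, these interaction magnitudes can be obtained without forming the orthogonalised design matrix explicitly: project the log table vector onto the span of a subset's indicator columns and subtract its projection onto the span of all proper subsets, which is exactly the projection onto the orthogonal complement subspace discussed above. A sketch with invented counts:

```python
# Sketch: the magnitude of the interaction term for a subset s equals the norm of
# the projection of the log table vector onto the orthogonal complement subspace,
# computed here as (projection onto span of s's indicator columns) minus
# (projection onto the span of the indicator columns of all proper subsets of s).
import itertools
import numpy as np

L, d = 3, 3
rng = np.random.default_rng(2)
y = np.log(rng.integers(1, 20, size=L**d).astype(float))     # invented log table vector
cells = list(itertools.product(range(L), repeat=d))

def indicator_columns(s):
    if not s:
        return np.ones((L**d, 1))                             # the constant term
    combos = list(itertools.product(range(L), repeat=len(s)))
    X_s = np.zeros((L**d, len(combos)))
    for col, combo in enumerate(combos):
        for row, cell in enumerate(cells):
            X_s[row, col] = all(cell[a] == v for a, v in zip(s, combo))
    return X_s

def project(basis, v):
    return basis @ np.linalg.lstsq(basis, v, rcond=None)[0]

def interaction_magnitude(s):
    lower = np.hstack([indicator_columns(p)
                       for c in range(len(s))
                       for p in itertools.combinations(s, c)])
    return np.linalg.norm(project(indicator_columns(s), y) - project(lower, y))

for s in [(0,), (1,), (2,), (0, 1), (0, 1, 2)]:
    print(s, round(interaction_magnitude(s), 4))
```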
III-C The Structure of the Design Matrix
To explore the overall form of the design matrix we return to the construction of the original form and the discussion following (7). The columns of each submatrix are numbered left to right by a radix L number formed from the corresponding combination of attribute levels, i.e. the first column corresponds to all of the subset’s attributes at their first level, the second column increments the least significant attribute by one level, and so on. There are therefore L^c columns in the submatrix, each generated from the preceding column by counting in radix L.
Lemma III.1.
A column of the submatrix has the form
(20)
Proof.
The column is initially all zeros, the table index is and the index of the table vector is . Now begin incrementing the table index in a counting operation. A count of results in , a count of sets and so on until a final count of will set . At this point the table index points to the first cell where the attributes in have the values given by and a one is entered in row (remembering that the row numbering starts at zero) of column . There are then a total of zeros preceding this one in the column.
A further count of is required before goes to so that there are consecutive ones in the column. This is followed by a sequence of zeros before all of the indices are reset to zero. This whole first level pattern is then repeated until goes to i.e. the pattern of zeros, ones and zeros occurs times.
This set of repeated first level patterns extends the zeros from the initial count and this second level pattern is completed by the sequence of zeros needed to set all of the indices to zero.
Continuing the count keeps repeating the second level pattern until so that there are repetitions of the second level pattern. A third level pattern is then completed by the sequence of zeros needed to set all of the indices to zero. Finally there is a th level pattern with repetitions of the th level pattern until the count is complete. ∎
Equation (20) can be expressed in the alternative form
(21)
Now consider the orthogonalisation process. Taking a column of the submatrix, the first step obtains the projection of the column vector onto the orthogonal complement of the normalised constant vector as
(22)
Note that because (22) operates on rows only and the projection is onto the uniform vector, the hierarchical sequence of patterns in (21) is preserved in (23).
Using a low dimensional case to illustrate the process, the orthogonalisation represented by (22) to (23) yields a first projected column. Then, the projection of this onto the orthogonal complement of the subspace spanned by the previously constructed vectors is obtained by again applying (22). Finally, projecting onto the orthogonal complement of the enlarged subspace yields
(24)
The form of (24) and numerical calculations suggest the generalisation to an arbitrary number of attributes
(25)
Lemma III.2.
A column vector of the orthogonalised submatrix is given by (25).
Proof.
Note that (25) is composed of basic units each of length each individual unit having identical components. These are grouped into a sequence of first level modules each containing components such that each of these modules sums to zero. In turn these first level modules are grouped into second level modules each containing components and so on. This hierarchical construction is continued eventually resulting in th level modules each containing components. There are then of these level modules containing a total of components. Each module is composed of two submodules and such that
(26)
Orthogonality of (25) over the vectors can be seen by considering two values of say and . If the length of the first level modules in vector is a multiple of the length of the first level modules in so that the components of are uniform over every first level module of . Consequently, in the scalar product of the two vectors, every first level module in the vector sums to zero so that the vectors are orthogonal.
If , there must be some where . Suppose . For each th module, in the scalar product of and , the submodule is multiplied component by component by the submodule . However each of the submodules is also multiplied by . From (26),
but there are of the product components so that the scalar product contribution of each ()th module is zero and the vectors are orthogonal. Clearly this is true also when is any multiple of . In fact this argument is independent of so that all of the vectors , including the main effects, are mutually orthogonal.
For the column vector lies in the orthogonal complement of the uniform vector and . Explicit derivation of the case and numerical calculations suggest that the th second level module has the form
(27)
Note that this has the same modular structure as (25) and again the first level modules sum to zero. Indeed the structures are sufficiently similar that the above orthogonality arguments can be invoked to show that (27) is orthogonal to (25) for all as well as to the corresponding for .
III-D Conditional Subtables and their Geometric Means
Each term of the log linear expansion (4) involves a subset of the attributes, which raises the question of how to handle the remaining attributes. While marginalisation is the common approach, the nonlinear transformation makes the result difficult to interpret. Instead we adopt the alternative approach in which the members of the subset determine a subtable conditioned on the remaining attributes.
Recall from Section III-B that the attributes associated with a submatrix are designated by an ordered vector of attribute numbers, with the remaining attributes designated by a complementary vector. Via (7), the column vectors of the submatrix have exactly the same indexing structure as the contingency table so, from (8), we can interpret an index vector of that form as indexing a subtable. This leads to:
Definition III.1.
Let the levels of the conditioning attributes (the remaining attributes above) be fixed whereas the levels of the attributes associated with the submatrix are allowed to range over all of their combinations. Then the conditioning attributes specify a conditional subtable with indices given by, from (8),
(28)
where the subscript indicates that the index set is associated with a particular combination of the conditioning attribute levels.
Consequently, it is the conditioning attributes which are associated with the bottom level modules in the modular hierarchy of (25) and (27) with (21) ensuring that the elements in each of the bottom level modules are equal. Because of the row by row operation of the orthogonalisation process, the equality of the elements of these components is preserved by the orthogonalisation process even if the values of the elements end up differing from module to module. This is evident from (25) and (27). Furthermore each bottom level module represents one particular combination of attribute values.
We then have the following Lemma.
Lemma III.3.
Let be the orthogonalised design matrix and let be the table of log transformed geometric means, over the conditioning attributes , of corresponding entries in the conditional subtables . Then, the magnitude of the projection of onto the subspace , , is given by
(29)
where, from (19), is the magnitude of the projection of the originating table vector onto the subspace .
Proof.
It is clear from (8) and the construction of the basis vectors that the ’s in (20) constitute an indicator vector for the conditioning indices
(30) |
associated with the th combination of conditional attribute values as ranges over the combinations of the conditioning attributes. The scalar product of the th column of and the table vector is composed of the sums of the individual scalar products of the subtables and the corresponding , . However, because all of the elements of the latter are equal, each of the individual scalar products is simply the sum of the values in the particular subtable multiplied by one of the elements of .
The values of then generate the elements of an dimensional column vector by sampling the column vector at the entry indexed by (30) for every . This structure is shared by all columns of the submatrix so the resultant vectors can be collected into the submatrix . In fact it is clear from the proof of Lemma 20 and (21) that the sampling is performed by eliminating the repetition as the row index count proceeds from to which requires the indices and in the reduced matrix ensuring .
With , this enables the magnitude of the function to be expressed as, using (19),
(31)
Inverting the transform, the summation term in (31) becomes
(32)
which is the geometric mean of the conditional subtable over the conditioning attributes . ∎
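The table of log geometric means that appears in the Lemma can be formed directly by averaging logarithms over the conditioning attributes. The sketch below assumes the same hypothetical four-attribute array as above, with the last two attributes conditioning; it illustrates only this computation, not the paper's design matrices.

import numpy as np

rng = np.random.default_rng(1)
table = rng.integers(1, 20, size=(3, 4, 2, 5)).astype(float)  # A, B, C, D

log_table = np.log(table)
# Averaging the logs over the conditioning axes gives the table of log
# geometric means; exponentiating gives the geometric-mean subtable itself.
log_geo_mean = log_table.mean(axis=(2, 3))       # 3 x 4 table of log geometric means
geo_mean = np.exp(log_geo_mean)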
This leads to our main theoretical result:
Theorem III.1.
Let be the design matrix and be the log geometric mean of the conditional subtables over the conditioning attributes . Then the magnitude of the projection of onto the subspace for is given by
(33)
Furthermore the magnitude of the projection of the original table vector onto the dimensional subspace , , orthogonal to the uniform vector is given by
(34)
where is the magnitude of the projection of onto the dimensional subspace , orthogonal to the uniform vector.
Proof.
From (20), the number of s in a pre-orthogonalised column of is which is the number of combinations of the conditioning attributes . For , attributes have been changed from conditional to conditioning so there are sets of attributes and, for each set, conditioning attributes. Because of the additional conditioning attribute, the set of s is expanded by a factor of in a pattern determined by the form of (20). These s are transformed to equal elements in the orthogonalisation process.
As in the proof of Lemma III.3, the scalar product of and a column of the submatrix is composed of sums of terms from each sum multiplied by an identical element of the submatrix column. Then
With this gives (33). Equation (34) then follows because the orthogonality of the subspaces means that and are each given by the square root of the sum of the squares of the magnitudes of the respective projections onto the individual subspaces. ∎
The immediate implication of Theorem III.1 is that ‘marginalisation’ of a contingency table to reduce the number of attributes using a geometric mean preserves the interaction structure of the table up to the interactions between all of the conditional attributes.
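As a small numerical check of this implication, the sketch below uses a 2 x 2 x 2 table and ordinary zero-sum (+1/-1 effect-coded) contrasts in place of the paper's orthogonalised design matrix, an assumption made purely for illustration: the two-attribute interaction computed from the full table coincides with the one computed from the table of geometric means over the remaining, conditioning, attribute.

import numpy as np

rng = np.random.default_rng(2)
n = rng.integers(1, 50, size=(2, 2, 2)).astype(float)   # attributes A, B, C
log_n = np.log(n)

a = np.array([1.0, -1.0])     # effect codes for A
b = np.array([1.0, -1.0])     # effect codes for B

# A-B interaction contrast from the full table (averaged over all 8 cells).
u_ab_full = np.einsum('i,j,ijk->', a, b, log_n) / log_n.size

# Geometric-mean 'marginalisation' over C, then the same contrast on the
# resulting 2 x 2 table (averaged over its 4 cells).
log_geo = log_n.mean(axis=2)
u_ab_geo = np.einsum('i,j,ij->', a, b, log_geo) / log_geo.size

assert np.isclose(u_ab_full, u_ab_geo)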
III-E Probabilistic Salience
That the projection of a log table vector onto the subspace is orthogonal to the corresponding uniform vector suggests that the larger the magnitude of this projection is relative to the magnitude of the vector itself, the more the table entries are concentrated on a small number of cells; that is to say, the more salient those cells are. This motivates the following definition.
Definition III.2.
Let be a subset of the set of attributes and be a conditional subtable in the attributes .
- a) If is the magnitude of the projection of the log linear transform, , of onto the subspace orthogonal to the uniform subtable vector , then the Probabilistic Salience of the conditional subtable is
(35)
- b) Let be the geometric mean of all of the conditional subtables in generated by assigning values to the conditioning attributes and be its log linear transform. If is the magnitude of the projection of onto the subspace orthogonal to the uniform subtable vector then the Probabilistic Salience Function of the attribute subset is
(36)
Our proposal is that Probabilistic Salience is a measure of the degree to which a contingency table is dominated by, or concentrated on, a small number of attribute values, a proposal which we will vindicate in Section III-G below.
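The sketch below implements a salience-style measure in this spirit. Since the exact normalisation in (35) is not reproduced above, the ratio of the magnitude of the projection onto the complement of the uniform vector to the magnitude of the full log vector is assumed here, matching the motivation given before Definition III.2 but not necessarily the paper's precise formula.

import numpy as np

def salience(subtable):
    # Log-transform the (positive) subtable, project onto the orthogonal
    # complement of the uniform vector, and compare magnitudes.
    x = np.log(np.asarray(subtable, dtype=float).ravel())
    u = np.ones_like(x) / np.sqrt(x.size)        # unit uniform vector
    x_perp = x - (x @ u) * u                     # projection onto the complement
    return np.linalg.norm(x_perp) / np.linalg.norm(x)

# A concentrated table scores much higher than a nearly uniform one.
print(salience([[50, 1], [1, 1]]), salience([[12, 13], [14, 11]]))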
III-F Potential Application - Interaction Limiting
While the organisational countermeasures discussed in the Introduction provide one line of defence against social engineering attacks, we propose an alternative approach which employs the techniques developed in this paper in a different role to that which has been the focus thus far. It has been demonstrated multiple times that releases of statistics from de-identified databases can be processed to re-identify individuals. Differential privacy techniques introduce controlled noise into the data such that the resultant statistics have a bounded error but the risk of re-identification is significantly reduced. Application of these techniques to contingency table data is described in [43] using a very different form of log linear model to that employed here.
By analogy, the objective of what we refer to as ‘de-personalisation’ is to inhibit the inference of attribute values in vulnerable cases while retaining the overall character of the data. De-personalisation is performed by firstly calculating the log linear expansion coefficients using (16), then setting when for some . Finally the modified data is reconstructed using (14). This limits the interaction to that between attributes, in a sense, blurring the cell contents. We refer to this as interaction limiting the data by analogy with bandlimiting in Fourier transform based signal processing.
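A generic sketch of this interaction-limiting step is given below for a table of three binary attributes. An orthonormal Kronecker-product contrast basis stands in for the design matrices of (14) and (16); this substitution is an assumption made for illustration, not the paper's exact construction.

import numpy as np
from itertools import product

c0 = np.array([1.0, 1.0]) / np.sqrt(2)    # constant contrast
c1 = np.array([1.0, -1.0]) / np.sqrt(2)   # main-effect contrast

# Build the 8 orthonormal basis vectors and record each one's interaction
# order, i.e. how many attributes it involves.
basis, order = [], []
for bits in product([0, 1], repeat=3):
    vecs = [c1 if b else c0 for b in bits]
    basis.append(np.kron(np.kron(vecs[0], vecs[1]), vecs[2]))
    order.append(sum(bits))
B = np.stack(basis, axis=1)               # 8 x 8 orthogonal matrix
order = np.array(order)

rng = np.random.default_rng(3)
table = rng.integers(1, 50, size=8).astype(float)

coeffs = B.T @ np.log(table)              # log linear expansion coefficients
k = 2
coeffs[order > k] = 0.0                   # suppress interactions above order k
limited = np.exp(B @ coeffs)              # reconstructed, interaction-limited table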
Suppose and suppose the marginalised data is publicly released in the form of dimensional geometric mean subtables involving subsets of attributes. Theorem III.1 ensures that the maximum interaction order of the released data is limited to . It then follows from Definition III.2(b) via (36) that the Probabilistic Salience Function of the released data is less than it would have been had the data not been de-personalised.
However simple interaction limiting as described is a heavy-handed method of de-personalisation. A more subtle approach would be to use the Probabilistic Salience Function to detect subsets of attributes with high interactivity via their geometric means as demonstrated in the top left panel of Figure 1. The interaction structure of high interactivity subsets could then be investigated using the reduced design matrix with Theorem III.1 providing the link back to the full table. High-value interaction coefficients (Section III-B1) can be selectively set to zero, although some care is required to maintain the hierarchical nature of the expansion [41][42].
While this is necessarily a brief sketch of the proposal, we believe that the theory described here provides a solid foundation for a detailed investigation and development of de-personalisation techniques. However, this lies well beyond the scope of this paper.
III-G Probabilistic Salience as a Measure of Concentration
We make the basic assumption that the contingency table is a member of an ensemble of similar tables with the common characteristic that they represent the same population so that their respective entries sum to the population size say . More formally this is equivalent to assuming that each table is drawn from a multinomial distribution although we will not make use of this fact. Each table vector is then a point on a simplex, [45], and can be expressed in the form
where are the components of a parameter vector and the dimensional vertex vector so that
However, to avoid having to deal with issues caused by zero entries in the table, we will work with the vector derived from the affine mapping
of which results in a new simplex . This is the convex hull of its vertices where
and
(37)
It is easily verified that again
Then the log transformed vector has the form
(38)
with .
The above proposal can be validated by finding a vector such that the magnitude of the projection of onto the subspace orthogonal to the uniform vector is maximised subject to the constraints that the components of sum to and the components of (38) are positive. However, rather than solve this constrained optimisation problem directly we will tackle an approximation as defined below - an approximation which improves as increases.
The ensemble of vectors (38) can be represented in functional form by expressing, without loss of generality, the component as a function of the remaining components with the s as parameters. From (37),
(39)
which, with
leads to
(40)
Now the exponential function is convex and the sum of exponential functions is convex so the negative of the sum of exponential functions is concave [44]. Adding to the argument of the function retains its concavity so, because the function is itself concave, (40) is a concave function. The significance of this here is that the function (40) is topographically simple with no inconvenient local maxima and minima, no hidden valleys etc.
We can understand the ‘shape’ of this function by considering the transform of the vertices of the simplex which are described by vectors of the parameters having the form
Then, from (38), the transformed simplex vertices have the form,
suggesting that they be viewed as those vertices of a hypercube which are adjacent to the origin, with the hypercube having edges of length . Each vertex of the hypercube is described by a vector which has some number, , of components equal to with the remainder zero.
Furthermore if we consider the full set of parameter vectors, , having nonzero components all equal to for , we can see from (38) that the corresponding vector has the same number of zeros and that the nonzero components are
(41)
We can then associate each of these vectors with that vertex of the hypercube, , which has corresponding nonzero components.
Again without loss of generality, select a vertex other than the origin, say , and let the vector have non zero components. Now select any two of those components, say the th and th components thereby defining a coordinate plane containing the th and th axes of the hypercube. Consider the curve formed by the intersection of the function (40) with a plane parallel to the coordinate plane defined by the th and th axes and through the point in which the th and th components are and . We can assume that and . If one of them is or if , then the following argument is simplified slightly but comes to the same result.
Because at a vertex, is either or , from (39) is either , , or . The curve is then described, from (38) and (40), by
which leads to
(42)
The standard formula for the curvature of a plane curve $y(x)$ is $\kappa = y''/(1+y'^2)^{3/2}$.
Differentiating, assuming , gives,
(43)
Then, with and in (42)
and
At the point (from (38))
which results in , , and so that the differential of the curvature is, from (43)
(44)
At , . Setting from (38) it is evident that as , say, decreases from , the magnitude of the numerator of decreases whereas the denominator increases so that decreases from unity. Now and the curvature can be expressed as
For , which, with (44) demonstrates that the point is the point of maximum curvature of the curve (42).
This argument holds for any of the curves defined by a pair of non-zero components from the vector , these curves representing mutually orthogonal two dimensional cross sections of the polytope through the vertex . The function (40) in the vicinity of can therefore be visualised as a rounded version of the vertex into which it fits. Note that the curvature is independent of the scaling factors of the function, and . Consequently as the size of the hypercube increases with the curvature remains constant as the function (40) expands implying that the function fits deeper into the corner formed by the vertex i.e. that becomes a better approximation to . We will refer to as pseudo vertices.
At the transformed vertices of the simplex, and we can see from (41) that the function coincides with the hypercube at those vertices, i.e. at . All of this leads to the conclusion that the function (40) is well approximated by the relative boundary of the hypercube excluding those faces containing the origin.
However, as described in Section III-B, our measure of probabilistic structure is derived from the projection of a transformed contingency table vector onto the orthogonal subspace, , of the constant vector . We will now approximate that projection by first approximating with a point on the hypercube and then projecting that point onto noting that the projection of the entire hypercube is a convex polytope . For example in three dimensions, the projection of a cube onto is a regular hexagon. To simplify the notation we will work with the unit hypercube having vertices with the understanding that the final results are to be scaled by .
The projection of the vector onto is of length and subtracting the projection from results in , the projection onto . This has components equal to and components equal to . Its magnitude squared is
(45)
The squared magnitude of is so that the Probabilistic Salience is, from (35),
(46)
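The projection geometry used above can be checked numerically: a vertex of the unit hypercube with, say, q of its m components equal to one projects onto the complement of the all-ones vector with squared length q(m - q)/m, consistent with the projection described before (45). The sketch below verifies this; the closed forms of (45) and (46) themselves are not reproduced.

import numpy as np
from itertools import product

m = 5
u = np.ones(m) / np.sqrt(m)               # unit vector along the all-ones direction
for bits in product([0, 1], repeat=m):
    v = np.array(bits, dtype=float)       # a hypercube vertex
    q = int(v.sum())
    v_perp = v - (v @ u) * u              # projection onto the orthogonal complement
    assert np.isclose(v_perp @ v_perp, q * (m - q) / m)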
Now consider any two vertices of the hypercube other than the all ones vertex and the all zeros vertex. Let these be and and their projections and . It is clear that there must be some component, say the th, of which is whereas the th component of is , so that the th term in the scalar product of and is negative whereas the corresponding term in is positive. Consequently
(47)
The line is normal to a hyperplane such that and (47) shows that this hyperplane bounds a closed halfspace which contains all of the other vertex projections.
An m-dimensional hypercube is constructed from hypercubes of lower dimension so its smallest faces are points (vertices), lines (edges) and squares. Select two of the hypercube vertices which are adjacent to some vertex by taking the complements of, say, the th and th components respectively. For definiteness, assume that the th component of is a one and the th component is a zero so that taking the complement of the th component gives the vertex and the complement of the th component gives the vertex . Then define a fourth vertex adjacent to both and by complementing the th component of and the th component of to give the .
These four vertices define a 2-face of the hypercube which is a two dimensional polytope, in fact a square, and, because the hypercube vertices are the extreme points of the hypercube, these four vertices are the extreme points of the face ([45], Theorem 7.3). The face is then the convex hull of these four vertices ([45], Theorem 7.2), i.e. the set of all convex combinations of these vertices taken three at a time ([45], Corollary 5.11).
Now let , , , and be the projections of , , and . Define as the midpoint of the line segment . This is also the midpoint of the line segment implying that the points , , , and all lie on a plane which is, in fact, the projection of the corresponding hypercube face as expected from the linearity of the projection operation. Furthermore, because the projection is a linear operation, any point, , on the projection of the face in question is a convex combination of , , , and i.e, for and
Then
(48)
Consequently the whole projected 2-face is contained in the halfspace . This argument can be repeated for all of the other 2-faces of the hypercube of which is a member establishing that is a supporting halfspace and is a supporting hyperplane of . Furthermore demonstrating that is an extreme point of i.e. a vertex. So we can conclude that the projections of the vertices of the hypercube excluding the all ones and all zeros vertices are in fact the vertices of the polytope .
In the context of an optimisation problem, is a convex function with the projected 2-face acting as the feasible set and, because there are no distinct local minima or maxima, it attains its global minimum and maximum on the relative boundary of the projected 2-face ([44]). Indeed, it follows from [44] equation (4.21) that the maximum must occur at some vertex of the projected 2-face and the minimum at another.
Let be a point on projected 2-face and be a point on a distinct projected 2-face . Furthermore let the maximum length vertices be the projections of hypercube vertices with and respectively and the minimum length vertices be the projections of hypercube vertices with and respectively. Then if , from (46), . This demonstrates that Probabilistic Salience is a strong indication of the degree to which the table entries are concentrated on a few cells vindicating our proposal above.
IV Computational Results
To investigate the type of result that the process can produce, Australian Census data was obtained from the Australian Bureau of Statistics (ABS) using their DATALAB facilities. Security restrictions imposed by the ABS limited us to nine attributes, and then only with a high level of aggregation, the aggregation scheme being designed by us. We limited the analysis to seven attributes, resulting in 36 distinct cases, i.e. nine attributes taken seven at a time. Comparison of the results of applying our algorithm to these 36 cases gave us a good sense of how it responded to changes in the attribute combination. One of the better cases is presented in Fig. 1, where we focus on bivariate conditional subtables.
While we acknowledge that this is a very small data set by data processing standards, we stress that the purpose is to provide a simple illustration of the operation of our process to enhance understanding rather than to explore large scale structure. The data set is bland in the sense that each attribute interacts strongly with every other attribute but we believe that this bland data set provides a sensitive test of the model’s ability to discriminate between similar levels of structure.
The top left hand panel in each figure provides a visual bar graph comparison of the Probabilistic Salience Function values, , for the geometric mean conditional subtables of each pair of attributes. In spite of the blandness of the data and the limitation to pairwise interactions, the bar graph indicates that Probabilistic Salience is able to clearly differentiate between subsets.
The top right panel displays, for the minimum and maximum values of , histograms of the Probabilistic Salience of individual conditional subtables over the set of conditioning attributes. In each case the values are reasonably well clustered around the values of the respective geometric mean, although there is significant dispersion toward higher values. The reason for this is revealed by the remaining panels, which show three dimensional histograms of the bivariate conditional subtable entries for the conditioning attribute values listed. For the minimum and maximum values these are for the subtable closest to the geometric mean (middle) and for the maximum (bottom).
Numerical checks confirm the visual impression that the maximum values are generated by subtables with (originally) many zero entries; indeed the bottom left subtable with the highest has only a single non-zero entry. Although this is, somewhat ironically, associated with the minimum , it nevertheless demonstrates the effectiveness of Probabilistic Salience in detecting highly concentrated subtables.

V Summary and Discussion
What emerges from the various perspectives discussed in the Introduction is that, essentially, social engineering is the extension into the cyber realm of the long standing crime of confidence trickery. Like the confidence trickster, the social engineer exploits psycho-social vulnerabilities in deceiving and manipulating his victims. In light of the failure by the Law to eliminate traditional confidence trickery, the expectation that security technology will defeat social engineering appears to be unreasonably optimistic. The underlying goal of social engineering, in fact, is to bypass security measures such as authentication by focusing on the human element. Indeed authentication mechanisms themselves are susceptible to exploitation [9][10].
In response, defence strategies to counter social engineering attacks have concentrated on strengthening the human element by providing strong security policies backed up by ongoing security awareness and resistance training. A complementary approach is to increase the difficulty of social engineering by e.g. devising more robust SMS messages in two factor authentication [10] or by detecting exploitable personal information spread across social media and reducing its vulnerability. A third strategy, which is of particular interest to us, is developing technical aids to provide assistance in detecting and rebuffing an attack such as the Topic Blacklist system described in [11].
We focus on those areas of social engineering which involve impersonation based on pretexting where a targeted employee, such as a call centre operator or IT support person, is persuaded that the social engineer is the intended victim in order to obtain elements of the victim’s critical identifying information. This requires the social engineer to create a fictitious scenario, the pretext, based on sufficient personal information about the victim to convince the target of its veracity. Note that because the target does not know everything about the victim, the information does not have to be entirely accurate but the scenario as a whole does need to be coherent and plausible.
The underlying issue is that the social engineer is unlikely to be able to assemble enough factual information to construct a sufficiently complete profile of the victim to provide that sense of coherence. However, enough gaps can be filled in by inferring missing information from general background data that the social engineer can acquire through thorough research. We have argued in Section II-C that the form of inference is analogical, taking place in an explanatory context. However, the psychological evidence suggests that this rests upon a sense of statistical relevance associated with the background data, meaning, roughly, that some things appear intuitively more likely than others. We argue that this sense of statistical relevance is grounded in a perhaps informal knowledge of statistical data.
Our central point is that some elements, i.e. attribute values, of a victim’s profile are much more likely to have been inferred than others and these attributes need to be identified as unreliable in the sense that they do not add significantly to the profile’s veracity and so should be discounted. Further, an estimate of reliability can be obtained by analysing the formal statistical data relevant to the problem based on identifying subsets of the profile which have high probability given the other elements of the profile.
We assume that the data is in the form of a multivariate contingency table (Section III-A) so that the mathematical problem is that of analysing conditional association in the table using a log linear technique [46]. In this case we use a very specific form of log linear model based on an orthogonal transformation of the logarithms of the table entries into an ‘interaction space’ which reveals the statistical interactions between the data elements. The magnitude of the transformed table vector is the basis of our definitions of Probabilistic Salience and the Probabilistic Salience Function (Section III-E).
Estimating the reliability of profile attributes is then the problem of identifying subsets of profile attributes having values which occur frequently in the population. Consequently these values can be inferred probabilistically rendering the attributes unreliable. This becomes the problem of identifying conditional subtables of the contingency table in which the subtable contents are concentrated in a small number of cells.
Because there is no simple topological relationship between these cells in general, conventional measures of centrality such as variance are not relevant, leading us to define our Probabilistic Salience measures. This, we show, does indeed provide an indication of concentration in this sense (Section III-G). Our computational results, while based on a limited data set, do support these theoretical results.
The Probabilistic Salience Function, then, is an indicator of the potential for exploitation of informal statistical information by social engineers in impersonating an individual. This can be applied to various subsets of attributes to detect vulnerabilities and can be used to warn potential targets such as operators in call centres that particular attributes are unreliable indicators of identity.
The underlying theme of the literature review in Section I-A is that the most effective method of defending against social engineering is to maximise the support provided to the humans who face these attacks. We envisage that the probabilistic salience measures can be used, not only as a stand-alone warning, but also as a component of a more comprehensive warning system. This would include other forms of social engineering detector such as the Topic Blacklist [11] or indicators extracted from the knowledge graph of [12] as discussed there. Indeed our proposed probabilistic salience indicator could conceivably be used to enhance the automatic social engineering vulnerability scanner described in [14].
The principal objective of the probabilistic salience indicator is to alert the target to the presence, in the information presented by the attacker, of information that could have been inferred rather than be reliable factual information. In the context of the RISP model of information processing, this should increase the target’s information insufficiency level to the point where the target is driven to process the attacker’s claims systematically rather than heuristically, significantly increasing the chances of detecting the flaws and inconsistencies that would indicate the claims are false.
It is conceivable that RISP concepts could be incorporated into security awareness training [8], [6], [11], [9] in order to enhance a target’s response to a probabilistic salience alert by increasing the target’s information sufficiency threshold [23]. This would increase the target’s information insufficiency level, thereby making the target more sensitive to the probabilistic salience alert as well as other forms of alert. Indeed, because we have focussed on only one aspect of RISP, it seems possible that other aspects could also be incorporated into security awareness training, suggesting that this could be a fruitful area of future research.
Finally, we recognise that our Probabilistic Salience measures are a double-edged sword in the sense that social engineers could use the same process to find inferable attribute values to exploit. This leads to the suggestion that contingency table data of the type used here could be ‘sanitised’ in a manner similar to databases modified using differential privacy techniques. In this case we speculate in Section III-F that the sensitivity of contingency table data to inference could be reduced by limiting the magnitude of the transformed table vector in ‘interaction space’, thus reducing the Probabilistic Salience measures without significantly distorting the data.
VI Acknowledgment
The financial support for GS as well as the provision of meeting facilities by CSIRO Data 61 is gratefully acknowledged. We also appreciate the comments of the anonymous reviewers which resulted in a substantial improvement to the initial submission.
References
- [1] Z. Wang , L. Sun, and H. Zhu, “Defining Social Engineering in Cybersecurity”, IEEE Access, Vol. 8, DOI: 10.1109/ACCESS.2020.2992807, (2020).
- [2] Z. Alkhalil, C. Hewage, L. Nawaf, I. Khan, “Phishing Attacks: A Recent Comprehensive Study and a New Anatomy”, Frontiers in Computer Science, Vol. 3, 2021. https://www.frontiersin.org/articles/10.3389/fcomp.2021.563060, DOI=10.3389/fcomp.2021.563060, ISSN=2624-9898.
- [3] Z. Wang, H. Zhu, L. Sun, “Social Engineering in Cybersecurity: Effect Mechanisms, Human Vulnerabilities and Attack Methods”, IEEE Access, Vol. 9, 2021, DOI: 10.1109/ACCESS.2021.3051633.
- [4] I. Ghafir, V. Prenosil, A. Alhejailan, and M. Hammoudeh, “Social Engineering Attack Strategies and Defence Approaches,” in Proc. IEEE 4th Int. Conf. Future Internet Things Cloud (FiCloud), Aug. 2016, pp. 145–149.
- [5] R. E. Indrajit, “Social Engineering Framework: Understanding the Deception Approach to Human Element of Security,” Int. J. Comput. Sci. Issues, vol. 14, no. 2, pp. 8–16, 2017.
- [6] W. Fan, K. Lwakatare, and R. Rong, “Social Engineering: I-E Based Model of Human Weakness to Investigate Attack and Defense,” SCIREA J. Inf. Sci. Syst. Sci., vol. 1, no. 2, pp. 34–57, 2016.
- [7] A. D. B. Vizzotto, A. M. de Oliveira, H. Elkis, Q. Cordeiro, and P. C. Buchain, “Psychosocial Characteristics”, in: M.D. Gellman, J.R. Turner, (eds) Encyclopedia of Behavioral Medicine. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-1005-9_918, 2013.
- [8] J. Saleem and M. Hammoudeh, “Defense Methods Against Social Engineering Attacks”, in: K. Daimi (ed), Computer and Network Security Essentials, DOI:10.1007/978-3-319-58424-9, 2017.
- [9] R. A. Grimes, “Social Engineering Attacks”, in: Hacking Multifactor Authentication, First Edition, John Wiley and Sons, 2021.
- [10] H. Siadati, T. Nguyen, P. Gupta, M. Jakobsson, and N. Memon, “Mind your SMSes: Mitigating Social Engineering in Second Factor Authentication”, Computers and Security, 68, September 2016.
- [11] R. Bhakta and I. G. Harris, “Semantic Analysis of Dialogs to Detect Social Engineering Attacks”, Proc. IEEE 9th International Conference on Semantic Computing (IEEE ICSC), Feb. 7-9, 2015, pp. 424-427.
- [12] Z. Wang, H. Zhu, P. Liu, and L. Sun, “Social Engineering in Cybersecurity: a Domain Ontology and Knowledge Graph Application Examples”, Cybersecurity, 4:31, https://doi.org/10.1186/s42400-021-00094-6, 2021.
- [13] M. Alrubaian et al., “Credibility in Online Social Networks: A Survey”, IEEE Access, Vol. 7, January 2019, DOI 10.1109/ACCESS.2018.2886314.
- [14] M. Edwards, R. Larson, B. Green, A. Rashid, A. Baron, “Panning for gold: Automatically analysing online social engineering attack surfaces”, Computers & Security 69, 18–34 (2017)
- [15] C. N. Wathen, J. Burkell, “Believe It or Not: Factors Influencing Credibility on the Web”, J. American Society For Information Science And Technology, Vol. 53, No. 2, 2002, pp134-144.
- [16] C. Farkas and S. Jajodia, ”The Inference Problem: A Survey”, SIGkDD Explorations, Vol. 4, Issue 2, 2002
- [17] M. Guarnieri, S. Marinovic, D. Basin, ”Securing Databases from Probabilistic Inference”, 2017 IEEE 30th Computer Security Foundations Symposium (CSF), Santa Barbara, CA, 2017, pp. 343-359, DOI: 10.1109/CSF.2017.30.
- [18] K. Kenthapadi, N. Mishra, K. Nissim, ”Simulatable Auditing”, ACM PODS 2005 June 13-15, 2005, Baltimore, Maryland.
- [19] A. Evfimievski, R. Fagin, and D. Woodruff, “Epistemic privacy”, J. ACM 58, 1, Article 2, (December 2010), 45 pages. DOI: 10.1145/1870103.1870105, http://doi.acm.org/10.1145/1870103.1870105.
- [20] A. Algarni, Y. Xu, & T. Chan, “Social Engineering in Social Networking Sites: the Art of Impersonation”, In P. Hung, E. Ferrari, & R. Kaliappa, (Eds.) Proc. 11th IEEE International Conference on Services Computing, pp. 797-804, (2014).
- [21] D. Pehlivanoglu et al., “The role of analytical reasoning and source credibility on the evaluation of real and fake full-length news articles”, Cogn. Research, vol. 6, No. 24, 2021, https://doi.org/10.1186/s41235-021-00292-3.
- [22] W. Zang and S.A. Watts, “Capitalizing on Content: Information Adoption in Two Online Communities”, JAIS, Vol. 9, Issue 2, February 2008, pp. 73-94.
- [23] X. (Robert) Luo, W. Zhang, S. Burd, A. Seazzu, “Investigating phishing victimization with the Heuristic-Systematic Model: A theoretical framework and an exploration”, Computers & Security, Vol. 38, 2013, pp. 28-38.
- [24] Z. J. Yang, M. A. Ariel, & T. H. Feeley, “Risk Information Seeking and Processing Model: A Meta-Analysis”, Journal of Communication, Vol. 64, 2014, pp. 20–41.
- [25] R.J. Griffin, S. Dunwoody, K. Neuwirth, “Proposed model of the relationship of risk information seeking and processing to the development of preventive behaviors”, Environ Res., Feb. 1999;80(2 Pt 2):S230-S245. doi: 10.1006/enrs.1998.3940. PMID: 10092438.
- [26] Zamal, Faiyaz Al et al.“Homophily and Latent Attribute Inference: Inferring Latent Attributes of Twitter Users from Neighbors”, Proceedings of the International AAAI Conference on Web and Social Media (2012).
- [27] D. Mulders, C. de Bodt, J. Bjelland, A. ’Sandy’ Pentland, M. Verleysen and Y-A.de Montjoye. “Inference of node attributes from social network assortativity.” Neural Computing and Applications 32 (2019): pp. 18023 - 18043.
- [28] J. Jia, B. Wang, L. Zhang and N. Z. Gong. “AttriInfer: Inferring User Attributes in Online Social Networks Using Markov Random Fields.” Proceedings of the 26th International Conference on World Wide Web (2017).
- [29] D. Koller, N. Friedman, L. Getoor, B. Taskar, ”Graphical models in a nutshell”, In L. Getoor and B. Taskar (Eds.), Introduction to Statistical Relational Learning, (pp. 13– 55). Cambridge, MA: MIT Press. (2007)
- [30] R. Clarke, http://www.rogerclarke.com/ID/IdModel-1002.html, (2010)
- [31] T. Lombrozo, “ Explanation and Abductive Inference”, In The Oxford Handbook of Thinking and Reasoning (Holyoak, K.J. and Morrison, R.G., eds), pp. 260–276, Oxford University Press, (2012)
- [32] T. Lombrozo,“Explanatory Preferences Shape Learning and Inference”, Trends in Cognitive Sciences, Vol. 20, No. 10, October 2016, pp 748-759.
- [33] J. E. Hummel, J. Licato and S. Bringsjord, “Analogy, Explanation, and Proof”, Frontiers in Human Neuroscience, Volume 8, Article 867, November 2014.
- [34] Wesley C. Salmon, “Statistical Explanation”, in Salmon (ed.), Statistical Explanation and Statistical Relevance, Pittsburgh, PA: University of Pittsburgh Press, (1971), pp29-87.
- [35] Matteo Colombo, Marie Postma and Jan Sprenger, “Explanatory Judgment, Probability, and Abductive Inference”, In A. Papafragou, D. Grodner, D. Mirman and J. C. Trueswell (eds.), Proceedings of the 38th Annual Conference of the Cognitive Science Society, Cognitive Science Society. Austin, TX: pp. 432-437 (2016)
- [36] J. N. Schupbach and J. Sprenger, “The Logic of Explanatory Power”, Philosophy of Science, 78 (January 2011) pp. 105–127. 0031-8248/2011/7801-0006
- [37] Jonah N. Schupbach, “Comparing Probabilistic Measures of Explanatory Power”, Philosophy of Science, Vol. 78, No. 5, December 2011
- [38] J. Pearl, “Causal Inference in Statistics: An Overview”, Statistics Reviews, 8, 96 (2009)
- [39] C. Dahinden, G. Parmigiani, M. C. Emerick, P. Buhlmann, BMC Bioinformatics, 8, 476 (2007) (Additional File 1 Section 1)
- [40] A. Dobra, S. E. Fienberg, A. Rinaldo, A. B. Slavkovic, Y. Zhou, “Algebraic statistics and contingency table problems: Log-linear models, likelihood estimation, and disclosure limitation”, In: Putinar, M., Sullivant, S. (eds.) Emerging Applications of Algebraic Geometry. IMA Series in Applied Mathematics, pp. 63–88. Springer, Heidelberg (2008)
- [41] J. Darroch, S. Lauritzen, and T. Speed, “Markov fields and log-linear interaction models for contingency tables”. Annals of Statistics 8, 522–539.(1980).
- [42] C. Dahinden, M. Kalisch, P. Buhlmann, ”Decomposition and Model Selection for Large Contingency Tables”, Biometrical Journal, 52, DOI: 10.1002/bimj.200900083, pp. 233-252 (2010)
- [43] X. Yang, S. E. Fienberg, and A. Rinaldo. “Differential Privacy for Protecting Multi-Dimensional Contingency Table Data: Extensions and Application”. Journal of Privacy and Confidentiality, Vol. 4 (1). https://doi.org/10.29012/jpc.v4i1.613, 2012.
- [44] S. Boyd and L. Vandenberghe, Convex Optimization, Cambridge University Press, Cambridge, (2004).
- [45] A. Brondsted, An Introduction to Convex Polytopes, (Graduate Texts in Mathematics; 90), Springer-Verlag, New York, (1983).
- [46] W. P. Bergsma and T. Rudas, “Modeling Conditional and Marginal Association in Contingency Tables”, Annales de la Faculté des Sciences de Toulouse 6e série, tome 11, no 4 (2002), p. 455-468.