

thanks: marco.patriarca@kbfi.ee

A Bayesian Approach to the Naming Game Model

Abstract

We present a novel Bayesian approach to semiotic dynamics, which is a cognitive analogue of the naming game model restricted to two conventions. The one-shot learning that characterizes the agent dynamics in the basic naming game is replaced by a word-learning process, in which agents learn a new word by generalizing from the evidence garnered through pairwise interactions with other agents. The principle underlying the model is that agents, like humans, can learn from a few positive examples and that such a process is modeled in a Bayesian probabilistic framework. We show that the model presents some analogies but also crucial differences with respect to the dynamics of the basic two-convention naming game model. The model introduced aims at providing a starting point for the construction of a general framework for studying the combined effects of cognitive and social dynamics.
  
Keywords: Complex Systems, Language Dynamics, Bayesian Statistics, Cognitive Models, Consensus Dynamics, Semiotic Dynamics, naming game, Individual-Based Models

I Introduction

A basic question in complexity theory is how the interactions between the units of the system lead to the emergence of ordered states from initially disordered configurations Castellano-2009a ; Baronchelli-2018a . This general question concerns phenomena ranging from phase transitions in condensed matter systems and self-organization in living matter to the appearance of norm conventions and cultural paradigms in social systems. Various models were used in order to study social interactions and cooperation, e.g. models of condensed matter systems (such as spin systems), statistical mechanical models (e.g. based on the master equation), ecological competition models Castellano-2009a , many-agents game-theoretical models Xia-2017a ; Xia-2018a ; Zhang-2017a . Opinion dynamics and cultural spreading models represent suitable theoretical frameworks for a quantitative description of the emergence of social consensus Baronchelli-2018a .

In this respect, the emergence of human language remains a challenging, multi-fold question, related in turn to biological, ecological, social, logical, and cognitive aspects Mufwene-2001a ; Lass1997 ; Berruto-2004a ; Edelman-2007a ; Tenenbaum-1999 . Language dynamics Wichmann-2008b ; Wichmann-2008c has provided models describing phenomena of language competition and change that focus on the mutual interactions of linguistic traits (sounds, phonemes, grammatical rules, or languages understood as fixed entities) under the influence of ecological and social factors, modeling such interactions in analogy to biological competition and evolution.

However, the basic learning process of a word has a complex dynamics due to its cognitive dimension. In fact, learning a word means learning a concept (understood as a pointer to a subset of objects, see Refs. Tenenbaum-1999 ; Tenenbaum-2000a ; Xu-2007a ) and a linguistic label (for example, the name of the object) used for communicating the concept. The double concept $\leftrightarrow$ name nature of words has been studied through semiotic dynamics models, such as the models of Hurford Hurford-1989a and Nowak Nowak-1999a (see also Nowak-2000a ; Trapa-2000a ) and the naming game (NG) model Baronchelli-2006c ; guanrong2019 .

In the basic version of the model of Nowak Nowak-1999a , the language spoken by each agent $i$ ($i=1,\dots,N$) is defined by two personal matrices, representing the links of a bipartite network joining $Q$ names and $R$ concepts: (1) the active matrix $\mathcal{U}^{(i)}$, representing the concept $\rightarrow$ name links, where the element $\mathcal{U}^{(i)}_{q,r}$ ($q=1,\dots,Q$; $r=1,\dots,R$) gives the probability that agent $i$ will utter the $q$th name to communicate the $r$th concept; (2) the passive matrix $\mathcal{H}^{(i)}$, representing the name $\rightarrow$ concept links, in which the element $\mathcal{H}^{(i)}_{q,r}$ represents the probability that agent $i$ interprets the $q$th name as referring to the $r$th concept. In the models of Hurford and of Nowak, the language of each individual evolves with time according to a game-theoretical dynamics, with agents gaining a reproductive advantage if their matrices have a higher communication efficiency. These studies have achieved interesting results, such as the emergence of non-ambiguous one-to-one links between objects and sounds, and explain why homonyms are more frequent than synonyms Hurford-1989a ; Nowak-1999a ; Nowak-2000a ; Trapa-2000a .

In the NG model Baronchelli-2006c ; guanrong2019 there is only one concept ($R=1$) that can be linked to a set of $Q>1$ different names. The model can be reformulated through the agents’ lists $\mathcal{L}_i$ of the name $\leftrightarrow$ concept connections known to each agent $i$. In the case of two-conventions models, where the conventions are the names $A$ and $B$, the list of the $i$th agent can be $\mathcal{L}_i=\emptyset$ (no connection), $\mathcal{L}_i=(A)$ or $(B)$ (one name is known), or $\mathcal{L}_i=(A,B)$ (both name $\leftrightarrow$ concept connections are known).

Extending semiotic dynamics models is not trivial and already two-opinion variants of the NG model, taking into account committed groups, show a remarkable phase diagram Xie-2012a ; and trying to describe actual cognitive effects requires entirely new features Fan-2018a . This paper presents a minimal model to study the interplay of the cognitive and social dynamical dimensions, assuming for simplicity the two-conventions NG model as a semiotic framework Baronchelli-2016a ; guanrong2019 and making a cognitive generalization within the experimentally validated Bayesian framework of Tenenbaum-1999 (see also Refs. Tenenbaum2001 ; Tenenbaum-2000a ; Griffiths2006 ; Xu-2007a ; Perfors-2011a ; Lake2015 ). In that framework, an individual can learn a concept from a small number of examples, a most remarkable feature of human learning Tenenbaum-1999 ; Tenenbaum-1999b ; Tenenbaum-2011a , to be contrasted with machine learning algorithms, which require a large amount of examples for generalizing successfully barber2012 ; Murphy-2012a ; Evgeniou-2000a .

The paper is organized as follows. The new model is introduced in Sec. II. In Sec. III, we present and discuss the features of the semiotic dynamics emerging from the numerical simulations and quantitatively compare them with those of the two-conventions NG model. Future directions in the study of the interplay of the cognitive and the social dynamics are outlined in Sec. V.

II A Bayesian learning approach to the naming game

II.1 The two-conventions naming game model

Before introducing the new model, we recall the basic 2-conventions NG model Castello2009 , in which there is a single concept $C$, corresponding to an external object, and two possible names (synonyms) $A$ and $B$ for referring to $C$. Thus, the possibility of homonymy is excluded Baronchelli-2016a . Each agent $i$ is equipped with the list $\mathcal{L}_i$ of the names known to the agent. We assume that at $t=0$ each agent $i$ knows either $A$ or $B$ and therefore has a list $\mathcal{L}_i=(A)$ or $\mathcal{L}_i=(B)$, respectively.

During a pair-wise interaction, an agent can act as a speaker, when conveying a word to another agent, or as a hearer, when receiving a word from a speaker. One can think of an agent conveying a word as uttering a name, e.g. $A$, while pointing at an external object, corresponding to concept $C$: thus, the hearer records not only the name $A$ but also the name $\leftrightarrow$ concept association between $A$ and $C$. At a later time $t>0$, the list $\mathcal{L}_i$ of the $i$th agent can contain one or both names, i.e., $\mathcal{L}_i=(A)$, $(B)$, or $(A,B)$.

The system evolves according to the following update rules Baronchelli-2016a :

  1. Two agents $i$ and $j$, the speaker and the hearer, respectively, are randomly selected.

  2. The speaker $i$ randomly extracts a name (here either $A$ or $B$) from the list $\mathcal{L}_i$ and conveys it to the hearer $j$. Depending on the state of agent $j$, the communication is usually described as:

     (a) Success: the conveyed name is also present in the hearer’s list $\mathcal{L}_j$, i.e. agent $j$ also knows its meaning; then the two agents erase the other name from their lists, if present.

     (b) Failure: the conveyed name is not present in the hearer’s list $\mathcal{L}_j$; then agent $j$ records it and adds it to the list $\mathcal{L}_j$.

  3. Time is increased by one step, $t\to t+1$, and the simulation is reiterated from the first point above.
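The update rules above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the code used for the simulations: the function names `ng_step` and `run_ng`, the representation of each list as a Python set, and the stopping criterion are hypothetical choices made here.

```python
import random

def ng_step(lists, rng):
    """One interaction of the basic two-convention NG (illustrative sketch).

    lists: one set per agent, each a non-empty subset of {"A", "B"}.
    Returns True for a communication success, False for a failure.
    """
    i, j = rng.sample(range(len(lists)), 2)   # rule 1: speaker i, hearer j
    name = rng.choice(sorted(lists[i]))       # rule 2: speaker extracts a name
    if name in lists[j]:
        # (a) Success: both agents keep only the conveyed name.
        lists[i] = {name}
        lists[j] = {name}
        return True
    # (b) Failure: the hearer adds the conveyed name to its list.
    lists[j] = lists[j] | {name}
    return False

def run_ng(n_agents=100, frac_a=0.5, seed=0, max_steps=1_000_000):
    """Iterate ng_step until all agents share a single name (assumed criterion)."""
    rng = random.Random(seed)
    n_a = round(frac_a * n_agents)
    lists = [{"A"} for _ in range(n_a)] + [{"B"} for _ in range(n_agents - n_a)]
    for t in range(1, max_steps + 1):         # rule 3: t -> t + 1
        ng_step(lists, rng)
        if len(lists[0]) == 1 and all(l == lists[0] for l in lists):
            return next(iter(lists[0])), t
    return None, max_steps
```

Calling `run_ng(n_agents=20, seed=1)` returns the name found at consensus together with the number of interactions needed to reach it.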

An example of unsuccessful and one of successful communication are schematized in the left panel (A) of Fig. 1, see Ref. Baronchelli-2006c for more examples. Despite its simple structure, the basic NG model describes the emergence of consensus about which name to use, which is reached for any (disordered) initial configuration baronchelli2007 .

Figure 1: Comparison of the basic and Bayesian NG model.
Panel (A): basic 2-conventions NG model. In a communication failure (upper figure), the name conveyed, $B$ in the example, is not present in the list of the hearer, who adds it to the list. In a communication success (lower figure), the word $B$ is already present in the hearer’s list and both agents erase $A$ from their lists.
Panel (B): Bayesian NG model. In order to convey an example “+” to the hearer in association with name $A$, the speaker must have already generalized concept $C$ in association with $A$, represented here by the label $[\mathbf{A}]$. In a communication failure (upper figure), the hearer computes the Bayes probability $p$ and finds $p<1/2$; then the only outcome is that the hearer records the example (reinforcement). In the Bayesian NG, there are two ways in which the communication can be successful. The first way (lower figure) is when $p\geq 1/2$: the hearer generalizes $C$ in association with $A$ and attaches the label $[\mathbf{A}]$ to the inventory. The second way (not shown) is the agreement process, analogous to that of the basic NG, when both agents have already generalized concept $C$ in association with name $A$ and remove label $[\mathbf{B}]$ from the inventory $[+++\dots]_B$, if present. See text for further details.

II.2 Toward a Bayesian naming game model

From a cognitive perspective, a “communication failure” of the NG model can be understood as a learning process, in which the hearer learns a new word. It is a “one-shot learning process”, because it takes place instantaneously (in a single time step) and independently of the agent’s history (i.e. of the previous knowledge of the agent). However, modeling an actual learning process should take into account the agents’ experience, based on the previous observations (the data already acquired), as well as the uncertain/incomplete character naturally accompanying any learning process.

Here, the one-shot learning is replaced by a process that can describe basic but realistic situations, such as the prototypical “linguistic games” Wittgenstein1953 . For example, consider a “lecture game”, in which a lecturer (speaker) utters the name $A$ of an object and shows a real example “+” of the object to a student (hearer), repeating this process a few times. Then, the teacher can e.g. (a) show another example and ask the student to name the object; (b) utter the same name and ask the student to show an example of that object; or (c) do both things (uttering the name and showing the object) and ask the student whether the name $\leftrightarrow$ object correspondence is correct. The student will be able to answer correctly only after having received some examples, which enable the student to generalize the concept $C$ corresponding to the object in association with name $A$. To model these and similar learning processes, we need a criterion enabling the hearer to assess the degree of equivalence between the new example and the examples recorded previously.

The starting point for the replacement of the one-shot learning is Bayes’ theorem. According to Bayes’ theorem, the posterior probability $p(h|X)$ that the generic hypothesis $h$ is the true hypothesis, after observing new evidence $X$, reads Harney-2003a ; Jeffreys1961 ,

$$p(h|X)=\frac{p(X|h)\,p(h)}{p(X)}\,. \tag{1}$$

Here, the prior probability $p(h)$ gives the probability of occurrence of the hypothesis $h$ before observing the data and $p(X|h)$ gives the probability of observing $X$ if $h$ is given. Finally, $p(X)$ provides the normalization; in the applications it can be evaluated as $p(X)=\sum_{h^{\prime}}p(X|h^{\prime})\,p(h^{\prime})$, where the sum runs over all the hypotheses $h^{\prime}\in H$ of the hypothesis space $H$.
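As a small numerical illustration of Eq. (1), the sketch below evaluates the posterior over a finite set of hypotheses. The helper `posterior` and all numbers are hypothetical: two nested rectangles with areas 1 and 4 and equal priors are assumed, and the likelihood of $n$ examples is taken as $1/|h|^n$, which is what uniform (strong) sampling from a rectangle $h$ of area $|h|$ implies when all examples fall inside $h$.

```python
def posterior(priors, likelihoods):
    """Posterior p(h|X) over a finite hypothesis set, following Eq. (1).

    priors:      dict mapping hypothesis -> p(h)
    likelihoods: dict mapping hypothesis -> p(X|h)
    """
    # Normalization p(X) = sum over h' of p(X|h') p(h').
    evidence = sum(likelihoods[h] * priors[h] for h in priors)
    if evidence == 0.0:
        raise ValueError("X has zero probability under every hypothesis")
    return {h: likelihoods[h] * priors[h] / evidence for h in priors}

# Hypothetical example: n = 3 examples consistent with both rectangles.
areas = {"small": 1.0, "large": 4.0}    # assumed rectangle areas
n = 3
priors = {"small": 0.5, "large": 0.5}   # assumed equal priors
likelihoods = {h: 1.0 / areas[h] ** n for h in areas}
post = posterior(priors, likelihoods)
```

With these assumed numbers the smaller rectangle receives posterior probability $64/65$: among hypotheses consistent with the data, the smaller one is strongly favored, and the preference sharpens as $n$ grows.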

The next step is to find a way to compute explicitly the posterior probability $p(h|X)$, through a representation of the concepts and their relative examples in a suitable hypothesis space $H$ of the possible extensions of a given concept $C$, constituted by the mutually exclusive and exhaustive hypotheses $h$. Following the experimentally verified Bayesian statistical framework of Refs. Tenenbaum-1999 ; Tenenbaum-1999b , we adopt the paradigmatic representation of a concept as a geometrical shape. For example, the concept of “healthy level” of an individual in terms of the levels of cholesterol $x$ and insulin $y$, defined by the ranges $x_a\leq x\leq x_b$ and $y_a\leq y\leq y_b$, where $x_i$ and $y_i$ ($i=a,b$) are suitable values, represents a rectangle in the Euclidean $x$-$y$ plane $\mathbb{R}^2$. Examples of healthy levels of specific individuals $1,2,\dots$ correspond to points $(x_1,y_1),(x_2,y_2),\dots\in\mathbb{R}^2$. In the following, we assume that a hypothesis $h$ is represented by a rectangular region in $\mathbb{R}^2$. Figure 2 shows four positive examples, denoted by the symbol “+”, associated with four different points of the plane, consistent with (i.e. contained in) three different hypotheses, shown as rectangles.

Figure 2: Three different hypotheses represented as axis-parallel rectangles in $\mathbb{R}^2$ and four positive examples “+” that are all consistent with the three hypotheses. The set of all the rectangles that can be drawn in the plane constitutes the hypothesis space $H$.

The problem of learning a word is now recast into an equivalent problem, consisting in acquiring the ability to infer whether a newly recorded example $z$, corresponding to a new point “+” in $\mathbb{R}^2$, corresponds to the concept $C$, after having seen a small set of positive examples “+” of $C$. More precisely, let $X=\{(x_1,y_1),\dots,(x_n,y_n)\}$ be a sequence of $n$ examples of the true concept $C$, already observed by the hearer, and $z=(z_1,z_2)$ the new example. The learner does not know the true concept $C$, i.e. the exact shape of the rectangle associated with $C$, but can compute the generalization function $p(z\in C|X)$ by integrating the predictions of all hypotheses $h$, weighted by their posterior probabilities $p(h|X)$:

$$p(z\in C|X)=\int_{h\in H}p(z\in C|h)\,p(h|X)\,dh\,. \tag{2}$$

Clearly, $p(z\in C|h)=1$ if $z\in h$ and $0$ otherwise. By means of Bayes’ theorem (1), one can obtain the Bayesian probability appropriate to the problem at hand. A successful generalization is then defined quantitatively by introducing a threshold $p^{*}$, representing an acceptance probability: an agent will generalize if the Bayesian probability $p(z\in C|X)\geq p^{*}$. The value $p^{*}=1/2$ is assumed, as in Ref. Tenenbaum-1999b .

We assume that an Erlang prior characterizes the agents’ background knowledge. For a rectangle in $\mathbb{R}^2$ defined by the tuple $(l_1,l_2,s_1,s_2)$, where $l_1,l_2$ are the Cartesian coordinates of its lower-left corner and $s_i$ its sides along dimension $i=1,2$, the Erlang prior density is Tenenbaum-1999 ; Tenenbaum-1999b

$$p_E=s_1 s_2\exp\left\{-\left(\frac{s_1}{\sigma_1}+\frac{s_2}{\sigma_2}\right)\right\}\,, \tag{3}$$

where the parameters $\sigma_i$ represent the actual sizes of the concept, i.e. they are the sides of the concept rectangle $C$ along dimension $i$. The choice of a specific informative prior, such as the Erlang prior, is well motivated by the fact that in the real world individuals always have some prior knowledge or expectation. In fact, a Bayesian learning framework with an Erlang prior of the form (3) describes well the experimental observations of learning processes of human beings Tenenbaum-1999b . The final expression used below for computing the Bayesian probability $p$ that, given the set of previous examples $X$, the new example $z$ falls in the same category of concept $C$, reads Tenenbaum-1999b

$$p(z\in C|X)\approx\frac{\exp\left\{-\left(\frac{\tilde{d}_1}{\sigma_1}+\frac{\tilde{d}_2}{\sigma_2}\right)\right\}}{\left[\left(1+\frac{\tilde{d}_1}{r_1}\right)\left(1+\frac{\tilde{d}_2}{r_2}\right)\right]^{n-2}}\,. \tag{4}$$

Here $r_i$ ($i=1,2$) is an estimate of the extension of the set of examples along direction $i$, given by the maximum mutual distance along dimension $i$ between the examples of $X$; $\tilde{d}_i$ measures an effective distance between the new example $z$ and the previously recorded examples, i.e., $\tilde{d}_i=0$ if $z_i$ falls inside the value range of the examples of $X$ along dimension $i$, otherwise $\tilde{d}_i$ is the distance between $z$ and the nearest example in $X$ along dimension $i$. Equation (4) is actually a “quick-and-dirty” approximation that is reasonably good, except for $n\leq 3$ and $r_i\leq\sigma/10$, estimating the actual generalization function within a $10\%$ error, see Refs. Tenenbaum-1999 ; Tenenbaum-1999b for details. Despite these approximations, Eq. (4) will ensure that our computational model, described in the next section, retains the main features of the Bayesian learning framework. It should be noticed that, for the validity of the Bayesian framework, it is crucial that the examples are drawn randomly from the concept (strong sampling assumption), i.e. they are extracted from a probability density that is uniform in the rectangle corresponding to the true concept Tenenbaum-1999b . This definition of generalization is applied below to word-learning.
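A direct transcription of Eq. (4) may help to fix the meaning of $r_i$ and $\tilde{d}_i$. The following is a minimal sketch, not the authors' code: the function name `generalization_probability` is hypothetical, and the handling of the degenerate case $r_i=0$ with $\tilde{d}_i>0$ (returning zero) is an assumption made here, since Eq. (4) is undefined there.

```python
import math

def generalization_probability(examples, z, sigma):
    """Approximate p(z in C | X) following Eq. (4).

    examples: list of (x, y) points already recorded (the set X)
    z:        new example (z_1, z_2)
    sigma:    (sigma_1, sigma_2), sides of the concept rectangle
    """
    n = len(examples)
    if n < 2:
        raise ValueError("at least two previous examples are needed")
    exponent = 0.0
    product = 1.0
    for i in range(2):
        coords = [x[i] for x in examples]
        lo, hi = min(coords), max(coords)
        r_i = hi - lo                          # extension of X along dimension i
        # Effective distance: zero inside the range, else distance to nearest example.
        d_i = max(lo - z[i], z[i] - hi, 0.0)
        if d_i > 0.0:
            if r_i == 0.0:
                return 0.0                     # assumed convention for r_i = 0
            product *= 1.0 + d_i / r_i
        exponent += d_i / sigma[i]
    return math.exp(-exponent) / product ** (n - 2)
```

For instance, with the four assumed examples $(0,0)$, $(3,1)$, $(1,0.5)$, $(2,0.2)$ and $\sigma=(3,1)$, a new point inside their range gives $p=1$, whereas $z=(4,0.5)$ gives $\tilde{d}_1=1$, $r_1=3$ and $p=e^{-1/3}/(4/3)^2\approx 0.40$, below the threshold $p^{*}=1/2$.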

II.3 The Bayesian word-learning model

Based on the Bayesian learning framework discussed above, in this section we introduce a minimal Bayesian individual-based model of word-learning. For the sake of clarity, in analogy with the basic NG model, we study the emergence of consensus in the simple situation in which two names $A$ and $B$ can be used for referring to the same concept $C$ in pair-wise interactions among $N$ agents.

At variance with the NG model, here in each basic pair-wise interaction an agent $i$, acting as a speaker, conveys an example “+” of concept $C$, in association with either name $A$ or $B$, to another agent $j$, who acts as hearer ($i,j=1,\dots,N$). In order to be able to communicate concept $C$ uttering a name, e.g. name $A$, the speaker $i$ must have already generalized concept $C$ in association with name $A$. This is signalled by the presence of name $A$ in the list $\mathcal{L}_i$. On the other hand, the hearer $j$ always records the example received in the corresponding inventory, here the inventory $[+++\dots]_A$.

The state of a generic agent ii at time tt is defined by

  • the list $\mathcal{L}_i$, to which a name is added whenever agent $i$ generalizes concept $C$ in association with that name; agent $i$ can use any name in $\mathcal{L}_i$ to communicate $C$;

  • two inventories $[+++\dots]_A$ and $[+++\dots]_B$, containing the examples “+” of concept $C$ received from the other agents in association with name $A$ and $B$, respectively.

It is assumed that initially each agent knows one word: a fraction $n_A(0)$ of the agents know concept $C$ in association with name $A$ and the remaining fraction $n_B(0)=1-n_A(0)$ in association with name $B$; no agent knows both words, $n_{AB}(0)=0$. We will examine three different initial conditions:

symmetric initial conditions (SIC): $n_A(0)=n_B(0)=0.5$
asymmetric initial conditions (AIC): $n_A(0)=0.3$, $n_B(0)=0.7$
reversed case of AIC (AICr): $n_A(0)=0.7$, $n_B(0)=0.3$

Initially, each agent $i$ within the fraction $n_A(0)$ of agents that know name $A$ is assigned $n_{ex,A}=4$ examples “+” of concept $C$ in association with name $A$, but no examples in association with the other name $B$, so that agent $i$ has an $A$-inventory $[++++]_A$ and an empty $B$-inventory $[\cdot]_B$. The complementary situation holds for the other agents that know only name $B$, who initially receive $n_{ex,B}=4$ examples of concept $C$ in association with name $B$ but none in association with $A$. This choice, somewhat arbitrary, is dictated by the condition that Eq. (4) becomes a good approximation for $n>3$ Tenenbaum-1999 .

Examples are points uniformly generated inside the fixed rectangle corresponding to the true concept $C$, here assumed to be a rectangle with lower-left corner coordinates $(0,0)$ and sizes $\sigma_1=3$ and $\sigma_2=1$ along the $x$ and $y$ axis, respectively. Results are independent of the assumed numerical values; in particular, no appreciable variation in the convergence times $t_{conv}$ is observed as the rectangle area is varied, which is consistent with the strong sampling assumption, on which the Bayesian learning framework rests; see Ref. Tenenbaum-1999 and Sec. III.
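The initial state described above can be set up as in the following sketch. The data layout (a dictionary per agent with keys `"list"` and `"inv"`) and the function names `sample_example` and `init_agents` are hypothetical choices made for illustration; only the numerical values ($\sigma_1=3$, $\sigma_2=1$, four initial examples) are taken from the text.

```python
import random

def sample_example(rng, sigma=(3.0, 1.0), corner=(0.0, 0.0)):
    """Draw one example "+" uniformly inside the concept rectangle."""
    return (corner[0] + sigma[0] * rng.random(),
            corner[1] + sigma[1] * rng.random())

def init_agents(n_agents, frac_a, rng, n_ex=4):
    """Build the initial population: a fraction frac_a knows only name A,
    the rest only name B; each agent holds n_ex examples for its own name
    and an empty inventory for the other name."""
    n_a = round(frac_a * n_agents)
    agents = []
    for k in range(n_agents):
        name = "A" if k < n_a else "B"
        inventory = {"A": [], "B": []}
        inventory[name] = [sample_example(rng) for _ in range(n_ex)]
        agents.append({"list": {name}, "inv": inventory})
    return agents

# Asymmetric initial conditions (AIC) for a small hypothetical population.
agents = init_agents(10, 0.3, random.Random(0))
```

The three initial conditions SIC, AIC, and AICr correspond to `frac_a` equal to 0.5, 0.3, and 0.7, respectively.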

Furthermore, we introduce an element of asymmetry between the names $A$ and $B$, related to the word-learning process: different minimum numbers of examples, $n^{\ast}_{ex,A}=5$ and $n^{\ast}_{ex,B}=6$, are needed by agents to generalize concept $C$ in association with $A$ and $B$, respectively. This is equivalent to assuming that concept $C$ is slightly easier to learn in association with name $A$ than with $B$. Such an asymmetry plays a relevant role in the model dynamics in differentiating the Bayesian generalization functions $p_A$ and $p_B$ from each other, see Sec. IV.

The dynamics of the model can be summarized by the following update rules:

  1. A pair of agents $i$ and $j$, acting as speaker and hearer, respectively, are randomly chosen among the agents.

  2. The speaker selects randomly: (a) a name from the list $\mathcal{L}_i$ (or selects the name present if $\mathcal{L}_i$ contains a single name), for example $A$ (analogous steps follow if the word $B$ is selected); (b) an example $z$ among those contained in the corresponding inventory $[+++\dots]_A$. Then the speaker $i$ conveys the extracted example $z$ in association with (e.g. uttering) the selected name $A$ to the hearer $j$.

  3. The hearer adds the new example $z$ (in association with $A$) to the inventory $[+++\dots]_A$. This reinforcement process of the hearer’s knowledge always takes place.

  4. Instead, the next step depends on the state of the hearer:

     (a) Generalization. If the selected name, $A$ in the example, is not present in the hearer’s list $\mathcal{L}_j$, then the hearer $j$ computes the relative Bayesian probability $p_A=p(z\in C|X_A)$ that the new example $z$ falls in the same category of concept $C$, using the examples previously recorded in association with $A$, i.e. the set of examples $X_A$ stored in the inventory $[+++\dots]_A$. If $p_A\geq 1/2$, the hearer has managed to generalize concept $C$ and connects the inventory $[+++\dots]_A$ to name $A$; this is done by adding name $A$ to the list $\mathcal{L}_j$. Starting from this moment, agent $j$ can communicate concept $C$ to other agents by conveying an example taken from the inventory $[+++\dots]_A$ while uttering the name $A$. If $p_A<1/2$, the hearer has not managed to generalize the concept and nothing more happens (the reinforcement of the previous point is the only event taking place).

     (b) Agreement. The name uttered by the speaker, $A$ in the example, is present in the hearer’s list $\mathcal{L}_j$, meaning that agent $j$ has already generalized concept $C$ in association with name $A$ and has connected the corresponding inventory $[+++\dots]_A$ to $A$. In this case, the hearer and the speaker proceed to make an agreement, analogous to that of the NG model, leaving $A$ in their lists $\mathcal{L}_i$ and $\mathcal{L}_j$ and removing $B$, if present. No examples contained in any inventory are removed.

  5. Time is updated, $t\to t+1$, and the simulation is reiterated from the first point above.
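The update rules above can be condensed into a single interaction step, sketched below under explicit assumptions. The data layout (dictionaries with keys `"list"` and `"inv"`), the names `gen_prob` and `bayes_ng_step`, and the returned labels are hypothetical. It is also assumed here that the minimum numbers of examples $n^{\ast}_{ex,A}=5$ and $n^{\ast}_{ex,B}=6$ are counted including the example just received, and that Eq. (4) is evaluated on the examples recorded before it; the text does not fix these details.

```python
import math
import random

SIGMA = (3.0, 1.0)            # sides of the concept rectangle
N_STAR = {"A": 5, "B": 6}     # minimum numbers of examples needed to generalize
P_STAR = 0.5                  # acceptance threshold p*

def gen_prob(examples, z):
    """Eq. (4); returns 0 in the degenerate case r_i = 0 (assumed convention)."""
    exponent, product = 0.0, 1.0
    for i in range(2):
        coords = [x[i] for x in examples]
        lo, hi = min(coords), max(coords)
        d = max(lo - z[i], z[i] - hi, 0.0)
        if d > 0.0:
            if hi == lo:
                return 0.0
            product *= 1.0 + d / (hi - lo)
        exponent += d / SIGMA[i]
    return math.exp(-exponent) / product ** (len(examples) - 2)

def bayes_ng_step(agents, rng):
    """One pair-wise interaction; returns a label describing the outcome."""
    i, j = rng.sample(range(len(agents)), 2)        # rule 1
    speaker, hearer = agents[i], agents[j]
    name = rng.choice(sorted(speaker["list"]))      # rule 2(a)
    z = rng.choice(speaker["inv"][name])            # rule 2(b)
    previous = list(hearer["inv"][name])            # examples recorded before z
    hearer["inv"][name].append(z)                   # rule 3: reinforcement
    if name in hearer["list"]:                      # rule 4(b): agreement
        speaker["list"] = {name}
        hearer["list"] = {name}
        return "agreement"
    enough = len(hearer["inv"][name]) >= N_STAR[name]
    if enough and gen_prob(previous, z) >= P_STAR:  # rule 4(a): generalization
        hearer["list"] = hearer["list"] | {name}
        return "learning"
    return "reinforcement"
```

Note that the label `"agreement"` is returned whenever the hearer already knows the conveyed name, following rule 4(b), even when neither list actually changes.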

Two examples of Bayesian word-learning process, a successful and an unsuccessful one, are illustrated in the cartoon in the right panel (B) of Fig. 1. Table 1 lists the possible encounter situations, together with the corresponding relevant probabilities.

Notice that an agent $i$ can enter a pair-wise interaction with a non-empty inventory of examples, e.g. $[+++\dots]_A$, associated with name $A$, without being able to use name $A$ to convey examples to other agents, i.e., without the name $A$ in the list $\mathcal{L}_i$, due to not having generalized concept $C$ in association with $A$. Those examples can have different origins: (1) the initial conditions, when $n_{ex,A}$ randomly extracted examples associated with $A$ (or $n_{ex,B}$ with $B$) are assigned to the agents; (2) previous interactions, in which the examples were conveyed by other agents; (3) an agreement about convention $B$, which removed label $A$ from the list $\mathcal{L}_i$ while leaving all the corresponding examples in the inventory associated with name $A$. In the latter case, the inventory $[+++\dots]_A$ may be “ready” for a generalization process, since it contains a sufficient number of examples, i.e., agent $i$ will probably be able to generalize as soon as another example is conveyed by an agent. This situation is not as peculiar as it may look at first sight. In fact, there is a linguistic analogue in the case where a speaker who loses the habit of using a certain word (or a language) $A$ can regain it promptly, if exposed to $A$ again.

Notice also that without the agreement dynamics scheme introduced in the model, borrowed from the basic NG model, the population fraction $n_{AB}$ of individuals who know both $A$ and $B$ ($n_A+n_B+n_{AB}=1$) would keep growing, until eventually $n_{AB}=1$.

Table 1: Pair-wise interactions in the Bayesian NG model. The speaker (S) conveys a name $\overset{A}{\longrightarrow}$ or $\overset{B}{\longrightarrow}$ to the hearer (H) together with an example taken from the speaker’s inventory, $[+++\dots]_A$ or $[+++\dots]_B$, respectively; this happens with a branching probability $q=0.5$ if the speaker has the list $(A,B)$ and knows the meaning of both names. The outcome can be: (1) a reinforcement (only); (2) generalization of concept $C$, if the Bayes probability is $p\geq 1/2$; (3) an agreement between hearer and speaker, if both agents know the meaning of the conveyed name. Even if not indicated, reinforcement takes place also in cases (2) and (3).
| S-List (before) | Name conveyed | H-List (before) | Branching probability | Process | Condition | S-List (after) | H-List (after) |
|---|---|---|---|---|---|---|---|
| $(A)$ | $\overset{A}{\longrightarrow}$ | $(A)$ | $(q=1)$ | Reinforcement | always | $(A)$ | $(A)$ |
| $(A)$ | $\overset{A}{\longrightarrow}$ | $(B)$ | $(q=1)$ | Reinforcement | $p_A<1/2$ | $(A)$ | $(B)$ |
| $(A)$ | $\overset{A}{\longrightarrow}$ | $(B)$ | $(q=1)$ | Learning | $p_A\geq 1/2$ | $(A)$ | $(A,B)$ |
| $(A)$ | $\overset{A}{\longrightarrow}$ | $(A,B)$ | $(q=1)$ | Agreement | always | $(A)$ | $(A)$ |
| $(B)$ | $\overset{B}{\longrightarrow}$ | $(A)$ | $(q=1)$ | Reinforcement | $p_B<1/2$ | $(B)$ | $(A)$ |
| $(B)$ | $\overset{B}{\longrightarrow}$ | $(A)$ | $(q=1)$ | Learning | $p_B\geq 1/2$ | $(B)$ | $(A,B)$ |
| $(B)$ | $\overset{B}{\longrightarrow}$ | $(B)$ | $(q=1)$ | Reinforcement | always | $(B)$ | $(B)$ |
| $(B)$ | $\overset{B}{\longrightarrow}$ | $(A,B)$ | $(q=1)$ | Agreement | always | $(B)$ | $(B)$ |
| $(A,B)$ | $\overset{A}{\longrightarrow}$ | $(A)$ | $q=1/2$ | Agreement | always | $(A)$ | $(A)$ |
| $(A,B)$ | $\overset{B}{\longrightarrow}$ | $(A)$ | $q=1/2$ | Reinforcement | $p_B<1/2$ | $(A,B)$ | $(A)$ |
| $(A,B)$ | $\overset{B}{\longrightarrow}$ | $(A)$ | $q=1/2$ | Learning | $p_B\geq 1/2$ | $(A,B)$ | $(A,B)$ |
| $(A,B)$ | $\overset{A}{\longrightarrow}$ | $(B)$ | $q=1/2$ | Reinforcement | $p_A<1/2$ | $(A,B)$ | $(B)$ |
| $(A,B)$ | $\overset{A}{\longrightarrow}$ | $(B)$ | $q=1/2$ | Learning | $p_A\geq 1/2$ | $(A,B)$ | $(A,B)$ |
| $(A,B)$ | $\overset{B}{\longrightarrow}$ | $(B)$ | $q=1/2$ | Agreement | always | $(B)$ | $(B)$ |
| $(A,B)$ | $\overset{A}{\longrightarrow}$ | $(A,B)$ | $q=1/2$ | Agreement | always | $(A)$ | $(A)$ |
| $(A,B)$ | $\overset{B}{\longrightarrow}$ | $(A,B)$ | $q=1/2$ | Agreement | always | $(B)$ | $(B)$ |
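The rows of Table 1 can also be read as a deterministic transition function once the conveyed name and the Bayes probability are given. The sketch below encodes that reading; the function name `table1_outcome` and the use of Python sets for the lists are hypothetical, and the labels follow the wording of the table (in particular, the case in which both agents already hold only the conveyed name is labelled as reinforcement, since no list changes).

```python
def table1_outcome(s_list, name, h_list, p):
    """Return (process, s_list_after, h_list_after) for one row of Table 1.

    s_list, h_list: sets of names known to speaker and hearer before the interaction
    name:           the name conveyed (must belong to s_list)
    p:              Bayes probability computed by the hearer (used only if the
                    hearer does not know the conveyed name)
    """
    if name not in s_list:
        raise ValueError("the speaker can only convey a name it has generalized")
    s, h = set(s_list), set(h_list)
    if name in h:
        if s == {name} and h == {name}:
            return "Reinforcement", s, h          # nothing to erase
        return "Agreement", {name}, {name}        # the other name is erased
    if p >= 0.5:
        return "Learning", s, h | {name}          # hearer generalizes
    return "Reinforcement", s, h                  # only the example is recorded
```

For example, `table1_outcome({"A", "B"}, "B", {"A"}, 0.8)` reproduces the row in which the hearer learns $B$ and both agents end with the list $(A,B)$.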

III Results

In this section we study numerically the Bayesian NG model introduced above and discuss its main features. We limit ourselves to studying the model dynamics on a fully-connected network.

In the new learning scheme, which replaces the one-shot learning of the two-conventions NG model, an individual generalizes concept $C$ on a suitable time scale $\Delta t>1$, rather than during a single interaction. However, a few examples are sufficient for an agent to generalize concept $C$, as in a realistic concept-learning process. This is visible from the Bayesian probabilities $p_A$ and $p_B$ computed by agents in the role of hearer, according to Eq. (4), once at least $n^{\ast}_{ex,A}=5$ and $n^{\ast}_{ex,B}=6$ examples “+”, respectively, have been stored in the inventory associated with the name $A$ and $B$. Figure 3 shows the histograms of the values of $p_A$ and $p_B$ computed from the initial time until consensus for a single run with $N=2000$ agents, starting with SIC. The low frequencies at small values of $p_A$ and $p_B$ and the highest frequencies at values close to unity are due to the fact that the Bayesian probabilities reach values $p_A\approx p_B\approx 1$ very fast, after a few learning attempts, consistently with the size principle, on which the Bayesian learning paradigm, and in turn Eq. (4), are based Tenenbaum-1999 .

Figure 3: Histograms of the Bayesian probabilities $p_A,p_B$ computed by agents during their learning attempts during a single run (for $N=2000$ agents, starting with SIC; $n^{\ast}_{ex,A}=5$, $n^{\ast}_{ex,B}=6$).

In order to visualize how the system approaches consensus, it is useful to consider some global observables, such as the fractions nA(t)n_{A}(t), nB(t)n_{B}(t), and nAB(t)n_{AB}(t) of agents that have generalized concept CC in association with name AA only, name BB only, or both names AA and BB, respectively, or the success rate S(t)S(t). The dynamics of a population of N=1000N=1000 agents (panels (A) and (B)) using different initial conditions, SIC, AIC, and AICr, and that of a population of N=100N=100 agents starting with SIC (panels (C) and (D)) are shown in Fig. 4.

Panel (A) of Fig. 4 shows only the population fractions corresponding to the name found at consensus, for the sake of clarity (the remaining population fractions eventually go to zero). For asymmetrical initial condition (AIC or AICr), it is the initial majority that determines the convention found at consensus (that is BB for AIC and AA for AICr). If the system starts from SIC, the convention AA, for which agents can generalize earlier (nex,A=5<nex,B=6n^{\ast}_{ex,A}=5<n^{\ast}_{ex,B}=6), is always found at consensus — in this case it is the asymmetry in the thresholds nex,An^{\ast}_{ex,A} and nex,Bn^{\ast}_{ex,B}, characterizing the Bayesian learning process, to determine consensus.

Panel (B) of Fig. 4 shows the success rate $S(t=t_k)$, representing the average over different runs of the instantaneous success rate $S_k$ of the $k$th interaction at time $t_k$, defined as follows: $S_k=1$ in case of agreement between the two agents or when a successful learning of the hearer takes place, following a Bayesian probability $p>1/2$; $S_k=0$ in case of unsuccessful generalization, when $p<1/2$ and only reinforcement takes place. The success rate $S(t)$ varies from $S(0)\approx(n_A(0))^2+(n_B(0))^2$, determined by the respective fractions of agents that initially know the two conventions $A$ and $B$, to $S\approx 1$ at consensus, following the S-shaped curve typical of learning processes Baronchelli2006 . In the case of SIC, the initial value is $S(0)\approx 0.5^2+0.5^2=0.5$, while for AIC or AICr the initial value is $S(0)\approx(0.3)^2+(0.7)^2=0.58$.
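
The initial values of the success rate quoted above can be reproduced with a few lines of Python. This is only an illustrative sketch: the function name `initial_success_rate` is hypothetical, and the expression assumes that speaker and hearer are drawn independently from a population in which no agent yet knows both names.

```python
def initial_success_rate(n_a: float, n_b: float) -> float:
    """Probability that two independently drawn agents share the same name.

    n_a, n_b: initial fractions of agents knowing only name A or only name B
    (assumed to satisfy n_a + n_b = 1 at t = 0).
    """
    return n_a ** 2 + n_b ** 2


# Symmetric initial conditions (SIC): half of the agents know A, half know B.
s0_sic = initial_success_rate(0.5, 0.5)
# Asymmetric initial conditions (AIC): 30% know A, 70% know B.
s0_aic = initial_success_rate(0.3, 0.7)

print(f"S(0) for SIC: {s0_sic:.2f}")
print(f"S(0) for AIC: {s0_aic:.2f}")
```

By symmetry, AICr (70% knowing $A$, 30% knowing $B$) gives the same initial value as AIC.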

Figure 4: Average population fraction associated with the name shared in the final consensus state (upper panels (A) and (C)) and success rate $S(t)$ (lower panels (B) and (D)) versus time. Left panels (A) and (B): system with $N=1000$ agents starting from different initial conditions, SIC, AIC, and AICr; averages over $600$ runs. Right panels (C) and (D): system with $N=100$ agents starting from SIC; averages over $1000$ runs. Notice that, due to the smaller size $N=100$, the system can converge to consensus both with name $A$ (in a fraction $p_{e,A}\approx 0.9$ of cases) and with name $B$ ($p_{e,B}\approx 0.1$).

We now investigate how the modified Bayesian dynamics affects the convergence times to consensus. The study of the size-dependence of the convergence to consensus shows that there is a critical value $N^{\ast}\approx 500$ in the case of SIC, such that for $N\leq N^{\ast}$ there is a non-negligible probability that the final absorbing state is $B$. Panels (C) and (D) of Fig. 4, representing the results for a system starting with SIC and a smaller size $N=100$, show the existence of two possible final absorbing states and that there are different time scales associated with the convergence to consensus: name $A$ is found at consensus in about $90\%$ of cases and name $B$ in the remaining cases. The branching probability into $A$ or $B$ consensus is further investigated in panel (A) of Fig. 5, where we plot the branching probabilities $p_{e,A}$, $p_{e,B}$ versus the system size $N$. The nonlinear behavior (symmetrical sigmoid) signals the presence of finite-size effects, particularly clear for relatively small values of $N$. In fact, when the fluctuations in the system are larger, the system size can play an important role in the dynamics of social systems, since an actual thermodynamic limit is justified only for macroscopic physical systems Toral2007a .

Figure 5: Panel (A): probabilities $p_{e,A}$ and $p_{e,B}$ that the system reaches consensus at $A$ and $B$, respectively, versus the system size $N$, obtained by averaging over $1000$ runs of a system starting with SIC. Panel (B): average numbers of examples $\bar{n}_{ex,A}$ and $\bar{n}_{ex,B}$ recorded by an agent at consensus, for a system of $N=50, 100, 500, 1000, 1500, 2000$ agents, starting with SIC, AIC, AICr. Averages are done over $600$ runs.

The convergence time $t_{conv}$ follows a simple scaling rule with the system size $N$, related to the average numbers of examples $\bar{n}_{ex,A}$ and $\bar{n}_{ex,B}$, relative to $A$ and $B$ respectively, stored in the agents’ inventories at consensus. These values depend on the number of learning and reinforcement processes, and hence are related to the system size $N$. The average number of interactions undergone by the agents until the system reaches consensus is given by the sum $\bar{n}_{int}=\bar{n}_{ex,A}+\bar{n}_{ex,B}$ (the $n_{ex,A}=n_{ex,B}=4$ examples given initially to each agent are not accounted for by $\bar{n}_{ex,A}$ and $\bar{n}_{ex,B}$). One expects that

$$t_{conv}\approx\bar{n}_{int}\,N\,, \tag{5}$$

which suggests a linear scaling law ($t_{conv}\sim N$) of the convergence time with the system size $N$ for all the possible initial conditions. A nearly linear behavior (fitted exponents $\alpha$ between $1.06$ and $1.09$) is indeed confirmed by the numerical simulations with population sizes $N=50, 100, 500, 1000, 1500, 2000$ starting from SIC, AIC, AICr. The corresponding numerical results are reported in Table 2. In Eq. (5) the size-dependence of $\bar{n}_{int}$ is ignored, as it shows only a weak dependence upon $N$; see panel (B) in Fig. 5.
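
An exponent $\alpha$ of a scaling law $t_{conv}\sim N^{\alpha}$ is commonly estimated by a straight-line fit in log-log scale. The sketch below shows one way to do this; it is not necessarily the fitting procedure used for Table 2. The helper name `fit_power_law` is hypothetical, and the data are synthetic, noise-free values generated from an assumed law, not simulation results.

```python
import math


def fit_power_law(sizes, times):
    """Least-squares fit of log(t) = log(c) + alpha * log(N).

    Returns the pair (alpha, c).
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in times]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    alpha = sxy / sxx
    c = math.exp(mean_y - alpha * mean_x)
    return alpha, c


# Population sizes taken from the text; times are SYNTHETIC (assumed law
# t = 30 * N**1.06), used only to show that the fit recovers the exponent.
sizes = [50, 100, 500, 1000, 1500, 2000]
times = [30.0 * n ** 1.06 for n in sizes]

alpha, c = fit_power_law(sizes, times)
print(f"alpha = {alpha:.3f}, prefactor = {c:.2f}")
```

With real simulation output, `times` would be replaced by the measured average convergence times for each system size.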

Table 2: Scaling laws $t_{conv}\sim N^{\alpha}$ with the system size $N$. Here the parameters are $n^{\ast}_{ex,A}=5$, $n^{\ast}_{ex,B}=6$, with initial conditions SIC, AIC and AICr. The average numbers of examples $\bar{n}_{ex,A}$, $\bar{n}_{ex,B}$ stored at $t_{conv}$ are obtained by averaging over $600$ runs of a system with $N=1000$ agents.
Initial condition   $\alpha$   $\bar{n}_{ex,A}$   $\bar{n}_{ex,B}$   Outcome
SIC                 1.06       20                 8                  $A$, $B$
AIC                 1.08       3                  19                 $B$
AICr                1.09       18                 3                  $A$
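
As a rough numerical illustration of Eq. (5), the values of Table 2 can be combined into order-of-magnitude estimates of the convergence time for $N=1000$. The dictionary `TABLE_2` and the function `estimate_t_conv` are hypothetical names introduced here; the numbers are copied from Table 2, and the result is only the estimate implied by Eq. (5), not a measured convergence time.

```python
# Average numbers of examples at consensus, copied from Table 2 (N = 1000).
TABLE_2 = {
    "SIC": {"n_ex_A": 20, "n_ex_B": 8},
    "AIC": {"n_ex_A": 3, "n_ex_B": 19},
    "AICr": {"n_ex_A": 18, "n_ex_B": 3},
}


def estimate_t_conv(n_ex_a: int, n_ex_b: int, n_agents: int) -> int:
    """Estimate of Eq. (5): t_conv ~ n_int * N, with n_int = n_ex_A + n_ex_B."""
    n_int = n_ex_a + n_ex_b
    return n_int * n_agents


estimates = {
    ic: estimate_t_conv(row["n_ex_A"], row["n_ex_B"], n_agents=1000)
    for ic, row in TABLE_2.items()
}

for ic, t_conv in estimates.items():
    print(f"{ic}: t_conv of order {t_conv}")
```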

From the above-mentioned scaling law, it is clear that the average number of examples stored by the agents at consensus plays an important role in the semiotic dynamics. In particular, it is found that if the final absorbing state is $A$ (or $B$), then $\bar{n}_{ex,A}>\bar{n}_{ex,B}$ (or $\bar{n}_{ex,B}>\bar{n}_{ex,A}$). Moreover, the average number of examples relative to the absorbing state always increases monotonically with the system size, while a size-independent behavior is observed for the other name; see panel (B) of Fig. 5.

Finally, we compare the convergence time of the Bayesian word-learning model, $t_{conv}$, with that of the two-conventions NG model, $\bar{t}_{conv}$ Castello2009 , by studying the corresponding ratio $R=t_{conv}/\bar{t}_{conv}$ for common initial conditions and population sizes. When starting with SIC, the convergence times obtained from the two models become of the same order as $N$ increases: $R$ decreases with $N$, reaching unity for $N=10000$; see Fig. 6. In other words, the time scales of the two models become equivalent for relatively large system sizes, i.e., the learning processes of the two models perform equivalently and the Bayesian approach roughly gives rise to the one-shot learning that characterizes the two-conventions NG model. In the next section we discuss how the Bayesian model becomes asymptotically equivalent to the minimal NG model. The inset of Fig. 6 shows $R$ versus $N$, for $N<2000$, for the different initial conditions SIC, AIC and AICr. In the following, we focus on the case of SIC.

Figure 6: The ratio of the convergence times of the Bayesian word-learning model and the two-conventions NG model, $R=t_{conv}/\bar{t}_{conv}$, versus the system size $N$, for a system starting with SIC. The inset illustrates the dependence of $R$ on different initial conditions. The curves are obtained by averaging over $900$ runs.
Figure 7: Model scheme with two non-excluding options. Arrows indicate allowed transitions between the “bilingual” state $(A,B)$ and the “monolingual” states $A$ and $B$. Direct $A\leftrightarrow B$ transitions are not allowed.

IV Stability analysis

In this section we investigate the stability properties of the mean-field dynamics of the Bayesian NG model, in which statistical fluctuations and correlations are neglected. In the Bayesian NG model, as in the basic NG, agents can use two non-excluding options $A$ and $B$ to refer to the same concept $C$. The main difference between the Bayesian model and the basic NG model is in the learning process: a one-shot learning process in the basic NG and a Bayesian process in the Bayesian NG model. In the latter case the presence of a name in the word list indicates that the agent has generalized the corresponding concept from a set of positive recorded examples.

The NG model belongs to the wide class of models with two non-excluding options $A$ and $B$, such as many models of bilingualism Patriarca2012a , in which transitions between state $(A)$ and state $(B)$ are allowed only through an intermediate (“bilingual”) state $(A,B)$, as schematized in Fig. 7. The mean-field equations for the fractions $n_A(t)$ and $n_B(t)$ can be obtained by considering the gain and loss contributions of the transitions depicted in Fig. 7,

$$\begin{aligned}
\dot{n}_A &= p_{AB\rightarrow A}\,n_{AB}-p_{A\rightarrow AB}\,n_A\,,\\
\dot{n}_B &= p_{AB\rightarrow B}\,n_{AB}-p_{B\rightarrow AB}\,n_B\,.
\end{aligned} \tag{6}$$

Here $\dot{n}_a(t)=dn_a(t)/dt$ and the quantities $p_{a\rightarrow b}$ represent the respective transition rates per individual, corresponding to the arrows in Fig. 7 ($a,b=A,B,AB$). The equation for $n_{AB}(t)$ is omitted, since it is determined by the condition that the total number of agents is constant, $n_A(t)+n_B(t)+n_{AB}(t)=1$.

The details of the possible pair-wise interactions in the Bayesian naming game are listed in Table 1. From the various contributions, one obtains the master equation

$$\begin{aligned}
\dot{n}_A &= -p_B\,n_A n_B+n_{AB}^2+\frac{3-p_B}{2}\,n_A n_{AB}\,,\\
\dot{n}_B &= -p_A\,n_A n_B+n_{AB}^2+\frac{3-p_A}{2}\,n_B n_{AB}\,,
\end{aligned} \tag{7}$$

which can be rewritten in the form of Eq. (6) with transition rates per individual given by

$$p_{A\rightarrow AB}=p_B\,n_B+\frac{1}{2}\,p_B\,n_{AB}\,,\qquad p_{B\rightarrow AB}=p_A\,n_A+\frac{1}{2}\,p_A\,n_{AB}\,, \tag{8}$$
$$p_{AB\rightarrow A}=\frac{3}{2}\,n_A+n_{AB}\,,\qquad p_{AB\rightarrow B}=\frac{3}{2}\,n_B+n_{AB}\,. \tag{9}$$
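
The equivalence between the gain-loss form of Eq. (6), with the rates of Eqs. (8) and (9), and the explicit form of Eq. (7) can be checked numerically. The following sketch does so at randomly chosen points; all function names are hypothetical, and $p_A$, $p_B$ are treated as plain numbers in $[0,1]$.

```python
import random


def rates(n_a, n_b, p_a, p_b):
    """Transition rates per individual, Eqs. (8) and (9)."""
    n_ab = 1.0 - n_a - n_b
    return {
        "A->AB": p_b * n_b + 0.5 * p_b * n_ab,
        "B->AB": p_a * n_a + 0.5 * p_a * n_ab,
        "AB->A": 1.5 * n_a + n_ab,
        "AB->B": 1.5 * n_b + n_ab,
    }


def rhs_gain_loss(n_a, n_b, p_a, p_b):
    """Right-hand side in the gain-loss form of Eq. (6)."""
    n_ab = 1.0 - n_a - n_b
    r = rates(n_a, n_b, p_a, p_b)
    dn_a = r["AB->A"] * n_ab - r["A->AB"] * n_a
    dn_b = r["AB->B"] * n_ab - r["B->AB"] * n_b
    return dn_a, dn_b


def rhs_explicit(n_a, n_b, p_a, p_b):
    """Right-hand side in the explicit form of Eq. (7)."""
    n_ab = 1.0 - n_a - n_b
    dn_a = -p_b * n_a * n_b + n_ab ** 2 + 0.5 * (3.0 - p_b) * n_a * n_ab
    dn_b = -p_a * n_a * n_b + n_ab ** 2 + 0.5 * (3.0 - p_a) * n_b * n_ab
    return dn_a, dn_b


def max_mismatch(samples=1000, seed=0):
    """Largest absolute difference between the two forms over random states."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(samples):
        n_a = rng.random()
        n_b = rng.random() * (1.0 - n_a)  # keeps n_a + n_b <= 1
        p_a, p_b = rng.random(), rng.random()
        g = rhs_gain_loss(n_a, n_b, p_a, p_b)
        e = rhs_explicit(n_a, n_b, p_a, p_b)
        worst = max(worst, abs(g[0] - e[0]), abs(g[1] - e[1]))
    return worst


print(f"largest mismatch between Eq. (6) and Eq. (7): {max_mismatch():.2e}")
```

A mismatch at the level of floating-point rounding indicates that the two forms coincide term by term.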

Equations (8) provide the transition rates of learning processes, while Eqs. (9) give the transition rates of agreement processes. Setting $x\equiv n_A$, $y\equiv n_B$, and $z\equiv n_{AB}=1-x-y$, the autonomous system (7) becomes

$$\dot{x}=f_x(x,y)\equiv-p_B\,xy+(1-x-y)^2+\frac{1}{2}(3-p_B)\,x\,(1-x-y)\,, \tag{10}$$
$$\dot{y}=f_y(x,y)\equiv-p_A\,xy+(1-x-y)^2+\frac{1}{2}(3-p_A)\,y\,(1-x-y)\,, \tag{11}$$

where $\mathbf{v}=\left(f_x(x,y),f_y(x,y)\right)$ is the velocity field in the phase plane. For the following analysis, it is convenient to treat the Bayesian probabilities $p_A$ and $p_B$ appearing in these equations as time-dependent parameters of the model, although they are actually highly non-linear functions of the variables. In fact, they can be thought of as averages of the microscopic Bayesian probability in Eq. (4) over the possible dynamical realizations. For this reason, they also have a complex non-local time-dependence on the previous history of the interactions between agents. For the moment, we assume $p_A(t)=p_B(t)=p(t)$, returning later to the general case.
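
To get a feeling for the flow defined by Eqs. (10) and (11), the system can be integrated numerically. The sketch below uses a standard fourth-order Runge-Kutta step and, as a simplifying assumption that is not part of the model, keeps $p_A$ and $p_B$ constant in time; the function names, the initial point $(0.6, 0.3)$ and the step size are illustrative choices.

```python
def velocity(x, y, p_a, p_b):
    """Velocity field (f_x, f_y) of Eqs. (10) and (11)."""
    z = 1.0 - x - y
    fx = -p_b * x * y + z * z + 0.5 * (3.0 - p_b) * x * z
    fy = -p_a * x * y + z * z + 0.5 * (3.0 - p_a) * y * z
    return fx, fy


def integrate(x, y, p_a, p_b, t_end=60.0, dt=0.01):
    """Integrate the mean-field equations with a fixed-step RK4 scheme.

    Assumption: p_a and p_b are constant, whereas in the model they
    depend on time through the learning history of the agents.
    """
    steps = int(round(t_end / dt))
    for _ in range(steps):
        k1 = velocity(x, y, p_a, p_b)
        k2 = velocity(x + 0.5 * dt * k1[0], y + 0.5 * dt * k1[1], p_a, p_b)
        k3 = velocity(x + 0.5 * dt * k2[0], y + 0.5 * dt * k2[1], p_a, p_b)
        k4 = velocity(x + dt * k3[0], y + dt * k3[1], p_a, p_b)
        x += dt * (k1[0] + 2.0 * k2[0] + 2.0 * k3[0] + k4[0]) / 6.0
        y += dt * (k1[1] + 2.0 * k2[1] + 2.0 * k3[1] + k4[1]) / 6.0
    return x, y


# Initial majority of A (x = 0.6, y = 0.3, z = 0.1), with p_A = p_B = 1.
x_end, y_end = integrate(0.6, 0.3, p_a=1.0, p_b=1.0)
print(f"final state: x = {x_end:.4f}, y = {y_end:.4f}")
```

For $p_A=p_B=p$ the difference $x-y$ obeys $\dot{x}-\dot{y}=\frac{1}{2}(3-p)\,z\,(x-y)$, so an initial majority cannot be overturned in this deterministic setting.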

From the conditions defining the critical points, $f_x(x,y)=f_y(x,y)=0$, one obtains, by subtracting the two equations, $(x-y)\,z=0$. Setting $z=0$, one obtains two solutions that correspond to consensus in $A$ or $B$, given by $(x_1,y_1,z_1)=(1,0,0)$ and $(x_2,y_2,z_2)=(0,1,0)$. Instead, setting $x-y=0$ leads to the equation

$$2x^2-(p+5)\,x+2=0\,, \tag{12}$$

that has the solutions

$$x_{\pm}=\frac{p+5\pm\sqrt{(p+5)^2-16}}{4}\,. \tag{13}$$
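
As a check, Eq. (12) can be recovered by inserting $y=x$ and $p_A=p_B=p$ into Eq. (10); the intermediate steps below are a worked sketch added for the reader's convenience.

```latex
\begin{aligned}
f_x(x,x) &= -p\,x^2 + (1-2x)^2 + \tfrac{1}{2}(3-p)\,x\,(1-2x) \\
         &= -p\,x^2 + 1 - 4x + 4x^2 + \tfrac{1}{2}(3-p)\,x - (3-p)\,x^2 \\
         &= x^2 - \tfrac{1}{2}(p+5)\,x + 1 .
\end{aligned}
```

Setting $f_x(x,x)=0$ and multiplying by $2$ gives $2x^2-(p+5)\,x+2=0$, whose discriminant is $(p+5)^2-16$.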

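For $p\in(0,1]$, the root $x_+>2$ lies outside the physical range $0\leq x\leq 1$ (correspondingly, $z_+=1-2x_+<0$) and is not a suitable solution. Since $x_+x_-=1$, the other root satisfies $x_-<1/2$, so that $z_-=1-2x_->0$: the point $(x_-,x_-,1-2x_-)$ is an admissible symmetric equilibrium, which for $p=1$ reduces to $x_-=(3-\sqrt{5})/2\approx 0.38$. This equilibrium is, however, unstable: from Eqs. (10) and (11) with $p_A=p_B=p$ one has $\dot{x}-\dot{y}=\frac{1}{2}(3-p)\,z\,(x-y)$, so that any difference between $x$ and $y$ grows as long as $z>0$.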

This analysis is valid for $p>0$. In fact, $p=p(t)$ is a function of time, and for a finite interval after the initial time one has $p=0$, which defines a different dynamical system. In the initial conditions used, $z(0)=0$, which implies $z(t)=0$, $x(t)=x(0)$, and $y(t)=y(0)$ at any later time $t$ as long as $p(t)=0$, since $\dot{x}(t)=\dot{y}(t)=\dot{z}(t)=0$ (see Eq. (7)); in fact, the whole line $x+y=1$ (for $0<x,y<1$) represents a continuous set of equilibrium points. The reason why in this model $p(0)=0$ at $t=0$ and also during a subsequent finite interval of time is twofold. First, agents do not have any examples associated with the name they do not know, and they have to receive at least $n^{\ast}_{ex,A}$ or $n^{\ast}_{ex,B}$ examples before being able to compute the corresponding Bayesian probability $p_A(t)$ or $p_B(t)$; thus it is to be expected that $p(t)=0$ meanwhile. Furthermore, even when agents can compute the Bayesian probabilities, the effective probability to generalize is actually zero until the computed value exceeds the threshold $p^{\ast}=0.5$ for a generalization to take place. The (temporary) equilibrium points on the line $x+y=1$ cease to exist as soon as $p(t)>p^{\ast}$ and, according to Eq. (7), the two $A$- and $B$-consensus states become the only stable equilibrium points. The representative point in the $x$-$y$ plane is bound to leave the initial conditions on the line $z=1-x-y=0$, due to the stochastic nature of the dynamics, which is not invariant under time reversal hinrichsen2006 .

To determine the nature of the critical points $(x_1,y_1)=(1,0)$ and $(x_2,y_2)=(0,1)$, one needs to evaluate at the equilibrium points the $2\times 2$ Jacobian matrix $A(x,y)=\{\partial_i f_j\}$, where $i,j=x,y$. It is easy to show that both the critical points $(0,1)$ and $(1,0)$ are asymptotically stable strogatz2000 .
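
The stability statement can be made explicit with a short worked calculation, sketched here under the assumption that $p_A$ and $p_B$ are treated as constant parameters with $0<p_A,p_B\leq 1$. Differentiating Eqs. (10) and (11), with $z=1-x-y$:

```latex
\begin{aligned}
\partial_x f_x &= -p_B\,y - 2z + \tfrac{1}{2}(3-p_B)\,(z-x), &
\partial_y f_x &= -p_B\,x - 2z - \tfrac{1}{2}(3-p_B)\,x, \\
\partial_x f_y &= -p_A\,y - 2z - \tfrac{1}{2}(3-p_A)\,y, &
\partial_y f_y &= -p_A\,x - 2z + \tfrac{1}{2}(3-p_A)\,(z-y).
\end{aligned}

% At the A-consensus point (x,y,z) = (1,0,0):
\partial_x f_x = -\tfrac{1}{2}(3-p_B), \qquad
\partial_y f_x = -\tfrac{1}{2}(3+p_B), \qquad
\partial_x f_y = 0, \qquad
\partial_y f_y = -p_A .
```

Since $\partial_x f_y=0$ at $(1,0)$, the Jacobian is triangular there and its eigenvalues are $\lambda_1=-\frac{1}{2}(3-p_B)$ and $\lambda_2=-p_A$, both negative. By the symmetry $A\leftrightarrow B$, the eigenvalues at $(0,1)$ are $-\frac{1}{2}(3-p_A)$ and $-p_B$.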

As for the general case $p_A\neq p_B$, it can be shown that the trajectory of the system points toward and eventually reaches the consensus state with $A$ or $B$, depending on whether $p_A(t^{\ast})>p_B(t^{\ast})$ or $p_A(t^{\ast})<p_B(t^{\ast})$, where $t^{\ast}>0$ is the critical time at which the representative point leaves the initial position.

Figure 8: Results from two single simulations of a system with $N=100$ agents, starting from SIC at $(x_0,y_0)=(0.5,0.5)$ and reaching two different consensus states about name $A$ or $B$. Panels (A) and (B) show the population fractions $x(t)=n_A(t)$ and $y(t)=n_B(t)$. Panels (C) and (D) show the corresponding average numbers of examples recorded by an agent, $\bar{n}_{ex,A}(t)$ and $\bar{n}_{ex,B}(t)$.

The convention $A$ or $B$ is selected randomly, depending on various factors related to the specific realization of the system evolution, such as the numbers of examples $\bar{n}_{ex,A}(t)$ and $\bar{n}_{ex,B}(t)$ recorded by the agents until time $t$, their quality from the point of view of the generalization, and the initial asymmetry of the thresholds for generalizing, $n^{\ast}_{ex,A}\neq n^{\ast}_{ex,B}$. The asymmetrical thresholds $n^{\ast}_{ex,A}=4<n^{\ast}_{ex,B}=5$ produce a bias toward consensus in $A$ and play a crucial role in the subsequent Bayesian semiotic dynamics; in fact, when the threshold values are swapped (setting $n^{\ast}_{ex,A}=5>n^{\ast}_{ex,B}=4$), the approach to consensus occurs with the outcomes $A$ and $B$ swapped.

We observed that for $N\gtrsim N^{\ast}\approx 500$ the chances that the system converges to $(B)$ become negligible. This can be understood from panels (C) and (D) of Fig. 8, which show $\bar{n}_{ex,A}(t)$ and $\bar{n}_{ex,B}(t)$ (averaged over the agents of the system) versus time for two single runs of a population of $N=100$ agents starting with SIC, relaxing toward consensus at $A$ and $B$, respectively. After an initial transient, in which $\bar{n}_{ex,A}(t)\approx\bar{n}_{ex,B}(t)$, the two quantities differ more and more from each other at times $t>t^{\ast}$. In turn, $p_A$ and $p_B$ also begin to differ significantly from each other, thus affecting the rate of depletion of the populations during the subsequent dynamics. For instance, if $p_A>p_B$, then $p_{B\rightarrow AB}>p_{A\rightarrow AB}$, see Eqs. (8), which means that the depletion of $n_B$ occurs faster than that of $n_A$. In turn, this favours the decay of the mixed state $(A,B)$ into the state $(A)$, see Eqs. (9), since $n_A>n_B$.

Figure 9: Population fractions $n_A(t)$, $n_B(t)$, and $n_{AB}(t)$ versus time, starting from SIC; results are obtained by averaging over $600$ runs. Left column (panels (A) and (B)): a system with $N=100$ agents can reach consensus with name $A$ (panel (A), about $91\%$ of runs) or name $B$ (panel (B), about $9\%$ of runs). Right column (panels (C) and (D)): a system with $N=200$ agents, reaching consensus with name $A$ in about $96\%$ of runs (panel (C)) and with name $B$ in the remaining $4\%$ of runs (panel (D)).

The asymmetry discussed above also affects the convergence times $t^A_{conv}$ and $t^B_{conv}$, and we find $t^B_{conv}>t^A_{conv}$ in all the numerical simulations. Despite the noise, such a trend is already appreciable in a single run, as shown in panels (A) and (B) of Fig. 8. The mean fractions $n_A(t)$, $n_B(t)$, and $n_{AB}(t)$ versus time, obtained by averaging over many runs, are less noisy and provide a clearer picture of the difference, which is visible in Fig. 9, obtained using $600$ runs starting with SIC for $N=100$ agents (panels (A) and (B)) and $N=200$ agents (panels (C) and (D)). In addition, the convergence times depend on the system size, increasing with the number of agents $N$: compare the left panels (A) and (B), where $N=100$, with the right panels (C) and (D), where $N=200$.

The possibility that the same system, starting with the same initial conditions and evolving with the same dynamical parameters, can reach consensus at either $A$ or $B$ is a consequence of the stochastic nature of the dynamics. This does not happen for $N\gtrsim N^{\ast}$, when both $\bar{n}_{ex,A}$ and $\bar{n}_{ex,B}$ reach threshold values close to those observed at $t_{conv}$, which are clearly sufficient for the agents to generalize concept $C$. In fact, the scaling law of $t_{conv}$ with $N$ shows that the sum of $\bar{n}_{ex,A}$ and $\bar{n}_{ex,B}$ becomes nearly constant for $N\gtrsim N^{\ast}$, implying that the outcome of the dynamics is uniquely determined, that is, consensus always occurs at $A$ from SIC, once the agents have stored a threshold number of examples $\bar{n}_{ex,A}$, $\bar{n}_{ex,B}$. It is found that these threshold values correspond to $\bar{n}^{\ast}_{ex,A}=21$, $\bar{n}^{\ast}_{ex,B}=12$. Note that $\bar{n}^{\ast}_{ex,A}$ and $\bar{n}^{\ast}_{ex,B}$ include the four initial examples stored in the agents’ inventories at the beginning, because the output of the generalization function $p(t)$ effectively depends on all of them. Therefore, at these threshold values, it is very unlikely that $p_B>p_A$, and hence that consensus occurs at $B$.

Figure 10: Average values $\bar{p}_A(t)$ and $\bar{p}_B(t)$, computed using a (temporal) bin $\Delta t=16\times 10^3$, versus time for a single run of a system reaching consensus at $A$. The convergence time is $t_{conv}\approx 160\times 10^3$ and the population size is $N=5000$. The inset shows the average numbers of learning attempts $no_A$, $no_B$ versus time for the same single run.

Now we consider the Bayesian probabilities $p_A(t)$ and $p_B(t)$ computed by agents and the corresponding numbers of learning attempts $no_A(t)$ and $no_B(t)$ made by agents at time $t$ to learn concept $C$ in association with word $A$ or $B$, respectively, i.e. the number of times that the agents compute $p_A$ or $p_B$ (only the case of a system starting with SIC is considered). We consider a single run of a system with $N=5000$ agents and study the average values $\bar{p}_A(t)$, $\bar{p}_B(t)$, obtained by averaging $p_A(t)$ and $p_B(t)$ over the agents of the system. We also adopt a coarse-grained view, consisting in an additional average of $\bar{p}_A(t)$, $\bar{p}_B(t)$, $no_A$, and $no_B$ over a temporal bin $\Delta t=16\times 10^3$, in order to reduce random fluctuations. Figure 10 shows the time evolution of the average probabilities $\bar{p}_A(t)$ and $\bar{p}_B(t)$ in the time range where the data allow good statistics. The probabilities grow monotonically and eventually reach the value one. While this points to an equivalence between the mean-field regime of the Bayesian naming game and that of the two-conventions NG model, in which agents learn at the first attempt (one-shot learning), such an equivalence is suggested but not fully reproduced by the coarse-grained analysis. The time evolution of the numbers of learning attempts $no_A(t)$ and $no_B(t)$ shows that they are negligible both at the beginning and at the end of the dynamics; see the inset in Fig. 10. This is due to the fact that at the beginning the most likely interactions are either those between agents with the same convention (starting with SIC, each agent has a probability of 50% to interact with an agent having the same convention) or those between agents with different conventions but with inventories still too small to allow them to generalize concept $C$, leading to reinforcement processes only. When approaching consensus, agents with one of the conventions constitute the large majority of the population and thus they are again most likely to interact through reinforcements only. Thus, the largest numbers of attempts to learn concept $C$ in association with $A$ and $B$ are expected to occur at the intermediate stage of the dynamics. In fact, $no_A(t)$ and $no_B(t)$ are observed to reach a maximum at $t\approx t_{conv}/2$ for any given system size $N$, as visible in the inset of Fig. 10. Notice that the fraction $n_{AB}$ of agents who know both conventions and can communicate using both name $A$ and name $B$, possibly allowing other agents to generalize in association with name $A$ or $B$, also reaches its maximum roughly at the same time.
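
The coarse-graining over temporal bins of width $\Delta t$ described above can be sketched as follows. The function `bin_average` is a hypothetical helper, not the authors' code, and the short time series is synthetic and chosen only to make the binning easy to follow.

```python
def bin_average(times, values, bin_width):
    """Average `values` over consecutive time bins of width `bin_width`.

    Returns a list of (bin_start, mean_value) pairs for the non-empty bins,
    ordered by time.
    """
    sums = {}
    counts = {}
    for t, v in zip(times, values):
        k = int(t // bin_width)  # index of the bin containing time t
        sums[k] = sums.get(k, 0.0) + v
        counts[k] = counts.get(k, 0) + 1
    return [(k * bin_width, sums[k] / counts[k]) for k in sorted(sums)]


# Synthetic example: six recorded probabilities, binned with width 2.
times = [0, 1, 2, 3, 4, 5]
values = [0.0, 1.0, 0.5, 0.5, 1.0, 1.0]
binned = bin_average(times, values, bin_width=2)
print(binned)
```

In the analysis of Fig. 10 the bin width would be $\Delta t=16\times 10^3$ and `values` would hold the agent-averaged probabilities or the numbers of learning attempts recorded during the run.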

V Conclusion

We introduced a novel agent-based model that describes the appearance of linguistic consensus through a word-learning process. The work presented is exploratory in nature, concerning the minimal problem of a single concept that can be associated with two different possible names $A$ or $B$, but is aimed at providing a prototype of a general framework for describing the interaction between the social and the cognitive dimension. To this aim, the model is constructed on the basis of the semiotic dynamics of the NG model and is then extended by adding a Bayesian cognitive process, mimicking human learning processes.

The model describes in a natural way (1) the uncertainty accompanying the first phase of a learning process, (2) the gradual reduction of the uncertainty as more and more examples are provided, and (3) the ability to learn from a few examples. The semiotic dynamics of the synonyms is different from that of the basic NG, in that it depends on parameters of a strictly cognitive nature, such as the thresholds $n^{\ast}_{ex}$ of the number of examples necessary before an agent can try to generalize and the acceptance threshold $p^{\ast}$ for carrying out the generalization of a concept. The interplay between the asymmetry of the conventions $A$ and $B$, the system size, and the stochastic character of the time evolution has dramatic consequences on the consensus dynamics: there is a critical time $t^{\ast}>0$, when the system begins to move in the phase plane to eventually converge toward a consensus state; there is a critical system size $N^{\ast}$, such that for $N<N^{\ast}$ the system can end up in either of the two consensus states and the convergence times depend on $N$; there is an asymmetry in the branching probabilities that the system converges toward one of the two possible conventions and in the corresponding convergence times; the scaling laws of the convergence times versus $N$ differ from those observed in the basic NG model, because they depend on the learning experience of the agents.

The cognitive dimension offers additional possibilities for modelling in terms of specific cognitive parameters problems that are out of the reach of traditional social dynamics models. The model illustrated in this work represents a step toward a generalized Bayesian approach to social interactions, leading to cultural conventions.

Future work can address specific problems of current interest from the point of view of cognitive processes, or features relevant from the general standpoint of complexity theory. In the first case, it is possible to study in the cognitive dimension the semiotic dynamics of homonyms, synonyms, and innovation, e.g., the cognitive conditions leading to a name $A_1$, associated with a concept $C_1$, splitting into two names $A_1$ and $A_2$, associated with two related but distinct concepts $C_1$ and $C_2$, as more examples become available that make the two concepts eventually distinguishable from each other. This is a type of problem that cannot be tackled within models of cultural competition. In the second case, one can mention the classical problem of the interplay between a central information source (bias) and the local influences of individuals, this time in a cognitive framework.

Another question to be investigated within a cognitive framework would be the role of heterogeneity. In fact, heterogeneity is known to characterize most of the known complex systems at various levels — here the diversity could affect the dynamical parameters of e.g. the different competing names as well as those of the agents. Heterogeneity of individuals can lead to counter-intuitive effects, such as resonant behaviors Tessone2009a ; VazMartins2009a . Furthermore, the complex, heterogeneous nature of a local underlying social network can change drastically the co-evolution and the time-scales of the conventions in competition with each other Toivonen2009a .

Acknowledgements

The authors acknowledge support from the Estonian Ministry of Education and Research through Institutional Research Funding IUT (IUT39-1), the Estonian Research Council through Grant PUT (PUT1356), and the ERDF (European Regional Development Fund) CoE (Center of Excellence) program through Grant TK133.

We also thank Andrea Baronchelli for providing useful remarks about the naming game model and the manuscript.

References

  • (1) C. Castellano, S. Fortunato, and V. Loreto. Statistical physics of social dynamics. Rev. Mod. Phys., 81:591, 2009.
  • (2) A. Baronchelli. The emergence of consensus: a primer. R. Soc. open sci., 5:172189, 2018.
  • (3) C. Xia, S. Ding, C. Wang, J. Wang, and Z. Chen. Risk analysis and enhancement of cooperation yielded by the individual reputation in the spatial public goods game. IEEE Systems Journal, 11(3):1516–1525, Sep. 2017.
  • (4) Chengyi Xia, Xiaopeng Li, Zhen Wang, and Matjaž Perc. Doubly effects of information sharing on interdependent network reciprocity. New Journal of Physics, 20(7):075005, jul 2018.
  • (5) Yingchao Zhang, Juan Wang, Chenxi Ding, and Chengyi Xia. Impact of individual difference and investment heterogeneity on the collective cooperation in the spatial public goods game. Knowledge-Based Systems, 136:150 – 158, 2017.
  • (6) S. Mufwene. The ecology of language evolution. Cambridge University Press, Cambridge, 2001.
  • (7) R. Lass. Historical Linguistics and Language Change. Cambridge University Press, Cambridge, UK, 1997.
  • (8) C. Berruto. Prima lezione di sociolinguistica. Laterza, Roma & Bari, 2004.
  • (9) S. Edelman and H. Waterfall. Behavioral and computational aspects of language and its acquisition. Phys. Life Rev., 4:253–277, 2007.
  • (10) J. B Tenenbaum. A Bayesian Framework For Concept Learning. PhD thesis, MIT, 1999.
  • (11) S. Wichmann. The emerging field of language dynamics. Language and Linguistics Compass, 2/3:442, 2008.
  • (12) S. Wichmann. Teaching & learning guide for: The emerging field of language dynamics. Language and Linguistics Compass, 2, 2008.
  • (13) Joshua B. Tenenbaum and Fei Xu. Word learning as Bayesian inference. In Proceedings of the 22nd Annual Conference of the Cognitive Science Society, 2000.
  • (14) F. Xu and J. B. Tenenbaum. Word learning as Bayesian inference. Psychological Review, 114:245–272, 2007.
  • (15) J.R. Hurford. Biological evolution of the saussurean sign as a component of the language-acquisition device. Lingua, 77:187–222, 1989.
  • (16) Martin A. Nowak, Joshua B. Plotkin, and David C. Krakauer. The evolutionary language game. Journal of Theoretical Biology, 200(2):147 – 162, 1999.
  • (17) M.A. Nowak. Evolutionary biology of language. Philos. Trans. R. Soc. London B Biol. Sci., 355(1403):1615–22, 2000.
  • (18) Peter E. Trapa and Martin A. Nowak. Nash equilibria for an evolutionary language game. Journal of Mathematical Biology, 41(2):172–188, Aug 2000.
  • (19) Andrea Baronchelli, Maddalena Felici, Vittorio Loreto, Emanuele Caglioti, and Luc Steels. Sharp transition towards shared vocabularies in multi-agent systems. Journal of Statistical Mechanics: Theory and Experiment, 2006(06):P06014, 2006.
  • (20) Guanrong Chen and Yang Lou. Naming Game. Springer International Publishing, Switzerland, 2019.
  • (21) Jierui Xie, Jeffrey Emenheiser, Matthew Kirby, Sameet Sreenivasan, Boleslaw K. Szymanski, and Gyorgy Korniss. Evolution of opinions on social networks in the presence of competing committed groups. PLOS ONE, 7(3):1–9, 03 2012.
  • (22) Zhong-Yan Fan, Ying-Cheng Lai, and Wallace Kit-Sang Tang. Knowledge consensus in complex networks: the role of learning. arXiv:1809.00297, 2018.
  • (23) Andrea Baronchelli. A gentle introduction to the minimal naming game. Belgian Journal of Linguistics, 30(1):171–192, 2016.
  • (24) Joshua B. Tenenbaum and Thomas L. Griffiths. Generalization, similarity, and Bayesian inference. The Behavioral and brain sciences, 24 4:629–40; discussion 652–791, 2001.
  • (25) Thomas L. Griffiths and Joshua B. Tenenbaum. Optimal predictions in everyday cognition. Psychological Science, 17(9):767–773, 2006. PMID: 16984293.
  • (26) Amy Perfors, Joshua B. Tenenbaum, Thomas L. Griffiths, and Fei Xu. A tutorial introduction to Bayesian models of cognitive development. Cognition, 120 3:302–21, 2011.
  • (27) Brenden M. Lake, Ruslan Salakhutdinov, and Joshua B. Tenenbaum. Human-level concept learning through probabilistic program induction. Science, 350(6266):1332–1338, 2015.
  • (28) Joshua B. Tenenbaum. Bayesian modeling of human concept learning. In Proceedings of the 1998 Conference on Advances in Neural Information Processing Systems II, pages 59–65, Cambridge, MA, USA, 1999. MIT Press.
  • (29) Joshua B. Tenenbaum, Charles Kemp, Thomas L. Griffiths, and Noah D. Goodman. How to grow a mind: Statistics, structure, and abstraction. Science, 331(6022):1279–1285, 2011.
  • (30) D. Barber. Bayesian Reasoning and Machine Learning. Cambridge University Press, Cambridge, UK, 2012.
  • (31) K.P. Murphy. Machine Learning: A Probabilistic Perspective. Adaptive Computation and Machine Learning series. MIT Press, Cambridge,MA, 2012.
  • (32) Theodoros Evgeniou, Massimiliano Pontil, and Tomaso Poggio. Statistical learning theory: A primer. International Journal of Computer Vision, 38(1):9–13, Jun 2000.
  • (33) X. Castelló, A. Baronchelli, and V. Loreto. Eur. Phys. J. B, 71(4):557–564, 2009.
  • (34) Andrea Baronchelli, Luca Dall’Asta, Alain Barrat, and Vittorio Loreto. Phys. Rev. E, 76:051102, Nov 2007.
  • (35) Ludwig Wittgenstein. Philosophical investigations. Macmillan, New York, 1953.
  • (36) Hanns Ludwig Harney. Bayesian Inference. Data Evaluation and Decision. Springer, 2003.
  • (37) H. Jeffreys. Theory of Probability. Clarendon Press, Oxford, 1939.
  • (38) Andrea Baronchelli, Maddalena Felici, Vittorio Loreto, Emanuele Caglioti, and Luc Steels. Sharp transition towards shared vocabularies in multi-agent systems. Journal of Statistical Mechanics: Theory and Experiment, 2006(06):P06014, 2006.
  • (39) R. Toral and C.J. Tessone. Finite size effects in the dynamics of opinion formation. Comm. Comp. Phys., 2:177, 2007.
  • (40) The $n_{ex,A}=n_{ex,B}=4$ examples given initially to each agent are not accounted for by $\bar{n}_{ex,A}$ and $\bar{n}_{ex,B}$.
  • (41) M. Patriarca, X. Castelló, J.R. Uriarte, V.M. Eguíluz, and M. San Miguel. Modeling two-language competition dynamics. Adv. Complex Syst., 15(3&4):1250048, 2012.
  • (42) Haye Hinrichsen. Non-equilibrium phase transitions. Physica A: Statistical Mechanics and its Applications, 369(1):1–28, 2006. Fundamental Problems in Statistical Physics.
  • (43) Steven H. Strogatz. Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry and Engineering. Westview Press, 1995.
  • (44) C.J. Tessone and R. Toral. Diversity-induced resonance in a model for opinion formation. Eur. Phys. J. B, 71:549, 2009.
  • (45) T. Vaz Martins, R. Toral, and M.A. Santos. Divide and conquer: resonance induced by competitive interactions. Eur. Phys. J. B, 67:329–336, 2009.
  • (46) R. Toivonen, X. Castelló, V.M. Eguíluz, J. Saramäki, K. Kaski, and M. San Miguel. Broad lifetime distributions for ordering dynamics in complex networks. Phys. Rev. E, 79:016109, 2009.