
DISCD: Distributed Lossy Semantic Communication for Logical Deduction of Hypothesis

Ahmet Faruk Saz,  Siheng Xiong,  and Faramarz Fekri
Abstract

In this paper, we address hypothesis testing in a distributed network of nodes, where each node has only partial information about the State of the World (SotW) and is tasked with determining which hypothesis, among a given set, is best supported by the data available at that node. Because of each node's limited perspective of the SotW, individual nodes cannot reliably determine the best-supported hypothesis on their own. To overcome this limitation, nodes exchange information via an intermediate server. Our objective is to introduce a novel distributed lossy semantic communication framework designed to minimize each node's uncertainty about the SotW while operating under a limited communication budget. In each communication round, nodes determine the most content-informative message to send to the server. The server aggregates the incoming messages from all nodes, updates its view of the SotW, and transmits back the most semantically informative message. We demonstrate that transmitting the semantically most informative messages enables convergence toward the true distribution over the state space, improving deductive reasoning performance under communication constraints. For experimental evaluation, we construct a dataset designed for logical deduction of hypotheses and compare our approach against random message selection. The results validate the effectiveness of our semantic communication framework, showing significant improvements in the nodes' understanding of the SotW for hypothesis testing, with reduced communication overhead.

Index Terms:
Logic, semantic, lossy, communication, deduction

I Introduction

Low latency, minimal power consumption, and efficient bandwidth are critical in today's hyper-connected systems. Autonomous technologies like self-driving cars and drones depend on real-time data exchange to ensure safety, while smart cities require reliable communication to optimize services. In healthcare, semantic communication enhances data accuracy for remote monitoring, improving patient outcomes. In Industry 4.0, it ensures efficient, real-time information flow between machines and sensors, driving automation and advancing manufacturing intelligence. Semantic systems will support fast and explainable decision-making in next-generation applications such as automated truth verification, data connectivity and retrieval, financial modeling, risk analysis, privacy-enhancing technologies, and optimized agriculture [1, 2, 3]. In these applications, distributed systems may need to exchange data for the deduction of a hypothesis to improve their generalization ability, decision-making performance, or statistical analysis. However, due to privacy concerns and communication limitations, transmitting their entire dataset may not be feasible. In such a case, semantic communication may play an essential role.

Rudolf Carnap's foundational work introduced a model-theoretic approach to semantic communication that uses First-Order Logic (FOL) to represent states. His framework provided a way to quantify semantic information using inductive probability distributions. Later, Hintikka expanded on Carnap's work by addressing infinite universes and generalization issues [4]. Early post-5G semantic communication methods, such as deep joint source-channel coding (D-JSCC) and autoencoders, employed neural network-based solutions [5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18]. Traditional model-theoretic approaches like Carnap's and Floridi's [19] offer interpretability but suffer from computational inefficiency and lack probabilistic foundations. Newer approaches include information expansion and knowledge collision [20], knowledge-graph integration enhanced with probability graphs [21], memory-supported deep learning approaches [22], and topological approaches [23]. These methods, while effective, are not particularly tailored to FOL reasoning tasks [24, 25, 26, 27, 28] such as the deduction of a hypothesis.

In this paper, we propose a novel lossy semantic communication framework for a distributed set of nodes, each with a deductive task and a partial, possibly overlapping view of the world, aiming to identify, among a set of hypotheses, the one best supported by the State of the World (SotW). Nodes communicate with a central server to enhance their understanding of the SotW, transmitting the most informative messages about their local environments under bandwidth limitations and privacy concerns. The server aggregates the data and returns the most informative messages to help nodes refine their local probability distributions over the state space.

Our semantic encoding algorithm selects messages based on their semantic informativeness to a specific observer within the communication constraints. Messages with a higher degree of confirmation (i.e., inductive logical probability) are deemed more informative, as they leave less uncertainty about the SotW. In addition, this new paradigm effectively manages large state spaces, maximizing each node's understanding of the SotW and enabling more robust deduction performance, despite the nodes being unaware of each other's and the central node's full perspective and despite limited communication resources.

Finally, we show that as the number of communication rounds increases, the nodes' distributions over the state space converge toward the true distribution, minimizing the Bayes risk over the hypothesis set. Empirical validation through experiments on a FOL deduction dataset shows that our framework significantly improves the nodes' understanding of the world and their deduction performance with reduced overhead.

II Background on FOL, Inductive Probabilities, and Semantic Information

II-A Representing the State of the World via First-Order Logic

We begin by illustrating how the state of the world can be represented using First-Order Logic (FOL). Consider a finite world $\mathbf{W}_{F}$ characterized by a finite set of entities $\mathbf{E}$ and a finite set of predicates $\mathbf{P}$. The set of all possible states of $\mathbf{W}_{F}$ is denoted by $\mathbf{S}_{F}=\{\mathbf{s}_{1},\mathbf{s}_{2},\dots,\mathbf{s}_{n}\}$, where $n=2^{|\mathbf{P}|\times|\mathbf{E}|^{2}}$. Each state $\mathbf{s}_{i}$ provides a complete description of the world using the logical language $\mathcal{L}$, which includes the predicates $\mathbf{P}$, entities $\mathbf{E}$, quantifiers $\forall,\exists$, and logical connectives $\land,\lor,\neg$.

Definition 1 (State Description).

A state description $\mathbf{s}_{i}$ in $\mathbf{W}_{F}$ is defined as:

$$\mathbf{s}_{i}=\bigwedge_{r\in\mathcal{I}_{P}}\bigwedge_{a,b\in\mathbf{E}}\delta_{r}R_{r}(a,b),$$ (1)

where $\delta_{r}\in\{+1,-1\}$ indicates the presence ($+1$) or negation ($-1$) of the predicate $R_{r}$ for entities $a$ and $b$, and $\mathcal{I}_{P}$ is the index set of predicates.

In this setup, each state is a conjunction ($\bigwedge$) of a specific enumeration of all possible predicates (or their negations), $R_{r}$ for $r\in\mathcal{I}_{P}$, applied to all ordered pairs of entities $(a,b)$ with $a,b\in\mathbf{E}$. Any FOL sentence $m$ in this finite world can thus be expressed as a disjunction of these state descriptions.

Theorem 1 (FOL Representation in Finite World [29]).

Any FOL sentence $m$ in $\mathbf{W}_{F}$ can be represented as:

$$m\equiv\bigvee_{\substack{i\in\mathcal{I}_{\mathbf{S}_{F}}\\ \text{such that } m\models\mathbf{s}_{i}}}\mathbf{s}_{i},$$ (2)

where $\models$ is the logical entailment operator and $\mathcal{I}_{\mathbf{S}_{F}}$ is the index set of finite states.
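To make Definition 1 and Theorem 1 concrete, the following minimal Python sketch (illustrative only, not part of the paper's released code) enumerates the state descriptions of a toy finite world with a single binary predicate and two entities, and recovers a sentence's disjunctive representation as the set of states in which it holds; all names are ours.

```python
# Toy finite world W_F: one binary predicate R and entities {a, b}, so
# n = 2^(|P| * |E|^2) = 2^4 = 16 state descriptions (Definition 1).
from itertools import product

entities = ["a", "b"]
atoms = [(x, y) for x in entities for y in entities]      # ground atoms R(x, y)

# Each state description fixes a sign (+1 presence / -1 negation) per atom.
states = [dict(zip(atoms, signs)) for signs in product([+1, -1], repeat=len(atoms))]

# Example FOL sentence m: "there exists x such that R(x, x)", as a test on states.
def m(state):
    return any(state[(x, x)] == +1 for x in entities)

# Theorem 1: m is equivalent to the disjunction of the state descriptions where it holds.
m_disjunction = [idx for idx, s in enumerate(states) if m(s)]
print(len(states), len(m_disjunction))    # 16 states in total, 12 of them satisfy m
```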

When extending to an infinite world $\mathbf{W}_{I}$, where the entity set $\mathbf{E}$ becomes countably infinite, a more intricate approach is needed. We introduce the concept of Q-sentences to handle this complexity.

Definition 2 (Q-sentence).

A Q-sentence $Q_{i}(a,b)$ in $\mathbf{W}_{I}$ is defined as:

$$Q_{i}(a,b)=\bigwedge_{r\in\mathcal{I}_{P}}\bigwedge_{a,b\in\mathbf{E}}\delta_{r}R_{r}(a,b)\land\delta^{\prime}_{r}R_{r}(b,a),$$ (3)

where $\delta_{r},\delta^{\prime}_{r}\in\{+1,-1\}$ indicate the presence or negation of predicates. This formulation encompasses all combinations, resulting in $4^{|\mathbf{P}|\times|\mathbf{E}|^{2}}/2$ distinct Q-sentences, i.e., $i=1,\dots,4^{|\mathbf{P}|\times|\mathbf{E}|^{2}}/2$.

Building upon Q-sentences, we define attributive constituents to capture the relationships of individual entities.

Definition 3 (Attributive Constituent).

An attributive constituent $C_{t}(x)$ in $\mathbf{W}_{I}$ is:

$$C_{t}(x)=\bigwedge_{i\in\mathcal{I}_{\mathcal{Q}}}\delta_{i}\{(Q_{i}(x,b)):\exists b\},$$ (4)

where $\delta_{i}\in\{+1,-1\}$, $\mathcal{I}_{\mathcal{Q}}$ is the index set over Q-sentences, and $b\in\mathbf{E}$. The expression encapsulates all relationships entity $x$ has within the world, expressing what kind of individual $x$ is.

To represent the entire infinite world, we define constituents as combinations of attributive constituents.

Definition 4 (Constituent).

A constituent $C^{w}$ of width $w$ in $\mathbf{W}_{I}$ is a conjunction of attributive constituents:

$$C^{w}=\bigwedge_{t=1}^{w}\{C_{t}(x):\exists x\},$$ (5)

which represents the relational structure of entities in the world $\mathbf{W}_{I}$. Here, the width $w$ is the number of attributive constituents $C_{t}(x)$ present in the conjunction forming $C^{w}$.

Any FOL sentence $m$ in $\mathbf{W}_{I}$ can then be expressed as a disjunction of such constituents.

Theorem 2 (FOL Representation in Infinite World [30]).

Any FOL sentence $m$ in $\mathbf{W}_{I}$ can be represented as:

$$m\equiv\bigvee_{\substack{j\in\mathcal{I}_{\mathcal{C}}\\ \text{such that } m\models C^{j}}}C^{j},$$ (6)

where $\mathcal{I}_{\mathcal{C}}$ is the index set of constituents.

As an illustration, consider the statement "Every person owns a book.", or more explicitly, "For every x, if x is a person, then there exists a y such that y is a book and x owns y.", formally expressed as:

$$m=\forall x\left(Person(x)\rightarrow\exists y\left(Book(y)\land Owns(x,y)\right)\right).$$ (7)

To represent $m$ as a disjunction of constituents, we identify all constituents $C^{j^{\prime}}$ where $m$ holds. In each such constituent, for every entity $x$ satisfying $Person(x)$, the attributive constituent $C^{\prime}_{t}(x)$ must include:

$$C^{\prime}_{t}(x)=\begin{cases}\left[Person(x)\land\exists y\left(Book(y)\land Owns(x,y)\right)\right]\land\left[\bigwedge_{i\in\mathcal{I}_{\mathcal{Q}}}\delta_{i}\{(Q_{i}(x,b)):\exists b\}\right], & \text{if } Person(x)\equiv 1,\\ \left[\lnot Person(x)\right]\land\bigwedge_{i\in\mathcal{I}_{\mathcal{Q}}}\delta_{i}\{(Q_{i}(x,b)):\exists b\}, & \text{otherwise.}\end{cases}$$

Each constituent $C^{j^{\prime}}$ is then constructed as:

$$C^{j^{\prime}}=\bigwedge_{t=1}^{j^{\prime}}\{C_{t}^{\prime}(x):\exists x\},$$ (8)

ensuring that all entities are accounted for. Therefore, $m$ can be expressed as:

$$m\equiv\bigvee_{\substack{j^{\prime}\in\mathcal{I}_{\mathcal{C}}\\ \text{such that } m\models C^{j^{\prime}}}}C^{j^{\prime}}.$$ (9)

State descriptions and constituents act like basis vectors in a vector space (they partition the space, so any two are mutually exclusive), allowing us to define a probability measure over them. The inductive logical probability of any sentence in the FOL language $\mathcal{L}$ is then the sum of the probabilities of the state descriptions or constituents in its disjunctive representation, per the axioms of probability. In the next section, we discuss how to define this probability measure using the frameworks proposed by Carnap and Hintikka.

II-B Inductive Logical Distribution on States

In a FOL-based world $\mathbf{W}$ (which could be $\mathbf{W}_{F}$ or $\mathbf{W}_{I}$), we introduce a probability measure $\mathcal{P}_{I}$ over the fundamental elements (either state descriptions $\mathbf{s}_{i}$ or constituents $C^{j}$), collectively referred to as "world states" $S_{i}$. Let $S_{i}\in\mathcal{S}$, where $\mathcal{S}$ is the state space. To define this measure, we make use of inductive logical probabilities, which serve as inductive Bayesian posteriors built upon empirical observations and a prior distribution.

Definition 5 (Inductive Logical Probability).

For any state $S_{i}$ and corresponding evidence $e$ (observations) in $\mathbf{W}$, the inductive logical probability (a.k.a. degree of confirmation) $c(S_{i},e)$ that evidence $e$ supports sentence $S_{i}$ is defined as:

$$c(S_{i},e)=p(S_{i}\mid e)=\frac{p(S_{i}\land e)}{p(e)}=\frac{l_{S_{i}}+\lambda(w_{S_{i}})/w_{S_{i}}}{l+\lambda(w_{S_{i}})},$$ (10)

where $l$ is the number of observations, $l_{S_{i}}$ is the number of observations confirming state $S_{i}$, $w_{S_{i}}$ is the weight assigned to state $S_{i}$, and $\lambda(w_{S_{i}})$ is the prior coefficient, dependent on $w_{S_{i}}$. The parameter $\lambda(w_{S_{i}})$, ranging from $0$ to $\infty$, balances the weight of the prior against the empirical data.

Then, the inductive logical probability measure $\mathcal{P}_{I}$ over $\mathcal{S}$ is defined as $\mathcal{P}_{I}=\{c(S_{i},e)\mid i\in\mathcal{I}_{\mathcal{S}}\}$, where $\sum_{i\in\mathcal{I}_{\mathcal{S}}}c(S_{i},e)=1$. In the following section, we make use of inductive logical probabilities to define a semantic information metric that quantifies semantic informativeness regarding the states of the world.
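As a quick numerical illustration of (10), the degree of confirmation can be computed directly from the observation counts; this is a sketch under our own choice of parameter values, not the authors' implementation.

```python
# Degree of confirmation c(S_i, e) from eq. (10); parameter names mirror the text.
def degree_of_confirmation(l_si, l, w_si, lam):
    """c(S_i, e) = (l_si + lam / w_si) / (l + lam), with lam = lambda(w_si)."""
    return (l_si + lam / w_si) / (l + lam)

# Example: l = 10 observations, l_si = 7 confirm S_i, weight w_si = 4, lambda = 2.
print(degree_of_confirmation(l_si=7, l=10, w_si=4, lam=2.0))   # 0.625
# As lam grows, c approaches the prior 1/w_si = 0.25; as lam -> 0, it approaches
# the empirical frequency l_si / l = 0.7.
```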

II-C Measuring Semantic Uncertainty about the SotW

To quantify the remaining uncertainty after new observations, Carnap proposed the concept of "cont-information". This metric assesses how observations influence an observer's understanding of the SotW by evaluating the extent to which they decrease the observer's uncertainty.

Definition 6.

The cont-information in a world $\mathbf{W}$ quantifies the informativeness of an observation $e$ to a specific observer by measuring the remaining uncertainty about the SotW. Given a state $S_{i}$ and evidence $e$ (observations) in $\mathbf{W}$, cont-information is defined as:

$$\text{cont}(S_{i};e)=1-c(S_{i},e)=1-p(S_{i}\mid e).$$ (11)

In simple terms, $\text{cont}(S_{i};e)$ measures the uncertainty remaining about state $S_{i}$ after observing $e$. Next, the cont-information measure is used to devise a communication framework. This measure captures the dynamic and cumulative nature of evidence accumulation during communication and, as a result, the refinement in an observer's understanding of the world state.
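The corresponding cont-information is simply the complement of the degree of confirmation; the small sketch below (illustrative parameter values, same setting as the previous example) shows how additional confirming observations reduce the residual uncertainty about a state.

```python
# cont(S_i; e) = 1 - c(S_i, e), cf. eqs. (10) and (11).
def cont(l_si, l, w_si, lam):
    return 1.0 - (l_si + lam / w_si) / (l + lam)

print(cont(l_si=7, l=10, w_si=4, lam=2.0))   # 0.375
print(cont(l_si=9, l=10, w_si=4, lam=2.0))   # ~0.208: more support, less uncertainty
```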

III Distributed Semantic Communication Framework

Figure 1: Distributed semantic communication framework. Nodes exchange the most semantically informative messages.

III-A Hypothesis Deduction Problem

We start by formalizing a hypothesis deduction problem at each user node $N_{j}$ in the wireless network. Let $\mathcal{H}_{j}=\{h_{i}\mid i\in\mathcal{I}_{\mathcal{H}}\}$ be a finite or infinite set of self-consistent statements representing possible hypotheses about the world $\mathbf{W}$ at node $N_{j}$. There exists a hypothesis $h^{*}\in\mathcal{H}_{j}$ that is most consistent, among all hypotheses, with the actual SotW. The statements $h_{i}$ can be represented by a non-empty disjunction of state descriptions or constituents, as per Theorems 1 and 2. At each node $N_{j}$, the task is to deduce the hypothesis $h_{j}^{*}$ (the hypothesis most consistent with the SotW). However, since each node has incomplete information, this deduction cannot be carried out reliably in isolation. The aim of communication is, then, to receive the most cont-informative information regarding the world state so as to more accurately determine which hypothesis is supported the most by the world (entity) population.

III-B System Model

As shown in Fig. 1, we consider a distributed network comprising a set of edge nodes $\mathcal{N}=\{N_{1},N_{2},\dots,N_{v}\}$ and a central server $CS$. Each edge node possesses evidence $e_{j}$ obtained through observations of the world $\mathbf{W}$, which can be finite ($\mathbf{W}_{F}$) or infinite ($\mathbf{W}_{I}$). Each node $N_{j}$ maintains a local inductive probability distribution $\mathcal{P}^{j}_{I,t}$ over the state space $\mathcal{S}$, reflecting its view of the State of the World (SotW) at round $t$. Each node $N_{j}$ aims to solve a hypothesis deduction problem $\mathcal{B}_{j}$, which involves determining the hypothesis $h_{j}^{*}$ most supported by evidence from a set of possible hypotheses. Since the nodes have incomplete observations about the SotW, they collaborate through a central server, sharing information to improve each node's local distribution over states. However, due to communication constraints, every node can transmit only a limited amount of information to the server in each communication round. The central server $CS$ aggregates the messages from all nodes to update its own perception of the SotW, denoted by $\mathcal{P}^{CS}_{I,t}$. The server then selects a message and broadcasts it back to the nodes to aid them in refining their local distributions. In the next section, we describe, for each round $t$ of communication: (i) how a node chooses its message for transmission to the server, (ii) how the server updates its perceived SotW upon receiving all incoming messages from the nodes, (iii) how the server selects which message to broadcast to all nodes, (iv) how a node updates its perceived SotW upon receiving the message from the server, and (v) how the hypothesis deduction problem is solved.

III-C Communication Protocol

The communication occurs in iterative rounds, each consisting of two phases: an uplink phase and a downlink phase.

Uplink Phase

In round $t$, each node $N_{j}$ selects a message $m^{*}_{j,t}$ (which is in the form of a FOL sentence) to transmit to the server $CS$. The message $m^{*}_{j,t}$ is chosen to maximize the semantic content informativeness with respect to the node's observations $e_{j}$, while considering the previously received messages $\{m_{cs,z}\}_{z=1}^{t-1}$ from the server and the previously transmitted messages $\{m_{j,z}\}_{z=1}^{t-1}$ to the server. The optimization objective for node $N_{j}$ becomes:

$$m_{j,t}^{*}=\mathop{\mathrm{arg\,min}}_{m\notin\bigcup_{z=1}^{t-1}\{m_{cs,z},\,m_{j,z}\}}\text{cont}\Big(e_{j}\land\bigcup_{z=1}^{t-1}m_{cs,z},\,m\Big)\quad\text{s.t. }|m|\leq B,$$ (12)

where $|m|$ is the number of FOL sentences that message $m$ contains.
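A simple way to approximate the selection in (12) is a greedy sweep over the node's not-yet-exchanged sentences. The sketch below assumes a hypothetical scoring callback `cont_given(evidence, message)` (e.g., backed by a model counter) and is not the paper's exact solver; here `evidence` stands for the node's observations together with all messages received so far.

```python
# Greedy sketch of the uplink selection (12) at node N_j.
def select_uplink_message(candidates, already_exchanged, evidence, B, cont_given):
    """Pick up to B previously unexchanged sentences minimizing cont(evidence, m)."""
    pool = [s for s in candidates if s not in already_exchanged]
    message = []
    for _ in range(B):
        if not pool:
            break
        # Add the sentence whose inclusion leaves the least residual uncertainty.
        best = min(pool, key=lambda s: cont_given(evidence, message + [s]))
        message.append(best)
        pool.remove(best)
    return message
```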

Downlink Phase

Upon receiving the messages $\{m^{*}_{j,t}\}_{j=1}^{v}$ from all nodes, the server $CS$ updates its distribution $\mathcal{P}^{CS}_{I,t}$ on the state space based on the aggregated information:

$$\mathcal{P}^{CS}_{I,t}=\text{Update}(\mathcal{P}^{CS}_{I,t-1},\{m^{*}_{j,t}\}_{j=1}^{v}),$$

where the Update function involves Bayesian updating (cf. (10) and (18)) to incorporate the new evidence and adjust the probabilities over $\mathcal{S}$.
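For intuition, when the distribution over the state space is stored explicitly, the Update step amounts to zeroing out states inconsistent with the received messages and renormalizing. The sketch below is a minimal illustration with a placeholder consistency test, not the framework's actual implementation.

```python
# Minimal Bayesian update over an explicit distribution {state: probability}.
def bayesian_update(posterior, messages, is_consistent):
    """Keep mass only on states consistent with every received FOL message."""
    updated = {s: (p if all(is_consistent(s, m) for m in messages) else 0.0)
               for s, p in posterior.items()}
    z = sum(updated.values())
    # If the messages contradict every state (should not happen), keep the old belief.
    return {s: p / z for s, p in updated.items()} if z > 0 else posterior
```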

The server then selects a message $m_{CS,t}^{*}$ to broadcast, containing information not previously sent out:

$$m_{CS,t}^{*}=\mathop{\mathrm{arg\,min}}_{m\notin\bigcup_{z=1}^{t-1}m_{cs,z}}\text{cont}\Big(\bigcup_{z=1}^{t}\{m^{*}_{j,z}\}_{j=1}^{v},\,m\Big)\quad\text{s.t. }|m|\leq B,$$ (13)

while accounting for the previously broadcast messages $\{m_{cs,z}\}_{z=1}^{t-1}$ from the server. Upon receiving $m_{CS,t}^{*}$, node $N_{j}$ updates its distribution:

$$\mathcal{P}^{j}_{I,t}=\text{Update}(\mathcal{P}^{j}_{I,t-1},m_{CS,t}^{*}).$$

This iterative process allows nodes to progressively refine their understanding of the SotW. Once the communication concludes, nodes perform deduction.

Hypothesis Deduction

After $T$ rounds of communication, each node $N_{j}$ chooses the hypothesis best supported by the evidence as:

$$h_{j}^{*}=\mathop{\mathrm{arg\,max}}_{h}\;c\Big(h,\,e_{j}\land\bigcup_{z=1}^{T}m_{cs,z}\Big).$$ (14)
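Equation (14) scores each hypothesis by the posterior mass of the world states in its disjunctive representation (Theorems 1 and 2). A compact sketch, with `supports(state, h)` as a placeholder entailment test and `posterior` as the node's distribution over states, neither of which is the paper's API:

```python
# Hypothesis deduction rule (14): pick the hypothesis with the largest degree of
# confirmation, computed by summing the posterior over its supporting states.
def deduce(hypotheses, posterior, supports):
    def score(h):
        return sum(p for state, p in posterior.items() if supports(state, h))
    return max(hypotheses, key=score)
```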

In the next section, we turn our attention to the theoretical properties of this communication scheme.

III-D Convergence Analysis Under Finite Evidence

In this section, we examine the convergence behavior of the communication framework under finite evidence to determine the relationship between the empirical and true distributions over the state space $\mathcal{S}$. To preserve generality, we assume an infinite universe $\mathbf{W}_{I}$ and conduct the analysis over constituents.

Let $e_{j,l}^{c}$ describe a finite sample of $l$ individual observations, containing $c$ distinct attributive constituents, accumulated at node $N_{j}$ after $T$ rounds of communication. A constituent $C^{w}$ of width $w$ is compatible with $e_{j,l}^{c}$ only if $c\leq w\leq K$, where $K=4^{|\mathbf{P}|\times|\mathbf{E}|^{2}}/2$.

To compute the posterior distribution

$$\mathcal{P}^{j}_{I,T}=\{c(C^{w},e_{j,l}^{c})\mid c\leq w\leq K\},$$ (15)

one needs to specify the prior $c(C^{w},\emptyset)$ and the likelihood $c(e_{j,l}^{c},C^{w})$. A natural method to assign the prior $c(C^{w},\emptyset)=P(C^{w})$ is to assume it is proportional to $\left(\frac{w}{K}\right)^{\alpha}$, reflecting the assumption that there are $\alpha$ individuals compatible with a particular constituent $C^{w}$. This approach captures the intuition that constituents with larger widths (i.e., involving more attributive constituents) are more probable if we believe there are more individuals exemplifying them. The explicit equation for the prior $P(C^{w})$ is:

$$c(C^{w},\emptyset)=P(C^{w})=\frac{\Gamma\left(\alpha+\frac{w}{K}\right)}{\Gamma\left(\frac{w}{K}\right)}\bigg/\sum_{i=0}^{K}\binom{K}{i}\frac{\Gamma\left(\alpha+\frac{i}{K}\right)}{\Gamma\left(\frac{i}{K}\right)},$$ (16)

where $\Gamma(\cdot)$ is the gamma function and $\alpha\geq 0$ is a parameter reflecting our prior belief about the number of individuals compatible with $C^{w}$.

The likelihood $P(e_{j,l}^{c}\mid C^{w})$ is computed based on the number $l_{z}$ of observed samples confirming each attributive constituent $C_{z}(x)$ in the evidence $e_{j,l}^{c}$. It is given by:

$$c(e_{j,l}^{c},C^{w})=P(e_{j,l}^{c}\mid C^{w})=\frac{\Gamma(\lambda(w))}{\Gamma(l+\lambda(w))}\prod_{z=1}^{c}\frac{\Gamma\left(l_{z}+\frac{\lambda(w)}{w}\right)}{\Gamma\left(\frac{\lambda(w)}{w}\right)},$$ (17)

where $\lambda$ is a prior coefficient, possibly dependent on $w$. This likelihood reflects how well constituent $C^{w}$ explains the observed evidence $e_{j,l}^{c}$.

Using the prior and likelihood, the posterior probability of constituent $C^{w}$ given $e_{j,l}^{c}$ is:

$$c(C^{w},e_{j,l}^{c})=P(C^{w}\mid e_{j,l}^{c})=\frac{P(C^{w})\,P(e_{j,l}^{c}\mid C^{w})}{\sum_{w^{\prime}\in\mathcal{I}_{\mathcal{C}}}P(C^{w^{\prime}})\,P(e_{j,l}^{c}\mid C^{w^{\prime}})},$$ (18)

where the sum in the denominator runs over all constituents compatible with the language and $\mathcal{I}_{\mathcal{C}}$ is the index set of constituents.
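A numerically stable way to evaluate (16)-(18) is to work with log-gamma functions. The sketch below treats constituents as summarized by their width $w$ (one representative constituent per compatible width) and uses a constant $\lambda$; it follows the expressions as written, with illustrative parameter values, and is not the authors' released code.

```python
from math import exp
from scipy.special import gammaln   # log of the gamma function

def log_prior(w, K, alpha):
    # log Gamma(alpha + w/K) - log Gamma(w/K); the normalizer in (16) cancels below.
    return gammaln(alpha + w / K) - gammaln(w / K)

def log_likelihood(counts, w, lam):
    # eq. (17) with counts = (l_1, ..., l_c) and l = sum(counts).
    l = sum(counts)
    out = gammaln(lam) - gammaln(l + lam)
    for lz in counts:
        out += gammaln(lz + lam / w) - gammaln(lam / w)
    return out

def posterior_over_widths(counts, c, K, alpha, lam):
    # eq. (18), restricted to the compatible widths c <= w <= K.
    logs = {w: log_prior(w, K, alpha) + log_likelihood(counts, w, lam)
            for w in range(c, K + 1)}
    m = max(logs.values())
    z = sum(exp(v - m) for v in logs.values())
    return {w: exp(v - m) / z for w, v in logs.items()}

# Example: K = 4 attributive constituents, c = 2 observed with counts (6, 4).
print(posterior_over_widths(counts=[6, 4], c=2, K=4, alpha=1.0, lam=2.0))
```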

Using the prior in equation (16) and the likelihood in equation (17), we can analyze the convergence of the posterior probability $P(C^{w}\mid e_{j,l}^{c})$ as the number of observations $l$ increases. Specifically, these equations are used to derive a Probably Approximately Correct (PAC) bound that determines the minimum number of samples required for the minimal constituent $C^{c}$ to be approximately correct with high probability. (Although omitted from this manuscript due to space constraints, it has been proven that, asymptotically, the constituent with the smallest width, a.k.a. the minimal constituent, receives probability 1 while all other constituents receive probability 0. See [31] for further discussion and proof.)

Theorem 3 (PAC-Bound for Constituents [31]).

Given evidence $e_{j,l}^{c}$ of size $l$, let $l_{0}$ be such that $P(C^{c}\mid e_{j,l}^{c})\geq 1-\epsilon$ for some $l\geq l_{0}$. Then:

$$\epsilon^{\prime}\leq\max_{0\leq c\leq K-1}\left\{\sum_{i=1}^{K-c}\binom{K-c}{i}\left(\frac{c}{c+i}\right)^{l-\alpha}\right\},$$ (19)

where $\epsilon^{\prime}=\frac{\epsilon}{1-\epsilon}$.
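For concreteness, the bound in (19) can be evaluated directly; the short sketch below (with illustrative values for $K$ and $\alpha$) shows how it shrinks as the number of observations $l$ grows.

```python
from math import comb

def pac_bound(l, K, alpha):
    """Right-hand side of (19): max over 0 <= c <= K-1 of the binomial sum."""
    best = 0.0
    for c in range(K):
        total = sum(comb(K - c, i) * (c / (c + i)) ** (l - alpha)
                    for i in range(1, K - c + 1))
        best = max(best, total)
    return best

for l in (10, 50, 200):
    print(l, pac_bound(l, K=8, alpha=1.0))   # the bound on eps' decreases with l
```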

This PAC bound provides a theoretical guarantee on the convergence of the posterior distribution $\mathcal{P}^{j}_{I,T}$ to the true distribution on the state space $\mathcal{S}$ as more evidence is accumulated. The detailed derivation of this bound relies on the prior and likelihood expressions, as also discussed in the next section. We now turn our attention to showing that DISCD is superior to random transmissions.

III-E Effectiveness of Proposed Framework

In this section, we compare two communication strategies under sentence-level communication constraints, from the perspective of a node $N_{j}$, as detailed in the next theorem.

Theorem 4 (Effectiveness of DISCD Framework).

Given a fixed communication budget allowing the transmission of $B$ FOL sentences over the entire course of communication (e.g., in $T$ rounds) by the central server, consider the following two strategies:

1. DISCD Message Selection: Let $e_{j,cont}$ constitute the $B$ sentences with the highest degree of confirmation that were selected and broadcast by the central server during communication.

2. Random Message Selection: Let $e_{j,rand}$ constitute $B$ sentences selected uniformly at random and broadcast by the central server during communication.

Assume the minimal constituent $C^{c}$ corresponds to the true state of the world. Then the following hold:

(a) The posterior probability of the minimal constituent satisfies

$$c(C^{c},e_{j,cont})\geq c(C^{c},e_{j,rand}).$$ (20)

(b) The evidence $e_{j,cont}$ yields a more accurate posterior distribution $\mathcal{P}^{j,cont}_{I,T}$ over constituents and a tighter PAC bound (as given by Theorem 3) than that of $e_{j,rand}$, i.e., $\epsilon^{\prime}_{cont}\leq\epsilon^{\prime}_{rand}$.

Proof.

The posterior distributions $\mathcal{P}^{j,cont}_{I,T}$ and $\mathcal{P}^{j,rand}_{I,T}$ are given by (15) and (18). The likelihood $P(e_{j,cont}\mid C^{w})$ depends on how well constituent $C^{w}$ explains the evidence $e_{j,cont}$, and similarly for $P(e_{j,rand}\mid C^{w})$. The evidence $e_{j,cont}$ affects the likelihoods such that $P(e_{j,cont}\mid C^{c})$ is higher because $C^{c}$ aligns closely with the specific observations in $e_{j,cont}$. For incorrect constituents $C^{w}\neq C^{c}$, $P(e_{j,cont}\mid C^{w})$ is lower due to poorer alignment with the detailed evidence as per (6). Therefore, the numerator $P(C^{w})P(e_{j,cont}\mid C^{w})$ increases, while the denominators adjust based on the altered likelihoods, leading to:

$$P(C^{w}\mid e_{j,cont})\geq P(C^{w}\mid e_{j,rand}).$$ (21)

Similarly, the posterior probabilities for incorrect constituents decrease under $e_{j,cont}$, resulting in a more accurate posterior distribution. From the convergence analysis and Theorem 3, we observe that the PAC bound depends on $l$, the number of observations, and on the likelihoods. As the evidence $e_{j,cont}$ grows with $l$, it yields increasingly more discriminative likelihoods (i.e., higher probability concentration around more accurate constituents and the true constituent), tightening the PAC bound. Specifically, with higher $l$ and more informative likelihoods, the sum in the PAC bound inequality decreases:

$$\epsilon^{\prime}\leq\sum_{i=1}^{K-c}\binom{K-c}{i}\left(\frac{P(e\mid C^{c+i})P(C^{c+i})}{P(e\mid C^{c})P(C^{c})}\right).$$ (22)

With $e_{j,cont}$, the ratio $\frac{P(e_{j,cont}\mid C^{c+i})}{P(e_{j,cont}\mid C^{c})}$ becomes smaller due to the more discriminative likelihoods, leading to a smaller $\epsilon^{\prime}$ (tighter error bound) compared to $e_{j,rand}$.

Elaborating on Theorem 4, we observe that the evidence $e_{j,cont}$ affects the posterior in two ways. First, it provides more specific information: as the evidence becomes more informative (larger $l$, larger $l_{w}$), the likelihood function $P(e_{j,cont}\mid C^{w})$ will often be higher for constituents that align closely with the evidence and lower for others. Second, $e_{j,cont}$ may render some constituents impossible (assigning them a likelihood of zero) because they are inconsistent with the evidence.

Under $e_{j,rand}$, the likelihoods $P(e_{j,rand}\mid C^{w})$ are less discriminative because random evidence provides less specific information. The differences in likelihoods between the true constituent and incorrect ones are smaller. As the likelihoods under $e_{j,cont}$ are more discriminative, the posterior distribution $P(C^{w}\mid e_{j,cont})$ is more peaked around the true constituent $C^{c}$, assigning higher probability to $C^{c}$ and lower probabilities to the others.

In other words, $e_{j,cont}$ improves discrimination. By providing more specific observations, evidence $e_{j,cont}$ with a high degree of confirmation enhances the differences in likelihoods between the true constituent $C^{c}$ and incorrect constituents. The increased likelihood $P(e_{j,cont}\mid C^{c})$ and decreased likelihoods $P(e_{j,cont}\mid C^{w}\neq C^{c})$ result in a higher posterior probability for $C^{c}$. The posterior distribution over constituents becomes more concentrated around $C^{c}$, assigning lower probabilities to incorrect constituents and leading to a more accurate posterior. The improved likelihood ratios lead to a tighter PAC bound, reflecting a better estimation of $C^{c}$. In the next section, we comment on the impact of improved posterior estimation on hypothesis deduction task success.

III-F Analysis of Semantic Uncertainty and Task Success

Although the discussion in this section is for a single node $N_{j}$, it extends trivially to the multi-node case. Each node $N_{j}$ aims to minimize its risk (expected loss) $R_{\mathcal{P}^{j}_{I,T}}(h_{j})$ over $\mathcal{B}_{j}$:

$$R_{\mathcal{P}^{j}_{I,T}}(h_{j})=E_{h_{j}\sim\mathcal{P}^{j}_{I,T}}[L(h_{j},h_{j}^{*})],$$

where $L(h_{j},h_{j}^{*})$ quantifies the discrepancy between $h_{j}$ and $h_{j}^{*}$ and can be any appropriate distance metric.

As nodes exchange information and receive informative messages, the distributions $\mathcal{P}^{j}_{I,T}$ converge towards the true distribution $\mathcal{P}^{j*}_{I,T}$, reducing the expected loss and enhancing the nodes' ability to identify $h_{j}^{*}$, since $h_{j}\equiv\bigvee_{w\in\mathcal{I}_{\mathcal{C}},\,h_{j}\models C^{w}}C^{w}$ and $c(h_{j},e_{j}\land\bigcup_{z=1}^{T}m_{cs,z})=\sum_{w}c(C^{w},e_{j}\land\bigcup_{z=1}^{T}m_{cs,z})$.

Therefore, as evidence accumulates, and per Theorems 3 and 4, $e_{j,cont}$ leads to tighter convergence towards the true (minimal) constituent $C^{c}$ and better minimizes the Bayes risk $R_{\mathcal{P}^{j}_{I,T}}(h_{j})$ than random message selection $e_{j,rand}$ (i.e., $R_{\mathcal{P}^{j,cont}_{I,T}}(h_{j})\leq R_{\mathcal{P}^{j,rand}_{I,T}}(h_{j})$), yielding a higher hypothesis deduction success rate. In the next section, we experimentally verify these theoretical observations.

IV Experiment Results

The effectiveness of the DISCD communication algorithm compared to random selection was tested on a custom deduction dataset. The relevant code and dataset are accessible on GitHub [32]. The custom deduction dataset describes a story of which each node $N_{j}$ has only a portion, i.e., each node possesses incomplete information about the story. The story is divided across all nodes, with each node receiving an equal number of sentences. To reflect possible overlaps in the worldviews of different nodes, 30% of each node's data is shared with at least one other node, whereas the remaining 70% is unique to that node. The task at each node is to determine, among 8 distinct hypotheses, the one most supported by the evidence (story) for the entities present in that node. However, as the information is incomplete, for some entities, nodes require information from others to be able to deduce the correct hypothesis.

During communication, each node identifies the most informative subset of $B$ sentences from a total of $N=40$ sentences by solving the optimization problem in (12). Minimizing (12) requires calculating inductive logical probabilities, which, in turn, necessitates counting the number of states in $\mathcal{S}$ that a FOL expression satisfies. As the state space $\mathcal{S}$ is exponentially large, we use the state-of-the-art Boolean model counting tool sharpSAT-td to determine the number of states. To shed more light on how this optimization works, consider the following example. Suppose node $N_{j}$ has sentences $\{FOL_{1},\dots,FOL_{40}\}$ in its dataset. Assuming $B=1$, in round 0 the node calculates the number of states in $\mathcal{S}$ that each sentence $FOL_{i}$ satisfies and determines their inductive logical probabilities. Based on those probabilities, $N_{j}$ optimizes (12). After transmitting, say, $FOL_{3}$, and receiving another sentence from the server, call it $FOL_{cs,0}$, it continues with round 1, where it calculates the probabilities for $FOL_{cs,0}\land FOL_{3}\land FOL_{i}$ for $i\in\{1,2,4,\dots,40\}$. Then another $FOL_{i}$ is selected, transmitted, and so on. For more details on how this optimization algorithm works, see [33, 34]. In our experiments, we tested subsets of sizes $B=1,2$. Once the communication concludes, the nodes perform their respective hypothesis deduction tasks, which are the same for all nodes in our experiments. We measure the success rate as the ratio of the population (entities) for which the correct hypothesis can be deduced. The results are presented in Fig. 2.
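The per-round selection described above can be sketched as follows, assuming a hypothetical wrapper `count_models(sentences)` around the model counter (sharpSAT-td in the paper) that returns the number of states in $\mathcal{S}$ satisfying the conjunction of the given sentences; this is an illustration of the procedure, not the released implementation.

```python
# One uplink round at node N_j with budget B, following the worked example above.
def node_round(dataset, sent, received, B, count_models, total_states):
    context = received + sent            # sentences already exchanged in earlier rounds
    pool = [s for s in dataset if s not in sent]
    picks = []
    for _ in range(B):
        scored = []
        for s in pool:
            # Inductive probability of (context AND picks AND s) via model counting.
            prob = count_models(context + picks + [s]) / total_states
            scored.append((1.0 - prob, s))   # cont-style residual uncertainty
        _, best = min(scored, key=lambda t: t[0])
        picks.append(best)
        pool.remove(best)
    return picks                         # the B sentences the node transmits this round
```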

Figure 2: Success rate in hypothesis deduction task vs. communication round.

DISCD clearly outperforms random selection in hypothesis deduction performance for both message sizes ($B=1,2$). In other words, DISCD consistently selects sentences that significantly improve hypothesis deduction accuracy compared to random transmissions. Indeed, it takes only 5 rounds of communication for DISCD-1 to achieve a 46.5% success rate, whereas it takes 22 rounds for Random-1 to achieve the same performance. Similarly, it takes 13 rounds to achieve a 62.1% success rate for DISCD-2, whereas the same success rate is achieved by Random-2 only after 34 rounds. After 40 rounds of communication, both DISCD-1 and DISCD-2 achieve 75.0% performance, whereas Random-1 achieves only 70.0%. This demonstrates that DISCD can achieve similar performance in fewer rounds, saving communication.

An important clarification should be made regarding why the performance occasionally stalls. Since each hypothesis has multiple premises, which may require multiple pieces of essential information to be transmitted, it can take a number of communication rounds until the performance improves. However, DISCD, thanks to its rapid selection of important semantic information regarding the SotW, improves accuracy rapidly before stalling, whereas with random selection the stalling is distributed and it takes longer to achieve the same level of performance.

Another clarification concerns the distinction between DISCD-1 and DISCD-2. In DISCD-2, two FOL sentences are transmitted from the nodes to the server per round, and the server broadcasts two sentences back to the nodes. This implies that which sentences are selected, and in which order, can differ between the two cases, which accounts for the different rates of increase in hypothesis deduction performance.

As the communication machinery, i.e., compression and channel coding, is common to the random method and our method, we did not consider it in the comparative results. However, based on the number of sentences transmitted, we studied the average communication cost per node for both schemes at different levels of deduction accuracy. The results are presented in Table I.

Success Rate DISCD-1 Random-1 DISCD-2 Random-2
40% 70.9 bits 402.3 bits 94.6 bits 354.9 bits
45% 141.9 bits 496.9 bits 141.9 bits 449.6 bits
50% 307.6 bits 544.3 bits 212.9 bits 638.9 bits
55% 402.3 bits 591.6 bits 260.3 bits 757.3 bits
60% 473.3 bits 828.3 bits 283.9 bits 780.9 bits
65% 591.6 bits 851.9 bits 567.9 bits 804.6 bits
70% 780.9 bits 946.6 bits 686.3 bits 875.6 bits
75% 946.6 bits N/A 828.3 bits 946.6 bits
TABLE I: Success rate vs. communication cost (uplink, average per node)

Based on Table I, the DISCD-1 method saves, on average, approximately 270.49 bits of communication per node compared to random selection (Random-1). Similarly, the DISCD-2 method saves, on average, approximately 316.56 bits compared to random selection (Random-2).

In conclusion, the experiments demonstrate that the proposed semantic communication algorithm DISCD effectively selects and transmits the most informative sentences, achieving substantial success on the deductive hypothesis selection task with fewer bits and fewer rounds than random selection. Furthermore, these results validate the theoretical findings regarding convergence, effectiveness, and task success. The DISCD framework better informs the nodes regarding the SotW, leading to better task success.

References

  • [1] Deniz Gündüz, Zhijin Qin, Inaki Estella Aguerri, Harpreet S. Dhillon, Zhaohui Yang, Aylin Yener, Kai Kit Wong, and Chan-Byoung Chae. Beyond transmitting bits: Context, semantics, and task-oriented communications. IEEE Journal on Selected Areas in Communications, 41(1):5–41, 2023.
  • [2] Jihong Park, Jinho D. Choi, Seong-Lyun Kim, and Mehdi Bennis. Enabling the wireless metaverse via semantic multiverse communication. 2023 20th Annual IEEE International Conference on Sensing, Communication, and Networking (SECON), pages 85–90, 2022.
  • [3] Omar Hashash, Christina Chaccour, Walid Saad, Kei Sakaguchi, and Tao Yu. Towards a decentralized metaverse: Synchronized orchestration of digital twins and sub-metaverses. ICC 2023 - IEEE International Conference on Communications, pages 1905–1910, 2022.
  • [4] Jaakko Hintikka. A two-dimensional continuum of inductive methods*. Studies in logic and the foundations of mathematics, 43:113–132, 1966.
  • [5] Yashas Malur Saidutta, Afshin Abdi, and Faramarz Fekri. Joint source-channel coding over additive noise analog channels using mixture of variational autoencoders. IEEE Journal on Selected Areas in Communications, 39(7):2000–2013, 2021.
  • [6] Zhenzi Weng, Zhijin Qin, and Geoffrey Ye Li. Semantic communications for speech recognition. In 2021 IEEE Global Communications Conference (GLOBECOM), pages 1–6, 2021.
  • [7] Chao Zhang, Hang Zou, Samson Lasaulce, Walid Saad, Marios Kountouris, and Mehdi Bennis. Goal-oriented communications for the iot and application to data compression. IEEE Internet of Things Magazine, 5(4):58–63, 2022.
  • [8] Başak Güler, Aylin Yener, and Ananthram Swami. The semantic communication game. IEEE Transactions on Cognitive Communications and Networking, 4(4):787–802, 2018.
  • [9] Zhijin Qin, Xiaoming Tao, Jianhua Lu, and Geoffrey Y. Li. Semantic communications: Principles and challenges. ArXiv, abs/2201.01389, 2021.
  • [10] Inaki Estella Aguerri and Abdellatif Zaidi. Distributed variational representation learning. IEEE transactions on pattern analysis and machine intelligence, 43(1):120–138, 2019.
  • [11] Abdellatif Zaidi and Inaki Estella Aguerri. Distributed deep variational information bottleneck. In 2020 IEEE 21st International Workshop on Signal Processing Advances in Wireless Communications (SPAWC), pages 1–5. IEEE, 2020.
  • [12] An Xu, Zhouyuan Huo, and Heng Huang. On the acceleration of deep learning model parallelism with staleness. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2088–2097, 2020.
  • [13] Vipul Gupta, Dhruv Choudhary, Ping Tak Peter Tang, Xiaohan Wei, Xing Wang, Yuzhen Huang, Arun Kejariwal, Kannan Ramchandran, and Michael W Mahoney. Training recommender systems at scale: Communication-efficient model and data parallelism. arXiv preprint arXiv:2010.08899, 2020.
  • [14] Mounssif Krouka, Anis Elgabli, Chaouki ben Issaid, and Mehdi Bennis. Communication-efficient split learning based on analog communication and over the air aggregation. In 2021 IEEE Global Communications Conference (GLOBECOM), pages 1–6. IEEE, 2021.
  • [15] Huiqiang Xie, Zhijin Qin, Geoffrey Y. Li, and Biing-Hwang Juang. Deep learning enabled semantic communication systems. IEEE Transactions on Signal Processing, 69:2663–2675, 2020.
  • [16] Yulin Shao, Qingqing Cao, and Deniz Gündüz. A theory of semantic communication. ArXiv, abs/2212.01485, 2022.
  • [17] Gangtao Xin, Pingyi Fan, and Khaled B. Letaief. Semantic communication: A survey of its theoretical development. Entropy, 26(2), 2024.
  • [18] Shengteng Jiang, Yueling Liu, Yichi Zhang, Peng Luo, Kuo Cao, Jun Xiong, Haitao Zhao, and Jibo Wei. Reliable semantic communication system enabled by knowledge graph. Entropy, 24(6), 2022.
  • [19] Luciano Floridi. Outline of a theory of strongly semantic information. SSRN Electronic Journal, 01 2004.
  • [20] Gangtao Xin and Pingyi Fan. Exk-sc: A semantic communication model based on information framework expansion and knowledge collision. Entropy, 24(12), 2022.
  • [21] Zhouxiang Zhao, Zhaohui Yang, Mingzhe Chen, Zhaoyang Zhang, and H. Vincent Poor. A joint communication and computation design for probabilistic semantic communications. Entropy, 26(5), 2024.
  • [22] Huiqiang Xie, Zhijin Qin, and Geoffrey Y. Li. Semantic communication with memory. IEEE Journal on Selected Areas in Communications, 41:2658–2669, 2023.
  • [23] Qiyang Zhao, Mehdi Bennis, Mérouane Debbah, and Daniella Harth da Costa. Semantic-native communication: A simplicial complex perspective. 2022 IEEE Globecom Workshops (GC Wkshps), pages 1513–1518, 2022.
  • [24] Siheng Xiong, Yuan Yang, Faramarz Fekri, and James Clayton Kerce. Tilp: Differentiable learning of temporal logical rules on knowledge graphs. In The Eleventh International Conference on Learning Representations.
  • [25] Siheng Xiong, Yuan Yang, Ali Payani, James C Kerce, and Faramarz Fekri. Teilp: Time prediction over knowledge graphs via logical reasoning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 16112–16119, 2024.
  • [26] Yuan Yang, Siheng Xiong, Ali Payani, James C Kerce, and Faramarz Fekri. Temporal inductive logic reasoning over hypergraphs. In Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, pages 3613–3621, 2024.
  • [27] Yuan Yang, Siheng Xiong, Ali Payani, Ehsan Shareghi, and Faramarz Fekri. Harnessing the power of large language models for natural language to first-order logic translation. In Lun-Wei Ku, Andre Martins, and Vivek Srikumar, editors, Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 6942–6959, Bangkok, Thailand, August 2024. Association for Computational Linguistics.
  • [28] Siheng Xiong, Ali Payani, Ramana Kompella, and Faramarz Fekri. Large language models can learn temporal reasoning. In Lun-Wei Ku, Andre Martins, and Vivek Srikumar, editors, Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 10452–10470, Bangkok, Thailand, August 2024. Association for Computational Linguistics.
  • [29] Rudolf Carnap and Yehoshua Bar-Hillel. An outline of a theory of semantic information. 1952.
  • [30] Jaakko Hintikka. Distributive normal forms in first-order logic. In J.N. Crossley and M.A.E. Dummett, editors, Formal Systems and Recursive Functions, volume 40 of Studies in Logic and the Foundations of Mathematics, pages 48–91. Elsevier, 1965.
  • [31] Jaakko Hintikka and Risto Hilpinen. Knowledge, acceptance, and inductive logic. In Jaakko Hintikka and Patrick Suppes, editors, Aspects of Inductive Logic, volume 43 of Studies in Logic and the Foundations of Mathematics, pages 1–20. Elsevier, 1966.
  • [32] https://github.com/ahmetfsaz/DISCD. Accessed: 2024-11-28.
  • [33] Ahmet Faruk Saz, Siheng Xiong, and Faramarz Fekri. Lossy semantic communication for the logical deduction of the state of the world. arXiv preprint arXiv:2410.01676, 2024.
  • [34] Ahmet Faruk Saz, Siheng Xiong, Yashas Malur Saidutta, and Faramarz Fekri. Model-theoretic logic for mathematical theory of semantic information and communication. arXiv preprint arXiv:2401.17556, 2024.