
Subsethood Measures of Spatial Granules

Liquan Zhao, Yiyu Yao

Liquan Zhao and Yiyu Yao are with the Department of Computer Science, University of Regina, Regina, Saskatchewan S4S 0A2, Canada (e-mail: bushzhao@gmail.com; yiyu.yao@uregina.ca).
Abstract

Subsethood, which measures the degree of the set-inclusion relation, is predominant in fuzzy set theory. This paper introduces some basic concepts of spatial granules, the coarse-fine relation, and operations such as meet, join, quotient meet and quotient join. All atomic granules can be hierarchized by the set-inclusion relation, and all granules can be hierarchized by the coarse-fine relation. Viewing an information system from the micro and the macro perspectives, we obtain a micro knowledge space and a macro knowledge space, from which a rough set model and a spatial rough granule model are respectively derived. The classical rough set model is a special case of the rough set model induced from the micro knowledge space, while the spatial rough granule model will play a pivotal role in the problem-solving of structures. We discuss twelve axioms of monotonically increasing subsethood and twelve corresponding axioms of monotonically decreasing supsethood, and generalize subsethood and supsethood to conditional granularity and conditional fineness, respectively. We develop five conditional granularity measures and five conditional fineness measures and prove that each conditional granularity or fineness measure satisfies its corresponding twelve axioms, although its subsethood or supsethood measure satisfies only one of the two boundary conditions. We further define five conditional granularity entropies and five conditional fineness entropies, each of which satisfies only part of the boundary conditions but all ten monotone conditions.

Index Terms:
Subsethood, supsethood, fuzzy set, rough set, granularity, fineness, conditional granularity, conditional fineness, conditional granularity entropy, conditional fineness entropy.

I Introduction

SUBSETHOOD was first used to measure fuzzy sets; it is denoted by a bivalent function showing the degree to which one fuzzy set is a subset of another [1, 2, 3, 4, 5]. Kosko [5, 6, 7, 8] generalized this concept and defined a multivalent subsethood measure. Subsethood has drawn the attention of many scholars, who have related it to entropy [5, 9, 10, 11, 12], distance measures [11, 13, 14], similarity measures [14, 15, 16, 17] and logical implication [18, 19, 20, 21, 22, 23]. Most subsethood studies focus on fuzzy sets, and only a few concern rough sets. What is more, these studies mainly discussed the desired properties of subsethood measures or weak subsethood measures and paid little attention to the construction of specific measures. Yao and Deng [24] constructed subsethood measures of two sets based on two views: one uses different equivalent expressions of the condition $A\subseteq B$, and the other groups objects based on the two sets $A$ and $B$. When applied to rough sets, subsethood measures show the graded set-inclusion relation between different sets; they are quantitative generalizations of the set-inclusion relation and can, to some degree, distinguish sets of the same size.

A partition is the simplest granulation scheme, and hence the measurement of partitions has been proposed and studied. Yao and Zhao [25] divide these measures into two classes: information-theoretic measures and interaction-based measures. Hartley entropy and Shannon entropy are typical representatives of information-theoretic measures. Although Hartley entropy coincides with Shannon entropy in the case of a uniform probability distribution, Klir and Folger [26] pointed out that there are semantic differences: Shannon entropy is a measure of information induced by a probability distribution, while Hartley entropy is a measure of the nonspecificity of a finite set. Their uses as measures of the granularity of partitions were suggested and examined in [25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42]. Interaction-based measures count the number of interacting pairs of elements of a universal set under a partition. Each pair in the equivalence relation is counted as one interaction, and the size of the equivalence relation gives the total number of interactions. Miao and Fan [43] first defined an interaction-based measure of the granularity of a partition, which may be interpreted as a normalized cardinality of an equivalence relation. Many authors have studied and extended this measure [25, 31, 32, 33, 34, 39, 44]. However, the extensions mainly focus on non-equivalence relations.

Granular computing (GrC) is not an algorithm or a process but an idea, and, in fact, this idea has permeated every computing theory since the very beginning. The definition or construction of information granules is one of the basic issues of GrC. According to the Merriam-Webster dictionary, the word “granule” has two meanings: one is a small particle, and the other is one of numerous particles forming a larger unit. People generally choose the first meaning, that is, a granule is defined as a simple crisp or fuzzy set. Zhao [45, 46] first introduced the second meaning as the general definition of granules, and extended partitions to equivalence granules and finite sets to infinite sets as well. He argues that a granule is made up of one or more atomic granules, which are indivisible under the given subdivision rule. However, these atomic granules may be divisible under finer subdivision rules; that is to say, whether an atomic granule is divisible or not is relative. There are structural and nonstructural relationships between the atomic granules. This is a structural definition that can show the spatiality of a granule, and granules defined in this way are called spatial granules to distinguish them from granules defined in the previous way.

The contributions and organization of this paper are as follows:

In Section II, we introduce the basic notions of granules and the coarse-fine relation, which is a generalization of the set-inclusion relation, together with operations such as meet, join, quotient meet and quotient join, which are generalizations of intersection and union. All atomic granules can be hierarchized by the set-inclusion relation, and all granules can be hierarchized by the coarse-fine relation. Given an information system, performing micro and macro granular analysis on it generates a micro knowledge space and a macro knowledge space, from which a rough set model and a spatial rough granule model are respectively induced. The rough set model can be used for incomplete and complete information systems on any domain, and the classical rough set model is a special case of it. The coarse-fine relation is the key to the success of hierarchical machine learning algorithms, and the spatial rough granule model will play a very important role in the problem-solving of structures. All atomic granules can be hierarchized in a plane by the set-inclusion relation, and all granules can be hierarchized in an $n$-dimensional space by the coarse-fine relation.

In Section III, we discuss twelve properties of monotonically increasing subsethood and twelve corresponding properties of monotonically decreasing subsethood, not only for atomic granules but also for granules; the properties can be divided into two classes: boundary conditions and monotone conditions. The five monotonically increasing subsethood measures satisfy only one of the two boundary conditions but all ten monotone conditions. We construct five monotonically decreasing subsethood measures for atomic granules, and each one satisfies one or both of the boundary conditions and the ten monotonically decreasing conditions. Conditional granularity and conditional fineness are introduced to measure the coarse-fine relation between two granules. Conditional granularity is defined as the expectation of the monotonically increasing subsethood of atomic granules with respect to the probability distribution of the meet of the two granules, and conditional fineness is defined as the expectation of the monotonically decreasing subsethood of atomic granules with respect to the same distribution. We construct five conditional granularity measures and five conditional fineness measures and prove that each measure satisfies its corresponding twelve properties. Conditional granularity entropy and conditional fineness entropy are defined by their corresponding subsethood and the probability distribution of the meet of the two granules, where the five conditional granularity entropies satisfy part of the boundary conditions and the ten monotonically increasing conditions, and the five conditional fineness entropies satisfy part of the boundary conditions and the ten monotonically decreasing conditions.

II A Model of Spatial Granules

II-A Preliminaries

Given a universe of discourse $X=\{x_1,\cdots,x_n\}$, the granules and the binary relations on $X$ are in one-to-one correspondence, where the granules corresponding to fuzzy equivalence relations are called fuzzy equivalence granules and the granules corresponding to equivalence relations are called equivalence granules. Each equivalence granule is a partition of a subset of $X$, and, in particular, a partition of $X$ itself is also called a quotient granule on $X$. For the sake of simplicity, we only discuss equivalence granules in this paper, that is, the atomic granules of a granule are its equivalence classes.

Assume $A$ and $B$ are two subsets of $X$, $R_A$ and $R_B$ are equivalence relations on $A$ and $B$ respectively, and the equivalence granules corresponding to $R_A$ and $R_B$ are $A_R=\{a_1,\cdots,a_k\}$ and $B_R=\{b_1,\cdots,b_l\}$ respectively. For convenience, $A_R$ can also be denoted by $A$; we use granule $A$ or set $A$ to distinguish the two readings so as not to cause ambiguity, that is, the granule $A$ is a partition of the set $A$. The operations of meet, join, quotient meet and quotient join are respectively defined as follows:

Definition II.1.
  1.

    $A\wedge B$ is called the meet of $A$ and $B$, which is the granule corresponding to $R_A\cap R_B$;

  2.

    $A\vee B$ is called the join of $A$ and $B$, which is the granule corresponding to $R_A\cup R_B$;

  3.

    $A\wedge_t B$ is called the quotient meet of $A$ and $B$, which is the granule corresponding to $t(R_A\cap R_B)$, the transitive closure of $R_A\cap R_B$;

  4.

    $A\vee_t B$ is called the quotient join of $A$ and $B$, which is the granule corresponding to $t(R_A\cup R_B)$, the transitive closure of $R_A\cup R_B$.

The quotient meet and quotient join operations are for (fuzzy) equivalence granules, while the meet and join operations are for other granules. Obviously, for equivalence granules the quotient meet is the same as the meet, but the join and the quotient join are different.
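For ordinary partitions, the meet and the quotient join can be sketched in a few lines of Python (an illustrative sketch with hypothetical helper names, not an implementation from the paper); `quotient_join` repeatedly merges overlapping blocks, which realizes the transitive closure $t(R_A\cup R_B)$:

```python
from itertools import product

def meet(A, B):
    """A ∧ B: the nonempty pairwise intersections of blocks."""
    return [a & b for a, b in product(A, B) if a & b]

def quotient_join(A, B):
    """A ∨_t B: merge blocks sharing elements until none overlap,
    i.e., the granule of the transitive closure t(R_A ∪ R_B)."""
    merged = []
    for blk in [set(x) for x in A + B]:
        hits = [m for m in merged if m & blk]
        for m in hits:
            blk |= m
            merged.remove(m)
        merged.append(blk)
    return merged

A = [{1, 2}, {3}, {4, 5}]   # a partition of {1, ..., 5}
B = [{1}, {2, 3}, {4}, {5}]
print(meet(A, B))           # [{1}, {2}, {3}, {4}, {5}]
print(quotient_join(A, B))  # [{1, 2, 3}, {4, 5}]
```

For equivalence granules the quotient meet coincides with `meet`, so these two functions cover all four operations in the equivalence case.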

II-B Rough Set Model in Micro Knowledge Space

Given an information system $I=(X,\mathbf{R})$, where $\mathbf{R}=\{R_1,\cdots,R_m\}$ is a family of equivalence relations on subsets of $X=\{x_1,\cdots,x_n\}$, the system can be viewed from the micro and the macro perspectives. From the micro perspective, we consider all the subsets of $X$, denoted $\sigma(X)$. $(\sigma(X),\supseteq)$ is a complete lattice, and all the elements in $\sigma(X)$ can be hierarchized under the set-inclusion relation.

Assume the equivalence granules corresponding to $R_i$ are $P_i$ $(i=1,\cdots,m)$, respectively, $R$ is the intersection of all $R_i$ $(i=1,\cdots,m)$, and $P$ is the quotient meet of all $P_i$ $(i=1,\cdots,m)$. For any $A\in\sigma(X)$, $A$ is called $R$-definable if it is one of the equivalence classes in $P$ or a union of two or more equivalence classes in $P$. Assume $d(\sigma(X))$ is the family of all definable sets in $\sigma(X)$ and $d_0(\sigma(X))$ is the family consisting of the empty set and all definable sets. Then $(d_0(\sigma(X)),\supseteq)$ is a complete bounded sublattice of $(\sigma(X),\supseteq)$, and $d_0(\sigma(X))$, which is closed under union and intersection operations, is called the micro knowledge space generated from $I=(X,\mathbf{R})$. Therefore, $\sigma(X)$ can be divided into two categories: $d(\sigma(X))$ and $\widetilde{d}(\sigma(X))$, i.e., the family of all undefinable sets. By rough set theory, $\widetilde{d}(\sigma(X))$ can be further divided into $d_r(\sigma(X))$, i.e., the set of roughly definable sets, and $\widetilde{d}_r(\sigma(X))$, i.e., the set of roughly or totally undefinable sets.

Definition II.2.

For any $A\in\sigma(X)$, the lower and upper approximations of $A$ with respect to $R$ can be defined as: for every $B\in d(\sigma(X))$,

\underline{R}(A) = \bigcup\{A\cap B \mid A\supseteq B\},
\overline{R}(A) = \bigcap\{A\cup B \mid B\supseteq A\}.   (1)

Obviously, for any $A\in\sigma(X)$, its upper approximation is its least upper bound in $d_0(\sigma(X))$, and its lower approximation is its greatest lower bound in $d_0(\sigma(X))$. $(\underline{R}(A),\overline{R}(A))$ is called an approximation space of $A$.

When $I$ is complete, i.e., all $R_i$ $(i=1,\cdots,m)$ are equivalence relations on the whole of $X$, every atomic granule in $d(\sigma(X))$ can be obtained from the atomic granules of $P$, and then we can replace $d(\sigma(X))$ with $P$. However, we should examine two extreme cases: $\forall B\in P, A\supset B$ and $\forall B\in P, B\supset A$. We can define the upper approximation of $A$ as $A$ itself in the first case and its lower approximation as $A$ itself in the second case. When $I$ is incomplete, not all of the atomic granules in $d(\sigma(X))$ can be obtained from the atomic granules of $P$, and therefore we cannot replace $d(\sigma(X))$ with $P$. It can be seen that the classical rough set model is only for complete information systems, while the above model applies to both complete and incomplete information systems. When $X$ is a domain, we can divide it into $n$ subdomains, which can be regarded as $n$ objects, and the above model is also applicable. All the extended models developed from the classical rough set model can be accordingly redefined via $d(\sigma(X))$ so as to be applicable to any information system, which will be discussed in another paper.
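In the complete case, where the definable sets are unions of blocks of $P$, the approximations of Definition II.2 reduce to the classical Pawlak construction; a minimal Python sketch (helper names are mine):

```python
def lower_approx(A, P):
    """Greatest definable lower bound: union of blocks of P inside A."""
    return set().union(*(b for b in P if b <= A))

def upper_approx(A, P):
    """Least definable upper bound: union of blocks of P meeting A."""
    return set().union(*(b for b in P if b & A))

P = [{1, 2}, {3, 4}, {5}]   # atomic granules of the quotient granule P
A = {1, 2, 3}               # an undefinable set
print(lower_approx(A, P))   # {1, 2}
print(upper_approx(A, P))   # {1, 2, 3, 4}
```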

II-C Rough Granule Model in Macro Knowledge Space

Assume $\Pi(\sigma(X))$ is the family of all equivalence granules on $X$ and $\Pi_0(\sigma(X))$ is the family consisting of the empty granule and all equivalence granules on $X$. Then, viewing $I$ from the macro perspective, the whole space is $\Pi_0(\sigma(X))$. There is no set-inclusion relation between two granules, so we must define a new relation.

Definition II.3.

For any two equivalence relations $R_A,R_B$ over subsets of $X$, assume that their corresponding equivalence granules are $A$ and $B$, respectively.

  1.

    If $\forall x,y\in X$, $xR_Ay\rightarrow xR_By$, then $B$ is coarser than $A$ (or $A$ is finer than $B$), denoted by $B\succeq A$ (or $A\preceq B$);

  2.

    If $B\succeq A$ and $R_A\subset R_B$, then $B$ is strictly coarser than $A$ (or $A$ is strictly finer than $B$), denoted by $B\succ A$ (or $A\prec B$);

  3.

    If $B\succeq A$ and $A\succeq B$, then the two granules $A$ and $B$ are equal, denoted by $A=B$.

$(\Pi_0(\sigma(X)),\succeq)$ is a complete bounded lattice [45], the elements of $\Pi_0(\sigma(X))$ and the vertices of the unit $n$-dimensional hypercube are in one-to-one correspondence, and $\Pi_0(\sigma(X))$ can be hierarchized by the coarse-fine relation. A granule $A\in\Pi(\sigma(X))$ is called $R$-definable under this information system if $A\succ P$. Assume $d(\Pi(\sigma(X)))$ is the family of all definable granules in $\Pi(\sigma(X))$ and $d_0(\Pi(\sigma(X)))$ is the family consisting of $P$ and all definable granules. Then $(d_0(\Pi(\sigma(X))),\succeq)$ is a complete bounded sublattice of $(\Pi_0(\sigma(X)),\succeq)$, and $d_0(\Pi(\sigma(X)))$, which is closed under quotient meet and quotient join operations, is called the macro knowledge space generated from $I$. Therefore, $\Pi_0(\sigma(X))$ can be divided into two categories: $d(\Pi(\sigma(X)))$ and $\widetilde{d}(\Pi_0(\sigma(X)))$, i.e., the family of all undefinable granules. $\widetilde{d}(\Pi_0(\sigma(X)))$ can be further divided into $d_r(\Pi(\sigma(X)))$, i.e., the set of roughly definable granules, and $\widetilde{d}_r(\Pi_0(\sigma(X)))$, i.e., the set of roughly or totally undefinable granules.
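Definition II.3 can be checked mechanically: $B\succeq A$ holds exactly when every block of $A$ lies inside some block of $B$ (i.e., $R_A\subseteq R_B$). A small sketch, assuming both granules are given as lists of blocks over the same universe (the function name `coarser` is mine):

```python
def coarser(B, A):
    """True iff B ⪰ A, i.e., every block of A is inside some block of B."""
    return all(any(a <= b for b in B) for a in A)

A = [{1}, {2}, {3, 4}]
B = [{1, 2}, {3, 4}]
print(coarser(B, A))   # True:  B ⪰ A
print(coarser(A, B))   # False: the relation is strict here, B ≻ A
```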

For any granule $A$ in $\Pi(\sigma(X))$, its upper approximation is its least upper bound in $d_0(\Pi(\sigma(X)))$, and its lower approximation is its greatest lower bound there.

Definition II.4.

The upper and lower approximations of a granule $A$ with respect to $R$ can be defined as follows: for every $B\in d(\Pi(\sigma(X)))$,

\underline{R}(A) = \bigvee\nolimits_t \{A\wedge_t B \mid A\succeq B\},
\overline{R}(A) = \bigwedge\nolimits_t \{A\vee_t B \mid B\succeq A\}.   (2)

The upper and lower approximations in the above model are not obtained from one of its tangent planes but from the $n$-dimensional space. Therefore, the model is also called the spatial rough granule model, and it can be applied to any structural information system and to non-structural information systems as well. In particular, we have $\underline{R}(A)=A\wedge_t P$ and $\overline{R}(A)=A\vee_t P$ when $I$ is complete.
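When $I$ is complete, the approximations in (2) therefore come down to a single meet and a single transitive-closure join against the quotient granule $P$. A self-contained sketch for ordinary partitions (helper names are mine):

```python
from itertools import product

def meet(A, B):                 # A ∧ B (= A ∧_t B for equivalence granules)
    return [a & b for a, b in product(A, B) if a & b]

def quotient_join(A, B):        # A ∨_t B: merge overlapping blocks
    merged = []
    for blk in [set(x) for x in A + B]:
        hits = [m for m in merged if m & blk]
        for m in hits:
            blk |= m
            merged.remove(m)
        merged.append(blk)
    return merged

P = [{1, 2}, {3}, {4, 5}]       # quotient granule of a complete system
A = [{1}, {2, 3}, {4, 5}]       # granule to approximate
print(meet(A, P))               # lower approximation A ∧_t P
print(quotient_join(A, P))      # upper approximation A ∨_t P
```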

III Subsethood Measures of Two Granules

Measurement is the most important foundation of all computational theories, and the measurement of information granules is naturally the keystone of granular computing. Many measures of information granules have been discussed in different areas in isolation, and most of them focus on measures of sets. We divide the measures into two classes: granularity (or coarseness) and fineness, where granularity measures the coarseness degree of a granule and fineness measures its fineness degree [45, 47]. People mainly discuss granularity, to the extent that many confuse the concepts of granularity and granule; in fact, entropy is a kind of fineness. The measurement of granules is not just to know the granularity or the fineness of each granule, but to know the coarse-fine relation, similarity and difference between two granules. The conditional granularity and conditional fineness defined in [45, 47] show, to some degree, the coarse-fine relation between two granules, and they clearly reflect monotone increase and monotone decrease, respectively, while subsethood, in general, is discussed as monotonically increasing. In [47], we also showed that conditional granularity is a generalization of the subsethood measure and that it satisfies the axiomatic properties of subsethood measures that Yao and Deng discussed in [24]. Conditional granularity and conditional fineness are named from the point of view of probability distributions, while subsethood is named from the point of view of set inclusion. We can extend the subsethood function to the monotonically decreasing case so that it generalizes to conditional fineness, and we can use either notion to express the coarse-fine relation.

III-A Subsethood of Two Atomic Granules

Subsethood measures should satisfy some axioms to be meaningful. Sinha and Dougherty [48] presented nine axioms for subsethood, of which the last five further restrict subsethood measures, and Young [12] mainly discussed the first four. Different scholars may define different axioms in different fields [9, 15, 49, 11, 24]. However, we can divide these axioms into two classes: basic axioms and extended axioms. The basic axioms are similar across studies, while the extended axioms may differ according to the properties of the empirical objects.

In many situations, it is more convenient to consider a normalized measure for which the maximum value is 1 and the minimum is 0. For any two atomic granules $a,b\in\sigma(X)$, a subsethood measure should satisfy the following basic axioms: it reaches the maximum value if and only if $a\subseteq b$, it reaches the minimum value if and only if $a\cap b=\emptyset$, and its values lie in $[0,1]$; it should also be monotone, because set inclusion is a partial order relation.

Definition III.1.

For any atomic granules $a,b\in\sigma(X)$, a function $sh:\sigma(X)\times\sigma(X)\to[0,1]$ is called a normalized measure of subsethood if it satisfies the following two axioms (boundary conditions):

(A1)

$sh(b,a)=1 \Longleftrightarrow a\subseteq b$;

(A2)

$sh(b,a)=0 \Longleftrightarrow a\cap b=\emptyset$,

where the value $sh(b,a)$ is the degree of $a$ being a subset of $b$.

For the classical set inclusion, a set $a$ either is a subset of another set $b$ or is not, i.e., $sh(b,a)$ is either 1 or 0, and the conditions (A1) and (A2) are dual to each other. Some authors [50, 51] used a single implication:

$a\subseteq b \Longrightarrow sh(b,a)=1$.

That is, $sh(b,a)$ reaches the maximum value if $a\subseteq b$. However, we may still have $sh(b,a)=1$ even though $\neg(a\subseteq b)$. Gomolińska [52, 53] considered the other single implication:

$sh(b,a)=1 \Longrightarrow a\subseteq b$.

In this case, we can conclude $a\subseteq b$ from $sh(b,a)=1$, but not the other way around. Neither single implication alone can faithfully reflect whether a set is a subset of another; only the double implication can.

For the general set inclusion, one set can be a subset of another to some degree, that is, $sh(b,a)$, the degree of inclusion, can be any value between 0 and 1. If our purpose is to measure the degree of the coarse-fine relation between two granules, then the boundary conditions in Definition III.1 are the minimum requirements for a subsethood measure to truthfully reflect the basic properties of the inclusion degree or the coarse-fine degree, unless we do not consider the special case $a\cap b=\emptyset$. If our purpose is only to judge whether a granule is coarser or finer than another granule, then the axiom (A1) alone is a sufficient boundary condition for a normalized measure, and the focus is on monotonicity.

Definition III.2.

For any three atomic granules $a,b,c\in\sigma(X)$ on a universe $X$, a measure of subsethood $sh:\sigma(X)\times\sigma(X)\to[0,1]$ is called a monotonically increasing measure if it satisfies the following monotone properties:

(A3)

$b\subseteq c \Rightarrow sh(b,a)\leq sh(c,a)$;

(A4)

$b\subseteq c \Rightarrow sh(a,c)\leq sh(a,b)$.

In [24], Yao and Deng discussed four monotone properties of subsethood measures among three sets $a,b,c\in\sigma(X)$ as follows.

(M1)

$b\subseteq c \Rightarrow sh(b,a)\leq sh(c,a)$;

(M2)

$b\subseteq c \wedge (b\cap a=c\cap a) \Rightarrow sh(a,c)\leq sh(a,b)$;

(M3)

$b\subseteq c \Rightarrow sh(a,c)\leq sh(a,b)$;

(M4)

$a\subseteq b\subseteq c \Rightarrow sh(a,c)\leq sh(a,b)$.

Comparing conditions (M1) and (M3), we see that the monotonicity of the function $sh(a,b)$ is the reverse of that of the function $sh(b,a)$, and we have (A3) $\Rightarrow$ (A4) and (A4) $\Rightarrow$ (A3). Therefore, (A3) or (A4) alone can be taken as the monotonically increasing condition of subsethood. In condition (M2), $b\cap a=c\cap a$ involves the greatest lower bounds of $a$, $b$ and $c$, which suggests the dual question with the corresponding least upper bounds, $b\cup a=c\cup a$. Therefore, we have the following monotone properties.

(A5)

$b\subseteq c \wedge (b\cap a=c\cap a) \Rightarrow sh(b,a)\leq sh(c,a)$;

(A6)

$b\subseteq c \wedge (b\cap a=c\cap a) \Rightarrow sh(a,c)\leq sh(a,b)$;

(A7)

$b\subseteq c \wedge (b\cup a=c\cup a) \Rightarrow sh(b,a)\leq sh(c,a)$;

(A8)

$b\subseteq c \wedge (b\cup a=c\cup a) \Rightarrow sh(a,c)\leq sh(a,b)$;

(A9)

$a\subseteq b\subseteq c \Rightarrow sh(b,a)\leq sh(c,a)$;

(A10)

$a\subseteq b\subseteq c \Rightarrow sh(a,c)\leq sh(a,b)$;

(A11)

$b\subseteq c\subseteq a \Rightarrow sh(b,a)\leq sh(c,a)$;

(A12)

$b\subseteq c\subseteq a \Rightarrow sh(a,c)\leq sh(a,b)$.

The axioms (A5), (A7), (A9) and (A11) are weaker versions of (A3), i.e., (A3) $\Rightarrow$ (A5), (A7), (A9) and (A11); the axioms (A6), (A8), (A10) and (A12) are weaker versions of (A4), i.e., (A4) $\Rightarrow$ (A6), (A8), (A10) and (A12). Therefore, we need only discuss the axioms (A1), (A2), (A3) and (A4). The axioms (A5) and (A6) are the dual questions of (A7) and (A8) respectively, and the axioms (A9) and (A10) are the dual questions of (A11) and (A12) respectively.

Yao and Deng [24] reviewed existing subsethood measures, including $sh_l$ [18, 19, 20, 9, 1, 15, 2, 52, 5, 6, 54, 3, 12, 55], $sh_{\cap}$ [52, 56], $sh_{\cup}$ [15, 5, 57, 4], $sh_{\cap}^{c}$ [15], and $sh_{\cup}^{c}$ [15, 58, 51], which have been considered in many studies. Most of them focus on fuzzy sets rather than crisp sets. Yao and Deng give the five subsethood measures of two crisp sets, with their probabilistic interpretations, as follows.

sh_1(b,a) = sh_l(b,a) = \frac{|a^c\cup b|}{|X|} = Pr(a^c\cup b);
sh_2(b,a) = sh_{\cap}(b,a) = \frac{|a\cap b|}{|a|} = Pr(b\mid a);
sh_3(b,a) = sh_{\cup}(b,a) = \frac{|b|}{|a\cup b|} = Pr(b\mid a\cup b);
sh_4(b,a) = sh^c_{\cup}(b,a) = \frac{|a^c|}{|a^c\cup b^c|} = Pr(a^c\mid a^c\cup b^c);
sh_5(b,a) = sh^c_{\cap}(b,a) = \frac{|a^c\cap b^c|}{|b^c|} = Pr(a^c\mid b^c).

If the value of any of these subsethood measures equals 1, we can judge that the atomic granule $a$ is a subset of $b$. It can be seen that only $sh_{\cap}$ satisfies both (A1) and (A2).
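The five measures translate directly into code on crisp sets; the sketch below (function names `sh1`…`sh5` are mine) also illustrates why only $sh_{\cap}$ meets the boundary condition (A2): for disjoint $a$ and $b$, only $sh_2$ drops to 0. Degenerate denominators ($a=\emptyset$ for $sh_2$, $b=X$ for $sh_5$, etc.) are left unguarded for brevity:

```python
def sh1(b, a, X): return len((X - a) | b) / len(X)            # Pr(a^c ∪ b)
def sh2(b, a, X): return len(a & b) / len(a)                  # Pr(b | a)
def sh3(b, a, X): return len(b) / len(a | b)                  # Pr(b | a ∪ b)
def sh4(b, a, X): return len(X - a) / len((X - a) | (X - b))  # Pr(a^c | a^c ∪ b^c)
def sh5(b, a, X): return len((X - a) & (X - b)) / len(X - b)  # Pr(a^c | b^c)

X = set(range(10))
measures = (sh1, sh2, sh3, sh4, sh5)
print([m({1, 2, 3}, {1, 2}, X) for m in measures])  # a ⊆ b: all equal 1 (A1)
print([m({4, 5}, {1, 2}, X) for m in measures])     # a ∩ b = ∅: only sh2 is 0
```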

Definition III.3.

For any three atomic granules $a,b,c\in\sigma(X)$, a measure of subsethood $sh:\sigma(X)\times\sigma(X)\to[0,1]$ is called a monotonically decreasing measure if it satisfies the following monotone properties:

(A3′)

$b\subseteq c \Rightarrow sh(c,a)\leq sh(b,a)$;

(A4′)

$b\subseteq c \Rightarrow sh(a,b)\leq sh(a,c)$.

Then the measures $sh_i'(\cdot,\cdot)=1-sh_i(\cdot,\cdot)$ $(i=1,\cdots,5)$, which can be called supsethood measures, are the monotonically decreasing measures corresponding to $sh_i(b,a)$ $(i=1,\cdots,5)$, respectively, and every $sh_i'(b,a)$ $(i=1,\cdots,5)$ can be used to define conditional fineness. For these $sh_i'$ $(i=1,\cdots,5)$, we have

(A1′)

$sh_i'(b,a)=0 \Longleftrightarrow a\subseteq b$.

For $sh_2'$, we also have

(A2′)

$sh_2'(b,a)=1 \Longleftrightarrow a\cap b=\emptyset$.
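A two-line sketch of the complemented measure $sh_2'=1-sh_2$ confirms both boundary conditions (function names are mine):

```python
def sh2(b, a): return len(a & b) / len(a)   # Pr(b | a)
def sh2p(b, a): return 1 - sh2(b, a)        # supsethood sh'_2

a = {1, 2}
print(sh2p({1, 2, 3}, a))   # 0.0: a ⊆ b, boundary condition (A1')
print(sh2p({4, 5}, a))      # 1.0: a ∩ b = ∅, boundary condition (A2')
```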

III-B Subsethood of Two Equivalence Granules

A subsethood measure of two sets is a quantitative generalization of the set inclusion relation, and a subsethood measure of two granules should be a quantitative generalization of the coarse-fine relation.

Definition III.4.

For any two equivalence granules $A,B$ on $X$,

  1.

    a function $sh:\Pi(\sigma(X))\times\Pi(\sigma(X))\to[0,1]$ is called a normalized measure of conditional granularity (or subsethood) if it satisfies the following two axioms:

    (A1)

    $sh(B,A)=\frac{m}{n} \Longleftrightarrow B\succeq A$;

    (A2)

    $sh(B,A)=0 \Longleftrightarrow A\wedge B=\emptyset$.

  2.

    a function $sh:\Pi(\sigma(X))\times\Pi(\sigma(X))\to[0,1]$ is called a normalized measure of conditional fineness (or supsethood) if it satisfies the following two axioms:

    (A1′)

    $sh(B,A)=0 \Longleftrightarrow B\succeq A$;

    (A2′)

    $sh(B,A)=\frac{m}{n} \Longleftrightarrow A\wedge B=\emptyset$,

where $n$ is the cardinality of $X$ and $m$ is the smaller of the cardinalities of the sets $A$ and $B$.

The monotonically increasing and monotonically decreasing measures corresponding to conditional granularity and conditional fineness respectively can be defined as follows.

Definition III.5.

For any three equivalence granules $A,B,C$ on $X$,

  1.

    a measure of subsethood $sh:\Pi(\sigma(X))\times\Pi(\sigma(X))\to[0,1]$ is called a monotonically increasing measure if it satisfies the following monotone properties:

    (A3)

    $C\succeq B \Rightarrow sh(B,A)\leq sh(C,A)$;

    (A4)

    $C\succeq B \Rightarrow sh(A,C)\leq sh(A,B)$.

  2.

    a measure of subsethood $sh:\Pi(\sigma(X))\times\Pi(\sigma(X))\to[0,1]$ is called a monotonically decreasing measure if it satisfies the following monotone properties:

    (A3′)

    $C\succeq B \Rightarrow sh(C,A)\leq sh(B,A)$;

    (A4′)

    $C\succeq B \Rightarrow sh(A,B)\leq sh(A,C)$.

We also have (A3) $\Rightarrow$ (A4) and (A4) $\Rightarrow$ (A3), as well as (A3′) $\Rightarrow$ (A4′) and (A4′) $\Rightarrow$ (A3′). Therefore, (A3) or (A4) alone can serve as the monotonically increasing condition, and (A3′) or (A4′) alone can serve as the monotonically decreasing condition.

For any equivalence granules $A,B,C$ on $X$, the conditions (A5), $\cdots$, (A12) and the conditions (A5′), $\cdots$, (A12′) are as follows.

(A5)

$C\succeq B \wedge (B\wedge A=C\wedge A) \Rightarrow sh(B,A)\leq sh(C,A)$;

(A6)

$C\succeq B \wedge (B\wedge A=C\wedge A) \Rightarrow sh(A,C)\leq sh(A,B)$;

(A7)

$C\succeq B \wedge (B\vee A=C\vee A) \Rightarrow sh(B,A)\leq sh(C,A)$;

(A8)

$C\succeq B \wedge (B\vee A=C\vee A) \Rightarrow sh(A,C)\leq sh(A,B)$;

(A9)

$C\succeq B\succeq A \Rightarrow sh(B,A)\leq sh(C,A)$;

(A10)

$C\succeq B\succeq A \Rightarrow sh(A,C)\leq sh(A,B)$;

(A11)

$A\succeq C\succeq B \Rightarrow sh(B,A)\leq sh(C,A)$;

(A12)

$A\succeq C\succeq B \Rightarrow sh(A,C)\leq sh(A,B)$;

(A5′)

$C\succeq B \wedge (B\wedge A=C\wedge A) \Rightarrow sh(C,A)\leq sh(B,A)$;

(A6′)

$C\succeq B \wedge (B\wedge A=C\wedge A) \Rightarrow sh(A,B)\leq sh(A,C)$;

(A7′)

$C\succeq B \wedge (B\vee A=C\vee A) \Rightarrow sh(C,A)\leq sh(B,A)$;

(A8′)

$C\succeq B \wedge (B\vee A=C\vee A) \Rightarrow sh(A,B)\leq sh(A,C)$;

(A9′)

$C\succeq B\succeq A \Rightarrow sh(C,A)\leq sh(B,A)$;

(A10′)

$C\succeq B\succeq A \Rightarrow sh(A,B)\leq sh(A,C)$;

(A11′)

$A\succeq C\succeq B \Rightarrow sh(C,A)\leq sh(B,A)$;

(A12′)

$A\succeq C\succeq B \Rightarrow sh(A,B)\leq sh(A,C)$.

The conditions (A5), (A7), (A9) and (A11) are weaker versions of (A3), i.e., (A3) $\Rightarrow$ (A5), (A7), (A9) and (A11); the axioms (A6), (A8), (A10) and (A12) are weaker versions of (A4), i.e., (A4) $\Rightarrow$ (A6), (A8), (A10) and (A12); the same implications hold among the primed axioms. The conditions (A9), (A10), (A11) and (A12) are special cases of (A5), (A6), (A7) and (A8), respectively, since $C\succeq B\succeq A$ forces $B\wedge A=A=C\wedge A$ and $A\succeq C\succeq B$ forces $B\vee A=A=C\vee A$. The conditions (A5) and (A6) are the duals of (A7) and (A8) respectively, and the conditions (A9) and (A10) are the duals of (A11) and (A12) respectively. Each axiom (Ai′) reverses the inequality of (Ai) ($i=1,\cdots,12$). The first four axioms are the basic properties.

Given two equivalence granules $A=\{a_{1},\cdots,a_{k}\}$ and $B=\{b_{1},\cdots,b_{l}\}$ on $X$, the meet $A\wedge B$ consists of the blocks $a_{i}\cap b_{j}$ $(i=1,\cdots,k,\ j=1,\cdots,l)$, where $|a_{i}\cap b_{j}|$ is the cardinality of $a_{i}\cap b_{j}$. Normalizing these cardinalities $|a_{i}\cap b_{j}|$ $(i=1,\cdots,k,\ j=1,\cdots,l)$ yields a probability distribution, called the probability distribution of the granule $A\wedge B$ and denoted $P_{A\wedge B}$.

$P_{A\wedge B}=\left(p(a_{1}\cap b_{1}),\cdots,p(a_{i}\cap b_{j}),\cdots,p(a_{k}\cap b_{l})\right)=\left(\frac{|a_{1}\cap b_{1}|}{|X|},\cdots,\frac{|a_{i}\cap b_{j}|}{|X|},\cdots,\frac{|a_{k}\cap b_{l}|}{|X|}\right),$ (3)

where $p(a_{i}\cap b_{j})$ is the proportion of $X$ contained in the intersection of $a_{i}$ and $b_{j}$. We have the following result.

Theorem III.1.
$\sum\limits_{i=1}^{k}\sum\limits_{j=1}^{l}p(a_{i}\cap b_{j})\leq\frac{m}{n},$

where $n$ is the cardinality of the universe $X$ and $m$ is the smaller one of the cardinalities of the sets $A$ and $B$.

Proof.

Let us assume that the cardinality of $A$ is the smaller one, with $|A|=m$; then we have

$\sum\limits_{i=1}^{k}\sum\limits_{j=1}^{l}p(a_{i}\cap b_{j})=\sum\limits_{i=1}^{k}\frac{1}{|X|}\left(|a_{i}\cap b_{1}|+\cdots+|a_{i}\cap b_{l}|\right)$
$=\frac{1}{n}\sum\limits_{i=1}^{k}|a_{i}\cap(b_{1}\cup\cdots\cup b_{l})|$
$=\frac{1}{n}\sum\limits_{i=1}^{k}|a_{i}\cap B|$
$\leq\frac{1}{n}\sum\limits_{i=1}^{k}|a_{i}|=\frac{m}{n}.$ ∎
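The distribution $P_{A\wedge B}$ of Equation (3) and the bound of Theorem III.1 are easy to check computationally. The following sketch is illustrative only: the function name and the example granules are ours, not from the paper, and a granule is modelled as a list of disjoint blocks (sets) that need not cover $X$, as in an incomplete information system.

```python
from fractions import Fraction

def meet_distribution(A, B, X):
    """p(a_i ∩ b_j) = |a_i ∩ b_j| / |X| for all blocks a_i of A, b_j of B."""
    n = len(X)
    return {(i, j): Fraction(len(a & b), n)
            for i, a in enumerate(A) for j, b in enumerate(B)}

# Two equivalence granules on X = {0,...,5}; neither covers all of X.
X = set(range(6))
A = [{0, 1}, {2, 3}]        # total cardinality 4
B = [{0, 2}, {3, 4, 5}]     # total cardinality 5
p = meet_distribution(A, B, X)

m = min(sum(len(a) for a in A), sum(len(b) for b in B))   # m = 4
assert sum(p.values()) <= Fraction(m, len(X))             # Theorem III.1
```

Here the total probability is $3/6$, strictly below the bound $m/n=4/6$, because the two granules only partially overlap.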

Given two equivalence granules $A=\{a_{1},\cdots,a_{k}\}$ and $B=\{b_{1},\cdots,b_{l}\}$ on $X$, for each $sh_{m}$ $(m=1,\cdots,5)$, the conditional granularity of $B$ with respect to $A$ is defined as the expectation of $sh_{m}$ with respect to the probability distribution of $A\wedge B$.

Definition III.6.

$G_{m}(B|A)=sh_{m}(B,A)=E_{P_{A\wedge B}}(sh_{m}(\cdot,\cdot))=\sum\limits_{i=1}^{k}\sum\limits_{j=1}^{l}p(a_{i}\cap b_{j})\,sh_{m}(b_{j},a_{i}).$ (4)

In general, we can take $sh_{m}^{\prime}(\cdot,\cdot)=1-sh_{m}(\cdot,\cdot)$ $(m=1,\cdots,5)$. Then the expectation of $sh_{m}^{\prime}(\cdot,\cdot)$ with respect to the probability distribution of $A\wedge B$ is $E_{P_{A\wedge B}}(sh_{m}^{\prime}(\cdot,\cdot))$

$=\sum\limits_{i=1}^{k}\sum\limits_{j=1}^{l}p(a_{i}\cap b_{j})\,sh_{m}^{\prime}(b_{j},a_{i})$
$=\sum\limits_{i=1}^{k}\sum\limits_{j=1}^{l}p(a_{i}\cap b_{j})\left(1-sh_{m}(b_{j},a_{i})\right)$
$=\sum\limits_{i=1}^{k}\sum\limits_{j=1}^{l}p(a_{i}\cap b_{j})-\sum\limits_{i=1}^{k}\sum\limits_{j=1}^{l}p(a_{i}\cap b_{j})\,sh_{m}(b_{j},a_{i})$
$\leq\frac{m}{n}-G_{m}(B|A).$ (5)

Given two equivalence granules $A=\{a_{1},\cdots,a_{k}\}$ and $B=\{b_{1},\cdots,b_{l}\}$ on $X$, the conditional fineness of $B$ with respect to $A$ can be defined by

Definition III.7.

  1. $F_{i}(B|A)=\frac{m}{n}-G_{i}(B|A)$ $(i=1,\cdots,5)$.
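Definitions III.6 and III.7 can be sketched directly in code. As a caveat, the elementwise measure $sh_{1}(b,a)=|a^{c}\cup b|/|X|$ below is read off from the Appendix proof of Theorem III.6, and the function names are our own illustrative choices, not the paper's notation.

```python
from fractions import Fraction

def sh1(b, a, X):
    """sh_1(b, a) = |a^c ∪ b| / |X|, the elementwise form appearing in
    the Appendix proof of Theorem III.6."""
    return Fraction(len((X - a) | b), len(X))

def G1(B, A, X):
    """Conditional granularity G_1(B|A) = Σ_i Σ_j p(a_i ∩ b_j)·sh_1(b_j, a_i)."""
    n = len(X)
    return sum(Fraction(len(a & b), n) * sh1(b, a, X) for a in A for b in B)

def F1(B, A, X):
    """Conditional fineness F_1(B|A) = m/n - G_1(B|A) (Definition III.7)."""
    m = min(sum(len(a) for a in A), sum(len(b) for b in B))
    return Fraction(m, len(X)) - G1(B, A, X)

X = set(range(4))
A = [{0}, {1}, {2}, {3}]    # A is finer than B; both are quotient granules
B = [{0, 1}, {2, 3}]
assert G1(B, A, X) == 1     # here m/n = 1, the maximal conditional granularity
assert F1(B, A, X) == 0
```

The asserted values match Theorem III.6: refining to $A$ attains the maximal conditional granularity $m/n$ and the minimal conditional fineness.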

By the above definition, we can easily get the following theorems.

Theorem III.2.

For any equivalence granules $A$ and $B$ on $X$, we have

  1. $G_{i}(B|A)$ $(i=1,\cdots,5)$ satisfies the axiom (A2), namely, $G_{i}(B|A)=0\Longleftrightarrow A\wedge B=\emptyset$;

  2. $F_{i}(B|A)$ $(i=1,\cdots,5)$ satisfies the axiom (A2′), namely, $F_{i}(B|A)=\frac{m}{n}\Longleftrightarrow A\wedge B=\emptyset$.

Theorem III.3.

For any equivalence granules $A$ and $B$ on $X$, we have

  1. $0\leq G_{i}(B|A)\leq 1$ $(i=1,\cdots,5)$;

  2. $0\leq F_{i}(B|A)\leq 1$ $(i=1,\cdots,5)$.

Theorem III.4.

For any equivalence granule $A$ on $X$, we have

  1. $G_{i}(A|\{X\})=G_{i}(A)$ $(i=1,\cdots,5)$;

  2. $F_{i}(A|\{X\})=F_{i}(A)$ $(i=1,\cdots,5)$.

Definition III.8.

Given two equivalence granules $A=\{a_{1},\cdots,a_{k}\}$ and $B=\{b_{1},\cdots,b_{l}\}$ on $X$. If $p(a_{i}\cap b_{j})=0$ for all $i,j$ $(i=1,\cdots,k,\ j=1,\cdots,l)$, then $A$ and $B$ are said to be independent; in particular, $B$ is called the quotient complement of $A$ if $B$ has only one atomic granule.

Theorem III.5.

For any two equivalence granules $A$ and $B$ on $X$, we have

  1. $A$ and $B$ are independent if and only if $G_{i}(B|A)=G_{i}(A|B)=0$ $(i=1,\cdots,5)$;

  2. $A$ and $B$ are independent if and only if $F_{i}(B|A)=F_{i}(A|B)=\frac{m}{n}$ $(i=1,\cdots,5)$,

where $n$ is the cardinality of the universe $X$ and $m$ is the smaller one of the cardinalities of the sets $A$ and $B$.
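Theorem III.5 can be checked on two granules drawn from disjoint parts of the universe, so that every block intersection is empty and the granules are independent in the sense of Definition III.8. The helper below is an illustrative sketch with our own names, again using the elementwise measure $sh_{1}(b,a)=|a^{c}\cup b|/|X|$ from the Appendix proof of Theorem III.6.

```python
from fractions import Fraction

def G1(B, A, X):
    """G_1(B|A) = Σ_i Σ_j p(a_i ∩ b_j)·sh_1(b_j, a_i) with
    sh_1(b, a) = |a^c ∪ b| / |X| (illustrative helper)."""
    n = len(X)
    return sum(Fraction(len(a & b), n) * Fraction(len((X - a) | b), n)
               for a in A for b in B)

# A and B occupy disjoint parts of X, so every a_i ∩ b_j is empty.
X = set(range(4))
A = [{0, 1}]
B = [{2, 3}]
m, n = 2, len(X)                      # both granules have cardinality 2
assert G1(B, A, X) == 0               # Theorem III.5(1)
assert Fraction(m, n) - G1(B, A, X) == Fraction(1, 2)   # F_1(B|A) = m/n
```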

Now we prove that $G_{i}(B|A)$ $(i=1,\cdots,5)$ satisfies the axioms (A1), (A3) and (A4), and that $F_{i}(B|A)$ $(i=1,\cdots,5)$ satisfies the axioms (A1′), (A3′) and (A4′).

Theorem III.6.

Assume that $A=\{a_{1},\cdots,a_{k}\}$ and $B=\{b_{1},\cdots,b_{l}\}$ are two equivalence granules on $X$. Then

  1. $A$ is finer than $B$ if and only if $G_{i}(B|A)=\frac{m}{n}$ $(i=1,\cdots,5)$;

  2. $A$ is finer than $B$ if and only if $F_{i}(B|A)=0$ $(i=1,\cdots,5)$,

where $n$ is the cardinality of the universe $X$ and $m$ is the smaller one of the cardinalities of the sets $A$ and $B$.

The proof is given in the Appendix (Proof of Theorem III.6). By the above theorem, we can get the following corollary.

Corollary 1.

Assume that $A=\{a_{1},\cdots,a_{k}\}$ and $B=\{b_{1},\cdots,b_{l}\}$ are two quotient granules on $X$. Then

  1. $A$ is finer than $B$ if and only if $G_{i}(B|A)=1$ $(i=1,\cdots,5)$;

  2. $A$ is finer than $B$ if and only if $F_{i}(B|A)=0$ $(i=1,\cdots,5)$.

Lemma III.7.

For any two equivalence granules $B=\{b_{1},\cdots,b_{l+1}\}$ and $C=\{c_{1},\cdots,c_{l}\}$ on $X$, if $b_{l}\cup b_{l+1}\subseteq c_{l}$ and $b_{i}=c_{i}$ $(i=1,\cdots,l-1)$, then, for any equivalence granule $A=\{a_{1},\cdots,a_{k}\}$ on $X$, we have

  1. $G_{i}(B|A)\leq G_{i}(C|A)$;

  2. $G_{i}(A|C)\leq G_{i}(A|B)$.

The proof is given in the Appendix (Proof of Lemma III.14). Accordingly, we have the following result.

Lemma III.8.

For any two equivalence granules $B=\{b_{1},\cdots,b_{l+1}\}$ and $C=\{c_{1},\cdots,c_{l}\}$ on $X$, if $b_{l}\cup b_{l+1}\subseteq c_{l}$ and $b_{i}=c_{i}$ $(i=1,\cdots,l-1)$, then, for any equivalence granule $A=\{a_{1},\cdots,a_{k}\}$ on $X$, we have

  1. $F_{i}(C|A)\leq F_{i}(B|A)$;

  2. $F_{i}(A|B)\leq F_{i}(A|C)$.

Let $B=\{b_{1},\cdots,b_{l}\}$ and $C=\{c_{1},\cdots,c_{m}\}$ be two equivalence granules on $X$ with $B$ finer than $C$. For any $c_{j}$ in $C$, there are two cases: either there exists some $b_{i}$ with $b_{i}=c_{j}$, or there exist several $b_{i}$ whose union equals $c_{j}$. By repeatedly applying the above lemmas, we easily obtain the following two theorems.

Theorem III.9.

For any three equivalence granules $A=\{a_{1},\cdots,a_{k}\}$, $B=\{b_{1},\cdots,b_{l}\}$ and $C=\{c_{1},\cdots,c_{m}\}$ on $X$, we have, for $i=1,\cdots,5$,

(A3)

$C\succeq B\Rightarrow G_{i}(B|A)\leq G_{i}(C|A)$;

(A4)

$C\succeq B\Rightarrow G_{i}(A|C)\leq G_{i}(A|B)$.

Theorem III.10.

For any three equivalence granules $A=\{a_{1},\cdots,a_{k}\}$, $B=\{b_{1},\cdots,b_{l}\}$ and $C=\{c_{1},\cdots,c_{m}\}$ on $X$, we have, for $i=1,\cdots,5$,

(A3′)

$C\succeq B\Rightarrow F_{i}(C|A)\leq F_{i}(B|A)$;

(A4′)

$C\succeq B\Rightarrow F_{i}(A|B)\leq F_{i}(A|C)$.

All $sh_{i}$ $(i=1,\cdots,5)$ satisfy the axioms (A1), (A2), (A3) and (A4) or the axioms (A1′), (A2′), (A3′) and (A4′). The axioms (A1) and (A2) (or (A1′) and (A2′)) are the two normalized boundary conditions. However, the special case $A\wedge B=\emptyset$ does not occur when the granules come from a complete information system or subsystem. Therefore, it is reasonable to regard (A1) (or (A1′)) as the normalized boundary condition. The axioms (A3) and (A4) (or (A3′) and (A4′)) are monotone conditions, which can be replaced by their weaker axioms (A5) and (A6) (or (A7) and (A8), or (A9) and (A10), or (A11) and (A12)) or by the axioms (A5′) and (A6′) (or (A7′) and (A8′), or (A9′) and (A10′), or (A11′) and (A12′)); any one of the monotone conditions alone can also be regarded as the monotone condition because they imply each other. Thus, the boundary condition (A1) and any one of the monotone conditions constitute the basic axioms.

III-C Subsethood Entropy

Entropy, an important concept of thermodynamics, was introduced by the German physicist Rudolf Clausius in 1865 [59]. The term entropy has since been used in various areas such as chemistry, physics, biology, cosmology, economics, statistics, sociology, weather science, and information science. The concept of information entropy was introduced in 1948 by C. E. Shannon, the founder of information theory [60]. Information entropy has been used to measure the granularity of a partition [27, 28, 29, 26, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 25, 42]. Since then, many other entropies, such as Hartley entropy, collision entropy, Rényi entropy, and min-entropy, have been introduced to measure the granularity or fineness of equivalence granules. Accordingly, the subsethood measures $sh_{i}$ $(i=1,\cdots,5)$ can also be generalized to their corresponding subsethood entropies via the probability distribution of the meet of two granules in Equation (3).

Assume $A=\{a_{1},\cdots,a_{k}\}$ and $B=\{b_{1},\cdots,b_{l}\}$ are two equivalence granules on $X$. For each $sh_{i}$ $(i=1,\cdots,5)$, its corresponding subsethood entropy can be defined as follows.

Definition III.9.

$H^{\prime}_{i}(B|A)=H^{\prime}_{sh_{i}}(B|A)=E_{P_{A\wedge B}}(-\log sh_{i}(\cdot,\cdot))=-\sum\limits_{i=1}^{k}\sum\limits_{j=1}^{l}p(a_{i}\cap b_{j})\log sh_{i}(b_{j},a_{i}).$ (6)

$H^{\prime}_{i}(B|A)$ is a monotonically decreasing function, and it is also called the conditional fineness entropy of $B$ with respect to $A$. Now take $sh_{i}^{\prime}(\cdot,\cdot)=n\,sh_{i}(\cdot,\cdot)$, so that $\log sh_{i}^{\prime}(\cdot,\cdot)=\log n+\log sh_{i}(\cdot,\cdot)$ $(i=1,\cdots,5)$. Then the expectation of $\log sh_{i}^{\prime}(\cdot,\cdot)$ with respect to the probability distribution of $A\wedge B$ is $E_{P_{A\wedge B}}(\log sh_{i}^{\prime}(\cdot,\cdot))$

$=\sum\limits_{i=1}^{k}\sum\limits_{j=1}^{l}p(a_{i}\cap b_{j})\log sh_{i}^{\prime}(b_{j},a_{i})$
$=\sum\limits_{i=1}^{k}\sum\limits_{j=1}^{l}p(a_{i}\cap b_{j})\left(\log n+\log sh_{i}(b_{j},a_{i})\right)$
$=\log n\sum\limits_{i=1}^{k}\sum\limits_{j=1}^{l}p(a_{i}\cap b_{j})+\sum\limits_{i=1}^{k}\sum\limits_{j=1}^{l}p(a_{i}\cap b_{j})\log sh_{i}(b_{j},a_{i})$
$\leq\frac{m}{n}\log n-H^{\prime}_{i}(B|A).$ (7)

Therefore, for any two equivalence granules $A$ and $B$ on $X$, the conditional granularity entropy of $B$ with respect to $A$ can also be defined by

Definition III.10.

$H_{i}(B|A)=H_{sh_{i}}(B|A)=\frac{m}{n}\log n-H^{\prime}_{i}(B|A)$ $(i=1,\cdots,5)$.
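Definitions III.9 and III.10 can be sketched the same way. The base of the logarithm is not fixed by the formulas, so natural logarithms are used below; the elementwise measure $sh_{1}(b,a)=|a^{c}\cup b|/|X|$ (from the Appendix proof of Theorem III.6) and the function name are our illustrative assumptions.

```python
import math

def entropies(B, A, X):
    """Return (H'_1(B|A), H_1(B|A)): the conditional fineness entropy of
    Definition III.9 and the conditional granularity entropy of
    Definition III.10, using sh_1(b, a) = |a^c ∪ b| / |X|."""
    n = len(X)
    h_prime = 0.0
    for a in A:
        for b in B:
            p = len(a & b) / n
            if p > 0:                        # empty intersections contribute 0
                h_prime -= p * math.log(len((X - a) | b) / n)
    m = min(sum(len(a) for a in A), sum(len(b) for b in B))
    return h_prime, (m / n) * math.log(n) - h_prime

X = set(range(4))
A = [{0}, {1}, {2}, {3}]                     # A is finer than B
B = [{0, 1}, {2, 3}]
h_prime, h = entropies(B, A, X)
assert h_prime == 0.0                        # Theorem III.12: H'_1(B|A) = 0
assert abs(h - math.log(4)) < 1e-12          # Theorem III.13: (m/n)·log n
```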

For these conditional granularities and conditional finenesses, we have $G(A|\{X\})=G(A)$ and $F(A|\{X\})=F(A)$ for any equivalence granule $A$ on $X$, and thus we can define

Definition III.11.

  1. $H_{i}(A)=H_{i}(A|\{X\})$ $(i=1,\cdots,5)$;

  2. $H_{i}^{\prime}(A)=H^{\prime}_{i}(A|\{X\})$ $(i=1,\cdots,5)$.

By the above definitions, we can easily get the following theorems.

Theorem III.11.

For any two equivalence granules $A$ and $B$ on $X$, we have

  1. $0\leq H_{i}(B|A)\leq\log n$ $(i=1,\cdots,5)$;

  2. $0\leq H^{\prime}_{i}(B|A)\leq\log n$ $(i=1,\cdots,5)$.

Theorem III.12.

Given a universe $X$. For any two granules $A=\{a_{1},\cdots,a_{k}\}$ and $B=\{b_{1},\cdots,b_{l}\}$ on $X$, we have

  1. $B\succeq A\Rightarrow H^{\prime}_{i}(B|A)=0$;

  2. if $H^{\prime}_{i}(B|A)=0$, then $A$ is finer than $B$ or $A$ and $B$ are independent.

Proof.

  1. If $A$ is finer than $B$, then for any $a_{i}$ $(i=1,\cdots,k)$ there exists exactly one $b_{j}$ $(j\in\{1,\cdots,l\})$ such that $a_{i}\subseteq b_{j}$. In that case $\log sh(b_{j},a_{i})=0$, because every $sh_{i}(b,a)$ $(i=1,\cdots,5)$ reaches the maximum 1 when $a\subseteq b$, i.e., $a\cap b=a$. For every other $h\neq j$, $h\in\{1,\cdots,l\}$, we have $p(a_{i}\cap b_{h})=0$. Therefore $p(a_{i}\cap b_{j})\log sh(b_{j},a_{i})=0$ $(i=1,\cdots,k,\ j=1,\cdots,l)$, and thus $H^{\prime}_{i}(B|A)=0$.

  2. Every term of $-\sum\limits_{i=1}^{k}\sum\limits_{j=1}^{l}p(a_{i}\cap b_{j})\log sh(b_{j},a_{i})$ is nonnegative, so if $H^{\prime}_{i}(B|A)=0$, then $p(a_{i}\cap b_{j})\log sh(b_{j},a_{i})=0$ $(i=1,\cdots,k,\ j=1,\cdots,l)$. There are two cases:
    For all $i,j$ $(i=1,\cdots,k,\ j=1,\cdots,l)$, $p(a_{i}\cap b_{j})=0$, that is, $A$ and $B$ are independent;
    For each $a_{i}$ $(i\in\{1,\cdots,k\})$ and each $j=1,\cdots,l$, either $p(a_{i}\cap b_{j})=0$ or $sh(b_{j},a_{i})=1$, that is, $|a_{i}\cap b_{j}|=0$ or $a_{i}\subseteq b_{j}$. By the definition of equivalence granules, for each $a_{i}$ $(i\in\{1,\cdots,k\})$ there exists exactly one $j\in\{1,\cdots,l\}$ such that $a_{i}\subseteq b_{j}$, and so $A$ is finer than $B$. ∎

Theorem III.13.

Given a universe $X$. For any two equivalence granules $A=\{a_{1},\cdots,a_{k}\}$ and $B=\{b_{1},\cdots,b_{l}\}$ on $X$, we have

  1. $B\succeq A\Rightarrow H_{i}(B|A)=\frac{m}{n}\log n$;

  2. if $H_{i}(B|A)=\frac{m}{n}\log n$, then $A$ is finer than $B$ or $A$ and $B$ are independent,

where $n$ is the cardinality of $X$ and $m$ is the smaller one of the cardinalities of $A$ and $B$.

It can be seen that $H_{i}(B|A)$ does not satisfy axiom (A1) and $H^{\prime}_{i}(B|A)$ does not satisfy axiom (A1′) even if they are normalized. For any two equivalence granules $A=\{a_{1},\cdots,a_{k}\}$ and $B=\{b_{1},\cdots,b_{l}\}$ in a complete information system on $X$, we have the following result.

Corollary 2.

  1. $B\succeq A\Longleftrightarrow H_{i}(B|A)=\frac{m}{n}\log n$;

  2. $B\succeq A\Longleftrightarrow H^{\prime}_{i}(B|A)=0$,

where $n$ is the cardinality of $X$ and $m$ is the smaller one of the cardinalities of $A$ and $B$.

That means $H_{i}(B|A)$ satisfies axiom (A1) and $H^{\prime}_{i}(B|A)$ satisfies axiom (A1′) if they are normalized. However, $H_{i}(B|A)$ does not satisfy axiom (A2) and $H^{\prime}_{i}(B|A)$ does not satisfy axiom (A2′).

Corollary 3.

Assume that $A=\{a_{1},\cdots,a_{k}\}$ and $B=\{b_{1},\cdots,b_{l}\}$ are two quotient granules on $X$. Then

  1. $A$ is finer than $B$ if and only if $H_{i}(B|A)=\log n$ $(i=1,\cdots,5)$;

  2. $A$ is finer than $B$ if and only if $H^{\prime}_{i}(B|A)=0$ $(i=1,\cdots,5)$.

Because $H_{i}$ and $H^{\prime}_{i}$ keep the same monotonicity as $G_{i}$ and $F_{i}$ respectively, we have the following result.

Lemma III.14.

For any two equivalence granules $B=\{b_{1},\cdots,b_{l+1}\}$ and $C=\{c_{1},\cdots,c_{l}\}$ on $X$, if $b_{l}\cup b_{l+1}\subseteq c_{l}$ and $b_{i}=c_{i}$ $(i=1,\cdots,l-1)$, then, for any equivalence granule $A=\{a_{1},\cdots,a_{k}\}$ on $X$, we have

  1. $H_{i}(B|A)\leq H_{i}(C|A)$ and $H_{i}(A|C)\leq H_{i}(A|B)$;

  2. $H^{\prime}_{i}(C|A)\leq H^{\prime}_{i}(B|A)$ and $H^{\prime}_{i}(A|B)\leq H^{\prime}_{i}(A|C)$.

Let $B=\{b_{1},\cdots,b_{l}\}$ and $C=\{c_{1},\cdots,c_{m}\}$ be two equivalence granules on $X$ with $B$ finer than $C$. For any $c_{j}$ in $C$, there are two cases: either there exists some $b_{i}$ with $b_{i}=c_{j}$, or there exist several $b_{i}$ whose union equals $c_{j}$. By repeated use of the above lemma, we easily obtain the following two theorems.

Theorem III.15.

For any three equivalence granules $A=\{a_{1},\cdots,a_{k}\}$, $B=\{b_{1},\cdots,b_{l}\}$ and $C=\{c_{1},\cdots,c_{m}\}$ on $X$, we have, for $i=1,\cdots,5$,

(A3)

$C\succeq B\Rightarrow H_{i}(B|A)\leq H_{i}(C|A)$;

(A4)

$C\succeq B\Rightarrow H_{i}(A|C)\leq H_{i}(A|B)$.

Theorem III.16.

For any three equivalence granules $A=\{a_{1},\cdots,a_{k}\}$, $B=\{b_{1},\cdots,b_{l}\}$ and $C=\{c_{1},\cdots,c_{m}\}$ on $X$, we have, for $i=1,\cdots,5$,

(A3′)

$C\succeq B\Rightarrow H^{\prime}_{i}(C|A)\leq H^{\prime}_{i}(B|A)$;

(A4′)

$C\succeq B\Rightarrow H^{\prime}_{i}(A|B)\leq H^{\prime}_{i}(A|C)$.

IV Conclusion

GrC imitates two types of granulation processes in human cognition: the micro granular analysis process and the macro granular analysis process. Micro granular analysis focuses on the parts, while macro granular analysis focuses on the whole. All the knowledge generated in the process of micro granular analysis constitutes a micro knowledge space, and all the knowledge generated in the process of macro granular analysis constitutes a macro knowledge space. Viewing an information system from the micro perspective, we get a micro knowledge space; viewing it from the macro perspective, we get a macro knowledge space, from which we obtain the rough set model and the spatial rough granule model respectively. The classical rough set model can only be used for complete information systems, while the rough set model obtained from the micro knowledge space can also be used for incomplete information systems; moreover, the universe of discourse can be any domain. The spatial rough granule model will play a pivotal role in the problem solving of structures such as graph partition, image processing, face recognition, 3D technologies, etc.

Subsethood measures have been well studied and generally accepted in many fields beyond fuzzy sets and rough sets. Subsethood measures, which quantify the set-inclusion relation between two sets, are here generalized to measure the coarse-fine relation between two granules. This paper defines conditional granularity, conditional fineness, conditional granularity entropy and conditional fineness entropy, and discusses their properties, including the coarse-fine relation determination theorems; all of these are important foundations for learning and reasoning about structural problems. These measures can also be used for fuzzy granules, and they have a close relation with similarity and difference measures, which will be studied in the future.

Appendix

Proof of the Theorem III.6

We only prove the case $G_{1}(B|A)=\frac{m}{n}$; the others are similar.

Proof.

The sufficiency is obvious. Now we prove the necessity. We may assume that $|A|=m\leq|B|$. By Definition III.6, we have $sh_{1}(B,A)$

$=\sum\limits_{i=1}^{k}\sum\limits_{j=1}^{l}\frac{|a_{i}\cap b_{j}|}{|X|}\times\frac{|a_{i}^{c}\cup b_{j}|}{|X|}$
$=\sum\limits_{i=1}^{k}\frac{1}{|X|^{2}}\sum\limits_{j=1}^{l}|a_{i}\cap b_{j}|\left(|X-a_{i}|+|a_{i}\cap b_{j}|\right)$
$=\sum\limits_{i=1}^{k}\frac{1}{|X|^{2}}\left(\sum\limits_{j=1}^{l}|a_{i}\cap b_{j}||X-a_{i}|+\sum\limits_{j=1}^{l}|a_{i}\cap b_{j}|^{2}\right)$

Assume that the union of all $b_{j}$ is the set $B$. Then $|a_{i}\cap b_{1}|+\cdots+|a_{i}\cap b_{l}|=|a_{i}\cap(b_{1}\cup\cdots\cup b_{l})|=|a_{i}\cap B|=|a_{i}|$. Thus

$\sum\limits_{i=1}^{k}\frac{1}{|X|^{2}}\left(\sum\limits_{j=1}^{l}|a_{i}\cap b_{j}||X-a_{i}|+\sum\limits_{j=1}^{l}|a_{i}\cap b_{j}|^{2}\right)$
$\leq\sum\limits_{i=1}^{k}\frac{|a_{i}||X-a_{i}|+|a_{i}|^{2}}{|X|^{2}}$
$=\sum\limits_{i=1}^{k}\frac{|a_{i}|}{|X|}\cdot\frac{|X-a_{i}|+|a_{i}|}{|X|}$
$=\sum\limits_{i=1}^{k}\frac{|a_{i}|}{|X|}=\frac{m}{n}$

The sum $\sum\nolimits_{j}|a_{i}\cap b_{j}|^{2}$ reaches its maximum $|a_{i}|^{2}$ exactly when there exists some $b_{h}$ such that $|a_{i}\cap b_{h}|=|a_{i}|$ and $|a_{i}\cap b_{j}|=0$ $(j\neq h,\ j\in\{1,\cdots,l\})$; that is, for any $a_{i}$ there must exist some $b_{h}$ which satisfies $a_{i}\cap b_{h}=a_{i}$ and $a_{i}\cap b_{j}=\emptyset$ $(j\neq h,\ j\in\{1,\cdots,l\})$. Therefore $A$ is finer than $B$. ∎

Proof of the Lemma III.14

We only prove the case of $sh_{1}$; the others are similar.

Proof.

We only prove (2).
Suppose there are $h$ $(0\leq h\leq|c_{l}|)$ equivalence classes of $A$ intersecting $c_{l}$. When $h=0$, we have

$sh_{1}(A,B)=\sum\limits_{i=1}^{l+1}\frac{1}{|X|^{2}}\sum\limits_{j=1}^{k}|b_{i}\cap a_{j}||b_{i}^{c}\cup a_{j}|$
$=\sum\limits_{i=1}^{l-1}\frac{1}{|X|^{2}}\sum\limits_{j=1}^{k}|b_{i}\cap a_{j}||b_{i}^{c}\cup a_{j}|$
$=\sum\limits_{i=1}^{l-1}\frac{1}{|X|^{2}}\sum\limits_{j=1}^{k}|c_{i}\cap a_{j}||c_{i}^{c}\cup a_{j}|=sh_{1}(A,C)$

When $1\leq h\leq|c_{l}|$, let these classes be $a_{1},\cdots,a_{h}$; then we have

$sh_{1}(A,B)=\sum\limits_{i=1}^{l+1}\frac{1}{|X|^{2}}\sum\limits_{j=1}^{k}|b_{i}\cap a_{j}||b_{i}^{c}\cup a_{j}|$
$=\frac{\sum\limits_{j=1}^{h}\left(|b_{l}\cap a_{j}||b_{l}^{c}\cup a_{j}|+|b_{l+1}\cap a_{j}||b_{l+1}^{c}\cup a_{j}|\right)}{|X|^{2}}+\frac{\sum\limits_{i=1}^{l-1}\sum\limits_{j=1}^{k}|b_{i}\cap a_{j}||b_{i}^{c}\cup a_{j}|}{|X|^{2}}$
$sh_{1}(A,C)=\sum\limits_{i=1}^{l}\frac{1}{|X|^{2}}\sum\limits_{j=1}^{k}|c_{i}\cap a_{j}||c_{i}^{c}\cup a_{j}|=\frac{\sum\limits_{j=1}^{h}|c_{l}\cap a_{j}||c_{l}^{c}\cup a_{j}|}{|X|^{2}}+\frac{\sum\limits_{i=1}^{l-1}\sum\limits_{j=1}^{k}|c_{i}\cap a_{j}||c_{i}^{c}\cup a_{j}|}{|X|^{2}}$

Since $b_{i}=c_{i}$ $(i=1,\cdots,l-1)$, and $|c_{l}\cap a_{j}||c_{l}^{c}\cup a_{j}|$

$=|(b_{l}\cup b_{l+1})\cap a_{j}||(b_{l}\cup b_{l+1})^{c}\cup a_{j}|$
$=(|b_{l}\cap a_{j}|+|b_{l+1}\cap a_{j}|)|(b_{l}^{c}\cup a_{j})\cap(b_{l+1}^{c}\cup a_{j})|$
$\leq|b_{l}\cap a_{j}||b_{l}^{c}\cup a_{j}|+|b_{l+1}\cap a_{j}||b_{l+1}^{c}\cup a_{j}|$

Thus we have $sh_{1}(A,C)\leq sh_{1}(A,B)$. ∎

Acknowledgments

This work is partially supported by a Discovery Grant from NSERC Canada.

References

  • [1] D. Dubois and H. Prade, Fuzzy Sets and Systems: Theory and Applications.   New York: Academic Press, 1980.
  • [2] J. A. Goguen, “The logic of inexact concepts,” Synthese, vol. 19, no. 3-4, pp. 325–373, 1969.
  • [3] R. Willmott, “Two fuzzier implication operators in the theory of fuzzy power sets,” Fuzzy Sets and Systems, vol. 4, no. 1, pp. 31–36, 1980.
  • [4] ——, “On the transitivity of containment and equivalence in fuzzy power set theory,” Journal of Mathematical Analysis and Applications, vol. 120, no. 1, pp. 384–396, 1986.
  • [5] B. Kosko, “Fuzzy entropy and conditioning,” Information Sciences, vol. 40, no. 2, pp. 165–174, 1986.
  • [6] ——, “Fuzziness vs. probability,” International Journal of General Systems, vol. 17, no. 2-3, pp. 211–240, 1990.
  • [7] ——, Neural Networks and Fuzzy Systems: A Dynamical Systems Approach to Machine Intelligence.   New Jersey, USA: Prentice Hall, 1991.
  • [8] ——, Fuzzy Engineering.   Englewood Cliffs, New Jersey: Prentice Hall, 1997.
  • [9] H. Bustince, V. Mohedano, E. Barrenechea, and M. Pagola, “Definition and construction of fuzzy DI-subsethood measures,” Information Sciences, vol. 176, no. 21, pp. 3190–3231, 2006.
  • [10] H. Bustince, E. Barrenechea, and M. Pagola, “A method for constructing V. Young’s fuzzy subsethood measures and fuzzy entropies,” in Intelligent Techniques and Tools for Novel System Architectures.   Springer, 2008, pp. 123–138.
  • [11] I. K. Vlachos and G. D. Sergiadis, “Subsethood, entropy, and cardinality for interval-valued fuzzy sets—An algebraic derivation,” Fuzzy Sets and Systems, vol. 158, no. 12, pp. 1384–1396, 2007.
  • [12] V. R. Young, “Fuzzy subsethood,” Fuzzy Sets and Systems, vol. 77, no. 3, pp. 371–384, 1996.
  • [13] P. Grzegorzewski and E. Mrowka, “Subsethood measure for intuitionistic fuzzy sets,” in 2004 IEEE International Conference on Fuzzy Systems (IEEE Cat. No.04CH37542).   IEEE, 2004, vol. 1, pp. 139–142.
  • [14] H. Y. Zhang and W. X. Zhang, “Hybrid monotonic inclusion measure and its use in measuring similarity and distance between fuzzy sets,” Fuzzy Sets and Systems, vol. 160, no. 1, pp. 107–118, 2009.
  • [15] J. L. Fan and W. X. Xie, “Some notes on similarity measures and proximity measures,” Fuzzy Sets and Systems, vol. 101, pp. 403–412, 1999.
  • [16] G. S. Huang and Y. S. Liu, “New subsethood measures and similarity measures of fuzzy sets,” in Proceedings of 2005 International Conference on Communications, Circuits and Systems, 2005.   IEEE, 2005, vol. 2, pp. 999–1002.
  • [17] Y. F. Li, K. Y. Qin, and X. X. He, “Relations among similarity measure, subsethood measure and fuzzy entropy,” International Journal of Computational Intelligence Systems, vol. 6, no. 3, pp. 411–422, 2013.
  • [18] W. Bandler and L. J. Kohout, “Fuzzy relational products and fuzzy implication operators,” in International Workshop on Fuzzy Reasoning Theory and Applications, 1978, pp. 239–244.
  • [19] W. Bandler and L. Kohout, “Fuzzy power sets and fuzzy implication operators,” Fuzzy Sets and Systems, vol. 4, no. 1, pp. 13–30, 1980.
  • [20] P. Burillo, N. Frago, and R. Fuentes, “Inclusion grade and fuzzy implication operators,” Fuzzy Sets and Systems, vol. 114, no. 3, pp. 417–429, 2000.
  • [21] J. L. Fan, W. X. Xie, and J. H. Pei, “Subsethood measure: new definitions,” Fuzzy Sets and Systems, vol. 106, no. 2, pp. 201–209, 1999.
  • [22] C. C. Ragin, The Comparative Method: Moving Beyond Qualitative and Quantitative Strategies.   Berkeley: University of California Press, 1987.
  • [23] M. J. Wierman, J. N. Mordeson, T. D. Clark, and J. M. Larson, “Fuzzy subsethood, fuzzy implication, and causality,” in Information Sciences 2007.   World Scientific, 2007, pp. 1412–1418.
  • [24] Y. Y. Yao and X. F. Deng, “Quantitative rough sets based on subsethood measures,” Information Sciences, vol. 267, pp. 306–322, 2014.
  • [25] Y. Y. Yao and L. Q. Zhao, “A measurement theory view on the granularity of partitions,” Information Sciences, vol. 213, pp. 1–13, 2012.
  • [26] G. J. Klir and T. A. Folger, Fuzzy Sets, Uncertainty and Information.   New Jersey, USA: Prentice Hall, 1988.
  • [27] T. Beaubouef, F. E. Petry, and G. Arora, “Information-theoretic measures of uncertainty for rough sets and rough relational databases,” Information Sciences, vol. 109, no. 1, pp. 185–195, 1998.
  • [28] I. Düntsch and G. Gediga, “Uncertainty measures of rough set prediction,” Artificial Intelligence, vol. 106, no. 1, pp. 109–137, 1998.
  • [29] ——, “Roughian: Rough information analysis,” International Journal of Intelligent Systems, vol. 16, no. 1, pp. 121–147, 2001.
  • [30] T. T. Lee, “An infornation-theoretic analysis of relational databases—part I: Data dependencies and information metric,” IEEE Transactions on Software Engineering, vol. SE-13, no. 10, pp. 1049–1061, 1987.
  • [31] J. Y. Liang, K. S. Chin, C. Y. Dang, and R. C. M. Yam, “A new method for measuring uncertainty and fuzziness in rough set theory,” International Journal of General Systems, vol. 31, no. 4, pp. 331–342, 2002.
  • [32] J. Y. Liang and Z. Z. Shi, “The information entropy, rough entropy and knowledge granulation in rough set theory,” International Journal of Uncertainty Fuzziness Knowledge-Based Systems, vol. 12, no. 01, pp. 37–46, 2004.
  • [33] J. Y. Liang, Z. Z. Shi, D. Y. Li, and M. J. Wierman, “Information entropy, rough entropy and knowledge granulation in incomplete information systems,” International Journal of General Systems, vol. 35, no. 6, pp. 641–654, 2006.
  • [34] J. Y. Liang, J. H. Wang, and Y. H. Qian, “A new measure of uncertainty based on knowledge granulation for rough sets,” Information Sciences, vol. 179, no. 4, pp. 458–470, 2009.
  • [35] D. Q. Miao and J. Wang, “On the relationships between information entropy and roughness of knowledge in rough set theory (in Chinese),” Pattern Recognition and Artficial Intelligence, vol. 11, no. 1, pp. 34–40, 1998.
  • [36] ——, “An information representation of the concepts and operations in rough set theory (in Chinese),” Journal of Software, vol. 10, no. 2, pp. 113–116, 1999.
  • [37] Y. H. Qian, J. Y. Liang, and C. Y. Dang, “Converse approximation and rule extraction from decision tables in rough set theory,” Computers and Mathematics with Applications, vol. 55, no. 8, pp. 1754–1765, 2008.
  • [38] ——, “Knowledge structure, knowledge granulation and knowledge distance in a knowledge base,” International Journal of Approximate Reasoning, vol. 50, no. 1, pp. 174–188, 2009.
  • [39] J. H. Wang, J. Y. Liang, Y. H. Qian, and C. Y. Dang, “Uncertainty measure of rough sets based on a knowledge granulation for incomplete information systems,” International Journal of Uncertainty Fuzziness Knowledge-Based Systems, vol. 16, no. 02, pp. 233–244, 2008.
  • [40] M. J. Wierman, “Measuring uncertainty in rough set theory,” International Journal of General Systems, vol. 28, no. 4-5, pp. 283–297, 1999.
  • [41] Y. Y. Yao, “Information-theoretic measures for knowledge discovery and data mining,” in Entropy Measures, Maximum Entropy and Emerging Applications.   Springer, 2003, pp. 115–136.
  • [42] P. Zhu and Q. Y. Wen, “Information-theoretic measures associated with rough set approximations,” Information Sciences, vol. 212, pp. 33–43, 2012.
  • [43] D. Q. Miao and S. D. Fan, “The calculation of knowledge granulation and its application (in Chinese),” Systems Engineering Theory and Practice, vol. 22, no. 1, pp. 48–56, 2002.
  • [44] B. W. Xu, Y. M. Zhou, and H. M. Lu, “An improved accuracy measure for rough sets,” Journal of Computer and System Sciences, vol. 71, no. 2, pp. 163–173, 2005.
  • [45] L. Q. Zhao, “Study on the Model of Granular Computing (in Chinese),” Ph.D. dissertation, Anhui University, Hefei, China, 2007.
  • [46] L. Q. Zhao and L. Zhang, “Model of granular computing,” in International Conference of Theoretical and Mathematical Foundations of Computer Science, 2008, pp. 95–101.
  • [47] L. Q. Zhao, Y. Y. Yao, and L. Zhang, “Measurement of general granules,” Information Sciences, vol. 415-416, pp. 128–141, 2017.
  • [48] D. Sinha and E. R. Dougherty, “Fuzzification of set inclusion: Theory and applications,” Fuzzy Sets and Systems, vol. 55, no. 1, pp. 15–42, 1993.
  • [49] R. Sahin and M. Karabacak, “A multi attribute decision making method based on inclusion measure for interval neutrosophic sets,” International Journal of Engineering and Applied Sciences, vol. 2, no. 2, p. 258001, 2015.
  • [50] Z. B. Xu, J. Y. Liang, C. Y. Dang, and K. S. Chin, “Inclusion degree: a perspective on measures for rough set data analysis,” Information Sciences, vol. 141, no. 3, pp. 227–236, 2002.
  • [51] W. X. Zhang and Y. Leung, The Uncertainty Reasoning Principles (in Chinese).   Xi’an, China: Xi’an Jiaotong University Press, 1996.
  • [52] A. Gomolińska, “On certain rough inclusion functions,” in Transactions on Rough Sets IX.   Springer, 2008, pp. 35–55.
  • [53] A. Gomolińska and M. Wolski, “Rough inclusion functions and similarity indices,” Fundamenta Informaticae, vol. 133, no. 2-3, pp. 149–163, 2014.
  • [54] C. C. Wang and H. S. Don, “A modified measure for fuzzy subsethood,” Information Sciences, vol. 79, no. 3, pp. 223–232, 1994.
  • [55] M. Zhang, L. D. Xu, W. X. Zhang, and H. Z. Li, “A rough set approach to knowledge reduction based on inclusion degree and evidence reasoning theory,” Expert Systems, vol. 20, no. 5, pp. 298–304, 2003.
  • [56] E. Sanchez, “Inverses of fuzzy relations: Application to possibility distributions and medical diagnosis,” Fuzzy Sets and Systems, vol. 2, no. 1, pp. 75–86, 1979.
  • [57] L. I. Kuncheva, “Fuzzy rough sets: Application to feature selection,” Fuzzy Sets and Systems, vol. 51, no. 2, pp. 147–153, 1992.
  • [58] W. X. Zhang and Y. Leung, “Theory of including degrees and its applications to uncertainty inferences,” in Soft Computing in Intelligent Systems and Information Processing. Proceedings of the 1996 Asian Fuzzy Systems Symposium.   IEEE, 1996, pp. 496–501.
  • [59] R. Clausius, Ueber verschiedene für die Anwendung bequeme Formen der Hauptgleichungen der mechanischen Wärmetheorie.   Éditeur Inconnu, 1865.
  • [60] C. E. Shannon, “A mathematical theory of communication,” Bell System Technical Journal, vol. 27, no. 3, pp. 379–423, 1948.