\onlineid

9717 \vgtccategoryVAHC \ieeedoi10.1109/VAHC47919.2019.8945032 \preprinttext $\copyright$ 2019 IEEE. This is the author’s version of the article that has been published in the proceedings of IEEE VAHC. The final version of this record is available at: 10.1109/VAHC47919.2019.8945032

Towards a Structural Framework
for Explicit Domain Knowledge in Visual Analytics

Alexander Rind e-mail: {firstname}.{lastname}@fhstp.ac.at Markus Wagner¹¹footnotemark: 1 Wolfgang Aigner¹¹footnotemark: 1 Center for Digital Health Innovation, Institute of Creative

\backslash{}

Media/Technologies
St. Pölten University of Applied Sciences, Austria

Abstract

Clinicians and other analysts working with healthcare data are in need for better support to cope with large and complex data. While an increasing number of visual analytics environments integrates explicit domain knowledge as a means to deliver a precise representation of the available data, theoretical work so far has focused on the role of knowledge in the visual analytics process. There has been little discussion about how such explicit domain knowledge can be structured in a generalized framework. This paper collects desiderata for such a structural framework, proposes how to address these desiderata based on the model of linked data, and demonstrates the applicability in a visual analytics environment for physiotherapy.

\CCScatlist\CCScatTwelve

Human-centered computingVisualizationVisualization theory, concepts and paradigms; \CCScatTwelveApplied computingLife and medical sciencesHealth care information systems

Introduction

To keep pace with the tremendously expanding volumes of complex and heterogeneous data, experts in many domains such as healthcare need to apply high-performance data analysis methods. Even though the sheer quantity demands automated analysis methods, this process cannot be automated completely, since domain experts need to be in the loop to identify, correct, and disambiguate intermediate results [48]. Visual analytics (VA) intertwines interactive visual interfaces with automated data analysis methods in order to support humans in data analysis [19, 40]. This allows for effectively distributing the workload of cognitive reasoning between human and machine [22, 26]. However, this endeavor is not straightforward as initial results from automated analysis are often trivial or irrelevant to the work of the domain experts [28]. Domain knowledge is needed to sift the relevant from the trivial.

Let us illustrate this with a hypothetical example: The general practitioner Jane is treating Mary who suffers from diabetes for more than a decade. Today, Mary’s creatinine levels are elevated and Jane needs to adjust treatment in order to avert kidney damage. She needs to consult Mary’s electronic health record to check for past medication with possible side effects on the kidney. However, the record is quite voluminous due to the long duration of treatment, comorbidities, lifestyle changes, pregnancy, and unrelated events such as seasonal colds. Jane’s medical experience helps her identify the relevant events, but she wonders if this knowledge can be part of her VA environment so that it automatically links medical findings to possible causes and available treatments.

Better integrating analysts’ knowledge has been emphasized by the VA community as a central research challenge in the field [4, 8, 9, 21, 29]. Consequently, an increasing number of VA environments integrate explicit knowledge, which is knowledge that “can be processed by a computer, transmitted electronically, or stored in a database” [47, p. 617]. Federico, Wagner et al. [15] recently coined the term knowledge-assisted visual analytics for such environments that have “features to generate, transform, and utilize explicit knowledge” [15, p. 92].

The emerging integration of explicit knowledge in VA environments, in particular through design studies that are grounded on the concrete needs of a target audience [37], provides the opportunity to reflect current practice and develop general guidelines and theoretical models for knowledge in VA [40]. However, theoretical work in knowledge-assisted VA to date tended to focus on the role of knowledge in the VA process [15, 32, 47] rather than on the explicit knowledge itself. Our field still lacks a generalized framework of how explicit domain knowledge can be structured, stored and made accessible to a VA environment. The prospective value of generalizing the particularly designed mechanisms for explicit knowledge are threefold: (i) it allows us to better understand and compare VA environments by their integration of explicit knowledge, (ii) it guides the development of future VA design studies, and (iii) it is a precondition for the exchange of explicit knowledge between VA environments. This third point, in particular, envisions an analytics ecosystem in which explicit knowledge is a first-class artifact and not limited to the scope of a single tool but can be reused in all the tools needed to perform an activity, without bothering the human to transcribe it into each environment separately.

Therefore, this paper collects desiderata for a structural framework of explicit domain knowledge in VA (Sect. 2). These desiderata originate from reflective discussions of three design studies that resulted in knowledge-assisted VA environments for different application domains (internal medicine [33], gait rehabilitation [45], and IT security [44]) and the review of the scientific literature, which is summarized in Sect. 1. Sect. 3 proposes how to address these desiderata based on the model of linked data. To demonstrate the applicability, Sect. 4 presents how explicit domain knowledge is integrated in the VA environment KAVAGait [45] that supports physiotherapists in gait rehabilitation.

1 Background and Related Work

Since “Illuminating the Path” [40, p. 35], incorporating prior domain knowledge and “build[ing] knowledge structures” has been on VA’s agenda. This is underscored by the pivotal position of knowledge in the VA process model by Keim et al. [20, 19] and further process models such as the knowledge generation model by Sacha et al. [34] and the visualization model by van Wijk [42]. However, these process models do not differentiate between knowledge in the human space and in the machine space. Based on Wang et al. [47], Federico, Wagner et al. [15] delineate tacit knowledge that is exclusively available to human reasoning, from explicit knowledge that can be leveraged by the VA environment. How explicit knowledge is integrated into the VA process is formalized in several recent models by Wang et al. [47], Ribarsky et al. [32], and Federico, Wagner et al. [15].

Beyond the role of knowledge in the VA process, only few works discuss the content and structure of explicit knowledge on a general level. Andrienko et al. [4] conceptualize domain knowledge as a model of a part of reality and provide definitions for different types of models but they do not specify the form and medium how the model is represented. Schulz et al. [36] formalize data descriptors that include domain knowledge about data. Tominski [41] captures domain knowledge as event types that are specified using predicate logic. Lammarsch at al. [25] propose a data structure for knowledge about temporal patterns leveraging the structure of time. The generation of adapted visualizations which are based on ontological datasets and the specification of ontological mappings are treated by Falconer et al. [12]. Therefore, they use the ‘COGZ’ tool, converting ontological mappings in software transformation rules so that it describes a model which fits the visualization. A similar approach for adapted visualizations is also followed by Gilson et al. [17], describing a general system pipeline which combines ontology mapping and probabilistic reasoning techniques. Thereby, they describe the automated generation of visualizations of domain-specific data from the web. However, none of these approaches aim for a general framework.

The application of visualization techniques to healthcare has sparked a lot of interest to integrate knowledge. Already the early LifeLines [30] approach envisioned how domain knowledge is used to highlight relationships between events, which was then realized by a simple full-text search. A number of approaches such as Midgaard [5, 1] and QualizonGraph [13] enrich the display of time series of medical parameters with qualitative levels. Thus, a period of critical conditions can be detected and visually highlighted, for example by color. The ViTA-Lab environment [24] and its preceding work [23, 38] leverage complex temporal data abstractions for pattern discovery and provide a case study of longitudinal analysis of 22,000 diabetes patient records. The Five W’s [52] environment arranges events of the health record hierarchically based on the knowledge about diseases formalized in the ICD9 taxonomy. Gnaeus [14] is a guideline-based knowledge-assisted visualization of electronic health records for cohorts. Evidence-based clinical practice guidelines are sets of statements and recommendations used to improve health care by providing a trustworthy comparison of treatment options in terms of risks and benefits according to patient’s status. The KAVAGait [45] tool is a knowledge-assisted VA environment for clinical gait analysis that supports analysts during diagnosis and clinical decision making. Users can load, visualize and compare patient gait data containing ground reaction forces (GRF) measurements gaining new knowledge, identify unseen pattern and recognize connections. KAMAS [44] is a similar knowledge-assisted VA environment for analyzing event sequences and categorizing suspicious sequences according to a taxonomy of behaviors. Even though KAMAS was originally designed for IT-security analysts, comparable data structures and analysis problems are also relevant for healthcare (e.g., [46, 51]).

Summarizing these findings, it can be seen, that most of the discussed approaches cover how explicit domain knowledge can be exploited to enhance visual representation and data analysis; some approaches provide methods to generate explicit knowledge. Additionally, most of the currently implemented knowledge-assisted VA environments are focused on the integration of specific domain knowledge, which could only be used for precisely defined analysis tasks. In general, explicit knowledge is now a first-class artifact in the VA process but its form and structure are left unspecified. None of the presented approaches provides a structural framework for describing and storing explicit knowledge in VA environments. Thus, a structural framework is needed and combined with the theoretical process model by Federico, Wagner et al. [15] it would provide valuable generative guidelines for the development of novel knowledge-assisted VA environments.

2 Desiderata

The following desiderata for a structural model of explicit domain knowledge were established in the reflective discussions of two recent design study projects, in which our collaboration with domain experts resulted in a VA environment [2, 44, 45] and preceding design study work that only sketched the integration of explicit knowledge [33, 1, 13]. Additional inputs result from analysis of the scientific work cited above. In particular, we build upon the characterization of knowledge by Federico, Wagner et al. [15] with its three axes space, type, and origin. Their knowledge characterization, however, is only descriptive about what is possible and is used to categorize knowledge-assisted VA environments for their survey.

Overall, we envision explicit domain knowledge as a first-class artifact in VA process that is both an input and an output of VA activities [15, 25]. As analytics in the “wild” are seldom constrained to a single isolated VA environment or a single data backend [16], we regard it imperative to design a framework for explicit domain knowledge in a way to allow manifestation of knowledge in various data structures and backends as well as utilization in different VA environments.

The nine desiderata are:

D1:

Explicit knowledge, by definition, resides in the machine space and is machine interpretable [15, 47]. Structured forms of knowledge representation allow the VA environment to reason about relationships within the knowledge (e.g., controlled vocabularies with same/different, taxonomies with hierarchical, or ontologies with custom relationships). While free text annotations or hand drawn polygons can capture knowledge in the machine space, they are, in their raw form, opaque to machine reasoning.
D2:

The structural model should focus on domain knowledge, i.e. interpreting the data. In contrast, operational knowledge, i.e. effectively using the VA environment, is out of its scope. While the latter can be tackled by usable user interfaces, user onboarding, and automated visualization recommendation, the former is essential for the success analysis, either as explicit knowledge or as tacit knowledge of the domain experts as user [15].
D3:

It should be possible to reuse pre-existing taxonomies or other knowledge artifacts that have been established within a community of practice.
D4:

The structural framework should facilitate the exchange of explicit knowledge between different VA environments.
D5:

The structural framework should be compatible with heterogeneous data structures and storage technologies.
D6:

Runtime editing of explicit knowledge should be possible from within the VA environment. Thus, all the origins of knowledge sketched by Federico, Wagner et al. [15] should be supported: (i) knowledge artifacts can predate the VA environment (cp. D3); (ii) explicit knowledge can be prepared during the VA environment’s design process; (iii) a single user can interactively externalize domain knowledge; (iv) multiple users can externalize and share knowledge; or (v) knowledge can be automatically derived from data [15].
D7:

In order to support provenance and accountability of knowledge generated, especially in the post-design phase, the framework should provide a standardized form to include authorship details and further provenance information.
D8:

There should be good software library support to integrate explicit knowledge into the source code of a VA environment. Widely used development languages like JavaScript, Python, and Java should be supported.
D9:

The notation of explicit knowledge should be easily readable and editable by knowledge engineers and visualization designers for development and debugging.

The last two desiderata (D8 and D9) might appear to be overly specific. Of course, it is possible to rely on custom software solutions and knowledge editors. However, our practical experience in designing knowledge-assisted VA environments underscored the importance of suitable software development support and easily debuggable notation to allow for the rapid feedback cycles typical in visualization design studies[27].

3 Structural Framework

The desiderata collected above characterize a generalized structural framework that enables the communication and reuse of explicit domain knowledge across different KAVA environments (D4), different datasets (D5), and different users (D6). In order to separate these aspects clearly, we suggest a structural framework consisting of three components (Fig. 1).

Refer to caption — Figure 1: The structural framework of explicit domain knowledge consists of concepts, their manifestation in datasets, and the utilization in KAVA environments.

1.

Concepts from the application domain and their relationships.

Example (diabetes): the diagnosis ‘hyperglycemia’, i.e. high blood sugar.

Example (physiotherapy): a gait abnormality of the knee during mid-stance phase.
2.

A mapping between concepts and the dataset(s) to manifest knowledge in the relevant data items.

Example: a blood sugar level higher than 200 mg/dl is diagnosed as hyperglycemia.
3.

A mapping between concepts and the KAVA environment(s) to utilize the knowledge.

Example (diabetes): visually represent period of hyperglycemia as a horizontal line.

Example (physiotherapy): provide list of known gait patterns and highlight patient marks on selection.

The vocabulary of concepts will remain comparatively stable for most domain problems (Table 1). Therefore, knowledge about concepts can act as an anchor when linking additional datasets to the analysis. If these datasets are heterogeneously structured (D5), adaptations to the manifestation component will be needed. Likewise, if additional VA environments are used (D4), the knowledge utilization mapping may be adapted. All three knowledge components can be manipulated by analysts via interaction with the VA environment (D6), whereby we assume that most changes will affect the manifestation component.

Table 1: Components of the structural knowledge framework and their dependence on domain problem, datasets, KAVA environments, and user interaction (•…high,

\circ

…low).

	concepts	manifestation	utilization
domain problem	•	$\circ$	$\circ$
dataset(s)		•
KAVA environment(s)			•
user interaction	$\circ$	•	$\circ$

3.1 Knowledge about Concepts

Concepts are in the center of the structural framework because they are comparatively stable over time. Since the concept component of the structural framework is independent from datasets and KAVA environments, it is possible to apply existing work from the field of knowledge representation. Linked data [7] and in particular the Resource Description Framework (RDF) [10] are established approaches for representing semantic information in a machine-interpretable form (D1).

The RDF models information about concepts in a directed graph that is specified as a set of triples. Each triple consists of a subject node, a property predicate, and an object, which can be a node or a literal. A node is identified by a globally unique resource identifier and can refer to anything from concrete persons/things to an abstract concept. For storing and exchanging these triples, several serialization formats exist. In the context of KAVA, we adopt in particular the Turtle and JSON-LD formats. Turtle [6] has a concise, text-based syntax that is suitable for debugging by VA designers and knowledge engineers (D9). JSON-LD [39] is a JSON-based format and, thus, compatible with many modern VA software libraries (D8). Alternatively, particular software libraries such as rdflib.js [31] directly support RDF and allow for operations based graph relations and semantics (D1).

Often there will be a pre-existing concept schemes such as ICD-10, MeSH, or SNOMED CT [11] that can be built upon (D3).
Example (diabetes): in the ICD-10 classification of health issues [49], there is a concept for hyperglycemia (without diabetes or other known diagnosis) in the R73 subbranch. In WikiData, hyperglycemia is known as concept Q271993 [50].

If a custom concept scheme is needed, the Simple Knowledge Organization System (SKOS) [18] provides a compact RDF vocabulary to describe the concepts of a semi-formal knowledge organization system such as a thesaurus or a taxonomy. The vocabulary of SKOS includes labels to describe concepts in natural language, a broader/narrower relationship for hierarchical links between concepts, and a related relationship for associative links.
Example (physiotherapy): a gait analysis laboratory characterizes abnormal gait pattern by concepts along a 3-level taxonomy that distinguishes first by gait phase (e.g., ‘mid stance’), then the affected joint (e.g., ‘knee’), and finally the direction (e.g., ‘sagittal’) (Fig. 2, Listings 1 and 2).

Listing 1: A gait pattern concept described with SKOS in Turtle format.

⬇

gps:midKnee rdf:type skos:Concept;

skos:prefLabel "abnormal mid stance phase of knee";

skos:broader gps:mid;

skos:narrower gps:midKneeSagittal;

skos:inScheme gps:gaitPatternSchema.

Listing 2: The same gait pattern concept described with SKOS in JSON-LD format.

⬇

{

"@id": "gps:midKnee",

"@type": "skos:Concept",

"skos:broader": {

"@id": "gps:midKnee"

"skos:narrower": {

"@id": "gps:midKneeSagittal"

"skos:inScheme": {

"@id": "gps:gaitPatternSchema"

"skos:prefLabel": "abnormal mid stance phase of knee"

}, ...

3.2 Knowledge Manifestation in Datasets

In addition to the concept component described above, a structural framework of explicit knowledge for VA needs a component to manifest domain concepts in the respective data items, because VA is primarily concerned with analysis of datasets (D2). We model also this component of the knowledge framework as an RDF graph, which makes it possible to leverage the semantics of concept and two established approaches for provenance (D7), the Dublin Core and the FOAF vocabularies [10, 18]. The RDF properties necessary for manifestation of domain knowledge will be illustrated in this subsection.

The concepts can be manifested on dataset items through either a direct or an indirect mapping depending on whether references or characteristics are given [3]. A direct mapping annotates individual known occurrences of a concept in the datasets by specifying their references, i.e. values for their identifying variables. Such explicit knowledge can be utilized (i) to train classification models from the annotated prototypes, (ii) to reconfirm the results of such models, (iii) to interpret data in context (Example in healthcare: blood sugar levels during pregnancy.), or even (iv) to hide irrelevant parts of the data.
Example (diabetes): the patient with ID 12345 is marked as having hyperglycemia. This knowledge has provenance to be created by ‘Doctor Dreamy’ on Feb 4, 2019 (Fig. 3 and Listing 3).

Listing 4: The hyperglycemia concept manifested by indirect mapping with a threshold on a blood test.

⬇

icd10:R73 kava:manifest [

kava:matchVariable [

kava:variable health:bloodSugar;

kava:minValue 200

]

Listing 5: The hyperglycemia concept manifested by indirect mapping with a custom query string on blood laboratory data.

⬇

icd10:R73 kava:manifest [

kava:matchQuery "[glucose] > 200";

dct:creator [foaf:name "ACME laboratory equipment"]

An indirect mapping describes the characteristics of all occurrences of a concept rather than identifying individual exemplars. The characteristics can be specified in a query predicate. Typical query predicates, e.g., on multivariate tables or sequences, can be expressed in a structured way by an RDF vocabulary. Yet, to support the widest possible range of dataset structures and data storage technologies (D5), the query predicate can also be given as a string using the query language of the underlying technology (e.g., SQL, XQuery, SPARQL, Gremlin). Reusing query strings of existing languages will also reduce the effort for developers (D9).
Example (diabetes): hyperglycemia can be characterized by blood sugar test over 200mg/dl (Listings 4 and 5).

3.3 Knowledge Utilization

Finally, concepts and their manifestation in data need to be utilized in the VA environment. A solid starting point for the utilization component are existing visualization grammars such as Vega-Lite [35], which describes data transformations, encoding on visual marks, and interactivity in a declarative JSON-based format.

However, there is a wide range of possible knowledge utilizations and these are integrated deeply into the application logic of their VA environments. Therefore, this component of the structural knowledge framework might be least amendable for generalization. Some typically approaches will be illustrated below.

•

All relevant concepts can be visualized as marks. It is possible to create a tree visualization based on broader/narrower relationships in SKOS or a network visualization based on the related relationships. Visual encoding channels can indicate how frequent a concept is in the data or how similar prototypical items of the concept are to the currently analyzed data (e.g., KAVAGait Fig. 4.1.a).
•

The marks of data items can have a different visual encoding depending on their manifested knowledge concepts – mapped either directly as a prototype or indirectly by fulfilling a query predicate.
•

Multiple data items manifested with the same concept can be shown as an aggregate mark (e.g., a horizontal line spanning the time period, while a patient suffered from hyperglycemia).
•

Query predicates can be parsed and visualized. For example, the area above 200mg/dl could be colored in the background of a blood sugar line plot to indicate risk of hyperglycemia.

4 Applying the Framework in KAVAGait

Next, we demonstrate how the explicit knowledge of an exemplary VA environment can be structured according to our framework. This scenario is based on the KAVAGait [45] design study with physiotherapists and illustrated using the processes of the knowledge-assisted VA model [15, 43].

KAVAGait [45] is a ‘Knowledge-assisted VA System for Clinical Gait Analysis’, whereby the analysts (clinicians) are supported during analysis and clinical decision making (see Fig. 4). The analysts have the ability to load patient gait data containing time series of ground reaction force measurements for each foot. The time series are visualized as line plots in the center of the user interface, describing the force over time. Additionally, 16 spatio-temporal parameters (e.g., step time, stance time, cadence) related to the loaded patient’s gait are calculated, visualized, and used for automated patient comparison and categorization based on the introduced interactive twin box plots. One primary goal during clinical gait analysis is to assess whether a recorded gait measurement displays ‘normal gait’ behavior or if not, which specific ‘gait abnormality’ is present. Thus, the environment’s explicit knowledge store contains several categories of ‘gait abnormalities’ (relating to e.g., knee, hip, ankle) as well as a category including ‘healthy gait pattern’ data, used for analysis and comparison by default. Each category is described by the data of patients that were previously assigned to this category. In particular, the $[min,max]$ ranges of the patients’ 16 spatio-temporal parameters are calculated. Based on these category descriptions, automated data analysis of newly loaded patient data is provided (e.g., automatically calculated category matching). This automated data analysis supports the analysts in their interactive data exploration. To achieve a second goal, clinicians can generate new explicit knowledge by adding the analysis result to the explicit knowledge store. KAVAGait also provides the ability to interactively explore and adjust the internally stored explicit knowledge.

Assessment of Patient Data: If the analyst loads a gait analysis file $D$ of a patient into KAVAGait (see Fig. 5 for a graphical overview of the knowledge processes), first, the contained time series are visualized $V$ based on the systems specification $S$ . Second, the stored explicit knowledge $K^{\epsilon}$ and the automated data analysis methods $A$ are strongly intertwined with all components of the VA environment. Thus, this pipeline immediately calculates the 16 spatio-temporal parameters based on the loaded time series and the matching to the different knowledge categories, affecting the specification. The combination of both former described procedures can be expressed as the initial analysis and visualization pipeline. However, if the specification is not influenced by the analyst (e.g., zooming, filtering, sorting), all stored explicit knowledge is used for analysis and comparison. Based on the generated visualization, the image $I$ is perceived by the analyst, gaining tacit knowledge $K^{\tau}$ , influencing the analysts perception $P$ . Depending on the tacit knowledge, the analyst has now the ability to interactively explore $E$ the visualized time series and spatio-temporal parameters by using the environment’s provided methods (e.g., zooming, filtering, sorting). During this iterative process, the analyst gains further tacit knowledge based on the adjusted visualization.

Explicit Knowledge Generation and Adjustment: To generate explicit knowledge $K^{\epsilon}$ (see Fig. 5 for a graphical overview), the analyst has the ability to include the spatio-temporal parameters of analyzed patients based on his/her clinical decisions to the explicit knowledge store, which can be described as the extraction $X$ of tacit knowledge. This explicit knowledge can be visualized in a separated view, whereby the explicit knowledge is automatically transformed into a dataset. Different views are providing the adjustment of the stored explicit knowledge by the analyst’s tacit knowledge.

Structuring the Explicit Knowledge: KAVAGait operates with explicit domain knowledge about different gait patterns. Thus, it needs a concept for each pattern. While a multi-level taxonomy of gait patterns would be possible (cp. Listing 1), for the KAVAGait prototype a flat list of 7 concepts is sufficient (‘affected knee’, …, ‘norm data’). The explicit knowledge is manifested in $[min,max]$ ranges for the spatio-temporal parameters. Therefore, KAVAGait needs a two-part approach: (1) Typically, the clinician categorizes the gait pattern of a patients and includes them in the explicit knowledge store under a category. In the explicit knowledge framework, this is expressed as a direct mapping linking the concept to their prototypical patients (cp. Listing 3). The $[min,max]$ ranges need to be calculated dynamically from the prototypical patients’ spatio-temporal parameters, because the patient population can be filtered e.g., by age. (2) If the clinician manually adjusts a $[min,max]$ range, this can be expressed as an indirect mapping with the given value range (cp. Listing 4). This explicit knowledge is utilized in multiple ways: (i) the concepts are represented as rows in the ‘Knowledge Table’ (Fig. 4.1.a). The values of the prototypical patients’ spatio-temporal parameters are shown as hatches in the ‘Parameter Explorer’ (Fig. 4.3) and to calculate the matchings and category differences. The $[min,max]$ ranges, which are either calculated from prototypical patients or manually adjusted, are visualized by the range sliders in the ‘Parameter Explorer’ and the boxes of the ‘Knowledge Table’.

Summary: This design study demonstrates that explicit knowledge extracted from the clinicians tacit knowledge opens the possibility to support clinicians during clinical decision making. Additionally, KAVAGait could also be used to share the knowledge of domain experts as well as to use it for educational support. In contrast to other analysis systems (e.g., based on MatLab), KAVAGait uses analytical and visual representation methods to provide a scalable and problem-tailored visualization solution following the visual analytics agenda [40, 19]. For keeping up with the large number of patients stored as explicit knowledge, clinical gait analysts need to continuously adapt the systems settings during the clinical decision making process. Supporting such interactive workflows is a key strength of visualization systems. Clinical gait analysis in particular profits from extensive interaction and annotation because it is a very knowledge-intensive job. By providing knowledge-oriented interactions, externalized knowledge can subsequently be used in the analysis process to support the clinicians. The newly developed visual metaphors provide an easy way to inspect variability of the data (e.g., standard deviation), allow to identify outliers in the data, and provide an easy to understand overview of the data and automated matching results. Additionally, based on the interactive twin box plots, it is possible to perform intercategory and patient comparisons by details on demand to find similarities in the data.

5 Conclusions and Next Steps

Addressing the need for deeper integration of domain knowledge, existing VA environments have found a multitude of mechanisms to structure and manage explicit domain knowledge. This paper set out to reflect and generalize the results from existing design study projects and work towards a structural framework for explicit domain knowledge in VA. This reflection has identified nine desiderata for such a structural framework. A preliminary structural framework is proposed that separates the concerns of explicit knowledge into three components: concepts, manifestation, and utilization. For the components concepts and manifestation, we apply linked data using SKOS and a custom RDF vocabulary for direct and indirect mapping to data items. The utilization component can be addressed based on the JSON-based Vega-Lite language. Being limited to the retrospective reflection of several knowledge-assisted VA design studies, this preliminary structural framework lacks any claims of completeness. Notwithstanding its limitations, this work certainly adds to our understanding of explicit knowledge in VA environments and provides a frame of reference for future design studies.

A natural progression of this work is to analyze a larger sample of VA environments in more detail by reverse engineering their explicit knowledge using this theoretical framework. Such a structured literature review can both survey the landscape of knowledge-assisted VA environments and identify missing vocabulary for knowledge manifestation in heterogeneous datasets. In particular, further modelling needs to examine more closely how concrete variables in datasets can be group to abstract concepts (e.g., a concept for blood sugar). In addition, the scope of knowledge utilization approaches can be more comprehensively surveyed. As a next step, the emerging structural knowledge framework needs to be tested in practice, either in new design study project or through evolution of an existing VA environment. Given the general claim, this implementation can also result in a reusable software library. Finally, the aspired real-world utility can only be assessed through empirical studies with users of VA environments.

Acknowledgements.

This work was partly funded by the Austrian Science Fund (FWF): P25489-N23 via KAVA-Time and by the Austrian Research Promotion Agency (FFG): grant #866855.

References

[1] W. Aigner, A. Rind, and S. Hoffmann. Comparative evaluation of an interactive time-series visualization that combines quantitative data with qualitative abstractions. Computer Graphics Forum, 31(3):995–1004, 2012. doi: 10.1111/j.1467-8659.2012.03092.x
[2] W. Aigner, A. Rind, and M. Wagner. KAVA-Time: Knowledge-assisted visual analytics methods for time-oriented data. In Tagungsband des 12. Forschungsforum der österreichischen Fachhochschulen (FFH) 2018, 2018.
[3] N. Andrienko and G. Andrienko. Exploratory Analysis of Spatial and Temporal Data: A Systematic Approach. Springer, Berlin, 2006. doi: 10.1007/3-540-31190-4
[4] N. Andrienko, T. Lammarsch, G. Andrienko, G. Fuchs, D. Keim, S. Miksch, and A. Rind. Viewing visual analytics as model building. Computer Graphics Forum, 37(6):275–299, 2018. doi: 10.1111/cgf.13324
[5] R. Bade, S. Schlechtweg, and S. Miksch. Connecting time-oriented data and information to a coherent interactive visualization. In E. Dykstra-Erickson and M. Tscheligi, eds., Proc. CHI, pp. 105–112, 2004. doi: 10.1145/985692.985706
[6] D. Beckett and T. Berners-Lee. Turtle – terse RDF triple language. W3C Team Submission, 2011. accessed 2019-04-17 from http://www.w3.org/TeamSubmission/2008/SUBM-turtle-20080114/.
[7] T. Berners-Lee. Linked Data. personal note on design issues, 2006. accessed 2019-04-16 from https://www.w3.org/DesignIssues/LinkedData.html.
[8] C. Chen. Top 10 unsolved information visualization problems. IEEE Computer Graphics and Applications, 25(4):12–16, 2005. doi: 10.1109/MCG.2005.91
[9] M. Chen, D. Ebert, H. Hagen, R. Laramee, R. van Liere, K. Ma, W. Ribarsky, G. Scheuermann, and D. Silver. Data, information, and knowledge in visualization. IEEE Computer Graphics and Applications, 29(1):12–19, 2009. doi: 10.1109/MCG.2009.6
[10] R. Cyganiak, D. Wood, and M. Lanthaler. RDF 1.1 concepts and abstract syntax. W3C Recommendation, 2014. accessed 2019-04-16 from http://www.w3.org/TR/2014/REC-rdf11-concepts-20140225/.
[11] H. Dalianis. Medical classifications and terminologies. In H. Dalianis, ed., Clinical Text Mining: Secondary Use of Electronic Patient Records, pp. 35–43. Springer, Cham, 2018. doi: 10.1007/978-3-319-78503-5_5
[12] S. Falconer, R. Bull, L. Grammel, and M. Storey. Creating visualizations through ontology mapping. In Proc. Int. Conf. on Complex, Intelligent and Software Intensive Systems (CISIS), pp. 688–693, 2009. doi: 10.1109/CISIS.2009.40
[13] P. Federico, S. Hoffmann, A. Rind, W. Aigner, and S. Miksch. Qualizon Graphs: Space-efficient time-series visualization with qualitative abstractions. In Proc. Int. Working Conf. Advanced Visual Interfaces, AVI, pp. 273–280. ACM, 2014. doi: 10.1145/2598153.2598172
[14] P. Federico, J. Unger, A. Amor-Amorós, L. Sacchi, D. Klimov, and S. Miksch. Gnaeus: utilizing clinical guidelines for knowledge-assisted visualisation of EHR cohorts. In E. Bertini and J. C. Roberts, eds., Proc. EuroVis Workshop on Visual Analytics (EuroVA), pp. 79–83. Eurographics, 2015. doi: 10.2312/eurova.20151108
[15] P. Federico, M. Wagner, A. Rind, A. Amor-Amorós, S. Miksch, and W. Aigner. The role of explicit knowledge: A conceptual model of knowledge-assisted visual analytics. In Proc. IEEE Conf. Visual Analytics Science and Technology (VAST), pp. 92–103. IEEE, 2017. doi: 10.1109/VAST.2017.8585498
[16] J.-D. Fekete. Visual analytics infrastructures: From data management to exploration. Computer, 46(7):22–29, 2013. doi: 10.1109/MC.2013.120
[17] O. Gilson, N. Silva, P. Grant, and M. Chen. From web data to visualization via ontology mapping. Computer Graphics Forum, 27(3):959–966, 2008. doi: 10.1111/j.1467-8659.2008.01230.x
[18] A. Isaac and E. Summers. SKOS simple knowledge organization system primer. W3C Working Group Note, 2009. accessed 2019-04-14 from http://www.w3.org/TR/2009/NOTE-skos-primer-20090818/.
[19] D. Keim, J. Kohlhammer, G. Ellis, and F. Mansmann, eds. Mastering The Information Age – Solving Problems with Visual Analytics. Eurographics, Goslar, Germany, 2010.
[20] D. A. Keim, F. Mansmann, J. Schneidewind, J. Thomas, and H. Ziegler. Visual Analytics: Scope and challenges. In S. J. Simoff, M. H. Böhlen, and A. Mazeika, eds., Visual Data Mining, LNCS 4404, pp. 76–90. Springer, Berlin, 2008. doi: 10.1007/978-3-540-71080-6_6
[21] T. Keller and S. Tergan. Visualizing knowledge and information: An introduction. In S. Tergan and T. Keller, eds., Knowledge and Information Visualization, LNCS 3426, pp. 1–23. Springer, Berlin, 2005. doi: 10.1007/11510154_1
[22] D. Kirsh. Thinking with external representations. AI & Society, 25(4):441–454, 2010. doi: 10.1007/s00146-010-0272-8
[23] D. Klimov, Y. Shahar, and M. Taieb-Maimon. Intelligent visualization and exploration of time-oriented data of multiple patients. Artificial Intelligence in Medicine, 49(1):11–31, 2010. doi: 10.1016/j.artmed.2010.02.001
[24] D. Klimov, A. Shknevsky, and Y. Shahar. Exploration of patterns predicting renal damage in patients with diabetes type ii using a visual temporal analysis laboratory. Journal of the American Medical Informatics Association, 22(2):275–289, 2014. doi: 10.1136/amiajnl-2014-002927
[25] T. Lammarsch, W. Aigner, A. Bertone, S. Miksch, and A. Rind. Towards a concept how the structure of time can support the visual analytics process. In S. Miksch and G. Santucci, eds., Proc. Int. Workshop Visual Analytics (EuroVA) in conjunction with EuroVis 2011, pp. 9–12. Eurographics, Goslar, Germany, 2011. doi: 10.2312/PE/EuroVAST/EuroVA11/009-012
[26] Z. Liu, N. J. Nersessian, and J. T. Stasko. Distributed cognition as a theoretical framework for information visualization. IEEE Trans. Visualization and Computer Graphics, 14(6):1173–1180, 2008. doi: 10.1109/TVCG.2008.121
[27] N. McCurdy, J. Dykes, and M. Meyer. Action design research and visualization design. In Proc. 6th Workshop Beyond Time and Errors on Novel Evaluation Methods for Visualization (BELIV), pp. 10–18. ACM, 2016. doi: 10.1145/2993901.2993916
[28] N. McCurdy, J. Gerdes, and M. Meyer. A framework for externalizing implicit error using visualization. IEEE Trans. Visualization and Computer Graphics, 25(1):925–935, 2018. doi: 10.1109/TVCG.2018.2864913
[29] W. A. Pike, J. Stasko, R. Chang, and T. A. O’Connell. The science of interaction. Information Visualization, 8(4):263–274, 2009. doi: 10.1057/ivs.2009.22
[30] C. Plaisant, R. Mushlin, A. Snyder, J. Li, D. Heller, and B. Shneiderman. LifeLines: Using visualization to enhance navigation and analysis of patient records. In Proc. AMIA Symp., pp. 76–80, 1998.
[31] Read-Write Linked Data. rdflib.js, 2019. accessed 2019-08-06 from https://github.com/linkeddata/rdflib.js.
[32] W. Ribarsky and B. Fisher. The human-computer system: Towards an operational model for problem solving. In Proc. Hawaii Int. Conf. System Sciences (HICSS), pp. 1446–1455, 2016. doi: 10.1109/HICSS.2016.183
[33] A. Rind, W. Aigner, S. Miksch, S. Wiltner, M. Pohl, T. Turic, and F. Drexler. Visual exploration of time-oriented patient data for chronic diseases: Design study and evaluation. In A. Holzinger and K. Simonic, eds., Information Quality in e-Health, Proc. USAB 2011, LNCS 7058, pp. 301–320. Springer, Heidelberg, 2011. doi: 10.1007/978-3-642-25364-5_22
[34] D. Sacha, A. Stoffel, F. Stoffel, B. C. Kwon, G. Ellis, and D. Keim. Knowledge generation model for visual analytics. IEEE Trans. Visualization and Computer Graphics, 20(12):1604–1613, 2014. doi: 10.1109/TVCG.2014.2346481
[35] A. Satyanarayan, D. Moritz, K. Wongsuphasawat, and J. Heer. Vega-Lite: A grammar of interactive graphics. IEEE Trans. Visualization and Computer Graphics, 23(1):341–350, 2017. doi: 10.1109/TVCG.2016.2599030
[36] H.-J. Schulz, T. Nocke, M. Heitzler, and H. Schumann. A systematic view on data descriptors for the visual analysis of tabular data. Information Visualization, 16(3):232–256, 2017. doi: 10.1177/1473871616667767
[37] M. Sedlmair, M. Meyer, and T. Munzner. Design study methodology: Reflections from the trenches and the stacks. IEEE Trans. Visualization and Computer Graphics, 18(12):2431–2440, 2012. doi: 10.1109/TVCG.2012.213
[38] Y. Shahar and C. Cheng. Intelligent visualization and exploration of time-oriented clinical data. In Proc. 32nd Ann. Hawaii Int. Conf. System Sciences (HICSS-32), 1999. doi: 10.1109/HICSS.1999.773019
[39] M. Sporny, G. Kellogg, and M. Lanthaler. JSON-LD 1.0: A JSON-based serialization for linked data. W3C Recommendation, 2014. accessed 2019-04-16 from http://www.w3.org/TR/2014/REC-json-ld-20140116/.
[40] J. J. Thomas and K. A. Cook, eds. Illuminating the Path: The Research and Development Agenda for Visual Analytics. IEEE, 2005.
[41] C. Tominski. Event-based concepts for user-driven visualization. Information Visualization, 10:65–81, 2011. doi: 10.1057/ivs.2009.32
[42] J. J. Van Wijk. The value of visualization. In Proc. IEEE Visualization, pp. 79–86, 2005. doi: 10.1109/VISUAL.2005.1532781
[43] M. Wagner. Integrating Explicit Knowledge in the Visual Analytics Process: Model and Case Studies on Time-oriented Data. PhD Thesis, TU Wien, Vienna, 2017.
[44] M. Wagner, A. Rind, N. Thür, and W. Aigner. A knowledge-assisted visual malware analysis system: Design, validation, and reflection of KAMAS. Computers & Security, 67:1–15, 2017. doi: 10.1016/j.cose.2017.02.003
[45] M. Wagner, D. Slijepcevic, B. Horsak, A. Rind, M. Zeppelzauer, and W. Aigner. KAVAGait: Knowledge-assisted visual analytics for clinical gait analysis. IEEE Trans. Visualization and Computer Graphics, 25(3):1528–1542, 2018. doi: 10.1109/TVCG.2017.2785271
[46] T. D. Wang, K. Wongsuphasawat, C. Plaisant, and B. Shneiderman. Extracting insights from electronic health records: Case studies, a visual analytics process model, and design recommendations. Journal of Medical Systems, 35(5):1135–1152, 2011. doi: 10.1007/s10916-011-9718-x
[47] X. Wang, D. H. Jeong, W. Dou, S.-W. Lee, W. Ribarsky, and R. Chang. Defining and applying knowledge conversion processes to a visual analytics system. Computers & Graphics, 33(5):616–623, 2009. doi: 10.1016/j.cag.2009.06.004
[48] P. Wegner. Why interaction is more powerful than algorithms. Commun. ACM, 40(5):80–91, 1997. doi: 10.1145/253769.253801
[49] WHO. ICD-10 Version:2016, 2016. accessed 2019-04-16 from https://icd.who.int/browse10/2016/en#/R73.
[50] Wikidata. Q271993, 2019. accessed 2019-05-05 from https://www.wikidata.org/w/index.php?title=Q271993&oldid=919520428.
[51] K. Wongsuphasawat and D. Gotz. Exploring flow, factors, and outcomes of temporal event sequences with the Outflow visualization. IEEE Trans. Visualization and Computer Graphics, 18(12):2659–2668, 2012. doi: 10.1109/TVCG.2012.225
[52] Z. Zhang, B. Wang, F. Ahmed, I. V. Ramakrishnan, R. Zhao, A. Viccellio, and K. Mueller. The Five Ws for information visualization with application to healthcare informatics. IEEE Trans. Visualization and Computer Graphics, 19(11):1895–1910, 2013. doi: 10.1109/TVCG.2013.89

Towards a Structural Framework for Explicit Domain Knowledge in Visual Analytics