xAI
caltech AI4SCIENCE series
User-Centric Visualization in the Service of Explainable ML+AI
by santiago lombeyda, cd3/caltech
Visualization is the simple encoding of information into (mostly) visual representations. The process relies on mapping data fields into visual elements through available encoding channels. These encoding channels include position (x, y, z), shape, size, orientation, and color. The ability to discriminate when to use particular encoding channels is guided not only by measurable perception metrics from cognitive studies and by general visual layout strategies and methodologies learned from the graphic arts, but ultimately by the particular goals of the user viewing, inspecting, and interacting with the crafted visualization tools.
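As a minimal illustration only, not tied to any of the tools below and using hypothetical field names, the sketch shows how data fields might be mapped onto a few encoding channels (x/y position, size, color) with matplotlib:

```python
# Minimal sketch: mapping hypothetical data fields onto visual encoding
# channels (position, size, color) with matplotlib.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
records = {
    "mass":     rng.uniform(1, 10, 50),    # -> x position
    "density":  rng.uniform(0, 1, 50),     # -> y position
    "volume":   rng.uniform(10, 200, 50),  # -> marker size
    "category": rng.integers(0, 4, 50),    # -> color
}

fig, ax = plt.subplots()
scatter = ax.scatter(
    records["mass"],        # position channel (x)
    records["density"],     # position channel (y)
    s=records["volume"],    # size channel
    c=records["category"],  # color channel
    cmap="viridis",
)
ax.set_xlabel("mass")
ax.set_ylabel("density")
fig.colorbar(scatter, ax=ax, label="category")
plt.show()
```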
Novel and creative visualization representations become useful when paired with meaningful user goals. Thus, user-centric methodologies can be successfully applied to create informative interactive visualization solutions at most steps of ML and AI processes, wherever interacting with or understanding the results (or partial results) is visually meaningful on its own or in the context of other results. This poster explores solutions created for different stages of ML processes, resulting from user-centric design approaches.
user centric a| CURATION OF TRAINING DATA SET
This work was in the service of creating a training dataset describing the taxonomy of biomasses and their possible classification as tumors. Given the importance for the user of understanding the 3D spatial relationships of the data, and the need to inspect individual structures while preserving the context of the overall structure and placement, we arrived at a Virtual Reality based solution.
We created OVS+Tumor: a seamless VR environment designed for intuitive interaction aiding in the complex task of parsing through 3D CT-scans and annotating candidate tumors. Through interactive subsetting and on-the-fly iso-cloud generation, a wider range of users beyond just domain experts (radiologists/surgeons) can generate a viable machine-learning training dataset.
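As an illustrative sketch only, not the OVS+Tumor implementation itself, the snippet below shows the underlying idea of on-the-fly iso-surface generation: extracting a surface from a 3D volume at a user-chosen intensity level with scikit-image's marching cubes, using a synthetic stand-in volume in place of a real CT scan:

```python
# Hedged sketch (not the OVS+Tumor code): on-the-fly iso-surface extraction
# from a 3D volume at an interactively chosen intensity level.
import numpy as np
from skimage import measure

# Stand-in volume: a synthetic blob; in practice this would be a loaded
# 3D CT scan (e.g., a 512 x 512 x N intensity array).
x, y, z = np.mgrid[-1:1:64j, -1:1:64j, -1:1:64j]
volume = np.exp(-(x**2 + y**2 + z**2) * 8)

iso_level = 0.5  # threshold the user adjusts on the fly
verts, faces, normals, values = measure.marching_cubes(volume, level=iso_level)
print(f"{len(verts)} vertices, {len(faces)} triangles at iso level {iso_level}")
```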
courtesy of CD3/caltech, JPL/NASA & EDRN/NIH
[OVS+TUMOR] by santiago lombeyda
researchers: EDRN Endeavor at JPL/NASA (PI)
user centric b1| DATA EXPLORATION OF ML PROCESSES
The Self-Organizing Map (SOM) is an unsupervised machine learning technique that has become quite valuable for understanding a wide spectrum of biological processes, as it produces a "low-dimensional representation of a higher dimensional data space, while preserving the topological structure of the data". Helping a researcher make the best sense of SOM data can be as straightforward as finding clear visual mechanisms to convey the mapped data into an interactive system that responds to the natural queries about specific populations that researchers have once they gain a clear understanding of the overall structure of the data. After understanding what the researchers' needs were, and how the representation would guide their exploration and analysis, an interactive web-based tool was readily deployed to help answer researcher questions.
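As a hedged sketch of the underlying technique, not the SOMMAPVIEWER backend, the snippet below trains a small SOM with the MiniSom library on hypothetical feature vectors and tallies how many samples land on each node, which is the kind of mapped data an interactive viewer can then color, filter, and query by population:

```python
# Hedged sketch: training a small self-organizing map with MiniSom and
# counting samples per map node (hypothetical data, not the WOLDLAB data).
import numpy as np
from minisom import MiniSom

rng = np.random.default_rng(0)
data = rng.normal(size=(500, 16))            # hypothetical feature vectors
data = (data - data.mean(0)) / data.std(0)   # normalize each feature

som = MiniSom(10, 10, input_len=16, sigma=1.5, learning_rate=0.5, random_seed=0)
som.random_weights_init(data)
som.train_random(data, num_iteration=5000)

# Count samples per node; this 10x10 grid of counts is what a viewer can
# render and let researchers query by population.
hits = np.zeros((10, 10), dtype=int)
for sample in data:
    i, j = som.winner(sample)
    hits[i, j] += 1
print(hits)
```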
courtesy of WOLDLAB/caltech
[SOMMAPVIEWER] by santiago lombeyda
researchers: Gilberto DeSalvo, Barbara Wold (PI)
user centric b2| DATA EXPLORATION ENGAGING ML PROCESSES
PIXLISE (formerly PIXELATE) is a microXRF visualization and analysis tool built for the PIXL instrument team of NASA JPL's Mars 2020 rover. The tool is built to enable the team's search for signs of past life on Mars, as well as to explore the red planet's geologic history. Thus, the users' central need is a tool that can visualize the spatial relationships and concentrations of elements in Martian rock samples.
The project tested a diverse range of ML processes, including t-SNE, PCA, and k-Means clustering, with most revealing some level of insight. However, it is the blending of human and machine intelligence that raises the major challenges. (See [PIXLISE-C: Data Analysis for Mineral Identification] for a more detailed analysis.) For instance, while dimensionality-reduction processes like t-SNE were useful for an initial understanding of the data relationships, we found that a much higher priority for the users was the ability to compare locations and detect anomalies in the spectroscopy. Thus, a main feature of the current version of PIXLISE, which is actively being used to study data from the Mars 2020 rover, is a Variational Autoencoder.
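As a rough sketch under stated assumptions (hypothetical spectra, not the PIXLISE pipeline or its Variational Autoencoder), the snippet below embeds per-location spectra with PCA and t-SNE for an initial overview, and scores anomalous locations by reconstruction error as a simple linear stand-in for the role the VAE plays:

```python
# Hedged sketch: exploratory embedding of per-location spectra plus a
# reconstruction-error anomaly score (synthetic data, scikit-learn only).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
spectra = rng.poisson(lam=20, size=(300, 512)).astype(float)  # hypothetical spectra
spectra[:5] += rng.normal(50, 10, size=(5, 512))              # a few unusual locations

# Initial exploration: project locations into 2D for a scatter overview.
pca = PCA(n_components=20).fit(spectra)
embedding_2d = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(
    pca.transform(spectra)
)

# Anomaly score: how poorly each spectrum is reconstructed from the
# low-dimensional representation (higher = more unusual location).
reconstructed = pca.inverse_transform(pca.transform(spectra))
anomaly_score = np.linalg.norm(spectra - reconstructed, axis=1)
print("most anomalous locations:", np.argsort(anomaly_score)[-5:])
```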
courtesy of JPL/caltech/NASA
original [PIXALATE] by david schurman, pooja nair & adrian galvin, as part of the [DATA2DISCOVERY JPL/CALTECH/ARTCENTER] program mentored by scott davidoff, santiago lombeyda, hillary mushkin & maggie hendrie
researchers: PIXL (JPL/NASA)
user centric c| ML SPACES
The DARPA Data-Driven Discovery of Models (D3M) program automates methods in data science to enable domain experts to incorporate their knowledge into the modeling process and to create meaningful and valid predictive models of real, complex processes without the need for expert data scientists. All data from the D3M AutoML explorations is available through MARVIN, a visual interface that helps users navigate the rich D3M AutoML ecosystem. Powered by the metalearning database, which captures all assets, pipelines, and experiments run, MARVIN acts as a frontend to help explore and analyze metalearning resources.
MARVIN allows one to explore the datasets, problems, primitives, pipelines, and ML pipeline runs in the D3M ecosystem. One can easily compare how selecting various primitives for data preprocessing, encoding, embedding, featurization, and modeling improves the accuracy of particular auto-generated pipelines (problem solutions). A dynamic leaderboard can be generated for any problem by ranking captured pipeline runs on the scored metric, as sketched below. Recognizing how a user chooses to interact with such a rich database has led to solutions ranging from simple (a basic bar chart) to complex (comparative exploration of different pipelines), operating at different levels of data granularity, at different stages of the exploration, and targeted to different user groups.
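As a minimal sketch with a hypothetical schema, not the actual D3M metalearning database or the MARVIN code, the snippet below ranks captured pipeline runs for one problem on their scored metric to produce such a leaderboard:

```python
# Hedged sketch: a dynamic leaderboard over captured pipeline runs
# (hypothetical problem/pipeline names and scores).
import pandas as pd

pipeline_runs = pd.DataFrame([
    {"problem": "185_baseball", "pipeline": "rf_onehot",     "metric": "f1Macro", "score": 0.62},
    {"problem": "185_baseball", "pipeline": "xgb_targetenc", "metric": "f1Macro", "score": 0.68},
    {"problem": "185_baseball", "pipeline": "svm_scaled",    "metric": "f1Macro", "score": 0.55},
    {"problem": "196_autoMpg",  "pipeline": "ridge_poly",    "metric": "meanSquaredError", "score": 8.1},
])

def leaderboard(runs: pd.DataFrame, problem: str, higher_is_better: bool = True) -> pd.DataFrame:
    """Rank all captured runs for one problem on the scored metric."""
    subset = runs[runs["problem"] == problem]
    return (subset.sort_values("score", ascending=not higher_is_better)
                  .reset_index(drop=True)
                  .assign(rank=lambda df: df.index + 1))

print(leaderboard(pipeline_runs, "185_baseball"))
```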
courtesy of JPL, D3M, & DARPA
[MARVIN Visualization Components] by sami sahnoune, santiago lombeyda, and brian wilson, utilizing base graph layout from D3M/NYU [PipelineProfiler]
researchers: D3M PROGRAM (DARPA)