Shiny Mosaics: A Shiny Implementation of Interactive Mosaic Displays
Author
Gabriel Crone & Naomi Martinez Gutierrez
1 Building Blocks: n-way tables, log-linear models, and the mosaic display
1.1 What is an n-way table?
An n-way table (or contingency table) is a multi-dimensional grid used to summarize the relationship between two or more categorical variables. Each cell in the table represents the joint frequency (count) of a specific combination of variable levels. For example, a 3-way table (\(A \times B \times C\)) cross-classifies observations by three factors simultaneously. As \(n\) increases, the complexity of the table grows, making raw frequency counts difficult to interpret without statistical modeling or visualization.
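As a minimal sketch in base R, a 3-way table can be stored as a three-dimensional array of counts. The variable names (A, B, C), level names, and counts below are made up purely for illustration.

```r
# Hypothetical counts for a 2 x 2 x 2 table (A x B x C); values are illustrative only
counts <- array(
  c(10, 20, 30, 40, 5, 15, 25, 35),
  dim = c(2, 2, 2),
  dimnames = list(A = c("a1", "a2"), B = c("b1", "b2"), C = c("c1", "c2"))
)
tab3 <- as.table(counts)

# Each cell holds the joint frequency of one combination of levels
cell <- tab3["a2", "b1", "c2"]

# The number of cells is the product of the numbers of levels
n_cells <- prod(dim(tab3))

# Collapsing over C gives the 2-way A x B marginal table
tab_ab <- margin.table(tab3, margin = c(1, 2))
```

Adding a fourth variable with two levels would double the number of cells, which is why larger tables quickly become hard to read as raw counts.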
1.2 What are log-linear models?
Log-linear models are a class of generalized linear models used to analyze patterns of association and independence in n-way tables. Rather than predicting a dependent variable, log-linear models treat all variables symmetrically, modeling the logarithm of the expected cell frequencies (\(\mu\)) as a linear combination of main effects and interaction terms:
\[\log(\mu_{ijk}) = \lambda + \lambda_i^A + \lambda_j^B + \lambda_k^C + \lambda_{ij}^{AB} + \dots\]
The goal is to find the most parsimonious model that explains the data. A Mutual Independence model assumes no associations between variables, while a Saturated Model perfectly fits the data by including all possible interaction terms.
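The two extremes can be sketched by fitting Poisson GLMs to data in frequency form, which is one common way to fit log-linear models in base R. The counts below are hypothetical, and this is only an illustration of the idea rather than the fitting code used elsewhere in this report.

```r
# Hypothetical 2 x 2 x 2 counts in frequency form (illustrative values only)
d <- expand.grid(A = c("a1", "a2"), B = c("b1", "b2"), C = c("c1", "c2"))
d$Freq <- c(10, 20, 30, 40, 5, 15, 25, 35)

# Mutual independence: main effects only, log(mu) = lambda + A + B + C
fit_indep <- glm(Freq ~ A + B + C, family = poisson, data = d)

# Saturated model: all interaction terms, so it reproduces the observed counts
fit_sat <- glm(Freq ~ A * B * C, family = poisson, data = d)

# The residual deviance is the likelihood-ratio statistic G^2 for each model
g2_indep <- deviance(fit_indep)
g2_sat <- deviance(fit_sat)
df_indep <- df.residual(fit_indep)
```

The saturated model leaves no residual degrees of freedom and a deviance of (numerically) zero, while the independence model trades fit for parsimony.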
1.3 What are mosaic displays?
A mosaic display is a graphical representation of an n-way table where the area of each “tile” is proportional to the observed frequency in that cell.
They serve as the visual counterpart to log-linear models by encoding the model residuals (i.e., the difference between observed and expected frequencies) into the color and intensity of the tiles.
The Visual Grammar of the mosaic display is:
Tiles: The area represents the counts.
Shading: Colors represent the residuals (typically Pearson or Standardized residuals).
Interpretation: Blue shading indicates that the observed count is higher than the model predicts, while red indicates it is lower.
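The link between residual sign and shading can be sketched by hand for a small table. The counts are hypothetical, and the colour labels are a simplification: real shading schemes also vary the intensity with the size of the residual.

```r
# Hypothetical 2 x 2 table (illustrative counts only)
tab <- as.table(matrix(
  c(30, 10, 10, 30), nrow = 2,
  dimnames = list(X = c("x1", "x2"), Y = c("y1", "y2"))
))

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# Pearson residual for each cell
pearson <- (tab - expected) / sqrt(expected)

# The sign of the residual picks the hue; a zero residual gets no shading
shade <- ifelse(pearson > 0, "blue", ifelse(pearson < 0, "red", "none"))
```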
1.4 Worked example
To illustrate, consider a psychopharmacological study investigating a new drug. The study consists of three variables: Treatment (Active v. Placebo), Diagnosis (Depression v. Anxiety), and Outcome (Improved v. Declined).
| Treatment | Diagnosis | Outcome | Frequency |
|---|---|---|---|
| Active | Depression | Improved | 85 |
| Active | Depression | Declined | 15 |
| Active | Anxiety | Improved | 70 |
| Active | Anxiety | Declined | 30 |
| Placebo | Depression | Improved | 40 |
| Placebo | Depression | Declined | 60 |
| Placebo | Anxiety | Improved | 35 |
| Placebo | Anxiety | Declined | 65 |
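Fitting a Mutual Independence model lets us visualize the efficacy of the drug, because any association between Treatment and Outcome appears as shaded tiles that depart from what independence would predict: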
```r
library(vcd)
library(MASS)

# Create the data frame
trial_data <- data.frame(
  Treatment = rep(c("Active", "Placebo"), each = 4),
  Diagnosis = rep(c("Depression", "Anxiety"), each = 2, times = 2),
  Outcome = rep(c("Improved", "Declined"), times = 4),
  Freq = c(85, 15, 70, 30, 40, 60, 35, 65)
)

# Convert to table for loglm
trial_tab <- xtabs(Freq ~ Treatment + Diagnosis + Outcome, data = trial_data)

# Fit Mutual Independence model [T][D][O]
mod_indep <- MASS::loglm(~ Treatment + Diagnosis + Outcome, data = trial_tab)

# Generate Mosaic with Friendly shading
vcd::mosaic(mod_indep, shade = TRUE, gp = shading_Friendly,
            main = "Mutual Independence Model: Clinical Trial")
```
Figure 1: Mosaic Display of Clinical Data Testing Mutual Independence
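To complement the figure numerically, the sketch below uses base R's loglin() (rather than the MASS::loglm() call above) to compare the mutual independence model with one that adds the Treatment by Outcome association. The choice of the second model is ours, for illustration only.

```r
# Same trial data as in the worked example
trial_data <- data.frame(
  Treatment = rep(c("Active", "Placebo"), each = 4),
  Diagnosis = rep(c("Depression", "Anxiety"), each = 2, times = 2),
  Outcome   = rep(c("Improved", "Declined"), times = 4),
  Freq      = c(85, 15, 70, 30, 40, 60, 35, 65)
)
trial_tab <- xtabs(Freq ~ Treatment + Diagnosis + Outcome, data = trial_data)

# Improvement rate within each treatment arm, collapsing over Diagnosis
treat_out <- margin.table(trial_tab, margin = c(1, 3))
improve_rate <- prop.table(treat_out, margin = 1)[, "Improved"]

# Likelihood-ratio G^2 for mutual independence [T][D][O]
g2_mutual <- loglin(trial_tab, margin = list(1, 2, 3), print = FALSE)$lrt

# G^2 after adding the Treatment x Outcome association: [TO][D]
g2_to <- loglin(trial_tab, margin = list(c(1, 3), 2), print = FALSE)$lrt
```

A large drop from g2_mutual to g2_to indicates that much of the lack of fit under mutual independence comes from the Treatment by Outcome association, which is the pattern the shaded tiles highlight.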
2 Mosaic Displays in action: mosaics app
The original mosaics web application, developed by Dr. Michael Friendly, serves as our inspiration for modern interactive categorical data visualization. As a pioneering tool in the field, it was designed to allow researchers to move beyond static plots and create customized mosaic displays based on either user-inputted data or a selection of default datasets.
The original mosaics interface provided customizations that allowed user control over both the statistical and visual aspects of the display:
2.0.1 Analysis and Model Specification
The app allowed users to define the underlying statistical logic of the plot by selecting:
Fit Type: Users could specify the type of independence model being tested, such as MUTUAL (all variables independent), JOINT (one variable independent of others), or CONDIT (conditional independence).
Variable Order: Users could manipulate the order of variables in the model, which dictates the nesting of the mosaic tiles.
Residual Type: The app supported various methods for calculating residuals, including Pearson (GF), Likelihood Ratio (LR), and Freeman-Tukey (FT).
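The three fit types can be sketched with base R's loglin(), where each model is a list of fitted margins that mirrors bracket notation. The built-in HairEyeColor table stands in for user data, and the choice of which variable plays which role in the joint and conditional models is an assumption made for illustration, not the original app's behaviour.

```r
# Built-in 3-way table: dimension 1 = Hair, 2 = Eye, 3 = Sex
tab <- HairEyeColor

# Each model is a list of fitted margins, mirroring bracket notation
models <- list(
  mutual = list(1, 2, 3),            # [Hair][Eye][Sex]
  joint  = list(c(1, 2), 3),         # [Hair Eye][Sex]
  condit = list(c(1, 3), c(2, 3))    # [Hair Sex][Eye Sex]
)

fits <- lapply(models, function(m) loglin(tab, margin = m, print = FALSE))

# Residual degrees of freedom and likelihood-ratio G^2 for each fit type
fit_summary <- sapply(fits, function(f) c(df = f$df, G2 = f$lrt))
```

Reordering the dimensions of the table (for example with aperm()) would change the nesting of the tiles in a mosaic without changing these fit statistics.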
2.0.2 Display Options
To ensure the resulting plots were informative and publication-ready, mosaics included several aesthetic controls:
Fill Type: Options to adjust the pattern and style of shading to represent residual magnitude.
Text Height: The ability to scale the height of text labels for better legibility.
Split Directions: Control over the “direction” of the partitions (horizontal vs. vertical), allowing users to choose how to divide the mosaic to best reveal the data structure.
While the original mosaics app established these essential parameters for categorical analysis, its reliance on older web technologies provided the motivation for the development of our modern, R-based re-imagining.
3 mosaics Re-imagined: Shiny Mosaics
In an effort to improve on mosaics, the authors created Shiny Mosaics. The app was built using the shiny R package (Chang et al. 2025), which is designed for building fully interactive web apps. Below are some ways in which Shiny Mosaics improves on mosaics:
mosaics lacked interactivity. If one wanted to modify their mosaic display, they would need to start over from scratch each time. With Shiny Mosaics, users can modify any number of features of their display (e.g., plot title, theme, residuals) and the plot will update in real time.
mosaics was built using a set of Perl scripts, whose code is available upon being run. However, much statistical coding is now done in the open-source R language. Shiny Mosaics uses R, and all of its associated code is available on GitHub.
Shiny Mosaics provides additional features that mosaics lacks. For example, it provides a color table preview of inputted data using the color_table function (Friendly 2026), allows users to input custom statistical formulas (underlying the log-linear model; see Section 1.2), and permits full color customization (for the custom theme).
3.1 How does the Shiny Mosaics app work?
Shiny Mosaics was designed to be intuitive and user-friendly. Using the app is as simple as following three steps:
3.1.1 Step 1: Select Data
Users begin by navigating to the sidebar and choosing the Data Source. To use one of the example data sets, they tick the “Use Sample Data” box and select a data set from the accompanying drop-down menu.
To upload their own data instead, users may supply any data file that meets two criteria:
It is in .csv or .tsv format.
It is in frequency format: The data has one column specifying frequency, with two or more columns of categorical variables (typically, these are of factor or character type in R).
If users choose to upload their data, they must specify which data column captures frequency (e.g., freq, n) and all of the other categorical variable columns. (They need not specify all of them; if they prefer to only focus on a subset, they may select a subset of the categorical variables.)
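The sketch below shows what frequency-format data looks like and how it might be turned into an n-way table. The file contents, column names, and the freq_col and factor_cols variables are hypothetical; the app's own import code may differ.

```r
# Hypothetical CSV contents in frequency format:
# one count column (freq) and two categorical columns (sex, contact)
csv_text <- "sex,contact,freq
male,High,12
male,Low,8
female,High,15
female,Low,5"

dat <- read.csv(text = csv_text, stringsAsFactors = TRUE)

# Columns the user would nominate in the sidebar (names are illustrative)
freq_col <- "freq"
factor_cols <- c("sex", "contact")

# Build the formula freq ~ sex + contact and cross-tabulate the counts
tab <- xtabs(reformulate(factor_cols, response = freq_col), data = dat)
```

For a .tsv file, read.delim() would take the place of read.csv().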
As soon as a data source is specified, the app automatically generates a color_table() (Friendly 2026) that previews the tabular data. This table serves as a check that the data were imported correctly, and is akin to a mosaic display previewer: it fits a mutual independence model, then creates a table in which blue cells have a positive residual, red cells have a negative residual, and the shading intensity corresponds to the strength of the residual (e.g., weak; strong).
The user must then hit the Next button to proceed to Step 2.
3.1.2 Step 2: Variable Selection
Users may select a subset of the variables from their inputted data. To aid them in deciding which variables to use, the app renders a written preview of all of the categorical variables, along with each one’s corresponding unique levels (e.g., sex: male, female; contact: High, Low). Users should select at least two variables to ensure that the rendered mosaic display is informative.
By default, if no variables are selected, all variables in the data set will be used.
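Selecting a subset of variables amounts to collapsing the table over the variables that were left out. The helper below, subset_table(), is a hypothetical illustration of that idea (including the use-everything default) and is not taken from the app's source code.

```r
# Hypothetical helper: keep only the selected variables of an n-way table
subset_table <- function(tab, selected = character(0)) {
  vars <- names(dimnames(tab))
  if (length(selected) == 0) selected <- vars  # default: use all variables
  stopifnot(all(selected %in% vars))
  # Sum over every variable that was not selected
  margin.table(tab, margin = match(selected, vars))
}

# Built-in 3-way table (Hair x Eye x Sex) standing in for the user's data
sub_tab <- subset_table(HairEyeColor, c("Hair", "Sex"))
all_tab <- subset_table(HairEyeColor)
```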
To proceed, users once again hit the Next button.
3.1.3 Step 3: Specify Model & Customize Plot
Lastly, users specify the underlying statistical model for the plot, and (optionally) customize it further. At this step, a mosaic display is rendered, and modifying either the model or the customization will automatically alter the mosaic display.
3.1.3.1 3a: Specify Model
Users have the option of specifying a statistical formula underlying the log-linear model, as well as the residuals for the model (see Section 1.2). By default, No customization is selected, which simply fits the mutual independence model (e.g., [A][B][C]…[K] for a k-way table) to the data.
If users wish to choose from a list of common formula options, they may select Select model type to produce a drop-down menu of model types (e.g., mutual independence, joint independence, conditional independence, Markov, saturated). Once a formula is selected, the server automatically updates the mosaic plot to use a log-linear model with the specified formula. So that users can check whether their formula is appropriate, the formula appears below the model selection box and is also displayed as the plot title.
If users wish to specify their own formula by typing it in, they may do so by selecting Write custom formula. To remind the user of the variable names, they will appear above the text box. Within the text box, users type out their formula. The generic formula shown is of the form Freq ~ A + B + C, but users may use any acceptable model syntax in their formula (e.g., including interaction terms with * or :, such as in freq ~ A * B + C), and include or exclude any factors they wish.
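A typed formula reaches R as text, so it has to be converted and checked before a model can be fitted. The sketch below shows one way this could work, using a Poisson GLM as a stand-in for the app's fitting code; the variable names user_text and unknown are illustrative.

```r
# Frequency-form data built from a built-in table (columns: Hair, Eye, Sex, Freq)
dat <- as.data.frame(HairEyeColor)

# A formula typed by the user arrives as text and must be converted
user_text <- "Freq ~ Hair * Eye + Sex"
user_formula <- as.formula(user_text)

# Basic validation: every variable in the formula must be a column of the data
unknown <- setdiff(all.vars(user_formula), names(dat))
stopifnot(length(unknown) == 0)

# Fit the log-linear model as a Poisson GLM (a stand-in, not the app's own code)
fit <- glm(user_formula, family = poisson, data = dat)
g2 <- deviance(fit)             # likelihood-ratio statistic G^2
resid_df <- df.residual(fit)    # residual degrees of freedom
```

Here Hair * Eye expands to Hair + Eye + Hair:Eye, so the formula corresponds to the joint independence model [Hair Eye][Sex].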
Lastly, users may select a residual type to be implemented in the mosaic display. If they do not specify a formula, they may choose from one of three options (for details, see https://cran.r-project.org/web/packages/vcd/vcd.pdf, p. 116):
pearson: Uses components of Pearson’s \(\chi^2\),
deviance: Uses components of the likelihood ratio \(\chi^2\), and
ft: Uses Freeman-Tukey residuals.
If a formula is specified, Freeman-Tukey residuals are not available; in their place, the rstandard option can be selected. rstandard uses standardized Pearson \(\chi^2\) residuals.
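The first three residual types can be computed by hand from observed and expected counts. The sketch below does so for a two-way table under independence, using the usual textbook formulas; it is an illustration and not the code path the app uses.

```r
# Observed 2-way table: the built-in HairEyeColor data collapsed over Sex
obs <- margin.table(HairEyeColor, margin = c(1, 2))

# Expected counts under independence of Hair and Eye
expd <- outer(rowSums(obs), colSums(obs)) / sum(obs)

# Pearson: components of Pearson's chi-squared
res_pearson <- (obs - expd) / sqrt(expd)

# Deviance: signed square roots of the likelihood-ratio chi-squared components
res_deviance <- sign(obs - expd) *
  sqrt(2 * abs(obs * log(obs / expd) - (obs - expd)))

# Freeman-Tukey: variance-stabilizing residuals
res_ft <- sqrt(obs) + sqrt(obs + 1) - sqrt(4 * expd + 1)

x2 <- sum(res_pearson^2)    # Pearson chi-squared statistic
g2 <- sum(res_deviance^2)   # likelihood-ratio statistic G^2
```

Squaring and summing the Pearson or deviance residuals recovers the corresponding overall test statistic, which is why they are described as components of \(\chi^2\).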
3.1.3.2 3b: Customize the mosaic display
There are many options to customize and personalize the mosaic display so it best suits the user’s preferences:
Shading Style: Users may choose from one of: Friendly, Friendly2, sieve, and custom (for details, see vcd::shadings). If custom is selected, users may specify the exact color for the positive and negative residuals. Each color is selected using a color picker widget (created with the colourpicker R package (Attali 2023)).
Residual labels: Whether or not to show residual labels within the cells of the plot.
Whether or not to display the formula title in the plot (defaults to yes).
Whether or not to display the \(G^2\) statistic in the title (only shows if one specifies their own formula; defaults to no). If a user wishes to display both the formula title and \(G^2\) in the plot, the title will print the formula, followed by a comma and the \(G^2\) value (e.g., “Freq ~ A + B, \(G^2\) = 10”). If only \(G^2\) is selected, only the \(G^2\) value will be displayed (e.g., “\(G^2\) = 10”).
Split directions: Whether to swap the x-axis and the y-axis. Ticking the box swaps the axes, and un-ticking it restores their original orientation.
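The title options described in the list above combine in a simple way, which the hypothetical helper make_title() sketches. The function name, its arguments, and the rounding are assumptions for illustration; the app's actual title logic may differ.

```r
# Hypothetical helper: compose the plot title from the two title options
make_title <- function(formula_text, g2, show_formula = TRUE, show_g2 = FALSE) {
  parts <- character(0)
  if (show_formula) parts <- c(parts, formula_text)
  if (show_g2) parts <- c(parts, paste0("G^2 = ", round(g2, 2)))
  # Join with a comma when both parts are present; empty string when neither is
  paste(parts, collapse = ", ")
}

both    <- make_title("Freq ~ A + B", 10, show_formula = TRUE, show_g2 = TRUE)
g2_only <- make_title("Freq ~ A + B", 10, show_formula = FALSE, show_g2 = TRUE)
neither <- make_title("Freq ~ A + B", 10, show_formula = FALSE, show_g2 = FALSE)
```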
To transfer the final mosaic display, users can right-click the plot, copy it, and paste it wherever they wish. Alternatively, they may download the plot as a PNG or SVG file by clicking the corresponding download button (located at the top-right of the plot).
3.2 App Demonstrations
To show Shiny Mosaics in action, we created short demonstrations. Each demonstration shows the app being used from start to finish. The first is a simple run-through with a default data set without graphical customization, whereas the second involves uploading a custom data set and customizing the mosaic display.
3.2.1 Demo 1: Basic Workflow
In this demo, the user selects the EmploymentStatus dataset (Step 1), leaves the factor selection unchanged (Step 2), configures a conditional model with standardized residuals (rstandard), and exports the resulting plot as a PNG file (Step 3).
3.2.2 Demo 2: Customized Workflow
In this demo, the user imports a custom data set and selects the appropriate frequency column and factors (Step 1), specifies three factors to display (Step 2), specifies the model, customizes the color theme of the plot, and exports it as an SVG file (Step 3).
4 AI Disclosure
Generative artificial intelligence (AI), namely ChatGPT 4.0, was used regularly throughout the course of creating the code for the app. Whenever GPT 4.0 was used, the first author ensured that all code produced by it was correct by rigorously inspecting the code to ensure it was accurate, then running it over many trials to ensure that it worked as it should. Often, the first author would push back against advice suggested by the AI if it was cumbersome, verbose, or inaccurate, opting instead to always generate code that would work based on his prior experience with shiny and R. In the spirit of transparency, all chats with AI are available here.
GPT 4.0 was also used to troubleshoot bugs and to help develop new features for the app. As before, whenever GPT 4.0 was consulted, the first author ensured that the proposed solutions worked theoretically (the code was logical and coherent with the code base) and worked in practice (via re-running the app and testing it out several times).
AI, however, was not consulted throughout the process of creating this report.