Deconvolution and nonnegative matrix factorization

Peter Ralph

Advanced Biological Statistics

Visualizing expression space

A conceptual model

Let’s build a conceptual model for descriptive analysis of “mixture” expression data.

Data: expression data from tissue samples that consist of various mixtures of different cell types.

Goal: identify shared coexpression patterns corresponding to cell type.

Similar situations: identify different developmental stages from whole-organism expression; common community structures from metagenomic data.

Each cell type has a typical set of mean expression levels.
Each sample is composed of a mixture of cell types, defined by the proportions that come from each type.

Mean expression by cell type.
Cell type proportions by sample.

\(x_{kj}\) : Mean expression of gene \(j\) in cell type \(k\).
\(w_{ik}\) : Proportion of sample \(i\) of cell type \(k\).

\(Z_{ij}\) : expression level in sample \(i\) of gene \(j\).

\[\begin{aligned} Z_{ij} \approx \sum_{k=1}^K w_{ik} x_{kj} . \end{aligned}\]

Nonnegative matrix factorization

… aka “NMF”

We are decomposing \(Z\) into the product of two lower-dimensional, nonnegative factors:

\[\begin{aligned} Z_{ij} &\approx \sum_k w_{ik} x_{kj} \\ w_{ik} &\ge 0 \\ x_{kj} &\ge 0 . \end{aligned}\]

A simple NMF model

simple_nmf <- stan_model(model_code="
data {
    int N; // samples
    int L; // variables
    int K; // factors
    real Z[L,N];
}
parameters {
    matrix<lower=0>[L,K] x;
    matrix<lower=0>[K,N] w;
    real<lower=0> sigma;
}
model {
    for (j in 1:L) {
        Z[j] ~ normal(x[j] * w, sigma);
    }
}
")

Relationship to PCA

PCA finds \(w\) and \(z\) to minimize \[\begin{aligned} \sum_{ij} \| Z_{ij} - \sum_k w_{ik} x_{kj} \|^2 . \end{aligned}\]

In other words, it is the maximum-likelihood solution to \[\begin{aligned} Z_{ij} &\sim \Normal(\sum_k w_{ik} x_{kj}, \sigma^2) . \end{aligned}\] (The eigenvectors are the columns of \(x\), and the eigenvectors are related to the size of \(w\) and \(x\).)

PCA, in Stan

stan_pca <- stan_model(model_code="
data {
    int N; // samples
    int L; // variables
    int K; // factors
    real Z[L,N];
}
parameters {
    matrix[L,K] x;
    matrix[K,N] w;
    real<lower=0> sigma;
}
model {
    for (j in 1:L) {
        Z[j] ~ normal(x[j] * w, sigma);
    }
}
")

(note: needs some priors to work well; see here.)

Stochastic minute

the Dirichlet distribution

A random set of \(k\) proportions \(0 \le P_i \le 1\) has a \(\Dirichlet(\alpha_1, \ldots, \alpha_k)\) if it has probability density \[\begin{aligned} \frac{1}{B(\alpha)} \prod_{i=1}^k p_i^{\alpha_i} \end{aligned}\] over the set of possible values \[\begin{aligned} P_1 + \cdots + P_k = 1 . \end{aligned}\]

This is useful as a prior on proportions.
The mean is \[ \left( \frac{\alpha_1}{\sum_j \alpha_j}, \frac{\alpha_2}{\sum_j \alpha_j}, \cdots, \frac{\alpha_k}{\sum_j \alpha_j} \right) . \]
This generalizes the Beta: if \(X \sim \Beta(a, b)\) then \((X, 1-X) \sim \Dirichlet(a, b)\).

Marginal distributions are Beta distributed: \(P_i \sim \Beta(\alpha_i, \sum_{j=1}^k \alpha_j - \alpha_i)\).
If \(X_i \sim \Gam(\text{shape}=\alpha_i)\), and \[\begin{aligned} P_i = X_i / \sum_{j=1}^k X_j \end{aligned}\] then \(P \sim \Dirichlet(\alpha)\).

“Simplex” parameters

“The \(k\)-simplex” is the set of proportions, i.e., nonnegative numbers \(p\) satisfying \[\begin{aligned} p_1 + \cdots p_k = 1 . \end{aligned}\]

parameters {
    simplex[K] p;
}
model {
    p ~ dirichlet(alpha);
}

Exercise

Simulate 10,000 samples from the \(\Dirichlet(20, 10, 3)\) distribution. (This should be a 10,000 x 3 matrix.)
Confirm that rows sum to 1.
Confirm that columns have the correct means.
Plot the result (the pairwise plots, or in 3D).
Do the same for \(\alpha = (0.2, 0.1, 0.03)\).

IN CLASS

# 1. Simulate 10,000 samples from the Dirichlet(20, 10, 3)
#   distribution. (This should be a 10,000 x 3 matrix.)

alpha <- c(20, 10, 3)
P <- matrix(NA, nrow=10000, ncol=length(alpha))
for (i in 1:nrow(P)) {
    X <- rgamma(3, shape=alpha)
    P[i,] <- X / sum(X)
}
    
# Confirm that rows sum to 1.

stopifnot(all(
    abs(rowSums(P) - 1) < 1e-15
))

# Confirm that columns have the correct means.

true_means <- alpha / sum(alpha)
rbind(true_means, colMeans(P))

##                 [,1]      [,2]       [,3]
## true_means 0.6060606 0.3030303 0.09090909
##            0.6049487 0.3045743 0.09047700

# Plot the result (the pairwise plots, or in 3D).

pairs(P, pch=20, cex=0.25, col=adjustcolor("black", 0.1))

plot of chunk r in_class

Back to expression space

Each cell type has a typical set of mean expression levels.
Each sample is composed of a mixture of cell types, defined by the proportions that come from each type.
Mean expression levels differ between cell types for only some of the genes.

Mean expression by cell type.
Cell type proportions by sample.

\(x_{kj}\) : Mean expression of gene \(j\) in cell type \(k\).
\(w_{ik}\) : Proportion of sample \(i\) of cell type \(k\).

\(Z_{ij}\) : expression in sample \(i\) of gene \(j\).

\[\begin{aligned} Z_{ij} \approx \sum_{k=1}^K w_{ik} x_{kj} . \end{aligned}\]

Mean expression by cell type.
Cell type proportions by sample.
Mean expression levels differ between cell types for only some of the genes.

\(Z_{ij}\) : expression level in sample \(i\) of gene \(j\).

\[\begin{aligned} Z_{ij} \approx \sum_{k=1}^K w_{ik} x_{kj} . \end{aligned}\]

\(\bar x_j\), \(\eta_j\) : mean and SD of expression of gene \(j\) across all cell types; shrink \(x_{kj}\) towards \(\bar x_j\).

data {
  int N; // # samples
  int L; // # genes
  int K; // # cell types
  int Z[N,L];
}

parameters {
  matrix<lower=0>[K,L] x;
  vector<lower=0>[N] y;
  vector<lower=0>[L] x_bar;
  simplex[K] w[N];
  real<lower=0> eta;
  vector<lower=0>[K] alpha;
}

model {
  matrix[N,L] mean_Z;
  for (i in 1:N) {
    for (j in 1:L) {
      mean_Z[i,j] = 0.0;
      for (k in 1:K) {
        mean_Z[i,j] += y[i] * w[i][k] * x[k,j];
      }
    }
    Z[i,] ~ poisson(mean_Z[i,]);
    w[i] ~ dirichlet(alpha);
  }
  for (k in 1:K) 
      { x[k,] ~ normal(x_bar, eta * x_bar); }
  y ~ normal(0, 10);
  alpha ~ exponential(1);
  eta ~ normal(0, 4);
  x_bar ~ gamma(0.5, 0.5e-4);
}

\(x_{kj}\) : Mean expression of gene \(j\) in cell type \(k\).
\(w_{ik}\) : Proportion of sample \(i\) of cell type \(k\).
\(y_{i}\) : Total sequencing depth of sample \(i\).

\[\begin{aligned} Z_{ij} \approx y_i \sum_k w_{ik} x_{kj} . \end{aligned}\]

\(\bar x_j\), \(\eta_j\) : mean and SD of expression of gene \(j\) across all cell types; shrink \(x_{kj}\) towards \(\bar x_j\).

Testing: compiles?

nmf_model <- stan_model(model_code=nmf_block)

Testing: runs?

sampling(nmf_model,
         data=list(N=10,
                   L=5,
                   K=2,
                   Z=matrix(rpois(50, 100), ncol=5)),
         chains=1, iter=100)

## 
## SAMPLING FOR MODEL '64f2c322cbd474154082cded7e231b18' NOW (CHAIN 1).
## Chain 1: 
## Chain 1: Gradient evaluation took 6.5e-05 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.65 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1: 
## Chain 1: 
## Chain 1: WARNING: There aren't enough warmup iterations to fit the
## Chain 1:          three stages of adaptation as currently configured.
## Chain 1:          Reducing each adaptation stage to 15%/75%/10% of
## Chain 1:          the given number of warmup iterations:
## Chain 1:            init_buffer = 7
## Chain 1:            adapt_window = 38
## Chain 1:            term_buffer = 5
## Chain 1: 
## Chain 1: Iteration:  1 / 100 [  1%]  (Warmup)
## Chain 1: Iteration: 10 / 100 [ 10%]  (Warmup)
## Chain 1: Iteration: 20 / 100 [ 20%]  (Warmup)
## Chain 1: Iteration: 30 / 100 [ 30%]  (Warmup)
## Chain 1: Iteration: 40 / 100 [ 40%]  (Warmup)
## Chain 1: Iteration: 50 / 100 [ 50%]  (Warmup)
## Chain 1: Iteration: 51 / 100 [ 51%]  (Sampling)
## Chain 1: Iteration: 60 / 100 [ 60%]  (Sampling)
## Chain 1: Iteration: 70 / 100 [ 70%]  (Sampling)
## Chain 1: Iteration: 80 / 100 [ 80%]  (Sampling)
## Chain 1: Iteration: 90 / 100 [ 90%]  (Sampling)
## Chain 1: Iteration: 100 / 100 [100%]  (Sampling)
## Chain 1: 
## Chain 1:  Elapsed Time: 0.733825 seconds (Warm-up)
## Chain 1:                0.777809 seconds (Sampling)
## Chain 1:                1.51163 seconds (Total)
## Chain 1:

## Warning: There were 10 transitions after warmup that exceeded the maximum treedepth. Increase max_treedepth above 10. See
## http://mc-stan.org/misc/warnings.html#maximum-treedepth-exceeded

## Warning: Examine the pairs() plot to diagnose sampling problems

## Warning: The largest R-hat is 1.19, indicating chains have not mixed.
## Running the chains for more iterations may help. See
## http://mc-stan.org/misc/warnings.html#r-hat

## Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
## Running the chains for more iterations may help. See
## http://mc-stan.org/misc/warnings.html#bulk-ess

## Warning: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
## Running the chains for more iterations may help. See
## http://mc-stan.org/misc/warnings.html#tail-ess

## Inference for Stan model: 64f2c322cbd474154082cded7e231b18.
## 1 chains, each with iter=100; warmup=50; thin=1; 
## post-warmup draws per chain=50, total post-warmup draws=50.
## 
##              mean se_mean   sd     2.5%      25%      50%      75%    97.5% n_eff Rhat
## x[1,1]      11.71    0.42 2.99     7.34     9.44    11.54    13.10    18.57    51 0.98
## x[1,2]      12.97    0.64 3.75     7.92     9.85    12.83    14.45    20.34    34 1.03
## x[1,3]      12.92    0.61 4.01     8.04    10.03    12.07    14.52    22.77    43 1.00
## x[1,4]      13.18    0.64 4.16     8.10    10.28    12.37    14.47    24.46    42 0.98
## x[1,5]      12.35    0.50 3.35     7.64     9.59    12.27    13.82    20.19    45 1.01
## x[2,1]      12.21    0.53 3.50     6.75     9.46    11.54    14.40    19.29    44 1.01
## x[2,2]      12.81    0.50 3.46     8.21    10.37    12.22    13.89    21.59    48 0.99
## x[2,3]      13.18    0.51 3.54     8.43    10.43    12.89    14.75    19.74    49 1.01
## x[2,4]      13.20    0.57 3.69     8.30    10.61    12.61    15.17    20.07    42 1.00
## x[2,5]      12.33    0.48 3.46     7.85     9.79    11.71    14.15    19.39    52 1.00
## y[1]         8.34    0.28 2.12     5.08     6.86     8.07     9.81    11.94    56 1.01
## y[2]         7.97    0.24 2.01     4.70     6.55     7.80     9.13    12.40    68 0.99
## y[3]         8.01    0.25 1.98     4.96     6.72     7.78     9.45    11.89    61 1.03
## y[4]         8.03    0.28 2.08     4.88     6.68     7.67     9.29    12.15    57 1.02
## y[5]         7.83    0.26 2.06     4.63     6.42     7.61     9.17    11.60    63 1.02
## y[6]         8.81    0.29 2.24     5.13     7.47     8.47    10.32    13.47    60 1.01
## y[7]         8.18    0.26 2.12     4.90     6.63     8.15     9.66    11.74    67 1.00
## y[8]         8.55    0.27 2.10     5.16     7.27     8.06     9.99    12.99    62 0.99
## y[9]         8.42    0.26 2.13     5.16     6.80     8.14     9.92    12.59    66 1.00
## y[10]        8.37    0.25 2.19     5.07     6.89     8.28     9.96    12.44    79 1.00
## x_bar[1]    11.97    0.39 2.94     7.72     9.41    11.57    14.06    18.05    57 1.01
## x_bar[2]    13.23    0.75 3.72     8.16    10.28    12.62    14.48    21.59    25 1.02
## x_bar[3]    13.42    0.86 5.08     7.58    10.32    12.93    14.12    26.95    35 0.99
## x_bar[4]    13.30    0.55 3.71     8.53    10.67    12.94    15.09    22.59    46 1.00
## x_bar[5]    12.11    0.48 3.27     7.61     9.47    11.74    13.52    19.75    47 1.01
## w[1,1]       0.32    0.07 0.28     0.00     0.07     0.25     0.55     0.89    16 1.06
## w[1,2]       0.68    0.07 0.28     0.11     0.45     0.75     0.93     1.00    16 1.06
## w[2,1]       0.34    0.08 0.28     0.00     0.11     0.28     0.51     0.96    12 1.00
## w[2,2]       0.66    0.08 0.28     0.04     0.49     0.72     0.89     1.00    12 1.00
## w[3,1]       0.39    0.08 0.38     0.00     0.03     0.23     0.77     1.00    21 1.03
## w[3,2]       0.61    0.08 0.38     0.00     0.23     0.77     0.97     1.00    21 1.03
## w[4,1]       0.43    0.09 0.34     0.00     0.08     0.43     0.68     1.00    16 1.03
## w[4,2]       0.57    0.09 0.34     0.00     0.32     0.57     0.92     1.00    16 1.03
## w[5,1]       0.37    0.11 0.34     0.00     0.03     0.35     0.65     0.99    10 1.10
## w[5,2]       0.63    0.11 0.34     0.01     0.35     0.65     0.97     1.00    10 1.10
## w[6,1]       0.40    0.07 0.36     0.00     0.05     0.26     0.69     1.00    28 1.02
## w[6,2]       0.60    0.07 0.36     0.00     0.31     0.74     0.95     1.00    28 1.02
## w[7,1]       0.35    0.10 0.31     0.00     0.06     0.25     0.63     0.86    11 1.01
## w[7,2]       0.65    0.10 0.31     0.14     0.37     0.75     0.94     1.00    11 1.01
## w[8,1]       0.38    0.09 0.31     0.00     0.08     0.42     0.62     0.98    11 1.01
## w[8,2]       0.62    0.09 0.31     0.02     0.38     0.58     0.92     1.00    11 1.01
## w[9,1]       0.40    0.06 0.36     0.00     0.04     0.29     0.76     0.96    40 1.04
## w[9,2]       0.60    0.06 0.36     0.04     0.24     0.71     0.96     1.00    40 1.04
## w[10,1]      0.37    0.11 0.34     0.00     0.03     0.30     0.63     0.96    10 1.12
## w[10,2]      0.63    0.11 0.34     0.04     0.37     0.70     0.97     1.00    10 1.12
## eta          0.11    0.02 0.06     0.04     0.06     0.09     0.17     0.22     8 1.06
## alpha[1]     1.09    0.28 1.08     0.08     0.33     0.72     1.52     3.93    15 1.02
## alpha[2]     1.72    0.34 1.35     0.18     0.74     1.24     2.62     4.42    16 0.99
## lp__     17637.53    3.40 9.23 17617.23 17633.66 17637.51 17643.56 17653.66     7 0.99
## 
## Samples were drawn using NUTS(diag_e) at Thu Feb 24 15:16:48 2022.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at 
## convergence, Rhat=1).

Simulate data

Overall plan:

Choose mean expression levels per gene from a Gamma distribution.
For each cell type, set expression levels to the mean, but for some of the genes, multiply this by a random number in \([0, 2]\).
Choose sample-specific cell type proportions from a Dirichlet distribution.
Choose total expression per sample from a Normal.
Simulate per-sample and per-gene expression as Poisson with mean computed from the above.

Questions:

How many cell types?
How many genes?
How many samples?
How much noise in expression?
How many genes distinguish cell types, and by how much relative to expression?
How much variation in mixture proportions?
How much variation in total coverage?

Simulation: easy case

Expression parameters:

num_cell_types <- 5
num_genes <- 500
num_samples <- 100

mean_total_expression <- 50e6
sd_total_expression <- 5e6
num_diff_genes <- 180

gene_names <- paste0("gene_", apply(matrix(sample(letters, 5*num_genes, replace=TRUE), nrow=num_genes), 1, paste, collapse=''))
stopifnot(length(unique(gene_names)) == num_genes)

Mean expression profiles by cell type:

have num_diff_genes genes specific to each cell type
and each of these differing genes will have some randomly chosen expression level

# x[k,j] is mean expression of gene j in cell type k.
x <- matrix(0, nrow=num_cell_types, ncol=num_genes)
colnames(x) <- gene_names
# vector of typical expression levels across *all* cell types
x_mean <- rgamma(num_genes, shape=0.5, scale=20000)
# which genes are differentially expressed in which cell type
for (k in 1:num_cell_types) {
    diffex <- sample.int(num_genes, num_diff_genes)
    x[k,] <- x_mean
    x[k,diffex] <- x[k,diffex] * 3 * runif(length(diffex))
}

Gene expression profiles

plot of chunk r simit3

Gene expression profiles

plot of chunk r simit4

Cell type proportions

Use gammas with shapes a vector of low integers; normalize to get Dirichlet.

# w[i,k] is proportion of sample i of cell type k.
w <- matrix(0, nrow=num_samples, ncol=num_cell_types)
for (i in 1:num_samples) {
    dirich <- rgamma(num_cell_types, rep(0.3, num_cell_types))
    w[i,] <- dirich/sum(dirich)
}
stopifnot(all(abs(rowSums(w) - 1) < 1e-14))

Cell type proportions

pairs(w, xlim=c(0,1), ylim=c(0,1))

plot of chunk r simit6

Total expression per sample

# y[i] is total reads for sample i
y <- rnorm(num_samples, mean=mean_total_expression, sd=sd_total_expression)

hist(y, main='total expression by sample')

plot of chunk r simit7

Simulate expression

scale mean expression based on total expression level
matrix multiply mean scaled levels by proportions (x * w)
actual expression levels are Poisson (TODO: make overdispersed)

# Z[i,j] is read counts of sample i for gene j

# this will have mean_Z[i,j] = (sum_k w[i,k] x[k,j])
mean_Z <- (w %*% x)
# but we want total sequencing to be y[i] so...
mean_Z <- y * mean_Z / rowSums(mean_Z)
stopifnot(all(abs(y - rowSums(mean_Z)) < 1e-8))
Z <- rpois(length(mean_Z), lambda=mean_Z)
dim(Z) <- dim(mean_Z)

Visualize normalized expression levels

w_ord <- order(apply(w, 1, which.max), rowMaxs(w)) # order samples by proportion
Znorm <- Z / pmax(100, colMeans(Z)[col(Z)])
Znorm[!is.finite(Znorm)] <- 1
Znorm <- Znorm / rowMeans(Znorm)
image(t(Znorm[w_ord,x_ord]), xlab='gene', ylab='sample')

plot of chunk r simit10

Exercise

Suppose instead that \(Z_{ij}\) is number of individuals of plant species \(j\) in a quadrat at spatial location \(i\), sampled in a transect across an environmental transition (e.g., up the Cascades).

Applying the same method, what are \(x_{ik}\) and \(w_{kj}\)?

Fit the model

fit_sim1 <- optimizing(nmf_model,
                data=list(N=num_samples,
                          L=num_genes,
                          K=num_cell_types,
                          Z=Z),
                seed=125)

The results!

Here’s the correlation matrix between inferred and true cell type proportions, \(w\):

opt_w <- fit_sim1$par[grepl("^w", names(fit_sim1$par))]
dim(opt_w) <- c(num_samples, num_cell_types)
cor(w, opt_w)

##            [,1]        [,2]        [,3]       [,4]       [,5]
## [1,] -0.3650149 -0.46800709  0.01241053  0.9883737 -0.7094076
## [2,] -0.4649357 -0.68112510  0.98500504 -0.2470444  0.4452517
## [3,]  0.1109181  0.69846265 -0.23809420 -0.1459216 -0.2516304
## [4,]  0.9617917  0.09698111 -0.43534224 -0.2610371 -0.2615736
## [5,] -0.1944083  0.45636680 -0.39478167 -0.2909646  0.6459502

HOWever…

fit_sim2 <- optimizing(nmf_model,
                data=list(N=num_samples,
                          L=num_genes,
                          K=num_cell_types,
                          Z=Z),
                seed=127)

The results, again?

opt_w <- fit_sim2$par[grepl("^w", names(fit_sim2$par))]
dim(opt_w) <- c(num_samples, num_cell_types)
cor(w, opt_w)

##            [,1]       [,2]       [,3]        [,4]        [,5]
## [1,] -0.2172411 -0.4078729 -0.6425219  0.01929908  0.94837974
## [2,]  0.1665108 -0.4485661 -0.6618148  0.94222156 -0.34250032
## [3,] -0.4188313  0.1169822  0.3839463 -0.04175999 -0.02495652
## [4,] -0.5288389  0.9550957  0.4536915 -0.38017185 -0.05111174
## [5,]  0.8598106 -0.1703489  0.5119214 -0.56631736 -0.45240318

Goodness-of-fit

plot of chunk r gof

What’s going on?

optimizing                package:rstan                R Documentation

Obtain a point estimate by maximizing the joint posterior

Description:

     Obtain a point estimate by maximizing the joint posterior from the
     model defined by class ‘stanmodel’.

Usage:

     ## S4 method for signature 'stanmodel'
     optimizing(object, data = list(), 
         seed = sample.int(.Machine$integer.max, 1), init = 'random', 
         check_data = TRUE, sample_file = NULL, 
         algorithm = c("LBFGS", "BFGS", "Newton"),
         verbose = FALSE, hessian = FALSE, as_vector = TRUE, 
         draws = 0, constrained = TRUE, importance_resampling = FALSE, ...)   
     
Arguments:

  object: An object of class ‘stanmodel’.

    data: A named ‘list’ or ‘environment’ providing the data for the
          model or a character vector for all the names of objects used
          as data.  See the *Passing data to Stan* section in ‘stan’.

    seed: The seed for random number generation. The default is
          generated from 1 to the maximum integer supported by R on the
          machine. Even if multiple chains are used, only one seed is
          needed, with other chains having seeds derived from that of
          the first chain to avoid dependent samples.  When a seed is
          specified by a number, ‘as.integer’ will be applied to it.
          If ‘as.integer’ produces ‘NA’, the seed is generated
          randomly.  The seed can also be specified as a character
          string of digits, such as ‘"12345"’, which is converted to
          integer.

    init: Initial values specification. See the detailed documentation
          for the ‘init’ argument in ‘stan’ with one exception. If
          specifying inits using a list then only a single named list
          of values should be provided.  For example, to initialize a
          parameter ‘alpha’ to ‘value1’ and ‘beta’ to ‘value2’ you can
          specify ‘list(alpha = value1, beta = value2)’.

MCMC?

Too slow.

NMF?

`?nmf in r`

Well, there’s the RcppML package?

install.packages('RcppML')

It’s got great documentation: that’s very reasurring! RcppML docs for nmf

library(RcppML)
nmfZ <- nmf(Z, k=5)

## 
## iter |      tol 
## ---------------
##    1 | 8.51e-01
##    2 | 5.63e-02
##    3 | 8.28e-03
##    4 | 2.21e-03
##    5 | 9.61e-04
##    6 | 5.98e-04
##    7 | 4.40e-04
##    8 | 3.41e-04
##    9 | 2.69e-04
##   10 | 2.09e-04
##   11 | 1.61e-04
##   12 | 1.22e-04
##   13 | 9.31e-05

It works just as well - maybe a bit better?

cor(nmfZ$w, w)

##            [,1]        [,2]       [,3]       [,4]       [,5]
## [1,] -0.2330194  0.97158136 -0.2830714 -0.4389302 -0.1194442
## [2,] -0.3175350 -0.34415247  0.1480907  0.9398846 -0.3668171
## [3,] -0.1844490 -0.58272410 -0.3638424  0.2049184  0.8452016
## [4,]  0.9860517 -0.06099348 -0.1947311 -0.2987144 -0.3999482
## [5,] -0.4224679 -0.34921183  0.7949414 -0.3476999  0.4241365

Takeaways

There are many possible ways to obtain low-dimensional approximations: think about what you want.
Often we want to impose conditions to improve interpretability.
Unfortunately, these often suffer from nonidentifiability. (Why? Find \(x\) and \(y\) to solve \(24 = xy\).)
With a lot of data and a complex model, MCMC can be way too slow.
Optimization of “non-convex” problems can be hard, and require special approaches.
Careful documentation is a sign of reliable software.
Check your results, with simulation!