Peter Ralph
25 February 2019 – Advanced Biological Statistics
Let’s build a conceptual model for descriptive analysis of “mixture” expression data.
Data: expression data from tissue samples that consist of various mixtures of different cell types.
Goal: identify shared coexpression patterns corresponding to cell type.
Similar situations: identify different developmental stages from whole-organism expression; common community structures from metagenomic data.
Each cell type has a typical set of mean expression levels.
Each sample is composed of a mixture of cell types, defined by the proportions that come from each type.
Mean expression by cell type.
Cell type proportions by sample.
\(x_{kj}\) : Mean expression of gene \(j\) in cell type \(k\).
\(w_{ik}\) : Proportion of sample \(i\) of cell type \(k\).
\(Z_{ij}\) : expression level in sample \(i\) of gene \(j\).
\[\begin{aligned} Z_{ij} \approx \sum_{k=1}^K w_{ik} x_{kj} . \end{aligned}\]
We are decomposing \(Z\) into the product of two lower-dimensional, nonnegative factors:
\[\begin{aligned} Z_{ij} &\approx \sum_k w_{ik} x_{kj} \\ w_{ik} &\ge 0 \\ x_{kj} &\ge 0 . \end{aligned}\]
PCA finds \(w\) and \(x\) to minimize \[\begin{aligned} \sum_{ij} \left( Z_{ij} - \sum_k w_{ik} x_{kj} \right)^2 . \end{aligned}\]
In other words, it is the maximum-likelihood solution to \[\begin{aligned} Z_{ij} &\sim \Normal(\sum_k w_{ik} x_{kj}, \sigma^2) . \end{aligned}\] (The eigenvectors are the columns of \(x\), and the eigenvalues are related to the sizes of \(w\) and \(x\).)
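To make the connection concrete, here is a small sketch (not from the original slides) of how the least-squares rank-\(K\) factorization can be computed directly with the SVD; all object names are illustrative:

# rank-K least-squares approximation of a fake data matrix Z via the SVD
K <- 2
Z <- matrix(rnorm(100 * 20), nrow=100)    # 100 samples x 20 variables
s <- svd(Z)
w_hat <- s$u[, 1:K] %*% diag(s$d[1:K])    # plays the role of w (scores)
x_hat <- t(s$v[, 1:K])                    # plays the role of x (loadings)
sum((Z - w_hat %*% x_hat)^2)              # the quantity PCA minimizes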
stan_pca <- stan_model(model_code="
data {
  int N; // samples
  int L; // variables
  int K; // factors
  real Z[L,N];
}
parameters {
  matrix[L,K] x;
  matrix[K,N] w;
  real<lower=0> sigma;
}
model {
  for (j in 1:L) {
    Z[j] ~ normal(x[j] * w, sigma);
  }
}
")
(Note: this needs some priors to work well.)
A random set of \(k\) proportions \(0 \le P_i \le 1\) has a \(\Dirichlet(\alpha_1, \ldots, \alpha_k)\) distribution if it has probability density \[\begin{aligned} \frac{1}{B(\alpha)} \prod_{i=1}^k p_i^{\alpha_i - 1} \end{aligned}\] over the set of possible values \[\begin{aligned} P_1 + \cdots + P_k = 1 . \end{aligned}\]
This is useful as a prior on proportions.
The mean is \[ \left( \frac{\alpha_1}{\sum_j \alpha_j}, \frac{\alpha_2}{\sum_j \alpha_j}, \cdots, \frac{\alpha_k}{\sum_j \alpha_j} \right) . \]
This generalizes the Beta: if \(X \sim \Beta(a, b)\) then \((X, 1-X) \sim \Dirichlet(a, b)\).
Marginal distributions are Beta distributed: \(P_i \sim \Beta(\alpha_i, \sum_{j=1}^k \alpha_j - \alpha_i)\).
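These facts are easy to check by simulation, using the standard fact that normalizing independent Gamma(\(\alpha_i\)) draws gives a \(\Dirichlet(\alpha)\); this sketch (names are illustrative) checks the mean and the first marginal:

# simulate Dirichlet(1, 2, 3) by normalizing independent Gamma draws
alpha <- c(1, 2, 3)
nreps <- 1e4
g <- matrix(rgamma(nreps * length(alpha), shape=rep(alpha, each=nreps)), nrow=nreps)
p <- g / rowSums(g)
colMeans(p)    # should be close to alpha / sum(alpha)
# first marginal should be Beta(alpha[1], sum(alpha) - alpha[1]); compare variances:
c(var(p[,1]), alpha[1] * (sum(alpha) - alpha[1]) / (sum(alpha)^2 * (sum(alpha) + 1)))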
“The \(k\)-simplex” is the set of proportions, i.e., nonnegative numbers \(p\) satisfying \[\begin{aligned} p_1 + \cdots + p_k = 1 . \end{aligned}\]
parameters {
  simplex[K] p;
}
model {
  p ~ dirichlet(alpha);
}
Each cell type has a typical set of mean expression levels.
Each sample is composed of a mixture of cell types, defined by the proportions that come from each type.
Mean expression levels differ between cell types for only some of the genes.
Some samples are noisier than others.
Mean expression by cell type.
Cell type proportions by sample.
\(x_{kj}\) : Mean expression of gene \(j\) in cell type \(k\).
\(w_{ik}\) : Proportion of sample \(i\) of cell type \(k\).
\(Z_{ij}\) : expression in sample \(i\) of gene \(j\).
\[\begin{aligned} Z_{ij} \approx \sum_{k=1}^K w_{ik} x_{kj} . \end{aligned}\]
\(y_j\), \(\eta_j\) : mean and SD of expression of gene \(j\) across all cell types; shrink \(x_{kj}\) towards \(y_j\).
data {
  int N; // # samples
  int L; // # genes
  int K; // # cell types
  int Z[N,L]; // expression counts
}
parameters {
  matrix<lower=0>[L,K] x;    // mean expression of each gene in each cell type
  vector[L] y;               // per-gene mean expression across cell types
  simplex[K] w[N];           // cell type proportions for each sample
  vector<lower=0>[L] eta;    // per-gene scale (SD of expression across cell types)
  vector<lower=0>[K] alpha;  // Dirichlet parameters for the proportions
  real<lower=0> d_alpha;     // overall concentration of the proportions
}
model {
  for (i in 1:N) {
    Z[i] ~ poisson(eta .* (x * w[i]));
    w[i] ~ dirichlet(d_alpha * alpha);
  }
  // shrink each cell type's expression towards the per-gene mean
  for (j in 1:K) {
    x[,j] ~ normal(y ./ eta, 1);
  }
  y ~ normal(0, 20);
  alpha ~ normal(0, 1);
  d_alpha ~ exponential(0.2);
  eta ~ cauchy(0, 10);
}
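The compiled model object used below isn't shown in this excerpt; assuming the Stan program above is stored in a string called nmf_block (that name is an assumption), it would be compiled along these lines:

# assumes the Stan program above is in the string nmf_block
nmf_model <- stan_model(model_code=nmf_block)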
\(x_{kj}\) : Mean expression of gene \(j\) in cell type \(k\).
\(w_{ik}\) : Proportion of sample \(i\) of cell type \(k\).
\[\begin{aligned} Z_{ij} \approx \sum_k w_{ik} x_{kj} . \end{aligned}\]
sampling(nmf_model,
         data=list(N=10,
                   L=5,
                   K=2,
                   Z=matrix(rpois(50, 100), ncol=5)),
         chains=1, iter=100)
##
## SAMPLING FOR MODEL '027c9a0b92082dc735bfb6350861f7ee' NOW (CHAIN 1).
## Chain 1:
## Chain 1: Gradient evaluation took 5.7e-05 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.57 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1:
## Chain 1:
## Chain 1: WARNING: There aren't enough warmup iterations to fit the
## Chain 1: three stages of adaptation as currently configured.
## Chain 1: Reducing each adaptation stage to 15%/75%/10% of
## Chain 1: the given number of warmup iterations:
## Chain 1: init_buffer = 7
## Chain 1: adapt_window = 38
## Chain 1: term_buffer = 5
## Chain 1:
## Chain 1: Iteration: 1 / 100 [ 1%] (Warmup)
## Chain 1: Iteration: 10 / 100 [ 10%] (Warmup)
## Chain 1: Iteration: 20 / 100 [ 20%] (Warmup)
## Chain 1: Iteration: 30 / 100 [ 30%] (Warmup)
## Chain 1: Iteration: 40 / 100 [ 40%] (Warmup)
## Chain 1: Iteration: 50 / 100 [ 50%] (Warmup)
## Chain 1: Iteration: 51 / 100 [ 51%] (Sampling)
## Chain 1: Iteration: 60 / 100 [ 60%] (Sampling)
## Chain 1: Iteration: 70 / 100 [ 70%] (Sampling)
## Chain 1: Iteration: 80 / 100 [ 80%] (Sampling)
## Chain 1: Iteration: 90 / 100 [ 90%] (Sampling)
## Chain 1: Iteration: 100 / 100 [100%] (Sampling)
## Chain 1:
## Chain 1: Elapsed Time: 0.710808 seconds (Warm-up)
## Chain 1: 1.04427 seconds (Sampling)
## Chain 1: 1.75508 seconds (Total)
## Chain 1:
## Warning: There were 3 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See
## http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
## Warning: There were 31 transitions after warmup that exceeded the maximum treedepth. Increase max_treedepth above 10. See
## http://mc-stan.org/misc/warnings.html#maximum-treedepth-exceeded
## Warning: There were 1 chains where the estimated Bayesian Fraction of Missing Information was low. See
## http://mc-stan.org/misc/warnings.html#bfmi-low
## Warning: Examine the pairs() plot to diagnose sampling problems
## Inference for Stan model: 027c9a0b92082dc735bfb6350861f7ee.
## 1 chains, each with iter=100; warmup=50; thin=1;
## post-warmup draws per chain=50, total post-warmup draws=50.
##
## mean se_mean sd 2.5% 25% 50% 75%
## x[1,1] 1.19 0.12 0.84 0.18 0.62 0.95 1.52
## x[1,2] 0.93 0.14 0.79 0.11 0.45 0.65 1.19
## x[2,1] 1.39 0.47 1.10 0.11 0.54 1.07 2.11
## x[2,2] 1.05 0.12 0.77 0.04 0.47 0.90 1.28
## x[3,1] 1.49 0.29 0.98 0.13 0.58 1.36 2.10
## x[3,2] 1.16 0.15 0.81 0.13 0.53 1.12 1.52
## x[4,1] 1.33 0.10 0.72 0.16 0.91 1.33 1.62
## x[4,2] 1.05 0.25 0.89 0.00 0.31 0.86 1.59
## x[5,1] 1.94 0.33 1.23 0.33 1.07 1.54 2.60
## x[5,2] 1.62 0.17 1.03 0.08 0.93 1.37 2.54
## y[1] 9.77 6.92 16.48 -19.69 -4.64 12.59 24.19
## y[2] 9.75 13.07 25.19 -28.61 -6.47 5.02 34.16
## y[3] 24.10 9.26 21.39 -8.88 7.76 18.70 47.09
## y[4] 5.94 6.41 15.09 -21.34 -3.24 5.40 19.94
## y[5] 40.01 3.96 14.87 14.17 27.96 41.65 51.49
## w[1,1] 0.77 0.16 0.32 0.08 0.65 0.97 0.99
## w[1,2] 0.23 0.16 0.32 0.00 0.01 0.03 0.35
## w[2,1] 0.76 0.16 0.31 0.08 0.65 0.93 1.00
## w[2,2] 0.24 0.16 0.31 0.00 0.00 0.07 0.35
## w[3,1] 0.76 0.17 0.30 0.13 0.64 0.92 1.00
## w[3,2] 0.24 0.17 0.30 0.00 0.00 0.08 0.36
## w[4,1] 0.77 0.16 0.31 0.08 0.68 0.94 1.00
## w[4,2] 0.23 0.16 0.31 0.00 0.00 0.06 0.32
## w[5,1] 0.76 0.17 0.32 0.06 0.65 0.93 1.00
## w[5,2] 0.24 0.17 0.32 0.00 0.00 0.07 0.35
## w[6,1] 0.77 0.16 0.31 0.16 0.68 0.96 1.00
## w[6,2] 0.23 0.16 0.31 0.00 0.00 0.04 0.32
## w[7,1] 0.77 0.16 0.31 0.12 0.61 0.95 1.00
## w[7,2] 0.23 0.16 0.31 0.00 0.00 0.05 0.39
## w[8,1] 0.78 0.16 0.31 0.12 0.61 0.97 1.00
## w[8,2] 0.22 0.16 0.31 0.00 0.00 0.03 0.39
## w[9,1] 0.77 0.16 0.30 0.10 0.60 0.96 1.00
## w[9,2] 0.23 0.16 0.30 0.00 0.00 0.04 0.40
## w[10,1] 0.75 0.17 0.32 0.08 0.61 0.94 0.99
## w[10,2] 0.25 0.17 0.32 0.00 0.01 0.06 0.39
## eta[1] 131.11 17.50 104.77 31.83 68.49 104.77 169.52
## eta[2] 113.86 17.72 124.60 26.25 46.53 85.96 142.42
## eta[3] 105.16 23.88 91.35 32.74 47.74 80.49 128.30
## eta[4] 96.10 12.16 93.52 29.53 59.67 72.76 97.18
## eta[5] 70.69 10.19 44.27 20.59 36.89 58.75 90.14
## alpha[1] 1.09 0.10 0.73 0.20 0.54 0.88 1.61
## alpha[2] 0.53 0.35 0.73 0.02 0.05 0.18 0.72
## d_alpha 8.14 2.70 5.71 0.69 3.69 6.42 12.55
## lp__ 18013.92 8.93 14.57 17994.67 17999.31 18011.47 18029.52
## 97.5% n_eff Rhat
## x[1,1] 3.15 46 0.99
## x[1,2] 2.81 32 1.04
## x[2,1] 3.59 5 1.49
## x[2,2] 2.41 39 1.02
## x[3,1] 3.14 11 1.28
## x[3,2] 2.87 28 0.99
## x[4,1] 3.22 53 0.98
## x[4,2] 2.78 12 1.17
## x[5,1] 4.95 14 0.98
## x[5,2] 3.51 36 1.04
## y[1] 30.90 6 1.06
## y[2] 45.45 4 1.89
## y[3] 60.20 5 1.31
## y[4] 30.95 6 1.00
## y[5] 66.78 14 0.99
## w[1,1] 1.00 4 1.60
## w[1,2] 0.92 4 1.60
## w[2,1] 1.00 4 1.71
## w[2,2] 0.92 4 1.71
## w[3,1] 1.00 3 1.81
## w[3,2] 0.87 3 1.81
## w[4,1] 1.00 4 1.57
## w[4,2] 0.92 4 1.57
## w[5,1] 1.00 3 1.73
## w[5,2] 0.94 3 1.73
## w[6,1] 1.00 4 1.65
## w[6,2] 0.84 4 1.65
## w[7,1] 1.00 3 1.72
## w[7,2] 0.88 3 1.72
## w[8,1] 1.00 4 1.66
## w[8,2] 0.88 4 1.66
## w[9,1] 1.00 4 1.65
## w[9,2] 0.90 4 1.65
## w[10,1] 1.00 4 1.68
## w[10,2] 0.92 4 1.68
## eta[1] 344.99 36 1.00
## eta[2] 247.98 49 1.00
## eta[3] 334.45 15 1.15
## eta[4] 332.98 59 1.02
## eta[5] 182.97 19 1.01
## alpha[1] 2.66 52 0.98
## alpha[2] 2.40 4 1.50
## d_alpha 20.43 4 1.72
## lp__ 18034.18 3 2.98
##
## Samples were drawn using NUTS(diag_e) at Tue Mar 5 22:41:23 2019.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).
How many cell types?
How many genes?
How many samples?
How much noise in expression?
How many genes distinguish cell types, and by how much relative to expression?
How many “noisy” genes? How many “noisy” samples?
How much variation in mixture proportions?
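The values used below (nsamples, ngenes, num_noisy_samples, indiv_sd, and the number of cell types K) aren't shown in this excerpt; one plausible setup, purely for illustration, is:

# illustrative choices only: the values used in the original run are not shown here
nsamples <- 100          # number of samples
ngenes <- 1000           # number of genes
K <- 3                   # number of cell types
num_noisy_samples <- 5   # a few samples get extra noise
indiv_sd <- 0.25         # per-observation lognormal overdispersion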
Sample mixture proportions (the rows of \(w\)) from the Dirichlet distribution with parameters \(\alpha = (1/3, 1/3, 1/3)\).
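The code for this step isn't shown; one way to do it (a sketch, using the gtools package) is:

# sketch: each row of w is an independent Dirichlet(1/3, 1/3, 1/3) draw
dirichlet_alpha <- rep(1/3, K)
w <- gtools::rdirichlet(nsamples, dirichlet_alpha)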
Sample cell type expression levels:
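Again, the original code isn't shown; a simple stand-in, consistent with the assumptions above (most genes shared across types, a few that differ), might look like:

# sketch only: shared baseline expression, with some genes differing by cell type
base_expr <- rgamma(ngenes, shape=2, rate=1/10)   # typical per-gene expression levels
x <- matrix(rep(base_expr, each=K), nrow=K)       # K cell types x ngenes genes
diff_genes <- sample.int(ngenes, 100)             # genes that distinguish cell types
x[, diff_genes] <- x[, diff_genes] * exp(rnorm(K * 100, sd=1))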
Simulate expression:
mean_z <- w %*% x    # expected expression: samples x genes
# give a few "noisy" samples extra multiplicative noise
noisy_samples <- sample.int(nsamples, num_noisy_samples)
mean_z[noisy_samples,] <- (mean_z[noisy_samples,]
    * exp(rnorm(num_noisy_samples * ngenes, mean=0, sd=0.1)))
# per-observation lognormal overdispersion, then Poisson counts
noisy_mean_z <- mean_z * exp(rnorm(nsamples * ngenes, mean=0, sd=indiv_sd))
Z <- matrix(rpois(nsamples * ngenes, lambda=noisy_mean_z), ncol=ngenes)
Run Stan:
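The fitting call isn't shown; it would look much like the earlier call to sampling, e.g. (chain and iteration counts are illustrative):

# sketch: fit the NMF model to the simulated counts
nmf_fit <- sampling(nmf_model,
                    data=list(N=nsamples, L=ngenes, K=K, Z=Z),
                    chains=3, iter=1000)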
Look at output:
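One way to get point estimates of \(w\) and \(x\) (assuming the fit is stored in nmf_fit as sketched above) is to take posterior means:

# sketch: posterior means from the fitted model
post <- extract(nmf_fit)
w_est <- apply(post$w, c(2, 3), mean)   # nsamples x K
x_est <- apply(post$x, c(2, 3), mean)   # ngenes x K (genes in rows, as declared in the Stan model)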
Here is the correlation between estimated \(w\) and observed \(w\):
## [,1] [,2] [,3]
## [1,] -0.4479292 0.9974603 -0.5419578
## [2,] 0.9991217 -0.4967387 -0.5112457
## [3,] -0.5412021 -0.4560523 0.9986858
This is very good! We are estimating relative mixtures very well. (But, note they are not in the same order.)
Here is the correlation between estimated \(x\) and observed \(x\):
## [,1] [,2] [,3]
## [1,] 0.6392301 0.9986823 0.4427321
## [2,] 0.9948616 0.5748984 0.5025609
## [3,] 0.5244367 0.4837158 0.9999782
This is also very good!
Sample mixture proportions (the rows of \(w\)) from the Dirichlet distribution with parameters \(\alpha = (1, 2, \ldots, 8)/36 \approx (0.028, 0.056, 0.083, 0.111, 0.139, 0.167, 0.194, 0.222)\).
Sample cell type expression levels:
Simulate expression:
mean_z <- w %*% x    # expected expression: samples x genes
# give a few "noisy" samples extra multiplicative noise
noisy_samples <- sample.int(nsamples, num_noisy_samples)
mean_z[noisy_samples,] <- (mean_z[noisy_samples,]
    * exp(rnorm(num_noisy_samples * ngenes, mean=0, sd=0.1)))
# per-observation lognormal overdispersion, then Poisson counts
noisy_mean_z <- mean_z * exp(rnorm(nsamples * ngenes, mean=0, sd=indiv_sd))
Z <- matrix(rpois(nsamples * ngenes, lambda=noisy_mean_z), ncol=ngenes)
Is there signal? It looks like there is.
Run Stan:
Look at output:
Here is the correlation between estimated \(w\) and observed \(w\):
## [,1] [,2] [,3] [,4] [,5]
## [1,] -0.06511751 -0.08548045 -0.12145422 0.99779515 -0.1207468
## [2,] -0.08882160 -0.10209420 -0.17533424 -0.15455090 -0.2230952
## [3,] -0.01964063 -0.06896749 0.99732363 -0.12371316 -0.1475214
## [4,] 0.99886737 -0.03238973 -0.02588937 -0.06523943 -0.0677335
## [5,] -0.06712511 -0.10251975 -0.14701999 -0.11864029 0.9990249
## [6,] -0.07116101 -0.11989702 -0.13666643 -0.15799797 -0.1881662
## [7,] -0.03917093 0.99906854 -0.06805167 -0.08390533 -0.1050867
## [8,] -0.10300235 -0.11957976 -0.15060561 -0.17508342 -0.2109410
## [,6] [,7] [,8]
## [1,] -0.16227159 -0.15239993 -0.17445705
## [2,] -0.22241746 0.99900054 -0.24794625
## [3,] -0.13752298 -0.17502168 -0.14926717
## [4,] -0.07129504 -0.09014155 -0.09375712
## [5,] -0.18860476 -0.22231870 -0.21174459
## [6,] 0.99877567 -0.22528295 -0.23389558
## [7,] -0.11215778 -0.10425900 -0.11965410
## [8,] -0.23685561 -0.24854604 0.99801810
This is very good! We are estimating relative mixtures very well.
Here is the correlation between estimated \(x\) and observed \(x\):
## [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## [1,] 0.6794778 0.6597489 0.6875596 0.9654864 0.6601344 0.6379879 0.6936902
## [2,] 0.6294459 0.6552449 0.6702929 0.6746403 0.6604643 0.6561367 0.9637838
## [3,] 0.6842969 0.6876411 0.9850094 0.7000887 0.6933407 0.7085986 0.6933872
## [4,] 0.9471063 0.7060617 0.6755484 0.6769583 0.7027036 0.6669958 0.6787414
## [5,] 0.6461627 0.6386977 0.6547104 0.6382085 0.9589411 0.6283625 0.6382968
## [6,] 0.6026265 0.5693803 0.6429443 0.6218183 0.6138628 0.9346406 0.6169345
## [7,] 0.6502485 0.9406994 0.6676147 0.6750623 0.6652670 0.6344473 0.6785245
## [8,] 0.6419115 0.6039992 0.6440546 0.6293570 0.6225426 0.6058642 0.6328787
## [,8]
## [1,] 0.6886014
## [2,] 0.6930210
## [3,] 0.7292227
## [4,] 0.7381844
## [5,] 0.6623493
## [6,] 0.6557481
## [7,] 0.6609027
## [8,] 0.9395845
This is also amazingly good! We can reconstruct the gene expression profiles almost perfectly, despite the substantial overdispersion in expression that we did not include in the Stan model.
t-SNE is a recently published method for dimension reduction (visualization of structure in high-dimensional data).
The generic idea is to find a low dimensional representation (i.e., a picture) in which proximity of points reflects similarity of the corresponding (high dimensional) observations.
Suppose we have \(n\) data points, \(x_1, \ldots, x_n\) and each one has \(p\) variables: \(x_1 = (x_{11}, \ldots, x_{1p})\).
For each, we want to find a corresponding low-dimensional point \(y\): \[\begin{aligned} x_i \mapsto y_i \end{aligned}\] so that similarity of \(x_i\) and \(x_j\) is (somehow) reflected by proximity of \(y_i\) and \(y_j\).
We need to define both of these things: how to measure similarity of the \(x\)'s, and how proximity of the \(y\)'s should reflect it.
To measure similarity, we just use (Euclidean) distance: \[\begin{aligned} d(x_i, x_j) = \sqrt{\sum_{\ell=1}^p (x_{i\ell} - x_{j\ell})^2} , \end{aligned}\] after first normalizing variables to vary on comparable scales.
A natural choice is to require distances in the new space (\(d(y_i, y_j)\)) to be as close as possible to distances in the original space (\(d(x_i, x_j)\)).
That’s a pairwise condition. t-SNE instead tries to measure how faithfully relative distances are represented in each point’s neighborhood.
For each data point \(x_i\), define \[\begin{aligned} p_{ij} = \frac{ e^{-d(x_i, x_j)^2 / (2\sigma^2)} }{ \sum_{\ell \neq i} e^{-d(x_i, x_\ell)^2 / (2 \sigma^2)} } , \end{aligned}\] and \(p_{ii} = 0\).
This is the probability that point \(i\) would pick point \(j\) as a neighbor if these are chosen according to the Gaussian density centered at \(x_i\).
Similarly, in the output space, define \[\begin{aligned} q_{ij} = \frac{ (1 + d(y_i, y_j)^2)^{-1} }{ \sum_{\ell \neq i} (1 + d(y_i, y_\ell)^2)^{-1} } \end{aligned}\] and \(q_{ii} = 0\).
This is the probability that point \(i\) would pick point \(j\) as a neighbor if these are chosen according to the Cauchy density centered at \(y_i\).
We want neighborhoods to look similar, i.e., choose \(y\) so that \(q\) looks like \(p\).
To do this, we minimize \[\begin{aligned} \text{KL}(p \given q) &= \sum_{i} \sum_{j} p_{ij} \log\left(\frac{p_{ij}}{q_{ij}}\right) . \end{aligned}\]
What’s KL()?
\[\begin{aligned} \text{KL}(p \given q) &= \sum_{i} \sum_{j} p_{ij} \log\left(\frac{p_{ij}}{q_{ij}}\right) . \end{aligned}\]
This is the average log-likelihood ratio between \(p\) and \(q\) for a single observation drawn from \(p\).
It is a measure of how “surprised” you would be by a sequence of samples from \(p\) that you think should be coming from \(q\).
Facts:
\(\text{KL}(p \given q) \ge 0\)
\(\text{KL}(p \given q) = 0\) only if \(p_i = q_i\) for all \(i\).
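A tiny numerical check of these facts:

# KL divergence between two small discrete distributions
kl <- function (p, q) sum(p * log(p / q))
p <- c(0.5, 0.3, 0.2)
q <- c(0.2, 0.5, 0.3)
kl(p, q)   # positive
kl(p, p)   # exactly zero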
Let’s first make some data. This will be some points distributed around an ellipse in \(n\) dimensions.
n <- 20
npts <- 1e3
xy <- matrix(rnorm(n*npts), ncol=n)   # Gaussian noise in n dimensions
theta <- runif(npts) * 2 * pi         # position around the ellipse
ab <- matrix(rnorm(2*n), nrow=2)      # the rows span the plane of the ellipse
# make the second row orthogonal to the first, then normalize both rows
ab[2,] <- ab[2,] - ab[1,] * sum(ab[1,] * ab[2,]) / sum(ab[1,]^2)
ab <- sweep(ab, 1, sqrt(rowSums(ab^2)), "/")
for (k in 1:npts) {
    dxy <- 4 * c(cos(theta[k]), sin(theta[k])) %*% ab   # a point on the hidden ellipse
    xy[k,] <- xy[k,] + dxy
}
Here’s what the data look like:
But there is hidden, two-dimensional structure:
Now let’s build the distance matrix.
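The construction isn't shown in this excerpt; a natural choice, and what the code below assumes is stored in dmat, is the matrix of pairwise Euclidean distances:

# sketch: pairwise Euclidean distances between the rows of xy
dmat <- as.matrix(dist(xy))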
tsne_block <- '
data {
  int N; // number of points
  int n; // input dimension
  int k; // output dimension
  matrix[N,N] dsq; // distance matrix, squared
}
parameters {
  real<lower=0> sigma_sq; // in kernel for p
  matrix[N-2,k] y1;
  real<lower=0> y21;
}
transformed parameters {
  // fix point 1 at the origin and point 2 on the first axis
  // to remove translation/rotation/reflection invariance
  matrix[N,k] y;
  y[1,] = rep_row_vector(0.0, k);
  y[3:N,] = y1;
  y[2,] = rep_row_vector(0.0, k);
  y[2,1] = y21;
}
model {
  matrix[N,N] q;
  real dt;
  matrix[N,N] p;
  q[N,N] = 0.0;
  for (i in 1:(N-1)) {
    q[i,i] = 0.0;
    for (j in (i+1):N) {
      q[i,j] = 1 / (1 + squared_distance(y[i], y[j]));
      q[j,i] = q[i,j];
    }
  }
  for (i in 1:N) {
    q[i] = q[i] / sum(q[i]);
  }
  // create p matrix
  p = exp(-dsq/(2*sigma_sq));
  for (i in 1:N) {
    p[i,i] = 0.0;
    p[i] = p[i] / sum(p[i]);
  }
  // compute the target: minus the KL divergence of q from p
  for (i in 1:(N-1)) {
    for (j in (i+1):N) {
      target += (-1) * (p[i,j] .* log(p[i,j] ./ q[i,j]));
      target += (-1) * (p[j,i] .* log(p[j,i] ./ q[j,i]));
    }
  }
  sigma_sq ~ normal(0, 10);
}'
tk <- 2
tsne_model <- stan_model(model_code=tsne_block)
We'll find a point estimate with optimizing (maximizing the posterior directly), rather than sampling:
runtime <- system.time(tsne_optim <- optimizing(tsne_model,
                           data=list(N=nrow(xy),
                                     n=ncol(xy),
                                     k=tk,
                                     dsq=(dmat/max(dmat))^2)))
runtime
## user system elapsed
## 117.588 0.180 117.799
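To see the result, the optimized low-dimensional coordinates can be pulled out of the returned parameter vector; a sketch (the plotting code isn't shown in this excerpt):

# sketch: recover the N x k matrix of t-SNE coordinates from the optimizer output
# (rstan names the entries y[1,1], y[2,1], ..., in column-major order)
y_names <- grep("^y\\[", names(tsne_optim$par), value=TRUE)
y_est <- matrix(tsne_optim$par[y_names], ncol=tk)
plot(y_est, asp=1, pch=20, xlab="t-SNE 1", ylab="t-SNE 2")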
Here’s what the data look like:
Now let’s build the distance matrix.
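As before, the construction isn't shown; assuming the second dataset is stored as a matrix rw with one point per row (the name rw comes from the code below), the distance matrix would be:

# sketch: pairwise Euclidean distances between the rows of rw
rw_dmat <- as.matrix(dist(rw))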
rw_runtime <- system.time(
    rw_tsne_optim <- optimizing(tsne_model,
                                data=list(N=nrow(rw),
                                          n=ncol(rw),
                                          k=tk,
                                          dsq=(rw_dmat/max(rw_dmat))^2),
                                tol_rel_grad=1e5))
rw_runtime
## user system elapsed
## 1.038 0.000 1.038