adadmire package
adadmire: Anomaly detection in mixed high-dimensional molecular data.
Submodules
adadmire.apgpy module
- npwrap(x)
- npwrapfunc(f, *args)
- solve(grad_f, prox_h, x_init, max_iters=2500, eps=1e-06, alpha=1.01, beta=0.5, use_restart=True, gen_plots=False, quiet=False, use_gra=False, step_size=False, fixed_step_size=False, debug=False)
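The apgpy functions are listed without docstrings. The module name and the parameters of `solve` (`grad_f`, `prox_h`, `use_restart`) suggest an accelerated proximal gradient (APG) solver for problems of the form min f(x) + h(x), where f is smooth and h is handled through its proximal operator. The sketch below illustrates only that technique: a FISTA-style iteration with a fixed step size and gradient-based adaptive restart. It is not the module's implementation. The name `apg_sketch`, the `step` argument, and the convention that `prox_h(v, t)` returns the prox of `t*h` at `v` are assumptions made for illustration.

```python
import numpy as np


def apg_sketch(grad_f, prox_h, x_init, step, max_iters=2500, eps=1e-6, use_restart=True):
    """Hypothetical FISTA-style solver for min f(x) + h(x) with a fixed step size.

    Assumes prox_h(v, t) = argmin_x h(x) + ||x - v||^2 / (2 t).
    """
    x = np.asarray(x_init, dtype=float).copy()
    y = x.copy()   # extrapolated (momentum) point
    theta = 1.0    # momentum sequence
    for _ in range(max_iters):
        x_old = x
        # Proximal gradient step taken from the extrapolated point
        x = prox_h(y - step * grad_f(y), step)
        # Stop when the gradient mapping (y - x) / step is small
        if np.linalg.norm(y - x) / step < eps * max(1.0, np.linalg.norm(x)):
            break
        # Adaptive restart: reset momentum when it points uphill
        if use_restart and np.dot(y - x, x - x_old) > 0:
            theta, y = 1.0, x.copy()
            continue
        theta_new = (1.0 + np.sqrt(1.0 + 4.0 * theta**2)) / 2.0
        y = x + ((theta - 1.0) / theta_new) * (x - x_old)
        theta = theta_new
    return x


# Usage: a small lasso problem, 0.5 * ||A x - b||^2 + lam * ||x||_1
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 10))
x_true = np.zeros(10)
x_true[:3] = [1.0, -2.0, 0.5]
b = A @ x_true
lam = 0.1


def grad_f(x):
    return A.T @ (A @ x - b)


def prox_h(v, t):
    # Soft-thresholding: prox of t * lam * ||.||_1
    return np.sign(v) * np.maximum(np.abs(v) - lam * t, 0.0)


step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1 / Lipschitz constant of grad_f
x_hat = apg_sketch(grad_f, prox_h, np.zeros(10), step)
```

The restart test drops the momentum whenever the last step and the proximal gradient direction disagree, which tends to remove the oscillations of plain accelerated methods.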
adadmire.main module
- admire(X, D, levels, lam, oIterations=10000, oTol=1e-06, t=0.05)
Detect and correct data anomalies in continuous data matrix X and discrete states matrix D.
- Parameters:
X – Continuous data matrix, features in columns, samples in rows.
D – Discrete states matrix in one-hot encoding, features in columns, samples in rows.
levels – List of levels for discrete states.
lam (numpy.ndarray) – Sequence of penalty values.
oIterations – Number of iterations for fitting MGMs. Defaults to 10000.
oTol – Tolerance for fitting MGMs. Defaults to 1e-6.
t – Probability threshold value for smoothing corrections. Defaults to 0.05.
- Returns:
- Tuple containing the following elements:
X_cor (numpy.ndarray): Continuous data matrix X corrected for anomalies.
n_cont (int): Number of detected continuous anomalies.
position_cont (numpy.ndarray): Positions of detected continuous anomalies in X.
D_cor (numpy.ndarray): Discrete states matrix D corrected for anomalies.
n_disc (int): Number of detected discrete anomalies.
position_disc (numpy.ndarray): Positions of detected discrete anomalies in D.
- Return type:
tuple
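`admire` takes the discrete data as a one-hot matrix D together with `levels`. The following sketch builds both from an integer-coded categorical matrix. It assumes that `levels` holds the number of levels of each discrete feature and that D places each feature's indicator columns side by side in feature order. The helper name `to_one_hot` is hypothetical, and the convention should be confirmed against the package's own examples before relying on it.

```python
import numpy as np


def to_one_hot(Y):
    """Hypothetical: integer-coded categories (n x q) -> one-hot D and per-feature level counts."""
    Y = np.asarray(Y)
    blocks, levels = [], []
    for j in range(Y.shape[1]):
        cats = np.unique(Y[:, j])                          # sorted observed categories
        blocks.append((Y[:, [j]] == cats[None, :]).astype(float))
        levels.append(len(cats))                           # assumed meaning of `levels`
    return np.hstack(blocks), levels


Y = np.array([[0, 2], [1, 0], [0, 1]])
D, levels = to_one_hot(Y)
# D.shape == (3, 5); levels == [2, 3]; every row has one 1 per discrete feature
```

Only categories that actually occur in Y get a column; a level that exists in principle but is never observed would need to be added explicitly.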
- calc_mean(X, D)
Calculate the group mean of the continuous data matrix X according to states D.
- Parameters:
X – Continuous data matrix, features in columns, samples in rows.
D – Discrete states matrix in one-hot encoding, features in columns, samples in rows.
- Returns:
Group mean values.
- Return type:
numpy.ndarray
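With a one-hot D, group means reduce to a single matrix product: D.T @ X sums each continuous feature over the samples in each level, and dividing by the column sums of D turns those sums into means. This is a minimal sketch assuming that `calc_mean` returns one row per level (column of D) and one column per continuous feature; `group_means` is a hypothetical name, not the package function.

```python
import numpy as np


def group_means(X, D):
    """Hypothetical: mean of each continuous feature within each one-hot level."""
    X = np.asarray(X, dtype=float)
    D = np.asarray(D, dtype=float)
    counts = D.sum(axis=0)                 # number of samples in each level
    sums = D.T @ X                         # per-level sums, shape (n_levels, p)
    with np.errstate(divide="ignore", invalid="ignore"):
        return sums / counts[:, None]      # empty levels yield nan


X = np.array([[1.0, 10.0], [2.0, 20.0], [5.0, 50.0]])
D = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
M = group_means(X, D)                      # -> [[3.0, 30.0], [2.0, 20.0]]
```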
- get_threshold_continuous(X, X_hat, dev)
Calculate the threshold for continuous anomaly detection and correct matrix X accordingly.
- Parameters:
X – Continuous data matrix, features in columns, samples in rows.
X_hat – Predicted continuous data matrix X.
dev – Variances of the continuous estimates.
- Returns:
- Tuple containing the following elements:
X_cor (numpy.ndarray): Data matrix X corrected for anomalies.
threshold (float): Threshold value for anomaly detection.
n_ano (int): Number of detected anomalies.
pos (numpy.ndarray): Indices of detected anomalies in matrix X.
- Return type:
tuple
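This documentation does not say how the threshold is derived, so the sketch below takes it as an input. Under that assumption, it shows the remaining steps implied by the return values: scale each residual X - X_hat by the estimated standard deviation sqrt(dev), flag entries whose scaled residual exceeds the threshold, and replace flagged entries with their predictions. The name `flag_and_correct` and the replace-by-prediction rule are hypothetical.

```python
import numpy as np


def flag_and_correct(X, X_hat, dev, threshold):
    """Hypothetical: flag large standardized residuals and replace them by predictions."""
    z = np.abs(X - X_hat) / np.sqrt(dev)   # residuals in units of estimated std. dev.
    mask = z > threshold
    X_cor = np.where(mask, X_hat, X)       # keep X except at flagged cells
    return X_cor, int(mask.sum()), np.argwhere(mask)


X = np.array([[0.0, 5.0], [1.0, 1.0]])
X_hat = np.array([[0.1, 1.0], [1.0, 0.9]])
dev = np.full_like(X, 0.25)
X_cor, n_ano, pos = flag_and_correct(X, X_hat, dev, threshold=3.0)
# n_ano == 1, pos == [[0, 1]], X_cor[0, 1] == 1.0
```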
- get_threshold_discrete(D, levels, D_hat)
Calculate the threshold for discrete anomaly detection and determine the number of discrete anomalies.
- Parameters:
D – Discrete states matrix in one-hot encoding, features in columns, samples in rows.
levels – List of levels for discrete states.
D_hat – Predicted discrete state matrix D.
- Returns:
- Tuple containing the following elements:
n_ano (int): Number of detected anomalies.
threshold (float): Threshold value for anomaly detection.
pos (numpy.ndarray): Indices of detected anomalies in matrix D.
- Return type:
tuple
- impute(X, D, levels, lambda_seq, oIterations=10000, oTol=1e-06)
Impute missing values in the data using MGM.
- Parameters:
X – Continuous data matrix with missing values (np.nan), features in columns, samples in rows.
D – Discrete states matrix in one-hot encoding with missing values (np.nan), features in columns, samples in rows.
levels – List of levels for discrete states.
lambda_seq (numpy.ndarray) – Sequence of penalty values.
oIterations – Number of iterations for fitting MGMs. Defaults to 10000.
oTol – Tolerance for fitting MGMs. Defaults to 1e-6.
- Returns:
- Tuple containing the following elements:
numpy.ndarray: Imputed continuous data.
numpy.ndarray: Imputed discrete data.
float: Optimal penalty value from lambda_seq used for imputation.
- Return type:
tuple
- loo_cv_cor(X, D, levels, lambda_seq, oIterations=10000, oTol=1e-06, t=0.05)
Estimate continuous matrix X and discrete states D in a leave-one-out cross-validation approach using Mixed Graphical Models.
- Parameters:
X – Continuous data matrix, features in columns, samples in rows.
D – Discrete states matrix in one-hot encoding, features in columns, samples in rows.
levels – List of levels for discrete states.
lambda_seq (numpy.ndarray) – Sequence of penalty values.
oIterations – Number of iterations for fitting MGMs. Defaults to 10000.
oTol – Tolerance for fitting MGMs. Defaults to 1e-6.
t – Probability threshold value for smoothing corrections. Defaults to 0.05.
- Returns:
- Tuple containing the following elements:
prob_cont_old (numpy.ndarray): Estimated continuous probabilities.
Var_old (numpy.ndarray): Variances of the continuous estimates.
lam_opt_old (numpy.ndarray): Optimal lambda.
X_hat_cor_xp_old (numpy.ndarray): Predicted continuous data matrix X.
D_hat_cor_xp_old (numpy.ndarray): Predicted discrete state matrix D.
- Return type:
tuple
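The leave-one-out scheme behind `loo_cv_cor` refits the model n times, each time holding out one sample and predicting it from a model trained on the remaining samples. The skeleton below shows only that loop structure. It uses a deliberately trivial stand-in model (training-set column means) in place of an MGM fit, and the names `loo_predict`, `fit_mean`, and `predict_mean` are hypothetical.

```python
import numpy as np


def loo_predict(X, fit, predict):
    """Predict each row of X from a model fitted on all other rows (leave-one-out)."""
    n = X.shape[0]
    X_hat = np.empty_like(X, dtype=float)
    for i in range(n):
        train = np.delete(X, i, axis=0)    # all samples except i
        model = fit(train)
        X_hat[i] = predict(model, X[i])
    return X_hat


# Stand-in model (not an MGM): predict the held-out row by the training column means
def fit_mean(train):
    return train.mean(axis=0)


def predict_mean(model, x):
    return model


X = np.array([[1.0, 4.0], [2.0, 6.0], [3.0, 8.0]])
X_hat = loo_predict(X, fit_mean, predict_mean)
# -> [[2.5, 7.0], [2.0, 6.0], [1.5, 5.0]]
```

Because each prediction comes from a model that never saw the held-out sample, an anomalous value cannot pull its own prediction toward itself.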
- penalty(X, D, min, max, step)
Calculate a sequence of penalty parameters for regularization in the form of scaled exponentials.
- Parameters:
X – Continuous data matrix, features in columns, samples in rows.
D – Discrete states matrix in one-hot encoding, features in columns, samples in rows.
min – The minimum exponent for the penalty parameter.
max – The maximum exponent for the penalty parameter.
step – The step size between consecutive exponents.
- Returns:
lambda_seq – Sequence of penalty values.
- Return type:
numpy.ndarray
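The penalty sequence is described as scaled exponentials of exponents running from `min` to `max` in increments of `step`. The sketch below assumes base 2, includes `max` in the grid, and takes the scale factor as an explicit argument. In adadmire the scale presumably depends on X and D, since they are arguments of `penalty`, but that formula is not given here. `exp_penalty_grid` is a hypothetical name.

```python
import numpy as np


def exp_penalty_grid(scale, min_exp, max_exp, step, base=2.0):
    """Hypothetical: scale * base**e for e = min_exp, min_exp + step, ..., max_exp."""
    exponents = np.arange(min_exp, max_exp + step / 2, step)  # include max_exp
    return scale * base ** exponents


lambda_seq = exp_penalty_grid(0.1, -2, 2, 1)   # -> [0.025, 0.05, 0.1, 0.2, 0.4]
```

An exponential grid spaces the candidate penalties evenly on a log scale, so it covers several orders of magnitude with few values.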
- place_anomalies_continuous(X, n_ano, epsilon, positive=False)
Place anomalies in continuous data matrix X.
- Parameters:
X – Continuous data matrix, features in columns, samples in rows.
n_ano – Number of anomalies to be placed.
epsilon – List of anomaly strengths. For each entry, one simulation is generated.
positive – If True, ensures anomalies are positive. Defaults to False.
- Returns:
- Tuple containing the following elements:
list: List of matrices with placed anomalies.
numpy.ndarray: Positions of placed anomalies.
- Return type:
tuple
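One possible way to generate such simulations is sketched below: pick `n_ano` distinct cells at random, then, for each strength in `epsilon`, add a shift of that many column standard deviations to those cells, with a random sign unless `positive=True`. The shift model (strength measured in standard deviations), the reading of `positive` as "shift upward only", sharing the positions across all simulations, and the `seed` argument are assumptions; `place_shifts` is a hypothetical name.

```python
import numpy as np


def place_shifts(X, n_ano, epsilon, positive=False, seed=None):
    """Hypothetical: add shifts of eps column std's to n_ano random cells, one copy per eps."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    flat = rng.choice(n * p, size=n_ano, replace=False)     # distinct cells
    rows, cols = np.unravel_index(flat, (n, p))
    signs = np.ones(n_ano) if positive else rng.choice([-1.0, 1.0], size=n_ano)
    sd = X.std(axis=0)
    simulations = []
    for eps in epsilon:
        X_ano = X.astype(float)                             # fresh copy per strength
        X_ano[rows, cols] += signs * eps * sd[cols]
        simulations.append(X_ano)
    return simulations, np.column_stack([rows, cols])


X = np.random.default_rng(1).standard_normal((20, 4))
sims, pos = place_shifts(X, n_ano=5, epsilon=[1.0, 3.0], positive=True, seed=2)
```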
- pred_continuous(B, Rho, alphap, D_pred, X_pred)
Predict continuous values given estimated parameters of MGM and discrete values.
- Parameters:
B – Continuous-continuous couplings.
Rho – Continuous-discrete couplings.
alphap – Continuous node potentials.
D_pred – Vector of discrete states.
X_pred – Vector of continuous values that should be estimated.
- Returns:
Predicted continuous values.
- Return type:
numpy.ndarray
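In the pairwise mixed graphical model of Lee and Hastie, each continuous feature s is Gaussian given all other variables, with conditional mean (alphap_s + sum_j Rho_sj d_j - sum_{t != s} B_st x_t) / B_ss, where d is the one-hot vector of discrete states. The sketch below evaluates that formula for all continuous features at once. It assumes the Lee and Hastie sign convention, B symmetric with a positive diagonal, and Rho of shape p x (total number of levels); adadmire's internal parametrization may differ, and `cond_mean_continuous` is a hypothetical name.

```python
import numpy as np


def cond_mean_continuous(B, Rho, alphap, d, x):
    """Hypothetical: conditional mean of each continuous feature given all other variables."""
    B = np.asarray(B, dtype=float)
    diag = np.diag(B)
    off = B @ x - diag * x                 # sum over t != s of B[s, t] * x[t]
    return (alphap + Rho @ d - off) / diag


B = np.array([[2.0, 0.5], [0.5, 1.0]])
Rho = np.array([[1.0, 0.0], [0.0, 2.0]])
alphap = np.array([0.5, -1.0])
d = np.array([0.0, 1.0])                   # one discrete feature with two levels, in level 1
x = np.array([1.0, 3.0])
mu = cond_mean_continuous(B, Rho, alphap, d, x)   # -> [-0.5, 0.5]
```

As a consistency check, at the joint conditional mean B^-1 (alphap + Rho d) every per-feature conditional mean reproduces itself.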
- pred_discrete(Rho, X_pred, D_pred, alphaq, Phi, levels, p)
Predict probabilities of observing states D_pred given estimated parameters of MGM and continuous values.
- Parameters:
Rho (numpy.ndarray) – Continuous-discrete couplings.
X_pred (numpy.ndarray) – Vector of continuous values.
D_pred (numpy.ndarray) – Vector of discrete states for which probabilities should be calculated.
alphaq (numpy.ndarray) – Discrete node potentials.
Phi (numpy.ndarray) – Discrete-discrete couplings.
levels (list) – Levels for discrete states.
p (int) – Number of continuous features.
- Returns:
Predicted probabilities of observing discrete states D_pred.
- Return type:
numpy.ndarray
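Under the same pairwise MGM, a discrete feature r given everything else follows a softmax over its levels, with one logit per level k: alphaq_rk + sum_s Rho_{s,rk} x_s + sum_{j != r} Phi_{rk,j}(d_j). The sketch computes these probabilities for one feature. It assumes that alphaq has one entry per level, that Rho has shape p x L and Phi shape L x L (L being the total number of levels), and that `levels` gives the number of levels per discrete feature. `cond_probs_discrete` is a hypothetical name, and adadmire may store these parameters differently.

```python
import numpy as np


def cond_probs_discrete(r, x, d, Rho, Phi, alphaq, levels):
    """Hypothetical: P(discrete feature r = each level | x, other discrete states)."""
    starts = np.concatenate([[0], np.cumsum(levels)])
    idx = slice(starts[r], starts[r + 1])  # one-hot columns of feature r
    d_other = np.array(d, dtype=float)
    d_other[idx] = 0.0                     # condition only on the other discrete features
    logits = alphaq[idx] + x @ Rho[:, idx] + Phi[idx, :] @ d_other
    logits = logits - logits.max()         # numerical stability; softmax is unchanged
    w = np.exp(logits)
    return w / w.sum()


# Usage: one continuous feature, two binary discrete features (L = 4 columns)
levels = [2, 2]
x = np.array([1.0])
Rho = np.array([[0.5, -0.5, 0.0, 0.0]])
Phi = np.zeros((4, 4))
Phi[0, 2] = Phi[2, 0] = 1.0                # level 0 of feature 0 couples to level 0 of feature 1
alphaq = np.zeros(4)
d = np.array([0.0, 1.0, 1.0, 0.0])         # feature 0 in level 1, feature 1 in level 0
probs = cond_probs_discrete(0, x, d, Rho, Phi, alphaq, levels)
```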
- rel_dev(x, org)
Calculate relative deviation of x from org.
- Parameters:
x – Value to be compared.
org – Original value.
- Returns:
Relative deviation.
- Return type:
float
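Relative deviation is commonly defined as ||x - org|| / ||org||. The sketch below uses the Euclidean norm (Frobenius norm for matrices), which reduces to |x - org| / |org| for scalars. That choice of norm is an assumption, and `rel_dev_sketch` is a hypothetical name.

```python
import numpy as np


def rel_dev_sketch(x, org):
    """Hypothetical: ||x - org|| / ||org|| (Euclidean/Frobenius norm)."""
    x = np.ravel(np.asarray(x, dtype=float))
    org = np.ravel(np.asarray(org, dtype=float))
    return float(np.linalg.norm(x - org) / np.linalg.norm(org))


r = rel_dev_sketch([3.0, 1.0], [3.0, 2.0])   # 1 / sqrt(13)
```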
- transform_back(X, X_scaled)
Transform data back to original scale.
- Parameters:
X – Original continuous data matrix.
X_scaled – Continuous data matrix X transformed to the [0, 1] range.
- Returns:
Continuous data matrix X retransformed to original scale.
- Return type:
numpy.ndarray
- transform_data(X)
Transform data to the [0, 1] range using min-max transformation.
- Parameters:
X – Continuous data matrix, features in columns, samples in rows.
- Returns:
Continuous data matrix X transformed to [0,1] range.
- Return type:
numpy.ndarray
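Min-max scaling maps each column to [0, 1] via (X - min) / (max - min); the inverse multiplies by (max - min) and adds min back. This is why `transform_back` needs the original X: the column minima and maxima come from it. A minimal sketch of the pair, assuming per-column scaling and no constant columns (which would divide by zero); the function names are hypothetical.

```python
import numpy as np


def minmax_scale(X):
    """Hypothetical: map each column of X to [0, 1]."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    return (X - lo) / (hi - lo)           # constant columns would divide by zero


def minmax_unscale(X, X_scaled):
    """Hypothetical: undo minmax_scale using the column range of the original X."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    return X_scaled * (hi - lo) + lo


X = np.array([[1.0, 10.0], [3.0, 20.0], [2.0, 40.0]])
X_scaled = minmax_scale(X)                 # columns: [0, 1, 0.5] and [0, 1/3, 1]
X_back = minmax_unscale(X, X_scaled)       # recovers X up to rounding
```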
adadmire.mgm module
- B_Rho_Phi_alphap_alphaq(B, Rho, Phi, alphap, alphaq)
- Fit_MGM(X, D, levels, lambda_seq, iterations, eps=1e-06)
- Inv_B_Rho_Phi_alphap_alphaq(x, p, q)
- grad_f_temp(x, X, D, levels, p, q)
- grad_neglogli(B, Rho, Phi, alphap, alphaq, X, D, levels)
- grad_neglogli_plain(B_Rho_Phi_alphap_alphaq, X, D, levels, p, q)
- make_penalty_factors(X, D, levels)
- make_starting_parameters(X, D, levels)
- neglogli(B, Rho, Phi, alphap, alphaq, X, D, levels)
- neglogli_plain(B_Rho_Phi_alphap_alphaq, X, D, levels, p, q)
- prox_enet(x, l_l1, l_l2, t, pen, p0, tol0)
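`prox_enet` is undocumented here; its name and its `l_l1`, `l_l2`, `t`, and `pen` arguments suggest that it computes the proximal operator of an elastic-net penalty. For the penalty t * sum_i w_i (l1 |z_i| + (l2 / 2) z_i^2) with per-entry weights w_i, the prox has the closed form soft_threshold(x, t * l1 * w) / (1 + t * l2 * w). The sketch below implements that formula. The scaling of the l2 term, the reading of `pen` as per-entry weights, and the omission of `p0` and `tol0` (whose meaning is not documented) are assumptions; `prox_elastic_net` is a hypothetical name.

```python
import numpy as np


def prox_elastic_net(x, l1, l2, t, pen=None):
    """Hypothetical: prox of t * sum_i w_i * (l1 * |z_i| + (l2 / 2) * z_i**2) at x."""
    x = np.asarray(x, dtype=float)
    w = np.ones_like(x) if pen is None else np.asarray(pen, dtype=float)
    shrunk = np.sign(x) * np.maximum(np.abs(x) - t * l1 * w, 0.0)  # soft-thresholding
    return shrunk / (1.0 + t * l2 * w)                              # ridge shrinkage


z = prox_elastic_net([3.0, -0.5, -2.0], l1=1.0, l2=1.0, t=1.0)   # -> [1.0, 0.0, -0.5]
```

Setting a weight to zero leaves that entry unpenalized, which is one way to exempt parameters such as node potentials from regularization.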