adadmire package

adadmire: Anomaly detection in mixed high-dimensional molecular data.

Submodules

adadmire.apgpy module

class IWrapper

Bases: object

copy()
property data
dot(other)
norm()
class NumpyWrapper(nparray)

Bases: IWrapper

copy()
property data
dot(other)
norm()
npwrap(x)
npwrapfunc(f, *args)
solve(grad_f, prox_h, x_init, max_iters=2500, eps=1e-06, alpha=1.01, beta=0.5, use_restart=True, gen_plots=False, quiet=False, use_gra=False, step_size=False, fixed_step_size=False, debug=False)

adadmire.main module

admire(X, D, levels, lam, oIterations=10000, oTol=1e-06, t=0.05)

Detect and correct data anomalies in continuous data matrix X and discrete states matrix D.

Parameters:
  • X (-) – Continuous data matrix, features in columns, samples in rows.

  • D (-) – Discrete states matrix in one-hot-encoding, features in columns, samples in rows.

  • levels (-) – List of levels for discrete states.

  • lam (numpy.ndarray) – Sequence of penalty values.

  • oIterations (-) – Number of iterations for fitting MGMs. Defaults to 10000.

  • oTol (-) – Tolerance for fitting MGMs. Defaults to 1e-6.

  • t (-) – Probability threshold value for smoothing corrections. Defaults to 0.05.

Returns:

Tuple containing the following elements:
  • X_cor (numpy.ndarray): Continuous data matrix X corrected for anomalies.

  • n_cont (int): Number of detected continuous anomalies.

  • position_cont (numpy.ndarray): Positions of detected continuous anomalies in X.

  • D_cor (numpy.ndarray): Discrete states matrix D corrected for anomalies.

  • n_disc (int): Number of detected discrete anomalies.

  • position_disc (numpy.ndarray): Positions of detected discrete anomalies in D.

Return type:

tuple
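The layout of D and the meaning of levels are only described briefly above. The following is a minimal sketch of how such inputs could be prepared, assuming that levels holds the number of states of each discrete variable and that the one-hot blocks are concatenated column-wise. The helper name one_hot_sketch is hypothetical and not part of adadmire.

```python
import numpy as np

def one_hot_sketch(codes):
    """Hypothetical helper: one-hot encode integer-coded discrete variables.

    codes: (n_samples, n_variables) integer array with states coded 0..k-1.
    Returns (D, levels); levels[j] is the assumed number of states of variable j.
    """
    codes = np.asarray(codes, dtype=int)
    # Assumption: every state of a variable occurs at least once in the data.
    levels = [int(codes[:, j].max()) + 1 for j in range(codes.shape[1])]
    # One identity-matrix lookup per variable gives its one-hot block.
    blocks = [np.eye(k)[codes[:, j]] for j, k in enumerate(levels)]
    return np.hstack(blocks), levels

codes = np.array([[0, 2], [1, 0], [1, 1]])  # 3 samples, 2 discrete variables
D, levels = one_hot_sketch(codes)
```

Each row of D then contains exactly one 1 per discrete variable, with samples in rows and one-hot features in columns, which matches the description of D above.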

calc_mean(X, D)

Calculate the group mean of the continuous data matrix X according to states D.

Parameters:
  • X (-) – Continuous data matrix, features in columns, samples in rows.

  • D (-) – Discrete states matrix in one-hot-encoding, features in columns, samples in rows.

Returns:

Group mean values.

Return type:

numpy.ndarray
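As an illustration of a group mean over one-hot states, here is a minimal sketch. It assumes that one mean vector is computed per one-hot column of D; the function name group_mean_sketch and the output layout (states in rows, continuous features in columns) are assumptions, not the documented behaviour of calc_mean.

```python
import numpy as np

def group_mean_sketch(X, D):
    """Hypothetical sketch: mean of the rows of X for each one-hot column of D."""
    X = np.asarray(X, dtype=float)
    D = np.asarray(D, dtype=float)
    counts = D.sum(axis=0)   # number of samples in each state
    sums = D.T @ X           # per-state sums of the continuous features
    # Guard against empty states to avoid division by zero.
    return sums / np.maximum(counts, 1)[:, None]

X = np.array([[1.0, 10.0], [3.0, 20.0], [5.0, 30.0]])
D = np.array([[1, 0], [1, 0], [0, 1]])
means = group_mean_sketch(X, D)
```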

get_threshold_continuous(X, X_hat, dev)

Calculate the threshold for continuous anomaly detection and correct matrix X accordingly.

Parameters:
  • X (-) – Continuous data matrix, features in columns, samples in rows.

  • X_hat (-) – Predicted continuous data matrix X.

  • dev (-) – Variances of the continuous estimates.

Returns:

Tuple containing the following elements:
  • X_cor (numpy.ndarray): Data matrix X corrected for anomalies.

  • threshold (float): Threshold value for anomaly detection.

  • n_ano (int): Number of detected anomalies.

  • pos (numpy.ndarray): Indices of detected anomalies in matrix X.

Return type:

tuple

get_threshold_discrete(D, levels, D_hat)

Calculate the threshold for discrete anomaly detection and determine the number of discrete anomalies.

Parameters:
  • D (-) – Discrete states matrix in one-hot-encoding, features in columns, samples in rows.

  • levels (-) – List of levels for discrete states.

  • D_hat (-) – Predicted discrete state matrix D.

Returns:

Tuple containing the following elements:
  • n_ano (int): Number of detected anomalies.

  • threshold (float): Threshold value for anomaly detection.

  • pos (numpy.ndarray): Indices of detected anomalies in matrix D.

Return type:

tuple

impute(X, D, levels, lambda_seq, oIterations=10000, oTol=1e-06)

Impute missing values in the data using MGM.

Parameters:
  • X (-) – Continuous data matrix with missing values (np.nan), features in columns, samples in rows.

  • D (-) – Discrete states matrix in one-hot-encoding with missing values (np.nan), features in columns, samples in rows.

  • levels (-) – List of levels for discrete states.

  • lambda_seq (numpy.ndarray) – Sequence of penalty values.

  • oIterations (-) – Number of iterations for fitting MGMs. Defaults to 10000.

  • oTol (-) – Tolerance for fitting MGMs. Defaults to 1e-6.

Returns:

Tuple containing the following elements:
  • numpy.ndarray: Imputed continuous data.

  • numpy.ndarray: Imputed discrete data.

  • float: Optimal lambda_seq value used for imputation.

Return type:

tuple

loo_cv_cor(X, D, levels, lambda_seq, oIterations=10000, oTol=1e-06, t=0.05)

Estimate continuous matrix X and discrete states D in a leave-one-out cross-validation approach using Mixed Graphical Models.

Parameters:
  • X (-) – Continuous data matrix, features in columns, samples in rows.

  • D (-) – Discrete states matrix in one-hot-encoding, features in columns, samples in rows.

  • levels (-) – List of levels for discrete states.

  • lambda_seq (numpy.ndarray) – Sequence of penalty values.

  • oIterations (-) – Number of iterations for fitting MGMs. Defaults to 10000.

  • oTol (-) – Tolerance for fitting MGMs. Defaults to 1e-6.

  • t (-) – Probability threshold value for smoothing corrections. Defaults to 0.05.

Returns:

Tuple containing the following elements:
  • prob_cont_old (numpy.ndarray): Estimated continuous probabilities.

  • Var_old (numpy.ndarray): Variances of the continuous estimates.

  • lam_opt_old (numpy.ndarray): Optimal penalty value (lambda).

  • X_hat_cor_xp_old (numpy.ndarray): Predicted continuous data matrix X.

  • D_hat_cor_xp_old (numpy.ndarray): Predicted discrete state matrix D.

Return type:

tuple

penalty(X, D, min, max, step)

Calculate a sequence of penalty parameters for regularization in the form of scaled exponentials.

Parameters:
  • X (-) – Continuous data matrix, features in columns, samples in rows.

  • D (-) – Discrete states matrix in one-hot-encoding, features in columns, samples in rows.

  • min (-) – The minimum exponent for the penalty parameter.

  • max (-) – The maximum exponent for the penalty parameter.

  • step (-) – The step size between consecutive exponents.

Returns:

lambda_seq: Sequence of penalty values.

Return type:

numpy.ndarray
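The exact form of the "scaled exponentials" is not stated here. A minimal sketch of one plausible construction follows, assuming base-2 exponentials scaled by sqrt(log(p + q) / n), where n is the number of samples, p the number of continuous features and q the number of one-hot columns. The base, the scaling factor, the exclusion of the upper exponent, and the name penalty_sketch are all assumptions made for illustration.

```python
import numpy as np

def penalty_sketch(X, D, exp_min, exp_max, step):
    """Hypothetical sketch: penalty values scale * 2**e for a grid of exponents e."""
    n, p = X.shape
    q = D.shape[1]
    # Exponent grid; np.arange excludes exp_max (an assumption of this sketch).
    exponents = np.arange(exp_min, exp_max, step)
    # Assumed scaling with the problem size.
    scale = np.sqrt(np.log(p + q) / n)
    return scale * 2.0 ** exponents

X = np.zeros((100, 6))
D = np.zeros((100, 4))
lambda_seq = penalty_sketch(X, D, -3.0, 0.0, 1.0)
```

With a step of 1, consecutive penalty values in this sketch differ by a factor of 2.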

place_anomalies_continuous(X, n_ano, epsilon, positive=False)

Place anomalies in continuous data matrix X.

Parameters:
  • X (-) – Continuous data matrix, features in columns, samples in rows.

  • n_ano (-) – Number of anomalies to be placed.

  • epsilon (-) – List of anomaly strengths. For each entry one simulation is generated.

  • positive (-) – If True, ensures anomalies are positive. Defaults to False.

Returns:

Tuple containing the following elements:
  • list: List of matrices with placed anomalies.

  • numpy.ndarray: Positions of placed anomalies.

Return type:

tuple
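To make the return structure concrete (one perturbed matrix per entry of epsilon, plus the shared anomaly positions), here is a minimal sketch. It assumes anomalies are additive shifts of size epsilon with a random sign at randomly drawn, distinct positions; the actual perturbation used by place_anomalies_continuous may differ. The name place_anomalies_sketch and the seed argument are hypothetical.

```python
import numpy as np

def place_anomalies_sketch(X, n_ano, epsilon, positive=False, seed=0):
    """Hypothetical sketch: add shifts of size eps at n_ano random positions."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    # Draw distinct flat indices and convert them to (row, column) pairs.
    flat = rng.choice(X.size, size=n_ano, replace=False)
    pos = np.column_stack(np.unravel_index(flat, X.shape))
    signs = rng.choice([-1.0, 1.0], size=n_ano)
    simulations = []
    for eps in epsilon:
        X_ano = X.copy()
        X_ano[pos[:, 0], pos[:, 1]] += signs * eps
        if positive:
            # Keep the perturbed entries non-negative.
            X_ano[pos[:, 0], pos[:, 1]] = np.abs(X_ano[pos[:, 0], pos[:, 1]])
        simulations.append(X_ano)
    return simulations, pos

X = np.zeros((5, 4))
sims, pos = place_anomalies_sketch(X, n_ano=3, epsilon=[0.5, 2.0])
```

All simulations share the same positions, so results for different anomaly strengths can be compared directly.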

pred_continuous(B, Rho, alphap, D_pred, X_pred)

Predict continuous values given estimated parameters of MGM and discrete values.

Parameters:
  • B (-) – Matrix B of continuous-continuous couplings.

  • Rho (-) – Continuous-discrete couplings.

  • alphap (-) – Continuous node potentials.

  • D_pred (-) – Vector of discrete states.

  • X_pred (-) – Vector of continuous values that should be estimated.

Returns:

Predicted continuous values.

Return type:

numpy.ndarray

pred_discrete(Rho, X_pred, D_pred, alphaq, Phi, levels, p)

Predict probabilities of observing states D_pred given estimated parameters of MGM and continuous values.

Parameters:
  • Rho (numpy.ndarray) – Continuous-discrete couplings.

  • X_pred (numpy.ndarray) – Vector of continuous values.

  • D_pred (numpy.ndarray) – Vector of discrete states for which probabilities should be calculated.

  • alphaq (numpy.ndarray) – Discrete node potentials.

  • Phi (numpy.ndarray) – Discrete-discrete couplings.

  • levels (list) – Levels for discrete states.

  • p (int) – Number of continuous features.

Returns:

Predicted probabilities of observing discrete states D_pred.

Return type:

numpy.ndarray

rel_dev(x, org)

Calculate relative deviation of x from org.

Parameters:
  • x (-) – Value to be compared.

  • org (-) – Original value.

Returns:

Relative deviation.

Return type:

float
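The formula is not given here. A minimal sketch under the common definition |x - org| / |org| follows; whether rel_dev takes absolute values or returns a signed ratio is an assumption, and the name rel_dev_sketch is hypothetical.

```python
def rel_dev_sketch(x, org):
    """Hypothetical sketch: relative deviation of x from org as |x - org| / |org|."""
    # Undefined for org == 0; callers would need to handle that case.
    return abs(x - org) / abs(org)

dev = rel_dev_sketch(110.0, 100.0)  # x is 10 % above the original value
```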

transform_back(X, X_scaled)

Transform data back to original scale.

Parameters:
  • X (-) – Original continuous data matrix.

  • X_scaled (-) – Continuous data matrix X transformed to [0,1] range.

Returns:

Continuous data matrix X retransformed to original scale.

Return type:

numpy.ndarray
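The back-transformation can be sketched as the inverse of a column-wise min-max scaling, using the minima and maxima of the original matrix X. Column-wise scaling is an assumption, and inverse_minmax_sketch is a hypothetical name rather than the library function.

```python
import numpy as np

def inverse_minmax_sketch(X, X_scaled):
    """Hypothetical sketch: map [0, 1]-scaled columns back to the scale of X."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_max = X.max(axis=0)
    return np.asarray(X_scaled, dtype=float) * (col_max - col_min) + col_min

X = np.array([[1.0, 10.0], [3.0, 20.0], [5.0, 40.0]])
# Scale each column to [0, 1], then undo the scaling.
X_scaled = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
X_back = inverse_minmax_sketch(X, X_scaled)
```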

transform_data(X)

Transform data to [0,1] range using min-max transformation.

Parameters:
  • X (-) – Continuous data matrix, features in columns, samples in rows.

Returns:

Continuous data matrix X transformed to [0,1] range.

Return type:

numpy.ndarray
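A min-max transformation maps each value x to (x - min) / (max - min). The sketch below applies it per column, which is an assumption about how transform_data treats features; the name minmax_sketch and the handling of constant columns are likewise hypothetical.

```python
import numpy as np

def minmax_sketch(X):
    """Hypothetical sketch: scale each column of X to the [0, 1] range."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    # Constant columns would divide by zero; map them to 0 instead.
    col_range[col_range == 0] = 1.0
    return (X - col_min) / col_range

X = np.array([[1.0, 10.0], [3.0, 20.0], [5.0, 40.0]])
X01 = minmax_sketch(X)
```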

adadmire.mgm module

B_Rho_Phi_alphap_alphaq(B, Rho, Phi, alphap, alphaq)
Fit_MGM(X, D, levels, lambda_seq, iterations, eps=1e-06)
Inv_B_Rho_Phi_alphap_alphaq(x, p, q)
grad_f_temp(x, X, D, levels, p, q)
grad_neglogli(B, Rho, Phi, alphap, alphaq, X, D, levels)
grad_neglogli_plain(B_Rho_Phi_alphap_alphaq, X, D, levels, p, q)
make_penalty_factors(X, D, levels)
make_starting_parameters(X, D, levels)
neglogli(B, Rho, Phi, alphap, alphaq, X, D, levels)
neglogli_plain(B_Rho_Phi_alphap_alphaq, X, D, levels, p, q)
prox_enet(x, l_l1, l_l2, t, pen, p0, tol0)