Privacy Attacks

AuditML implements four attacks. Each produces an `AttackResult` with standardised metrics so you can compare them directly.

Threshold MIA

The simplest membership inference attack. It exploits the tendency of a model to assign lower loss (or higher confidence) to samples it was trained on than to samples it has never seen.

How it works:

1. Compute a signal (loss, confidence, or entropy) for every sample.
2. Sweep all unique signal values as potential thresholds.
3. Pick the threshold that maximises attack accuracy.
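The three steps above can be sketched in a few lines of plain Python. This is an illustrative sketch, not AuditML's implementation: the function name `best_threshold`, its parameters, and the toy loss values are all assumptions made for the example.

```python
def best_threshold(signals, is_member, lower_means_member=True):
    """Return (threshold, accuracy) maximising membership-prediction accuracy.

    signals: per-sample signal values (e.g. loss).
    is_member: booleans, True for training-set members.
    lower_means_member: True for loss or entropy, False for confidence.
    """
    best = (None, 0.0)
    # Step 2: every unique signal value is a candidate threshold.
    for t in sorted(set(signals)):
        if lower_means_member:
            preds = [s <= t for s in signals]
        else:
            preds = [s >= t for s in signals]
        correct = sum(p == m for p, m in zip(preds, is_member))
        acc = correct / len(signals)
        # Step 3: keep the first threshold with the highest accuracy.
        if acc > best[1]:
            best = (t, acc)
    return best


# Step 1 (toy stand-in): per-sample losses; members tend to have lower loss.
losses = [0.05, 0.10, 0.20, 0.90, 0.30, 1.20, 1.50, 0.15]
members = [True, True, True, True, False, False, False, False]
threshold, accuracy = best_threshold(losses, members)
print(threshold, accuracy)
```

Note that the threshold here is chosen and evaluated on the same samples, so the reported accuracy is an optimistic, best-case figure for the attacker.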

attack_params:
  mia_threshold:
    metric: loss          # loss | confidence | entropy
    percentile: 50        # fallback if optimal scan disabled

auditml audit --config configs/audit_mnist.yaml --attack mia_threshold

When to use: Quick baseline. As a rule of thumb, an AUC above 0.6 with this attack (0.5 is random guessing) suggests the model is leaking membership information.

Shadow Model MIA

A stronger attack that trains several "shadow models" on data with known membership labels, then uses them to train a binary membership classifier.

attack_params:
  mia_shadow:
    num_shadow_models: 4
    shadow_epochs: 10

When to use: More powerful than threshold MIA, but the training cost grows with num_shadow_models. With the setting of 4 shown above, expect roughly four additional training runs.
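To make the shadow-model idea concrete, the sketch below builds a labelled attack dataset from several shadow models and fits the simplest possible membership classifier on it. Everything here is hypothetical: `build_attack_dataset`, `toy_train`, and `toy_predict` are illustrative names, and the "model" is a stand-in that merely memorises its training set. Real shadow models would mimic the target model's architecture and training procedure.

```python
import random


def build_attack_dataset(pool, num_shadow_models, train_fn, predict_fn, seed=0):
    """Collect (model_output, is_member) pairs from several shadow models.

    train_fn and predict_fn are hypothetical callables standing in for
    real model training and inference.
    """
    rng = random.Random(seed)
    attack_data = []
    for _ in range(num_shadow_models):
        shuffled = pool[:]
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        members, non_members = shuffled[:half], shuffled[half:]
        shadow = train_fn(members)  # the shadow model sees only the "in" half
        attack_data += [(predict_fn(shadow, x), 1) for x in members]
        attack_data += [(predict_fn(shadow, x), 0) for x in non_members]
    return attack_data


# Toy stand-ins: a "model" that memorises its training set and is more
# confident on samples it has seen.
def toy_train(samples):
    return set(samples)


def toy_predict(model, x):
    return 0.95 if x in model else 0.60


dataset = build_attack_dataset(list(range(20)), 4, toy_train, toy_predict)

# Simplest possible membership classifier: a cut-off halfway between the
# mean output on members and the mean output on non-members.
mean_in = sum(f for f, y in dataset if y == 1) / sum(y for _, y in dataset)
mean_out = sum(f for f, y in dataset if y == 0) / sum(1 - y for _, y in dataset)
cutoff = (mean_in + mean_out) / 2


def predict_membership(model_output):
    return model_output > cutoff
```

Because membership is known by construction for every shadow model, the attack classifier can be trained with ordinary supervised learning and then applied to the target model's outputs.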

Model Inversion

Reconstructs representative images for each class by gradient ascent in pixel space. If the model has memorised training data, the reconstructions resemble actual training samples.

attack_params:
  model_inversion:
    num_iterations: 500
    learning_rate: 0.1
    lambda_tv: 0.01       # Total Variation regularisation
    lambda_l2: 0.001      # L2 regularisation
    target_class: null    # null = all classes

Output: A grid of reconstructed images per class, plus SSIM quality scores.
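The role of the parameters above can be illustrated with a toy gradient-ascent loop. This is a sketch under strong assumptions, not AuditML's code: the class score is a hypothetical linear function of the pixels, the total-variation penalty is the squared-difference form (other forms exist), and `invert_class` and `tv_grad` are made-up names. The default arguments mirror the configuration values shown above.

```python
def tv_grad(img):
    """Gradient of the squared total-variation penalty, sum((a - b)^2) over
    horizontally and vertically adjacent pixel pairs."""
    h, w = len(img), len(img[0])
    grad = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            for di, dj in ((0, 1), (1, 0)):
                ni, nj = i + di, j + dj
                if ni < h and nj < w:
                    diff = img[i][j] - img[ni][nj]
                    grad[i][j] += 2 * diff
                    grad[ni][nj] -= 2 * diff
    return grad


def invert_class(weights, num_iterations=500, learning_rate=0.1,
                 lambda_tv=0.01, lambda_l2=0.001):
    """Gradient ascent on score(x) - lambda_tv * TV(x) - lambda_l2 * ||x||^2.

    weights: per-pixel weights of a hypothetical linear class score, so the
    gradient of the score with respect to the image is simply `weights`.
    """
    h, w = len(weights), len(weights[0])
    img = [[0.5] * w for _ in range(h)]  # start from a uniform grey image
    for _ in range(num_iterations):
        g_tv = tv_grad(img)
        for i in range(h):
            for j in range(w):
                step = (weights[i][j]
                        - lambda_tv * g_tv[i][j]
                        - lambda_l2 * 2 * img[i][j])
                # Ascend, then clamp to the valid pixel range [0, 1].
                img[i][j] = min(1.0, max(0.0, img[i][j] + learning_rate * step))
    return img


# Toy class whose score rewards bright pixels on the left, dark on the right.
toy_weights = [[1.0, 1.0, -1.0, -1.0] for _ in range(4)]
reconstruction = invert_class(toy_weights)
```

With a real network, the score gradient would come from automatic differentiation rather than a fixed weight grid. The two penalties play the same role either way: the TV term discourages noisy, high-frequency pixel patterns and the L2 term keeps pixel magnitudes small.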

Attribute Inference

Predicts sensitive attributes (e.g., gender, age group) from the model's intermediate representations.

attack_params:
  attribute_inference:
    target_attribute: label
    attack_epochs: 10
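As a rough illustration of the idea, the sketch below trains a small attack classifier that maps intermediate representations to a binary attribute. It assumes logistic regression trained by stochastic gradient descent, which may differ from the classifier AuditML actually uses. The function names, the learning rate, and the toy 2-D representations are all invented for the example; only the `attack_epochs` value is taken from the configuration above.

```python
import math


def train_attribute_classifier(features, attribute, attack_epochs=10, lr=0.5):
    """Fit logistic regression mapping representations to a binary attribute."""
    dim = len(features[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(attack_epochs):
        for x, y in zip(features, attribute):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            err = p - y  # gradient of the log-loss with respect to z
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def predict_attribute(w, b, x):
    return int(sum(wi * xi for wi, xi in zip(w, x)) + b > 0)


# Hypothetical 2-D intermediate representations; the first coordinate
# happens to encode the sensitive attribute.
reps = [[1.0, 0.2], [0.9, -0.1], [1.1, 0.3],
        [-1.0, 0.1], [-0.8, -0.2], [-1.2, 0.0]]
attr = [1, 1, 1, 0, 0, 0]
w, b = train_attribute_classifier(reps, attr)
accuracy = sum(predict_attribute(w, b, x) == y
               for x, y in zip(reps, attr)) / len(reps)
```

If such a classifier predicts the attribute well above chance on held-out samples, the representations encode the attribute even though the model was never trained to predict it.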

Attack metrics

All attacks report the same set of metrics:

| Metric | Meaning |
| --- | --- |
| `accuracy` | Fraction of correct member/non-member predictions |
| `precision` | Of predicted members, fraction that are truly members |
| `recall` | Of true members, fraction correctly identified |
| `f1` | Harmonic mean of precision and recall |
| `auc_roc` | Area under the ROC curve (0.5 = random, 1.0 = perfect) |
| `auc_pr` | Area under the precision-recall curve |
| `tpr_at_1fpr` | True Positive Rate at 1% False Positive Rate |
| `tpr_at_01fpr` | True Positive Rate at 0.1% False Positive Rate |
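For readers who want to see how several of these metrics relate to raw predictions, here is a minimal sketch in plain Python. The helper names (`binary_metrics`, `auc_roc`, `tpr_at_fpr`) and the toy labels and scores are assumptions for illustration; AuditML may compute these values differently, for example by interpolating the ROC curve.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 from 0/1 labels and predictions."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": (tp + tn) / len(y_true), "precision": precision,
            "recall": recall, "f1": f1}


def auc_roc(y_true, scores):
    """Probability that a random member outscores a random non-member
    (ties count half), which equals the area under the ROC curve."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def tpr_at_fpr(y_true, scores, max_fpr):
    """Highest TPR over thresholds whose FPR does not exceed max_fpr.

    scores: higher means "more likely a member".
    """
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    best = 0.0
    for t in set(scores):
        tp = sum(s >= t and y == 1 for s, y in zip(scores, y_true))
        fp = sum(s >= t and y == 0 for s, y in zip(scores, y_true))
        if fp / n_neg <= max_fpr:
            best = max(best, tp / n_pos)
    return best


# Toy example: 4 members (1) and 4 non-members (0).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.4, 0.2, 0.1]
metrics = binary_metrics(y_true, y_pred)
auc = auc_roc(y_true, scores)
tpr_1 = tpr_at_fpr(y_true, scores, 0.01)
```

The low-FPR metrics are only meaningful with enough non-members: resolving a 1% false positive rate needs at least 100 non-member samples, and 0.1% needs at least 1,000.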

See Interpreting Results for guidance on what these numbers mean in practice.