API Reference: Reporting
Report generator
auditml.reporting.report_generator
Unified report generator for a complete AuditML audit.
Orchestrates individual attack reports, DP comparison, and cross-attack comparison into a single output directory with a master summary.
Expected workflow:
- Run one or more attacks against a target model.
- Optionally run the same attacks against a DP-trained model.
- Pass all results to `ReportGenerator`.
- Call `generate()` to produce the full report.
Output structure:

    <output_dir>/
    ├── summary.txt                  # Master text summary
    ├── audit_summary.json           # Machine-readable summary
    ├── attacks/
    │   ├── mia_threshold/           # Per-attack report dirs
    │   ├── mia_shadow/
    │   ├── model_inversion/
    │   └── attribute_inference/
    ├── attack_comparison/           # Cross-attack comparison (if 2+ attacks)
    │   ├── attack_comparison.json
    │   ├── attack_comparison_bar.png
    │   └── attack_roc_overlay.png
    └── dp_comparison/               # DP vs non-DP (if DP results provided)
        ├── comparison.json
        ├── roc_comparison.png
        └── ...
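
The layout above can be checked programmatically after a run. The following is a minimal sketch, not part of AuditML: `expected_report_paths` and `missing_paths` are hypothetical helpers, and the only file and directory names used are the ones listed in the tree. It assumes the cross-attack directory appears for two or more attacks and the DP directory only when DP results were supplied, as the comments in the tree state.

```python
from pathlib import Path
import tempfile


def expected_report_paths(output_dir, attack_names, has_dp=False):
    """List the paths implied by the documented layout (hypothetical helper)."""
    root = Path(output_dir)
    paths = [root / "summary.txt", root / "audit_summary.json"]
    # One sub-directory per attack under attacks/.
    paths += [root / "attacks" / name for name in attack_names]
    # The cross-attack comparison is documented as present for 2+ attacks.
    if len(attack_names) >= 2:
        paths.append(root / "attack_comparison" / "attack_comparison.json")
    # The DP comparison is documented as present only when DP results exist.
    if has_dp:
        paths.append(root / "dp_comparison" / "comparison.json")
    return paths


def missing_paths(output_dir, attack_names, has_dp=False):
    """Return the expected paths that do not exist on disk."""
    return [p for p in expected_report_paths(output_dir, attack_names, has_dp)
            if not p.exists()]


with tempfile.TemporaryDirectory() as tmp:
    # An empty directory is missing every path the layout implies.
    missing = missing_paths(tmp, ["mia_threshold", "mia_shadow"], has_dp=True)
    print(len(missing))
```

A check like this is mainly useful in CI, where a silently skipped comparison section would otherwise go unnoticed.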
ReportGenerator
Unified report generator for a full privacy audit.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `experiment_name` | `str` | Human-readable name for this audit (used in titles and filenames). | `'audit'` |
| `attack_results` | `dict[str, tuple[BaseAttack, AttackResult]] \| None` | Mapping from attack name to a `(BaseAttack, AttackResult)` tuple. | `None` |
| `dp_attack_results` | `dict[str, tuple[BaseAttack, AttackResult]] \| None` | Same structure, but for attacks run against the DP model. If provided, a DP comparison section is added. | `None` |
| `epsilon` | `float \| None` | The privacy budget used for DP training. Required if … | `None` |
| `model_accuracy` | `dict[str, float] \| None` | Optional dict … | `None` |
Source code in src/auditml/reporting/report_generator.py
generate(output_dir: str | Path) -> Path
Generate the complete audit report.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `output_dir` | `str \| Path` | Root directory for the report. Created if it doesn't exist. | required |

Returns:

| Type | Description |
|---|---|
| `Path` | The output directory. |
Attack comparison
auditml.reporting.attack_comparison
Cross-attack comparison module.
Compares results from multiple attack types run against the same model, answering questions like:
- Which attack is the most effective at inferring membership?
- Which classes are most vulnerable, and does this vary by attack?
- How do confidence-score distributions differ across attacks?
Expected workflow:
- Run two or more attacks (threshold MIA, shadow MIA, model inversion, attribute inference) against the same target model.
- Pass all `AttackResult` objects to `AttackComparison`.
- Call `rank_attacks()`, `generate_report()`, etc.
This module is complementary to DPComparison (Task 2.11), which
compares the same attack across DP vs non-DP models.
AttackComparison
Compare effectiveness across multiple attack types.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `results` | `dict[str, AttackResult]` | Mapping from attack name to its `AttackResult`. | required |
Source code in src/auditml/reporting/attack_comparison.py
compute_all_metrics() -> dict[str, dict[str, float]]
Compute standard metrics for every attack.
Returns:
| Type | Description |
|---|---|
| `dict[str, dict[str, float]]` | Mapping from attack name to its metric dictionary. |
rank_attacks(metric: str = 'auc_roc') -> list[tuple[str, float]]
Rank attacks by a chosen metric (descending).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `metric` | `str` | The metric key to rank by (default `'auc_roc'`). | `'auc_roc'` |

Returns:

| Type | Description |
|---|---|
| `list[tuple[str, float]]` | List of `(attack_name, metric_value)` pairs, sorted descending. |
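
The ranking described here amounts to a descending sort over one metric. The sketch below is a standalone illustration, not the library's implementation: it takes a plain `dict[str, dict[str, float]]` of the shape `compute_all_metrics()` is documented to return, and the attack names and metric values are made up.

```python
def rank_attacks(metrics, metric="auc_roc"):
    """Sort (attack_name, value) pairs by one metric, highest first."""
    pairs = [(name, values[metric]) for name, values in metrics.items()]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Made-up numbers, for illustration only.
metrics = {
    "mia_threshold": {"auc_roc": 0.61, "accuracy": 0.58},
    "mia_shadow": {"auc_roc": 0.67, "accuracy": 0.62},
}

ranking = rank_attacks(metrics)
# The top-ranked entry plays the role of best_attack().
best = ranking[0][0]
print(ranking, best)
```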
best_attack(metric: str = 'auc_roc') -> str
Return the name of the most effective attack.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `metric` | `str` | The metric to compare by. | `'auc_roc'` |

Returns:

| Type | Description |
|---|---|
| `str` | Name of the top-ranked attack. |
summary_table() -> dict[str, dict[str, float]]
Build a table of all attacks and all metrics.
Returns:
| Type | Description |
|---|---|
| `dict[str, dict[str, float]]` | Outer key = attack name, inner dict = metric values. |
summary_dict() -> dict[str, Any]
Return a comprehensive summary for serialisation.
Returns:
| Type | Description |
|---|---|
| `dict[str, Any]` | Dict containing `metrics`, `ranking`, `best_attack`, and `attack_names`. |
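
Since this summary is meant for serialisation, a round trip through JSON shows what a consumer would read back. This is a sketch under assumptions: only the four top-level keys come from the description above, while the value shapes and numbers are invented for illustration.

```python
import json

# Hypothetical summary; only the four key names come from the documentation.
summary = {
    "metrics": {
        "mia_threshold": {"auc_roc": 0.61},
        "mia_shadow": {"auc_roc": 0.67},
    },
    "ranking": [("mia_shadow", 0.67), ("mia_threshold", 0.61)],
    "best_attack": "mia_shadow",
    "attack_names": ["mia_threshold", "mia_shadow"],
}

text = json.dumps(summary, indent=2)
restored = json.loads(text)

# JSON has no tuple type, so ranking pairs come back as two-element lists.
print(restored["ranking"][0])
```

Code that reads such a file back should therefore index ranking entries by position rather than rely on them being tuples.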
generate_report(output_dir: str | Path) -> Path
Generate a cross-attack comparison report.
Creates:
- `attack_comparison.json`: full metrics and rankings
- `attack_comparison_bar.png`: grouped bar chart of metrics
- `attack_roc_overlay.png`: overlaid ROC curves
- `summary.txt`: human-readable summary
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `output_dir` | `str \| Path` | Directory to write report files. | required |

Returns:

| Type | Description |
|---|---|
| `Path` | The output directory. |
DP vs Non-DP comparison
auditml.reporting.comparison
DP vs Non-DP comparison module.
Compares attack results obtained from a standard (non-private) model against a differentially-private (DP) model. The comparison answers the core question: does DP training reduce privacy leakage?
Expected workflow:
- Train a standard model and run attacks → `baseline_results`.
- Train a DP model (same architecture, same data) and run the same attacks → `dp_results`.
- Pass both to `DPComparison` to compute deltas, generate plots, and produce a combined report.
The module is attack-agnostic — it works with any AttackResult
from any of the four attack types.
DPComparison
Compare attack effectiveness between standard and DP models.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `baseline_result` | `AttackResult` | Attack result from the standard (non-DP) model. | required |
| `dp_result` | `AttackResult` | Attack result from the DP-trained model. | required |
| `baseline_metrics` | `dict[str, float] \| None` | Pre-computed metrics for the baseline. If … | `None` |
| `dp_metrics` | `dict[str, float] \| None` | Pre-computed metrics for the DP model. If … | `None` |
| `epsilon` | `float \| None` | The privacy budget (epsilon) used during DP training. | `None` |
| `model_accuracy` | `dict[str, float] \| None` | Optional dict with … | `None` |
Source code in src/auditml/reporting/comparison.py
compute_deltas() -> dict[str, float]
Compute the change in each metric: dp_value - baseline_value.
Negative deltas mean the DP model is more private (attacks are less effective). The most important metrics:
- `accuracy_delta`: negative = DP makes the attack less accurate
- `auc_roc_delta`: negative = DP makes the attack less discriminative
- `tpr_at_1fpr_delta`: negative = DP reduces true positive rate at realistic operating points
Returns:
| Type | Description |
|---|---|
| `dict[str, float]` | Keys are … |
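
The delta rule stated above (`dp_value - baseline_value`) can be worked through on two small metric dictionaries. This is a minimal standalone sketch, not the library's code: it assumes the `<metric>_delta` key naming suggested by the examples above, only compares metrics present in both dictionaries, and uses made-up values.

```python
def compute_deltas(baseline_metrics, dp_metrics):
    """Return dp - baseline for every metric both dictionaries share."""
    shared = baseline_metrics.keys() & dp_metrics.keys()
    return {
        f"{name}_delta": dp_metrics[name] - baseline_metrics[name]
        for name in sorted(shared)
    }


# Made-up attack metrics, for illustration only.
baseline = {"accuracy": 0.75, "auc_roc": 0.875}
dp = {"accuracy": 0.5, "auc_roc": 0.625}

deltas = compute_deltas(baseline, dp)
# Both deltas are negative: the attack is weaker against the DP model.
print(deltas)
```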
compute_privacy_gain() -> dict[str, float]
Summarise the privacy improvement from DP training.
Returns:
| Type | Description |
|---|---|
| `dict[str, float]` | Dict with keys: … |
summary_dict() -> dict[str, Any]
Return a comprehensive summary combining all comparisons.
Returns:
| Type | Description |
|---|---|
| `dict` | Contains … |
generate_report(output_dir: str | Path) -> Path
Generate a comparison report with metrics, plots, and summary.
Creates:
- `comparison.json`: full comparison data
- `comparison_bar_chart.png`: side-by-side metric comparison
- `roc_comparison.png`: overlaid ROC curves
- `score_comparison.png`: confidence score distributions
- `summary.txt`: human-readable summary
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `output_dir` | `str \| Path` | Directory to write report files. | required |

Returns:

| Type | Description |
|---|---|
| `Path` | The output directory. |
Visualisation
auditml.reporting.visualization
Visualization functions for DP vs Non-DP comparison reports.
Provides side-by-side plots that make it easy to see how DP training affects attack effectiveness.
plot_metric_comparison(baseline_metrics: dict[str, float], dp_metrics: dict[str, float], save_path: str | Path | None = None, title: str = 'Attack Metrics — Baseline vs DP') -> plt.Figure
Side-by-side bar chart comparing attack metrics.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `baseline_metrics` | `dict[str, float]` | Metrics from the standard (non-DP) model. | required |
| `dp_metrics` | `dict[str, float]` | Metrics from the DP model. | required |
| `save_path` | `str \| Path \| None` | If given, saves the figure. | `None` |
| `title` | `str` | Plot title. | `'Attack Metrics — Baseline vs DP'` |

Returns:

| Type | Description |
|---|---|
| `Figure` | |
Source code in src/auditml/reporting/visualization.py
plot_roc_comparison(baseline_gt: np.ndarray, baseline_scores: np.ndarray, dp_gt: np.ndarray, dp_scores: np.ndarray, save_path: str | Path | None = None, title: str = 'ROC Curve — Baseline vs DP') -> plt.Figure
Overlay ROC curves from baseline and DP models.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `baseline_gt` | `ndarray` | Ground truth for baseline attack. | required |
| `baseline_scores` | `ndarray` | Confidence scores for baseline attack. | required |
| `dp_gt` | `ndarray` | Ground truth for DP attack. | required |
| `dp_scores` | `ndarray` | Confidence scores for DP attack. | required |
| `save_path` | `str \| Path \| None` | If given, saves the figure. | `None` |
| `title` | `str` | Plot title. | `'ROC Curve — Baseline vs DP'` |

Returns:

| Type | Description |
|---|---|
| `Figure` | |
plot_score_comparison(baseline_scores: np.ndarray, baseline_gt: np.ndarray, dp_scores: np.ndarray, dp_gt: np.ndarray, save_path: str | Path | None = None, title: str = 'Confidence Score Distribution — Baseline vs DP') -> plt.Figure
Four-panel histogram comparing member/non-member score distributions.
Top row: baseline model (members vs non-members). Bottom row: DP model (members vs non-members).
Well-separated distributions indicate a successful attack; overlapping distributions indicate the model is more private.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `baseline_scores` | `ndarray` | Confidence scores from baseline attack. | required |
| `baseline_gt` | `ndarray` | Ground truth for baseline. | required |
| `dp_scores` | `ndarray` | Confidence scores from DP attack. | required |
| `dp_gt` | `ndarray` | Ground truth for DP. | required |
| `save_path` | `str \| Path \| None` | If given, saves the figure. | `None` |
| `title` | `str` | Plot title. | `'Confidence Score Distribution — Baseline vs DP'` |

Returns:

| Type | Description |
|---|---|
| `Figure` | |
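
The idea that well-separated member and non-member score distributions indicate a successful attack can also be expressed as a single number. The sketch below is not part of AuditML: `split_scores` and `histogram_overlap` are hypothetical helpers, ground truth is assumed to use 1 for members and 0 for non-members, scores are assumed to lie in [0, 1], and the sample values are made up. It computes the overlap of two normalised histograms, where 0 means disjoint and 1 means identical.

```python
def split_scores(scores, ground_truth):
    """Split scores into member (label 1) and non-member (label 0) lists."""
    members = [s for s, g in zip(scores, ground_truth) if g == 1]
    non_members = [s for s, g in zip(scores, ground_truth) if g == 0]
    return members, non_members


def histogram_overlap(a, b, bins=10):
    """Overlap of two normalised histograms on [0, 1]: 0 = disjoint, 1 = identical."""
    def normalised_hist(values):
        counts = [0] * bins
        for v in values:
            # Clamp v == 1.0 into the last bin.
            counts[min(int(v * bins), bins - 1)] += 1
        return [c / len(values) for c in counts]

    hist_a, hist_b = normalised_hist(a), normalised_hist(b)
    return sum(min(x, y) for x, y in zip(hist_a, hist_b))


# Made-up scores: members score high, non-members score low.
scores = [0.9, 0.95, 0.85, 0.1, 0.2, 0.15]
labels = [1, 1, 1, 0, 0, 0]

members, non_members = split_scores(scores, labels)
separated = histogram_overlap(members, non_members)
identical = histogram_overlap(members, members)
print(separated, identical)
```

Computing this value for the baseline and the DP model gives a rough numeric counterpart to the top and bottom rows of the plot: a higher overlap for the DP model suggests the attack separates members less well.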