Classification Metrics
Computes standard classification metrics. Supports specific positive labels for binary classification.
Processing
This brick evaluates the performance of a classification model by comparing its predictions against actual historical data (ground truth). It calculates standard statistical scores—such as Accuracy, Precision, Recall, and F1 Score—to determine how well the model is performing.
The brick processes lists or columns of data and returns a single performance score. It handles binary classification (two possible outcomes, such as "Yes/No") as well as multi-class scenarios, and it automatically detects the label data type when matching the positive label.
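For intuition, the short sketch below computes these scores for a tiny binary example using scikit-learn directly (the same library the brick uses internally); it is an illustration, not the brick itself.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 1]  # ground truth (what really happened)
y_pred = [1, 0, 0, 1, 0, 1]  # model predictions

print(accuracy_score(y_true, y_pred))                # 5 of 6 correct -> ~0.833
print(precision_score(y_true, y_pred, pos_label=1))  # 3 of 3 predicted positives are real -> 1.0
print(recall_score(y_true, y_pred, pos_label=1))     # 3 of 4 actual positives found -> 0.75
print(f1_score(y_true, y_pred, pos_label=1))         # harmonic mean of the two -> ~0.857
```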
Inputs
- y true
- The dataset containing the actual, correct labels (ground truth). This represents what really happened (e.g., whether a customer actually churned). If you provide a DataFrame or Table, you must specify the column name in the Target Column option.
- y pred
- The dataset containing the class labels predicted by your model. If you provide a DataFrame or Table, you must specify the column name in the Prediction Column option.
- y prob
- (Optional) The dataset containing the predicted probability scores (values between 0 and 1). This is only required if you select ROC-AUC as the metric. If you provide a DataFrame or Table, you must specify the column name in the Probability Column option.
Inputs Types
| Input | Types |
|---|---|
| y true | DataFrame, ArrowTable, DataSeries, NDArray, List |
| y pred | DataFrame, ArrowTable, DataSeries, NDArray, List |
| y prob | DataFrame, ArrowTable, DataSeries, NDArray, List |
You can check the list of supported types here: Available Type Hints.
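As a small sketch (the container types are interchangeable as far as the brick is concerned), the same data can be supplied either as a plain list or as a DataFrame whose column is named in the corresponding column option:

```python
import pandas as pd
from sklearn.metrics import accuracy_score  # what the brick computes for the "Accuracy" metric

# Form 1: plain Python lists, no column options needed.
labels = [1, 0, 1, 1]
preds = [1, 0, 0, 1]

# Form 2: a DataFrame, with the columns selected via the Target Column
# and Prediction Column options.
df = pd.DataFrame({"y_true": labels, "y_pred": preds})

# Both forms carry the same data, so the resulting score is identical.
assert accuracy_score(labels, preds) == accuracy_score(df["y_true"], df["y_pred"])
```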
Outputs
- score
- The calculated performance metric returned as a single floating-point number.
Outputs Types
| Output | Types |
|---|---|
| score | float |
You can check the list of supported types here: Available Type Hints.
Options
The Classification Metrics brick exposes the following configurable options (worked examples follow the list):
- Metric
- Determines which statistical formula is used to evaluate the model.
- Accuracy: The percentage of total predictions that were correct.
- Precision: The accuracy of positive predictions (e.g., of all emails marked "Spam", how many were actually spam?).
- Recall: The ability to find all positive instances (e.g., of all actual spam emails, how many did we find?).
- F1 Score: The harmonic mean of Precision and Recall, useful when you need a balance between the two.
- ROC-AUC: Evaluates the model's ability to distinguish between classes using predicted probabilities. Requires the `y_prob` input.
- Average Strategy
- Defines how to calculate metrics when there are multiple classes (or for specific binary behaviors).
- binary: Only considers the class specified in Positive Label. Used for simple Yes/No or True/False cases.
- micro: Calculates metrics globally by counting total true positives, false negatives, and false positives.
- macro: Calculates metrics for each label, and finds their unweighted mean. This treats all classes equally.
- weighted: Calculates metrics for each label, and finds their average weighted by support (the number of true instances for each label).
- Positive Label
- The label that represents the "positive" class (e.g., `1`, `True`, or `Churn`). This is used specifically when the Average Strategy is set to "binary". The brick will attempt to match the data type (integer, boolean, or string) automatically.
- Target Column
- The name of the column in y true that contains the actual labels. Default is `y_true`.
- Prediction Column
- The name of the column in y pred that contains the predicted labels. Default is `y_pred`.
- Probability Column
- The name of the column in y prob that contains the probability scores. Default is `probability`.
- Verbose
- If enabled, detailed logs about the computation process and any data extraction issues will be printed to the console.
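The averaging strategies are easiest to see on an imbalanced multi-class example. The sketch below calls scikit-learn's `f1_score` directly (the same function the brick wraps); it illustrates the `average` parameter, not the brick itself.

```python
from sklearn.metrics import f1_score

# Imbalanced three-class example: class "C" dominates.
y_true = ["A", "A", "B", "C", "C", "C", "C", "C"]
y_pred = ["A", "B", "B", "C", "C", "C", "C", "A"]

# "micro" counts every prediction equally, "macro" averages the per-class
# F1 scores equally, and "weighted" scales each class's F1 by its support.
for avg in ("micro", "macro", "weighted"):
    print(avg, f1_score(y_true, y_pred, average=avg, zero_division=0))
```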
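As a worked sketch of the options, assuming the brick's `classification_metrics` function (listed below) is called directly in Python, a binary Precision evaluation on DataFrame inputs might look like this; the column names and labels are purely illustrative:

```python
import pandas as pd

# Hypothetical churn data: labels live in DataFrame columns.
truth = pd.DataFrame({"actual": ["Churn", "Stay", "Churn", "Stay", "Churn"]})
preds = pd.DataFrame({"model_out": ["Churn", "Stay", "Stay", "Stay", "Churn"]})

options = {
    "metric": "Precision",         # Accuracy | Precision | Recall | F1 Score | ROC-AUC
    "average_strategy": "binary",  # binary | micro | macro | weighted
    "pos_label": "Churn",          # matched to the label type automatically
    "y_true_column": "actual",     # overrides the default "y_true"
    "y_pred_column": "model_out",  # overrides the default "y_pred"
    "verbose": True,
}

score = classification_metrics(y_true=truth, y_pred=preds, options=options)
print(score)  # 1.0 here: both "Churn" predictions are correct
```

The full implementation of the brick is listed below for reference.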
import logging
import numpy as np
import pandas as pd
import polars as pl
import pyarrow as pa
from sklearn.metrics import (
accuracy_score,
precision_score,
recall_score,
f1_score,
roc_auc_score,
)
from coded_flows.types import (
Union,
DataFrame,
ArrowTable,
DataSeries,
List,
NDArray,
Float,
Any,
)
from coded_flows.utils import CodedFlowsLogger
logger = CodedFlowsLogger(name="Classification Metrics", level=logging.INFO)
def _extract_vector(
data: Any, column_name: str, verbose: bool = False, label: str = "Input"
) -> Union[List, np.ndarray]:
"""
Extracts a 1D vector/list from various data structures (DF, Series, Arrow, List, Numpy).
"""
result = None
if isinstance(data, pd.DataFrame):
if column_name not in data.columns:
verbose and logger.error(
f"Column '{column_name}' not found in Pandas DataFrame for {label}."
)
raise ValueError(
f"Column '{column_name}' not found in Pandas DataFrame for {label}."
)
result = data[column_name].tolist()
elif isinstance(data, pl.DataFrame):
if column_name not in data.columns:
verbose and logger.error(
f"Column '{column_name}' not found in Polars DataFrame for {label}."
)
raise ValueError(
f"Column '{column_name}' not found in Polars DataFrame for {label}."
)
result = data[column_name].to_list()
elif isinstance(data, (pa.Table, pa.lib.Table)):
if column_name not in data.column_names:
verbose and logger.error(
f"Column '{column_name}' not found in Arrow Table for {label}."
)
raise ValueError(
f"Column '{column_name}' not found in Arrow Table for {label}."
)
result = data[column_name].to_pylist()
elif isinstance(data, pd.Series):
result = data.tolist()
elif isinstance(data, pl.Series):
result = data.to_list()
elif isinstance(data, np.ndarray):
result = data.flatten().tolist()
elif isinstance(data, list):
result = data
if result is None:
verbose and logger.error(
f"Unsupported data type for {label}: {type(data)}. Expected DataFrame, Series, Array, or List."
)
raise ValueError(
f"Unsupported data type for {label}: {type(data)}. Expected DataFrame, Series, Array, or List."
)
return result
def _cast_pos_label(pos_label_str: str, sample_data: List) -> Any:
"""
Attempts to cast the string input 'pos_label' to the type found in the data.
Useful because the UI returns strings, but data might be Int or Bool.
"""
if not sample_data or pos_label_str is None:
return pos_label_str
elem = next((x for x in sample_data if x is not None), None)
if elem is None:
return pos_label_str
try:
if isinstance(elem, (int, np.integer)) and (not isinstance(elem, bool)):
return int(pos_label_str)
elif isinstance(elem, (bool, np.bool_)):
if pos_label_str.lower() == "true":
return True
if pos_label_str.lower() == "false":
return False
return bool(int(pos_label_str))
except Exception:
pass
return pos_label_str
def classification_metrics(
y_true: Union[DataFrame, ArrowTable, DataSeries, NDArray, List],
y_pred: Union[DataFrame, ArrowTable, DataSeries, NDArray, List],
y_prob: Union[DataFrame, ArrowTable, DataSeries, NDArray, List] = None,
options: dict = None,
) -> Float:
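    """
    Compute the selected classification metric from ground-truth labels and predictions.

    The metric, averaging strategy, positive label, and column names are read from the
    'options' dict (see the Options section for keys and defaults). Returns the score
    as a single float.
    """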
options = options or {}
verbose = options.get("verbose", True)
metric = options.get("metric", "Accuracy")
average_strategy = options.get("average_strategy", "binary")
raw_pos_label = options.get("pos_label", "1")
y_true_col = options.get("y_true_column", "y_true")
y_pred_col = options.get("y_pred_column", "y_pred")
y_prob_col = options.get("y_prob_column", "probability")
score = 0.0
try:
verbose and logger.info(f"Starting metric computation: {metric}")
y_true_vec = _extract_vector(
y_true,
column_name=y_true_col,
verbose=verbose,
label="Real Target (y_true)",
)
y_pred_vec = _extract_vector(
y_pred, column_name=y_pred_col, verbose=verbose, label="Prediction (y_pred)"
)
pos_label = _cast_pos_label(raw_pos_label, y_true_vec)
metric_kwargs = {"zero_division": 0, "average": average_strategy}
if average_strategy == "binary":
metric_kwargs["pos_label"] = pos_label
if metric == "Accuracy":
score = accuracy_score(y_true_vec, y_pred_vec)
elif metric == "Precision":
score = precision_score(y_true_vec, y_pred_vec, **metric_kwargs)
elif metric == "Recall":
score = recall_score(y_true_vec, y_pred_vec, **metric_kwargs)
elif metric == "F1 Score":
score = f1_score(y_true_vec, y_pred_vec, **metric_kwargs)
elif metric == "ROC-AUC":
if y_prob is None:
verbose and logger.error(
"Input 'y_prob' is required for ROC-AUC metric."
)
raise ValueError("Input 'y_prob' is required for ROC-AUC metric.")
y_prob_vec = _extract_vector(
y_prob,
column_name=y_prob_col,
verbose=verbose,
label="Probabilities (y_prob)",
)
if average_strategy == "binary":
y_true_binary = [1 if x == pos_label else 0 for x in y_true_vec]
score = roc_auc_score(y_true_binary, y_prob_vec)
else:
score = roc_auc_score(
y_true_vec, y_prob_vec, multi_class="ovr", average=average_strategy
)
verbose and logger.info(f"Computation successful. {metric}: {score}")
except Exception as e:
verbose and logger.error(f"Error computing classification metrics: {str(e)}")
raise e
return score
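If ROC-AUC is selected, the y prob input must also be provided. A minimal sketch of the binary case, again assuming the function above is called directly in Python:

```python
y_true = [1, 0, 1, 1, 0]            # ground-truth labels
y_pred = [1, 0, 0, 1, 0]            # hard predictions (not used by ROC-AUC itself)
y_prob = [0.9, 0.2, 0.4, 0.8, 0.1]  # predicted probability of the positive class

options = {"metric": "ROC-AUC", "average_strategy": "binary", "pos_label": "1"}
auc = classification_metrics(y_true=y_true, y_pred=y_pred, y_prob=y_prob, options=options)
print(auc)  # 1.0 here: every positive is ranked above every negative
```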
Brick Info
- shap>=0.47.0
- scikit-learn
- numpy
- pandas
- pyarrow
- polars[pyarrow]
- numba>=0.56.0