Classification Metrics
Computes standard classification metrics. Supports specific positive labels for binary classification.
Processing
This brick evaluates the performance of a classification model by comparing its predictions against actual historical data (ground truth). It calculates standard statistical scores—such as Accuracy, Precision, Recall, and F1 Score—to determine how well the model is performing.
The brick processes lists or columns of data and returns a single performance score. It handles binary classification (two possible outcomes, such as "Yes/No") as well as multi-class scenarios, and it automatically detects the label data type when matching the positive label.
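For intuition, the short sketch below computes these scores for a tiny binary example using scikit-learn directly (the same library the brick uses internally); it is an illustration, not the brick itself.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 1]  # ground truth (what really happened)
y_pred = [1, 0, 0, 1, 0, 1]  # model predictions

print(accuracy_score(y_true, y_pred))                # 5 of 6 correct -> ~0.833
print(precision_score(y_true, y_pred, pos_label=1))  # 3 of 3 predicted positives are real -> 1.0
print(recall_score(y_true, y_pred, pos_label=1))     # 3 of 4 actual positives found -> 0.75
print(f1_score(y_true, y_pred, pos_label=1))         # harmonic mean of the two -> ~0.857
```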
Inputs
- y true
- The dataset containing the actual, correct labels (ground truth). This represents what really happened (e.g., whether a customer actually churned). If you provide a DataFrame or Table, you must specify the column name in the Target Column option.
- y pred
- The dataset containing the class labels predicted by your model. If you provide a DataFrame or Table, you must specify the column name in the Prediction Column option.
- y prob
- (Optional) The dataset containing the predicted probability scores (values between 0 and 1). This is only required if you select ROC-AUC as the metric. If you provide a DataFrame or Table, you must specify the column name in the Probability Column option.
Inputs Types
| Input | Types |
|---|---|
| y true | DataFrame, ArrowTable, DataSeries, NDArray, List |
| y pred | DataFrame, ArrowTable, DataSeries, NDArray, List |
| y prob | DataFrame, ArrowTable, DataSeries, NDArray, List |
You can check the list of supported types here: Available Type Hints.
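As a small sketch (the container types are interchangeable as far as the brick is concerned), the same data can be supplied either as a plain list or as a DataFrame whose column is named in the corresponding column option:

```python
import pandas as pd
from sklearn.metrics import accuracy_score  # what the brick computes for the "Accuracy" metric

# Form 1: plain Python lists, no column options needed.
labels = [1, 0, 1, 1]
preds = [1, 0, 0, 1]

# Form 2: a DataFrame, with the columns selected via the Target Column
# and Prediction Column options.
df = pd.DataFrame({"y_true": labels, "y_pred": preds})

# Both forms carry the same data, so the resulting score is identical.
assert accuracy_score(labels, preds) == accuracy_score(df["y_true"], df["y_pred"])
```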
Outputs
- score
- The calculated performance metric returned as a single floating-point number.
Outputs Types
| Output | Types |
|---|---|
| score | float |
You can check the list of supported types here: Available Type Hints.
Options
The Classification Metrics brick exposes the following configurable options (worked examples follow the list):
- Metric
- Determines which statistical formula is used to evaluate the model.
- Accuracy: The percentage of total predictions that were correct.
- Precision: The accuracy of positive predictions (e.g., of all emails marked "Spam", how many were actually spam?).
- Recall: The ability to find all positive instances (e.g., of all actual spam emails, how many did we find?).
- F1 Score: The harmonic mean of Precision and Recall, useful when you need a balance between the two.
- ROC-AUC: Evaluates the model's ability to distinguish between classes using predicted probabilities. Requires the `y_prob` input.
- Average Strategy
- Defines how to calculate metrics when there are multiple classes (or for specific binary behaviors).
- binary: Only considers the class specified in Positive Label. Used for simple Yes/No or True/False cases.
- micro: Calculates metrics globally by counting total true positives, false negatives, and false positives.
- macro: Calculates metrics for each label, and finds their unweighted mean. This treats all classes equally.
- weighted: Calculates metrics for each label, and finds their average weighted by support (the number of true instances for each label).
- Positive Label
- The label that represents the "positive" class (e.g., `1`, `True`, or `Churn`). This is used specifically when the Average Strategy is set to "binary". The brick will attempt to match the data type (integer, boolean, or string) automatically.
- Target Column
- The name of the column in y true that contains the actual labels. Default is `y_true`.
- Prediction Column
- The name of the column in y pred that contains the predicted labels. Default is `y_pred`.
- Probability Column
- The name of the column in y prob that contains the probability scores. Default is `probability`.
- Verbose
- If enabled, detailed logs about the computation process and any data extraction issues will be printed to the console.
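The averaging strategies are easiest to see on an imbalanced multi-class example. The sketch below calls scikit-learn's `f1_score` directly (the same function the brick wraps); it illustrates the `average` parameter, not the brick itself.

```python
from sklearn.metrics import f1_score

# Imbalanced three-class example: class "C" dominates.
y_true = ["A", "A", "B", "C", "C", "C", "C", "C"]
y_pred = ["A", "B", "B", "C", "C", "C", "C", "A"]

# "micro" counts every prediction equally, "macro" averages the per-class
# F1 scores equally, and "weighted" scales each class's F1 by its support.
for avg in ("micro", "macro", "weighted"):
    print(avg, f1_score(y_true, y_pred, average=avg, zero_division=0))
```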
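As a worked sketch of the options, assuming the brick's `classification_metrics` function (listed below) is called directly in Python, a binary Precision evaluation on DataFrame inputs might look like this; the column names and labels are purely illustrative:

```python
import pandas as pd

# Hypothetical churn data: labels live in DataFrame columns.
truth = pd.DataFrame({"actual": ["Churn", "Stay", "Churn", "Stay", "Churn"]})
preds = pd.DataFrame({"model_out": ["Churn", "Stay", "Stay", "Stay", "Churn"]})

options = {
    "metric": "Precision",         # Accuracy | Precision | Recall | F1 Score | ROC-AUC
    "average_strategy": "binary",  # binary | micro | macro | weighted
    "pos_label": "Churn",          # matched to the label type automatically
    "y_true_column": "actual",     # overrides the default "y_true"
    "y_pred_column": "model_out",  # overrides the default "y_pred"
    "verbose": True,
}

score = classification_metrics(y_true=truth, y_pred=preds, options=options)
print(score)  # 1.0 here: both "Churn" predictions are correct
```

The full implementation of the brick is listed below for reference.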
import logging
import numpy as np
import pandas as pd
import polars as pl
import pyarrow as pa
from sklearn.metrics import (
accuracy_score,
precision_score,
recall_score,
f1_score,
roc_auc_score,
)
from coded_flows.types import (
Union,
DataFrame,
ArrowTable,
DataSeries,
List,
NDArray,
Float,
Any,
)
from coded_flows.utils import CodedFlowsLogger
logger = CodedFlowsLogger(name="Classification Metrics", level=logging.INFO)
def _extract_vector(
data: Any, column_name: str, verbose: bool = False, label: str = "Input"
) -> Union[List, np.ndarray]:
"""
Extracts a 1D vector/list from various data structures (DF, Series, Arrow, List, Numpy).
"""
result = None
if isinstance(data, pd.DataFrame):
if column_name not in data.columns:
verbose and logger.error(
f"Column '{column_name}' not found in Pandas DataFrame for {label}."
)
raise ValueError(
f"Column '{column_name}' not found in Pandas DataFrame for {label}."
)
result = data[column_name].tolist()
elif isinstance(data, pl.DataFrame):
if column_name not in data.columns:
verbose and logger.error(
f"Column '{column_name}' not found in Polars DataFrame for {label}."
)
raise ValueError(
f"Column '{column_name}' not found in Polars DataFrame for {label}."
)
result = data[column_name].to_list()
elif isinstance(data, (pa.Table, pa.lib.Table)):
if column_name not in data.column_names:
verbose and logger.error(
f"Column '{column_name}' not found in Arrow Table for {label}."
)
raise ValueError(
f"Column '{column_name}' not found in Arrow Table for {label}."
)
result = data[column_name].to_pylist()
elif isinstance(data, pd.Series):
result = data.tolist()
elif isinstance(data, pl.Series):
result = data.to_list()
elif isinstance(data, np.ndarray):
result = data.flatten().tolist()
elif isinstance(data, list):
result = data
if result is None:
verbose and logger.error(
f"Unsupported data type for {label}: {type(data)}. Expected DataFrame, Series, Array, or List."
)
raise ValueError(
f"Unsupported data type for {label}: {type(data)}. Expected DataFrame, Series, Array, or List."
)
return result
def _cast_pos_label(pos_label_str: str, sample_data: List) -> Any:
"""
Attempts to cast the string input 'pos_label' to the type found in the data.
Useful because the UI returns strings, but data might be Int or Bool.
"""
if not sample_data or pos_label_str is None:
return pos_label_str
elem = next((x for x in sample_data if x is not None), None)
if elem is None:
return pos_label_str
try:
if isinstance(elem, (int, np.integer)) and (not isinstance(elem, bool)):
return int(pos_label_str)
elif isinstance(elem, (bool, np.bool_)):
if pos_label_str.lower() == "true":
return True
if pos_label_str.lower() == "false":
return False
return bool(int(pos_label_str))
except Exception:
pass
return pos_label_str
def classification_metrics(
y_true: Union[DataFrame, ArrowTable, DataSeries, NDArray, List],
y_pred: Union[DataFrame, ArrowTable, DataSeries, NDArray, List],
y_prob: Union[DataFrame, ArrowTable, DataSeries, NDArray, List] = None,
options: dict = None,
) -> Float:
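    """
    Compute the selected classification metric from ground-truth labels and predictions.

    The metric, averaging strategy, positive label, and column names are read from the
    'options' dict (see the Options section for keys and defaults). Returns the score
    as a single float.
    """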
options = options or {}
verbose = options.get("verbose", True)
metric = options.get("metric", "Accuracy")
average_strategy = options.get("average_strategy", "binary")
raw_pos_label = options.get("pos_label", "1")
y_true_col = options.get("y_true_column", "y_true")
y_pred_col = options.get("y_pred_column", "y_pred")
y_prob_col = options.get("y_prob_column", "probability")
score = 0.0
try:
verbose and logger.info(f"Starting metric computation: {metric}")
y_true_vec = _extract_vector(
y_true,
column_name=y_true_col,
verbose=verbose,
label="Real Target (y_true)",
)
y_pred_vec = _extract_vector(
y_pred, column_name=y_pred_col, verbose=verbose, label="Prediction (y_pred)"
)
pos_label = _cast_pos_label(raw_pos_label, y_true_vec)
metric_kwargs = {"zero_division": 0, "average": average_strategy}
if average_strategy == "binary":
metric_kwargs["pos_label"] = pos_label
if metric == "Accuracy":
score = accuracy_score(y_true_vec, y_pred_vec)
elif metric == "Precision":
score = precision_score(y_true_vec, y_pred_vec, **metric_kwargs)
elif metric == "Recall":
score = recall_score(y_true_vec, y_pred_vec, **metric_kwargs)
elif metric == "F1 Score":
score = f1_score(y_true_vec, y_pred_vec, **metric_kwargs)
elif metric == "ROC-AUC":
if y_prob is None:
verbose and logger.error(
"Input 'y_prob' is required for ROC-AUC metric."
)
raise ValueError("Input 'y_prob' is required for ROC-AUC metric.")
y_prob_vec = _extract_vector(
y_prob,
column_name=y_prob_col,
verbose=verbose,
label="Probabilities (y_prob)",
)
if average_strategy == "binary":
y_true_binary = [1 if x == pos_label else 0 for x in y_true_vec]
score = roc_auc_score(y_true_binary, y_prob_vec)
else:
score = roc_auc_score(
y_true_vec, y_prob_vec, multi_class="ovr", average=average_strategy
)
verbose and logger.info(f"Computation successful. {metric}: {score}")
except Exception as e:
verbose and logger.error(f"Error computing classification metrics: {str(e)}")
raise e
return score
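If ROC-AUC is selected, the y prob input must also be provided. A minimal sketch of the binary case, again assuming the function above is called directly in Python:

```python
y_true = [1, 0, 1, 1, 0]            # ground-truth labels
y_pred = [1, 0, 0, 1, 0]            # hard predictions (not used by ROC-AUC itself)
y_prob = [0.9, 0.2, 0.4, 0.8, 0.1]  # predicted probability of the positive class

options = {"metric": "ROC-AUC", "average_strategy": "binary", "pos_label": "1"}
auc = classification_metrics(y_true=y_true, y_pred=y_pred, y_prob=y_prob, options=options)
print(auc)  # 1.0 here: every positive is ranked above every negative
```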
Brick Info
- shap>=0.47.0
- scikit-learn
- numpy
- pandas
- pyarrow
- polars[pyarrow]
- numba>=0.56.0