Apply Target Encoder
Applies a pre-fitted Target Encoder to a specific series or array.
Processing
This brick applies a pre-fitted Target Encoder (or any scikit-learn-compatible encoder that exposes a transform method) to your data. It converts categorical values, such as city names, product categories, or status labels, into numbers using the mapping the encoder learned when it was fitted.
The brick converts the supported input formats (Python lists, Polars or Pandas Series, PyArrow arrays, and NumPy arrays) to a NumPy array, calls the encoder's transform method on it, and returns the result as a Pandas Series named encoded_target.
Inputs
- data
- The raw data you want to transform. This is typically a list or column of categorical values (strings or labels) that needs to be converted into numbers.
- Encoder
- The pre-fitted encoder object containing the logic for the transformation. This object must have already been "trained" or "fitted" in a previous step.
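To make "pre-fitted" concrete, here is a minimal sketch of a mean-target encoder that is fitted in one step and applied in another. The class `MeanTargetEncoder`, its attributes, and the sample data are hypothetical and exist only for illustration; they are not part of the brick or of any library. The only thing the brick relies on is that the object has a `transform` method.

```python
import numpy as np


class MeanTargetEncoder:
    """Hypothetical encoder: maps each category to the mean target seen in fit()."""

    def fit(self, categories, targets):
        categories = np.asarray(categories).ravel()
        targets = np.asarray(targets, dtype=float).ravel()
        # Learn one number per category: the mean of its targets.
        self.mapping_ = {
            str(cat): float(targets[categories == cat].mean())
            for cat in np.unique(categories)
        }
        # Fallback for categories never seen during fitting (an assumption).
        self.default_ = float(targets.mean())
        return self

    def transform(self, categories):
        flat = np.asarray(categories).ravel()
        return np.array([self.mapping_.get(str(c), self.default_) for c in flat])


# Previous step: fit on labelled training data.
encoder = MeanTargetEncoder().fit(
    ["Paris", "Lyon", "Paris", "Lyon"], [1.0, 0.0, 0.0, 0.0]
)

# This step: apply the already-fitted encoder to new values.
encoded = encoder.transform(["Lyon", "Paris", "Nice"])
```

The fitted state (`mapping_` here) is what the Encoder input carries into the brick; passing an unfitted object would leave nothing to look up.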
Inputs Types
| Input | Types |
|---|---|
| data | DataSeries, NDArray, List |
| Encoder | Any |
You can check the list of supported types here: Available Type Hints.
Outputs
- encoded data
- The transformed data. This series contains the numerical representations of your input categories based on the encoder's rules.
Outputs Types
| Output | Types |
|---|---|
| encoded data | DataSeries |
You can check the list of supported types here: Available Type Hints.
Options
The Apply Target Encoder brick exposes the following options:
- Reshape Input to 2D
- Controls the shape of the data fed into the encoder. Some encoders require data shaped as a 2D column (e.g., [[1], [2]]), while others expect a flat 1D list (e.g., [1, 2]). When enabled, 1D input is reshaped into a 2D column; use this if your encoder expects a matrix. When disabled (the default), input with more than one dimension is flattened to 1D.
- Verbose
- Controls whether progress messages are logged during processing. Enabled by default.
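The two shapes discussed under Reshape Input to 2D can be sketched with plain NumPy. This is an illustration of the reshaping only, using made-up values; it mirrors the `reshape(-1, 1)` and `ravel()` calls in the brick's source but does not call the brick itself.

```python
import numpy as np

flat = np.array(["High Risk", "Low Risk", "High Risk"])

# Option enabled: a 1D array becomes a single column of shape (n, 1).
column = flat.reshape(-1, 1)

# Option disabled: input with more than one dimension is flattened back to 1D.
flattened = column.ravel()

print(column.shape, flattened.shape)
```

If an encoder raises a shape-related error on 1D input, enabling the option is the first thing to try.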
Example
Input:
- data: ["High Risk", "Low Risk", "High Risk", "Medium Risk"]
- Encoder: A Target Encoder object previously trained to associate each label with a numerical value.
Output:
- encoded_data: [1, 2, 1, 3]
Explanation:
The brick uses the logic stored in the Encoder input to swap the text categories ("High Risk") with the specific numerical values established during training (1). Note that the first and third items are identical because they share the same category.
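The example can be reproduced outside the brick with a stand-in encoder. `LookupEncoder` and its mapping are hypothetical, chosen only to match the values shown above; a real target encoder would usually return floating-point values (for instance, per-category target means) rather than small integers.

```python
import numpy as np


class LookupEncoder:
    """Hypothetical stand-in for a fitted encoder; the mapping is assumed."""

    def __init__(self, mapping):
        self.mapping = mapping

    def transform(self, values):
        # Accept 1D or 2D input and look up each label in the stored mapping.
        labels = np.asarray(values).ravel().tolist()
        return np.array([self.mapping[label] for label in labels])


encoder = LookupEncoder({"High Risk": 1, "Low Risk": 2, "Medium Risk": 3})
data = ["High Risk", "Low Risk", "High Risk", "Medium Risk"]

encoded_data = encoder.transform(data).tolist()
print(encoded_data)  # [1, 2, 1, 3]
```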
import logging
import numpy as np
import pandas as pd
import polars as pl
import pyarrow as pa
from coded_flows.types import Union, DataSeries, NDArray, List, Any
from coded_flows.utils import CodedFlowsLogger

logger = CodedFlowsLogger(name="Apply Target Encoder", level=logging.INFO)


def apply_target_encoder(
    data: Union[DataSeries, NDArray, List], Encoder: Any, options=None
) -> DataSeries:
    options = options or {}
    verbose = options.get("verbose", True)
    reshape_2d = options.get("reshape_2d", False)
    encoded_data = None
    try:
        verbose and logger.info("Starting Target Encoder application process.")
        if Encoder is None:
            raise ValueError("No Encoder object provided.")

        # Normalize every supported input type to a NumPy array.
        working_data = None
        if isinstance(data, list):
            working_data = np.array(data)
            verbose and logger.info("Detected Input: Python List.")
        elif isinstance(data, (pa.Array, pa.ChunkedArray)):
            if isinstance(data, pa.Array):
                # String arrays cannot be converted zero-copy, so allow a copy.
                working_data = data.to_numpy(zero_copy_only=False)
            else:
                working_data = data.to_numpy()
            verbose and logger.info("Detected Input: PyArrow Array.")
        elif isinstance(data, pl.Series):
            working_data = data.to_numpy()
            verbose and logger.info("Detected Input: Polars Series.")
        elif isinstance(data, pd.Series):
            working_data = data.values
            verbose and logger.info("Detected Input: Pandas Series.")
        elif isinstance(data, np.ndarray):
            working_data = data
            verbose and logger.info("Detected Input: NumPy Array.")
        else:
            raise ValueError(
                f"Unsupported input type: {type(data)}. Expected List, Series, or Array."
            )

        # Match the shape the encoder expects: a 2D column or a flat 1D array.
        if reshape_2d:
            if working_data.ndim == 1:
                working_data = working_data.reshape(-1, 1)
                verbose and logger.info("Reshaped input data to 2D.")
        elif working_data.ndim > 1:
            working_data = working_data.ravel()
            verbose and logger.info("Flattened input data to 1D.")

        verbose and logger.info(f"Applying transform using {type(Encoder).__name__}.")
        if not hasattr(Encoder, "transform"):
            raise AttributeError(
                f"The provided Encoder ({type(Encoder).__name__}) does not have a 'transform' method."
            )
        transformed_data = Encoder.transform(working_data)

        # Collapse a single output column to 1D; keep multi-column output as a list of rows.
        if isinstance(transformed_data, np.ndarray) and transformed_data.ndim > 1:
            if transformed_data.shape[1] == 1:
                transformed_data = transformed_data.ravel()
            else:
                transformed_data = list(transformed_data)
        encoded_data = pd.Series(transformed_data, name="encoded_target")
        verbose and logger.info("Encoder transform applied successfully.")
    except Exception as e:
        # Include the underlying message and re-raise with the original traceback.
        verbose and logger.error(f"Error during encoder application: {e}")
        raise
    return encoded_data
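The post-processing of the encoder output in the function above can be looked at in isolation. The helper name `flatten_encoder_output` and the sample arrays are hypothetical; the sketch only mirrors the branch that runs before the result is wrapped in a Series, assuming the encoder returns a NumPy array.

```python
import numpy as np


def flatten_encoder_output(transformed):
    """Hypothetical helper mirroring the post-processing step shown above."""
    if isinstance(transformed, np.ndarray) and transformed.ndim > 1:
        if transformed.shape[1] == 1:
            # Single output column: collapse to a flat 1D array.
            return transformed.ravel()
        # Several output columns: keep one row (a small array) per input value.
        return list(transformed)
    return transformed


single = flatten_encoder_output(np.array([[0.2], [0.8], [0.2]]))
multi = flatten_encoder_output(np.array([[0.1, 0.9], [0.7, 0.3]]))
```

With a single output column each element of the resulting Series is a scalar; with several columns (for example, an encoder that emits one column per class) each element is an array holding that row's values.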
Brick Info
- shap>=0.47.0
- scikit-learn
- pandas
- pyarrow
- numpy
- numba>=0.56.0
- polars