Round Numbers

Round decimal numbers in one or several columns using round, floor, or ceil.

Round Numbers

Processing

The function rounds numeric values in specified columns of an input data table (Pandas DataFrame, Polars DataFrame, or Arrow Table). Rounding can be controlled by specifying a list of columns, a regex pattern, or applying the operation to all numeric columns if no specific selection is made. Users must choose between standard rounding (round), always rounding down (floor), or always rounding up (ceil). The precision can be specified either by the number of decimal places or by the number of significant digits.

Inputs

data
The input data structure containing columns to be rounded. Supported formats are Pandas DataFrame, Polars DataFrame, or PyArrow Table.
columns (optional)
A list of column names (strings) that should be rounded. If this list is empty and regex pattern is not provided, the operation will apply to all numeric columns.
regex pattern (optional)
A Python regular expression pattern used to select columns for rounding. If provided, specific columns listed in columns are ignored.
rounding mode (optional)
Specifies the type of mathematical rounding to apply (round, floor, or ceil).
significant digits (optional)
If greater than 0, defines the number of significant digits the numbers should be rounded to. This setting overrides decimal places.
decimal places (optional)
If significant digits is 0, numbers are rounded to this specified number of decimal places. Supports negative values.

Inputs Types

Input Types
data DataFrame, ArrowTable
columns List
regex pattern Str
rounding mode Str
significant digits Int
decimal places Int

You can check the list of supported types here: Available Type Hints.

Outputs

result
The data structure containing the processed data with the specified columns rounded according to the chosen mode. The format depends on the 'Output Format' option.

Outputs Types

Output Types
result DataFrame, ArrowTable

You can check the list of supported types here: Available Type Hints.

Options

The Round Numbers brick contains some changeable options:

Columns to Round
A list of columns (names) to which the rounding operation should be applied. If left empty, columns are selected either by the Regex Pattern or all numeric columns are targeted.
Regex Pattern
A Python regular expression used to dynamically select the columns to be rounded. If provided, this overrides the 'Columns to Round' list.
Rounding Mode
Selects the mathematical function for rounding: standard round, round down (floor), or round up (ceil). Defaults to round.
Significant Digits
If greater than 0, numbers are rounded based on the specified number of significant digits. Defaults to 0 (disabled).
Decimal Places
If 'Significant Digits' is 0, numbers are rounded to this specified number of decimal places. Supports negative values for rounding to powers of 10 (e.g., -1 rounds to the nearest 10). Defaults to 0.
Output Format
Specifies the desired format for the returned data structure: pandas DataFrame, polars DataFrame, or arrow Table. Defaults to pandas.
Safe Mode
If enabled, the function suppresses errors caused by non-existent columns (if specified in input) or non-numeric columns targeted for rounding, skipping them instead of raising an error.
Verbose
If enabled, detailed logs about the operation, column handling, and conversion steps are printed.
import logging
import duckdb
import pandas as pd
import polars as pl
import pyarrow as pa
import re
from coded_flows.types import Union, List, DataFrame, ArrowTable, Str, Int

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def _coalesce(*values):
    return next((v for v in values if v is not None), None)


def _sanitize_identifier(identifier):
    """
    Sanitize SQL identifier by escaping special characters.
    Handles double quotes and other problematic characters.
    """
    return identifier.replace('"', '""')


def _is_numeric_type(duckdb_type):
    """
    Check if a DuckDB type is numeric.
    """
    type_lower = duckdb_type.lower()
    return any(
        (
            t in type_lower
            for t in [
                "tinyint",
                "smallint",
                "integer",
                "bigint",
                "int",
                "float",
                "double",
                "real",
                "decimal",
                "numeric",
            ]
        )
    )


def _build_rounding_expression(col, mode, sig_digits, dec_places):
    """
    Build a SQL expression for rounding based on the specified mode.
    """
    if sig_digits > 0:
        if mode == "round":
            expr = f'\n                CASE \n                    WHEN "{col}" = 0 THEN 0\n                    WHEN "{col}" IS NULL THEN NULL\n                    ELSE ROUND("{col}" / POWER(10, FLOOR(LOG10(ABS("{col}"))) + 1 - {sig_digits})) * POWER(10, FLOOR(LOG10(ABS("{col}"))) + 1 - {sig_digits})\n                END\n            '
        elif mode == "floor":
            expr = f'\n                CASE \n                    WHEN "{col}" = 0 THEN 0\n                    WHEN "{col}" IS NULL THEN NULL\n                    ELSE FLOOR("{col}" / POWER(10, FLOOR(LOG10(ABS("{col}"))) + 1 - {sig_digits})) * POWER(10, FLOOR(LOG10(ABS("{col}"))) + 1 - {sig_digits})\n                END\n            '
        else:
            expr = f'\n                CASE \n                    WHEN "{col}" = 0 THEN 0\n                    WHEN "{col}" IS NULL THEN NULL\n                    ELSE CEIL("{col}" / POWER(10, FLOOR(LOG10(ABS("{col}"))) + 1 - {sig_digits})) * POWER(10, FLOOR(LOG10(ABS("{col}"))) + 1 - {sig_digits})\n                END\n            '
    elif mode == "round":
        expr = f'ROUND("{col}", {dec_places})'
    elif mode == "floor":
        if dec_places >= 0:
            expr = f'FLOOR("{col}" * POWER(10, {dec_places})) / POWER(10, {dec_places})'
        else:
            expr = (
                f'FLOOR("{col}" / POWER(10, {-dec_places})) * POWER(10, {-dec_places})'
            )
    elif dec_places >= 0:
        expr = f'CEIL("{col}" * POWER(10, {dec_places})) / POWER(10, {dec_places})'
    else:
        expr = f'CEIL("{col}" / POWER(10, {-dec_places})) * POWER(10, {-dec_places})'
    return expr


def round_numbers(
    data: Union[DataFrame, ArrowTable],
    columns: List = None,
    regex_pattern: Str = None,
    rounding_mode: Str = None,
    significant_digits: Int = None,
    decimal_places: Int = None,
    options=None,
) -> Union[DataFrame, ArrowTable]:
    brick_display_name = "Round Numbers"
    options = options or {}
    verbose = options.get("verbose", True)
    columns = _coalesce(columns, options.get("columns", []))
    regex_pattern = _coalesce(regex_pattern, options.get("regex_pattern", ""))
    fill_all_columns = len(columns) == 0
    rounding_mode = _coalesce(rounding_mode, options.get("rounding_mode", "round"))
    significant_digits = _coalesce(
        significant_digits, options.get("significant_digits", 0)
    )
    decimal_places = _coalesce(decimal_places, options.get("decimal_places", 0))
    output_format = options.get("output_format", "pandas")
    safe_mode = options.get("safe_mode", False)
    result = None
    conn = None
    if not isinstance(columns, list) or not all((isinstance(c, str) for c in columns)):
        verbose and logger.error(
            f"[{brick_display_name}] Invalid columns format! Expected a list."
        )
        raise ValueError("Columns must be provided as a list!")
    if rounding_mode not in ["round", "floor", "ceil"]:
        verbose and logger.error(
            f"[{brick_display_name}] Invalid rounding mode: '{rounding_mode}'. Must be 'round', 'floor', or 'ceil'."
        )
        raise ValueError(
            f"Invalid rounding mode: '{rounding_mode}'. Must be 'round', 'floor', or 'ceil'."
        )
    try:
        round_mode = None
        if regex_pattern:
            round_mode = "regex_pattern"
        elif fill_all_columns:
            round_mode = "all_columns"
        else:
            round_mode = "column_list"
        verbose and logger.info(
            f"[{brick_display_name}] Detected mode: '{round_mode}'. Starting rounding operation with mode: '{rounding_mode}', significant_digits: {significant_digits}, decimal_places: {decimal_places}."
        )
        data_type = None
        if isinstance(data, pd.DataFrame):
            data_type = "pandas"
        elif isinstance(data, pl.DataFrame):
            data_type = "polars"
        elif isinstance(data, (pa.Table, pa.lib.Table)):
            data_type = "arrow"
        if data_type is None:
            verbose and logger.error(
                f"[{brick_display_name}] Input data must be a pandas DataFrame, Polars DataFrame, or Arrow Table"
            )
            raise ValueError(
                "Input data must be a pandas DataFrame, Polars DataFrame, or Arrow Table"
            )
        verbose and logger.info(
            f"[{brick_display_name}] Detected input format: {data_type}."
        )
        conn = duckdb.connect(":memory:")
        conn.register("input_table", data)
        column_info = conn.execute("DESCRIBE input_table").fetchall()
        all_columns = {col[0]: col[1] for col in column_info}
        verbose and logger.info(
            f"[{brick_display_name}] Total columns in data: {len(all_columns)}."
        )
        columns_to_round = []
        if round_mode == "column_list":
            if not safe_mode:
                missing_columns = [col for col in columns if col not in all_columns]
                if missing_columns:
                    verbose and logger.error(
                        f"[{brick_display_name}] Columns not found in data: {missing_columns}"
                    )
                    raise ValueError(f"Columns not found in data: {missing_columns}")
            columns_to_round = [col for col in columns if col in all_columns]
            skipped = len(columns) - len(columns_to_round)
            if safe_mode and skipped > 0:
                skipped_cols = [col for col in columns if col not in all_columns]
                verbose and logger.warning(
                    f"[{brick_display_name}] Safe mode: Skipped {skipped} non-existent columns: {skipped_cols}"
                )
            verbose and logger.info(
                f"[{brick_display_name}] Rounding {len(columns_to_round)} column(s): {columns_to_round}."
            )
        elif round_mode == "regex_pattern":
            try:
                pattern = re.compile(regex_pattern)
                columns_to_round = [
                    col for col in all_columns.keys() if pattern.search(col)
                ]
                if not columns_to_round:
                    verbose and logger.warning(
                        f"[{brick_display_name}] No columns matched regex pattern '{regex_pattern}'. Returning data unchanged."
                    )
                verbose and logger.info(
                    f"[{brick_display_name}] Regex pattern '{regex_pattern}' matched {len(columns_to_round)} columns: {columns_to_round}."
                )
            except re.error as e:
                verbose and logger.error(
                    f"[{brick_display_name}] Invalid regex pattern."
                )
                raise ValueError(f"Invalid regex pattern!")
        elif round_mode == "all_columns":
            columns_to_round = [
                col for col in all_columns.keys() if _is_numeric_type(all_columns[col])
            ]
            verbose and logger.info(
                f"[{brick_display_name}] Rounding all numeric columns: {len(columns_to_round)} columns."
            )
        select_parts = []
        rounded_count = 0
        for col in all_columns.keys():
            sanitized_col = _sanitize_identifier(col)
            if col in columns_to_round:
                col_type = all_columns[col]
                if not _is_numeric_type(col_type):
                    if safe_mode:
                        verbose and logger.warning(
                            f"[{brick_display_name}] Safe mode: Skipping non-numeric column '{col}' (type: {col_type})."
                        )
                        select_parts.append(f'"{sanitized_col}"')
                        continue
                    else:
                        verbose and logger.error(
                            f"[{brick_display_name}] Column '{col}' is not numeric (type: {col_type})."
                        )
                        raise ValueError(
                            f"Column '{col}' is not numeric (type: {col_type}). Cannot apply rounding."
                        )
                try:
                    rounding_expr = _build_rounding_expression(
                        sanitized_col, rounding_mode, significant_digits, decimal_places
                    )
                    select_parts.append(f'{rounding_expr} AS "{sanitized_col}"')
                    if significant_digits > 0:
                        verbose and logger.info(
                            f"[{brick_display_name}] Column '{col}' (type: {col_type}): applying {rounding_mode} with {significant_digits} significant digits."
                        )
                    else:
                        verbose and logger.info(
                            f"[{brick_display_name}] Column '{col}' (type: {col_type}): applying {rounding_mode} with {decimal_places} decimal places."
                        )
                    rounded_count += 1
                except Exception as e:
                    if safe_mode:
                        verbose and logger.warning(
                            f"[{brick_display_name}] Safe mode: Skipping column '{col}' due to error: {str(e)}"
                        )
                        select_parts.append(f'"{sanitized_col}"')
                    else:
                        verbose and logger.error(
                            f"[{brick_display_name}] Error processing column '{col}'."
                        )
                        raise
            else:
                select_parts.append(f'"{sanitized_col}"')
        if rounded_count == 0:
            verbose and logger.warning(
                f"[{brick_display_name}] No columns were rounded. Returning data unchanged."
            )
            result = data
        else:
            select_clause = ", ".join(select_parts)
            query = f"SELECT {select_clause} FROM input_table"
            verbose and logger.info(
                f"[{brick_display_name}] Executing query to round numbers."
            )
            if output_format == "pandas":
                result = conn.execute(query).df()
                verbose and logger.info(
                    f"[{brick_display_name}] Converted result to pandas DataFrame."
                )
            elif output_format == "polars":
                result = conn.execute(query).pl()
                verbose and logger.info(
                    f"[{brick_display_name}] Converted result to Polars DataFrame."
                )
            elif output_format == "arrow":
                result = conn.execute(query).fetch_arrow_table()
                verbose and logger.info(
                    f"[{brick_display_name}] Converted result to Arrow Table."
                )
            else:
                verbose and logger.error(
                    f"[{brick_display_name}] Unsupported output format: {output_format}"
                )
                raise ValueError(f"Unsupported output format: {output_format}")
            verbose and logger.info(
                f"[{brick_display_name}] Rounding operation completed successfully. Rounded {rounded_count} column(s)."
            )
    except Exception as e:
        verbose and logger.error(
            f"[{brick_display_name}] Error during rounding operation: {str(e)}"
        )
        raise
    finally:
        if conn is not None:
            conn.close()
    return result

Brick Info

version v0.1.3
python 3.10, 3.11, 3.12, 3.13
requirements
  • pandas
  • polars[pyarrow]
  • duckdb
  • pyarrow