Loading learning content...
In the world of database programming, user-defined functions (UDFs) represent one of the most powerful abstraction mechanisms available to database developers. While SQL provides an extensive library of built-in functions—from mathematical operations like ROUND() and ABS() to string manipulations like SUBSTRING() and UPPER()—the real power emerges when you can encapsulate your own business logic into reusable, callable units that integrate seamlessly with the SQL language itself.
Consider this scenario: Your e-commerce database needs to calculate product discounts based on complex rules involving customer loyalty tier, purchase history, seasonal promotions, and inventory levels. Without UDFs, you would repeat this intricate calculation logic in every query that needs it—creating a maintenance nightmare where a single business rule change requires updates across dozens of queries and views. With UDFs, you write the logic once, test it thoroughly, and invoke it anywhere with a simple function call like CalculateDiscount(customer_id, product_id, quantity).
By the end of this page, you will understand what user-defined functions are and why they exist, master the fundamental syntax for creating functions across major database systems, learn how to invoke functions in various SQL contexts, and appreciate the architectural role functions play in database application design. This knowledge forms the foundation for all subsequent pages in this module.
A user-defined function (UDF) is a named, stored routine that accepts zero or more input parameters, performs a computation or data retrieval operation, and returns a value or a result set. Unlike stored procedures, which are primarily imperative programs that can modify database state and produce multiple result sets, functions are designed to be deterministic computational units that can be embedded directly within SQL expressions.
The conceptual model of a function follows the mathematical definition: a mapping from input values to output values. When you invoke a function with the same inputs, you should—in most cases—receive the same output. This predictability is what allows the database optimizer to reason about functions and potentially cache their results or reorder operations for efficiency.
At their core, UDFs extend the expression language of SQL. Just as you can write total * 0.1 to calculate a 10% tax, you can write CalculateTax(total, region) to apply region-specific tax rules. The function call is syntactically an expression and can appear wherever that expression type is valid.
Before diving into creation syntax, it's essential to understand the landscape of user-defined functions. Database systems typically support several categories of functions, each with distinct characteristics, use cases, and performance implications. Understanding these categories guides you in choosing the right function type for each requirement.
The primary classification axis is the return type: what kind of value does the function produce when invoked?
| Function Type | Returns | Usage Context | Key Characteristic |
|---|---|---|---|
| Scalar Function | Single value (INT, VARCHAR, DATE, etc.) | SELECT, WHERE, JOIN, CASE expressions | Row-by-row evaluation in queries |
| Table-Valued Function (TVF) | Result set (virtual table) | FROM clause, JOIN operations | Returns rows that can be queried |
| Inline TVF | Single SELECT statement result | FROM clause (inline expansion) | Optimizer can inline into calling query |
| Multi-Statement TVF | Populated table variable | FROM clause | More flexible logic, less optimizable |
| Aggregate Function | Single value from multiple rows | SELECT with GROUP BY | Custom aggregation logic (advanced) |
The distinction matters for optimization:
The database query optimizer treats different function types very differently. Inline table-valued functions can often be "flattened" into the calling query, allowing the optimizer to consider the entire query plan holistically. Scalar functions, by contrast, are typically executed row-by-row, which can create performance bottlenecks in large result sets. Multi-statement table-valued functions are treated as "black boxes" that the optimizer cannot peer into.
This module dedicates separate pages to scalar functions and table-valued functions because their creation, usage patterns, and optimization considerations differ substantially.
When designing a function, start by asking: 'What am I returning?' If it's a single computed value (price, discount, formatted string), use a scalar function. If it's a set of rows (filtered subset, transformed data), use a table-valued function. This decision fundamentally shapes how the function integrates with SQL queries.
Creating a user-defined function follows a consistent structural pattern across database systems, though syntax details vary. Understanding the anatomy of a function definition helps you write clear, maintainable, and portable code.
A function definition consists of several key components that specify its identity, interface, behavior, and metadata:
123456789101112131415161718192021
-- Generic function structure (SQL Server syntax)CREATE FUNCTION schema_name.function_name( -- Parameter declarations @parameter1 datatype, @parameter2 datatype = default_value, -- Optional default ... )RETURNS return_datatype -- What the function returns[WITH function_options] -- Metadata and behavior modifiersASBEGIN -- Function body: local variables, logic, control flow DECLARE @result return_datatype; -- Computation logic here SET @result = ...; -- Every function MUST return a value RETURN @result;END;CREATE OR ALTER (SQL Server 2016+) or CREATE OR REPLACE (PostgreSQL, Oracle) for idempotent definitions.Unlike some programming languages that allow implicit null returns, SQL functions typically require explicit RETURN statements on every execution path. A function that might not return a value will either fail at creation time or produce runtime errors.
Let's create a practical scalar function that demonstrates the key concepts. We'll build a function that calculates the age of a person based on their birth date—a common requirement in many applications that involves date arithmetic and edge-case handling.
This seemingly simple task actually requires careful consideration: we must handle future birth dates, account for whether the birthday has occurred this year, and deal with potential NULL inputs.
123456789101112131415161718192021222324252627282930
-- Calculate age in years from birth dateCREATE OR ALTER FUNCTION dbo.CalculateAge( @BirthDate DATE)RETURNS INTWITH SCHEMABINDING -- Prevents changes to referenced objectsASBEGIN -- Handle NULL input gracefully IF @BirthDate IS NULL RETURN NULL; -- Handle future birth dates (invalid data) IF @BirthDate > GETDATE() RETURN NULL; DECLARE @Age INT; DECLARE @Today DATE = CAST(GETDATE() AS DATE); -- Calculate base age from year difference SET @Age = DATEDIFF(YEAR, @BirthDate, @Today); -- Adjust if birthday hasn't occurred this year -- Compare month and day only IF DATEADD(YEAR, @Age, @BirthDate) > @Today SET @Age = @Age - 1; RETURN @Age;END;Notice the common patterns across all implementations:
Once created, functions become first-class citizens of the SQL expression language. You can invoke them anywhere an expression of the appropriate type is valid. Understanding the various invocation contexts helps you leverage functions effectively throughout your database applications.
1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768697071
-- Function invocation demonstrations (using CalculateAge example) -- 1. In SELECT list: computed column for each rowSELECT employee_id, first_name, last_name, birth_date, dbo.CalculateAge(birth_date) AS ageFROM employees; -- 2. In WHERE clause: filtering based on computed valueSELECT employee_id, first_name, last_nameFROM employeesWHERE dbo.CalculateAge(birth_date) >= 18 AND dbo.CalculateAge(birth_date) < 65; -- 3. In ORDER BY: sorting by computed valuesSELECT employee_id, first_name, hire_dateFROM employeesORDER BY dbo.CalculateAge(birth_date) DESC; -- 4. In CASE expressions: conditional logicSELECT employee_id, first_name, CASE WHEN dbo.CalculateAge(birth_date) < 30 THEN 'Young' WHEN dbo.CalculateAge(birth_date) < 50 THEN 'Mid-Career' ELSE 'Senior' END AS age_categoryFROM employees; -- 5. In JOIN conditionsSELECT e.employee_id, e.first_name, b.benefit_planFROM employees eJOIN age_based_benefits b ON dbo.CalculateAge(e.birth_date) BETWEEN b.min_age AND b.max_age; -- 6. In computed columns (persistent or virtual)ALTER TABLE employeesADD age AS dbo.CalculateAge(birth_date); -- Virtual computed column -- 7. In CHECK constraints (with SCHEMABINDING)ALTER TABLE employeesADD CONSTRAINT chk_adult CHECK (dbo.CalculateAge(birth_date) >= 18); -- 8. In viewsCREATE VIEW v_employee_ages ASSELECT employee_id, first_name || ' ' || last_name AS full_name, dbo.CalculateAge(birth_date) AS current_age, birth_dateFROM employees; -- 9. In INSERT statementsINSERT INTO audit_log (employee_id, age_at_action, action_date)SELECT employee_id, dbo.CalculateAge(birth_date), GETDATE()FROM employees WHERE status = 'ACTIVE'; -- 10. In UPDATE statementsUPDATE employeesSET age_category = CASE WHEN dbo.CalculateAge(birth_date) < 30 THEN 'JUNIOR' ELSE 'SENIOR'END;When using scalar functions in WHERE clauses on large tables, be aware that the function is typically evaluated for every row that reaches that point in query execution. This can be expensive. For frequently filtered calculations, consider persisted computed columns or indexed views that pre-compute the function result.
Parameters form the interface contract between a function and its callers. Understanding parameter mechanics—including default values, type coercion, and parameter ordering—is essential for designing robust, user-friendly functions.
12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758596061
-- Comprehensive parameter demonstration -- Function with multiple parameters and defaultsCREATE OR ALTER FUNCTION dbo.FormatCurrency( @Amount DECIMAL(18, 4), @CurrencyCode CHAR(3) = 'USD', -- Default value @IncludeSymbol BIT = 1, -- Default to true @DecimalPlaces INT = 2 -- Default precision)RETURNS NVARCHAR(50)ASBEGIN DECLARE @Result NVARCHAR(50); DECLARE @Symbol NVARCHAR(5); -- Determine currency symbol SET @Symbol = CASE @CurrencyCode WHEN 'USD' THEN '$' WHEN 'EUR' THEN '€' WHEN 'GBP' THEN '£' WHEN 'JPY' THEN '¥' WHEN 'INR' THEN '₹' ELSE @CurrencyCode + ' ' END; -- Format the amount with specified decimal places SET @Result = FORMAT(@Amount, 'N' + CAST(@DecimalPlaces AS VARCHAR)); -- Prepend symbol if requested IF @IncludeSymbol = 1 SET @Result = @Symbol + @Result; RETURN @Result;END; -- Invocation examples: -- Using all defaults (USD, with symbol, 2 decimals)SELECT dbo.FormatCurrency(1234.5678);-- Result: $1,234.57 -- Specifying currency onlySELECT dbo.FormatCurrency(1234.5678, 'EUR');-- Result: €1,234.57 -- Named parameters for clarity (SQL Server)SELECT dbo.FormatCurrency( @Amount = 1234.5678, @CurrencyCode = 'GBP', @IncludeSymbol = 0, @DecimalPlaces = 4);-- Result: 1,234.5678 -- Skipping middle parameters with named syntaxSELECT dbo.FormatCurrency( @Amount = 1234.5, @DecimalPlaces = 0 -- Skip CurrencyCode and IncludeSymbol);-- Result: $1,235The RETURN statement is the culmination of function execution—it terminates the function and provides the computed value back to the caller. While conceptually simple, the RETURN statement has nuances that affect function design and behavior.
12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758
-- Pattern 1: Single return at end (recommended for clarity)CREATE FUNCTION dbo.CalculateDiscount(@Price DECIMAL(10,2), @Quantity INT)RETURNS DECIMAL(10,2)ASBEGIN DECLARE @Discount DECIMAL(10,2) = 0; IF @Quantity >= 100 SET @Discount = @Price * 0.20; ELSE IF @Quantity >= 50 SET @Discount = @Price * 0.10; ELSE IF @Quantity >= 10 SET @Discount = @Price * 0.05; RETURN @Discount; -- Single exit pointEND; -- Pattern 2: Early returns for guard clauses (acceptable)CREATE FUNCTION dbo.SafeDivide(@Numerator DECIMAL(18,6), @Denominator DECIMAL(18,6))RETURNS DECIMAL(18,6)ASBEGIN -- Guard clause: return NULL for division by zero IF @Denominator = 0 RETURN NULL; -- Guard clause: return NULL for NULL inputs IF @Numerator IS NULL OR @Denominator IS NULL RETURN NULL; -- Main logic RETURN @Numerator / @Denominator;END; -- Pattern 3: Return in conditional branches (use carefully)CREATE FUNCTION dbo.GetTaxRate(@StateCode CHAR(2))RETURNS DECIMAL(5,4)ASBEGIN -- Each branch must return IF @StateCode = 'CA' RETURN 0.0725; IF @StateCode = 'TX' RETURN 0.0625; IF @StateCode = 'NY' RETURN 0.0800; IF @StateCode = 'FL' RETURN 0.0600; -- Default case (required!) RETURN 0.0000;END; -- Anti-pattern: Missing return path (ERROR or unpredictable)-- CREATE FUNCTION dbo.BrokenFunction(@Input INT)-- RETURNS INT-- AS-- BEGIN-- IF @Input > 0-- RETURN @Input * 2;-- -- What if @Input <= 0? No return! This is an ERROR.-- END;In scalar functions, RETURN provides the value; you cannot use SELECT to output results. SELECT statements in scalar functions are only for assigning to variables. Table-valued functions have different semantics where SELECT populates the returned table.
Functions support various metadata options that affect their behavior, security, and optimization. These options tell the database engine important facts about how the function behaves, enabling better query plans and enforcing safety constraints.
| Option | SQL Server | PostgreSQL | Oracle | Effect |
|---|---|---|---|---|
| Deterministic | WITH SCHEMABINDING* | IMMUTABLE | DETERMINISTIC | Same inputs always produce same output; enables caching |
| Stability (reads data) | STABLE | Returns same result within a transaction | ||
| Volatile | VOLATILE | Can return different results on each call | ||
| Schema binding | WITH SCHEMABINDING | Prevents modification of referenced objects | ||
| Parallel safe | PARALLEL SAFE | PARALLEL_ENABLE | Can be executed in parallel operations | |
| Security | WITH ENCRYPTION | Hides function source code | ||
| NULL handling | RETURNS NULL ON NULL INPUT | RETURNS NULL ON NULL INPUT | Auto-return NULL if any input is NULL |
12345678910111213141516171819202122232425262728293031323334353637
-- SQL Server: Schema binding for computed columns and indexed viewsCREATE FUNCTION dbo.ComputeHash(@Input NVARCHAR(MAX))RETURNS VARBINARY(64)WITH SCHEMABINDING, -- Required for indexed views RETURNS NULL ON NULL INPUT -- Automatic NULL propagationASBEGIN RETURN HASHBYTES('SHA2_256', @Input);END; -- PostgreSQL: Full volatility and parallel specificationCREATE OR REPLACE FUNCTION calculate_compound_interest( principal NUMERIC, rate NUMERIC, periods INTEGER)RETURNS NUMERICLANGUAGE SQLIMMUTABLE -- Always same result for same inputsPARALLEL SAFE -- Can run in parallel queriesRETURNS NULL ON NULL INPUT -- NULL in, NULL outAS $$ SELECT principal * POWER(1 + rate, periods);$$; -- Oracle: Optimizer hints through DETERMINISTICCREATE OR REPLACE FUNCTION calculate_tax( p_amount IN NUMBER, p_rate IN NUMBER) RETURN NUMBERDETERMINISTIC -- Enables result cachingPARALLEL_ENABLE -- Safe for parallel DMLASBEGIN RETURN p_amount * p_rate;END;/Only mark functions as deterministic/immutable if they truly are. A function that reads from tables, uses GETDATE(), generates random values, or depends on session settings is NOT deterministic. Falsely declaring determinism can cause incorrect query results due to optimizer caching.
Error handling in functions differs from stored procedures because functions cannot alter the flow of the calling query—they must return a value. This constraint shapes how errors are managed: through defensive validation, NULL returns for invalid inputs, or allowing errors to propagate up to the caller.
12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758
-- Strategy 1: Defensive NULL returns (safest for queries)CREATE FUNCTION dbo.ParseInteger(@Input NVARCHAR(100))RETURNS INTASBEGIN -- Validate input format before conversion IF @Input IS NULL OR @Input = '' RETURN NULL; -- Check for valid numeric characters only IF @Input LIKE '%[^0-9-]%' RETURN NULL; -- Check for misplaced minus sign IF CHARINDEX('-', @Input) > 1 RETURN NULL; -- Safe conversion (TRY_CAST would be simpler in SQL Server 2012+) BEGIN TRY RETURN CAST(@Input AS INT); END TRY BEGIN CATCH RETURN NULL; -- Return NULL on conversion failure END CATCHEND; -- Strategy 2: Using TRY_CAST/TRY_CONVERT (SQL Server 2012+)CREATE FUNCTION dbo.SafeParseDecimal(@Input NVARCHAR(100))RETURNS DECIMAL(18, 4)ASBEGIN RETURN TRY_CAST(@Input AS DECIMAL(18, 4)); -- TRY_CAST returns NULL on failure instead of errorEND; -- PostgreSQL: Using exception handlingCREATE OR REPLACE FUNCTION safe_parse_json(input TEXT)RETURNS JSONBLANGUAGE plpgsqlAS $$BEGIN RETURN input::JSONB;EXCEPTION WHEN OTHERS THEN -- Return NULL or empty JSON on parse failure RETURN NULL;END;$$; -- Strategy 3: Error propagation (let caller handle)CREATE FUNCTION dbo.StrictDivide(@Num DECIMAL(18,6), @Denom DECIMAL(18,6))RETURNS DECIMAL(18,6)ASBEGIN -- No validation: division by zero will raise error -- Caller must handle or prevent invalid inputs RETURN @Num / @Denom;END;This page has established the foundational concepts of user-defined functions in SQL. You now understand what UDFs are, why they exist, and how to create and invoke them. This knowledge prepares you for the deeper dives into specific function types in subsequent pages.
You now have a solid foundation in user-defined functions. The next page dives deep into scalar functions—the most common function type—covering advanced patterns, performance considerations, and real-world use cases that will make you an effective function developer.