The Cardinal Rules of Rust - Understanding Type Cardinality for Flawless Data Models

Algebraic data types (ADTs) are a cornerstone of type theory in programming, providing a powerful framework for constructing and manipulating complex data structures. This post delves into the algebraic nature of Rust's type system, emphasizing the critical role of understanding a type's cardinality — that is, the total number of values a type can represent. This understanding is crucial for selecting the most appropriate data model, one that ensures impossible states are inherently unrepresentable. We explore the concepts of sum and product types, and highlight the intriguing way in which functions can be seen as exponentials within this algebraic framework.

Motivation: Eliminating Illegal States in Data Modeling

In the world of software development, grouping related data into compound structures is a common task across many programming languages. Rust, known for its strong type system, offers structs as one of the primary to group data. However, without a careful understanding of Rust’s Algebraic Data Types (ADTs), developers might inadvertently introduce illegal states into their data models, creating room for bugs and logical inconsistencies.

Consider the scenario of modeling a shipment process for orders with three distinct states: Pending, Shipped, or Canceled. Each state has associated data that is relevant only when the order is in that particular state. For example, the Pending state might include a flag indicating whether the item has been paid for, the Shipped state would have a tracking ID, and the Canceled state would include a cancellation reason.

A naive approach to modeling this in Rust might look something like this:

struct Order {
    status: Status, // The current status: Pending, Shipped, or Canceled
    paid: Option<bool>, // Some(true) or Some(false) if Pending, otherwise None
    tracking_id: Option<u32>, // Some(tracking_id) if Shipped, otherwise None
    cancellation_reason: Option<String>, // Some(reason) if Canceled, otherwise None
}

Here, Option is used to indicate that the fields paid, tracking_id, and cancellation_reason may or may not be present, depending on the order’s current state. This approach might seem reasonable, especially for those coming from programming languages that don’t support Algebraic Data Types (ADTs).

However, this model has a significant flaw: It introduces the possibility of illegal states that don’t make sense in the context of our domain. Consider the following instance of an Order:

let order = Order {
    status: Status::Pending,
    paid: None,
    tracking_id: None,
    cancellation_reason: Some("I didn't even cancel yet"),
}

It represents an order in the Pending state that has a cancellation reason — a clear violation of our business logic, as only orders in the Canceled state should have a cancellation reason. Yet, this code compiles and is valid from the point of view of the type checker. This discrepancy highlights the problem of illegal states, which are states that are representable in code but invalid according to the domain’s rules.

Are there more such illegal states? Absolutely! Given that the status field has three possibilities, and the paid, tracking_id, and cancellation_reason fields can each be present or absent (not considering the range of values for bool, u32, and String), we find ourselves with a total of 3×2×2×2=24 possible states. Out of these, only 3 represent valid states according to our business logic, leaving us with 21 potential illegal states!

This underscores the importance of understanding for how many states our data model allows. The number of possible states a type can have is called the cardinality of the type and it plays a crucial role in ensuring that our data model is tightly aligned with the domain it represents. By misaligning the cardinality of our data structure with the actual states needed to model our domain accurately, we increase the complexity and decrease the robustness of our software.

To tackle these challenges head-on, we must delve deeper into Rust’s type system, specifically its support for Algebraic Data Types, to construct a more appropriate data model for our shipment process scenario. One that eliminates all illegal states and uses the compiler to enforce our business logic at compile time, making our software safer, more robust, and easier to maintain.

Understanding Cardinality with Basic Data Types

Cardinality, in the context of type theory, refers to the number of possible values that a given type can hold. Let’s examine cardinality at the example of some basic data types:

  • bool -> 2, can hold two values true or false
  • u8 -> 256, can represent values from 0 to 255
  • i16 -> 65536, spans from -32768 to 32767
  • Option<bool> -> 3, possible values are Some(true), Some(false), and None
  • String -> infinite, can hold any sequence of UTF-8 characters of practically any length. Therefore, its cardinality is considered infinite for all practical purposes.
  • () -> 1: called the unit type, it represents a value that holds no information. There’s only one possible value () for this type, so its cardinality is 1.
  • ! -> 0: called the never type, it represents computations that never complete. Since a value of this type can never exist (because the computation does not terminate), its cardinality is 0.

Understanding the cardinality of types is more than an academic exercise; it has practical implications for the way we design and implement our software. Writing down the cardinality of basic data types lays the groundwork for understanding how more complex types, such as structs and enums, derive their cardinality. By considering the cardinality of the types we use, we can make informed decisions about our data models that prevent invalid states and reduce the possibility of errors.

For example, when choosing between representing an optional Boolean value as a bool or an Option<bool>, understanding that the latter increases the cardinality from 2 to 3 and allows for the representation of an additional state (the absence of a value) can be crucial in accurately modeling our domain.

Similarly, the infinite cardinality of the string type means it can represent a vast range of values, making it a flexible choice for textual data. However, this flexibility comes with the responsibility of ensuring that the strings we use conform to our application’s rules, such as length constraints or format requirements. For example, we could choose the strings "Pending", "Shipped", and "Canceled" to model the status of our shipment process, but this would be a poor choice, as there are an infinite number of other strings that do not represent any meaningful status.

Product Types: The Multiplication of Types

We’ve already seen how structs can be used to group related data together, but there’s a deeper, algebraic significance to their use, especially when considering the cardinality of types. As discussed above, the cardinality of a struct is determined by the product of the cardinalities of its fields. Therefore a struct is considered a product type.

To illustrate the application of structs where the multiplication of cardinalities is desired, consider a scenario involving three Boolean fields: is_active, has_access, and is_admin. Each Boolean field possesses a cardinality of 2, encapsulating the binary state of true or false. When these fields are combined within a struct to model user permissions, the resulting structure represents a product type:

struct UserPermissions {
    is_active: bool,
    has_access: bool,
    is_admin: bool,
}

The cardinality of UserPermissions is the product of the cardinalities of its fields, calculated as 2×2×2=8. This means there are eight possible combinations of permissions states that a user can have, ranging from having no permissions at all to having all possible permissions. This model is effective for representing discrete, combinable states where the presence of one state does not exclude the presence of others.

A product type, like a struct, where the components are accessed through named fields is more generally known as a record type. In addition to records, Rust offers another product type, the tuple type. Tuples are similar to structs in that they group multiple values together. However, unlike structs, which name each field, tuple components are accessed by their index, offering a more succinct syntax. Despite this syntactic difference, tuples share the fundamental property of structs: the cardinality of a tuple is the product of the cardinalities of its components.

Ironically, in Rust, tuples are structural types but structs are not. This means that a struct’s type checking is based on its name rather than its structure. More about what this means in Nominal Types vs Structural Types.

Sum Types: The Addition of Types

To address the limitations of our initial Order type, we refactor our code to use an enum instead of a struct, encapsulating each state’s unique data within its variants. This approach ensures that each variant aligns with one of the legally defined states:

enum Order {
    Pending { paid: bool },
    Shipped { tracking_id: u32 },
    Cancelled { cancellation_reason: String },
}

In this revised model, the Order enum has three variants, each with a specific type of associated data. The cardinality of an enum, more generally known as a tagged union, is the sum of the cardinalities of its variants. Given that each variant is now explicitly defined with no optional fields, the cardinality is straightforwardly calculated as 3 (1 for each variant - focusing on the presence rather than the range of values), perfectly aligning with the three legitimate states of an order in our domain logic.

Because the resulting cardinality of an enum is the sum of its variants, enums are considered to be sum types. Each enum variant can hold different types and amounts of data, but an enum value can only be one of its variants at any given time. This contrasts with product types, such as structs, where values comprise a fixed set of fields, and the presence of each field is guaranteed.

The distinction between sum and product types is analogous to addition and multiplication in algebra. A Sum type’s cardinality is the sum of its variants’ cardinalities, reflecting the exclusive nature of its state representation. In contrast, the cardinality of product types is the product of the cardinalities of its fields, indicative of the combinatory nature of its structure.

Like structs, enums in Rust are nominal types. This means that even if two enums have identical variants, they are treated as distinct types by the Rust compiler. For more see Nominal Types vs Structural Types.

Sum types, through enums in Rust, allow us to express exclusive states or variants, whereas product types, like structs, enable us to bundle related data together. Combining these types, we can construct data models that mirror the nuanced relationships and constraints in the domain.

To illustrate, let’s expand our Order example to include additional fields such as id and customer_id, which are relevant to all orders regardless of their status. This necessitates a blend of sum and product types to encapsulate the aspects of an order while maintaining the correct cardinality that reflects the legal states of the domain.

struct Order {
    id: u32,
    customer_id: u32,
    status: OrderStatus,
}

enum OrderStatus {
    Pending { paid: bool },
    Shipped { tracking_id: u32 },
    Canceled { reason: String },
}

In this expanded model, Order is a struct, a product type, representing the combination of an order’s universal attributes (id and customer_id) with its state (status). The status field is an enum, a sum type, encapsulating the exclusive states an order can be in, each with its state-specific data.

The cardinality of the Order struct is a product of the cardinalities of its fields. Given that id and customer_id are u32, focusing on the presence rather than the range of values, and the status has three distinct states, the critical aspect of cardinality here pertains to the number of legal states represented by the OrderStatus enum. This model succinctly captures the domain’s requirements, ensuring that each order is in one and only one of the three legal states, with its associated data correctly represented.

Designing data models that accurately represent the domain requires a thoughtful analysis of the possible legal states within that domain. Before diving into the implementation, a clear understanding of these states and their cardinalities guides the decision between employing sum and product types. This approach ensures that the data model not only captures the essence of the domain but also leverages Rust’s type system to enforce constraints at compile time, thereby preventing illegal states.

Digression: Exploring the Unit and Never Types

In Rust’s type system, two particularly interesting types are the unit type () and the never type !. These types play unique roles within the language’s type algebra, serving as the identity elements for product and sum types, respectively.

The Unit Type ()

The unit type in Rust, denoted by (), represents the absence of a meaningful value and is analogous to the concept of void in other programming languages. However, unlike void, () is a type that has exactly one value: itself. This makes it useful in scenarios where a type is required syntactically, but no meaningful value is needed.

We can define our own unit type as follows:

struct Unit;

This custom Unit struct serves the same purpose as Rust’s built-in unit type (), holding no data and having a single value.

In the context of product types, the unit type acts as the identity element. This means that the product of () (or our custom Unit) and any other type T results in a type with the same cardinality as T. For instance, a struct combining Unit and bool effectively has the same cardinality as bool alone, because Unit contributes nothing to the total count of possible values.

// Unit × bool = 1 × 2 = 2
struct Product((), bool); // has cardinality of 2, same as bool
let prod1 = Product((), true);  // 1st member
let prod2 = Product((), false); // 2nd member

// Unit + bool = 1 + 2 = 3
enum Sum { // has cardinality of 3
    A(()),
    B(bool),
}
let sum1 = Sum::A(());    // 1st member
let sum2 = Sum::B(true);  // 2nd member
let sum3 = Sum::B(false); // 3rd member

The Never Type !

The never type, !, represents computations that never return because they either loop {} indefinitely or terminate the program. As such, it’s impossible to have a value of type !, making its cardinality 0.

Defining a custom never type can be illustrated as follows:

enum Never {}

This Never enum cannot be instantiated because it has no variants, perfectly mimicking the behavior of Rust’s built-in ! type.

In the context of sum types, ! acts as the identity element. The sum of ! (or our custom Never) with any type T has the same cardinality as T because adding an impossible state doesn’t introduce new possible values to the type.

// Bool + Never = 2 + 0 = 2
enum Sum { //  has cardinality of 2
    A(bool),
    B(!),
}
let sum1 = Sum::A(true);  // 1st member
let sum2 = Sum::A(false); // 2nd member
let _ = Sum::B(loop {}); // legal but Sum::B will never be constructed

Moreover, the never type is the absorbing element in the context of product types. Any product involving ! has a cardinality of 0, reflecting the fact that it’s impossible to create a value of such a type.

// Never × bool = 0 × 2 = 0
struct Product(!, bool); // has a cardinality of 0 because impossible to create a member
let prod = Product(loop {}, true); // legal but Product will never be constructed

Note, currently you need to enable the #[feature(never_type)] flag to work with ! as a full-fledged type.

Algebraic Properties in Rust’s Type System

The unit () and never ! types exemplify the algebraic nature of Rust’s type system, mirroring identity and absorbing elements from mathematical algebra. Their roles as identity elements for product and sum types, respectively, not only enrich the language’s type theory but also offer practical tools for expressing a wide range of computational concepts and ensuring type safety in Rust programs. This mathematical foundation underscores the elegance and power of Rust’s approach to type management, allowing for precise and expressive data modeling.

Functions as Exponential Types

What is the cardinality of function types? Diving futher into the algebraic properties of Rust’s type system, we encounter a fascinating concept where function types can be seen as exponentials. This perspective arises from how functions map inputs to outputs, presenting a relationship that can be numerically expressed in terms of cardinalities.

Consider the example of a TrafficLight enum with three variants: Red, Yellow, and Green.

enum TrafficLight {
    Red,
    Yellow,
    Green,
}

The cardinality of TrafficLight is three, corresponding to its three distinct variants.

Now, what would be the cardinality of the function type fn(TrafficLight) -> bool? Let’s delve into this by examining a particular instance of thefn(TrafficLight) -> bool function type: The can_i_drive function, which decides whether driving is allowed based on the color of a traffic light:

fn can_i_drive(color: TrafficLight) -> bool {
    match color {
        TrafficLight::Red => false,
        TrafficLight::Yellow => false,
        TrafficLight::Green => true,
    }
}

Now the question is: In how many other ways could we possibly interpret a traffic light? Specifically, how many distinct (pure) functions of type TrafficLight -> bool can we define? Recall, that two pure functions are considered the same if they yield identical outputs for every possible input.

For each variant of TrafficLight, the function can either return true or false. Since there are three variants in TrafficLight, and each can map to 2 possible outcomes (true or false), the number of different functions from TrafficLight to bool is 2³ = 8. These functions represent all possible combinations for mappings from the three traffic light states to true or false.

TFTFTFTFTFTFTFfn can_i_drive()TF
Shows all eight possible mappings from TrafficLight -> bool. At the 7th position we can see the can_i_drive we defined above.

Exploring the concept in reverse, from bool to TrafficLight, we’re looking at functions that could map a Boolean value to a traffic light color. Given that a Boolean has 2 possible values and there are 3 traffic light colors, the number of such functions would be 3² = 9, considering each Boolean value can map to any of the three colors.

If a type A has a possible values, and a type B has b possible values, then the total number of functions from A to B is bᵃ. This relationship showcases functions as exponential types because the total number of distinct functions reflects an exponential growth based on the cardinalities of the input and output types.

Understanding functions as exponentials enriches our comprehension of the algebraic structures present within programming languages like Rust. It illuminates the depth of type theory’s application in practical programming, providing a mathematical foundation that underlies the design and implementation of software.

Conclusion

In this exploration of Algebraic Data Types (ADTs) in Rust, we’ve seen how understanding and applying the concept of cardinality is crucial for effective data modeling. The journey from the basic data types to sum and product types, and even beyond to the unique cases of the unit and never types, underscores the richness of Rust’s type system. By revisiting the fundamental principles of ADTs — through examples such as the Order struct, the TrafficLight enum, and the intriguing nature of functions as exponential types — we gain insights into crafting precise and robust data models.

When designing data models, particularly in a system as type-rich as Rust, beginning with an analysis of the legal states in your domain is imperative. This initial analysis serves as a guide for selecting between sum and product types, aiming to model your domain accurately while ensuring that impossible states are unrepresentable. The key is to match the cardinality of your data model to the number of legal states dictated by your business logic.

By carefully choosing sum and product types, and appreciating the exponential nature of functions, programmers can design data structures that naturally encode valid states and operations. This not only makes code safer and more predictable but also leverages the type system to catch errors at compile time, making impossible states truly unrepresentable. Through these algebraic principles, Rust offers a compelling environment for crafting robust, efficient, and correct software.