Smart Constructors

(Photo by Michael Dziedzic on Unsplash)

The validation problem

At some point, every developer writing user-facing code has asked themselves the question

“How should I validate input?”

For example, a user wants to change their email address, and you only want to process their request if their new email is indeed, roughly*, a correct email address.

[HttpPost("/set-email")]
public ActionResult SetEmail([FromBody] string? alleged_email)
{
    // ???
}

And as far as the database code is concerned, an email address could come from any point in the preceding call stack. So the next question you find yourself asking is

“When do I validate input?”

If you Google “When to validate parameters” you get absurd answers like

Usually parameter checks are very cheap, even if called thousands of times. For example, test if a value is null, a string or Collection is empty (sic) a number is in a given range. (Link)

and

if you can live with the potential performance hit, I like to validate parameters everywhere, as it makes code maintenance and refactoring a bit easier (Link)

I contend that we should not revalidate the same parameters “thousands of times”. I hope you, dear discerning reader, agree.

*I’m pretty sure the only way to really tell if an email address is valid is to send an email to it and hopefully not get it bounced back as undeliverable. Regular expressions be damned.

Type-driven design

At this point, functional programmers chuckle, twirl their mustaches, and adjust their monocles before exclaiming “Elementary, my boy, use type-driven design!”

Image of a man in a top hat with a bushy mustache and wearing a monocle.
Your average functional programmer (generated by DALL-E)

As the functional programmer explains that type-driven design ensures program correctness by designing types where certain properties are always true, the object-oriented programmer scratches their head wondering what’s so revolutionary about that; enforcing invariants is pretty much the entire point of object-oriented programming (encapsulation and access control are the what, and invariants are the why).

What neither programmer realizes is that they’ve both been doing type-driven design wrong. But it’s not their fault. They’ve both been misled. We’ve all been misled. Misled by the allure of our languages’ constructors.

What is the deal with constructors?

All general-purpose languages are hobbled by just how darn easy it is to construct objects.

What is the purpose of a constructor?

A constructor’s purpose is to ensure that when an object is created, it is done so in a valid state. Don’t take my word for it; Wikipedia plainly states as much.

What if we can’t achieve a valid state with the parameters we’re given?

Ah, and therein lies the problem. A constructor can only “return” an instance of the class type. When you perform validation within a constructor, it’s too late to do anything but throw an exception upon failure.

public class EmailAddress
{
    public EmailAddress(string alleged_email)
    {
        if (!EmailRegex.IsMatch(alleged_email))
            throw new ArgumentException("wtf bro");
// ...

Digression on exceptions

Some languages lack the sort of exceptions you see in C++ and C#. In Go, we’re talking about “panics”, for example.

But even the word “panic” is inconsistent across languages. In Rust, a panic is not meant to be recovered from; by default it takes down at least the current thread.

Like Rust, Java has unrecoverable panics, but they’re called Errors. It can be confusing because Java also has recoverable Exceptions, too.

To confuse us even more, Go also has Errors, except they are nothing like unrecoverable panics. Instead they are simply values that can be passed around like anything else. Extra confusing is that in Haskell, Exceptions are like Go errors; they’re just values. Value-based exceptions are often wrapped up in other standard language constructs. In Rust, you’d use std::result, for example.

When I mention exceptions, I’m speaking of recoverable exceptions/panics that are not values.

Friends don’t let friends use exceptions

Exceptions are less-than-ideal mechanisms for logic control. Besides the fact that a try-catch block is uglier than sin, here are some reasons to avoid them:

Exceptions are heavy performance hitters.

  • Exception instances almost always* store an entire stack trace
    • Additionally, there are often big performance overheads just to build the stack trace.
  • Handling exceptions is so expensive that some C++ projects disable them altogether (ref). This has caused a bit of an existential crisis in the C++ community
    • In C++, the common “zero cost exception” model builds compile-time information about all handlers for places where there is a throw. It is “zero (CPU) cost” until an exception is actually thrown. When that happens, though, it is extremely expensive.
  • In C#, exceptions trigger a stack unwinding in search of a handler, which involves examining each stack frame in the current call-stack.

*In Haskell, this is only true if you ask it to

Exceptions do not compose

(Except in languages where they do)

Exceptions “escape” the current scope, or as Chair of the C++ committee Herb Sutter says in “Zero-overhead deterministic exceptions: Throwing values”, “exceptional control flow is invisible”. This means laboriously adding nested scope upon nested scope in the form of try...catch blocks so that exceptions do not “escape” and terminate your program.
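To make the invisible control flow concrete, here is roughly what keeping exceptions contained looks like; ParseEmail and SaveEmail are hypothetical stand-ins for anything that might throw:

try
{
    var email = ParseEmail(input);      // may throw FormatException
    try
    {
        SaveEmail(email);               // may throw IOException
    }
    catch (IOException)
    {
        // handle the storage failure here...
    }
}
catch (FormatException)
{
    // ...and the bad input here; the nesting grows with every call that can throw
}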

We do not want error handling to distract us from the “happy-path” logic we are expressing. Instead, we want to give the programmer flexibility to either handle an error immediately or decide to push off handling until later.

So you’re outlawing exceptions?

I’m not saying don’t ever use exceptions, I’m saying they should be a last resort. I may not always agree with the Google C++ style guide, but it does have this pearl of wisdom with regards to exceptions:

The availability of exceptions may encourage developers to throw them when they are not appropriate or recover from them when it’s not safe to do so. For example, invalid user input should not cause exceptions to be thrown.

If a user accidentally left off the domain of their email address, don’t throw an exception. If there is a disk read error or failure to allocate memory, for example, then sure, exceptions can be useful.

Conditions for proper type-driven design

A well-designed system:

  1. Rejects invalid input early (but not too early)
  2. Doesn’t revalidate the same input (“thousands of times”), and
  3. Gracefully handles invalid input

1. Reject invalid input early

Let’s set aside the security implications. Practically speaking, if you need to validate at all, you might as well do it right away; otherwise, when validation fails, you’ve wasted CPU and memory on work that gets thrown away.

That’s it. I’m sure there are other arguments, but this is the most sound.

2. Don’t revalidate the same input

It’s needlessly wasteful. See #1. Even if your validation code was very lightweight, you’re at least abusing the “Don’t repeat yourself” (DRY) principle.

3. Gracefully handle invalid input

What makes error handling “graceful”?

In order for error-handling to be “graceful”, it must:

  • not throw exceptions, and
  • not bounce around between logic and error checking like so:
  var resultA = FunctionA();
  if (!resultA.Succeeded)
  {
    // error handling
  }

  var resultB = FunctionB(resultA.Value);
  if (!resultB.Succeeded)
  {
    // error handling
  }
  // ...

Throwing exceptions almost got it right

Throwing exceptions in constructors actually satisfies conditions #1 (reject invalid input early) and #2 (don’t revalidate the same input). However, it utterly fails #3 (gracefully handle invalid input).

The workarounds that fell short

Programmers are a curious, perfection-seeking bunch. Many have recognized the issues with using constructors as validators and invented workarounds. Few got it right. In attempting to address condition #3 (gracefully handle invalid input), they relax either condition #1 (reject invalid input early); condition #2 (don’t revalidate the same input); or both.

All of the workarounds that got it wrong have one thing in common: they move the validation code outside of the class. When this happens, later consumers of the class can never be sure if the class is valid. Let’s take a look at some of these well-meaning but ultimately wrong techniques.

Error codes

Error codes are lightweight integers used to represent some error condition. The most ubiquitous error code is 1, which is what a process returns when it exits with an error. In C#, validating our email address with error codes might look like this:

public enum EmailValidationErrorCode
{
    Ok = 0,
    IsNull = 1,
    InvalidFormat = 2
}

[HttpPost("/set-email")]
public ActionResult SetEmail([FromBody] string? alleged_email)
{
    var validationResult = ValidateEmail(alleged_email);
    if (validationResult != EmailValidationErrorCode.Ok)
        return BadRequest();
    // ...
    return Ok();
}

EmailValidationErrorCode ValidateEmail(string? alleged_email)
{
    if (alleged_email == null)
        return EmailValidationErrorCode.IsNull;
    else if (!EmailRegex.IsMatch(alleged_email))
        return EmailValidationErrorCode.InvalidFormat;
    return EmailValidationErrorCode.Ok;
}

Consider yourself lucky if you get enumerations like this instead of plain integers for which you need to consult a physical paper manual issued by a company that went out of business 20 years ago, whose only copy is somewhere in Grace’s old office (before she retired).

The only problem that error codes solve is that errors are now much more lightweight than exceptions. Everything else about them is awful.

  • They encourage an if (success) {...} else {...} programming style that is hard to read
  • They don’t compose
  • Other functions that need to consume this input CANNOT KNOW THEY RECEIVED A VALID EMAIL! They are forced to revalidate.

In other words, error codes violate condition #2 (don’t revalidate the same input) and arguably condition #3 (handle invalid input gracefully). They arguably violate condition #1 (reject invalid input early) as well, since you must remember to call the validation code, but I’ll give it a pass.

ValidationResult

People realized that error codes were seriously lacking. For one thing, they usually only indicated one type of failure. Multiple things could be wrong with input, and you don’t want the end user to get stuck in a debug whack-a-mole loop where they fix one validation error just to be presented with the next one, and so on. (Skipping over bitfield approaches) this leads us to ValidationResult approaches, which can store more than one error object. These error objects are fancy, they can have an integer code AND a human-readable string.

The slick FluentValidation library for .NET has us declare a validator class where we can compose validation rules, even dependent rules, using LINQ-like syntax:

public class EmailValidator : AbstractValidator<string>
{
    public EmailValidator()
    {
        RuleFor(str => str).EmailAddress(EmailValidationMode.Net4xRegex);
    }
}

[HttpPost("/set-email")]
public ActionResult SetEmail([FromBody] string? alleged_email)
{
    var validationResult = new EmailValidator().Validate(alleged_email);
    if (!validationResult.IsValid)
        return BadRequest(validationResult.ToString());
    // ...
    return Ok();
}

These solve exactly zero of the problems I laid out with error codes. They’re fun to write though! In fact, this actually violates another rule:

Use protection; validate before you allocate

Validator classes require us to have an instance of the class being validated before validation can run. Besides wasting an allocation for scenarios where validation fails, they make the assumption that the class being validated is basically a bag of public properties (called “POD” in C++, POJO in Java, POCO in C#, or more universally a Passive data structure).

Mithra save you if your class reads a configuration file, or calls out to a service on the network like a database. All that work for nothing. And sometimes you don’t even really have any control over that; in C++ and C# alike, if you derive from a base class, the base class constructor runs before your derived constructor body does.

Allocations are not inherently expensive; it’s all the stuff that can come with them.

(N.B. I consider Validator classes as nearly identical to the fallacious two-stage construction as popularized by MFC)

Either

Some programmers got a little wiser with their ValidationResult-esque classes. Maybe they overheard some of the older kids talking about “Monads” and decided that they were sick of writing if(success){...} else{...} everywhere. Who knows?

One day someone said, “What if instead of manually checking whether the result succeeded or failed, I could just provide the functions that should get called for either scenario?” and then proceeded to absolutely butcher an implementation of Either.

The prolific C# Youtuber Nick Chapsas eventually stumbled his way into the following implementation after 3 years and 6 videos with cringey thumbnails (1, 2, 4, 5, 6)

public readonly struct Result<TValue, TError>
{
    private readonly TValue? _value;
    private readonly TError? _error;

    private Result(TValue value)
    {
        IsError = false;
        _value = value;
        _error = default;
    }

    private Result(TError error)
    {
        IsError = true;
        _error = error;
        _value = default;
    }

    public bool IsError { get; }
    public bool IsSuccess => !IsError;

    public static implicit operator Result<TValue, TError>(TValue value) => new(value);
    public static implicit operator Result<TValue, TError>(TError error) => new(error);

    public TResult Match<TResult>(
        Func<TValue, TResult> success,
        Func<TError, TResult> failure) =>
        !IsError ? success(_value!) : failure(_error!);
}
An image of Nick Chapsas
My criticisms of Mr. Chapsas are tongue-in-cheek; I actually appreciate all he’s done for the C# community

If you were fooled into trying to compose functions that return instances of Chapsas’ Result, you’d quickly see how unwieldy it becomes. For example, here are just three function calls in sequence, the first two returning Result.

var finalResult = FunctionReturningResult(arg).Match
(
    successValue => OtherFunctionReturningResult(successValue)
        .Match
        (
            anotherSuccess => FinallyASaneFunction(anotherSuccess), // I already hate myself
            someFailure => SomeErrorCode(someFailure)
        ),
    failureValue => OtherFailureCode(failureValue)
);

Brief aside on Monads

“This type is technically a union and this type is also technically a Monad” Chapsas tells viewers (ref), and he is wrong on both fronts.

Let’s demystify Monads. They’re basically interfaces: interfaces for classes that wrap other classes. Consider List<T>. It wraps zero or more values of type T that can be accessed sequentially. For a type to be a Monad, you must be able to:

  1. “lift” a T into it
    • We can easily create a list of a single element
  2. “map” a function to it
    • In C#, this is LINQ’s .Select function. We can convert a List<int> to a List<string> for example
    • (Strictly speaking, “map” isn’t part of the minimal Monad definition, since it can be derived from “lift” and “bind”, but most implementations provide it directly)
  3. “bind” a function to it
    • in C#, this is LINQ’s .SelectMany function. It “flattens”. If you “map” a function that raises each element into its own list, like .Select(value => new List<int>{ value, value }), you effectively end up with a List<List<int>>. SelectMany (“bind”) instead concatenates the inner lists into a single, flattened List<int>. (A short LINQ example of all three operations follows this list.)
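Here’s what those three operations look like with List<T> and LINQ (a tiny, self-contained example):

using System;
using System.Collections.Generic;
using System.Linq;

// 1. "lift": put a single value into the List "wrapper"
var lifted = new List<int> { 42 };

// 2. "map" (LINQ's Select): List<int> -> List<string>
var mapped = lifted.Select(x => x.ToString()).ToList();

// 3. "bind" (LINQ's SelectMany): map each element to its own list, then flatten
var bound = lifted.SelectMany(x => new List<int> { x, x }).ToList();

Console.WriteLine(string.Join(", ", bound)); // prints "42, 42" (a List<int>, not a List<List<int>>)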

What Nick has given us is not a Monad, but roughly a “Church-encoded Either” (except Mark Seemann more cleverly implements his to simulate a union). And unfortunately, it lacks a “bind” operation.

As for the claim that it’s a union, C# lacks true sum types and as a result memory is still allocated for both. An actual union only allocates enough memory to hold its largest type. Behaviorally, though, Nick’s Result<T> does act like a discriminated union.

Kudos for getting that far though, Nick. With just a little more work we could evolve this type into a full Monad. To his credit, he eventually recommends both dotNext’s Result and LanguageExt’s Result<A> classes instead (neither of which is a Monad).
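For what it’s worth, “bind” is a small addition. Here’s a minimal sketch written as an extension method, so the Result struct above stays untouched (the names Bind and ResultExtensions are mine, not from the videos):

public static class ResultExtensions
{
    public static Result<TNext, TError> Bind<TValue, TError, TNext>(
        this Result<TValue, TError> result,
        Func<TValue, Result<TNext, TError>> next) =>
        result.Match(
            next,                                   // success: run the next fallible step
            error => (Result<TNext, TError>)error); // failure: carry the same error forward
}

With Bind in place, the nested Match calls from the earlier example flatten into a linear chain, and the final Match only appears once, at the end.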

Regardless, it doesn’t matter if you used a properly-implemented Either to solve the validation problem. As soon as you “reach inside” it to grab the value (e.g., in the success case), the consuming function has no idea that the parameter was validated anyway.

Types must be self-validating, otherwise it’s the “trust me bro” approach to contracts.

ValidatedEmailAddress

Here is where I give everyone I’ve criticized about validation a pass. Validation is subtly difficult to accomplish, and even the best of the best get it wrong. As proof, let’s revisit my hero, the Reverend Mark Seemann. Even he has been blinded by constructors.

Before I begin, I want to mention that Mark Seemann has probably forgotten more category theory and functional programming than I will ever learn. The guy has written two books and runs a freakin’ course on type-driven design.

And yet he still got validation wrong (so close, though).

Mark has written quite a bit about validation (1, 2, 3, 4), and he ultimately recommends the following pattern to construct your objects:

public class EmailDto
{
    public string? Email { get; init; }

    public Result<ValidatedEmailAddress, string> TryParse()
    {
        if (Email == null)
            return Result.Fail("Email was null");
        if (!EmailRegex.IsMatch(Email))
            return Result.Fail("Email was not in a valid format");
        return Result.Succeed(new ValidatedEmailAddress { Email = Email });
    }
}

public class ValidatedEmailAddress
{
    public string Email{get; init;}
}

(Where Result<TSuccess, TFailure> is a proper Either implementation, with Fail and Succeed helpers.)

The problem with this code is that I have a coworker named Homer S. who doesn’t pay attention to silly things like unenforced preconditions. You tell them that ValidatedEmailAddress should only be constructed as a result of EmailDto.TryParse and it goes in one ear and out the other. Maybe you’ll get lucky and catch them submitting something like this in code review:

var validEmail = new ValidatedEmailAddress{Email = "notvalid"};

While this code handles invalid input gracefully (condition #3), it ultimately fails to avoid revalidating the same input (condition #2). Future consumers of a ValidatedEmailAddress cannot actually be certain it was validated.

Like the techniques mentioned above, it also arguably fails to reject invalid input early (condition #1); one must remember to construct a ValidatedEmailAddress via EmailDto.TryParse.

I mentioned that this code got really close to the right answer, and TryParse is the clue.

Guid.TryParse

In C#, there’s a common pattern for constructing basic types from a string that looks like this:

 public struct Guid
 {
    public static bool TryParse(string input, out Guid result)
    {
        // ...
    }
 }

If this were the only way to make a Guid (it’s not) it would satisfy the two criteria that Mark’s TryParse did not!
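For reference, the call-site pattern looks like this (input is just some string we received):

// No exception thrown, and no malformed Guid can escape into the rest of the program.
if (Guid.TryParse(input, out var id))
{
    // id is a well-formed Guid here
}
else
{
    // reject the input; id is just Guid.Empty and should not be used
}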

Let’s combine Mark Seemann’s TryParse pattern with C#’s TryParse pattern.

Smart constructors

Briefly let’s revisit our requirements for proper validation

  1. Reject invalid input early
  2. Don’t revalidate the same input, and
  3. Gracefully handle invalid input

More concretely,

  • It should be impossible to construct an instance of the target type in an invalid state
  • We must not throw exceptions, and
  • We must be able to compose with the result of validation

Just get to the technique already!

Let’s tweak Mark’s TryParse pattern above just slightly for our EmailAddress class:

public class EmailAddress
{
    private readonly string _email;

    private EmailAddress(string validatedEmail)
    {
        _email = validatedEmail;
    }

    public static ValidationResult<EmailAddress, string> TryParse(string? maybeEmail)
    {
        if (maybeEmail == null)
            return ValidationResult.Fail("Email address was null");
        else if (!EmailAddressRegex.IsMatch(maybeEmail))
            return ValidationResult.Fail("Email address was not in the correct format");
        else
            return ValidationResult.Succeed(new EmailAddress(maybeEmail));
    }
}

Does this approach satisfy our criteria?

1. Reject invalid input early

Failure to construct an EmailAddress immediately returns a ValidationResult in a failure state, which can short circuit subsequent chained calls.

2. Don’t revalidate the same input

It is impossible to create an EmailAddress instance in a bad state; the class enforces its own preconditions. Consumers can rest assured knowing they don’t need to revalidate.

3. Gracefully handle invalid input

As outlined in Mark’s blog post, we can compose with our ValidationResult class. (Also we are not throwing exceptions.)

Personally, I prefer an error type that is a bit richer than a string (Mark Seemann does too), but the above demonstrates the gist of the technique.
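For concreteness, here is one possible shape for the ValidationResult type these examples assume. It’s an illustrative sketch, not a particular library’s API; any proper Either/Result implementation will do:

public readonly struct ValidationResult<TSuccess, TError>
{
    private readonly TSuccess? _value;
    private readonly TError? _error;

    private ValidationResult(TSuccess value) { _value = value; _error = default; IsSuccess = true; }
    private ValidationResult(TError error) { _error = error; _value = default; IsSuccess = false; }

    public bool IsSuccess { get; }

    // The only way to "look inside": supply a function for each case.
    public TOut Match<TOut>(Func<TSuccess, TOut> onSuccess, Func<TError, TOut> onFailure) =>
        IsSuccess ? onSuccess(_value!) : onFailure(_error!);

    // These conversions are what let ValidationResult.Succeed(...) and .Fail(...)
    // work without spelling out both type arguments at every call site.
    public static implicit operator ValidationResult<TSuccess, TError>(SuccessValue<TSuccess> s) => new(s.Value);
    public static implicit operator ValidationResult<TSuccess, TError>(FailureValue<TError> f) => new(f.Error);
}

public readonly record struct SuccessValue<TSuccess>(TSuccess Value);
public readonly record struct FailureValue<TError>(TError Error);

// Non-generic helpers so call sites read like the EmailAddress example above.
public static class ValidationResult
{
    public static SuccessValue<TSuccess> Succeed<TSuccess>(TSuccess value) => new(value);
    public static FailureValue<TError> Fail<TError>(TError error) => new(error);
}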

Railway oriented programming

With all this in place, we can implement a railway pattern. In this pattern, error information is encapsulated and then propagated down the call chain, to be handled at the end.

It looks something like this in our codebase:

return await EmailAddress.TryParse(inputStr)
  .Then(async (validEmailAddress) => await _db.UpdateEmailAddressForUser(uid, validEmailAddress))
  .Then(updatedEmailResponse => new UpdateEmailResponseForWeb(/*...*/))
  .Apply(MapResponse);

We could fail either when we try to turn a string into an email address (EmailAddress.TryParse) or when we actually attempt to update the user (_db.UpdateEmailAddressForUser) if, for example, no user with that Id actually exists.

But you don’t see the error handling scattered around the codebase. Instead, each .Then examines the ValidationResult object. If said object is in an error state, the error simply propagates to the next call in the chain. Otherwise the function passed into .Then is applied. Error handling is done at the end in .Apply(MapResponse) which will return an appropriate REST response based on whether the prior code succeeded or failed.

I advise you to use function names that make sense to your team. For example, I like the name .Then, which is overloaded for a variety of scenarios. It sure beats confusing names like Map and Bind.
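If you’re curious what such helpers might look like on top of the ValidationResult sketch above, here is a rough cut of one way to implement them (simplified; the async overloads used in the example, over Task<ValidationResult<...>>, follow the same pattern and are omitted):

public static class ValidationResultExtensions
{
    // "Then" over a step that can itself fail (i.e., bind): run it only on success,
    // otherwise carry the existing error forward untouched.
    public static ValidationResult<TNext, TError> Then<TSuccess, TError, TNext>(
        this ValidationResult<TSuccess, TError> result,
        Func<TSuccess, ValidationResult<TNext, TError>> next) =>
        result.Match<ValidationResult<TNext, TError>>(
            value => next(value),
            error => ValidationResult.Fail(error));

    // "Then" over a plain function (i.e., map): its return value re-enters the railway.
    public static ValidationResult<TNext, TError> Then<TSuccess, TError, TNext>(
        this ValidationResult<TSuccess, TError> result,
        Func<TSuccess, TNext> next) =>
        result.Match<ValidationResult<TNext, TError>>(
            value => ValidationResult.Succeed(next(value)),
            error => ValidationResult.Fail(error));

    // "Apply" hands the whole result to one final mapping (e.g., MapResponse),
    // which decides what to return for the success and failure cases.
    public static TOut Apply<TSuccess, TError, TOut>(
        this ValidationResult<TSuccess, TError> result,
        Func<ValidationResult<TSuccess, TError>, TOut> map) => map(result);
}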

A little bit at a time

If I wanted to, I could break the railway pattern at any point and write imperatively:

var updatedEmailResponse = await EmailAddress.TryParse(inputStr)
  .Then(async (validEmailAddress) => await _db.UpdateEmailAddressForUser(uid, validEmailAddress));
if (!updatedEmailResponse.Success)
  return updatedEmailResponse.Match(/*...*/);

var webResponse = new UpdateEmailResponseForWeb(/*...*/);
return Ok(webResponse);

Sure, it’s not 100% pure, but it’s massively more useful than before. Baby steps. We can now transition our codebase from a more imperative style to a more compositional style as we see fit.

(N.B. If you are a Go developer, Rob Pike, co-creator of Go, has a nice article where he nearly recreates the railway pattern for Go)

EmailAddress.TryParse is a smart constructor!

The smart constructor pattern is where you make it impossible for consumers to call the actual constructor of a type. Instead, you (the type’s author) expose a function (or functions) that does it for them. Consumers are forced to call your function, which guarantees correct construction of your type and also has more flexibility to return error information than a lowly constructor.

The name “smart constructor” comes from a Haskell technique of the same name. In Haskell, smart constructors are achieved through some module export trickery, but for most “general purpose” languages, we need to follow the TryParse pattern outlined above.

(Feel free to name your smart constructor something other than TryParse)

We are working against our programming languages!

It’s not easy or intuitive to pull this off correctly. Programming languages make it easy to call constructors, and I think it should be the other way around. Maybe one day we’ll get a language (or updates to an existing language) to better support type-driven design.

Smart constructors are the best of functional and object-oriented programming

Modulo language syntax, we get to keep the fantastic invariant preservation of object-oriented design, while also benefiting from the expressivity afforded to us by monadic composition.

Closing thoughts

I did not invent smart constructors. Almost everyone I shared this article with said that they had either seen or independently reinvented this technique at some point. In 2015, Scott Wlaschin succinctly summarized the entire technique in two comments:

// Just as in C#, use a private constructor 
// and expose "factory" methods that enforce the constraints

As long as I’m linking to Scott Wlaschin, I should point out that he wrote an excellent series on type-driven design in 2013. I’ve reiterated many of his points here. In fact, in that same 2015 Github gist I linked above, he demonstrates smart constructors for F#.

I suspect that smart constructors (in object-oriented code) have been around nearly as long as object-oriented code has existed. They just have not been written about extensively.

I hope that I’ve helped formalize this technique and give it a handy name to refer to. Perhaps one day we’ll mention “smart constructors” as often as we say “factory pattern”.

Acknowledgements

I’m extremely grateful for the fantastic correspondence I’ve had with the following people in the course of writing this article (in alphabetical order by last name):

  • Nathan Bayles
  • Dr. Jory Denny
  • Timothy Gilino
  • Adam Homer
  • Ben Ketcherside
  • Dr. Stephanie Valentine

Appendix A: ASP.NET validation middleware

Considering my use of C# throughout this article, I would be remiss if I didn’t mention ASP.NET’s validation middleware.

When you write a web app in C# .NET, the infrastructure conveniently handles constructing your strongly-typed parameters from the weakly typed JSON/XML/Text/etc. requests you receive. If you create a class to encapsulate all of these parameters, you can annotate them with validation requirements as follows:

public record ChangeEmailRequest
(
    [Required]
    string UserId,

    [EmailAddress]
    string NewEmail
);

Before your function is invoked, ASP.NET will validate that UserId is not null, and that NewEmail has an @ character (yes, that’s really all it does). If validation fails, you can configure it to automatically return a 400 (Bad Request) with a default or custom response body, or you can explicitly check if (!ModelState.IsValid) and perform in-line handling in your controller.
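For example, with the automatic 400 behavior turned off (via ApiBehaviorOptions.SuppressModelStateInvalidFilter), the in-line check looks roughly like this; the route and action names are illustrative:

[HttpPost("/change-email")]
public ActionResult ChangeEmail([FromBody] ChangeEmailRequest request)
{
    // Only needed when automatic 400 responses are disabled; otherwise
    // ASP.NET rejects invalid models before this method ever runs.
    if (!ModelState.IsValid)
        return BadRequest(ModelState);

    // request is known-valid here, but nothing deeper in the call stack knows that.
    return Ok();
}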

This actually works really well in practice. Almost all validation only needs to happen at the point that we receive user input, and returning a bad request response right away is good (see: #1 Reject invalid input early). If you really need to, you can even access dependency-injected services in your custom-validation attributes (even if it makes testing a little weird).

Plus, when you use a record type in this way (init-only properties), it is immutable after creation. This almost satisfies condition #2 (don’t revalidate the same input).

With automatic 400 (Bad Request) responses configured, this even satisfies condition #3 (gracefully handle invalid input).

There are a few problems, though. Condition #2 (don’t revalidate the same input) is not actually satisfied; if the controller action decided to pass this parameter onto a service class (or even just a helper function), that consumer can’t be certain that validation has been performed, because the following is still possible:

var badRequest = new ChangeEmailRequest("validUserId", "InvalidEmail");

Condition #3 (gracefully handle invalid input) is not actually satisfied either. Again, it’s only satisfied at the controller action level. If you wanted to perform this validation anywhere else in the codebase (outside of maybe EntityFramework), it would look like this:

var maybeAGoodRequest = new ChangeEmailRequest(/*...*/);
var validationContext = new ValidationContext(maybeAGoodRequest, null, null);
var validationResults = new List<ValidationResult>();
bool isValid = Validator.TryValidateObject(maybeAGoodRequest, validationContext, validationResults, true);
if (!isValid)
{
    // handle error
}
else
{
    // handle success
}

So ASP.NET middleware validation only works if it stays right there: executed automatically before being passed along to your controller. Any invariants that must be preserved beyond a controller action need to be put into a self-validating type (via a smart constructor).

Appendix B: Improving the smart constructor pattern

Almost all validation happens on “primitive types”: booleans, numeric types, characters, strings, and collections of the preceding. Therefore there’s a pretty finite number of ways to perform validation on them:

  • Enforce value is in a range
  • For collections, enforce a min/max length
  • For strings, enforce a regex match

Therefore you can encode each of these types of validation into general-purpose utility functions to be reused.
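For example (the Check class and its method names are illustrative, not from any particular library):

using System.Collections.Generic;
using System.Text.RegularExpressions;

// Reusable primitive checks; each returns an error message, or null for "OK".
public static class Check
{
    public static string? InRange(int value, int min, int max, string name) =>
        value < min || value > max ? $"{name} must be between {min} and {max}" : null;

    public static string? MinCount<T>(IReadOnlyCollection<T> items, int min, string name) =>
        items.Count < min ? $"{name} must contain at least {min} item(s)" : null;

    public static string? Matches(string? input, Regex pattern, string name) =>
        input is null || !pattern.IsMatch(input) ? $"{name} is not in the expected format" : null;
}

A smart constructor can then just collect the non-null messages into its failure list.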

Better yet, if you have access to a code-generation tool, like the built-in source-generators in C#, then you can take this a step further by annotating your class properties with the kind of validation that should be performed on them, and then letting the source generator take care of the rest.

[AutoGenerateTryParse]
public partial class ChangeEmailRequest
{
    [NotEmpty]
    public string UserId { get; }

    [Email]
    public string NewEmail { get; }
}

Notice that this class has get-only properties and no constructors.

Then your code generator could generate the TryParse method and the private constructor for you:

public partial class ChangeEmailRequest
{
    private ChangeEmailRequest(string UserId, string NewEmail)
    {
        this.UserId = UserId;
        this.NewEmail = NewEmail;
    }

    public static ValidationResult<ChangeEmailRequest, List<string>>
    TryParse(string UserId, string NewEmail)
    {
        var validationErrors = new List<string>();
        if (string.IsNullOrEmpty(UserId))
            validationErrors.Add("UserId field is required");
        if (!EmailRegex.IsMatch(NewEmail))
            validationErrors.Add("NewEmail is not a valid email");

        if (validationErrors.Count > 0)
            return ValidationResult.Fail(validationErrors);

        return ValidationResult.Succeed(new ChangeEmailRequest(UserId, NewEmail));
    }
}

While C#’s source generators are still in pretty rough shape, writing one to take care of this boilerplate significantly reduced the lines of code in our codebase.

Appendix C: When it’s acceptable to forgo smart constructors

Smart constructors are a way to enforce that a “narrow contract” is never broken; a function (constructor in our case) may only be defined for certain inputs, and “undefined” for all others. Functional programmers call these types of functions “partial”. Smart constructors effectively wrap partial functions (constructors) in “total” functions; functions that can accept all possible values for their parameters’ types. Instead of allowing a narrow contract to be violated, our smart constructors return an error type instead.

Performance

Checking contract validity costs time and space, and some argue that this should not be necessary inside a bug-free program. In other words, (C++) developers would prefer that such checks do not happen in a production (Release) build of their software. Because obviously there are no bugs in production software.

Nonetheless, you may decide to hide your smart constructors’ checks behind a compile-time flag so that they only run in Debug builds, e.g., #if DEBUG in C# or #ifndef NDEBUG in C++. You may even decide to demote your smart constructors to simple assertions for syntax’s sake.
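For instance, a trusted, checked-only-in-Debug entry point added to the EmailAddress class from earlier might look like this (FromTrusted is a hypothetical name; Debug.Assert comes from System.Diagnostics and compiles away in Release builds):

// For callers that have already proven validity some other way.
public static EmailAddress FromTrusted(string email)
{
    Debug.Assert(EmailAddressRegex.IsMatch(email),
        "FromTrusted called with an invalid email address");
    return new EmailAddress(email);
}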

In C++, when a contract in the language or standard library is violated, it is said to result in “undefined behavior”, which is often described as “anything can happen”, from the right thing all the way to, say, somehow remote-starting your car. The reality is that your compiler writers decide what happens, which ranges from a possible (but not guaranteed) segmentation fault (e.g., from accessing a pointer to deallocated memory) to subtly incorrect arithmetic (e.g., from signed integer overflow, where most implementations just wrap around into the negatives).

Making such assumptions about the correctness of your code can result in far more efficient assembly generation, which does actually matter sometimes. Just know that this is a conscious design decision made by the architects of a system, who have proved in other ways (usually by construction) that a narrow contract will not be violated.

Wide contracts

Opposite to “narrow contracts” are wide contracts. Functions with wide contracts have no preconditions on the values of their parameters. As you compose more and more types on top of each other, you may get to a point where you do not need to perform any additional validation because your types are already sufficiently “strong”. Consider the scenario of composing a Person type from an Email type, and other similar “strong” parameters:

public class Person
{
    public Person(Email emailAddress, NonEmptyString name){...}
}

Parameter objects

The most common type of wide contract I encounter is when a type merely serves to hold all the parameters for a function. This is called a “parameter object isomorphism”, and is really common in the Vulkan API.
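A tiny example (the names are illustrative; the point is that every field is already a “strong” type, so the bundle itself needs no validation of its own):

// Akin to Vulkan's *CreateInfo structs: a named bundle of already-validated values.
public record CreatePersonRequest(EmailAddress Email, NonEmptyString Name);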

Two baby goats
Thanks for reading this far. As a reward, here are some baby goats. (Photo by James Tiono on Unsplash)
