Rust's custom derives in a hexagonal architecture: Incompatible ideas?

This blog post is a manifestation of a problem that has been floating around in my head for quite a while now. It is about the seemingly incompatible idea of fully embracing Rust's custom derive system in an application that puts a strong focus on a hexagonal architecture.

To discuss this problem, I am going to first write about both concepts individually. Feel free to skip over those sections if you are already familiar with the topics. The blog post finishes off with some ideas on how Rust could be extended to better support these kinds of use cases.

Hexagonal architecture

The concept of a hexagonal architecture is known under a variety of different names. Some people refer to it as onion architecture whereas others like to call it "ports and adaptors". Independent of the name, the idea is always the same: embracing the principle of Dependency Inversion. Or, as Uncle Bob calls it: Clean Architecture.

In a nutshell, you want to model the "core" of your problem domain (your "business logic") in a way that keeps it ignorant of the rest of the system:

  • Is your system invoked via an HTTP API or in a CLI? Your core doesn't care.
  • Is the state persisted in an SQL database or just held in memory? Your core doesn't care.

Note: With system, I am always referring to a single runtime component (one application) and the core is just a module that (for most languages) only exists at the source-code level.
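To make the idea a bit more concrete, here is a minimal sketch of a core module and one adapter. All names (`domain`, `PersonRepository`, `register`, `InMemoryRepository`) are made up for illustration and are not taken from a real project. The module is called `domain` rather than `core` only to avoid a clash with Rust's built-in `core` crate.

```rust
/// The "core": domain types, a port, and business logic. Nothing in here
/// knows about HTTP, SQL, or any other infrastructure.
mod domain {
    #[derive(Debug, Clone, PartialEq)]
    pub struct Person {
        pub name: String,
    }

    /// Port (hypothetical): what the core needs from a storage mechanism.
    pub trait PersonRepository {
        fn save(&mut self, person: Person);
        fn find(&self, name: &str) -> Option<Person>;
    }

    /// Business rule: names must be non-empty and unique.
    /// It only talks to the port, never to a concrete storage type.
    pub fn register(repo: &mut dyn PersonRepository, name: &str) -> Result<Person, String> {
        if name.trim().is_empty() {
            return Err("name must not be empty".to_string());
        }
        if repo.find(name).is_some() {
            return Err(format!("{} is already registered", name));
        }
        let person = Person { name: name.to_string() };
        repo.save(person.clone());
        Ok(person)
    }
}

/// Adapter: one possible implementation of the port. The dependency points
/// from the adapter to the core, never the other way around.
mod memory {
    use super::domain::{Person, PersonRepository};
    use std::collections::HashMap;

    #[derive(Default)]
    pub struct InMemoryRepository {
        people: HashMap<String, Person>,
    }

    impl PersonRepository for InMemoryRepository {
        fn save(&mut self, person: Person) {
            self.people.insert(person.name.clone(), person);
        }

        fn find(&self, name: &str) -> Option<Person> {
            self.people.get(name).cloned()
        }
    }
}
```

Swapping the in-memory adapter for an SQL-backed one would mean adding another module that implements `PersonRepository`; `domain` itself would stay untouched.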

What is the point of such a modularisation? The examples already provided quite a strong hint: It is great at separating concerns and has many interesting implications down the road. If the code that models your problem domain is independent from (as in: the core module does not depend on) the rest of the code base:

  • you can compile it fairly quickly: fast compile times allow for many iterations and hence faster feature development.
  • you can (unit-)test it in isolation
  • you can reason about the critical behaviour of the system in isolation
  • you can port it to other runtime environments

There are probably many more things that could be mentioned, and they also overlap at certain points. The key takeaway for me is simplicity, which is one of the main reasons why we as developers come up with abstractions and architectures.

A hexagonal architecture is a simple architecture. Not only because it separates concerns but also because it allows you to defer some decisions to a later stage in the development of a system. The project I am currently working on still stores things in-memory and we've been developing for over a year now. It is actually fully functional but just doesn't persist the state between restarts (yet).

Hexagonal architecture in Rust

That project is actually written in Rust, which turned out to be an excellent choice for the problem domain. It is also the reason I am writing this blog post because we've been trying to embrace a hexagonal architecture and we've hit some problems with it.

Overall, Rust has a very well thought-through module system:

  • Symbols are private by default and need to be explicitly exported
  • Transitive dependencies are not exposed: A dependency on module B in module A is not leaked through to C (if you don't use it in a public signature)
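A small sketch of the first point, with hypothetical names (`domain`, `normalise`): everything without `pub` stays invisible outside of the module that declares it.

```rust
mod domain {
    /// The type is exported, but its field stays private to `domain`.
    pub struct Person {
        name: String,
    }

    impl Person {
        /// Public constructor: the only way to build a `Person` from outside.
        pub fn new(raw_name: &str) -> Option<Person> {
            let name = normalise(raw_name);
            if name.is_empty() {
                None
            } else {
                Some(Person { name })
            }
        }

        /// Public read-only accessor for the private field.
        pub fn name(&self) -> &str {
            &self.name
        }
    }

    // No `pub`: this helper cannot be called from outside of `domain`.
    fn normalise(raw_name: &str) -> String {
        raw_name.trim().to_string()
    }
}

// From outside of `domain`, the compiler rejects both of these:
// domain::normalise("x");                        // the function is private
// domain::Person::new("x").map(|p| p.name);      // the field is private
```

Code outside of `domain` can only go through `Person::new` and `Person::name`, so the module decides exactly what it exposes.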

This allows for a sophisticated modularisation even within a single crate, which is the unit Rust uses to package up and distribute libraries. What is the trouble then?

Rust has a clear separation of data and behaviour. Data is stored inside structs whereas behaviour is attached to structs by implementing traits on them. If there is an automated way to implement a certain trait (like creating a printable representation for debugging purposes through Debug), Rust allows you to "derive" the implementation and thereby have it generated for you by the compiler. This will, and that is the important part, add source code at compile-time in the same module as the declared struct. Here is an example:

#[derive(Debug)] // <-- Instruction to derive an implementation of the `Debug` trait for the struct `Person`
struct Person {
   name: String
}

// <-- Implementation of `Debug` is going to be generated here at compile time:
// impl Debug for Person {
//    ...
// }
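For comparison, this is roughly what such a generated implementation looks like when written by hand. It is a sketch of the equivalent behaviour, not the literal output of the compiler:

```rust
use std::fmt;

struct Person {
    name: String,
}

// Hand-written counterpart of `#[derive(Debug)]`. Note that it sits right
// next to the struct definition, just like the generated code would.
impl fmt::Debug for Person {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.debug_struct("Person")
            .field("name", &self.name)
            .finish()
    }
}
```

Formatting `Person { name: "Alice".to_string() }` with `{:?}` then yields `Person { name: "Alice" }`.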

One of the most popular crates in the Rust ecosystem is serde. It allows you to implement (and derive) the Serialize and Deserialize traits, which can then be used to serialize an instance of a struct into a variety of formats (JSON, YAML, XML, etc.) and also to deserialize an instance from any of these. As with all custom derives, those implementations are generated next to the actual struct definition.

You might already have an idea of where I am going with this ...

Let's recap:

In a hexagonal architecture, we want our core module to be independent of the other aspects of a system. If we are building an HTTP API, we don't want the code in our core module to know about that. For example, if any of the core types need to be serialized, the code for serializing them should not be within the core module. It should be in an http module or something like that. The stress test for whether this requirement is fulfilled is always: if I extracted the core module into its own project without any of the other stuff being present, would it still compile? As soon as we start to derive Serialize on any of our core types, this is no longer the case unless we add serde as a dependency to that project.

Note: serde by itself is very well designed and is actually separated into the generic serialization library and concrete formats like json or yaml, so it might not be so bad to depend on serde. I hope the example still communicates the point I am trying to make.

What is the alternative?

We can obviously always go and implement the trait ourselves in whichever module we want:

// In our "core" module

struct Person {
   name: String
}

// In our "http" module
use serde::Serialize;
use core::Person;

impl Serialize for Person {
   ...
}

This has two downsides:

  1. It is tedious and error-prone to implement the serialization code yourself, especially if you want the exact same one that would already be provided by serde's custom derive. Also, serde is very configurable, so even if you have special requirements for the serialization, it very likely supports them without you having to hand-roll your own implementation.
  2. It actually doesn't survive our stress test because of Rust's orphan rules. If we move Person to its own crate, neither Serialize nor Person is local to the crate that hosts the http module and hence, declaring this implementation will not compile.
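The orphan rules from the second point can be illustrated without serde. In the following sketch, `std::fmt::Display` plays the role of the foreign trait and `Vec<u8>` the role of the foreign type; the `Describe` trait is hypothetical and only exists for this example.

```rust
use std::fmt;

// A trait defined in this crate (hypothetical).
trait Describe {
    fn describe(&self) -> String;
}

// A type defined in this crate.
struct Person {
    name: String,
}

// Allowed: local trait, foreign type.
impl Describe for String {
    fn describe(&self) -> String {
        format!("a string of length {}", self.len())
    }
}

// Allowed: foreign trait, local type.
impl fmt::Display for Person {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.name)
    }
}

// Rejected with error E0117: foreign trait AND foreign type.
// This is the situation of `impl Serialize for Person` once `Person`
// has moved into its own crate.
//
// impl fmt::Display for Vec<u8> {
//     fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
//         write!(f, "{} bytes", self.len())
//     }
// }
```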

Possible solutions

Let's try to work around those two problems. In the end, we want to achieve the following:

  • Having the host module of Person be free of any serialization code and the resulting dependencies
  • Being able to serialize an instance of Person in another module

Solution 1: Create a new type

Creating types is cheap in Rust thanks to zero-cost abstractions. We can therefore define a new type, HttpPerson, that lives in the http module. This one will mirror the fields of Person and derive Serialize. The only thing we have to do is convert between HttpPerson and Person. Yeah!
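A minimal sketch of what this could look like. To keep it free of external crates, the serde derive is only mentioned in a comment, and the module is called `domain` instead of `core` to avoid a clash with Rust's built-in `core` crate. The `From` conversions are one possible way to do the mapping, not the only one.

```rust
mod domain {
    #[derive(Debug, Clone, PartialEq)]
    pub struct Person {
        pub name: String,
    }
}

mod http {
    use super::domain::Person;

    // In a real project, `#[derive(Serialize, Deserialize)]` would go here,
    // so serde is only a dependency of the `http` side.
    #[derive(Debug, Clone, PartialEq)]
    pub struct HttpPerson {
        pub name: String,
    }

    // Every field has to be mapped by hand, in both directions.
    impl From<Person> for HttpPerson {
        fn from(person: Person) -> Self {
            HttpPerson { name: person.name }
        }
    }

    impl From<HttpPerson> for Person {
        fn from(person: HttpPerson) -> Self {
            Person { name: person.name }
        }
    }
}
```

With a single field this is harmless. The mapping code grows with every field and every nested type though.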

Well, depending on how complex our real-world data structures are, this can be quite a tedious and also error-prone task. Also, let's not forget the mental load (why are there two seemingly identical structs?) that would come with such an approach. If you are writing software for a complex business domain, you shouldn't make the code any harder to understand than it already is. In his book Domain-Driven Design, Eric Evans suggests several patterns on how to design software for complex domains. One important technique is to reduce the mental mapping between the business domain and the actual source code as much as possible. Having several data types that represent the same element of the business domain does not help with that, especially because we only introduced the second one because of a technical limitation of our tool.

Solution 2: Create a local wrapper type

Instead of creating a type that mirrors the implementation of Person, we can create a generic Http<T> struct:

struct Http<T>(T);

This one will be local to our crate and hence we are allowed to implement Serialize like this:

use serde::Serialize;
use core::Person; // Imagine `core` being a crate instead of just a module

impl Serialize for Http<Person> {
   ...
}

This avoids the need for a type that mirrors the structure of Person but has the downside that we have to manually implement Serialize again.
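To show the shape of this approach without pulling in serde, the following sketch uses `std::fmt::Display` as a stand-in for the foreign trait. The `domain` module stands in for the separate core crate, and the JSON output is hand-rolled for illustration only. A real implementation would implement serde's `Serialize` instead.

```rust
use std::fmt;

// Stand-in for the separate "core" crate.
mod domain {
    pub struct Person {
        pub name: String,
    }
}

use domain::Person;

/// The wrapper is local to this crate, so implementing a foreign trait
/// for it is fine even if `Person` comes from somewhere else.
struct Http<T>(T);

impl fmt::Display for Http<Person> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Hand-written output. `{:?}` on a `String` adds the quotes; this
        // matches JSON escaping only for simple strings like the one used here.
        write!(f, "{{\"name\":{:?}}}", self.0.name)
    }
}
```

The field-by-field code inside `fmt` is exactly the part that a derive would normally generate.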

Conclusion

This is the stage where I am currently out of ideas on how to proceed. Both solutions are sub-optimal and hard to justify just for the sake of the stress test. Obviously, if the core of your system already lives in another crate, you will have to roll with one of those anyway, but if the modules still live in the same crate, you might just bend the rules a little and roll with #[derive(Serialize, Deserialize)].

The 2nd solution is currently my favourite if I had to go for one. Mainly for its cleanliness of not having to re-define Person but also because I have the feeling that it should be possible (through changes to the language or other clever things one can do with Rust) to make Serialize easier to implement. The limitation we are currently hitting there is that macros are processed very early in the compile phase, hence they only have access to the source code and cannot resolve symbols. A #[derive(Serialize)] only receives the tokens of the declaration it is sitting on, which is the struct definition of Person in our case. I think it is therefore not possible to write a custom derive that generates code based on some code somewhere else.
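The token-only view can be demonstrated with a declarative macro. Custom derives are procedural macros and need their own crate, so the hypothetical `with_field_names!` macro below is only a stand-in, but it is subject to the same limitation: it can generate code solely from the tokens handed to it.

```rust
// Hypothetical macro for illustration. It receives the tokens of a struct
// definition, re-emits them, and generates an extra impl from the same tokens.
macro_rules! with_field_names {
    (struct $name:ident { $($field:ident : $ty:ty),* $(,)? }) => {
        #[allow(dead_code)]
        struct $name {
            $($field: $ty),*
        }

        impl $name {
            /// Generated purely from the field tokens seen by the macro.
            fn field_names() -> Vec<&'static str> {
                vec![$(stringify!($field)),*]
            }
        }
    };
}

// The macro has to wrap the definition itself. Passing just the name
// `Person` would give it a single identifier token and no way to look
// up the fields behind it.
with_field_names! {
    struct Person {
        name: String,
    }
}
```

`Person::field_names()` returns `vec!["name"]` here because the definition was part of the macro input, not because the macro resolved the symbol `Person`.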

Extending custom-derives with symbol resolution

It would be nice if one could do something like:

use serde::Serialize;
use core::Person;

// Imaginary syntax:
derive Serialize on Person;

and get access to the declaration of the Person symbol in the implementation of the custom-derive, no matter where it is actually defined. Orphan rules would still apply obviously. This would allow for some seriously powerful code generation and at the same time, keep concerns nicely separated.

Baking this into the custom-derive feature is probably not such a good idea though, since derive is associated with annotating a struct. More generally expressed, I'd like to have a way of doing meta-programming in Rust that has access to symbol resolution, kind of like reflection in languages such as Java and C#. This would allow for generating code like Serialize impls in a module other than the one where the actual struct is defined.

Lexical trait implementations

Another feature, although completely orthogonal, that would fit nicely into hexagonal architectures is lexical trait impls. Currently, the Rust compiler enforces the so-called "orphan rule" when it comes to trait implementations. Roughly summarized, it states that either the trait or the type that the trait is implemented on has to be local to the current crate. This guarantees that, no matter which crates are linked together, there is at most one implementation of a specific trait on a certain struct. The guarantee is needed because the implementation of a trait on a struct is an element of a crate that is "exported". In other words, any piece of code that depends on this crate can use this implementation. If one could lexically scope trait implementations, the "orphan rule" could be relaxed under certain circumstances. Imagine you could do the following:

use serde::Serialize; // Foreign trait
use core::Person; // Foreign struct

#[no_export]
impl Serialize for Person {

}

Or with a different syntax:

use serde::Serialize;
use core::Person;

pub(crate) impl Serialize for Person {

}

In the above scenario, Serialize as well as Person are foreign to the current crate. However, the impl blocks are marked/annotated as private to the current crate. Hence, no code outside of the current crate is affected by this implementation because there is an unambiguous way of selecting which functionality should be called: invoking Serialize within the current crate will always use the local implementation. In a way, this would be a form of specialisation.

Wrapping up

A hexagonal architecture allows for a clean separation of concerns within a codebase. At present, embracing such an architecture to its fullest in a Rust code base causes some friction with how certain things in the Rust ecosystem, like custom derives, work. I am super excited about seeing Rust evolve and tackle problems like these!

Discussion

Comments or ideas?
Post them to the /r/rust post: https://www.reddit.com/r/rust/comments/bvrmaa/rusts_custom_derives_in_a_hexagonal_architecture/



More from Thomas Eizinger