Rust's custom derives in a hexagonal architecture: Incompatible ideas?
May 30, 2019 • 2,206 words
This blog post is a manifestation of a problem that has been floating around in my head for quite a while now. It is about the seemingly incompatible idea of fully embracing Rust's custom derive system in an application that puts a strong focus on a hexagonal architecture.
To discuss this problem, I am going to first write about both concepts individually. Feel free to skip over those sections if you are already familiar with the topics. The blog post finishes off with some ideas on how Rust could be extended to better support these kinds of use cases.
Hexagonal architecture
The concept of a hexagonal architecture is known under a variety of different names. Some people refer to it as onion architecture whereas others like to call it "ports and adapters" [1]. Independent of the name, the idea is always the same: embracing the principle of Dependency Inversion. Or, as Uncle Bob calls it: Clean Architecture.
In a nutshell, you want to model the "core" of your problem domain (your "business logic") in a way that it is ignorant about the rest of the system:
- Is your system invoked via an HTTP API or in a CLI? Your `core` doesn't care.
- Is the state persisted in an SQL database or just held in memory? Your `core` doesn't care.
Note: With `system`, I am always referring to a single runtime component (one application), and the `core` is just a module that (for most languages) only exists at the source-code level.
What is the point of such a modularisation? The examples already provided quite a strong hint: It is great at separating concerns and has many interesting implications down the road. If the code that models your problem domain is independent from (as in: the `core` module does not depend on) the rest of the code base:
- you can compile it fairly quickly: fast compile times allow for many iterations and hence faster feature development
- you can (unit-)test it in isolation
- you can reason about the critical behaviour of the system in isolation
- you can port it to other runtime environments
There are probably many more things that could be mentioned, and they also overlap at certain points. The key takeaway for me is simplicity, which is one of the crucial reasons why we as developers come up with abstractions and architectures.
A hexagonal architecture is a simple architecture. Not only because it separates concerns but also because it allows you to defer some decisions to a later stage in the development of a system [2]. The project I am currently working on still stores things in-memory and we've been developing for over a year now. It is actually fully functional but just doesn't persist the state between restarts (yet).
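As a rough illustration of what "the core doesn't care" can look like in Rust, here is a minimal sketch. All names in it (`domain`, `PersonRepository`, `InMemoryRepository`, `rename`) are made up for this example and are not taken from the project mentioned above. The module is called `domain` rather than `core` only to avoid a name clash with Rust's built-in `core` crate.

```rust
/// The "core": domain types plus the port (a trait) it needs for storage.
/// Nothing in here knows how or where people are actually stored.
mod domain {
    #[derive(Debug, Clone, PartialEq)]
    pub struct Person {
        pub name: String,
    }

    /// Port: what the core requires from any storage backend.
    pub trait PersonRepository {
        fn save(&mut self, id: u32, person: Person);
        fn find(&self, id: u32) -> Option<&Person>;
    }

    /// Business logic written only against the port.
    /// Returns `false` if no person with that id exists.
    pub fn rename<R: PersonRepository>(repo: &mut R, id: u32, new_name: &str) -> bool {
        match repo.find(id).cloned() {
            Some(mut person) => {
                person.name = new_name.to_string();
                repo.save(id, person);
                true
            }
            None => false,
        }
    }
}

/// Adapter: an in-memory implementation of the port.
/// It depends on `domain`, never the other way around.
mod memory {
    use super::domain::{Person, PersonRepository};
    use std::collections::HashMap;

    #[derive(Default)]
    pub struct InMemoryRepository {
        people: HashMap<u32, Person>,
    }

    impl PersonRepository for InMemoryRepository {
        fn save(&mut self, id: u32, person: Person) {
            self.people.insert(id, person);
        }

        fn find(&self, id: u32) -> Option<&Person> {
            self.people.get(&id)
        }
    }
}
```

Swapping the in-memory adapter for a database-backed one later would mean adding another module that implements `PersonRepository`; `domain` itself would stay untouched.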
Hexagonal architecture in Rust
That project is actually written in Rust, which turned out to be an excellent choice for the problem domain. It is also the reason I am writing this blog post because we've been trying to embrace a hexagonal architecture and we've hit some problems with it.
Overall, Rust has a very well thought-through module system:
- Symbols are private by default and need to be explicitly exported
- Transitive dependencies are not exposed: A dependency on module B in module A is not leaked through to C (if you don't use it in a public signature)
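A small sketch of the first point, with made-up names (`domain`, `normalise`): only items marked `pub` are reachable from outside the module.

```rust
mod domain {
    /// Exported type; its field stays private to this module.
    pub struct Person {
        name: String,
    }

    impl Person {
        /// Exported constructor.
        pub fn new(raw_name: &str) -> Person {
            Person { name: normalise(raw_name) }
        }

        /// Exported read-only accessor for the private field.
        pub fn name(&self) -> &str {
            &self.name
        }
    }

    // No `pub`: this helper cannot be called from outside `domain`.
    fn normalise(raw: &str) -> String {
        raw.trim().to_string()
    }
}
```

Code outside `domain` can call `Person::new` and `name`, but neither `normalise` nor the `name` field directly.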
This allows for a sophisticated modularisation even within a single `crate`, which is what Rust uses to package up and distribute libraries. What is the trouble then?
Rust has a clear separation of data and behaviour. Data is stored inside `struct`s whereas behaviour is attached to `struct`s by implementing `trait`s on them. If there is an automated way to implement a certain trait (like creating a printable representation for debugging purposes through `Debug`), Rust allows you to "derive" the implementation and thereby have it generated for you by the compiler. This will, and that is the important part, add source code at compile time in the same module as the declared struct. Here is an example:
#[derive(Debug)] // <-- Instruction to derive an implementation of the `Debug` trait for the struct `Person`
struct Person {
    name: String,
}
// <-- Implementation of `Debug` is going to be generated here at compile time:
// impl Debug for Person {
//     ...
// }
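For comparison, here is a hand-written implementation that behaves like the derived one for this struct. It is a sketch of the equivalent behaviour, not the literal code the compiler emits.

```rust
use std::fmt;

struct Person {
    name: String,
}

// Hand-written counterpart of `#[derive(Debug)]` for `Person`.
impl fmt::Debug for Person {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        // `debug_struct` produces the familiar `Person { name: "..." }` layout.
        f.debug_struct("Person").field("name", &self.name).finish()
    }
}
```

Either way, the `impl` block ends up in the module that declares `Person`.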
One of the most popular crates in the Rust ecosystem is serde. It allows you to implement (or derive) the `Serialize` and `Deserialize` traits, which can then be used to serialize an instance of a struct into a variety of formats (JSON, YAML, XML, etc.) and also to deserialize an instance from any of these. As with all custom derives, those implementations are generated next to the actual struct definition.
You might already have an idea of where I am going with this ...
Let's recap:
In a hexagonal architecture, we want our `core` module to be independent of the other aspects of a system. If we are building an HTTP API, we don't want the code in our `core` module to know about that. For example, if any of those types need to be serialized, the code for serializing them should not be within the `core` module. It should be in an `http` module or something like that. The stress test for whether this requirement is fulfilled is always: if I extracted the `core` module into its own project without any of the other stuff being present, would it still compile? As soon as we start to derive `Serialize` on any of our core types, this no longer holds unless we add `serde` as a dependency to that project.
Note: `serde` by itself is very well designed and is actually separated into the generic serialization library and concrete formats like `json` or `yaml`, so it might not be so bad to depend on `serde`, but I hope the example still communicates the point I am trying to make.
What is the alternative?
We can obviously always go and implement the trait ourselves in whichever module we want:
// In our "core" module
struct Person {
    name: String,
}
// In our "http" module
use serde::Serialize;
use core::Person;
impl Serialize for Person {
    ...
}
This has two downsides:
- It is tedious and error-prone to implement the serialization code yourself, especially if you want the exact same one that would already be provided by `serde`'s `custom-derive`. Also, `serde` is very configurable, so even if you have special requirements for the serialization, it very likely supports them already and you don't have to hand-roll your own implementation.
- It actually doesn't survive our "stress test" because of Rust's orphan rules. If we move `Person` to its own crate, neither `Serialize` nor `Person` is local to the crate that hosts the `http` module and hence, declaring this implementation will not compile.
Possible solutions
Let's try to work around those two problems. In the end, we want to achieve the following:
- Having the host module of `Person` be free of any serialization code and the resulting dependencies
- Being able to serialize an instance of `Person` in another module
Solution 1: Create a new type
Creating types is cheap in Rust thanks to zero-cost abstractions. We can therefore define a new type, `HttpPerson`, that lives in the `http` module. This one will mirror the fields of `Person` and derive `Serialize`. The only thing we have to do is convert between `HttpPerson` and `Person`. Yeah!
Well, depending on how complex our real-world data structures are, this can be quite a tedious and also error-prone task. Also, let's not forget the mental load (why are there two seemingly identical structs?) that would come with such an approach. If you are writing software for a complex business domain, you shouldn't make the code any harder to understand than it already is. In his book Domain-Driven Design, Eric Evans suggests several patterns on how to design software for complex domains. One important technique is to reduce the mental mapping between the business domain and the actual source code as much as possible. Having several data types that represent the same element of the business domain does not help with that, especially because we only introduced the second one because of a technical limitation of our tool.
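To make the trade-off concrete, here is a minimal std-only sketch of the mirror-type approach. The module name `domain` stands in for `core` (to avoid clashing with Rust's built-in `core` crate), and the `From` conversions are my assumption of how one might wire the two types together. The serde derive is only hinted at in a comment so that the example compiles without external crates.

```rust
mod domain {
    #[derive(Debug, Clone, PartialEq)]
    pub struct Person {
        pub name: String,
    }
}

mod http {
    use super::domain::Person;

    // Mirror of `Person`. In a real project, the serde derives
    // (`Serialize`, `Deserialize`) would be added to this struct
    // instead of to `Person`; they are left out here on purpose.
    #[derive(Debug, Clone, PartialEq)]
    pub struct HttpPerson {
        pub name: String,
    }

    // Every field has to be mapped by hand, in both directions.
    impl From<Person> for HttpPerson {
        fn from(person: Person) -> Self {
            HttpPerson { name: person.name }
        }
    }

    impl From<HttpPerson> for Person {
        fn from(person: HttpPerson) -> Self {
            Person { name: person.name }
        }
    }
}
```

With a single field this looks harmless; the tedium grows with every field and nested type that has to be mirrored and kept in sync.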
Solution 2: Create a local wrapper type
Instead of creating a type that mirrors the implementation of `Person`, we can create a generic `Http<T>` struct:
struct Http<T>(T);
This one will be local to our crate and hence we are allowed to implement `Serialize` like this:
use serde::Serialize;
use core::Person; // Imagine `core` being a crate instead of just a module
impl Serialize for Http<Person> {
    ...
}
This avoids the need for a type that mirrors the structure of `Person` but has the downside that we have to manually implement `Serialize` again.
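The same wrapper pattern can be tried out without serde. In the sketch below, `std::fmt::Display` plays the role of the foreign trait, purely as a stand-in for `Serialize`, and the hand-written JSON output is a simplification. `domain` again stands in for `core`, and in this single-file example `Person` is of course not really foreign.

```rust
use std::fmt;

mod domain {
    pub struct Person {
        pub name: String,
    }
}

/// Local wrapper type: because `Http` is defined in this crate, we may
/// implement a foreign trait for `Http<Person>` even if `Person` came
/// from another crate.
pub struct Http<T>(pub T);

// `Display` is a stand-in for `Serialize` in this sketch.
impl fmt::Display for Http<domain::Person> {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        // `{:?}` on a `String` adds quotes and Rust-style escapes, which is
        // close to, but not the same as, proper JSON string escaping.
        write!(f, "{{\"name\":{:?}}}", self.0.name)
    }
}
```

The field-by-field formatting is exactly the part one would rather have generated.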
Conclusion
This is the stage where I am currently out of ideas on how to proceed. Both solutions are sub-optimal and hard to justify just for the sake of the "stress test". Obviously, if the `core` of your system already lives in another `crate`, you will have to roll with one of those anyway, but if the modules still live in the same crate, you might just bend the rules a little and roll with `#[derive(Serialize, Deserialize)]`.
The second solution is currently my favourite if I had to go for one. Mainly for its cleanliness of not having to re-define `Person`, but also because I have the feeling that it should be possible (through changes to the language or other clever things one can do with Rust) to make `Serialize` easier to implement. The limitation we are currently hitting there is that macros are processed very early in the compile phase, hence they only have access to the source code and cannot resolve symbols. Declaring `#[derive(Serialize)]` will only receive the tokens of the declaration it is sitting on, which is the `struct` definition of `Person` in our case. I think it is therefore not possible to write a custom-derive that generates code based on some code somewhere else.
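The "only sees the tokens it is given" limitation can be illustrated with a declarative macro. This is an analogy under stated assumptions: custom derives are procedural macros living in their own crate, whereas the made-up `with_field_names!` below is a `macro_rules!` macro, but both operate purely on the tokens handed to them.

```rust
// Hypothetical macro for illustration: it can generate `field_names`
// only because the whole struct declaration is passed in as tokens.
macro_rules! with_field_names {
    (struct $name:ident { $($field:ident : $ty:ty),* $(,)? }) => {
        struct $name {
            $($field: $ty),*
        }

        impl $name {
            /// Field names, extracted from the tokens of the declaration.
            fn field_names() -> &'static [&'static str] {
                &[$(stringify!($field)),*]
            }
        }
    };
}

with_field_names! {
    struct Person {
        name: String,
    }
}
```

Passing just the identifier `Person` to such a macro would give it no way to find out which fields the struct has.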
Extending custom-derives with symbol resolution
It would be nice if one could write something like:
use serde::Serialize;
use core::Person;
// Imaginary syntax:
derive Serialize on Person;
and get access to the declaration of the `Person` symbol in the implementation of the `custom-derive`, no matter where it is actually defined. Orphan rules would still apply, obviously. This would allow for some seriously powerful code generation and, at the same time, keep concerns nicely separated.
Baking this into the `custom-derive` feature is probably not such a good idea though, since `derive` is associated with annotating a `struct`. More generally expressed, I'd like to have a way of doing meta-programming in Rust that has access to symbol resolution, kind of like reflection in languages such as Java and C#. This would allow for generating code like `Serialize` impls in a module other than the one where the actual `struct` is defined.
Lexical trait implementations
Another feature, although completely orthogonal, that would fit nicely into hexagonal architectures is lexical `trait` impls. Currently, the Rust compiler enforces the so-called "orphan rule" when it comes to trait implementations. Roughly summarized, it states that either the trait or the type that the trait is implemented on has to be local to the current crate. This is to guarantee that no matter which crates are linked together, there is at most one implementation of a specific `trait` on a certain `struct`. This is because the implementation of a `trait` on a `struct` is an element of a `crate` that is "exported". In other words, any piece of code that depends on this crate can use this implementation. If one could lexically scope `trait` implementations, the "orphan rule" could be relaxed under certain circumstances. Imagine you could do the following:
use serde::Serialize; // Foreign trait
use core::Person; // Foreign struct
#[no_export]
impl Serialize for Person {
}
Or with a different syntax:
use serde::Serialize;
use core::Person;
pub(crate) impl Serialize for Person {
}
In the above scenario, `Serialize` as well as `Person` are foreign to the current crate. However, the `impl` blocks are marked/annotated as private to the current crate. Hence, no code outside of the current crate is affected by this implementation because there is an unambiguous way of selecting which functionality should be called: invoking `Serialize` within the current module will always use the local implementation. In a way, this would be a form of specialisation.
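For contrast with the imaginary syntax, here is a std-only sketch of what the orphan rule accepts and rejects today. `Shout` is a made-up local trait, `Display` plays the foreign trait, and `String` and `Vec<u8>` play the foreign types.

```rust
use std::fmt;

// Foreign trait on a local type: allowed.
struct Person {
    name: String,
}

impl fmt::Display for Person {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "{}", self.name)
    }
}

// Local trait on a foreign type: allowed.
trait Shout {
    fn shout(&self) -> String;
}

impl Shout for String {
    fn shout(&self) -> String {
        self.to_uppercase()
    }
}

// Foreign trait on a foreign type: rejected with error E0117 if uncommented.
// impl fmt::Display for Vec<u8> {
//     fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
//         write!(f, "{} bytes", self.len())
//     }
// }
```

A crate-private `impl`, as imagined above, would target exactly the third, currently rejected case.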
Wrapping up
A hexagonal architecture allows for a clean separation of concerns within a codebase. In its current state, embracing such an architecture in a Rust code base to its fullest causes some friction with how certain things like `custom-derive`s in the Rust ecosystem work. I am super excited about seeing Rust evolve and tackle problems like these!
Discussion
Comments or ideas?
Post them to the /r/rust post: https://www.reddit.com/r/rust/comments/bvrmaa/rusts_custom_derives_in_a_hexagonal_architecture/
1. http://www.dossier-andreas.net/software_architecture/ports_and_adapters.html
2. Check out Uncle Bob's Clean Architecture talk for more on this topic.