Designing a modular Discord bot framework

Oliver • Published 2 years ago • Updated 5 months ago


Update 8 August 2023

While I am extremely proud of this solution, I ultimately decided against it and settled on separating the bots as containerised services. Nevertheless, the solution here is still valid and useful for a variety of situations, and so I'm leaving this post up in the hopes that you or someone you know will find it useful.

For the better part of the past month or so, I've been rewriting the bot(s) which power the official Brackeys Community Discord server. This post isn't so much a guide, as it is a vent. A window into my process of dealing with the ever-growing enterprise-level codebase I've signed myself up for, and the hurdles I've faced. Enjoy.

What is BrackeysBot?

BrackeysBot is the mitochondria of the Brackeys Discord server. The powerhouse of the cell, if you will. The current version of BrackeysBot is written in C# and powered by the Discord.Net wrapper.

Written, maintained, and hosted by the Admins of the server, the current version of the bot (v3) has been used since before I myself joined the server. It has stood the test of time and still occasionally gets pull requests. It's certainly capable of doing the job, but we've all come to realise a flaw.

Version 3 is a mess

v3 of BrackeysBot has been infamously dubbed “broken bot” because the codebase has become unmanageable, erroneous, and no one dares to touch it. The last major pull request was one I wrote myself which fixed an exploit allowing users to trigger the bot to ping @everyone - which ultimately led to a raid. Prior to that, the last PR was in November by one of the Gurus which fixed links being posted in a collaboration embed.

Recently, the code has been in a position where the only reason we touch it is to fix bugs. Not to introduce new features or update the .NET version or - hell - not even to update the Discord.Net version it references. The current version of the bot doesn't even work in threads, because it references a version of Discord.Net that was released prior to threads existing.

On top of that, a number of times now we've found the bot latency so high that we attempt to []warn or []mute members, and the bot has just failed to respond in time because of what I can only assume is the rate limit being hit. This could be fixable, but no one wants to even look at the code in v3.

A rewrite has been planned for quite some time now. In fact a few people have attempted a from-scratch rewrite in various languages and frameworks. But for one reason or another they all halted their development. Life gets in the way, or people lose interest, perfectly reasonable. So I decided to step in and take a stab at it - and that leads me to today.

Deciding on a wrapper

Let the record show that I despise Discord.Net. It is an interface nightmare. You have things like IUser, IGuildUser, SocketUser, RestUser, and it's just so much needless separation. I get it, they are big on inheritance over composition, but this level of masochistic extremism is frankly unwarranted. Decision 1: Don't use Discord.Net.

Update 1 May 2024

Ah, the naïveté of my former self. I've actually been preferring Discord.Net lately because the sheer number of interfaces makes mock testing not only easy, but actually possible. I'm so sorry I doubted you, Discord.Net, you're actually pretty decent.

For a while, I was using DisCatSharp. This was formerly known as DSharpPlus.NextGen - it is a fork of DSharpPlus, which itself was inspired by DiscordSharp. The benefit it has over DSharpPlus is that it aims to implement the newest bleeding-edge endpoints and response objects that the Discord API offers, and for the longest time it was my go-to wrapper, until I realised something that was a dealbreaker.

DisCatSharp does not like exceptions. Not one bit. The way it handles async is shockingly bad, so much so that you can encounter deadlocks without any warning. I was encountering issues and receiving zero feedback, because any exception that got thrown was immediately swallowed. I had to manually go in and try-catch the ever-loving shit out of everything to narrow down the issue. This is what fucking torture feels like. Give me a stack trace, dammit.

Okay, okay, so DisCatSharp got me frustrated. The only sensible option is DSharpPlus. It is not an interface-nightmare, nor is it a deadlock-nightmare. It might not have bleeding-edge features, but that's fine if it means I can at least have stable features. After switching to DSharpPlus, I was encountering errors that I never knew existed. I have managed to patch code that I didn't even know was erroneous prior to this, because - like I said - DisCatSharp gave me zero feedback. Zero stack trace. I was never aware of any exceptions being thrown.

After chatting with one of the maintainers of DSharpPlus, I can see why they hate DisCatSharp. But that is a story for another time.

All of this to say, I decided to use DSharpPlus.

Segregating functionality

One of the ideas driving my rewrite is the ability to separate functionality into multiple bots - rather than having one bot for everything. My motivation was to keep the codebase and repositories clean, organised, and decoupled; this also prevents the rate limit from inevitably being hit as things expand, as one bot currently requests literally everything.

BrackeysBot v3 is responsible for:

  • Moderation (warning, muting, kicking, banning)
  • Self-role assignment
  • LaTeX rendering with the []math command
  • Filtering COC-violating messages
  • “Custom commands” (aka macros)
  • Game jam timestamp management
  • Endorsements for Guru role
  • Listings in the collaboration section
  • Managing and displaying rules
  • Codeblock detection with PasteMyst integration
  • Code formatting
  • Slowmode

Yes. All that. With one bot. So I aim to split all of these into several single-purpose bots, each responsible for one job and one job only. This also has the added benefit that if one bot needs a feature hotfix for whatever reason, it can go down without affecting the others. If, for example, an exploit is found in the bot which renders LaTeX expressions, it can temporarily go down while a patch is written, and moderation (and other) features stay available in the meantime.

However this came with an interesting design decision problem that I found myself having to solve:

Inter-bot communication

It quickly became apparent that although splitting the codebase into single-purpose bots solves a wide array of problems, it introduces another: dependency requirements.

The userinfo command should ideally be implemented in the Core bot, but it also needs to display infraction count and endorsements - which will be managed elsewhere. We'd also need some bots to know if members are muted or have past infractions. For example: the bot which handles collaboration listings will need to interface with the moderation bot, because if a user is muted, it shouldn't allow that user to post a listing and essentially bypass their mute. So how would I go about doing that? How can I have bots separated, completely in their own repositories, while still being able to inter-communicate with each other seamlessly?

Idea 1: Shared database

The first idea that was posed to me was to have one database, likely MySQL or SQL Server, which every bot could access. The moderation bot could add or update infractions and member-muted states in this database, and other bots could simply query this database to verify that users satisfy some requirements before performing an action.

I pondered this idea for a while and ultimately decided that while this idea sounds nice in theory, and would completely solve the problem of inter-communication, it does mean having to create duplicate models throughout the codebase. Every bot which needs to know about user infractions would need to have a copy of the Infraction class so that it can map it within an EF DbContext. And before you say “just use ADO” - no. Get out.

It also means having every bot keep an up-to-date configuration file with the correct authentication details for the database, and if you change one - you must change them all. That is so overly cumbersome and would defeat the point of making this rewrite more “maintainable”.

Doing this also has us violating a golden rule: the consumer of an API should never be able to access the implementation layer. Not for a second. This opens up the potential for irreversible destructive actions.

Idea 2: Sockets

The second idea that came to me was to have bots communicate through sockets. This way, the moderation bot could keep its own local SQLite database and another bot could send a request, which the moderation bot responds to with a JSON or binary serialised list of infractions over a network.

This is more ideal as it means only one bot - the moderation bot - needs to know about the infraction database. But I quickly realised this would mean having to write an entire networking implementation which needs to support encrypted message sending. Now, I could do this, I've done it before, but I simply refuse to put so much effort into something like this. BrackeysBot is something I work on in my own time, free of charge. If I were being paid to do this, then perhaps - but as it stands today, I am not going to put in the effort to go so HAM with this rewrite when I could spend my time working on the actual features.

At this point, I hit a minor roadblock. I wasn't sure how I'd approach this problem of decoupling while maintaining the possibility to inter-communicate.

And then I realised, the problem had already been solved before by none other than…

Bukkit / Spigot / Paper

Bukkit (which later forked into Spigot, which itself forked into Paper) is a Minecraft server mod which introduces server-side plugin support. These plugins - in the form of compiled .jar artifacts - allow developers to introduce functionality that doesn't exist in vanilla servers. They are dumped into the plugins directory relative to the server, and these plugins are dynamically loaded when the server starts. The API allows plugins to register their own commands and event listeners, and can do anything from displaying the time in a human-readable form, to grief protection, to a literal cannon that fires kittens.

I have written a plugin loader akin to Bukkit in the past. .NET with its powerful reflection API allows you to load managed assemblies (in the form of .dll artifacts) at runtime, instantiate classes, manipulate them. So, this is the approach I settled on.

BrackeysBot v4 - at least, my vision of it - has become a glorified plugin loader. Instead of separating every single bot into its own application, there will only be one application. This application will load plugins, and it's the plugins themselves that represent bot instances. These plugins all reference an API, similar to how Bukkit plugins reference the Bukkit API, and they inherit from a class I have dubbed MonoPlugin. This class I wrote offers things that plugins can interface with, such as their own dedicated DiscordClient, as well as a Logger, Configuration, ServiceProvider, etc.

In about a day, I had written a pretty solid plugin loader. It even supports dependencies! As an example, the moderation plugin - which I have named Hammer - depends on some functionality provided by the Core plugin. Hammer can reference the BrackeysBot.Core.API NuGet, which offers the publicly-consumable Core API, and it can tap into the methods that it needs to.
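To make the idea of one plugin consuming another plugin's public API more concrete, here is a minimal sketch of the muted-user check described earlier. Every name in it (IInfractionService, InMemoryInfractionService, ListingService) is hypothetical and not taken from the BrackeysBot source; in a real setup the interface would presumably ship in a separate API package, whereas here everything sits in one file so that it runs on its own.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical contract that would live in a shared API assembly.
public interface IInfractionService
{
    bool IsMuted(ulong userId);
    int GetInfractionCount(ulong userId);
}

// Hypothetical implementation, owned by the moderation plugin only.
public sealed class InMemoryInfractionService : IInfractionService
{
    private readonly HashSet<ulong> _muted = new();
    private readonly Dictionary<ulong, int> _counts = new();

    // Marks the user as muted and records one more infraction.
    public void Mute(ulong userId)
    {
        _muted.Add(userId);
        _counts[userId] = GetInfractionCount(userId) + 1;
    }

    public bool IsMuted(ulong userId) => _muted.Contains(userId);

    public int GetInfractionCount(ulong userId) =>
        _counts.TryGetValue(userId, out int count) ? count : 0;
}

// Hypothetical consumer in another plugin: it only ever sees the interface.
public sealed class ListingService
{
    private readonly IInfractionService _infractions;

    public ListingService(IInfractionService infractions) => _infractions = infractions;

    // Refuses the listing if the author is currently muted.
    public bool TryPostListing(ulong userId, string text)
    {
        if (_infractions.IsMuted(userId))
        {
            return false;
        }

        Console.WriteLine($"Listing posted by {userId}: {text}");
        return true;
    }
}

public static class Program
{
    public static void Main()
    {
        var moderation = new InMemoryInfractionService();
        var listings = new ListingService(moderation);

        moderation.Mute(42);
        Console.WriteLine(listings.TryPostListing(42, "Looking for an artist"));  // muted user is refused
        Console.WriteLine(listings.TryPostListing(7, "Looking for a composer"));  // unmuted user is allowed
    }
}
```

The consumer never touches the storage behind the interface, which is the property the shared-database idea could not offer.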

But there is one issue: we need to instruct the application to load the Core plugin before Hammer. The methods Hammer calls need to actually exist, they need to be implemented. Hammer could load before Core, causing Hammer to throw exceptions because the necessary classes don't exist. Though this won't actually happen in production (the plugin loader enumerates the plugin files in lexicographical order), the loader still needs to guarantee that Core has been detected and loaded in full, prior to Hammer being loaded.

Well, now this can be achieved easily with PluginDependenciesAttribute! This attribute lets you specify a list of plugin names that BrackeysBot needs to initialise and ensure are loaded, prior to this plugin. And so, the signature for the HammerPlugin class is as follows:

[Plugin("Hammer", "1.0.0")]
[PluginDependencies("BrackeysBot.Core")]
[PluginDescription("A BrackeysBot plugin for managing infractions against misbehaving users.")]
[PluginIntents(DiscordIntents.All)]
public sealed class HammerPlugin : MonoPlugin, IHammerPlugin

It's elegant, it's easy to maintain, and it works. If you're interested in seeing how it works, feel free to browse the source for the ironically-named SimplePluginManager class. I have tried my best to account for any and all situations, but please - if you notice a bug or a mistake - do not hesitate to open a pull request and contribute. This bot will eventually replace the current BrackeysBot, and it needs to be stable enough to do so in a server with over 90 thousand members. This must be done right.
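For readers curious how a loader might turn declared dependencies into a load order, one common approach is a depth-first topological sort. The sketch below is only an illustration of that general technique, not a description of how SimplePluginManager works; the LoadOrder class and its Resolve method are hypothetical names, and the dependency map is typed in by hand rather than read from attributes.

```csharp
using System;
using System.Collections.Generic;

public static class LoadOrder
{
    // Returns plugin names ordered so that every plugin comes after its dependencies.
    // Throws if a dependency is unknown or if the dependencies form a cycle.
    public static List<string> Resolve(IReadOnlyDictionary<string, string[]> dependencies)
    {
        var ordered = new List<string>();
        var state = new Dictionary<string, bool>(); // false = being visited, true = done

        void Visit(string name)
        {
            if (state.TryGetValue(name, out bool done))
            {
                if (!done)
                {
                    throw new InvalidOperationException($"Circular dependency involving '{name}'.");
                }

                return; // already placed in the load order
            }

            if (!dependencies.TryGetValue(name, out var required))
            {
                throw new InvalidOperationException($"Unknown plugin '{name}'.");
            }

            state[name] = false;
            foreach (string dependency in required)
            {
                Visit(dependency);
            }

            state[name] = true;
            ordered.Add(name);
        }

        foreach (string name in dependencies.Keys)
        {
            Visit(name);
        }

        return ordered;
    }
}

public static class Program
{
    public static void Main()
    {
        // Hand-written stand-in for what the plugin attributes would declare.
        var plugins = new Dictionary<string, string[]>
        {
            ["Hammer"] = new[] { "BrackeysBot.Core" },
            ["BrackeysBot.Core"] = Array.Empty<string>(),
        };

        Console.WriteLine(string.Join(" -> ", LoadOrder.Resolve(plugins)));
        // prints "BrackeysBot.Core -> Hammer"
    }
}
```

Because a plugin is only added to the list after all of its dependencies, the result does not depend on the order in which the files happen to be enumerated.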

Anyway, enough of that tangent, let's get back on topic.

A roadblock with application domains

At first, I was simply using Assembly.LoadFrom/File to load the plugin assemblies into the current app domain. However I soon realised this solution would not be sufficient - I need the possibility for plugins to be unloaded. This was, after all, the whole point of writing a plugin system. They need to be hot-swappable, reloadable, updatable, without affecting every other plugin and ultimately every other bot.

You see, once an assembly has been loaded into an app domain, it cannot be unloaded. The assembly is now in memory and there it shall stay until the domain as a whole has been unloaded - which in this case, happens when the entire runtime environment exits. This wouldn't do. I cannot have a system in which the only way to reload plugins is to reload the entire application - that would defeat the point.

In my past experience with writing a plugin loader, I was able to spin up a new app domain for each plugin. This way, plugins and their dependencies get sandboxed into their own domains. When the plugin is unloaded, the domain it belongs to gets unloaded - leaving other domains intact.

I began to trial this system, and quickly came to learn that - for whatever fucking reason I still don't understand today - .NET Core removed the ability to create new app domains. You now only have one app domain to play with, and that is the one app domain provided by the environment itself. As a side note, this is why Unity has yet to update to .NET Core! Unity relies on the ability to spin up and unload app domains as and when it pleases, as this is how it handles game state being reset when you stop playing in the editor. With the removal of this very important feature in .NET Core, I now fully understand why they are stuck on .NET Framework. They have no choice… kind of.

That article, which outlines why app domains are no longer supported, does in fact mention a suitable alternative. An AssemblyLoadContext. As it turns out, this was the answer I needed. A load context can be created dynamically, assemblies can be loaded into them, and the context can be unloaded when we no longer need it (i.e. when a plugin is unloaded.) And so, that is the solution I used. When a plugin is loaded, a new load context is created and the plugin file is loaded from a stream.

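// A collectible context can be unloaded later with context.Unload().
var context = new AssemblyLoadContext(name, isCollectible: true);
// The using declaration disposes the stream once the assembly has been read.
using FileStream stream = file.Open(FileMode.Open, FileAccess.Read);
Assembly assembly = context.LoadFromStream(stream);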
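As a rough, self-contained illustration of the full load-then-unload cycle, the sketch below loads an assembly into a collectible context, requests an unload, and then watches a WeakReference to see whether the context was actually collected. It assumes the example's own assembly can stand in for a plugin file (so Assembly.Location must be non-empty, which is not the case for single-file deployments), and the LoadAndUnload helper is a hypothetical name.

```csharp
using System;
using System.IO;
using System.Reflection;
using System.Runtime.CompilerServices;
using System.Runtime.Loader;

public static class Program
{
    // Kept in its own non-inlined method so that no local variable in Main
    // keeps the context or the loaded assembly alive after it returns.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static WeakReference LoadAndUnload(string path)
    {
        var context = new AssemblyLoadContext("Plugin", isCollectible: true);

        using (FileStream stream = File.OpenRead(path))
        {
            Assembly assembly = context.LoadFromStream(stream);
            Console.WriteLine($"Loaded {assembly.GetName().Name} into '{context.Name}'");
        }

        // Unload only requests the unload; the garbage collector finishes it
        // once nothing references the context or anything loaded into it.
        context.Unload();
        return new WeakReference(context);
    }

    public static void Main()
    {
        // Assumption: this example's own assembly stands in for a plugin .dll.
        string path = typeof(Program).Assembly.Location;
        WeakReference contextRef = LoadAndUnload(path);

        // Give the GC a few passes to collect the unloaded context.
        for (int i = 0; contextRef.IsAlive && i < 10; i++)
        {
            GC.Collect();
            GC.WaitForPendingFinalizers();
        }

        Console.WriteLine($"Context still alive: {contextRef.IsAlive}");
    }
}
```

If the context reports as still alive, something is usually holding a reference to a type, instance, or delegate from the unloaded assembly, and that is the first thing to look for when a plugin refuses to unload.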

Now if you've taken a look at the documentation for AssemblyLoadContext, you might have noticed it offers a method called LoadFromAssemblyPath. So you might be wondering, why am I loading the stream myself, and calling LoadFromStream instead of just loading it by path?

I'll tell you why - because LoadFromAssemblyPath places a lock onto the file which isn't released until the environment exits. That problem mortified me for hours. All I could think was, “how could I possibly have a modular plugin system which refuses to let you update a plugin file in the event of an important hotfix, if that file is locked by the environment anyway??” It was a frustrating experience, to say the least. Alas I resorted to loading the stream myself, and calling LoadFromStream - ensuring that I have a using statement so that the stream is disposed when the plugin has finished loading. This releases the file lock, allowing you to update the dll file should it ever need updating.
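One way to check the effect described above is to load a file through a stream and then ask for exclusive access to it while the assembly is still loaded. This is only a sketch under assumptions: a temporary copy of the example's own assembly stands in for a plugin file, and the file name is made up for the demonstration.

```csharp
using System;
using System.IO;
using System.Reflection;
using System.Runtime.Loader;

public static class Program
{
    public static void Main()
    {
        // Assumption: a copy of this example's own assembly stands in for a plugin .dll.
        string pluginPath = Path.Combine(Path.GetTempPath(), $"plugin-{Guid.NewGuid():N}.dll");
        File.Copy(typeof(Program).Assembly.Location, pluginPath);

        var context = new AssemblyLoadContext("Plugin", isCollectible: true);

        using (FileStream stream = File.OpenRead(pluginPath))
        {
            context.LoadFromStream(stream);
        } // the stream is disposed here, closing our only handle on the file

        // Exclusive access can only be granted if nothing else holds the file open.
        using (File.Open(pluginPath, FileMode.Open, FileAccess.ReadWrite, FileShare.None))
        {
            Console.WriteLine("Plugin file can be opened for writing while the assembly stays loaded.");
        }

        // Clean up the temporary copy and release the context.
        File.Delete(pluginPath);
        context.Unload();
    }
}
```

If the exclusive open throws an IOException instead, some other handle on the file is still open, which is exactly the situation that blocks a hotfix from being dropped in.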

I have to admit, this system is incredibly complex. Don't get me wrong, the level of fuckery I had to deal with to get it to work was insane. But it works and I'm proud of it. The Core plugin even allows you to easily reload a plugin! This unloads the plugin completely, loads the new dll, and re-enables it so that the bot contained within it comes back online.

Seriously. How cool is that?

So this was the story of how BrackeysBot v4 has essentially become my job. An unpaid job. A job I'm putting way too much time into. I feel like an intern who's investing more effort than the reward is worth - but there is one thing that keeps me motivated to continue working on it.

One thing:

I don't want to touch v3 again.




1 legacy comment

Legacy comments are comments that were posted using a commenting system that I no longer use. This exists for posterity.

Job • 2 years ago

BBv4 Release when

But realistically, amazing solution, thank you for the time and work put into this.