Activity functions are run at least once
The thing ignored by everyone
Things you should know beforehand
Durable Functions is an extension of Azure Functions that makes them stateful. In simpler words, Durable Functions lets you orchestrate function executions in a stateful way.
Types of functions
Our problem
Sometimes things are called twice.
In most implementations of durable functions with orchestrators, the setup looks like this: you’ve got an orchestrator that starts several activities, or some activities and a suborchestration. In theory, you can start a list of suborchestrations, but I haven’t seen this scenario in a real-life production project yet.
Most of the time the problem comes from the caller, who sends 2 requests: the orchestrator gets called twice and everything goes to hell (devils dancing, demons laughing and so on). This has an easy solution: blame the caller and push them to fix it, because you won’t process the second call, or you don’t care about their data. If you are a good guy, you could implement idempotency, or at least try to. Either way, it’s not your fault, and the caller should not send you crazy stuff.
The rarer thing, and the topic of our discussion today, is when the activities are called twice (or more). Yes, it can really happen. I saw it 3 times in the last 2 years (in different products). The first time it was a really big issue: our activities ran 2x-5x as often as they should have. Maybe you think it won’t happen to you, or that you have never seen this and it’s BS, or that it happens about as often as a duplicate GUID. Well, in my opinion, this is a real threat and you should think about this possibility whenever you write an activity.
The Microsoft documentation states it pretty clearly: “The Durable Task Framework guarantees that each called activity function will be executed at least once during an orchestration’s execution.”
Solution
There’s none because this is how the framework should behave! It works as designed! Another good quote from the Microsoft documentation: “Because activity functions only guarantee at least once execution, we recommend you make your activity function logic idempotent whenever possible.”
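To make “idempotent” a bit more concrete, here is a minimal sketch in plain Python. It is not Durable Functions code: `run_once`, the `_completed` dictionary and `charge_customer` are hypothetical names made up for this illustration, and the dictionary only stands in for whatever durable store you would really use (a table row, a blob, a database record). The idea is simply that the activity records a key before it is considered done, and a duplicate call with the same key returns the stored result instead of repeating the side effect.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a durable store (table row, blob, database record).
_completed: Dict[str, str] = {}


def run_once(key: str, side_effect: Callable[[], str]) -> str:
    """Run side_effect at most once per key; repeat calls return the stored result."""
    if key in _completed:
        return _completed[key]
    result = side_effect()
    _completed[key] = result
    return result


calls: List[str] = []


def charge_customer() -> str:
    # The side effect we do not want to repeat.
    calls.append("charged")
    return "receipt-001"


# Assumed key format: orchestration instance id plus activity name.
key = "orchestration-42:charge_customer"
first = run_once(key, charge_customer)
second = run_once(key, charge_customer)  # simulated duplicate delivery
```

Note that the check-then-write in this sketch is not atomic. With two workers running at the same moment you would need a store that supports a conditional insert, otherwise both can pass the check before either writes the key.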
Personal experience
From what I’ve seen so far, this never happens if you keep the Storage account at a decent size, i.e. not crazy big.
The first time I saw this, our storage account held a little over 1 TB; the second time, as far as I remember, the storage table had over 325 GB; and the third time, over 250 GB.
Maybe you’ll say “Heeellll that’s a lot!” Yeah, maybe, I can’t argue with that, but it depends on what you’re saving in there: whether you use blobs or not, whether you use tables or not, and so on. If you aren’t using anything at all and the tables are only the default ones for the durable framework (to keep its state), maybe 20–30 GB is a decent number, but even this one depends on how old, how heavily used, and how big your “thing” is.
Do not forget that the input/output can be saved in blob storage if it’s too big, and no one really cleans that out! Again, you might say yeah…but how big can that be? Well, I’ve seen 20 MB files as input, and the problem was not the size of a single file, but the number of them! If you have 1000 files of 20 MB each, that is already about 20 GB.
Personal suggestion
Based on my experience with this, my suggestion would be to clean up the history of your function. For sure you won’t need orchestrations older than 1–3 years (depending on your legal agreements). I’ve written another article that covers exactly this, with an example: how to delete orchestrations older than a specific number of months.
If your function grows too big, think of it as a normal service, and think about how you could refactor it and split it into multiple functions. Keep it small, simple and easy. A service (microservice?) shouldn’t do more than one thing, and the same goes for functions.
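As a rough illustration of the “older than a specific number of months” rule, here is a sketch in plain Python that only decides which orchestrations would qualify. It does not call the framework: the `records` list, the instance ids and the helper names are invented for the example, and I’m assuming you only want to remove instances in a terminal state (`Completed`, `Failed`, `Terminated`). The cutoff and status list it produces are the kind of thing you would then hand to the framework’s purge-history feature.

```python
from datetime import datetime, timezone
from typing import List, Tuple

# Assumption: only finished instances should ever be purged.
TERMINAL_STATUSES = {"Completed", "Failed", "Terminated"}


def months_ago(now: datetime, months: int) -> datetime:
    """Move `now` back by whole calendar months (day clamped to 28 to stay valid)."""
    total = now.year * 12 + (now.month - 1) - months
    year, month_index = divmod(total, 12)
    return now.replace(year=year, month=month_index + 1, day=min(now.day, 28))


def select_for_purge(
    records: List[Tuple[str, datetime, str]], cutoff: datetime
) -> List[str]:
    """Return ids of instances created before the cutoff and already finished."""
    return [
        instance_id
        for instance_id, created, status in records
        if created < cutoff and status in TERMINAL_STATUSES
    ]


utc = timezone.utc
now = datetime(2024, 3, 31, tzinfo=utc)
cutoff = months_ago(now, 6)

# Hypothetical instance records: (instance id, created time, runtime status).
records = [
    ("a1", datetime(2022, 1, 10, tzinfo=utc), "Completed"),
    ("b2", datetime(2023, 12, 1, tzinfo=utc), "Completed"),
    ("c3", datetime(2021, 5, 5, tzinfo=utc), "Running"),
    ("d4", datetime(2023, 9, 1, tzinfo=utc), "Failed"),
]
to_purge = select_for_purge(records, cutoff)
```

Here `a1` and `d4` are selected, `b2` is too recent, and `c3` is left alone because it is still running, no matter how old it is.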
Under the hood
Activity triggers internally listen on a queue that is used by the Durable Functions framework. When an activity function completes, its output is written to the history table. As with a normal queue-triggered function, if the first activity execution gets stuck and does not renew its hold on the queue message in time, a second execution will get scheduled. This can even happen while scaling: if your functions scale up, a second activity execution can get scheduled if the first one hasn’t renewed yet.
This is actually why the framework guarantees you “at least once” executions.
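The mechanism above can be simulated with a toy queue. This is a simplified model written for this article, not the framework’s real implementation: `WorkItemQueue`, the 30-second timeout and the worker names are all assumptions. A dequeued message is hidden for a while; if the worker neither completes nor renews it in time, the message becomes visible again and a second worker picks it up.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class WorkItem:
    name: str
    visible_at: float = 0.0  # time from which a worker may dequeue the message


@dataclass
class WorkItemQueue:
    visibility_timeout: float
    items: List[WorkItem] = field(default_factory=list)

    def dequeue(self, now: float) -> Optional[WorkItem]:
        """Hand out the first visible message and hide it for the timeout."""
        for item in self.items:
            if item.visible_at <= now:
                item.visible_at = now + self.visibility_timeout
                return item
        return None

    def complete(self, item: WorkItem) -> None:
        """Delete the message; a no-op if another worker already deleted it."""
        if item in self.items:
            self.items.remove(item)


queue = WorkItemQueue(visibility_timeout=30.0, items=[WorkItem("SendEmail")])
executions: List[Tuple[str, str]] = []

# t=0: worker A takes the message, then gets stuck and never renews it.
a = queue.dequeue(now=0.0)
executions.append(("worker-A", a.name))

# t=45: the timeout has expired, so worker B receives the same message.
b = queue.dequeue(now=45.0)
executions.append(("worker-B", b.name))

queue.complete(b)
queue.complete(a)  # worker A finally finishes; the message is already gone
leftover = queue.dequeue(now=100.0)
```

Both workers ran `SendEmail`, and nothing in the queue itself prevented it. That is the “at least once” part in miniature.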
Last words
If you really use serverless, you’ll see this behaviour. Maybe not in the first years, maybe only after 5–7 years, but you’ll run into it as well. Do not ignore this and do not just…write code. The benefit of activities is that the response is “cached”, or in reality saved in storage, so the orchestration won’t retry or re-run activities that completed successfully.
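The “cached” response can be pictured with a small toy model. Again, this is not how the real framework is written (real orchestrators schedule activities asynchronously); `ReplayContext`, `call_activity` as defined here, and the two activity names are invented for the sketch. It only shows the principle: results of completed activities sit in the history, and a replay of the orchestrator reads them back instead of running the activity again.

```python
from typing import Callable, Dict, List


class ReplayContext:
    """Toy orchestration context: completed activity results are kept in history."""

    def __init__(self, history: Dict[int, str]) -> None:
        self.history = history          # shared, persisted between replays
        self.executed: List[str] = []   # activities actually run in this pass
        self._next_call = 0

    def call_activity(self, name: str, fn: Callable[[], str]) -> str:
        call_id = self._next_call
        self._next_call += 1
        if call_id in self.history:
            # Already completed earlier: reuse the stored result.
            return self.history[call_id]
        result = fn()
        self.executed.append(name)
        self.history[call_id] = result
        return result


def orchestrator(ctx: ReplayContext) -> str:
    user = ctx.call_activity("LoadUser", lambda: "alice@example.com")
    return ctx.call_activity("SendEmail", lambda: f"sent to {user}")


history: Dict[int, str] = {}

first_run = ReplayContext(history)
orchestrator(first_run)

replay = ReplayContext(history)  # same history, fresh pass over the code
result = orchestrator(replay)
```

On the replay pass nothing is executed and the result comes straight from history. Keep in mind this protects you from orchestrator replays only; it does nothing about the same activity message being delivered to two workers.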
Think about a scenario in which you want to grab something from the database and send an email. If you have 2 different activities, you might get (rare) cases in which the email will be sent multiple times. You might not care, or you might want to keep the entire logic in only one activity. Or even go more hardcore and put this logic directly in the orchestrator, although orchestrator code gets replayed and has to stay deterministic, so doing I/O there brings its own problems. There is no silver bullet; it depends on your scenario, every time. Yeah, you need to think and plan before you write code, but if you read this article until this point, it probably doesn’t hurt that much :)