Split Queries in EF
Problem
Imagine you are trying to get a big aggregate (big entity), from the database. With Entity framework it’s pretty easy, you just write dbcontext.myEntity and after that start with .Include and another include, and some more includes, and a .ThenInclude, and in the end, you’ll get your big entity read from the database. This sounds great, right? The query needed it’s generated by ef, so you don’t need to worry, (again great) and it would be more or less a select with all the properties needed having a lot of let joins and inner joins; now it starts not to sound so great, since you know that this might give you headaches from the point of view of the performance. In reality, you’ll get as a result, the cartesian product, as the “myEntity” will be duplicated for every include / thenInclude you have in there. (Basically, when “one to many” relations are loaded the “one” data will get duplicated for each row of the “many data”) This is very bad from the memory perspective and even if you’re running in the cloud, believe me, you’ll get into trouble eventually.
Solution
The best way would be NOT TO DO THAT! If you have performance problems, most likely your DDD idea is wrong, and your aggregate is way to big and you should split it!
Most likely you won’t do that, because if you are reading this, you most likely have issues with the production environment and you don’t have the time needed to do that, and you need a fast and easy solution. For this, you can use the “AsSplitQuery” functionality that appeared from .NET 5.
At that time it worked only with “include” but now in .NET6 you can use with any kind of projections.
Under the hood, the EF will split your LINQ query into multiple SQL queries, try to have as few JOINS as it can, and you’ll get in the end a separate delete for each include that you have.
If you don’t want to add it to a specific query, and you want to use this “technique”, on your entire app, you can do that also by enabling this globally.
Of course, you can enable it globally and in the places in which you don’t want it, you can “disable” it locally, by using the opposite of this, which is “AsSingleQuery()” and it can be used in the same way.
So far, everything sounds great, right? You might want to use it everywhere, but be careful, because it has a few tricks
1. If you are using pagination (skip and take), be sure you are making your LINQ unique, from the ordering perspective. By not doing that, you may get incorrect data. If you are ordering only by date, and your results might have the same date, then, one of the selects generated, might bring you the wrong data.
2. If you have only “one-to-one” relations it won’t help you at all from the performance point of view. This kind of relations between entities always gets loaded only using JOINs.
3. If your database is updated concurrently you won’t get any consistency. While you get the benefit of better performance, you might get inconsistent data, if it gets changed “between the selects”, while they are being executed. This won’t happen if you’re using only “one big select”.
4. For each query, you’ll get another trip to the database, which will cause additional network traffic.
5. Win some, lose some: Don’t use it directly in production! You might get the performance you want, if you have a lot of data, because you, most likely have that chaos because of the cartesian product, but if you have only a few data, or not really a lot of “includes”, you might use even more memory as before, if you just…use it. Think about having 6 queries; you executed 4 and you have another 2 that are next. The results of those 4, are in your application memory, waiting for the other 2 queries to bring their results, and make the mapping for your “final result”.
It’s a great thing, but “great power comes with great responsibilities”, so don’t use it everywhere, and be careful while doing it!