Posts tagged Entity Framework

DbContextScope now available as a NuGet package

The DbContextScope library for Entity Framework is now available as a NuGet package. Thanks to Tieson Trowbridge for putting it together!

Managing DbContext the right way with Entity Framework 6: an in-depth guide

UPDATE: the source code for DbContextScope is now available on GitHub: DbContextScope on GitHub.

A bit of context

This isn't the first post that has been written about managing the DbContext lifetime in Entity Framework-based applications. In fact, there is no shortage of articles discussing this topic.

For many applications, the solutions presented in those articles (which generally revolve around using a DI container to inject DbContext instances with a PerWebRequest lifetime) will work just fine. They also have the merit of being very simple - at least at first sight.

For certain types of applications however, the inherent limitations of these approaches pose problems, to the point that certain features become impossible to implement, or require resorting to increasingly complex structures or increasingly ugly hacks to work around the way the DbContext instances are created and managed.

Here is for example an overview of the real-world application that prompted me to re-think the way we managed our DbContext instances:

  • The application is comprised of multiple web applications built with ASP.NET MVC and WebAPI. It also includes many background services implemented as console apps and Windows Services, including a home-grown task scheduler service and multiple services that process messages from MSMQ and RabbitMQ queues. Most of the articles I linked to above make the assumption that all services will execute within the context of a web request. This is not the case here.
  • It stores and reads data to / from multiple databases, including a main database, a secondary database, a reporting database and a logging database. Its domain model is separated into several independent groups, each with their own DbContext type. Any approach assuming a single DbContext type won't work here.
  • It relies heavily on third-party remote APIs, such as the Facebook, Twitter or LinkedIn APIs. These aren't transactional. Many user actions require us to make multiple remote API calls before we can return a result to the user. Many of the articles I linked to make the assumption that "1 web request = 1 business transaction" that either gets committed or rolled back in an atomic manner (hence the idea of using a PerWebRequest-scoped DbContext instance). This clearly doesn't apply here. Just because one remote API call failed doesn't mean that we can auto-magically "rollback" the results of any remote API call we may have made prior to the failed one (e.g. when you've used the Facebook API to post a status update on Facebook, you can't roll it back even if that operation was part of a wider user action that eventually failed as a whole). So in this application, a user action will often require us to execute multiple business transactions, which must be independently persisted. (You may argue that there might be ways to redesign the whole system to avoid finding ourselves in this sort of situation. And maybe there are. But that's how the application was originally designed, it works very well and that's what we have to work with.)
  • Many services are heavily parallelized, either by taking advantage of async I/O or (more often) by simply distributing tasks across multiple threads via the TPL's Task.Run() or Parallel.Invoke() methods. So the way we manage our DbContext instances must play well with multi-threading and parallel programming in general. Most of the common approaches suggested to manage DbContext instances don't work at all in this scenario.

In this post, I'll go in depth into the various moving parts that are involved in DbContext lifetime management. We'll look at the pros and cons of several strategies commonly used to solve this problem. Finally, we'll look in details at one strategy (among others) to manage the DbContext lifetime that addresses all the challenges presented above and that should work for most applications regardless of their complexity.

There is of course no such thing as one-size-fits-all. But by the end of this post, you should have all the tools and knowledge you need to make an informed decision for your specific application.

Like most posts on this blog, this post is on the long and detailed side. It might take a while to read and digest. For an Entity Framework-based application, the strategy you choose to use to manage the lifetime of the DbContext will be one of the most important decisions you make. It will have a major impact on the correctness, maintainability and scalability of your application. So it's well worth taking some time to choose your strategy carefully and not rush into it.

A note on terminology

In this post, I'll often refer to the term "services". What I mean by that is not remote services (REST or otherwise). Instead, what I'm referring to is what is often called Service Objects. That is: the place where your business logic is implemented - the objects responsible for executing your business rules and defining your business transaction boundaries.

Of course, depending on the design patterns that were used to create the architecture of your application (and depending on the imagination of whoever designed it - software developers are an imaginative bunch), your code base might be using different names for this. So what I call a "service" might very well be called a "workflow", an "orchestrator", an "executor", an "interactor", a "command", a "handler" or a variety of other names in your application.

Not to mention that many applications don't have a well-defined place where business logic is implemented and rely instead on implementing (and often duplicating) business logic on an ad-hoc basis where and when needed, e.g. in controllers in an MVC application.

But none of this matters for this discussion. Whenever I say "service", read: "the place that implements the business logic", be it a random controller method or a well-defined service class in a separate service layer.

Key points to consider

When coming up with or evaluating a DbContext lifetime management strategy, it's important to keep in mind the key scenarios and functionalities that it must support.

Here are a few points that I would consider to be essential for most applications.

Your services must be in control of the business transaction boundary (but not necessarily in control of the DbContext instance lifetime)

Perhaps the main source of confusion when it comes to managing DbContext instances is understanding the difference between the lifetime of a DbContext instance and the lifetime of a business transaction and how they relate.

DbContext implements the Unit of Work pattern:

Maintains a list of objects affected by a business transaction and coordinates the writing out of changes and the resolution of concurrency problems.

In practice, as you use a DbContext instance to load, update, add and delete persistent entities, the instance keeps track of those changes in memory. It doesn't however persist those changes to the underlying database until you call its SaveChanges() method.

A service method, as defined above, is responsible for defining the boundary of a business transaction.

The practical consequence of this is that:

  • A service method must use the same DbContext instance throughout the duration of a business transaction. This is so that all the changes made to your persistent model are tracked and either committed to the underlying data store or rolled back in an atomic manner.
  • Your services must be the sole components in your application responsible for calling the DbContext.SaveChanges() method at the end of a business transaction. Should other parts of the application call the SaveChanges() method (e.g. repository methods), you will end up with partially committed changes, leaving your data in an inconsistent state.
  • The SaveChanges() method must be called exactly once at the end of each business transaction. Inadvertently calling this method in the middle of a business transaction may leave the system with inconsistent, partially committed changes.
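The division of responsibilities described above can be sketched as follows (the `MyDbContext`, `Order`, `OrderRequest` and `OrderRepository` types are hypothetical illustrations, not part of any real API):

```csharp
public class OrderRepository
{
    private readonly MyDbContext _context;

    public OrderRepository(MyDbContext context)
    {
        if (context == null) throw new ArgumentNullException("context");
        _context = context;
    }

    // Repositories only register changes with the context's
    // change tracker. They never call SaveChanges() - that
    // is the service's responsibility.
    public void Add(Order order)
    {
        _context.Set<Order>().Add(order);
    }
}

public class OrderService
{
    public void PlaceOrder(OrderRequest request)
    {
        using (var context = new MyDbContext())
        {
            var repository = new OrderRepository(context);
            repository.Add(new Order(request));

            // ... further changes made through the same
            // context instance ...

            // Exactly one SaveChanges() call, as the very
            // last step of the business transaction.
            context.SaveChanges();
        }
    }
}
```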

A DbContext instance can however span across multiple (sequential) business transactions. Once a business transaction has completed and has called the DbContext.SaveChanges() method to persist all the changes it made, it's entirely possible to just re-use the same DbContext instance for the next business transaction.

I.e. the lifetime of a DbContext instance is not necessarily bound to the lifetime of a single business transaction.

Pros and cons of managing the DbContext instance lifetime independently of the business transaction lifetime

Example

A very common scenario where the lifetime of the DbContext instance can be maintained independently from the lifetime of business transactions is in the case of web applications. It's quite common to use a configuration where a DbContext instance is created at the beginning of each web request, used by all the services invoked during the execution of the web request and eventually disposed of at the end of the request.
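With a DI container such as Autofac, for example, that configuration might look something like the sketch below (the registrations are illustrative; other containers offer equivalent per-request lifetimes):

```csharp
var builder = new ContainerBuilder();

// One DbContext instance per HTTP request: created on first
// use within a request, shared by every component resolved
// during that request and disposed of at the end of it.
builder.RegisterType<MyDbContext>()
       .AsSelf()
       .InstancePerRequest();

builder.RegisterType<UserService>().As<IUserService>();
builder.RegisterType<UserRepository>().As<IUserRepository>();

var container = builder.Build();
```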

Pros

There are two main reasons why you would want to decouple the lifetime of the DbContext instance from the business transaction lifetime.

  • Possible performance gains. Each DbContext instance maintains a first-level cache of all the entities it loads from the database. Whenever you query an entity by its primary key, the DbContext will first attempt to retrieve it from its first-level cache before defaulting to querying the database. Depending on your data query patterns, re-using the same DbContext across multiple sequential business transactions may result in fewer database queries being made thanks to the DbContext first-level cache.
  • It enables lazy-loading. If your services return persistent entities (as opposed to returning view models or other sorts of DTOs) and you'd like to take advantage of lazy-loading on those entities, the lifetime of the DbContext instance from which those entities were retrieved must extend beyond the scope of the business transaction. If the service method disposed the DbContext instance it used before returning, any attempt to lazy-load properties on the returned entities would fail (whether or not using lazy-loading is a good idea is a different debate altogether which we won't get into here). In our web application example, lazy-loading would typically be used in controller action methods on entities returned by a separate service layer. In that case, the DbContext instance that was used by the service method to load these entities would need to remain alive for the duration of the web request (or at the very least until the action method has completed).

Issues with keeping the DbContext alive beyond the scope of a business transaction

While it can be fine to re-use a DbContext across multiple business transactions, its lifetime should still be kept short. Its first-level cache will eventually become stale, which will lead to concurrency issues. If your application uses optimistic concurrency, this will result in business transactions failing with a DbUpdateConcurrencyException. Using an instance-per-web-request lifetime for your DbContext in web apps will usually be fine as a web request is short-lived by nature. But using an instance-per-form lifetime in a desktop application, which you'll often find suggested, is a lot more questionable and requires careful thought before being adopted.

Note that you can't re-use the same DbContext instance across multiple business transactions if you rely on pessimistic concurrency. Correctly implementing pessimistic concurrency involves keeping a database transaction with the correct isolation level open for the whole lifetime of a DbContext instance, which would prevent committing or rolling back individual business transactions independently.

Re-using the same DbContext instance for more than one business transaction can also lead to disastrous bugs where a service method accidentally commits the changes from a previously failed business transaction.

Finally, managing your DbContext instance lifetime outside of your services tends to tie your application to a specific infrastructure, making it a lot less flexible and much more difficult to evolve and maintain in the long run.

For example, for an application that starts off as a simple web application and relies on an instance-per-web-request strategy to manage the lifetime of its DbContext instances, it's easy to fall into the trap of relying on lazy-loading in controllers or views or on passing persistent entities across service methods on the assumption that they will all use the same DbContext instance behind the scenes. When the need to introduce multi-threading or move operations to background Windows Services inevitably arises, this carefully constructed sand castle often collapses as there are no more web requests to bind DbContext instances to.

As a result, it's advisable to avoid managing the lifetime of DbContext instances separately from business transactions. Instead, each service method (i.e. each business transaction) should create its own DbContext instance and dispose it at the end of the business transaction (i.e. before returning).

This precludes using lazy-loading outside of services (which can be addressed by modeling your domain using DDD or by getting services to return DTOs instead of persistent entities) and poses a few other constraints (e.g. you shouldn't pass persistent entities into a service method as they won't be attached to the DbContext instance that the service will use). But it brings a lot of long-term benefits for the flexibility and maintenance of the application.
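Put together, a service method following this approach might look like this minimal sketch (`MyDbContext`, `User` and `UserDto` are hypothetical types):

```csharp
public class UserService : IUserService
{
    public UserDto GetUser(Guid userId)
    {
        // The service owns the DbContext: it creates it,
        // uses it for the duration of the business
        // transaction and disposes of it before returning.
        using (var context = new MyDbContext())
        {
            var user = context.Set<User>().Find(userId);
            if (user == null)
                return null;

            // Return a DTO rather than the tracked entity so
            // that no caller can attempt to lazy-load from a
            // disposed context.
            return new UserDto
            {
                Id = user.Id,
                Name = user.Name
            };
        }
    }
}
```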

Your services must be in control of the database transaction scope and isolation level

If your application works against an RDBMS that provides ACID properties for its transactions (and if you're using Entity Framework, you almost certainly are), it's essential for your services to be in control of the database transaction scope and isolation level. You can't write correct code otherwise.

As we'll see later, Entity Framework wraps all write operations within an explicit database transaction by default. Coupled with a READ COMMITTED isolation level - the default on SQL Server - this suits the needs of most business transactions. This is especially the case if you rely on optimistic concurrency to detect and avoid conflicting updates.

Most applications however will still occasionally need to use other isolation levels for specific operations.

It's very common for example to execute reporting queries where you have determined that dirty reads aren't an issue under a READ UNCOMMITTED isolation level in order to eliminate lock contention with other queries (although if your environment allows it, you'll probably want to use READ COMMITTED SNAPSHOT instead).

And some business rules might require the use of the REPEATABLE READ or even SERIALIZABLE isolation levels (especially if your application uses pessimistic concurrency control). In which case the service will need to have explicit control over the transaction scope.

The way your DbContext is managed should be independent of the architecture of the application

The architecture of a software system and the design patterns it relies on always evolve over time to adapt to new constraints, business requirements and increasing load.

You don't want the strategy you choose to manage the lifetime of your DbContext to tie you to a specific architecture and prevent you from being able to evolve it as and when needed.

The way your DbContext is managed should be independent of the application type

While most applications today start off as web applications, the strategy you choose to manage the lifetime of your DbContext shouldn't assume that your service method will be called from within the context of a web request. More generally, your service layer (if you have one) should be independent of the type of application it's used from.

It won't be long until you need to create command-line utilities for your support team to execute ad-hoc maintenance tasks or Windows Services to handle scheduled tasks and long-running background operations. When this happens, you want to be able to reference the assembly that contains your services and just use any service you need from your console or Windows Service application. You most definitely don't want to have to completely re-engineer the way your DbContext instances are managed just to be able to use your services from a different type of application.

Your DbContext management strategy should support multiple DbContext-derived types

If your application needs to connect to multiple databases (for example if it uses separate reporting, logging and / or auditing databases) or if you have split your domain model into multiple aggregate groups, you will have to manage multiple DbContext-derived types.

For those coming from an NHibernate background, this is the equivalent of having to manage multiple SessionFactory instances.

Whatever strategy you choose should be able to let services use the appropriate DbContext for their need.
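For illustration, a setup with a main database and a separate logging database might define something along these lines (hypothetical types and connection string names):

```csharp
// Main domain model, stored in the main database.
public class ShopDbContext : DbContext
{
    public ShopDbContext() : base("name=ShopDb") { }

    public DbSet<User> Users { get; set; }
    public DbSet<Order> Orders { get; set; }
}

// Logging model, stored in a separate database.
public class LoggingDbContext : DbContext
{
    public LoggingDbContext() : base("name=LoggingDb") { }

    public DbSet<AuditEntry> AuditEntries { get; set; }
}
```

A service then simply creates whichever context type (or types) its business transaction requires.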

Your DbContext management strategy should work with EF6's async workflow

In .NET 4.5, ADO.NET introduced (at very long last) support for async database queries. Async support was then included in Entity Framework 6, allowing you to use a fully async workflow for all read and write queries made through EF.

Needless to say, whatever system you use to manage your DbContext instances must play well with Entity Framework's async features.
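As a quick sketch of what this looks like in practice with EF6's async API (hypothetical `MyDbContext` and `User` types):

```csharp
public class UserService : IUserService
{
    public async Task MarkUserAsPremiumAsync(Guid userId)
    {
        using (var context = new MyDbContext())
        {
            // EF6's async query methods live in the
            // System.Data.Entity namespace.
            var user = await context.Set<User>()
                .SingleOrDefaultAsync(u => u.Id == userId);

            if (user == null)
                throw new InvalidOperationException("User not found.");

            user.IsPremiumUser = true;

            // Async counterpart of SaveChanges().
            await context.SaveChangesAsync();
        }
    }
}
```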

DbContext's default behaviour

In general, DbContext's default behaviour can be described as: "does the right thing by default".

There are several key behaviours of Entity Framework you should always keep in mind however. This list documents EF's behaviour when working against SQL Server. There might be differences when using other data stores.

DbContext is not thread-safe

You must never access your DbContext-derived instance from multiple threads simultaneously. This might result in multiple queries being sent concurrently over the same database connection. It will also corrupt the first-level cache that DbContext maintains to offer its Identity Map, change tracking and Unit of Work functionalities.

In a multi-threaded application, you must create and use a separate instance of your DbContext-derived class in each thread.

So if DbContext isn't thread-safe, how can it support the async query features introduced with EF6? Simply by preventing more than one async operation from being executed at any given time (as documented in the Entity Framework specifications for its async pattern support). If you attempt to execute multiple actions on the same DbContext instance in parallel, for example by kicking off multiple SELECT queries in parallel via the DbSet<T>.ToListAsync() method, you will get a NotSupportedException with the following message:

A second operation started on this context before a previous asynchronous operation completed. Use 'await' to ensure that any asynchronous operations have completed before calling another method on this context. Any instance members are not guaranteed to be thread safe.

Entity Framework's async features are there to support an asynchronous programming model, not to enable parallelism.
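The sketch below contrasts the two (hypothetical `MyDbContext` with `Users` and `Orders` sets):

```csharp
using (var context = new MyDbContext())
{
    // OK: asynchronous but sequential. Each operation
    // completes before the next one starts.
    var users = await context.Users.ToListAsync();
    var orders = await context.Orders.ToListAsync();

    // NOT OK: both queries would run on the same context
    // at the same time - EF6 throws a NotSupportedException.
    // var usersTask = context.Users.ToListAsync();
    // var ordersTask = context.Orders.ToListAsync();
    // await Task.WhenAll(usersTask, ordersTask);
}

// For actual parallelism, give each unit of work its own
// DbContext instance.
Parallel.ForEach(userIds, userId =>
{
    using (var context = new MyDbContext())
    {
        // ... process this user with a dedicated context ...
    }
});
```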

Changes are only persisted when SaveChanges() is called

Any changes made to your entities, be it updates, inserts or deletes, are only persisted to the database when the DbContext.SaveChanges() method is called. If a DbContext instance is disposed before its SaveChanges() method was called, none of the inserts, updates or deletes done through this DbContext will be persisted to the underlying data store.

The canonical manner to implement a business transaction with Entity Framework is therefore:

using (var context = new MyDbContext(ConnectionString))  
{
    /* 
     * Business logic here. Add, update, delete data
     * through the 'context'.
     * 
     * Throw in case of any error to roll back all 
     * changes.
     * 
     * Do not call SaveChanges() until the business
     * transaction is complete - i.e. no partial or 
     * intermediate saves. SaveChanges() must be 
     * called exactly once per business transaction.
     *
     * If you find yourself needing to call SaveChanges() 
     * multiple times within a business transaction, it means
     * that you are in fact implementing multiple business 
     * transactions within a single service method. 
     * This is the perfect recipe for disaster. Clients of
     * your service class will naturally assume that your 
     * service method will either commit or roll-back all
     * changes in an atomic manner when it might in fact
     * end up doing a partial roll-back, leaving the system
     * in an inconsistent state.
     *
     * In this case, refactor your service method into 
     * multiple service methods that each implement once
     * and exactly one business transaction. 
     */
     [...]

    // Complete the business transaction
    // and persist all changes.
    context.SaveChanges(); 

    // Changes cannot be rolled back after this point. 
    // context.SaveChanges() should be the last statement
    // of any business transaction.
}

A side note for NHibernate veterans

If you're coming from an NHibernate background, the way Entity Framework persists changes to the database is one of the major differences between EF and NHibernate.

In NHibernate, the Session operates by default in AutoFlush mode. In this mode, the Session will automatically persist all changes made to entities to the database before executing any 'select' query, ensuring consistency between the persisted entities and their in-memory state within the context of a Session. Entity Framework's default behaviour is the equivalent of setting Session.FlushMode to Never in NHibernate.

This EF behaviour can result in subtle bugs, as it is possible to end up in a situation where queries unexpectedly return stale or incorrect data. This wouldn't be possible with NHibernate's default behaviour. On the other hand, it dramatically simplifies the issue of database transaction lifetime management.

One of the trickiest issues in NHibernate is correctly managing the database transaction lifetime. Since NHibernate's Session can persist outstanding changes to the database automatically at any time throughout its lifetime, and may do so multiple times within a single business transaction, there is no single, well-defined point or method where to start the database transaction to ensure that all changes are either committed or rolled back in an atomic manner.

The only reliable method to correctly manage the database transaction lifetime with NHibernate is to wrap all your service methods in an explicit database transaction. This is what you'll see done in pretty much every NHibernate-based application.

A side-effect of this approach is that it requires keeping a database connection and transaction open for often longer than strictly necessary. It therefore increases database lock contention and the probability of database deadlocks occurring. It's also very easy for a developer to inadvertently execute a long-running computation or a remote service call without realizing or even knowing that they're within the context of an open database transaction.

With the EF approach, only the SaveChanges() method must be wrapped in an explicit database transaction (unless you need a REPEATABLE READ or SERIALIZABLE isolation level of course), ensuring that the database connection and transaction are kept as short-lived as possible.

Reads are executed within an AutoCommit transaction

DbContext doesn't start explicit database transactions for read queries. It instead relies on SQL Server's Autocommit Transactions (or Implicit Transactions if you've enabled them but that would be a relatively unusual setup). Autocommit (or Implicit) transactions will use whatever default transaction isolation level the database engine has been configured to use (READ COMMITTED by default for SQL Server).

If you've been around the block for a while, and particularly if you've used NHibernate before, you may have heard that AutoCommit (or Implicit) transactions are bad. And indeed, relying on Autocommit transactions for writes can have a disastrous impact on performance.

The story is very different for reads however. As you can see for yourself by running the SQL script below, neither Autocommit nor Implicit transactions have any significant performance impact for SELECT statements.

/* 
 * Execute 100,000 SELECT queries under autocommit, 
 * implicit and explicit database transactions. 
 * 
 * This script assumes that the database it is 
 * running against contains a Users table with an 'Id' 
 * column of data type INT.
 * 
 * If running from SQL Server Management Studio,
 * right-click in the query window, go to 
 * Query Options -> Results and tick "Discard results
 * after execution". Otherwise, what you'll be measuring
 * will be the Result Grid redrawing performance and not
 * the query execution time.
 */


---------------------------------------------------
-- Autocommit transaction
-- 6 seconds
DECLARE @i INT  
SET @i = 0

WHILE @i < 100000  
    BEGIN 
        SELECT  Id
        FROM    dbo.Users
        WHERE   Id = @i
        SET @i = @i + 1
    END

---------------------------------------------------
-- Implicit transaction
-- 6 seconds
SET IMPLICIT_TRANSACTIONS ON  
DECLARE @i INT  
SET @i = 0  
WHILE @i < 100000  
    BEGIN 
        SELECT  Id
        FROM    dbo.Users
        WHERE   Id = @i
        SET @i = @i + 1
    END
COMMIT;  
SET IMPLICIT_TRANSACTIONS OFF


----------------------------------------------------
-- Explicit transaction
-- 6 seconds
DECLARE @i INT  
SET @i = 0  
BEGIN TRAN  
WHILE @i < 100000  
    BEGIN
        SELECT  Id
        FROM    dbo.Users
        WHERE   Id = @i
        SET @i = @i + 1
    END
COMMIT TRAN  

Obviously, if you need to use an isolation level higher than the default READ COMMITTED, all reads will need to be part of an explicit database transaction. In that case, you will have to start the transaction yourself - EF will not do this for you. But this would typically only be done on an ad-hoc basis for specific business transactions. Entity Framework's default behaviour should suit the vast majority of business transactions.

Writes are executed within an explicit transaction

Entity Framework automatically wraps all the queries made by the DbContext.SaveChanges() method in a single explicit database transaction, therefore ensuring that all the changes applied to the context are either committed or rolled-back in full.

It will use whatever default transaction isolation level the database engine has been configured to use (READ COMMITTED by default for SQL Server).

A side note for NHibernate veterans

This is another major difference between EF and NHibernate. With NHibernate, database transactions are entirely in the hands of developers. NHibernate's Session will never start an explicit database transaction automatically.

You can override EF's default behaviour and control the database transaction scope and isolation level

With Entity Framework 6, taking explicit control of the database transaction scope and isolation level is as simple as it should be:

using (var context = new MyDbContext(ConnectionString))  
{
    using (var transaction = context.Database.BeginTransaction(IsolationLevel.RepeatableRead))
    {
        [...]
        context.SaveChanges();
        transaction.Commit();
    }
}

An obvious side-effect of manually controlling the database transaction scope is that you are now forcing the database connection and transaction to remain open for the duration of the transaction scope.

You should be careful to keep this scope as short-lived as possible. Keeping a database transaction running for too long can have a significant impact on your application's performance and scalability. In particular, it's generally a good idea to refrain from calling other service methods within an explicit transaction scope - they might be executing long-running operations unaware that they have been invoked within an open database transaction scope.

There's no built-in way to override the default isolation level used for AutoCommit and automatic explicit transactions

As mentioned earlier, the AutoCommit transactions EF relies on for read queries and the explicit transaction it automatically starts when SaveChanges() is called use whatever default isolation level the database engine has been configured with.

There's unfortunately no built-in way to override this isolation level. If you'd like to use another isolation level, you must start and manage the database transaction yourself.

The database connection opened by DbContext will enroll in an ambient TransactionScope

Alternatively, you can also use the TransactionScope class to control the transaction scope and isolation level. The database connection that Entity Framework opens will enroll in the ambient TransactionScope.
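A minimal sketch of what that enrollment looks like (note that the TransactionOptions and IsolationLevel types here come from the System.Transactions namespace, not System.Data):

```csharp
var options = new TransactionOptions
{
    IsolationLevel = IsolationLevel.RepeatableRead
};

using (var scope = new TransactionScope(
    TransactionScopeOption.Required, options))
{
    using (var context = new MyDbContext())
    {
        // The database connection EF opens here
        // automatically enlists in the ambient transaction.
        // ...
        context.SaveChanges();
    }

    // Commit. If Complete() isn't called, the transaction
    // is rolled back when the scope is disposed.
    scope.Complete();
}
```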

Prior to EF6, using TransactionScope was the only practical way to control the database transaction scope and isolation level.

In practice, and unless you actually need a distributed transaction, you should avoid using TransactionScope. TransactionScope, and distributed transactions in general, are not necessary for most applications and tend to introduce more problems than they solve. EF's documentation has more details on working with TransactionScope with Entity Framework if you really need distributed transactions.

DbContext instances should be disposed of (but you'll probably be OK if they're not)

DbContext implements IDisposable. Its instances should therefore be disposed of as soon as they're not needed anymore.

In practice however, and unless you choose to explicitly manage the database connection or transaction that the DbContext uses, not calling DbContext.Dispose() won't cause any issues, as Diego Vega, an EF team member, explains.

This is good news as a lot of the code you'll find in the wild fails to dispose of DbContext instances properly. This is particularly the case for code that attempts to manage DbContext instance lifetimes via a DI container, which can be a lot trickier than it sounds.

A DI container like StructureMap for example doesn't support decommissioning the components it created. As a result, if you rely on StructureMap to create your DbContext instances, they will never be disposed of, regardless of what lifecycle you choose for them. The only correct way to manage disposable components with a DI container like this is to significantly complicate your DI configuration and use nested dependency injection containers as Jeremy Miller demonstrates.

Ambient DbContext vs Explicit DbContext vs Injected DbContext

A key decision you'll have to make at the start of any Entity Framework-based project is how your code will handle passing the DbContext instances down to the method / layer that will make the actual database queries.

As we've seen above, the responsibility of creating and disposing the DbContext lies with the top-level service methods. The data access code, i.e. the code that actually uses the DbContext instance, will however often live in a separate part of the code base - be it in a private method deep down the service implementation, in a query object or in a separate repository layer.

The DbContext instance that the top-level service method creates must therefore somehow find its way down to these methods.

There are three schools of thought when it comes to making the DbContext instance available to the data access code: ambient, explicit or injected. Each approach has its pros and cons, which we'll examine now.

Explicit DbContext

What it looks like

With the explicit DbContext approach, the top-level service method creates a DbContext instance and simply passes it down the stack as a method parameter until it finally reaches the method that implements the data access part. In a traditional 3-tier architecture with both a service and a repository layer, this would look like this:

public class UserService : IUserService  
{
    private readonly IUserRepository _userRepository;

    public UserService(IUserRepository userRepository)
    {
        if (userRepository == null) throw new ArgumentNullException("userRepository");
        _userRepository = userRepository;
    }

    public void MarkUserAsPremium(Guid userId)
    {
        using (var context = new MyDbContext())
        {
            var user = _userRepository.Get(context, userId);
            user.IsPremiumUser = true;
            context.SaveChanges();
        }
    }
}

public class UserRepository : IUserRepository  
{
    public User Get(MyDbContext context, Guid userId)
    {
        return context.Set<User>().Find(userId);
    }
}

(In this intentionally contrived example, the repository layer is of course completely pointless. In a real-world application, you would expect the repository layer to be a lot richer. In addition, you could of course abstract your DbContext behind an "IDbContext" of sorts and create it via an abstract factory if you really didn't want a direct dependency on Entity Framework in your services. The principle would remain the same.)

The Good

This approach is by far and away the simplest approach. It results in code that's very easy to understand and maintain, even by developers new to the code base.

There's no magic anywhere. The DbContext instance doesn't materialize out of thin air. There's a clear and obvious place where the context is created. And it's really easy to climb up the stack and find it if you're wondering where a particular DbContext instance is coming from.

The Bad

The main drawback of this approach is that it requires you to pollute all your repository methods (if you have a repository layer) as well as most of your service methods with a mandatory DbContext parameter (or some sort of IDbContext abstraction if you don't want to be tied to a concrete implementation - but the point still stands). You could see this as being a sort of Method Injection pattern.

Requiring your repository methods to take an explicit DbContext parameter isn't too much of an issue. In fact, it can even be seen as a good thing, as it removes any potential ambiguity as to which context they'll run their queries against.

Things are quite different in your service layer however. Chances are that most of your service methods won't use the DbContext at all, particularly if you've isolated your data access code away in query objects or in a repository layer. As a result, these methods will only take a DbContext parameter so that they can pass it down the line until it eventually reaches whatever method actually uses it.

It can get quite ugly. Particularly if your application uses multiple DbContext types, resulting in service methods potentially requiring two or more mandatory DbContext parameters. It also muddies your method contracts, as your service methods are now forced to ask for a parameter that they neither need nor use but require purely to satisfy the dependency of a downstream method.

Jon Skeet wrote an interesting article on the topic of explicitness vs ambient but couldn't come up with a good solution either.

Nevertheless, the simplicity and foolproofness of this approach is hard to beat.

Ambient DbContext

What it looks like

NHibernate users will be very familiar with this approach as the ambient context pattern is the predominant approach used in the NHibernate world to manage NH's Session (NHibernate's equivalent to EF's DbContext). NHibernate even comes with built-in support for this pattern, which it calls contextual sessions.

In .NET itself, this pattern is used quite extensively. You've probably already used HttpContext.Current or the TransactionScope class, both of which rely on the ambient context pattern.

With this approach, the top-level service method not only creates the DbContext to use for the current business transaction but it also registers it as the ambient DbContext. The data access code can then just retrieve the ambient DbContext whenever it needs it. No need to pass the DbContext instance around anymore.

Anders Abel has written a simple implementation of an ambient DbContext that relies on a ThreadStatic variable to store the ambient DbContext. Have a look - there's less to it than it sounds.
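The core of such an implementation really is tiny. Here is a minimal sketch of the idea (the type and member names are illustrative, not Anders Abel's actual code):

```csharp
public static class AmbientDbContext
{
    // One slot per thread. This is exactly what breaks down with
    // async execution flows, as discussed further below.
    [ThreadStatic]
    private static MyDbContext _current;

    // The top-level service method registers the DbContext it created.
    public static void Register(MyDbContext context) { _current = context; }
    public static void Clear() { _current = null; }

    // The data access code retrieves the ambient DbContext from here.
    public static MyDbContext Current
    {
        get
        {
            if (_current == null)
                throw new InvalidOperationException(
                    "No ambient DbContext. Did the top-level service method register one?");
            return _current;
        }
    }
}
```

The top-level service method would register its DbContext on creation and clear it in a finally block; repositories would simply read `AmbientDbContext.Current`.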

The Good

The advantages of this approach are obvious. Your service and repository methods are now free of DbContext parameters, making your interfaces cleaner and your method contracts clearer as they can now only request the parameters that they actually need to do their job. No need to pass DbContext instances all over the place anymore.

As with the explicit approach, the creation and disposal of the DbContext instance is in a clear, well-defined and logical place.

The Bad

This approach does however introduce a certain amount of magic which can certainly make the code more difficult to understand and maintain. When looking at the data access code, it's not necessarily easy to figure out where the ambient DbContext is coming from. You just have to hope that someone somehow registered it before calling the data access code.

If your application uses multiple DbContext classes, e.g. if it connects to multiple databases or if you have split your domain model into separate model groups, it can be difficult for the top-level service method to know which DbContext object(s) it must create and register. With the explicit approach, the data access methods must be provided with whatever DbContext object they need as a method parameter, so no ambiguity is possible. But with an ambient context approach, the top-level service method must somehow know what DbContext type the downstream data access code will require. There are ways to solve this issue in a fairly clean manner however, as we'll see later.

Finally, the ambient DbContext example I linked to above works fine in a single-threaded model. But if you intend to use Entity Framework's async query feature, this won't fly. After an async operation, you will most likely find yourself in another thread than the one where the DbContext was created. In many cases (although not in all cases - this is where async gets tricky), it means that your ambient DbContext will be gone. This is fixable as well but it will require some advanced understanding of how multi-threading, the TPL and async works behind the scenes in .NET. We'll have a look at this later in this post.

Injected DbContext

What it looks like

Last but not least, the injected DbContext approach is the most often mentioned strategy in articles and blog posts addressing the issue of managing the DbContext lifetime.

With this approach, you let your DI container manage the lifetime of your DbContext and inject it into whatever component needs it (your repository objects for example).

This is what it looks like:

public class UserService : IUserService  
{
    private readonly IUserRepository _userRepository;

    public UserService(IUserRepository userRepository)
    {
        if (userRepository == null) throw new ArgumentNullException("userRepository");
        _userRepository = userRepository;
    }

    public void MarkUserAsPremium(Guid userId)
    {
        var user = _userRepository.Get(userId);
        user.IsPremiumUser = true;
    }
}

public class UserRepository : IUserRepository  
{
    private readonly MyDbContext _context;

    public UserRepository(MyDbContext context)
    {
        if (context == null) throw new ArgumentNullException("context");
        _context = context;
    }

    public User Get(Guid userId)
    {
        return _context.Set<User>().Find(userId);
    }
}

You then need to configure your DI container to create an instance of the DbContext with an appropriate lifetime on object graph creation. A common advice you'll find is to use a PerWebRequest lifetime for web apps and PerForm lifetime for desktop apps.
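As an illustration, the PerWebRequest-style registration might look like this with Autofac (chosen here purely as an example; `InstancePerRequest` is Autofac's name for this lifetime and requires its ASP.NET integration - other containers use different names for the same idea):

```csharp
// Sketch: per-web-request lifetime for the DbContext and everything
// that depends on it. Both repositories injected into a service during
// the same web request will receive the same MyDbContext instance.
var builder = new ContainerBuilder();

builder.RegisterType<MyDbContext>()
       .AsSelf()
       .InstancePerRequest();

builder.RegisterType<UserRepository>()
       .As<IUserRepository>()
       .InstancePerRequest();

builder.RegisterType<UserService>()
       .As<IUserService>()
       .InstancePerRequest();

var container = builder.Build();
```

Note that none of this is visible from the service or repository code, which is precisely the "magic" problem discussed below.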

The Good

The advantage here is similar to that of the ambient approach: the code isn't littered with DbContext instances being passed all over the place. This approach goes one step further still: there is no DbContext to be seen anywhere in the service code. The service is completely oblivious of Entity Framework. Which might sound good at first sight but quickly leads to a lot of problems.

The Bad

Despite its popularity, this approach has significant drawbacks and limitations. It's important to understand them before adopting this approach.

A lot of magic

The first issue is that this approach relies very heavily on magic. And when it comes to managing the correctness and consistency of your data - your most precious asset - magic isn't a word you want to hear too often.

Where do these DbContext instances come from? How and where is the business transaction boundary defined? If a service depends on two different repositories, will they both have access to the same DbContext instance or will they each have their own instance?

If you're a back-end developer working on a EF-based project, you must know the answers to these questions if you want to be able to write correct code.

The answers here aren't obvious and will require you to pore through your DI container configuration code to find out. And as we've seen earlier, getting this configuration right isn't as trivial as it may seem at first sight and may end up being fairly complex and / or subtle.

Unclear business transaction boundaries

Perhaps the most glaring issue in the code sample above is: who is responsible for committing changes to the data store? I.e. who is calling the DbContext.SaveChanges() method? It's unclear.

You could inject the DbContext into your service for the sole purpose of calling its SaveChanges() method. That would be rather baffling and very error-prone code. Why would the service method call SaveChanges() on a context object that it neither created nor used? What changes would be saved?

Alternatively, you could define a SaveChanges() method on all your repositories, which would just delegate to the underlying DbContext. The service method would then just call SaveChanges() on the repository itself. This would be very misleading code, as it would imply that each repository implements its own unit-of-work and can persist its changes independently of the other repositories. Which would of course be incorrect, as they would in fact all use the same DbContext instance behind the scenes.

Another approach sometimes seen in the wild is to let the DI container call SaveChanges() before decommissioning the DbContext instance. A disastrous approach that would merit a blog post of its own to examine.

In short: the DI container is an infrastructure-level component. It has no knowledge of the business logic that the components it manages implement. The DbContext.SaveChanges() method, on the other hand, defines a business transaction boundary - i.e. it's a business logic concern (and a critical one at that). Mixing those two unrelated concerns together will quickly cause a lot of pain.

All that being said, if you subscribe to the Repository is Dead movement, the issue of defining who is calling DbContext.SaveChanges() shouldn't arise as your services will use the DbContext instance directly. They will therefore be the natural place for SaveChanges() to be called.

There is however a number of other issues you will run into with an injected DbContext regardless of the architectural style of your application.

Forces your services to become stateful

A notable one is that DbContext isn't a service. It's a resource, and a disposable one to boot. By injecting it into whatever layer implements your data access, you're making that layer, and by extension all the layers above (which is pretty much the entire application), stateful.

It's not the end of the world but it certainly complicates DI container configuration. Having stateless services provides tremendous flexibility and makes the configuration of their lifetime a non-issue (any lifetime would do and singleton is often your best bet). As soon as you introduce stateful services, careful consideration has to be given to your service lifetimes.

It often starts off easy (PerWebRequest or Transient lifetime for everything which suits a simple web app well) and then descends into more complexity as console apps, Windows Services and others inevitably make their appearance.

Prevents multi-threading

Another issue (related to the previous one) that will inevitably bite you quite hard is that an injected DbContext prevents you from being able to introduce multi-threading or any sort of parallel execution flows in your services.

Remember that DbContext (just like Session in NHibernate) isn't thread-safe. If you need to execute multiple tasks in parallel in a service, you must make sure that each task works against its own DbContext instance or the whole thing will blow up at runtime. This is impossible to do with the injected DbContext approach since the service isn't in control of the DbContext instance creation and doesn't have any way to create new ones.

How can you fix this? Not easily.

Your first instinct is probably to change your services to depend on a DbContext factory instead of depending directly on a DbContext. That would allow them to create their own DbContext instances when needed. But that would effectively defeat the whole point of the injected DbContext approach. If services create their own DbContext instances via a factory, these instances can't be injected anymore. Which means that services will have to explicitly pass those DbContext instances down the layers to whatever components need them (e.g. the repositories). So you're effectively back to the explicit DbContext approach discussed earlier. I can think of a few ways in which this could be solved but all of them feel more like hacks than clean and elegant solutions.
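To make the problem concrete, here is a sketch of the factory fallback (the `IDbContextFactory` abstraction and its `Create()` method are hypothetical names, not part of EF6):

```csharp
// Sketch: a service method that parallelizes work by creating one
// DbContext per task via a factory. This works, but notice how these
// contexts can no longer reach the injected repositories - the service
// has to use the context directly or pass it down explicitly,
// defeating the injected DbContext approach.
public void MarkUsersAsPremiumInParallel(IEnumerable<Guid> userIds)
{
    Parallel.ForEach(userIds, userId =>
    {
        // Each task gets its own DbContext. This is mandatory:
        // DbContext is not thread-safe.
        using (var context = _dbContextFactory.Create())
        {
            var user = context.Set<User>().Find(userId);
            user.IsPremiumUser = true;
            context.SaveChanges();
        }
    });
}
```

Each task now also commits independently, so you've lost the single atomic business transaction along the way.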

Another way to approach the issue would be to add a few more layers of complexity, introduce a queuing middleware like RabbitMQ and let it distribute the workload for you. Which may or may not work depending on why you need to introduce parallelism. But in any case, you may neither need nor want the additional overhead and complexity.

With an injected DbContext, you're simply better off limiting yourself to single-threaded code or at least to a single logical flow of execution. Which is perfectly fine for many applications but it will become a major limitation in certain cases.

DbContextScope: a simple, correct and flexible way to manage DbContext instances

Time to look at a better way to manage those DbContext instances.

The approach presented below relies on DbContextScope, a custom component that implements the ambient DbContext approach presented earlier. The full source code for DbContextScope and the classes it depends on is on GitHub.

If you're familiar with the TransactionScope class, then you already know how to use a DbContextScope. They're very similar in essence - the only difference is that DbContextScope creates and manages DbContext instances instead of database transactions. But just like TransactionScope, DbContextScope is ambient, can be nested, can have its nesting behaviour disabled and works fine with async execution flows.

This is the DbContextScope interface:

public interface IDbContextScope : IDisposable  
{
    void SaveChanges();
    Task SaveChangesAsync();

    void RefreshEntitiesInParentScope(IEnumerable entities);
    Task RefreshEntitiesInParentScopeAsync(IEnumerable entities);

    IDbContextCollection DbContexts { get; }
}

The purpose of a DbContextScope is to create and manage the DbContext instances used within a code block. A DbContextScope therefore effectively defines the boundary of a business transaction. I'll explain later why I didn't name it "UnitOfWork" or "UnitOfWorkScope", which would have been a more commonly used terminology for this.

You can instantiate a DbContextScope directly. Or you can take a dependency on IDbContextScopeFactory, which provides convenience methods to create a DbContextScope with the most common configurations:

public interface IDbContextScopeFactory  
{
    IDbContextScope Create(DbContextScopeOption joiningOption = DbContextScopeOption.JoinExisting);
    IDbContextReadOnlyScope CreateReadOnly(DbContextScopeOption joiningOption = DbContextScopeOption.JoinExisting);

    IDbContextScope CreateWithTransaction(IsolationLevel isolationLevel);
    IDbContextReadOnlyScope CreateReadOnlyWithTransaction(IsolationLevel isolationLevel);

    IDisposable SuppressAmbientContext();
}

Typical usage

With DbContextScope, your typical service method would look like this:

public void MarkUserAsPremium(Guid userId)  
{
    using (var dbContextScope = _dbContextScopeFactory.Create())
    {
        var user = _userRepository.Get(userId);
        user.IsPremiumUser = true;
        dbContextScope.SaveChanges();
    }
}

Within a DbContextScope, you can access the DbContext instances that the scope manages in two ways. You can get them via the DbContextScope.DbContexts property like this:

public void SomeServiceMethod(Guid userId)  
{
    using (var dbContextScope = _dbContextScopeFactory.Create())
    {
        var user = dbContextScope.DbContexts.Get<MyDbContext>().Set<User>().Find(userId);
        [...]
        dbContextScope.SaveChanges();
    }
}

But that's of course only available in the method that created the DbContextScope. If you need to access the ambient DbContext instances anywhere else (e.g. in a repository class), you can just take a dependency on IAmbientDbContextLocator, which you would use like this:

public class UserRepository : IUserRepository  
{
    private readonly IAmbientDbContextLocator _contextLocator;

    public UserRepository(IAmbientDbContextLocator contextLocator)
    {
        if (contextLocator == null) throw new ArgumentNullException("contextLocator");
        _contextLocator = contextLocator;
    }

    public User Get(Guid userId)
    {
        return _contextLocator.Get<MyDbContext>().Set<User>().Find(userId);
    }
}

Those DbContext instances are created lazily and the DbContextScope keeps track of them to ensure that only one instance of any given DbContext type is ever created within its scope.

You'll note that the service method doesn't need to know which type of DbContext will be required during the course of the business transaction. It only needs to create a DbContextScope and any component that needs to access the database within that scope will request the type of DbContext they need.
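Internally, that "at most one instance per DbContext type" guarantee boils down to little more than a dictionary keyed by type. A simplified sketch of the idea (illustrative, not the actual DbContextScope source):

```csharp
// Sketch: lazily creates the requested DbContext type on first access
// and returns that same instance on every subsequent request within
// the scope. Disposing the collection disposes all created contexts.
public class DbContextCollection : IDisposable
{
    private readonly Dictionary<Type, DbContext> _contexts
        = new Dictionary<Type, DbContext>();

    public TDbContext Get<TDbContext>() where TDbContext : DbContext, new()
    {
        DbContext context;
        if (!_contexts.TryGetValue(typeof(TDbContext), out context))
        {
            context = new TDbContext();
            _contexts.Add(typeof(TDbContext), context);
        }
        return (TDbContext)context;
    }

    public void Dispose()
    {
        foreach (var context in _contexts.Values)
            context.Dispose();
        _contexts.Clear();
    }
}
```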

Nesting scopes

A DbContextScope can of course be nested. Let's say that you already have a service method that can mark a user as a premium user like this:

public void MarkUserAsPremium(Guid userId)  
{
    using (var dbContextScope = _dbContextScopeFactory.Create())
    {
        var user = _userRepository.Get(userId);
        user.IsPremiumUser = true;
        dbContextScope.SaveChanges();
    }
}

You're implementing a new feature that requires being able to mark a group of users as premium within a single business transaction. You can easily do it like this:

public void MarkGroupOfUsersAsPremium(IEnumerable<Guid> userIds)  
{
    using (var dbContextScope = _dbContextScopeFactory.Create())
    {
        foreach (var userId in userIds)
        {
            // The child scope created by MarkUserAsPremium() will
            // join our scope. So it will re-use our DbContext instance(s)
            // and the call to SaveChanges() made in the child scope will
            // have no effect.
            MarkUserAsPremium(userId);
        }

        // Changes will only be saved here, in the top-level scope,
        // ensuring that all the changes are either committed or
        // rolled-back atomically.
        dbContextScope.SaveChanges();
    }
}

(this would of course be a very inefficient way to implement this particular feature but it demonstrates the point)

This makes creating a service method that combines the logic of multiple other service methods trivial.

Read-only scopes

If a service method is read-only, having to call SaveChanges() on its DbContextScope before returning can be a pain. But not calling it isn't an option either as:

  1. It will make code review and maintenance difficult (did you intend not to call SaveChanges() or did you forget to call it?)
  2. If you requested an explicit database transaction to be started (we'll see later how to do it), not calling SaveChanges() will result in the transaction being rolled back. Database monitoring systems will usually interpret transaction rollbacks as an indication of an application error. Having spurious rollbacks is not a good idea.

The DbContextReadOnlyScope class addresses this issue. This is its interface:

public interface IDbContextReadOnlyScope : IDisposable  
{
    IDbContextCollection DbContexts { get; }
}

And this is how you use it:

public int NumberOfPremiumUsers()  
{
    using (_dbContextScopeFactory.CreateReadOnly())
    {
        return _userRepository.GetNumberOfPremiumUsers();
    }
}

Async support

DbContextScope works with async execution flows as you would expect:

public async Task RandomServiceMethodAsync(Guid userId)  
{
    using (var dbContextScope = _dbContextScopeFactory.Create())
    {
        var user = await _userRepository.GetAsync(userId);
        var orders = await _orderRepository.GetOrdersForUserAsync(userId);

        [...]

        await dbContextScope.SaveChangesAsync();
    }
}

In the example above, the OrderRepository.GetOrdersForUserAsync() method will be able to see and access the ambient DbContext instance despite the fact that it's being called on a different thread than the one where the DbContextScope was initially created.

This is made possible by the fact that DbContextScope stores itself in the CallContext. The CallContext automatically flows through async points. If you're curious about how it all works behind the scenes, Stephen Toub has written an excellent blog post about it. But if all you want to do is use DbContextScope, you just have to know that: it just works.
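To give an idea of the mechanism, storing and retrieving an object in the logical CallContext looks like this (a simplified sketch; the key name is illustrative, and the real implementation also has to handle nesting and avoid putting DbContext references directly in the CallContext):

```csharp
using System.Runtime.Remoting.Messaging;

// Sketch: ambient storage via the logical CallContext. Unlike a
// [ThreadStatic] field, data set with LogicalSetData flows across
// await points, so the ambient scope survives thread hops.
internal static class AmbientScopeStore
{
    private const string Key = "AmbientDbContextScope"; // illustrative

    public static void Set(object scope)
    {
        CallContext.LogicalSetData(Key, scope);
    }

    public static object Get()
    {
        return CallContext.LogicalGetData(Key);
    }
}
```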

WARNING: There is one thing that you must always keep in mind when using any async flow with DbContextScope. Just like TransactionScope, DbContextScope only supports being used within a single logical flow of execution.

I.e. if you attempt to start multiple parallel tasks within the context of a DbContextScope (e.g. by creating multiple threads or multiple TPL Tasks), you will get into big trouble. This is because the ambient DbContextScope will flow through all the threads your parallel tasks are using. If the code in these threads needs to use the database, they will all retrieve the same ambient DbContext, resulting in the same DbContext instance being used from multiple threads simultaneously.

In general, parallelizing database access within a single business transaction has little to no benefits and only adds significant complexity. Any parallel operation performed within the context of a business transaction should not access the database.

However, if you really need to start a parallel task within a DbContextScope (e.g. to perform some out-of-band background processing independently from the outcome of the business transaction), then you must suppress the ambient context before starting the parallel task. Which you can easily do like this:

public void RandomServiceMethod()  
{
    using (var dbContextScope = _dbContextScopeFactory.Create())
    {
        // Do some work that uses the ambient context
        [...]

        using (_dbContextScopeFactory.SuppressAmbientContext())
        {
            // Kick off parallel tasks that shouldn't be using the
            // ambient context here. E.g. create new threads,
            // enqueue work items on the ThreadPool or create 
            // TPL Tasks. 
            [...]
        }

        // The ambient context is available again here.
        // Can keep doing more work as usual.
        [...]

        dbContextScope.SaveChanges();
    }
}

Creating a non-nested DbContextScope

This is an advanced feature that I would expect most applications to never need. Tread carefully when using this as it can create tricky issues and quickly lead to a maintenance nightmare.

Sometimes, a service method may need to persist its changes to the underlying database regardless of the outcome of the overall business transaction it may be part of. This would be the case if:

  • It needs to record cross-cutting concern information that shouldn't be rolled-back even if the business transaction fails. A typical example would be logging or auditing records.
  • It needs to record the result of an operation that cannot be rolled back. A typical example would be service methods that interact with non-transactional remote services or APIs. E.g. if your service method uses the Facebook API to post a new status update on Facebook and then records the newly created status update in the local database, that record must be persisted even if the overall business transaction fails because of some other error occurring after the Facebook API call. The Facebook API isn't transactional - it's impossible to "rollback" a Facebook API call. The result of that API call should therefore never be rolled back.

In that case, you can pass a value of DbContextScopeOption.ForceCreateNew as the joiningOption parameter when creating a new DbContextScope. This will create a DbContextScope that will not join the ambient scope even if one exists:

public void RandomServiceMethod()  
{
    using (var dbContextScope = _dbContextScopeFactory.Create(DbContextScopeOption.ForceCreateNew))
    {
        // We've created a new scope. Even if that service method
        // was called by another service method that has created its 
        // own DbContextScope, we won't be joining it. 
        // Our scope will create new DbContext instances and won't
        // re-use the DbContext instances that the parent scope uses.
        [...]

        // Since we've forced the creation of a new scope,
        // this call to SaveChanges() will persist
        // our changes regardless of whether or not the
        // parent scope (if any) saves its changes or rolls back.
        dbContextScope.SaveChanges();
    }
}

The major issue with doing this is that this service method will use DbContext instances separate from the ones used in the rest of that business transaction. Here are a few basic rules to always follow in that case in order to avoid weird bugs and maintenance nightmares:

1. Persistent entities returned by a service method must always be attached to the ambient context

If you force the creation of a new DbContextScope (and therefore of new DbContext instances) instead of joining the ambient one, your service method must never return persistent entities that were created / retrieved within that new scope. This would be completely unexpected and will lead to humongous complexity.

The client code calling your service method may be a service method itself that created its own DbContextScope and therefore expects all service methods it calls to use that same ambient scope (this is the whole point of using an ambient context). It will therefore expect any persistent entity returned by your service method to be attached to the ambient DbContext.

Instead, either:

  • Don't return persistent entities. This is the easiest, cleanest, most foolproof method. E.g. if your service creates a new domain model object, don't return it. Return its ID instead and let the client load the entity in its own DbContext instance if it needs the actual object.
  • If you absolutely need to return a persistent entity, switch back to the ambient context, load the entity you want to return in the ambient context and return that.
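The first option might look like this (a sketch; the CreateGuestUser method and the User entity shape are invented for illustration):

```csharp
// Sketch: a service method that forces its own scope and returns an
// ID rather than the persistent entity it created. The caller is then
// free to load the user into its own ambient DbContext if needed.
public Guid CreateGuestUser(string displayName)
{
    using (var dbContextScope = _dbContextScopeFactory.Create(DbContextScopeOption.ForceCreateNew))
    {
        var user = new User { Id = Guid.NewGuid(), DisplayName = displayName };
        dbContextScope.DbContexts.Get<MyDbContext>().Set<User>().Add(user);
        dbContextScope.SaveChanges();

        // Return the ID only. The entity stays attached to our private
        // DbContext, which is about to be disposed.
        return user.Id;
    }
}
```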

2. Upon exit, a service method must make sure that all modifications it made to persistent entities have been replicated in the parent scope

If your service method forces the creation of a new DbContextScope and then modifies persistent entities in that new scope, it must make sure that the parent ambient scope (if any) can "see" those modifications when it returns.

I.e. if the DbContext instances in the parent scope had already loaded the entities you modified in their first-level cache (ObjectStateManager), your service method must force a refresh of these entities to ensure that the parent scope doesn't end up working with stale versions of these objects.

The DbContextScope class has a handy helper method that makes this fairly painless:

public void RandomServiceMethod(Guid accountId)  
{
    // Forcing the creation of a new scope (i.e. we'll be using our 
    // own DbContext instances)
    using (var dbContextScope = _dbContextScopeFactory.Create(DbContextScopeOption.ForceCreateNew))
    {
        var account = _accountRepository.Get(accountId);
        account.Disabled = true;

        // Since we forced the creation of a new scope,
        // this will persist our changes to the database
        // regardless of what the parent scope does.
        dbContextScope.SaveChanges();

        // If the caller of this method had already
        // loaded that account object into their own
        // DbContext instance, their version
        // has now become stale. They won't see that
        // this account has been disabled and might
        // therefore execute incorrect logic.
        // So make sure that the version our caller
        // has is up-to-date.
        dbContextScope.RefreshEntitiesInParentScope(new[] { account });
    }
}

Why DbContextScope and not UnitOfWork?

The first version of the DbContextScope class I wrote was actually called UnitOfWork. This is arguably the most commonly used name for this type of component.

But as I tried to use that UnitOfWork component in a real-world application, I kept getting really confused as to how I was supposed to use it and what it really did. This is despite the fact that I was the one who researched, designed and implemented it and despite the fact that I knew what it did and how it worked inside-out. Yet, I kept getting myself confused and had to often take a step back and think hard about how this "unit of work" related to the actual problem I was trying to solve: managing my DbContext instances.

If even I, who had spent a significant amount of time researching, designing and implementing this component, kept getting confused when trying to use it, there clearly wasn't a hope that anyone else would find it easy to use it.

So I renamed it DbContextScope and suddenly everything became clearer.

The main issue I had with the UnitOfWork, I believe, is that at the application level it often doesn't make a lot of sense. At the lower levels, for example at the database level, a "unit of work" is a very clear and concrete concept. This is Martin Fowler's definition of a unit of work:

Maintains a list of objects affected by a business transaction and coordinates the writing out of changes and the resolution of concurrency problems.

There is no ambiguity as to what a unit of work means at the database level.

At the application level however, a "unit of work" is a very vague concept that could mean everything and nothing. And it's certainly not clear how this "unit of work" relates to Entity Framework, to the issue of managing DbContext instances and to the problem of ensuring that the persistent entities we're manipulating are attached to the right DbContext instance.

As a result, any developer trying to use a "UnitOfWork" would have to pore over its source code to find out what it really does. The definition of the unit of work pattern is simply too vague to be useful at the application level.

In fact, for many applications, an application-level "unit of work" doesn't even make any sense. Many applications will have to use several non-transactional services during the course of a business transaction, such as remote APIs or non-transactional legacy components. The changes made there cannot be rolled back. Pretending otherwise is counter-productive, confusing and makes it even harder to write correct code.

A DbContextScope, on the other hand, does what it says on the tin. Nothing more, nothing less. It doesn't pretend to be what it's not. And I've found that this simple name change significantly reduced the cognitive load required to use that component and to verify that it was being used correctly.

Of course, naming this component DbContextScope means that you can't hide the fact that you're using Entity Framework from your services anymore. UnitOfWork is a conveniently vague term that allows you to abstract away the persistence mechanism used in the lower layers. Whether or not abstracting EF away from your service layer is a good thing is another debate that we won't get into here.

See it in action

The source code on GitHub includes a demo application that demonstrates the most common use-cases.

How DbContextScope works

The source code is well commented and I would encourage you to read through it. In addition, this excellent blog post by Stephen Toub on the ExecutionContext is a mandatory read if you'd like to fully understand how the ambient context pattern was implemented in DbContextScope.

Further reading

The personal blog of Rowan Miller, the program manager for the Entity Framework team, is a must-read for any developer working on an Entity Framework-based application.

Bonus material

Where not to create your DbContext instances

An Entity Framework anti-pattern commonly seen in the wild is to implement the creation and disposal of DbContext in data access methods (e.g. in repository methods in a traditional 3-tier application). It usually looks like this:

public class UserService : IUserService  
{
    private readonly IUserRepository _userRepository;

    public UserService(IUserRepository userRepository)
    {
        if (userRepository == null) throw new ArgumentNullException("userRepository");
        _userRepository = userRepository;
    }

    public void MarkUserAsPremium(Guid userId)
    {
        var user = _userRepository.Get(userId);
        user.IsPremiumUser = true;
        _userRepository.Save(user);
    }
}

public class UserRepository : IUserRepository  
{
    public User Get(Guid userId)
    {
        using (var context = new MyDbContext())
        {
            return context.Set<User>().Find(userId);
        }
    }

    public void Save(User user)
    {
        using (var context = new MyDbContext())
        {
            // [...] 
            // (either attach the provided entity to the context 
            // or load it from the context and update its properties
            // from the provided entity)

            context.SaveChanges();
        }
    }
}

By doing this, you're losing pretty much every feature that Entity Framework provides via the DbContext, including its 1st-level cache, its identity map, its unit-of-work, and its change tracking and lazy-loading abilities. That's because in the scenario above, a new DbContext instance is created for every database query and disposed immediately afterwards, hence preventing the DbContext instance from being able to track the state of your data objects across the entire business transaction.
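
To make this concrete, here is a hypothetical illustration of what the repository-per-context design above costs you (someUserId is a placeholder):

```csharp
var repository = new UserRepository();

// Each Get() call spins up and disposes its own DbContext.
var userA = repository.Get(someUserId);
var userB = repository.Get(someUserId);

// With a single DbContext shared across the business transaction, EF's
// identity map would guarantee both calls return the same instance:
//     ReferenceEquals(userA, userB) == true
// Here each call used its own short-lived context, so userA and userB
// are disconnected copies: a change made to one is invisible to the
// other, and neither is change-tracked once its context is disposed.
```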

You're effectively reducing Entity Framework to a basic ORM in the literal sense of the term: a mapper from your objects to their relational representation in the database.

There are some applications where this type of architecture does make sense. If you're working on such an application, you should however ask yourself why you're using Entity Framework in the first place. If you're going to use it as a basic ORM and won't use any of the features it provides on top of its ORM capabilities, you might be better off with a lightweight ORM library such as Dapper. Chances are it would simplify your code and offer better performance by avoiding the overhead that EF introduces to support its extra functionality.
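
For comparison, here is what the repository's Get method from the example above might look like with Dapper. This is a sketch only, assuming a Users table whose column names match the User class properties:

```csharp
using System;
using System.Data.SqlClient;
using System.Linq;
using Dapper;

public class DapperUserRepository
{
    private readonly string _connectionString;

    public DapperUserRepository(string connectionString)
    {
        _connectionString = connectionString;
    }

    public User Get(Guid userId)
    {
        // Dapper extends IDbConnection with Query<T>(). Opening and
        // disposing a connection per call is fine here: there is no
        // change tracking to preserve across calls anyway.
        using (var connection = new SqlConnection(_connectionString))
        {
            return connection
                .Query<User>("SELECT * FROM Users WHERE Id = @Id", new { Id = userId })
                .SingleOrDefault();
        }
    }
}
```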

From NHibernate to Entity Framework 6 - Part 1: the mapping story

This is the first part of an n-part series on using Entity Framework 6 coming from an NHibernate background. Although this series is primarily aimed at NHibernate veterans making the transition to Entity Framework 6, any developer new to EF will probably find most of this information helpful too.

Introduction

After having used NHibernate quite extensively for the past few years, I've just completed my first 6 months in the Entity Framework world.

I had actually already given Entity Framework a try back in the EF4 days. I abandoned it quickly, as it soon became clear that it wasn't ready for prime time. But I have to say that Entity Framework has come a long way since then. EF6 is not just a usable but a very capable ORM, and it's definitely now a very credible alternative to NHibernate.

The transition from NH to EF6 was fairly smooth as they are conceptually very similar. However, I had to spend quite some time researching many of the finer points before starting to feel confident with EF and happy to let it manage my data. Entity Framework's rather poor documentation didn't help much.

In the NHibernate world, you know that no matter what your question is, Oren Eini (who goes by the pseudonym Ayende Rahien) will already have written a clear, precise and concise blog post addressing it. And you know that it will show up as the first result when you google it. There is unfortunately no Oren Eini equivalent in the Entity Framework world. When you google an EF-related question, all you get are Stack Overflow questions. And to be completely honest, the answers there rarely shine. As soon as you venture out of trivial territory, SO answers to EF-related questions tend to be tentative, incomplete, superficial, out-of-date or just plain wrong.

This series attempts to fill some of that void. If you consider yourself an NHibernate veteran and are taking on your first EF project, this should save you some time.

Database First, Model First, Code First, oh my

With Entity Framework, just like with every other ORM, your first task is to map your object model to your relational model. With EF, it's also the first hurdle: should you adopt a Database First, Model First or Code First mapping workflow?

Let's go through a very quick refresher on how NHibernate deals with this issue and see how it compares with Entity Framework.

Object-to-relational mapping the NHibernate way

With NHibernate, mapping your object model to your relational model is a fairly simple and logical process, and one that you will see implemented in a very similar manner in just about every NHibernate-based application.

You start off by defining your domain model using plain C# classes (or rather POCOs, since they don't have to be C#). This part is unrelated to NHibernate and there should be nothing NHibernate-related in your domain model classes.

You then define the mappings between your domain model objects and their relational representations. You can do it either using XML (if you enjoy pain), using the built-in but undocumented fluent API or using the excellent fluent API provided by the third-party Fluent NHibernate library.

Once your mappings are defined, you can then just create a Configuration instance, add your mappings to this configuration object and call its BuildSessionFactory() method to create a SessionFactory instance. This is typically done once on application startup.

The SessionFactory instance in NHibernate is responsible for holding onto the compiled mappings and for creating new Session instances on demand. It's thread-safe and typically used as a singleton. If you were working against multiple databases, you'd have one SessionFactory instance per database, each containing the mappings for that database and able to create new Session instances that know how to generate and execute SQL queries against that database.
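
The bootstrap sequence described above can be sketched as follows. User and userId are placeholders, and this variant assumes XML mappings embedded in the domain assembly rather than fluent ones:

```csharp
using NHibernate;
using NHibernate.Cfg;

// Done once, on application startup.
var configuration = new Configuration();
configuration.Configure();                          // reads hibernate.cfg.xml
configuration.AddAssembly(typeof(User).Assembly);   // loads the embedded hbm.xml mappings

// Thread-safe and expensive to build: keep it as a singleton.
ISessionFactory sessionFactory = configuration.BuildSessionFactory();

// Sessions are cheap, short-lived and not thread-safe:
// open one per business transaction.
using (ISession session = sessionFactory.OpenSession())
{
    var user = session.Get<User>(userId);
}
```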

If needed, you can also create a new SchemaExport instance, passing in your Configuration object and call its Execute() method to generate a SQL script creating your database schema based on the mappings you defined and / or execute that script to create the database on-the-fly for you. This can be handy if you're working on a new application where the database hasn't been created yet or to keep an up-to-date database creation script under source control. E.g.:

// parameters : script = write out to schema.sql file. export = execute against database. justDrop = drop existing database only
new SchemaExport(configuration)  
    .SetOutputFile(scriptPath)
    .Execute(script: true, export: false, justDrop: false);

Object-to-relational mapping the Entity Framework way: Code First

The NHibernate way of object-to-relational mapping is supported in Entity Framework and is what is referred to as "Code First". Code First mapping has been supported in Entity Framework since version 4.1.

In practice, and unless you have a really good reason to choose one of the other workflows, Code First will be the mapping workflow you'll use with EF. Just like with NHibernate's mapping model, Code First is simple, logical, flexible and it doesn't rely on Visual Studio-specific designers or on auto-generated code.

Creating your domain model

With Code First, you start off by defining your domain model using plain C# classes. Just like with NHibernate, your domain model should of course contain nothing EF-related.

Defining your object-to-relational mappings

Once your domain model is created, you can use Entity Framework's fluent mapping API to create your mappings. This API is very similar to that of Fluent NHibernate. So Fluent NHibernate users will feel right at home.

For each domain model class, you simply create a mapping class deriving from EntityTypeConfiguration<TEntityType> and add the mapping in the class' constructor. Here is an example mapping of a stereotypical User entity:

// Domain model
public class User  
{
    public long Id { get; set; }
    public string Name { get; set; }
    public string Email { get; set; }
    public byte[] PasswordHash { get; set; }
    public byte[] Salt { get; set; }
}

// Entity Framework mapping
public class UserMapping : EntityTypeConfiguration<User>  
{
    public UserMapping()
    {
        ToTable("Users");
        HasKey(m => m.Id);
        Property(m => m.Id).HasDatabaseGeneratedOption(DatabaseGeneratedOption.Identity);
        Property(m => m.Name);
        Property(m => m.Email).IsRequired().HasMaxLength(256);
        Property(m => m.PasswordHash).IsRequired().HasMaxLength(32);
        Property(m => m.Salt).IsRequired().HasMaxLength(32);
    }
}

Note that, unlike what we would have had to do with NHibernate, we didn't have to declare the properties in our domain model class as virtual. We'll get back to this later.

And as we'll see later, most of this mapping code is in fact unnecessary as Entity Framework uses convention-based mappings by default (but it doesn't hurt to specify it manually as we've done here if you prefer).

Initializing Entity Framework with your mappings

Configuring EF with your mappings is one of the points where I personally think EF gets confusing and has been poorly designed.

Entity Framework's equivalent of NHibernate's Session is DbContext. Both NH's Session and EF's DbContext have the same purpose, are used in very much the same way and share many of the same properties. We'll take a more detailed look at the similarities and differences between Session and DbContext in a future post but, for now, whenever you see DbContext, think Session.

Entity Framework however doesn't have any equivalent for NHibernate's SessionFactory. Instead, the responsibilities of both Session and SessionFactory are held by DbContext in the EF world.

So, like SessionFactory, DbContext is responsible for creating the compiled object-to-relational mappings, caching them for the duration of the application and providing them in a thread-safe manner to all DbContext instances that have to work against the database for which those mappings were created.

And like Session, DbContext is also responsible for providing a single-threaded, short-lived unit-of-work that manages the database connection and database queries for the duration of a single business transaction.

These are two very different responsibilities with completely different lifecycles and constraints that would really have been best implemented in two different classes. But so be it.

So how does it work?

With EF, you never instantiate DbContext directly. Instead, after having created your domain model and mapping classes, you must create a new class deriving from DbContext which will be configured with your mappings. It's then this DbContext-derived class that you will use as you would have used Session in NHibernate. Quick example that creates a DbContext configured with the User mapping we created earlier:

public class MyDbContext: DbContext  
{
    public MyDbContext() {}

    public MyDbContext(string nameOrConnectionString) : base(nameOrConnectionString)
        {}

    protected override void OnModelCreating(DbModelBuilder modelBuilder)
    {
        base.OnModelCreating(modelBuilder);
        modelBuilder.Configurations.Add(new UserMapping());
    }
}

There are different ways to configure EF with your mappings but the method above is the most common method and the recommended one. Override the DbContext.OnModelCreating() method and add your custom mappings and other configuration values there.

Here we've specified our mapping class (UserMapping) explicitly. In practice, you'll most likely want to use the ConfigurationRegistrar.AddFromAssembly() method instead to register all the mapping classes found in a given assembly.
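
A minimal sketch of that approach (AddFromAssembly is available from EF6 onwards):

```csharp
protected override void OnModelCreating(DbModelBuilder modelBuilder)
{
    base.OnModelCreating(modelBuilder);

    // Scans the given assembly and registers every class deriving from
    // EntityTypeConfiguration<T>, so new mapping classes get picked up
    // without having to touch the DbContext.
    modelBuilder.Configurations.AddFromAssembly(typeof(MyDbContext).Assembly);
}
```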

What happens on DbContext instantiation

You're all done. You can now create an instance of your DbContext-derived class, passing in your database connection string (or the name of a connection string defined in your app.config / web.config file), and start using it to query your database:

using (var context = new MyDbContext("Server=localhost;Database=MyDatabase;Trusted_Connection=true"))  
{
    var johns = context.Set<User>().Where(u => u.Name == "John").ToList();
}

which would be the equivalent of the following NHibernate code:

using (var session = sessionFactory.OpenSession())  
{
    var johns = session.QueryOver<User>().Where(u => u.Name == "John").List();
}

When you create the first instance of your DbContext-derived class, EF will automatically compile and cache your mappings. As part of this process, it will call the OnModelCreating() method. The first instantiation of a DbContext-derived class within an app domain is therefore a very expensive operation that will most likely involve opening a database connection. It's the equivalent of the Configuration.BuildSessionFactory() call in NHibernate.

Any subsequent instantiations will be very cheap operations as the cached compiled mapping will get re-used. In particular, the OnModelCreating() method won't get called and no attempt to connect to the database will be made (until you start querying the database that is). So subsequent instantiations of your DbContext-derived class are doing the equivalent of the SessionFactory.OpenSession() call in NHibernate. It's therefore perfectly acceptable to create instances of your DbContext eagerly, for example at the start of every web request in a web application, as you would do with NHibernate's Session.

Convention-based mapping

In NHibernate, convention-based mapping of the object model to the relational model isn't a common sight. The third-party Fluent NHibernate library does offer a convention-based mapping feature but it's not part of NHibernate itself, it's not enabled by default and it can be quite cumbersome to customize to suit your particular model.

Entity Framework, on the other hand, fully embraces convention-based mapping. It's built-in, it's enabled by default and, on the whole, it's rather nice.

In the example above, we explicitly mapped our User domain model object to its relational representation. But we didn't have to. Had we not specified an explicit mapping, EF would have mapped it automatically using its default conventions. Entity Framework's documentation contains an overview of the mapping conventions it uses.

This of course now begs the question: how does Entity Framework know which class should be mapped? It does it by looking for DbSet<TEntityType> properties on your DbContext-derived class.

So in order to have a domain model class automatically mapped by EF, simply add a DbSet<TEntityType> property for that class to your custom DbContext. In our previous example, if we'd wanted to have our User model automatically mapped by EF, we would have used the following implementation for our custom DbContext:

public class MyDbContext: DbContext  
{
    // Tell EF that the User class needs to be mapped
    public DbSet<User> Users { get; set; } 

    public MyDbContext() {}

    public MyDbContext(string nameOrConnectionString) : base(nameOrConnectionString)
        {}

    // No need for the OnModelCreating() override anymore
    // as we're using convention-based mapping.
}

The first time that your custom DbContext class is instantiated, Entity Framework will use reflection to find all the DbSet<TEntityType> properties declared on your context and will automatically generate mappings for these domain model classes. If these classes contain references to other domain model classes, they will get mapped as well.

So if you happen to use DDD, you can limit yourself to declaring your aggregate roots as DbSet<TEntityType> properties on your DbContext. The child entities will be included in the mappings via the aggregate roots. Otherwise, declare all your domain model classes as DbSet<TEntityType> properties to ensure that they all get mapped.

Ad-hoc overrides of the convention-based mappings are as trivial as it gets: simply create a mapping class that only contains the mappings for the properties you want to map explicitly. For example, suppose that for security reasons we didn't want to store our User entity's PasswordHash value in the database, and that we wanted to limit the width of its Email and Salt columns as we did in our explicit mapping earlier. We could use this cut-down mapping class, which only declares the mappings that differ from the defaults:

public class UserMapping : EntityTypeConfiguration<User>  
{
    public UserMapping()
    {
        Property(m => m.Email).IsRequired().HasMaxLength(256);
        Property(m => m.Salt).IsRequired().HasMaxLength(32);

        Ignore(m => m.PasswordHash);
    }
}

And then configure our DbContext with it:

public class MyDbContext: DbContext  
{
    // Tell EF that the User class needs to be mapped
    public DbSet<User> Users { get; set; } 

    public MyDbContext() {}

    public MyDbContext(string nameOrConnectionString) : base(nameOrConnectionString)
        {}

    protected override void OnModelCreating(DbModelBuilder modelBuilder)
    {
        base.OnModelCreating(modelBuilder);
        // Mapping overrides
        modelBuilder.Configurations.Add(new UserMapping());
    }
}

With this setup, the Id and Name properties of our User class will get mapped automatically, while its Email, PasswordHash and Salt properties are explicitly mapped.

Lazy loading and virtual properties in domain model classes

One thing that you quickly learn the hard way when working with NHibernate is that you must either declare all the properties in a domain model class virtual or disable lazy-loading for that domain model altogether. There's no middle point.

Entity Framework is more lenient in this respect. You still need to declare properties that you want lazy-loaded (navigation properties or collections) as virtual of course so that EF can generate a dynamic proxy class that overrides those properties in order to provide the lazy-loading behaviour. But you can leave all the other properties as non-virtual if you wish.
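
For example, with a hypothetical Order/Customer model, only the navigation members need to be virtual:

```csharp
using System;
using System.Collections.Generic;

public class Order
{
    // Scalar properties can stay non-virtual in EF.
    public long Id { get; set; }
    public DateTime PlacedOn { get; set; }

    // Navigation property: must be virtual so the dynamic proxy
    // can override it to provide lazy loading.
    public virtual Customer Customer { get; set; }
}

public class Customer
{
    public long Id { get; set; }
    public string Name { get; set; }

    // Lazy-loaded collections must be virtual as well.
    public virtual ICollection<Order> Orders { get; set; }
}
```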

There are more subtleties involved with virtual properties and dynamic proxy classes in Entity Framework. If you're curious, the blog of Arthur Vickers, one of Entity Framework's developers, is a good place to start.

Database creation script

Once you've created an instance of your custom DbContext, you can get it to generate a database creation script for you:

var sqlScript = ((IObjectContextAdapter)context).ObjectContext.CreateDatabaseScript();  

As you can see in the DbContext source code, DbContext implements IObjectContextAdapter.ObjectContext explicitly, hence the ugly but necessary and safe cast. It's not clear why the EF developers felt the need to reduce the discoverability of this property.

By default, Entity Framework will automatically create the database for you if it doesn't already exist during the first instantiation of your DbContext.
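
If you'd rather create and manage the database schema yourself, you can turn that behaviour off by clearing the database initializer on startup (using the MyDbContext class from the examples above):

```csharp
using System.Data.Entity;

// Call once on application startup, before the first DbContext is used.
// A null initializer disables automatic database creation entirely.
Database.SetInitializer<MyDbContext>(null);
```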


So that was Code First. I'll quickly go over Database First and Model First for the sake of completeness but you're unlikely to want to use them.

Object-to-relational mapping the Entity Framework way: Database First

Another way to map your object model to their relational representations in EF is Database First. It was the only mapping workflow available in the first version of Entity Framework.

With Database First, you first create your database. You then add an .edmx file to your project. .edmx files are very similar to NHibernate's XML mapping files. They contain your object-to-relational mapping in XML format. You can write the mappings by hand if you really want to.

Alternatively, you can use Visual Studio to create this file. VS will walk you through a wizard that lets you connect to your database and select the relevant database tables.

VS will then generate the source code of your domain model classes based on the database tables you added. It will also generate the code of a custom DbContext that you can use to query your database. Finally, it will add a new connection string to your app.config / web.config file that points to the .edmx file; the generated DbContext needs it to locate the XML mappings.

You can now just create an instance of the generated DbContext class and use it normally.

There's nothing fundamentally different between Code First and Database First. The only difference is that with the latter, the code for your domain model classes and DbContext is automatically generated and the mappings are defined in XML instead of in code using a fluent API. Neither of these is a good thing in my book, so I really wouldn't recommend using Database First.

Object-to-relational mapping the Entity Framework way: Model First

Model First was introduced in Entity Framework 4, which was the second version of EF.

Model First is identical to Database First, with the exception that you don't have to create a database first. Instead, you start by adding an empty .edmx file to your project. You can then use the Visual Studio designer to visually "design" your domain model.

Once you're done, VS will generate the code of your domain model classes and DbContext for you, just like it would with Database First. Also not a great approach in my book.

How to tell if an existing code base is using Database First, Model First or Code First

As we've seen earlier, there are no fundamental differences between the three mapping workflows in Entity Framework. In all three cases, you end up with a set of domain model classes and a DbContext-derived class that you use to query the database, both either auto-generated by Visual Studio (Database First / Model First) or written by you (Code First).

The main difference is the way the object-to-relational mappings are defined. Applications using Database First or Model First will have these mappings defined in XML in an .edmx file. Applications using Code First will either have no mappings defined (if the application relies entirely on convention-based mappings) or will have mappings defined in C# using Entity Framework's fluent mapping API.

So how does Entity Framework know which mapping workflow the application is using?

It's all down to the connection string the application provides when instantiating the DbContext-derived class.

If the application uses a standard database connection string such as:

Server=SERVER_NAME;Database=DATABASE_NAME;Trusted_Connection=true  

...DbContext will start in Code First mode. In this mode, when first instantiated it will use reflection, as we've seen earlier, to discover any properties of type DbSet<TEntityType> you've defined on the class and automatically create mappings for those entity classes. It will also call its OnModelCreating() method to register any explicit mappings the application may have specified.

If, on the other hand, the application specifies a connection string containing information about an .edmx file, such as:

metadata=res://*/Model1.csdl|res://*/Model1.ssdl|res://*/Model1.msl;provider=System.Data.SqlClient;provider connection string="data source=SERVER_NAME;initial catalog=DATABASE_NAME;integrated security=True;MultipleActiveResultSets=True;App=EntityFramework"  

...DbContext will start in Database First / Model First mode. In this mode, it will not do any convention-based mappings and it won't call its OnModelCreating() method. It will instead rely entirely on the XML mappings defined in the .edmx file.

If you're interested, the EF documentation has more details on connection strings with Entity Framework.

So if you're wondering what mapping workflow an application you've inherited uses, look for the connection string.

A complete example

This is the full source code of a small console application that uses Entity Framework Code First to query the stereotypical User model we've been using throughout this post and that demonstrates all the points we've covered above:

using System;  
using System.Data.Entity;  
using System.Data.Entity.Infrastructure;  
using System.Data.Entity.ModelConfiguration;  
using System.IO;  
using System.Linq;

namespace EntityFramework  
{
    class Program
    {
        private const string ConnectionString = "Server=localhost;Database=EFExample;Trusted_Connection=true";
        private static readonly string DbCreationScriptPath = Path.GetFullPath("schema.sql");

        static void Main(string[] args)
        {
            // Get EF to generate a database schema creation script for us
            using (var context = new MyDbContext(ConnectionString))
            {
                File.WriteAllText(DbCreationScriptPath, ((IObjectContextAdapter)context).ObjectContext.CreateDatabaseScript());
                Console.WriteLine("Wrote database schema creation script to {0}.", DbCreationScriptPath);
            }

            Console.WriteLine();

            // Write to database
            using (var context = new MyDbContext(ConnectionString))
            {
                var user1 = new User()
                            {
                                Name = "User 1",
                                Email = "email1",
                                PasswordHash = new byte[0],
                                Salt = new byte[0]
                            };

                var user2 = new User()
                {
                    Name = "User 2",
                    Email = "email2",
                    PasswordHash = new byte[0],
                    Salt = new byte[0]
                };

                context.Set<User>().AddRange(new [] {user1, user2});
                context.SaveChanges();

                Console.WriteLine("Inserted User Id {0} in the database.", user1.Id);
                Console.WriteLine("Inserted User Id {0} in the database.", user2.Id);
            }

            Console.WriteLine();

            // Read from database
            using (var context = new MyDbContext(ConnectionString))
            {
                var count = context.Set<User>().Count();
                Console.WriteLine("Found {0} User records in the database.", count);
            }

            Console.WriteLine("Done. Press Enter to exit.");
            Console.ReadLine();
        }

        public class MyDbContext : DbContext
        {
            // Tell EF that the User class needs to be mapped using the default conventions
            public DbSet<User> Users { get; set; }

            public MyDbContext()
            {}

            public MyDbContext(string nameOrConnectionString)
                : base(nameOrConnectionString) {}

            protected override void OnModelCreating(DbModelBuilder modelBuilder)
            {
                base.OnModelCreating(modelBuilder);

                // Specify mapping overrides
                modelBuilder.Configurations.Add(new UserMapping());
            }
        }

        //-- Domain Model
        public class User
        {
            public long Id { get; set; }
            public string Name { get; set; }
            public string Email { get; set; }
            public byte[] PasswordHash { get; set; }
            public byte[] Salt { get; set; }
        }

        //-- Mapping overrides
        public class UserMapping : EntityTypeConfiguration<User>
        {
            public UserMapping()
            {
                Property(m => m.Email).IsRequired().HasMaxLength(256);
                Property(m => m.Salt).IsRequired().HasMaxLength(32);

                Ignore(m => m.PasswordHash);
            }
        }
    }
}

EF will create the database for you if needed the first time you run the application. If you want to create it beforehand, this is the schema:

CREATE TABLE [dbo].[Users] (
    [Id] [bigint] IDENTITY(1,1) NOT NULL,
    [Name] [nvarchar](MAX) NULL,
    [Email] [nvarchar](256) NOT NULL,
    [Salt] [varbinary](32) NOT NULL,
    PRIMARY KEY ([Id])
);

For comparison purposes, here is the full source code of the same application implemented with NHibernate and Fluent NHibernate:

using System;  
using System.IO;  
using FluentNHibernate.Cfg;  
using FluentNHibernate.Cfg.Db;  
using FluentNHibernate.Mapping;  
using NHibernate.Tool.hbm2ddl;

namespace NHibernate  
{
    class Program
    {
        private const string ConnectionString = "Server=localhost;Database=EFExample;Trusted_Connection=true";
        private static readonly string DbCreationScriptPath = Path.GetFullPath("schema.sql");

        static void Main(string[] args)
        {
            var configuration = Fluently.Configure()
                                        .Database(MsSqlConfiguration.MsSql2012.ConnectionString(ConnectionString))
                                        .Mappings(m => m.FluentMappings.Add<UserMapping>())
                                        .BuildConfiguration();

            // Get NHibernate to generate a database schema creation script for us
            new SchemaExport(configuration)
                    .SetOutputFile(DbCreationScriptPath)
                    .Execute(script: true, export: false, justDrop: false);
            Console.WriteLine("Wrote database schema creation script to {0}.", DbCreationScriptPath);
            Console.WriteLine();

            var sessionFactory = configuration.BuildSessionFactory();

            // Write to database
            using (var session = sessionFactory.OpenSession())
            {
                var user1 = new User()
                {
                    Name = "User 1",
                    Email = "user1",
                    PasswordHash = new byte[0],
                    Salt = new byte[0]
                };

                var user2 = new User()
                {
                    Name = "User 2",
                    Email = "user2",
                    PasswordHash = new byte[0],
                    Salt = new byte[0]
                };

                using (var transaction = session.BeginTransaction())
                {
                    session.Save(user1);
                    session.Save(user2);
                    transaction.Commit();
                }

                Console.WriteLine("Inserted User Id {0} in the database.", user1.Id);
                Console.WriteLine("Inserted User Id {0} in the database.", user2.Id);
            }

            Console.WriteLine();

            // Read from database
            using (var session = sessionFactory.OpenSession())
            {
                var count = session.QueryOver<User>().RowCount();
                Console.WriteLine("Found {0} User records in the database.", count);
            }

            Console.WriteLine("Done. Press Enter to exit.");
            Console.ReadLine();
        }


        //-- Domain Model
        public class User
        {
            public virtual long Id { get; set; }
            public virtual string Name { get; set; }
            public virtual string Email { get; set; }
            public virtual byte[] PasswordHash { get; set; }
            public virtual byte[] Salt { get; set; }
        }

        //-- Mappings
        public class UserMapping : ClassMap<User>
        {
            public UserMapping()
            {
                Table("Users");
                Id(m => m.Id).GeneratedBy.Identity();
                Map(m => m.Name).Length(10000); // i.e. force NVARCHAR(MAX)
                Map(m => m.Email).Not.Nullable().Length(256);
                Map(m => m.Salt).Not.Nullable().Length(32);

                // Intentionally not mapping the PasswordHash property
                // as we don't want it stored in the database.
            }
        }
    }
}

Note the one significant difference between the NHibernate and Entity Framework versions: with NHibernate, we had to open an explicit database transaction when inserting the new records to ensure that both inserts were executed within a single transaction.

We didn't have to do this with Entity Framework because SaveChanges automatically wraps all of its database operations in a single transaction. This is one of the major differences in behaviour between NHibernate's Session and Entity Framework's DbContext. We'll cover this in more detail in a future post.
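That said, EF6 does let you take control of the transaction when you need one to span more than a single SaveChanges call, via `Database.BeginTransaction()`. Here is a sketch of the EF6 equivalent of the explicit NHibernate transaction above; `MyContext` stands in for the `DbContext` subclass used in the EF listing, and a `Users` DbSet property is assumed on it:

```csharp
using (var context = new MyContext())
using (var transaction = context.Database.BeginTransaction())
{
    context.Users.Add(new User { Name = "User 3", Email = "user3", Salt = new byte[0] });
    context.SaveChanges(); // executed within the transaction we opened

    context.Users.Add(new User { Name = "User 4", Email = "user4", Salt = new byte[0] });
    context.SaveChanges(); // same transaction as the first SaveChanges

    // Neither insert is visible to other connections until this point.
    transaction.Commit();
}
```

If `Commit()` is never reached (e.g. an exception is thrown), disposing the transaction rolls both inserts back, mirroring what `session.BeginTransaction()` gave us in the NHibernate version.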


In the next post in this series, we'll take a look at one point of confusion you'll hit early on when working with Entity Framework: the difference between ObjectContext and DbContext.