You Might Not Need the Repository Pattern
In recent years, I have seen more and more developers reach for the Repository Pattern when using Entity Framework. Even Microsoft's own tutorials show the Repository Pattern used with Entity Framework. However, you might not need to reach for such a pattern.
According to Martin Fowler:
A Repository mediates between the domain and data mapping layers, acting like an in-memory domain object collection. Client objects construct query specifications declaratively and submit them to Repository for satisfaction. Objects can be added to and removed from the Repository, as they can from a simple collection of objects, and the mapping code encapsulated by the Repository will carry out the appropriate operations behind the scenes. Conceptually, a Repository encapsulates the set of objects persisted in a data store and the operations performed over them, providing a more object-oriented view of the persistence layer. Repository also supports the objective of achieving a clean separation and one-way dependency between the domain and data mapping layers.
That all sounds great in principle, but from what I've seen, it doesn't always turn out so clean in practice. In many cases it seems people copy the basic Repository Pattern tutorial and never think about their specific needs.
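Let us also consider that Entity Framework already implements a great Repository Pattern: a DbSet behaves like a repository for one entity type, and the DbContext that owns it behaves like a Unit of Work. Do we really need to create our own?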
Ironically enough, Microsoft pushes the Repository Pattern in their Entity Framework tutorials. Let's look at an example from Microsoft's website:
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Data;
    using ContosoUniversity.Models;

    namespace ContosoUniversity.DAL
    {
        public class StudentRepository : IStudentRepository, IDisposable
        {
            private SchoolContext context;

            public StudentRepository(SchoolContext context)
            {
                this.context = context;
            }

            public IEnumerable<Student> GetStudents()
            {
                return context.Students.ToList();
            }

            public Student GetStudentByID(int id)
            {
                return context.Students.Find(id);
            }

            public void InsertStudent(Student student)
            {
                context.Students.Add(student);
            }

            public void DeleteStudent(int studentID)
            {
                Student student = context.Students.Find(studentID);
                context.Students.Remove(student);
            }

            public void UpdateStudent(Student student)
            {
                context.Entry(student).State = EntityState.Modified;
            }

            public void Save()
            {
                context.SaveChanges();
            }

            private bool disposed = false;

            protected virtual void Dispose(bool disposing)
            {
                if (!this.disposed)
                {
                    if (disposing)
                    {
                        context.Dispose();
                    }
                }
                this.disposed = true;
            }

            public void Dispose()
            {
                Dispose(true);
                GC.SuppressFinalize(this);
            }
        }
    }
In this example, straight from Microsoft's website, we create a StudentRepository that implements an IStudentRepository interface with all the methods for Get, Insert, Update, Delete, and Save. All we've done is create one abstraction on top of an almost identical abstraction that Entity Framework already provides. Next, the tutorial has you update the controller to use the StudentRepository. One sample controller endpoint looks like:
    [HttpPost]
    [ValidateAntiForgeryToken]
    public ActionResult Create(
        [Bind(Include = "LastName, FirstMidName, EnrollmentDate")] Student student)
    {
        try
        {
            if (ModelState.IsValid)
            {
                studentRepository.InsertStudent(student);
                studentRepository.Save();
                return RedirectToAction("Index");
            }
        }
        catch (DataException /* dex */)
        {
            //Log the error (uncomment dex variable name after DataException and add a line here to write a log.)
            ModelState.AddModelError(string.Empty, "Unable to save changes. Try again, and if the problem persists contact your system administrator.");
        }
        return View(student);
    }
The question I would ask is: what value does this bring? Essentially we have re-implemented what Entity Framework already gives us, except that we renamed the actions. Instead of Add we create a method called Insert, instead of Remove we create Delete, and so on. The DbContext already gives us all these actions, just with slightly different naming than our Repository. Entity Framework already implements the Repository Pattern for us. Beyond naming, the main difference in the tutorial is that we now create a Repository class and interface for each DbSet in our DbContext. I find zero value in the above code. It doesn't solve a problem in the code. Duplicating what Entity Framework already provides doesn't make unit testing any easier, nor does it remove any complexity from the code.
Microsoft's tutorial then addresses the need for a separate class and interface per Repository by creating a generic Repository class. This also involves adding a Unit of Work class to handle saving data separately from the repositories.
The generic repository, which leaves saving to the Unit of Work:
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Data;
    using System.Data.Entity;
    using ContosoUniversity.Models;
    using System.Linq.Expressions;

    namespace ContosoUniversity.DAL
    {
        public class GenericRepository<TEntity> where TEntity : class
        {
            internal SchoolContext context;
            internal DbSet<TEntity> dbSet;

            public GenericRepository(SchoolContext context)
            {
                this.context = context;
                this.dbSet = context.Set<TEntity>();
            }

            public virtual TEntity GetByID(object id)
            {
                return dbSet.Find(id);
            }

            public virtual void Insert(TEntity entity)
            {
                dbSet.Add(entity);
            }

            public virtual void Delete(object id)
            {
                TEntity entityToDelete = dbSet.Find(id);
                Delete(entityToDelete);
            }

            public virtual void Delete(TEntity entityToDelete)
            {
                if (context.Entry(entityToDelete).State == EntityState.Detached)
                {
                    dbSet.Attach(entityToDelete);
                }
                dbSet.Remove(entityToDelete);
            }
        }
    }
Gotcha # 1
Before we get to the Unit of Work, here is a gotcha I have seen time and time again when the Repository Pattern is used without one. People naively create an Insert method that also saves, so that nothing outside the repository knows about the database. The rest of the basic CRUD operations follow suit, since every repository implements the same generic interface. It looks like this:
    public void InsertStudent(Student student)
    {
        context.Students.Add(student);
        context.SaveChanges();
    }
But then someone comes along and iterates over a large collection, calling Insert on each item and saving it to the database:
    foreach (var student in students)
    {
        studentRepository.InsertStudent(student);
    }
This unintentionally triggers a separate SaveChanges call, and with it a separate database round trip and transaction, for every item in the collection. That is far more database work than a single save at the end. To solve this problem, Microsoft recommends separating out the saving to a Unit of Work.
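To make the cost concrete, here is a minimal sketch. It does not use Entity Framework at all: CountingContext is a hypothetical stand-in that only counts how often SaveChanges is called, and SavingStudentRepository and SaveCallDemo are names I made up for illustration. Students are plain strings to keep it short.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for a DbContext (NOT Entity Framework): it holds
// pending items and counts how many times SaveChanges is called.
public class CountingContext
{
    private readonly List<string> pending = new List<string>();

    public List<string> Saved { get; } = new List<string>();
    public int SaveCalls { get; private set; }

    public void Add(string student)
    {
        pending.Add(student);
    }

    public void SaveChanges()
    {
        SaveCalls++;              // stands in for one trip to the database
        Saved.AddRange(pending);
        pending.Clear();
    }
}

// Repository whose insert also saves, as in Gotcha #1.
public class SavingStudentRepository
{
    private readonly CountingContext context;

    public SavingStudentRepository(CountingContext context)
    {
        this.context = context;
    }

    public void InsertStudent(string student)
    {
        context.Add(student);
        context.SaveChanges();
    }
}

public static class SaveCallDemo
{
    // Inserts through the saving repository: one SaveChanges per student.
    public static int SavePerInsert(IEnumerable<string> students)
    {
        var context = new CountingContext();
        var repository = new SavingStudentRepository(context);
        foreach (var student in students)
        {
            repository.InsertStudent(student);
        }
        return context.SaveCalls;
    }

    // Adds every student first, then saves once at the end.
    public static int SaveOnce(IEnumerable<string> students)
    {
        var context = new CountingContext();
        foreach (var student in students)
        {
            context.Add(student);
        }
        context.SaveChanges();
        return context.SaveCalls;
    }
}
```

With three students, SavePerInsert reports three SaveChanges calls and SaveOnce reports one. The stand-in says nothing about how expensive a real save is; it only shows that the number of saves grows with the size of the collection when the repository hides the save inside Insert.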
The Unit of Work from the tutorial:
    using System;
    using ContosoUniversity.Models;

    namespace ContosoUniversity.DAL
    {
        public class UnitOfWork : IDisposable
        {
            private SchoolContext context = new SchoolContext();
            private GenericRepository<Department> departmentRepository;
            private GenericRepository<Course> courseRepository;

            public GenericRepository<Department> DepartmentRepository
            {
                get
                {
                    if (this.departmentRepository == null)
                    {
                        this.departmentRepository = new GenericRepository<Department>(context);
                    }
                    return departmentRepository;
                }
            }

            public GenericRepository<Course> CourseRepository
            {
                get
                {
                    if (this.courseRepository == null)
                    {
                        this.courseRepository = new GenericRepository<Course>(context);
                    }
                    return courseRepository;
                }
            }

            public void Save()
            {
                context.SaveChanges();
            }

            private bool disposed = false;

            protected virtual void Dispose(bool disposing)
            {
                if (!this.disposed)
                {
                    if (disposing)
                    {
                        context.Dispose();
                    }
                }
                this.disposed = true;
            }

            public void Dispose()
            {
                Dispose(true);
                GC.SuppressFinalize(this);
            }
        }
    }
Example Controller methods from the same tutorial:
    [HttpPost]
    [ValidateAntiForgeryToken]
    public ActionResult Create(
        [Bind(Include = "CourseID,Title,Credits,DepartmentID")] Course course)
    {
        try
        {
            if (ModelState.IsValid)
            {
                unitOfWork.CourseRepository.Insert(course);
                unitOfWork.Save();
                return RedirectToAction("Index");
            }
        }
        catch (DataException /* dex */)
        {
            //Log the error (uncomment dex variable name after DataException and add a line here to write a log.)
            ModelState.AddModelError("", "Unable to save changes. Try again, and if the problem persists, see your system administrator.");
        }
        PopulateDepartmentsDropDownList(course.DepartmentID);
        return View(course);
    }

    [HttpPost, ActionName("Delete")]
    [ValidateAntiForgeryToken]
    public ActionResult DeleteConfirmed(int id)
    {
        Course course = unitOfWork.CourseRepository.GetByID(id);
        unitOfWork.CourseRepository.Delete(id);
        unitOfWork.Save();
        return RedirectToAction("Index");
    }
Again, we are essentially re-implementing what EF already gave us. Replace unitOfWork with dbContext and Save with SaveChanges and we are back to plain Entity Framework.
Gotcha # 2
Another gotcha is that real-world applications rarely fit neatly into this generic repository pattern, where every repository implements the same Add/Get/Delete interface. Repositories are often related to and overlap with one another. A Student is related to and has Classes, Grades, etc., and vice versa.
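As a rough illustration of that overlap, here is a small sketch of an enrollment workflow. Everything in it is hypothetical and simplified: SchoolData is a plain in-memory stand-in for the sets a DbContext would expose, the Capacity field and the EnrollmentWorkflow class are my own inventions, and none of it comes from the tutorial.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified entities (Capacity is invented for this sketch).
public class Student
{
    public int ID { get; set; }
    public string LastName { get; set; } = "";
}

public class Course
{
    public int CourseID { get; set; }
    public int Capacity { get; set; }
}

public class Enrollment
{
    public int StudentID { get; set; }
    public int CourseID { get; set; }
}

// Hypothetical in-memory stand-in for the sets a DbContext would expose.
public class SchoolData
{
    public List<Student> Students { get; } = new List<Student>();
    public List<Course> Courses { get; } = new List<Course>();
    public List<Enrollment> Enrollments { get; } = new List<Enrollment>();
}

// One workflow reads Students, Courses, and Enrollments and writes
// Enrollments. It does not belong to any single per-entity repository.
public class EnrollmentWorkflow
{
    private readonly SchoolData data;

    public EnrollmentWorkflow(SchoolData data)
    {
        this.data = data;
    }

    public bool TryEnroll(int studentId, int courseId)
    {
        var course = data.Courses.SingleOrDefault(c => c.CourseID == courseId);
        bool studentExists = data.Students.Any(s => s.ID == studentId);
        if (course == null || !studentExists)
        {
            return false;
        }

        int seatsTaken = data.Enrollments.Count(e => e.CourseID == courseId);
        bool alreadyEnrolled = data.Enrollments.Any(
            e => e.CourseID == courseId && e.StudentID == studentId);
        if (alreadyEnrolled || seatsTaken >= course.Capacity)
        {
            return false;
        }

        data.Enrollments.Add(new Enrollment { StudentID = studentId, CourseID = courseId });
        return true;
    }
}
```

With one generic repository per entity, a rule like this ends up spread across three repositories or pushed up into the caller. Neither is the clean Add/Get/Delete shape the pattern promises.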
You may ask: isn't he just straw-manning a basic tutorial? Sure, in some sense. Are there valid use cases for this pattern? Yes. Do most people need it? No. With all the code we write, we should ask ourselves: What value does this bring? Do I have a legitimate problem that this solves?
Instead of implementing an abstraction to sit in front of an identical abstraction, I would venture that many cases can get away with a basic Service Pattern. Something like:
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Data;
    using ContosoUniversity.Models;

    namespace ContosoUniversity.DAL
    {
        public class StudentService : IStudentService
        {
            private SchoolContext context;

            public StudentService(SchoolContext context)
            {
                this.context = context;
            }

            public List<Student> SearchStudents(string name)
            {
                // Matches on LastName, the name field bound in the tutorial's Create action.
                return context.Students.Where(s => s.LastName == name).ToList();
            }

            public Student GetStudentByID(int id)
            {
                return context.Students.Find(id);
            }

            public void EnrollStudent(Student student)
            {
                context.Students.Add(student);
                //add classes, enrollment, etc
                context.SaveChanges();
            }
        }
    }
Sometimes I have seen this pattern also called Repository, but instead of following the generic Repository Pattern (Add/Get/Delete/Save), it uses workflow-specific methods. Thus every "Repository" class might look different based on the needs of its workflows. That way it is actually a meaningful abstraction, not just a duplicate of what Entity Framework already provides.
Entity Framework already gives us the abstraction between our business layer and the database, one which can be easily tested, mocked, and pointed to a different provider (such as an in-memory provider). We don't need to be scared of injecting the DbContext in the places we need it, though I agree that it can be best to isolate our DbContext to a certain subset of classes. If we need an abstraction between Entity Framework and our business layer, create one, but EF already provides the generic Repository Pattern - no need to re-invent the CRUD wheel. For more complex needs beyond a basic service, look at the Command and Query Pattern. Rob Conery has a great article on why Command and Query can be better than Repository.
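As a sketch of the "point it at a different provider" idea, here is roughly what a check against an in-memory database could look like. This assumes EF Core with the Microsoft.EntityFrameworkCore.InMemory package, which is not what the tutorial code above targets, and the pared-down Student, SchoolContext, and StudentService below are simplified stand-ins for the ones discussed earlier. StudentServiceCheck is a hypothetical helper, not part of any library.

```csharp
using System;
using System.Linq;
using Microsoft.EntityFrameworkCore;

// Simplified entity with only the fields this sketch needs.
public class Student
{
    public int ID { get; set; }
    public string LastName { get; set; } = "";
}

// EF Core context that receives its provider through the options,
// so a test can pick the in-memory provider.
public class SchoolContext : DbContext
{
    public SchoolContext(DbContextOptions<SchoolContext> options) : base(options) { }

    public DbSet<Student> Students => Set<Student>();
}

// Workflow-style service that takes the DbContext directly.
public class StudentService
{
    private readonly SchoolContext context;

    public StudentService(SchoolContext context)
    {
        this.context = context;
    }

    public void EnrollStudent(Student student)
    {
        context.Students.Add(student);
        context.SaveChanges();
    }
}

// Hypothetical helper showing the shape of a test.
public static class StudentServiceCheck
{
    public static int EnrollThenCount(string lastName)
    {
        // A fresh database name per call keeps runs isolated from each other.
        var options = new DbContextOptionsBuilder<SchoolContext>()
            .UseInMemoryDatabase(Guid.NewGuid().ToString())
            .Options;

        // Write through the service using one context instance...
        using (var context = new SchoolContext(options))
        {
            new StudentService(context).EnrollStudent(new Student { LastName = lastName });
        }

        // ...and read back through a second one to confirm the save happened.
        using (var context = new SchoolContext(options))
        {
            return context.Students.Count(s => s.LastName == lastName);
        }
    }
}
```

No repository interface or mock is involved; the service is exercised against a real DbContext and only the provider changes. An in-memory provider does not behave like a relational database in every respect, so treat this as a fast check of workflow logic rather than proof that the same code works against SQL Server.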
Abstractions are useful when they actually provide value to us. Too often we use abstractions we don't need just because that's the way we've always done it or the way we were taught. Overall, the less abstraction we use, the greater the clarity we have in our applications. Don't use what we don't need. I'd rather see a naively executed declarative pattern, where it is easy to see what happens in a given workflow, than hunt through layers of abstraction that serve little purpose. Greater clarity and explicitness decrease the opportunity for hidden bugs.