Hierarchy in the Entity Framework 6 with the hierarchyid type

data: 3 kwietnia, 2019
czas czytania: 11 min
autor: Adrian Rojek

Storing hierarchical data in a database is surely not a trivial thing. While searching for the best solution, I came across many design patterns answering this problem. Each of them solves some aspects of this issue and has advantages in different domains. One of the patterns is readable for humans (e.g. Materialized Path), other is very fast in data selecting (e.g. Nested sets) etc. Unfortunately, also each of them has weak points: one poses a problem with maintenance, another is unreadable to humans or requires complicated update procedures. Finally, in the course of my research, I encountered the hierarchyid type, which appeared to be perfect for what I wanted to achieve. The only question was whether it could be used with Entity Framework 6 or not. In this article, I am presenting an alternative approach to implementing the hierarchy structure in a database. In order to do that I’m going to combine features provided by the hierarchyid data type with Entity Framework 6.

[markdown]
## Hierarchyid – what is it?
Hierarchyid is a data type available in SQL server 2008, which can be used to implement hierarchy structure. It’s a variable-length type so it takes as low space as possible. Even though it’s a binary type, it still has a human-friendly representation:

/2/5/

/

/3/2/0.1/5/

/0.3/0.2/

This path representation can be applied to operations on a hierarchy, which makes it more readable.

## Where hierarchyid can be handy
In general, this type can be used in any case when we want to store hierarchy data. To understand how to handle this type and what benefits it provides let’s take a look at the example. We are supposed to create a system for a customer in which we store information about the company structure. One part of this system is information about the employees, their position, and who is whose subordinate. This is our hierarchy:
* Jane Calista – CEO
* Oscar Ellie – manager
* Salena Roosevelt – team leader
* Pearlie Sherley – software developer
* Reggie Lucius – software developer
* Carina Franklin – quality assurance
* Clayton Calla – team leader
* Erick Kris – software developer
* Ariana Shelton – manager
* Brand Hester – team leader
* Eugenia Laci – software developer
* Presley Celestine – quality assurance

Our table can look like below:

![hierarchyid1](https://kariera.future-processing.pl//wp-content/uploads/2019/04/hierarchyid1-1.png)

Hierarchyid type provides a couple of useful methods to operate within a hierarchy e.g.: GetAncestor, GetDescendant, GetLevel, IsDescendantOf etc. We can use those methods in queries as shown in the example below.

Supposedly, we want to see all subordinates of Salena Roosevelt who is a team leader. To do that we can use the following query:

select Id, Name, Position, Node.ToString() as Node 
 from Employees  
 where Node.IsDescendantOf('/1/1/') = 1 

This query produces such a result:

All we should know is the Node value of Salena, and then the IsDescendantOf method does the job. It’s worth mentioning that Salena is included in a result. This is because the parent node is considered its own descendant. Node.ToString() is only dedicated to a ‘path like’ representation and without it, the result looks like that:

This case gets complicated though when we add a new record between the two existing. Let’s insert Ayden Noelene between Pearlie Sherley and Reggie Lucius (id 4 and 5). In order to do that, we need to know a node value for this new record. GetDescendant method can save the day. This method should be executed on the parent node and takes two nodes in between which we want to put our new record.

select convert(hierarchyid, '/1/').GetDescendant('/1/1/', '/1/2/').ToString()

This query will produce: /1/1.1/ as a value for a new record

To add a record before any existing one, the first parameter should be null and the second one takes a node value of the first element in the node:

select convert(hierarchyid, '/1/').GetDescendant(null, '/1/1/').ToString()

The result is like that: /1/0/

To add a record after the last one, the first parameter should have a value of the last node and the second parameter should be null:

select convert(hierarchyid, '/1/').GetDescendant('/1/2/', null).ToString()

The result is: /1/3/

As we can see in the first example, this method generates a part of the path with a decimal value 1.1 which is a valid value for a hierarchyid type as well as the negative values like /1/-1/ for GetDescendant(null, ‘/1/0/’). Basing on this assumption, we can conclude that there is an infinite number of possibilities.

Of course, the conclusion above is not true in practice, but it is so because the algorithm responsible for generating new paths is limited. Hierarchyid type is limited by a length of 892 bytes:

Hierarchyid type is also a variable length. Let’s see how many bytes it takes in our exemplary table:

select Id, Name, Position, Node.ToString() as Path, Node, datalength(Node) as Bytes  
 from Employees  

So far, I have only presented this type’s superlatives. What hierarchyid doesn’t cover is the uniqueness. It is possible to add many records with the same value. If such action is not acceptable, then other mechanisms should be added like indices etc.

## Entity Framework 6 support for the hierarchyid type

Unfortunately, the Entity Framework 6 doesn’t provide support for this data type, but as I mentioned before, hierarchyid is a binary type and so I’m going to present how to take advantage of that. Let’s take our previous example of the Employees table and create a model which allows us to use the hierarchyid features without losing a readable path-like representation.

Before we start though, we need to assure ourselves that we have the following NuGet packages referenced to our project: EntityFramework, EntityFramework.CodeFirstStoreFunctions and Microsoft.SqlServer.Types.

Let’s start with creating an Employee entity:

Public class Employee 
 { 
     public int Id { get; set; } 

     public string Name { get; set; } 

     public string Position { get; set; } 

     [Required] 
     [Index("IX_UniqueNode", IsUnique = true)] 
     [MaxLength(892)] 
     public byte[] Node { get; set; } 
 } 

As I stated earlier, the hierarchyid type is currently not supported in the Entity Framework 6. Despite the fact that we have access to the CLR version of the hierarchyid type, which may be found in Microsoft.SqlServer.Types.SqlHierarchyId, we cannot use it in our model – the Entity Framework doesn’t recognise it. We can use the fact that the hierarchyid is a binary type and create a Node as a binary array. At this point, we protect ourselves from any potential issues by setting a maximum length for this column to 892 – which is a maximum length of the hierarchyid type. A unique index can be created to protect from further creating entities with the same node value.

Our simple context:

public class MyContext : DbContext 
 { 
     public IDbSet Employees { get; set; }  
 } 

We need some helper methods to convert byte array or path to CLR version of the hierarchyid type and vice versa:

public static class HierarchyExtensions 
 { 
     public static byte[] ToByteArray(this SqlHierarchyId id) 
     { 
         if (id.IsNull) 
         { 
             return null; 
         } 

         using (var stream = new MemoryStream()) 
         { 
             using (var writer = new BinaryWriter(stream)) 
             { 
                 id.Write(writer); 
                 return stream.ToArray(); 
             } 
         } 
     } 

     public static SqlHierarchyId ToSqlHierarchyId(this string value) 
     { 
         return string.IsNullOrEmpty(value)  
             ? SqlHierarchyId.Null  
             : SqlHierarchyId.Parse(value); 
     } 

     public static SqlHierarchyId ToSqlHierarchyId(this byte[] value) 
     { 
         if (value == null) 
         { 
             return SqlHierarchyId.Null; 
         } 

         using (var stream = new MemoryStream(value, false)) 
         { 
             using (var reader = new BinaryReader(stream)) 
             { 
                 SqlHierarchyId id; 
                 id.Read(reader); 
                 return id; 
             } 
         } 
     } 
 } 

Now we can extend our model so we will be able to use a path-like representation of the node:

public class Employee 
 { 
     private byte[] node; 
     private SqlHierarchyId nodeSql; 
  
     public int Id { get; set; } 

     public string Name { get; set; } 

     [Required] 
     [Index("IX_UniqueNode", IsUnique = true)] 
     [MaxLength(892)] 
     public byte[] Node 
     { 
         get => node; 

         set 
         { 
             node = value; 
             nodeSql = node.ToSqlHierarchyId(); 
         } 
     } 

     [NotMapped] 
     public string NodePath 
     { 
         get => nodeSql.ToString(); 
         set => Node = value.ToSqlHierarchyId().ToByteArray(); 
     } 

     public string Position { get; set; } 
 } 

With NodePath we can use a path-like representation of the node. Let’s see how it looks after adding a record to our table:

using (var context = new MyContext()) 
 { 
     context.Employees.Add( 
         new Employee 
         { 
             Name = "Carina Franklin", 
             Position = "quality assurance", 
             NodePath = "/1/1/3/" 
         }); 

     context.SaveChanges(); 
 } 

By following the SQL query, we get the results as below:

select Name, Node, convert(hierarchyid, Node).ToString() as Path  
 from Employees  
 where Name = 'Carina Franklin' 

 Name            Node   Path 
 --------------- ------ ------- 
 Carina Franklin 0x5ADE /1/1/3/ 

As you can see, the Node can be converted to the hierarchyid type. We can use this fact and implement the solution which allows using the hierarchyid methods when operating on a database.

## LINQ-to-Entities and the hierarchy operations
Hierarchyid type provides a couple of useful methods to operate on a hierarchy e.g.: GetAncestor, GetDescendant, GetLevel, IsDescendantOf etc. Unfortunatly, we do not have direct access to those methods in the Entity Framework 6. Even if we had, we would not be able to use them because in the database our Node column would be of the varbinary type (892) instead of the hierarchyid.

Let’s think about another case. We want to query our employees to find all of them whose first name is ‘Carina’. We can do that with the following query:

context.Employees.Where(e => e.Name.StartsWith("Carina "))

– but this is not the only solution. We can also use DbFunctions like there:

context.Employees.Where(e => DbFunctions.Like(e.Name, "Carina %"))

I’m aware that the SQL function is not a popular solution in the Entity Framework. We can avoid them in most of the cases, but still, I would encourage not being afraid of them. In some edge cases, this can be the best (if not the only) solution.

I will create two methods: GetLevel and IsDescendantOf which are responsible for converting Node value to the hierarchyid type and executing the hierarchyid function.

public static class HierarchyIdDbFunctions 
 { 
     private const string ErrorMessage = "{0} can be used only in linq to entity query"; 

     [DbFunction("CodeFirstDatabaseSchema", "GetLevel")] 
     public static int GetLevel(byte[] node) 
     { 
         throw new NotSupportedException(string.Format(ErrorMessage, nameof(GetLevel))); 
     } 

     [DbFunction("CodeFirstDatabaseSchema", "IsDescendantOf")] 
     public static byte IsDescendantOf(byte[] node, byte[] parent) 
     { 
         throw new NotSupportedException(string.Format(ErrorMessage, nameof(IsDescendantOf))); 
     } 
 } 

We need to add a convention to the model builder:

public class MyContext : DbContext 
 { 
     public IDbSet Employees { get; set; } 

     protected override void OnModelCreating(DbModelBuilder modelBuilder) 
     { 
         modelBuilder.Conventions.Add( 
             new FunctionsConvention("dbo", typeof(HierarchyIdDbFunctions))); 

         base.OnModelCreating(modelBuilder); 
     } 
 } 

The last thing is to add implementation. It can be done with migration, in a seed method or any other preferred way. For keeping things simple, I’ll add it by a migration:

public partial class Added_HierarchyIdDbFunctions_implementation : DbMigration 
 { 
     public override void Up() 
     { 
         Sql("create function [dbo].[IsDescendantOf] " + 
             "(@node varbinary(892), @parent varbinary(892)) " + 
             "returns bit as " + 
             "begin " + 
             "return convert(hierarchyid, @node).IsDescendantOf(@parent) " + 
             "end"); 
         Sql("create function [dbo].[GetLevel] " + 
             "(@node varbinary(892)) " + 
             "returns int as " + 
             "begin " + 
             "return convert(hierarchyid, @node).GetLevel() " + 
             "end"); 
     } 

     public override void Down() 
     { 
     } 
 } 

Now we can use these functions in the LINQ-to-Entities queries.

To get all the subordinates of Salena Roosevelt:

using (var context = new MyContext()) 
 { 
     var employee = context.Employees.FirstOrDefault(e => e.Name == "Salena Roosevelt"); 
     var subordinates = context.Employees.Where( 
         e => HierarchyIdDbFunctions.IsDescendantOf(e.Node, employee.Node) == 1); 
 } 

To get the levels for all employees:

using (var context = new MyContext()) 
 { 
     var levels = 
         context.Employees 
             .Select( 
                 e => new 
                 { 
                     e.Name, 
                     Level = HierarchyIdDbFunctions.GetLevel(e.Node) 
                 }); 
 } 

## Summary
The solution is far from ideal. Wouldn’t it be nice to use this type without SQL functions and without converting the varbinary type to the hierarchyid and vice versa? It surely would, but still, thanks to the hierarchyid type, our code is going to be more readable. Without conversions, our queries will be executed faster. Additionally, our database structure is going to match what we are currently using. In the presented solution, the Node column is a different type. If this type was the hierarchyid instead of the varbinary, it would be more user-friendly from outside of the Entity Framework. Without a doubt, implementation and usage of the hierarchy structure supported by the hierarchyid type are very easy. Most of the operations are handled by this type itself. The downside is that it isn’t currently supported by the Entity Framework 6 natively. I hope it will change in the future and such workarounds won’t be necessary.

## Resources
* [Hierarchyid data type method reference](https://docs.microsoft.com/en-us/sql/t-sql/data-types/hierarchyid-data-type-method-reference?view=sql-server-2017)

[/markdown]

Newsletter IT leaks

Dzielimy się inspiracjami i nowinkami z branży IT. Szanujemy Twój czas - obiecujemy nie spamować i wysyłać wiadomości raz na dwa miesiące.

Subscribe to our newsletter

Administratorem Twoich danych osobowych jest Future Processing S.A. z siedzibą w Gliwicach. Twoje dane będziemy przetwarzać w celu przesyłania cyklicznego newslettera dot. branży IT. W każdej chwili możesz się wypisać lub edytować swoje dane. Więcej informacji znajdziesz w naszej polityce prywatności.

Subscribe to our newsletter

Administratorem Twoich danych osobowych jest Future Processing S.A. z siedzibą w Gliwicach. Twoje dane będziemy przetwarzać w celu przesyłania cyklicznego newslettera dot. branży IT. W każdej chwili możesz się wypisać lub edytować swoje dane. Więcej informacji znajdziesz w naszej polityce prywatności.