Real world authorization implementation considerations
Nitpicker corner: this post discusses authorization, which assumes that you already know who the user is. Discussion of authentication methods, how we decide who the user is, would be outside the scope of this post.
I had a lot of experience with building security systems. After all, sooner or later, whatever your project is, you are going to need one. At some point, I got tired enough of doing that that I wrote Rhino Security, which codify a lot of the lessons that I learned from all of those times. And I learned a lot from using Rhino Security in real world projects as well.
When coming to design the authorization bundle for RavenDB, I had decided to make a conscious effort to detail the underlying premise that I have when I am approaching the design of a security system.
You can’t manage authorization at the infrastructure level
That seems to be an instinctual response by most developers when faced with the problem, “we will push it to the infrastructure and handle this automatically”. The usual arguments is that we want to avoid the possibility of the developer forgetting to include the security checks and that it makes it easier to develop.
The problem is that when you put security decisions in the infrastructure, you are losing the context in which a certain operation is performed. And context matters. It matters even more when we consider the fact that there are actually two separate levels of security that we need to consider:
- Infrastructure related – can I read / write to this document?
- Business related – can I perform [business operation] on this entity?
Very often, we try to use the first to apply the second. This is often the can when we have a business rule that specify that a user shouldn’t be able to access certain documents which we try to apply at the infrastructure level.
For a change, we will use the example of a debt collection agency.
As a debt collector, I can negotiate a settlement plan with a debtor, so the agency can resolve the debt.
- Debt collectors can only negotiate settlement plans for debts under 50,000$
- Only managers can negotiate settlement plans for debts over 50,000$
Seems simple, right? We will assume that we have a solution in store and say that the role of DebtCollectors can’t read/write to documents about settlement plans of over 50K$. I am not sure how you would actually implement this, but let us say that we did just that. We solved the problem at the infrastructure level and everyone is happy.
Then we run into a problem, a Debt Collector may not be allow to do the actual negotiation with a heavy debtor, but there is a whole lot of additional work that goes on that the Debt Collector should do (check for collateral, future prospects, background check, etc).
The way that the agency works, the Debt Collector does a lot of the preliminary work, then the manager does the actual negotiation. That means that for the same entity, under different contexts, we have very different security rules. And these sort of requirements are the ones that are going to give you fits when you try to apply them at the infrastructure level.
You can argue that those sort of rules are business logic, not security rules, but the way the business think of them, that is exactly what they are.
The logged on user isn’t the actual user
There is another aspect for this. Usually when we need to implement security system like this, people throw into the ring the notion of Row Level Security and allowing access to specific rows by specific logins. That is a non starter from the get go, for several reasons. The previous point about infrastructure level security applies here as well, but the major problem is that it just doesn’t work when you have more than a pittance of users.
All Row Level Security solutions that I am aware of (I am thinking specifically of some solutions provided by database vendors) requires you to login into the database using a specific user, from which your credentials can be checked against specific rows permissions.
Consider the case where you have a large number of users, and you have to login to the database for each user using their credentials. What is going to be the affect on the system?
Well, there are going to be two major problems. The first is that you can wave goodbye to small & unimportant things like connection pooling, since each user have their own login, they can’t share connections, which is going to substantially increase the cost of talking to the database.
The second is a bit more complex to explain. When the system perform an operation as a result of a user action, there are distinct differences between work that the system performs on behalf of the user and work that the system performs on behalf of the system.
Let us go back to our Debt Collection Agency and look at an example:
As a Debt Collector, I can finalize a settlement plan with a debtor, so the agency can make a lot of money.
- A Debt Collector may only view settlement plans for the vendors that they handle debt collection for.
- Settlement plan cannot be finalized if (along with other plans that may exists) the settlement plan would result in over 70% of the debtor salary going into paying debts.
This is pretty simple scenario. If I am collecting debts for ACME, I can’t take a peek and see how debts handle be EMCA, ACME’s competitor, are handled. And naturally, if the debtor’s income isn’t sufficient to pay the debt, it is pretty obvious that the settlement plan isn’t valid, and we need to consider something else.
Now, let us look at how we would actually implement this, the first rule specifies that we can’t see other settlement plans, but for us to enforce the second rule, we must see them, even if they belong to other creditors. In other words, we have a rule where the system need to execute in the context of the system and not in the context of the user.
You will be surprised how often such scenarios come up when building complex systems. When your security system is relying on the logged on user for handling security filtering, you are going to run into a pretty hard problem when it comes the time to handle those scenarios.
Considerations
So, where does this leave us? It leave us with the following considerations when the time comes to build an authorization implementation:
- You can’t handle authorization in the infrastructure, there isn’t enough context to make decisions there.
- Relying on the logged on user for row/document level security is a good way to have a wall hit your head in a considerable speed.
- Authorization must be optional, because we need to execute some operations to ensure valid state outside the security context of a single user.
- Authorization isn’t limited to the small set of operations that you can perform from infrastructure perspective (Read / Write) but have business meaning that you need to consider.
An interesting RavenDB bug
I got a very strange bug report recently,
The following index:
from movie in docs.Movies
from actor in movie.Actors
select new { Actor = actor }
Will produce multiple results from a single document, which poses a pretty big problem when you try to page through that. Imagine that each movie has 10 actors, and you are trying to page through this index for the first two documents of movies by Charlie Chaplin. The first movie that matches Charlie Chaplin will have ten results returned from the index, and simple paging at the index level will give us the wrong results.
Here is my solution for that, which works, but make me just a tad uneasy:
public IEnumerable<IndexQueryResult> Query(IndexQuery indexQuery)
{
IndexSearcher indexSearcher;
using (searcher.Use(out indexSearcher))
{
var previousDocuments = new HashSet<string>();
var luceneQuery = GetLuceneQuery(indexQuery);
var start = indexQuery.Start;
var pageSize = indexQuery.PageSize;
var skippedDocs = 0;
var returnedResults = 0;
do
{
if(skippedDocs > 0)
{
start = start + pageSize;
// trying to guesstimate how many results we will need to read from the index
// to get enough unique documents to match the page size
pageSize = skippedDocs * indexQuery.PageSize;
skippedDocs = 0;
}
var search = ExecuteQuery(indexSearcher, luceneQuery, start, pageSize, indexQuery.SortedFields);
indexQuery.TotalSize.Value = search.totalHits;
for (var i = start; i < search.totalHits && (i - start) < pageSize; i++)
{
var document = indexSearcher.Doc(search.scoreDocs[i].doc);
if (IsDuplicateDocument(document, indexQuery.FieldsToFetch, previousDocuments))
{
skippedDocs++;
continue;
}
returnedResults++;
yield return RetrieveDocument(document, indexQuery.FieldsToFetch);
}
} while (skippedDocs > 0 && returnedResults < indexQuery.PageSize);
}
}
iTunes full screen movies incompatible with Large Font sizes?
I have a PC hooked to my TV, but the problem is that there seems to be a bug in iTunes, when I set the font site to be large enough to actually be readable, like so:
I lose the ability to view full screen movies in iTunes, when I switch the movie to full screen, it continues playing (I can hear it) but the display switch back to the iTunes library, rather than the movie.
I verified the resetting the font size fixes this problem, and this is in iTunes 9.2.1.
Anyone run into this? Any solutions?
Tikal .NET forum: Introduction to NHibernate
I’ll be presenting on July 25th 10:00-11:30 in the Tikal .NET forum:
Tikal .NET forum is delighted to present an introduction to NHibernate, the leading and most advanced open source ORM (Object Relational Mapping) in the .NET domain with integrated support for concurrency, distribution, fault tolerance and incremental code loading. ORM takes care of the burden of mapping between your .NET entities and the underlying relational database.
Following a concise introduction into the motivation for the rising interests ORM, we will provide an general idea of the key features of the NHibernate framework compared to other frameworks , and talk about the impact they have on the production of highly scalable and fault-tolerant systems.
See you all on July 25th at 10:00 in Krypton, Hakfar-Hayarok Ramat-Hasharon.
Buy vs. Build & YAGNI
I was recently at the Israeli ALT.Net tools night, and I had a very interesting discussion on installers. Installers are usually a very painful part of the release procedure. The installer format for Windows is MSI, which is… strange. It takes time to understand how MSI work, and even after you got that, it is still painful to work with. Wix is a great improvement when it comes to building MSI installations, but that doesn’t make it good. Other installer builders, such as InstallSheild and NSIS are just as awkward.
The discussion that I had related to the complexity of building an installer on those technologies.
My argument was that it simply made no sense to try to overcome the hurdles of the installer technologies, instead, we can write our own installer more easily than fussing about the existing ones. The installer already assumes the presence of the .NET framework, so that make things even easier.
This is an application of a principle that I strongly believes in: Single purpose, specially built tools & components can be advantageous over more generic ones, for your specific scenarios.
Case in point, the installer. Installers are complicated beasts because they must support a lot of complex scenarios (upgrading from 5.3.2 to 6.2.1, for example), be transactional, support installation, etc. But for the installer in question, upgrade is always an uninstall of the previous version & install of the new one, and the only tasks it requires is copying files and modifying registry entries.
Given that set of requirements, we can design the following installer framework:
public interface IInstallerTask
{
void Install();
void Uninstall();
}
public class FileCopyTask : IInstallerTask
{
public string Source { get;set; }
public string Destination { get;set; }
public void Install()
{
File.Copy(Source, Destination,overwrite:true);
}
public void Uninstall()
{
File.Delete(Destination);
}
}
And building a particular installer would be:
ExecuteInstaller(
Directory.GetFiles(extractedTempLocation)
.Select( file =>
new FileCopyTask
{
Source = file,
Destination = Path.Combine(destinationPath, file)
}
),
new RegistryKeyTask
{
Key = "HKLM/Windows/CurrentVersion...",
Value = 9
}
);
This gives the ExecuteInstaller method a list of tasks to be executed, which can then be used to installer or uninstall everything.
Yes, it is extremely simple, and yes, it wouldn’t fit many scenarios. But, it is quick to do, match the current and projected requirements, doesn’t introduce any new technology to the mix and it works.
Contrast that with having someone on the team that is the Installer expert (bad) or having to educate the entire team about installer (expensive).
NHProf new feature: Expensive queries report
It has been a while since we had a new major feature for the profiler, but here it is:
The expensive queries report will look at all your queries and surface the most expensive ones across all the sessions. This can give you a good indication on where you need to optimize things.
Naturally, this feature is available across all the profiler profiles (NHibernate Profiler, Entity Framework Profiler, Linq to SQL Profiler and Hibernate Profiler).
Find the issue
There is a design issue that is revealed in the following tests, can you figure out why I changed the behavior and removed the tests?
NoSQL and Data Warehousing
I recently got this question on email, and I thought it would be a good subject for a post.
I wanted to get your thoughts about using NoSQL for data warehouse solutions. I have read mixed thoughts about this and curious where you stand.
Before we can talk about this, we need to understand what data warehousing is, using wise geek definition, that is:
Data warehousing is combining data from multiple and usually varied sources into one comprehensive and easily manipulated database. Common accessing systems of data warehousing include queries, analysis and reporting. Because data warehousing creates one database in the end, the number of sources can be anything you want it to be, provided that the system can handle the volume, of course. The final result, however, is homogeneous data, which can be more easily manipulated.
And if you follow that definition, it make an absolute sense to ask about data warehousing in a NoSQL situation. But remember, one of the things that tend to lead people to the NoSQL land is the desire to scale in some manner (more data, more users, higher concurrency, cheaper TCO) than is possible using a SQL solution. In order to achieve that goal, you have to be willing to accept the tradeoff associated with that, which is reduced flexibility. You can query a relational database every which way, but most NoSQL solutions have very strict rules about how you can query them, for example.
By the way, I am probably abusing the term SQL here. I meant the whole set of technologies generally associated with relational databases, so in this case, I am talking about OLAP data stores, which are the typical solution for data warehousing scenarios. OLAP is usually queried with MDX, which looks like this:
SELECT
{ [Measures].[Sales Amount],
[Measures].[Tax Amount] } ON COLUMNS,
{ [Date].[Fiscal].[Fiscal Year].&[2002],
[Date].[Fiscal].[Fiscal Year].&[2003] } ON ROWS
FROM [Adventure Works]
WHERE ( [Sales Territory].[Southwest] )
OLAP & MDX, like the relational database & SQL, gives us a lot of flexibility and power. But like relational databases, those come at a cost. At some point, if you have enough data, it gets impractical to store it all in a single server, and the usual arguments for NoSQL solutions come to the fore.
At that point, we have to decide what is it that we want to get from the data warehouse. In other words, we need to design our solution to match the kind of reports that we want to get out. Of the NoSQL solutions out there (Key/Value stores, Document Databases, Graph Databases, Column Family Databases) I would probably choose a Column Family database for such a task, since my primary concern is probably being able to handle large amount of data.
The type of reports that I would need would dictate how I would store the data itself, but once I built the schema, everything else should just work.
In short, for data warehousing, I think that the relational / OLAP world has significant advantages, mostly because in many BI scenarios, you want to allow the users to explore the data, which is easy with the SQL toolset, and harder with NoSQL solutions. But when you get too large (and large in OLAP scenarios is really large), you might want to consider limiting the users’ options and going with a NoSQL solution tailor to what they need.
The pitfalls of transparent security
A long time ago, I needed to implement a security subsystem for an application. I figured that it was best to make the entire security subsystem transparent to the developer, under the assumption that if the infrastructure would take care of security, it would make a lot more sense than relying on the developer to remember to add the security calls.
It took me a long while to realize how wrong that decision was. Security is certainly important, but security doesn’t apply to the system itself. In other words, while a specific user may not be allowed to read/write to the audit log, actions that the user made should be written to that log. That is probably the simplest case, but that are a lot of similar ones.
Ever since then, I favored using an explicit approach:
var books = session.CreateQuery("from Books")
.ThatUserIsAllowedToRead(CurrentUser)
.List<Book>();
This also help you implement more interesting features, such as “on behalf of”. And yes, it does put the onus of security on the developer, but considering the alternative, that is a plus.
Challenge: Find the bug
The following code contains a bug that would only occur under rare situations, can you figure out what is the bug?
NHibernate: Streaming large result sets
Note: I am not feeling very well for the past week or so, which is why I am posting so rarely.
NHibernate is meant to be used in an OLTP system, as such, it is usually used in cases where we want to load a relatively small amount of data from the database, work with it and save it back. For reporting scenarios, there are better alternatives, usually (and before you ask, any reporting package will do. Right tool for the job, etc).
But there are cases where you want to do use NHibernate in reporting scenarios nonetheless. Maybe because the reporting requirements aren’t enough to justify going to a separate tool, or because you want to use what you already know. It is in those cases where you tend to run into problems, because you violate the assumptions that were made while building NHibernate.
Let us imagine the following use case, we want to print a list of book names to the user:
using (ISession s = OpenSession())
{
var books = s.CreateQuery("from Book")
.List<Book>();
foreach (var book in books)
{
Console.WriteLine(book.Name);
}
}
There are several problems here:
- We query on a large result set without a limit clause.
- We read a lot of data into memory.
- We only start processing the data after it was completely read from the database.
What I would like to see is something like this:
while(dataReader.Read())
{
Console.WriteLine(dataReader.GetString("Name"));
}
This still suffer from the problem of reading a large result set, but we will consider this a part of our requirements, so we’ll just have to live with it. The data reader code has two major advantages, it uses very little memory, and we can start processing the data as soon as the first row loads from the database.
How can we replicate that with NHibernate?
Well, as usual with NHibernate, it is only a matter of finding the right extension point. In this case, the List method on the query also has an overload that accepts an IList parameter:
That make it as simple as implementing our own IList implementation:
public class ActionableList<T> : IList
{
private Action<T> action;
public ActionableList(Action<T> action)
{
this.action = action;
}
public int Add(object value)
{
action((T)value);
return -1;
}
public bool Contains(object value)
{
throw new NotImplementedException();
}
// ...
}
And now we can call it:
using (ISession s = OpenSession())
{
var books = new ActionableList<Book>(book => Console.WriteLine(book.Name));
s.CreateQuery("from Book")
.List(books);
}
This will have the exact same effect as the pervious NHibernate code, but it will start printing the results as soon as the first result loads from the database. We still have the problem of memory consumption, though. The session will keep track of all the loaded objects, and if we load a lot of data, it will eventually blow out with an out of memory exception.
Luckily, NHibernate has a ready made solution for this, the stateless session. The code now looks like this:
using (IStatelessSession s = sessionFactory.OpenStatelessSession())
{
var books = new ActionableList<Book>(book => Console.WriteLine(book.Name));
s.CreateQuery("from Book")
.List(books);
}
The stateless session, unlike the normal NHibernate session, doesn’t keep track of loaded objects, so the code here and the data reader code are essentially the same thing.
Challenge: Dynamically dynamic
Can you figure out a way to write the following code without using a try/catch?
class Program
{
static void Main(string[] args)
{
dynamic e = new ExpandoObject();
e.Name = "Ayende";
Console.WriteLine(HasProperty("Name", e));
Console.WriteLine(HasProperty("Id", e));
}
private static bool HasProperty(string name, IDynamicMetaObjectProvider dyn)
{
try
{
var callSite =
CallSite<Func<CallSite, object, object>>.Create(
Binder.GetMember(CSharpBinderFlags.None, name, typeof (Program),
new[]
{
CSharpArgumentInfo.Create(
CSharpArgumentInfoFlags.None, null)
}));
callSite.Target(callSite, dyn);
return true;
}
catch (RuntimeBinderException)
{
return false;
}
}
}
The HasProperty method should accept any IDynamicMetaObjectProvider implementation, not just ExpandoObject.
Modeling hierarchical structures in RavenDB
The question pops up frequently enough and is interesting enough for a post. How do you store a data structure like this in Raven?
The problem here is that we don’t have enough information about the problem to actually give an answer. That is because when we think of how we should model the data, we also need to consider how it is going to be accessed. In more precise terms, we need to define what is the aggregate root of the data in question.
Let us take the following two examples:
As you can imagine, a Person is an aggregate root. It can stand on its own. I would typically store a Person in Raven using one of two approaches:
Bare references Denormalized References{
"Name": "Ayende",
"Email": "Ayende@ayende.com",
"Parent": "people/18",
"Children": [
"people/59",
"people/29"
]
}
{
"Name": "Ayende",
"Email": "Ayende@ayende.com",
"Parent": { "Name": "Oren", "Id": "people/18"},
"Children": [
{ "Name": "Raven", "Id": "people/59"},
{ "Name": "Rhino", "Id": "people/29"}
]
}
The first option is bare references, just holding the id of the associated document. This is useful if I only need to reference the data very rarely. If, however, (as is common), I need to also show some data from the associated documents, it is generally better to use denormalized references, which keep the data that we need to deal with from the associated document embedded inside the aggregate.
But the same approach wouldn’t work for Questions. In the Question model, we have utilized the same data structure to hold both the question and the answer. This sort of double utilization is pretty common, unfortunately. For example, you can see it being used in StackOverflow, where both Questions & Answers are stored as posts.
The problem from a design perspective is that in this case a Question is not a root aggregate in the same sense that a Person is. A Question is a root aggregate if it is an actual question, not if it is a Question instance that holds the answer to another question. I would model this using:
{
"Content": "How to model relations in RavenDB?",
"User": "users/1738",
"Answers" : [
{"Content": "You can use.. ", "User": "users/92" },
{"Content": "Or you might...", "User": "users/94" },
]
}
In this case, we are embedding the children directly inside the root document.
So I am afraid that the answer to that question is: it depends.
The cost of money
This is just some rambling about the way the economy works, it has nothing to do with tech or programming. I just had to sit down recently and do the math, and I am pretty annoyed by it.
The best description of how the economy works that I ever heard was in a Terry Prachett’s book, it is called Captain Vimes’ Boots’ Theory of Money. Stated simply, it goes like this.
A good pair of boots costs 50$, and they last for 10 years and keep your feet warm. A bad pair of boots costs 10$ and last only a year or two. After 10 years, the poor boots cost twice as much as the good boots, and your feet are still cold!
The sad part about that is that this theory is quite true. Let me outline two real world examples (from Israel, numbers are in Shekels).
Buying a car is expensive, so a lot of people opts for a leasing option. Here are the numbers associated with this (real world numbers):
 Buying car outright Leasing Upfront payment 120,00042,094.31
Monthly payment (36 payments) 01,435.32
Buying the car (after 3 yrs) [optional] 052,039.67
The nice part of going with a leasing contract is that you need so much less upfront money, and the payments are pretty low. The problem starts when you try to compare costs on more than just how much money you are paying out of pocket. We only have to spent a third.
Let us see what is going to happen in three years time, when we wan to switch to a new car.
 Buying car outright Leasing Upfront payment 120,000.0042,094.31
Total payments 0.0051,671.52
Selling the car -80,000.000.00
Total cost 40,000.00 93,765.83With the upfront payment, we can actually sell the car to recoup some of our initial investment. With the leasing option, at the end of the three years, you are out 93,765.83 and have nothing to show for it.
Total cost of ownership for the leasing option is over twice as much as the upfront payment option.
Buying an apartment is usually one of the biggest expenses that most people do in their life. The cost of an apartment/house in Israel is typically over a decade of a person’ salary. Israel’s real estate is in a funky state at the moment, being one of the only places in the world where the prices keep going up. Here are some real numbers:
- Avg. salary in Israel: 8,611
- Avg. price of an apartment (in central Israel): 1,071,900
It isn’t surprising that most people requires a mortgage to buy a place to live.
Let us say that we are talking about a 1,000,000 price, just to make the math simpler, and that we have 400,000 available for the down payment. Let us further say that we got a good interest rate of the 600,000 mortgage of 2% (if you take more than 60% of the money you are penalized with higher interest rate in Israel).
Assuming fixed interest rate and no inflation, you will need to pay 3,035 for 20 years. But a 2% interest rate looks pretty good, right? It sounds pretty low.
Except over 20 years, you’ll actually pay: 728,400 back on your 600,000 loan, which means that the bank get 128,400 more than it gave you.
The bank gets back 21.4% more money. With a more realistic 3% interest rate, you’ll pay back 33% more over the lifetime of the loan. And that is ignoring inflation. Assume (pretty low) 2% per year, you would pay 49% more to the bank in 2% interest rate and 65% more in 3% interest rate.
Just for the fun factor, let us say that you rent, instead. And assume further that you rent for the same price of the monthly mortgage payment. We get:
ÂMortgage
Rent Upfront payment 400,000.00 0.00 Monthly payment 3,000.00 3,000.00 Total payments (20 years) 720,000.00 720,000.00 Total money out 1,120,000.00 720,000.00 House value 1,000,000.00 0.00 Total cost 120,000.00 720,000.00After 20 years, renting cost 720,000. Buying a house costs 120,000. And yes, I am ignoring a lot of factors here, that is intentional. This isn’t a buy vs. rent column, it is a cost of money post.
But after spending all this time doing the numbers, it all comes back to Vimes’ Boots theory of money.
Table scans, index scans and index seeks, on my!
In general, when you break it down to the fundamentals, a data base is just a fancy linked list + some btrees. Yes, I am ignoring a lot, but if you push aside a lot of the abstraction, that is what is actually going on.
If you ever dealt with database optimizations you are familiar with query plans, like the following (from NHProf):

You can see that we have some interesting stuff going on here:
- Index Seek
- Index Scan
And if you are unlucky, you are probably familiar with the dreaded “your !@&!@ query is causing a table scan!” scream from the DBA. But most of the time, people just know that table scan is slow, index scan is fast and index seek is fastest. I am ignoring things like clustered vs. unclustered indexes, since they aren’t really important for what I want to do.
For simplicity sake, I’ll use the following in memory data structure:
public class DocumentDatabase
{
public List<JObject> Documents = new ...;
public IDictionary<string, IDictionary<JToken, JObject>> Indexes = new ...;
}
To keep things simple, we will only bother with the case of exact matching. For example, I might store the following document:
{ "User": "ayende", "Name": "Ayende Rahien", "Email": "Ayende@ayende.com" }
And define an index on Name & Email. What would happen if I wanted to make a query by the user name?
Well, we don’t really have any other option, we have to do what amounts to a full table scan:
foreach (var doc in Documents)
{
if(doc.User == "ayende")
yield return doc;
}
As you can imagine, this is an O(N) operation, which can get pretty expensive if we are querying a large table.
What happen if I want to find data by name & email? We have an index that is perfectly suited for that, so we might as well use it:
Indexes["NameAndEmail"][new{Name="Ayende Rahien", Email = “Ayende@ayende.com”}];
Note that what we are doing here is accessing the NameAndEmail index, and then making a query on that. This is essentially an index seek.
What happens if I want to query just by email? There isn’t an index just for emails, but we do have an index that contains emails. We have two options, use a table scan, or and index scan. We already saw what a table scan is, so let us look at what is an index scan:
var nameAndEmailIndex = Indexes["NameAndEmail"];
foreach (var indexed in nameAndEmailIndex)
{
if(indexed.Email == "ayende@ayende.com")
yield return indexed;
}
All in all, it is very similar to the table scan, and when using in memory data structures, it is probably not worth doing index scans (at least, not if the index is over the entire data set).
Where index scans prove to be very useful is when we are talking about persistent data sets, where reading the data from the index may be more efficient than reading it from the table. That is usually because the index is much smaller than the actual table. In certain databases, the format of the data on the disk may make it as efficient to do a table scan in some situations as it to do an index scan.
Another thing to note is that while I am using hashes to demonstrate the principal, in practice, most persistent data stores are going to use some variant of trees.
Building data store – indexing data structure
I run into an interestingly annoying problem recently. Basically, I am trying to write the following code:
tree.Add(new { One = 1, Two = 2 }, 13);
tree.Add(new { One = 2, Two = 3 }, 14);
tree.Add(new { One = 3, Two = 1 }, 15);
var docPos = tree.Find(new { One = 1, Two = 2 });
Assert.Equal(13, docPos);
docPos = tree.Find(new { Two = 2 });
Assert.Equal(13, docPos);
docPos = tree.Find(new { Two = 1 });
Assert.Equal(14, docPos);
As you can imagine, this is part of an indexing approach, the details of which aren’t really important. What is important is that I am trying to figure out how to support partial searches. In the example, we index by One & Two, and we can search on both of them. The problem begins when we want to make a search on just Two.
While the tree can compare between partial results just fine, the problem is how to avoid traversing the entire tree for a partial result. The BTree is structured like this:
The problem when doing a partial search is that at the root, I have no idea if I should turn right or left.
What I am thinking now is that since I can’t do a binary search, I’ll have to use a BTree+ instead. Since BTree+ also have the property that the leaf nodes are a linked list, it means that I can scan it effectively. I am hoping for a better option, though.
Any ideas?
Building data stores – Append Only
One of the interesting aspects in building a data store is that you run head on into things that you would generally leave to the infrastructure. By far, most developers deal with concurrency by relegating that responsibility to a database.
When you write your own database, you have to build this sort of thing. In essence, we have two separate issues here:
- Maximizing Concurrency – does readers wait for writers? does writers wait for readers? does writers wait for writers?
- Ensuring Consistency – can I read uncommitted data? can I read partially written data?
As I mentioned in my previous post, there are two major options when building a data store, Transaction Log & Append Only. There are probably a better name for each, but that is how I know them.
This post is going to focus on append only. An append only store is very simple idea in both concept and implementation. It requires that you will always append to the file. It makes things a bit finicky with the type of data structures that you have to use, since typical persistent data structures rely on being able to modify data on the disk. But once you get over that issue, it is actually very simple.
An append only store works in the following manner:
- On startup, the data is read in reverse, trying to find the last committed transaction.
- That committed transaction contains pointers to locations in the file where the actual data is stored.
- A crash in the middle of a write just means garbage at the end of the file that you have to skip when finding the last committed transaction.
- In memory, the only thing that you have to keep is just the last committed transaction.
- A reader with a copy of the last committed transaction can execute independently of any other reader / writer. It will not see any changes made by writers made after it started, but it also doesn’t have to wait for any writers.
- Concurrency control is simple:
- Readers don’t wait for readers
- Readers don’t wait for writers
- Writers don’t wait for readers
- There can be only one concurrent writer
The last one is a natural fallout from the fact that we use the append only model. Only one thread can write to the end of the file at a given point in time. That is actually a performance boost, and not something that would slow the database down, as you might expect.
The reason for that is pretty obvious, once you start thinking about it. Writing to disk is a physical action, and the head can be only in a single place at any given point in time. By ensuring that all writes go to the end of the file, we gain a big perf advantage since we don’t do any seeks.
Building data stores – Transaction log
One of the interesting aspects in building a data store is that you run head on into things that you would generally leave to the infrastructure. By far, most developers deal with concurrency by relegating that responsibility to a database.
When you write your own database, you have to build this sort of thing. In essence, we have two separate issues here:
- Maximizing Concurrency – does readers wait for writers? does writers wait for readers? does writers wait for writers?
- Ensuring Consistency – can I read uncommitted data? can I read partially written data?
As I mentioned in my previous post, there are two major options when building a data store, Transaction Log & Append Only. There are probably a better name for each, but that is how I know them.
This post is going to focus on transaction log. Transaction log is actually pretty simple idea, conceptually. It simply requires that you would state what you intend to do before you do it, in such a way that you can reverse it.
For example, let us say that I want to store “users/ayende” –> "ayende@ayende.com”. All I need to do is to write the following to the transaction log.
{
"Key": "users/ayende",
"OldVal": "AYENDE@AYENDE.COM",
"NewVal": "ayende@ayende.com",
"TxId": 19474774
}
If the data store crashes before the transaction is completed, we can run a recovery process that would resolve any issues in the actual data from the transaction log. Once a transaction is committed, we can safely delete it from the transaction log.
As I said, conceptually it is a very simple idea, but it leads to some interesting implementation challenges:
- You can optimize things by not writing to disk (except for writing to the transaction log) immediately.
- You need to keep track of concurrent transactions touching the same records.
- You need to handle background flushing to disk.
- The crash recovery process can be.. finicky to write.
Concurrency control is something that you essentially have to roll on your own, and you can make it as granular as you feel like. There is some complexity involved in ensuring that you read the appropriate data from the transaction log / data file (based on whatever you are in a transaction reading data you modified or outside it, reading the old data), but where it gets really complex is the number of moving parts that you have to deal with.
Building a managed persistent, transactional, queue
When one approaches building transactional systems, there are two main ways one can approach them.
- Transaction log
- Append only
In both cases, we rely on very important property, fsync. fsync is actually a unix call, which flushes all data to the actual device. In essence, this means “write the the actual hardware and don’t return until the hardware confirmed that the data was saved”. In Windows, the equivalent call is: FlushFileBuffers, or WriteThrough. WriteThrough is better if you need to call FlushFileBuffers every single time, while FlushFileBuffers is significantly faster if you only need to call it once in a while.
FileStream.Flush(true) in .NET 4.0 translate to FlushFileBuffers.
Transaction log systems tend to be more complex than append only systems, but append only systems use more space. Space tend to be pretty cheap, so that is a good tradeoff, I think.
Given that, let us see how we can consider a managed transactional persistent queue. One interesting aspect is that append only, just like immutable data structures, is really not friendly for things like queues. The problem is that adding an item to the queue requires modification to the entire queue, which result in a large number of memory allocations / writes to disk.
The functional crowd has solved the problem long ago, it seems. In essence, a functional queue is composed of two immutable lists, one for the front of the queue and the other for the back of the queue. You add things to the back of the queue, and pop things from the front of the queue. When the front of the queue is empty, you set the front of the queue to be the reversed back of the queue and clear the back. The link should make it clear how this works.
Our first task is actually implementing the list. Since we really only need it to be a stack, I won’t bother with list functionality and just implement a very simple stack. With that, we can implement the actual stack:
public class Stack
{
private readonly Stream reader;
private readonly Stream writer;
private StackItem current;
private readonly BinaryWriterWith7BitEncoding binaryWriter;
private readonly BinaryReaderWith7BitEncoding binaryReader;
private readonly List<StackItem> unwritten = new List<StackItem>();
private class StackItem
{
public long Position { get; set; }
public readonly StackItem NextItem;
public readonly long Value;
public readonly long? Next;
public StackItem(long value, StackItem nextItem)
{
Value = value;
NextItem = nextItem;
}
public StackItem(long value, long? next)
{
Value = value;
Next = next;
}
}
public long? CurrentPosition
{
get
{
return current == null ? (long?)null : current.Position;
}
}
public Stack(Stream reader, Stream writer, StartMode mode)
{
this.reader = reader;
this.writer = writer;
binaryReader = new BinaryReaderWith7BitEncoding(reader);
binaryWriter = new BinaryWriterWith7BitEncoding(writer);
if (mode != StartMode.Open)
return;
current = ReadStackItem(reader.Position);
}
public void Push(byte[] data)
{
var pos = writer.Position;
binaryWriter.Write7BitEncodedInt(data.Length);
binaryWriter.Write(data);
PushInternal(pos);
}
private void PushInternal(long pos)
{
current = new StackItem(pos, current);
unwritten.Add(current);
}
public byte[] Pop()
{
var result = PopInternal(ref current);
if (result == null)
return null;
reader.Position = result.Value;
var size = binaryReader.Read7BitEncodedInt();
var bytes = binaryReader.ReadBytes(size);
return bytes;
}
private long? PopInternal(ref StackItem item)
{
if (item == null)
return null;
unwritten.Remove(item);
var result = item.Value;
if (item.NextItem != null)
item = item.NextItem;
else if (item.Next != null)
item = ReadStackItem(item.Next.Value);
else
item = null;
return result;
}
public void Flush()
{
foreach (var stackItem in unwritten)
{
stackItem.Position = writer.Position;
binaryWriter.WriteBitEncodedNullableInt64(stackItem.Value);
binaryWriter.WriteBitEncodedNullableInt64(stackItem.NextItem != null ? stackItem.NextItem.Position : stackItem.Next);
}
unwritten.Clear();
}
private StackItem ReadStackItem(long position)
{
reader.Position = position;
return new StackItem(
binaryReader.ReadBitEncodedNullableInt64().Value,
binaryReader.ReadBitEncodedNullableInt64()
)
{
Position = position
};
}
public Stack Reverse()
{
var stack = new Stack(reader, writer, StartMode.Create);
var item = current;
while(item != null)
{
var value = PopInternal(ref item);
if(value!=null)
stack.PushInternal(value.Value);
}
return stack;
}
}
The code make several assumptions:
- The stream is using WriteThrough, so once a write was completed, it is saved to the disk.
- It is not our responsibility to keep track of things like the current position, this is handled by higher level code.
- We are allowed to jump around on the reader, but the writer is only doing appends.
Given the stack behavior, we can now implement the queue:
public class Queue
{
private readonly Stream reader;
private readonly Stream writer;
private Stack front;
private Stack back;
private readonly BinaryReaderWith7BitEncoding binaryReader;
private readonly BinaryWriterWith7BitEncoding binaryWriter;
public Queue(Stream reader, Stream writer, StartMode mode)
{
this.reader = reader;
this.writer = writer;
binaryReader = new BinaryReaderWith7BitEncoding(reader);
binaryWriter = new BinaryWriterWith7BitEncoding(writer);
switch (mode)
{
case StartMode.Open:
ReadStacks();
break;
case StartMode.Create:
InitializeEmptyStacks();
break;
}
}
private void InitializeEmptyStacks()
{
front = new Stack(reader, writer, StartMode.Create);
back = new Stack(reader, writer, StartMode.Create);
}
private void ReadStacks()
{
var frontPos = binaryReader.ReadBitEncodedNullableInt64();
var backPos = binaryReader.ReadBitEncodedNullableInt64();
if (frontPos != null)
{
reader.Position = frontPos.Value;
front = new Stack(reader, writer, StartMode.Open);
}
else
{
front = new Stack(reader, writer, StartMode.Create);
}
if (backPos != null)
{
reader.Position = backPos.Value;
back = new Stack(reader, writer, StartMode.Open);
}
else
{
back = new Stack(reader, writer, StartMode.Create);
}
}
public void Enqueue(byte[] data)
{
back.Push(data);
}
public byte[] Dequeue()
{
var result = front.Pop();
if (result != null)
return result;
front = back.Reverse();
back = new Stack(reader, writer, StartMode.Create);
return front.Pop();
}
public void Flush()
{
front.Flush();
back.Flush();
QueuePosition = writer.Position;
binaryWriter.WriteBitEncodedNullableInt64(front.CurrentPosition);
binaryWriter.WriteBitEncodedNullableInt64(back.CurrentPosition);
}
public long QueuePosition { get; private set; }
}
Now, just to make things interesting, let us see what it actually means:
var sp = Stopwatch.StartNew();
using (var writer = new FileStream("data",
FileMode.OpenOrCreate,
FileAccess.ReadWrite,
FileShare.Delete | FileShare.Read,
16 * 1024,
FileOptions.SequentialScan))
using (var reader = new FileStream("data",
FileMode.Open,
FileAccess.Read,
FileShare.ReadWrite | FileShare.Delete,
16 * 1024,
FileOptions.RandomAccess))
{
Queue queue = new Queue(reader, writer, StartMode.Create);
var bytes = new byte[1024*7];
new Random().NextBytes(bytes);
for (int i = 0; i < 100000; i++)
{
queue.Enqueue(bytes);
}
queue.Flush();
writer.Flush(true);
}
Console.WriteLine(sp.ElapsedMilliseconds);
And this completes is a bit over 14 seconds, or over seven thousands (pretty big) items per second.
