Let's rokk! [Tudor Cret's blog]

January 25, 2010

Behind ciripescu.ro, a Windows Azure cloud application (part 2)

Filed under: Technologies — Tudor Cret @ 2:14 pm

Let’s go deeper(1) – web role and storage

As the high-level design diagram suggests, communication between web roles and worker roles goes through the various features of the Storage account, such as Queues. Queues are normally used to send messages or tasks between roles, because they implement the safety features needed for asynchronous communication between multiple applications running on different servers. This includes locking, and recovery in case one of the roles crashes while still processing a message: roles are automatically restarted when they crash, and items that were popped from the queue but never deleted are made visible again. In fact this is the beauty of Windows Azure from the perspective of the software developer: you write the code as if you only run one role, and the cloud makes it work with any number of roles as you scale out.

The beast is that Azure Tables are somewhat limited, because there are no sorting or grouping options. Of course, there is SQL Azure (the classic relational SQL database, cloud version). But even in a relational database you sometimes denormalize for speed, so it would have been nice to have at least a sort option for Azure Tables.
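The locking and recovery behaviour of queues can be illustrated with a small in-memory model. This is only a sketch of the idea, not the Azure SDK API: the VisibilityQueue class, its Put/Get/Delete methods and the 30-second timeout are all hypothetical names and values chosen for the illustration.

```csharp
using System;
using System.Collections.Generic;

// In-memory stand-in for a storage queue. NOT the Azure SDK API:
// every name here is hypothetical and exists only for illustration.
public class VisibilityQueue
{
    private class Entry
    {
        public string Body;
        public DateTime VisibleAt;   // the message is hidden until this time
    }

    private readonly List<Entry> entries = new List<Entry>();

    public void Put(string body, DateTime now)
    {
        entries.Add(new Entry { Body = body, VisibleAt = now });
    }

    // Returns the first visible message and hides it for 'timeout',
    // or null when nothing is visible right now.
    public string Get(DateTime now, TimeSpan timeout)
    {
        foreach (Entry e in entries)
        {
            if (e.VisibleAt <= now)
            {
                e.VisibleAt = now + timeout;
                return e.Body;
            }
        }
        return null;
    }

    // Called by a worker that finished processing; removes the message for good.
    public bool Delete(string body)
    {
        int i = entries.FindIndex(e => e.Body == body);
        if (i < 0) return false;
        entries.RemoveAt(i);
        return true;
    }
}

public static class QueueDemo
{
    public static void Main()
    {
        var q = new VisibilityQueue();
        DateTime t0 = new DateTime(2010, 1, 25, 14, 0, 0);
        TimeSpan timeout = TimeSpan.FromSeconds(30);

        q.Put("task-1", t0);

        // Worker A takes the message, then crashes without deleting it.
        Console.WriteLine(q.Get(t0, timeout));
        // While the message is hidden, worker B finds nothing to do.
        Console.WriteLine(q.Get(t0.AddSeconds(10), timeout) ?? "(empty)");
        // After the timeout the message is visible again and can be retried.
        Console.WriteLine(q.Get(t0.AddSeconds(31), timeout));
    }
}
```

The point of the design is that a message is only gone once a worker explicitly deletes it, so a crash in the middle of processing costs a retry rather than a lost task.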

While message passing between roles is mainly done using Azure Queues, other information is still passed the traditional way: through shared access to resources such as a database. Besides Queues, Azure Storage offers two types of non-relational, cloud-based storage: Blobs (which store entire files) and Azure Tables (which store entities).

Ciripescu.ro uses Azure Tables to store all its business entities, instead of a classic SQL database. We are still talking about business objects, a business layer and a data access layer, as you would in any other application. The only difference is that the data access layer stores the objects in non-relational storage designed to be used in a cloud environment. What makes Azure Tables so special is that it allows you to have very large tables that can still be searched very fast. In Ciripescu’s case, the table containing messages sent between users (Cirips) could have millions of items. Querying a SQL table with a few million items can take several seconds. If we think of Twitter, we realize that a few million is more like a joke. How do you search a table with a billion entities? The answer is simple: split that table across a cloud of servers by using a relevant partition key, and then either search only the one partition where the object is, or search all of them, but in parallel. This is exactly what Azure Tables does: for each object you define a partition key that allows Windows Azure to transparently split the table, and a row key that allows fast searching of data within a single partition.
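To make the two search strategies concrete, here is a toy in-memory model of a partitioned table. It only mimics the idea and says nothing about how the service is implemented; PartitionedTable, QueryPartition and SearchAll are hypothetical names, and the zero-padded row keys are an assumption made so that string order matches numeric order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy model of a partitioned table; hypothetical, not the storage service.
public class PartitionedTable
{
    // PartitionKey -> (RowKey -> content), rows kept sorted by RowKey.
    private readonly Dictionary<string, SortedDictionary<string, string>> partitions =
        new Dictionary<string, SortedDictionary<string, string>>();

    public void Insert(string partitionKey, string rowKey, string content)
    {
        SortedDictionary<string, string> rows;
        if (!partitions.TryGetValue(partitionKey, out rows))
        {
            rows = new SortedDictionary<string, string>(StringComparer.Ordinal);
            partitions[partitionKey] = rows;
        }
        rows[rowKey] = content;
    }

    // Fast path: only the one partition that holds the data is touched.
    public List<string> QueryPartition(string partitionKey)
    {
        SortedDictionary<string, string> rows;
        return partitions.TryGetValue(partitionKey, out rows)
            ? rows.Values.ToList()
            : new List<string>();
    }

    // Slow path: every partition is searched, here in parallel.
    public List<string> SearchAll(string word)
    {
        return partitions.Values
            .AsParallel()
            .SelectMany(rows => rows.Values.Where(c => c.Contains(word)))
            .OrderBy(c => c, StringComparer.Ordinal)
            .ToList();
    }
}

public static class PartitionDemo
{
    public static void Main()
    {
        var table = new PartitionedTable();
        table.Insert("tudor", "25", "This is the first post on ciripescu");
        table.Insert("tudor", "07", "I want a vacation");
        table.Insert("pitagora", "12", "Windows Azure rocks!");

        // One partition, rows come back in RowKey order.
        foreach (string c in table.QueryPartition("tudor")) Console.WriteLine(c);

        // All partitions have to be scanned for a content search.
        foreach (string c in table.SearchAll("Azure")) Console.WriteLine(c);
    }
}
```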

User entity

A business object has to define those two fields and map relevant properties to them. Take the User class from ciripescu.ro: it inherits from TableStorageEntity (which defines the two properties) and, through its constructor, maps the username to PartitionKey and String.Empty to RowKey. The TableStorageEntity class is defined in the Windows Azure SDK.
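A minimal sketch of such an entity could look like the following. The TableStorageEntity base class is re-declared here as a simplified stand-in so that the example compiles on its own (in the real project it comes from the SDK), and the FullName and Email properties are hypothetical; only the key mapping is taken from the description above.

```csharp
using System;

// Simplified stand-in for the SDK's TableStorageEntity, declared here only
// so the sketch is self-contained. The real base class comes from the SDK.
public abstract class TableStorageEntity
{
    public string PartitionKey { get; set; }
    public string RowKey { get; set; }

    protected TableStorageEntity(string partitionKey, string rowKey)
    {
        PartitionKey = partitionKey;
        RowKey = rowKey;
    }
}

public class User : TableStorageEntity
{
    // Hypothetical profile properties, not taken from the original code.
    public string FullName { get; set; }
    public string Email { get; set; }

    // The username becomes the PartitionKey; the RowKey stays empty.
    public User(string username) : base(username, String.Empty) { }

    // Convenience accessor so callers do not have to know about the mapping.
    public string Username { get { return PartitionKey; } }
}

public static class UserDemo
{
    public static void Main()
    {
        var user = new User("tudor") { FullName = "Tudor Cret" };
        Console.WriteLine(user.PartitionKey);      // tudor
        Console.WriteLine(user.RowKey.Length);     // 0
    }
}
```

With this mapping every user lives in a partition of their own, so looking a user up by username is a single-partition query.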

In order to query an entity from Azure Tables, one must first create a data service context. This is a class that must inherit from TableStorageDataServiceContext, which in turn inherits from the ADO.NET Data Services class DataServiceContext. The conceptual object model here looks like the one from NHibernate, but it’s not the same thing.

User entity objects retrieval

Everything else is simple LINQ: the programmer creates a DataServiceQuery object for each Table and makes LINQ queries on it:  

var users = from u in Users select u;  

foreach (User u in users) { … }

This is how a data access layer class would query items from the storage:  

Querying items from the storage

Inserting, updating and deleting objects is done in the same way LINQ programmers are used to. Let’s not forget that the data access classes from the Azure SDK are built on the .NET data services client classes that LINQ queries run against:

CRUD operations on User entity

The other entities in the application are built using the same object model.  

Now let us turn to the main advantage of a social networking platform built on cloud technology: the table that holds all the messages (Cirip). Social networking websites can get very popular. Think of Twitter: millions of tweets per day, which have to be not only stored somewhere but also queried. How do you query a table with 1 billion entries? Azure Tables is the answer. Let’s take a look at our Cirip table. Let’s assume two users, Tudor and Pitagora, had the following message exchange:

Tudor: This is the first post on ciripescu   

Pitagora: hello world  

Pitagora: Windows Azure rocks  

Pitagora: What a good day for a Microsoft presentation  

Tudor: A barbeque would go better 🙂  

Tudor: I want a vacation  

PartitionKey  RowKey  Content
pitagora      9       What a good day for a Microsoft presentation
pitagora      12      Windows Azure rocks!
pitagora      20      Hello world
tudor         7       I want a vacation
tudor         8       A barbeque would go better 🙂
tudor         25      This is the first post on ciripescu

The PartitionKey and RowKey are two mandatory fields for any entity stored in Azure Tables. The programmer has to choose carefully what to store there, because all the power of the cloud depends on these two fields. Windows Azure splits tables into partitions and stores them on different nodes, using the PartitionKey field. For redundancy and speed, each partition has 3 copies in the cloud. All partitions and their copies are managed transparently by the cloud. The user doesn’t have to know that they are there; his only control over this process is the choice of PartitionKey. Entities with the same PartitionKey belong to the same partition and are stored on the same node.

The fastest possible query is one that only searches a single partition, so this choice varies from application to application. One has to ask: what is the most frequent query my application will run? In the case of ciripescu.ro, which is a microblogging application, our answer lay in the blog profile of each user. We decided that displaying all messages of a single user has to be the fastest query, so we chose the sender’s username as PartitionKey. In the example above, messages from pitagora can be stored on a different node than those from tudor. The order in which they are listed in the table also suggests that: Windows Azure sorted those messages by PartitionKey and then by RowKey.

The second field, RowKey, is similar to the primary key in a SQL table. Entities are indexed by it, and the fastest query within a partition is one where the RowKey is the search criterion. In the context of Windows Azure, the RowKey has another very important function: sorting. You have to remember that Azure Tables is not a relational database. It’s just a store of entities. That means that you can’t run complex queries that involve counting, grouping and sorting. The entities are sorted in ascending order by RowKey, and there is no way to change that. So in the case of the Cirip table, where we want the latest Cirip to be displayed first, we chose DateTime.MaxValue - DateTime.Now as the RowKey. The reason I didn’t use real values in the example is that these are 11-digit numbers. The only thing I kept in the example is their order: the latest message will always have the smallest RowKey.
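The reverse-chronological key can be sketched as follows. This is one possible way to build it, not necessarily the one used on ciripescu.ro: the CiripRowKey helper is a hypothetical name, and both the use of ticks and the zero-padding to 19 digits are assumptions. The padding matters because keys are stored as strings, so they are compared character by character rather than numerically.

```csharp
using System;
using System.Globalization;

// Hypothetical helper; the unit (ticks) and the padding width are assumptions.
public static class CiripRowKey
{
    public static string For(DateTime sentUtc)
    {
        // The later the message, the smaller the remaining number of ticks.
        long remaining = DateTime.MaxValue.Ticks - sentUtc.Ticks;

        // Zero-pad to a fixed width so that string order equals numeric order.
        return remaining.ToString("D19", CultureInfo.InvariantCulture);
    }
}

public static class RowKeyDemo
{
    public static void Main()
    {
        DateTime older = new DateTime(2010, 1, 25, 14, 14, 0, DateTimeKind.Utc);
        DateTime newer = older.AddMinutes(5);

        string[] keys = { CiripRowKey.For(older), CiripRowKey.For(newer) };
        Array.Sort(keys, StringComparer.Ordinal);   // ascending, like the table

        // The newer message ends up first in ascending key order.
        Console.WriteLine(keys[0] == CiripRowKey.For(newer));   // True
    }
}
```

Without the padding, a key such as "9" would sort after "12" when compared as text, which is why a fixed width is used here.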

Using a non-relational database has some disadvantages too. Besides the lack of sorting and grouping capabilities, sometimes you feel that you need relational entities, because without them you write more code in the data access layer. For example, you can’t write a join such as:

Select c from Cirip c, User u, Urmarire urm Where c.PartitionKey = u.PartitionKey and
u.RowKey = urm.RowKey and urm.PartitionKey = 'current user'
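What the data access layer has to do instead is perform the join by hand. The sketch below shows the general shape using plain in-memory collections and LINQ to Objects; it is not the code from ciripescu.ro. The Timeline class is a hypothetical name, and the layout of Urmarire (follower in PartitionKey, followed user in RowKey) is an assumption made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Cirip
{
    public string PartitionKey { get; set; }   // sender's username
    public string RowKey { get; set; }         // smaller value = newer message
    public string Content { get; set; }
}

public class Urmarire
{
    public string PartitionKey { get; set; }   // follower (assumed layout)
    public string RowKey { get; set; }         // followed user (assumed layout)
}

// Hypothetical helper that joins the two tables on the client side.
public static class Timeline
{
    public static List<Cirip> For(string currentUser,
        IEnumerable<Urmarire> follows, IEnumerable<Cirip> cirips, int take)
    {
        // Step 1: one single-partition query to find whom the user follows.
        List<string> followed = follows
            .Where(f => f.PartitionKey == currentUser)
            .Select(f => f.RowKey)
            .ToList();

        // Step 2: one single-partition query per followed user,
        // then merge and sort the results in memory.
        return followed
            .SelectMany(name => cirips.Where(c => c.PartitionKey == name))
            .OrderBy(c => c.RowKey, StringComparer.Ordinal)
            .Take(take)
            .ToList();
    }
}

public static class TimelineDemo
{
    public static void Main()
    {
        var follows = new List<Urmarire>
        {
            new Urmarire { PartitionKey = "ana", RowKey = "tudor" },
            new Urmarire { PartitionKey = "ana", RowKey = "pitagora" }
        };
        var cirips = new List<Cirip>
        {
            new Cirip { PartitionKey = "pitagora", RowKey = "09", Content = "What a good day for a Microsoft presentation" },
            new Cirip { PartitionKey = "tudor", RowKey = "07", Content = "I want a vacation" },
            new Cirip { PartitionKey = "tudor", RowKey = "25", Content = "This is the first post on ciripescu" }
        };

        foreach (Cirip c in Timeline.For("ana", follows, cirips, 2))
            Console.WriteLine(c.PartitionKey + ": " + c.Content);
    }
}
```

Each step is a cheap single-partition lookup, but the merging and sorting that a relational database would do in the join now lives in application code.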

Anyway, why Azure Tables and not SQL Azure? Because in our case it is faster when searching. Also, in the early days of Windows Azure there was no SQL Azure; the only storage available was non-relational. A few months later, based on feedback from developers, Microsoft announced SQL Azure (summer 2009), which was officially released in the fall of 2009.

Worker roles and queues exposed here.
