Working with the OData URI Conventions

Following on from my resource output format options post in my OData series, I thought I would briefly cover the topic of URI Conventions I had previously alluded to.

URI Conventions for OData allow for a simple standard way to control the resource that is returned. The documentation for these conventions can be found here on the OData.org site.

The first important choice is the result format. In my previous post I discussed the options of AtomPub and JSON; the way you make your format choice for the returned resources is via the $format option. If your choice is JSON, simply suffix the query option $format=JSON, for example on the NetFlix OData service:

http://odata.netflix.com/Catalog/Titles?$format=JSON

As a quick interjection here for another convention: $callback is a handy feature to specify the name of a JavaScript function that will be called when the data returns. It’s even easy to use for an anonymous callback method by using “?”, i.e. $callback=?. I’ll go into more detail about this when covering my implementation in a future post.
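
For example (parseTitles is just a hypothetical callback function name here; $callback is paired with the JSON format):

http://odata.netflix.com/Catalog/Titles?$format=JSON&$callback=parseTitles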

In my experience with a few services, AtomPub is the default and the $format option is not required; in some cases (NetFlix in particular) $format=atom is not even a valid query option. I’ll need to investigate this more, as it seems like an oversight on at least the NetFlix service; I don’t see an issue with accepting a more verbose query even for a default value.

I also briefly mentioned deferred content and the use of $expand to force eager-loading of the result tree structure. This currently doesn’t seem to work on the NetFlix service, so this is yet another thing that will require some more investigation. It’s quite possible that the feature is disabled to prevent a type of attack (DoS) or just to prevent general abuse and waste of bandwidth/processing.

A simple convention is the $orderby option. Its use is simple too, allowing chaining of various ascending and descending choices separated by commas.

http://odata.netflix.com/Catalog/Titles?$orderby=Runtime

and combined:

http://odata.netflix.com/Catalog/Titles?$orderby=ReleaseYear,Runtime desc,Name asc

I believe the browser will replace the spaces with the ‘%20‘ representation. If you make a mistake in the query (e.g. supplying an invalid order field) the error will look like:

No property ‘Title’ exists in type ‘System.Nullable`1

The last three I want to cover are $filter, $top and $skip. They are related in that they simply filter the returned resource. $skip and $top are quite simple and operate as you would expect (just like in LINQ).
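
For example, to skip the first 20 titles and return the next 10 (a simple paging pattern):

http://odata.netflix.com/Catalog/Titles?$skip=20&$top=10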

Filter on the other hand is more advanced; it has all the logical operator options (eq, ne, gt, ge, and so on), all the arithmetic operators (add, sub, mul, div, mod) and precedence grouping (brackets). For the exact syntax and all the options see the specification.

I’ll just give a simple example that can be clicked through:

http://odata.netflix.com/Catalog/Titles?$filter=Runtime gt 3 and Runtime lt 90

I did say last, but as a bonus there’s a nice data reduction option, $select, that returns only the chosen properties. Simply suffix an &$select=Name,Runtime set of params on any of the above queries and see the returned resource simplified to only the “data you want”.

For the lazy to modify here’s a click-able example:

http://odata.netflix.com/Catalog/Titles?$filter=Runtime gt 3 and Runtime lt 90&$select=Runtime,Name

NOTE: OData queries are case sensitive.

OData, AtomPub and JSON

Continuing my mini-series looking into OData, I thought I would cover off the basic structure of AtomPub and JSON. They are both formats in which OData can deliver the requested resources (a collection of entities, e.g. products or customers).

For the most part there isn’t much difference in terms of data volume returned by AtomPub vs JSON, though AtomPub, being XML, is slightly more verbose (tags and closing tags) and references namespaces via xmlns. A plus for AtomPub for your OData service is the ability to define the datatype, as you’ll see below via m:type, the example being an integer Edm.Int32. The lack of such features is a plus in a different way for JSON: it’s simpler, and a language such as JavaScript interprets the values of basic types (string, int, bool, array, etc) directly.

I’m not attempting to promote one over the other, just saying that each can serve a purpose. If you’re after posts that discuss this in a more critical fashion, have a look at this post by Joe Gregorio.

What I do aim to show is that, comparing the two side by side, there’s only a slight difference, and the choice of format comes down to what you intend to accomplish when processing the data. If you’re just re-purposing some data on a web interface, JSON would be a suitable choice. If you’re processing the data within another service first, making use of XDocument (C#.NET) would seem suitable.
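
As a rough illustration of that second scenario, here’s a minimal sketch of reading the Atom feed with XDocument. It assumes the standard OData data services namespaces and the Name/Runtime properties used in the queries earlier in this series, so treat it as a sketch rather than production code.

using System;
using System.Linq;
using System.Xml.Linq;

class FeedReader
{
    static void Main()
    {
        // Standard Atom and OData data services namespaces (assumed here).
        XNamespace atom = "http://www.w3.org/2005/Atom";
        XNamespace d = "http://schemas.microsoft.com/ado/2007/08/dataservices";
        XNamespace m = "http://schemas.microsoft.com/ado/2007/08/dataservices/metadata";

        var feed = XDocument.Load("http://odata.netflix.com/Catalog/Titles?$top=5");

        var titles = from entry in feed.Descendants(atom + "entry")
                     let props = entry.Descendants(m + "properties").First()
                     select new
                     {
                         Name = (string)props.Element(d + "Name"),
                         Runtime = (int?)props.Element(d + "Runtime")
                     };

        foreach (var t in titles)
            Console.WriteLine("{0} ({1} seconds)", t.Name, t.Runtime);
    }
}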

There’s also a concept of ‘Deferred Content’ for both formats, and it is achieved in a similar way through links. The objective is to conserve resources in processing and transmission by not transmitting the entire element tree on a request. In the comparisons below, wherever there is a link to another URI, that is content that has not been returned; the most obvious example is image data, i.e. links to jpeg resources. OData has a URI query option called $expand that can force the inline return of the element data (this concept is called eager-loading). Have a look at my introductory post about the OData query options.

NOTE: In the examples that follow the returned result data is from the NetFlix OData service. I have stripped out some of the xmlns attributes and shortened/modified the URLs, in particular omitting http://, just so it fits better (less line wrapping).

So let us compare…

AtomPub
Yes, that stuff that makes up web feeds.

Example from the NetFlix OData feed, accessed via the URL http://odata.netflix.com/Catalog/Titles

Atom Feed

<?xml version="1.0" encoding="iso-8859-1" standalone="yes"?>
<feed allThatOtherNameSpaceStuff="">
  <title type="text">Titles</title>
  <id>http://odata.netflix.com/Catalog/Titles/</id>
  <entry m:etag="abced">
    <id>http://odata.netflix.com/Catalog/Titles('movieName')</id>
    <title type="text">FullMovieTitle</title>
    <summary type="html">Your everyday regular movie</summary>
    <allTheOtherTags type="text">...</allTheOtherTags>
    <m:properties xmlns:m="severalNameSpaces">
      <d:Id>movieName</d:Id>
      <d:Synopsis>Your everyday regular movie</d:Synopsis>
      <d:Runtime m:type="Edm.Int32">3600</d:Runtime>
      <d:BoxArt m:type="NetflixModel.BoxArt">
          <d:SmallUrl>http://c.dn/boxshots/m1bx.jpg</d:SmallUrl>
      </d:BoxArt>
    </m:properties>
  </entry>
</feed>

JSON
Yes, that simple text format used in JavaScript.

Example from the NetFlix OData feed, accessed via the URL http://odata.netflix.com/Catalog/Titles?$format=JSON

Javascript Object Notation

{
  "d" :
  {
    "results": [ {
      "__metadata": {
        "uri": "o.ntf.lx/Ctlog/Titles('movieName')",
        "etag": "abcdef",
        "type": "NetflixModel.Title",
        "edit_media": "o.ntf.lx/Ctlog/Titles('mvName')/$value",
        "media_src": "c.dn/boxshots/large/mnbx.jpg",
        "content_type": "image/jpeg"
      },
      "Id": "movieName",
      "Synopsis": "Your everyday regular movie",
      "Runtime": 3600,
      "BoxArt": {
        "__metadata": {
          "type": "NetflixModel.BoxArt"
        },
        "SmallUrl": "http://c.dn/boxshots/m1bx.jpg"
      }
    } ]
  }
}

PLINQ “Grok Talk” at Developer Developer Developer Melbourne

I did a very quick, chock-full-of-ramblings talk summarising Parallel LINQ (PLINQ) at the weekend’s Developer Developer Developer Melbourne.

First up, DDD Melbourne was great, thanks to all the sponsors (NAB, Readify, DevExpress, Pluralsight, JetBrains, Redgate), the presenters and key organisers Alex, Mahesh and others.

The message I wanted to get across was: have a look at the Parallel Extensions in the Task Parallel Library of .NET; it can help speed up a few longer running tasks that might exist in your application, and it’s easy. Check out the parallel extensions team’s MSDN blog for the latest stuff.
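
To make that concrete, here’s a minimal sketch of the kind of change I talked about: taking an ordinary LINQ-to-Objects query over a CPU-bound check and parallelising it with AsParallel(). The IsPrime check below is just a stand-in for whatever longer running work your application actually does.

using System;
using System.Linq;

class PlinqSample
{
    static bool IsPrime(int n)
    {
        if (n < 2) return false;
        for (int i = 2; i * i <= n; i++)
            if (n % i == 0) return false;
        return true;
    }

    static void Main()
    {
        // The only change from the sequential version is the AsParallel() call.
        var primeCount = Enumerable.Range(2, 1000000)
                                   .AsParallel()
                                   .Count(IsPrime);

        Console.WriteLine("Primes found: {0}", primeCount);
    }
}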

The intent of this quick post is to clarify what I was rambling on about, and to offer some links to old posts and my PowerPoint slides, which would have made my talk go a little smoother.

*Note: This is in fact demo-ware just to perform PLINQ benchmarks.

Contractual Obligations

Last night, Wednesday the 2nd of December, I attended a presentation at the Melbourne Patterns & Practices User Group. After the Gang of Four pattern discussion (which was Chain of Responsibility) there was a presentation on .NET Code Contracts.

Code Contracts are a Microsoft Research project that now has a beta release.

Code Contracts provide a language-agnostic way to express coding assumptions in .NET programs. The contracts take the form of preconditions, postconditions, and object invariants.
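
Object invariants aren’t shown in the example further below, so here’s a quick hedged sketch of one (the class field here is made up purely for illustration):

using System.Diagnostics.Contracts;

public class BioSample
{
    private int volumeInMicrolitres;  // hypothetical field for the example

    [ContractInvariantMethod]
    private void ObjectInvariant()
    {
        // Verified at the exit of every public method on the class.
        Contract.Invariant(volumeInMicrolitres >= 0);
    }
}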

At their simplest level of application in a code base, Code Contracts help group guard conditions for functions, and also easily support exit guard conditions when a public method completes.

public List<Markers> ExtractGeneticMarkers(List<BioSample> samples)
{
   //Pre-conditions use:
   Contract.Requires(samples != null);
   //a lambda expression to perform the contract check on all elements
   Contract.Requires(samples.All(s => s.geneticData != null));

   //Post-conditions use:
   Contract.Ensures(Contract.Result<List<Markers>>() != null);

   var markers = new List<Markers>();
   //function logic
   return markers;
}

A summary of some of the benefits:

  • Compile time contract validation and error(/warning) output.
  • Runtime contract validation and exceptions thrown.
  • Toggling the contracts per assembly [Full / Pre & Post / Pre / ReleaseRequires / None].
  • Inheritance of contracts, even from interfaces (see the sketch just after this list).
  • Outputting documentation from the contracts, for accurate reflection of the state of the code.
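
On the interface inheritance point, here’s a minimal sketch of how that looks; the interface, contract class and Patient type names are made up for the example:

using System.Diagnostics.Contracts;

[ContractClass(typeof(PatientServiceContracts))]
public interface IPatientService
{
    Patient GetPatient(int patientId);
}

[ContractClassFor(typeof(IPatientService))]
internal abstract class PatientServiceContracts : IPatientService
{
    public Patient GetPatient(int patientId)
    {
        // These contracts are inherited by every implementation of IPatientService.
        Contract.Requires(patientId > 0);
        Contract.Ensures(Contract.Result<Patient>() != null);
        return default(Patient);  // never executed; satisfies the compiler
    }
}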

It’s quite an extensive discussion covering all its potential applications and what can be achieved, so for more info check out some other posts on the topic too.

LINQ Basics

As part of the preparation work I’m doing for a presentation to the Melbourne Patterns & Practices group in October on LINQ and PLINQ, I thought I would cover off the basics of LINQ in .NET 3.5 and 3.5 SP1 in this post. I’ll cover the changes in .NET 4.0 in the next post, and then a discussion of PLINQ in a third post.

O.k. so let’s sum up the basics. The core concept of LINQ is to easily query any kind of data you have access to in a type-safe fashion, be it SQL Server data, a collection (i.e. something implementing IEnumerable) or an XML data structure. A further addition to this power is the ability, through C# 3.0, to create projections of new anonymous structural types on the fly. See the first code sample below for a projection; in that simple example it’s the creation of an anonymous type that has two properties.

Continuing on with my “medical theme” used for the WCF posts, here is a simple schema layout of the system, consisting of a patient and a medical treatment/procedure hierarchy. This is given the title of ‘MedicalDataContext’ to be used in our LINQ queries.

Medical System Basic Schema

These items have been dragged onto a new ‘LINQ to SQL Classes’ diagram from the Server Explorer > Data Connections view of a database.

Server Explorer Window

To create the ‘LINQ to SQL Classes’ diagram simply add a new ‘LINQ to SQL Classes’ item via the Add New Item dialog.

Add New Item

Linq to SQL Classes

Back to the logic. We have a Patient who’s undergoing a certain type of treatment, and that treatment has associated procedures. To obtain a collection of procedures for today, and the name of the patient who will be attending we simply build up a new query as such:

var db = new MedicalDataContext();

var sched = from p in db.Procedures
            where p.Scheduled == DateTime.Today
            select new {
                     p.ProcedureType.ProcedureTypeName,
                     p.Treatment.Patient.FullName
                    };

Note: The patient table structure doesn’t have a field called ‘FullName’; I make use of a partial class extending its properties to add a read-only representation. Check out this post by Chris Sainty for more info on making use of partial classes with LINQ.
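
For illustration, a minimal sketch of that partial class; the FirstName/LastName columns are assumptions about the underlying Patient table:

public partial class Patient
{
    // Read-only convenience property; not mapped to a database column.
    // FirstName and LastName are assumed columns generated by the designer.
    public string FullName
    {
        get { return string.Format("{0} {1}", this.FirstName, this.LastName); }
    }
}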

At this point we can now iterate over each item in our ‘sched’ (scheduled procedures) collection.

foreach (var procedure in sched)
{
   //process/display/etc
}

This brings me to another key point, ‘Delayed Execution’ (or ‘Deferred Execution’); check out Derik Whittaker’s ‘Linq and Delayed execution’ blog post for a more detailed walk-through.

Basically the query we defined earlier is only a representation of the possible result set. Therefore when you first make a call to operate on the variable representing the query results, that’s when execution will occur.
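
A tiny illustration of this, using a plain in-memory list rather than the medical schema above, just to keep it short:

var numbers = new List<int> { 1, 2, 3 };

var evens = from n in numbers
            where n % 2 == 0
            select n;

numbers.Add(4);  // added after the query was defined

// The query executes here, so it sees both 2 and 4 and prints 2.
Console.WriteLine(evens.Count());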

So it becomes a program flow decision whether to always execute the query live when it’s being processed (i.e. most up-to-date data) or to force a single execution and then make use of that data in the current process flow. A forced execution can be achieved several ways; the simplest choice is to just create a new list object via ToList() to execute the fetching of the data.

var allProcedures = todaysProcedures.ToList();

So far this has all revolved around accessing data in a SQL Server Database (the setup of a LINQ to SQL class). LINQ’s purpose is to be able to query any form of collection of data.

Now let’s say we wanted to obtain some information by interrogating a class using reflection.

Note: a business case for the use of reflection is often tied very deeply to some limitation, special case, etc., so a more specific example would take us well away from the topic of LINQ. I’ll keep this example trivial; it could just as easily interrogate a more useful class object.

var instanceMethods =
    from m in typeof(string).GetMethods()
    where !m.IsStatic
    orderby m.Name
    group m by m.Name into g
    select new
      {
         Method = g.Key,
         Overloads = g.Count()
      };

Iterating over the result element by element will generate this output:

  { Method = Clone, Overloads = 1 }
  { Method = CompareTo, Overloads = 2 }
  { Method = Contains, Overloads = 1 }
  { Method = CopyTo, Overloads = 1 }
  { Method = EndsWith, Overloads = 3 }
  . . .

For an off-topic discussion, check out this post by Vance Morrison titled Drilling into .NET Runtime microbenchmarks: ‘typeof’ optimizations, as a discussion of the use of ‘typeof’ in my above query.

For a more typical business case, here’s an example using a LINQ statement to fetch a collection of controls and operate on them (in this case disable them). Here we’re operating on a Control collection.

Panel rootControl;

private void DisableVisibleButtons(Control root)
{
    // ControlCollection is non-generic, so OfType<Control>() gives a typed sequence
    var controls = from r in root.Controls.OfType<Control>()
                   where r.Visible
                   select r;

    foreach (var c in controls)
    {
        if (c is Button) c.Enabled = false;
        DisableVisibleButtons(c);  //recursive call
    }
}

//kick off the recursion:
DisableVisibleButtons(rootControl);

Summary:

  • LINQ to SQL generates classes that map directly to your database.
  • LINQ helps you manipulate strongly typed results (including intellisense support).
  • “LINQ to SQL” is basically “LINQ to SQL Server” as that is the only connection type it supports.
  • There is a large set of extension methods out of the box; for samples check out: 101 LINQ Samples.

Adventures in the land of SSIS

I had to whip up a solution to a data migration requirement and had no choice but to use SQL Server Integration Services (SSIS). It is marketed as a fast and flexible tool for data extraction; no idea about the “fast”, but its user interface and error/warning messages make using it far from flexible. A lot of the time I found myself in a battle to achieve the simplest task without being supported by the tool. I admit that this is because I have no prior experience with any of the data control objects. What made matters worse was that the interface wasn’t very helpful in the naming of controls or the descriptions in tool-tips. Note this is my experience with Visual Studio 2005 and SSIS; it may have improved in VS 2008 or the upcoming VS 2010.

I had two objectives to achieve: joining data from two tables, and using the last generated ID in a subsequent query. It appears the latter was not even considered in the design of the tool. You would think that a data writing control would have more outputs than just exceptions.

Having “successfully” met the basic requirements of “migrating data”, I thought I’d share the approach I took. It may not be the optimal approach, but it works, and in this scenario performance isn’t a concern.

The data is being merged from one location (for example, a legacy system) to a new system with a different data representation model. I’ve put it in the context of the “Medical System” which is the theme of my posts. In this post I introduce a concept of related patients. Simply put, a patient can be related to another patient in the system; examples of relationship types are ‘single’, ‘couple’, ‘sibling’, etc. There are other representation complexities here, but they are not relevant to the post or the SSIS discussion. The ER model is as follows:

Basic many-to-many table structure

As a requirement at the point of the data merge, every patient must be created with a default ‘single’ relationship entry. This is where SSIS doesn’t make things easy. Based on a requirement of maintaining existing patient IDs as part of the merge, an identity insert is performed by SSIS into the patient table. Then a new ‘single’ relationship type record must be created in the relationship table. Next comes the task SSIS doesn’t support: creating a new entry in the linking table (PatientRelationship) using the newly created ID of the single relationship record. This leads to the need for the inbuilt database function SCOPE_IDENTITY() or its alternatives such as @@IDENTITY. I could not find a supported approach in SSIS to obtain this value via the output without the use of a stored procedure.

At this point all the material I found online was to make use of a SQL stored procedure with an OUTPUT parameter to obtain the value directly from an insert statement. This is fine if you need to make use of it back in SSIS, but in this case all that was required was a follow-up insert statement. So I embedded the initial insert and the subsequent statement in one stored procedure, taking the PatientID of the record currently being processed by the SSIS package as the input:

CREATE PROCEDURE dbo.SetupRelationships
(   @PatientId int   )
AS
BEGIN
   INSERT INTO dbo.Relationship 
   (
      RelationshipType
   ) 
   VALUES 
   (   1   ) --Note: the ID 1 is the Foreign Key for the type 'Single'

   INSERT INTO dbo.PatientRelationship 
   (
      PatientId, RelationshipId
   )
   VALUES 
   (  @PatientId, SCOPE_IDENTITY() )
END

As a quick side note: I asked a question on Stack Overflow about mapping hard-coded values inside SSIS, and the answer was to use a “Derived Column” transformation; here is the Stack Overflow question and answer that has the tips for data formatting. Another option was to create default values on the database schema that housed the source data for the migration.

Once the stored procedure was created, making use of it in SSIS required another “Data Flow Task”, and inside that task an “OLE DB Command” to call the procedure via

EXEC dbo.SetupRelationships ?

The question mark represents a parameter; if you had a procedure taking three input parameters and one output parameter it would look like this:

EXEC dbo.AnotherProc ?, ?, ?, ? OUTPUT

The SSIS “Data Flow Task” now looks like this:

SSIS Data Flow Task

With the Advanced Editor properties dialog looking like this (click on image to see the full sized screen shot):

OLEDB command setup

The final step is now to create the column mapping to supply the Patient ID into the stored procedure on Column Mapping tab (again click for larger image):

OLEDB Column Mapping

That was it: “Execute Package” and the data would migrate, meeting our requirements.

Please form a queue, for poison.

In the previous post about using MSMQ to facilitate one-way calls I covered some of the basics of setting up an MSMQ binding. In that scenario, if a consumer of that endpoint sent a malformed message, or the message became corrupted in transmission, the service would either discard it with no feedback on the problem, or far worse, attempt to process the poison message again and again, getting stuck in a loop. Such a processing loop would prevent the service from continuing its normal operation of handling subsequent messages. MSMQ 3.0 under Windows XP and Windows Server 2003 only has the ability to retry a failed message once before it either simply drops the message or faults the channel. A message that is un-processable in this fashion is referred to as a poison message. The wise choice is therefore to use MSMQ 4.0 under Windows Server 2008/Vista/Windows 7, which offers more suitable alternatives.

One alternative I would like to discuss here is using a poison queue. The basic concept is: the primary (MSMQ-based) service, which has an “important function” (unlike our unimportant notification service in the previous MSMQ post), has an associated poison queue. Messages are moved into this queue when they are determined to in fact be poison. These messages can then be dealt with by a separate service or just human monitoring.

A poison queue is set up by exposing a second service endpoint binding to an identical address but with a ;poison suffix (while the main endpoint makes use of a named bindingConfiguration to control retry behaviour):

<services>
   <service name = "MyService">
      <endpoint
         address  = "net.msmq://localhost/private/ImportantQueue"
         binding  = "netMsmqBinding"
         bindingConfiguration = "importantMsgHandling"
         contract = "IImportantService"
      />
      <endpoint
         address  = "net.msmq://localhost/private/ImportantQueue;poison"
         binding  = "netMsmqBinding"
         contract = "IImportantService"
      />
   </service>
</services>

<bindings>
   <netMsmqBinding>
      <binding name = "importantMsgHandling"
         maxRetryCycles = "2"
         receiveRetryCount = "3"
         receiveErrorHandling = "Move"
         retryCycleDelay = "00:00:10">
      </binding>
   </netMsmqBinding>
</bindings>

I hit a few hiccups in this run-through, so I’ve added a troubleshooting section at the bottom of the post.

The other key configuration features to note in this setup (illustrated below) are:

  • receiveErrorHandling – with the options of Fault, Drop, Reject and Move; our chosen option of Move means that once the error handling process has completed, the message will be moved to our poison queue.
  • receiveRetryCount – number of immediate attempts to process the message.
  • maxRetryCycles – number of subsequent retry cycles for the message.
  • retryCycleDelay – time between retry cycles.

Poison Queue Message Processing

Once our poison message has failed to be processed, it is shifted to the “poison” sub-queue as shown in this Computer Management screen shot:

Computer Management - Poison Queue

At this point we can do a few things, from the simplest option of having an admin user review the messages in this poison queue, to a more sophisticated approach of having a separate service attempt to process these poison messages with some more sophisticated logic. Juval Lowy (a while ago now) published an MSDN Magazine article on an even more sophisticated error handling approach for dealing with poison messages, dubbed a “Response Service”. In essence, a (potentially) disconnected client is given the ability to receive feedback via a separate queue, with message responses generated based on the original flawed message. I plan to implement this approach if at some point the medical demo application I started a few weeks ago warrants it.

Troubleshooting:
On a side note, under Windows 7 and Vista the user you are logged in as will not have permission to listen on a given HTTP port (i.e. to register the URL namespace). You’ll quickly see the AddressAccessDeniedException:
HTTP could not register URL http://+:8000/. Your process does not have access rights to this namespace (see http://go.microsoft.com/fwlink/?LinkId=70353 for details)

AddressAccessDeniedException


To resolve this, simply grant yourself permission to use the URL namespace on that port via an administrator command prompt call to netsh.exe:

netsh http add urlacl url=http://+:8000/ user="DOMAIN\User Name"

Note: “DOMAIN” can be the machine name if you’re not on a domain. Add a few other ports you might be using too, e.g. 8005 for the ServiceModelEx logbook service. For more info refer to this blog post, which has a detailed explanation; in the comments there is discussion of issues on other systems such as Windows Server 2003.
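
For example, the equivalent call for the logbook service port mentioned above:

netsh http add urlacl url=http://+:8005/ user="DOMAIN\User Name"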