How does the Dynamics NAV Server perform?
The delivery of the 3-tier architecture in Microsoft Dynamics NAV has opened up the possibility of developing applications that utilise the NAV 2009 middle tier web services. Performance testing the middle tier from the Role Tailored Client (RTC) is difficult, but testing the performance of the middle tier web services is much easier.
Recently we were tasked with completing some work for an existing client deployed on NAV 2009, allowing 450 users to connect to some specific functionality in NAV. They didn't want to purchase full NAV RTC users, so instead we opted to develop a web-based solution where the per-user cost would be a lot cheaper. Perfect! This is exactly what the middle tier allows us to do. From here on in this article I may refer to the middle tier services as the NAV Service Tier or NST.
Let the learning curve commence!
3 tiers on 3 computers
First off, we decided early on that there was no way this solution could be installed with the NSTs on the SQL Server box, so we dived into the solution with the tiers spanned across different servers. Early on we decided we'd need the following (although at this stage we hadn't decided how many of each we'd need):
- Client machines connecting to NAV through the web application, designed for Internet Explorer
- Web Server (IIS) hosting our web application
- Middle tier server hosting our NST instances.
- SQL Server
Service Principal Names
Challenge number one (and there were many!): Service Principal Names (SPNs). For most classic NAV installers this is new territory, and to be honest the NAV documentation doesn't explain SPNs that well. However, there is plenty of information out there explaining what they are and why they need to be set up. Our install also required some additional SPNs because we had a web server sitting in front of the NST layer.
The next decision we made was to run the different services under dedicated service accounts, including the NST instances and the SQL Server service. OK, so with everything set up, we tried a test by calling the base "services" web service in NAV to see what we could get returned from the system service.
Challenge number two. We were getting "403 unauthorised". Oh no...! OK, so a lot of Wireshark and Kerbtray use later, we realised that we had a duplicate SPN set up somewhere. Unfortunately, if you look at the message returned at the network layer, you get the same error whether you have no SPN set up or a duplicate one. This isn't great, as I can imagine a lot of NAV installers would try adding an SPN on one layer, then another, then take something away, and before they know it... it's in a right mess. Eventually we got this working, and with a little tweaking for Fully Qualified Domain Names, all was up and running.
The Web Server
Next we needed to look at how we would install and set up the web server on IIS. For the purposes of our tests we had one web server available with a quad-core processor and 4GB of RAM on Windows Server 2003 (32-bit). We decided to set up a web garden with 4 worker processes, one for each core. We allowed IIS to handle state management on the web server. The site was set up to disallow impersonation and to use Windows Authentication (important if we're going to get Kerberos to work!).
The Middle Tier Server
The middle tier server also had a quad-core processor and 4GB of RAM on Windows Server 2003 (32-bit). For the purposes of the tests we installed 10 NST instances, simply named DynamicsNAV1 through DynamicsNAV10.
Load Balancing the NST Middle Tier
Challenge number 3 - or is this 4... I'd lost count... As we all know, automatic load balancing of the middle tier is not possible through the RTC, and more on that will follow in a separate blog post... However, it is possible to do this from a web app. We built the load balancing into our web app on a simple round-robin basis, with a web.config where we could set up the generic web service addresses as well as the number of NST instances available. However, there is also no failover of the NSTs in the middle tier (a blog post on that will follow shortly too), and we needed to have that working for sign-off. So how did we achieve it? Quite simply, one of our developers decided to make routine HTTP requests to the middle tier services and analyse their responses. This way we can keep track of which instances are running and shift users around accordingly in the data layer of our web app.
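The approach above can be sketched in a few lines. This is a minimal Python illustration (our actual implementation was in the .NET web app), and the endpoint URL pattern, port, and probe logic are assumptions for the sake of the example:

```python
import itertools
import urllib.error
import urllib.request

# Hypothetical NST endpoint addresses -- in our setup the generic address
# pattern and instance count came from web.config, not hard-coded values.
NST_INSTANCES = [
    "http://nav-middletier:7047/DynamicsNAV%d/WS/SystemService" % i
    for i in range(1, 11)  # DynamicsNAV1 .. DynamicsNAV10
]

class RoundRobinBalancer:
    """Round-robin over whichever NST instances last answered a probe."""

    def __init__(self, endpoints):
        self.endpoints = list(endpoints)
        self.healthy = list(endpoints)
        self._cycle = itertools.cycle(self.healthy)

    def probe(self, timeout=5):
        """Routinely re-check every instance with a plain HTTP request
        and rebuild the rotation from the ones that respond."""
        healthy = []
        for url in self.endpoints:
            try:
                urllib.request.urlopen(url, timeout=timeout)
                healthy.append(url)
            except urllib.error.HTTPError:
                # An HTTP error response (e.g. an auth challenge) still
                # proves the instance is up and answering.
                healthy.append(url)
            except Exception:
                pass  # no response at all -- treat the instance as down
        self.healthy = healthy
        self._cycle = itertools.cycle(healthy) if healthy else None

    def next_endpoint(self):
        """Hand the next user the next healthy instance in rotation."""
        if self._cycle is None:
            raise RuntimeError("no NST instances available")
        return next(self._cycle)
```

In our web app the equivalent of `probe()` ran on a schedule, and the equivalent of `next_endpoint()` was consulted in the data layer when assigning users to an instance.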
Before the details of the test, a few more facts about the web app. It was built using .NET Framework 3.5, with ASPX-based pages. The site utilises AJAX and JavaScript quite heavily. Certain "look up" data is cached and serialised on the web server for increased performance.
The test performed was a "worst case" scenario: we wanted confidence under heavy load on both our web server and the NSTs. The test simulated a user entering a job journal line and submitting it to the database, performing a number of lookups along the way and maximising the lookup data retrieved. The scenario had the user close the browser after the successful insertion of the job journal line. This ensured that we were loading the web server far more heavily than it would be loaded in the real-world scenario.
We were licensed to test a maximum of 250 concurrent users with our performance testing software.
The test was run for 10 minutes and up to 215 concurrent users were tested. The load was generated from 3 client machines running the test case scenario detailed above. The baseline data size for the test scenario was 548.8KB. Random "think time" was used during test case scenario generation. Over the 10-minute testing cycle, 2147 test case scenarios were completed - a lot of job journal lines entered in 10 minutes!
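For a sense of scale, we can back out the implied throughput from those figures. Note the per-user rate is only indicative, since the user count ramped up to 215 rather than sitting there for the full 10 minutes:

```python
# Implied throughput from the test figures above.
scenarios = 2147        # test case scenarios completed
duration_s = 10 * 60    # 10-minute test run
peak_users = 215        # concurrent users at peak

# Overall rate across the whole run: roughly 3.6 scenarios per second.
per_second = scenarios / duration_s

# Per-user rate at the peak user level: roughly one complete job journal
# line per user per minute (an underestimate, given the ramp-up).
per_user_per_min = scenarios / 10 / peak_users

print(round(per_second, 1), round(per_user_per_min, 2))
```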
A screen shot showing a summary of the test results is shown below.
There were 7 tests performed, covering:
- Page Completion Rate
- Page Duration
- Transaction (URL) Completion Rate
- Page Failures
- Bandwidth Consumption
- Test scenarios per minute
- Waiting users
All 7 of the goals analysed were passed. The graphs produced (shown below) scaled linearly, which implies that any limitations are likely to be hardware-based rather than in the software itself.
The page duration goal was set to an average page duration of less than 4,000ms.
There were no page failures during the test.
The average page duration was well below the 4,000ms limit set, though there were some spikes as shown below. As the number of users increased we did see some waiting users, as expected, but the results were reasonably encouraging.
The Page Completion Rate chart shows the total number of pages completed per second during the sample periods summarised for each user level. In a well-performing system, this number should scale linearly with the applied load (number of users).
The Transaction (URL) Completion Rate chart shows the total number of HTTP transactions (URLs) completed per second during the sample periods summarised for each user level. In a well-performing system, this number should scale linearly with the applied load (number of users).
The Bandwidth chart shows the total bandwidth consumed by traffic generated directly by the load test engines during the sample periods summarised for each user level. In a system that is not constrained by bandwidth, this number should scale linearly with the applied load (number of users). Note that other sources of bandwidth may be active during the test and may even be caused indirectly by the load test but may not be included in this metric.
The Testcase Completion Rate chart shows the total number of testcases completed per minute during the sample periods summarised for each user level. In a well-performing system, this number should scale linearly with the applied load (number of users).
The Waiting Users and Average Wait Time metrics help diagnose certain types of performance problems. For example, they can help determine what pages users have stopped on when a server becomes non-responsive. The 'Waiting Users' metric counts the number of users waiting to complete a web page at the end of the sample periods summarised for each user level. The 'Average Wait Time' describes the amount of time, on average, that each of those users has been waiting to complete the page.
Sorted by the elapsed test time, this table shows some of the key metrics that reflect the performance of the test as a whole. Note times are in milliseconds and bandwidth in bytes...
| Users | Pages per sec | Hits per sec | Bandwidth Out | Min Page Dur | Avg Page Dur | Max Page Dur | Waiting Users | Average Wait Time |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
Apart from the conclusions already drawn above, it is worth noting some additional observations.
We found that allocating about 20 users per NST is a realistic figure for performance purposes.
Each NST consumed about 50MB of RAM.
Consumption of RAM and processing power on our SQL Server box was not an issue.
IIS was not fully utilised, running at about 60% of the total CPU available.
Based on the scaled test results, IIS was likely to fall over before the NSTs would. Clearly, scaling the NSTs further would simply have meant installing another instance and changing the web.config.
We performed similar tests with the IIS layer and the NST layer virtualised and noticed some performance drops. Further testing showed that this drop was probably due to virtualising the NST layer rather than the IIS layer.
Given the figures generated in our test we were able to scale to support 450 users. The eventual setup ended up being hosted with the web server and NST layer being virtualised.
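The back-of-an-envelope sizing from the observations above looks like this. These are our test's own numbers, not general NAV sizing guidance:

```python
import math

# Rough capacity planning from the figures observed in our test.
target_users = 450    # users the client needed to support
users_per_nst = 20    # realistic allocation per NST we settled on
ram_per_nst_mb = 50   # observed RAM consumption per NST instance

# Instances required, rounding up: 450 / 20 -> 23 NSTs.
nst_needed = math.ceil(target_users / users_per_nst)

# RAM budget for the NST layer alone: 23 * 50MB = 1150MB.
ram_needed_mb = nst_needed * ram_per_nst_mb

print(nst_needed, ram_needed_mb)
```

This is also why we installed more instances than strictly needed: with no automatic failover, spare instances are what the health-check rotation falls back on.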
Some Points to Remember
- Read up and learn about SPNs.
- Get familiar with Wireshark and Kerbtray.
- Use Fully Qualified Domain Names in the SPNs.
- If developing something like a web app then try and get everything working properly first with the RTC connecting from a client PC. Then you can tell if the issue is in your setup or in your external app.
- Have a Domain Admin present to assist you. Best not to go playing around on the client's domain!
- Go 64-bit if you can.
- Always use the latest releases of the NSTs if you can get them from Microsoft, as performance improvements are ongoing all the time.
- Forget about trying to do this on NAV 2009 if you aren't using SP1. We noticed significant performance increases between the non-SP1 service tiers and their SP1 equivalents.
- If setting up a large install of the RTC, remember there is no automatic load balancing or automatic failover of the NSTs.
- If your server allows it then install more NST's than realistically needed.
- Ensure you have a suitable failover position in place should one NST fail - watch this space for a future blog post on this subject...
- Be prepared for a few headaches, as the 3-tier technology now being made available in Microsoft Dynamics NAV is still quite new.
I hope you have found this post to be helpful and of some use. If you have read all the way to here then THANK YOU! as it did take me some time to put this blog post together...