Jun 18, 2010

Creating a Cluster Resource DLL (Part 2)

Hi,

In this series of blog posts we will help you to design, develop and debug the Resource DLL you are developing to give your application high-availability with Windows Server 2008 & 2008 R2 Failover Clustering.

 

We recommend you start with the first blog post in the series:

·         Part 1: http://blogs.msdn.com/clustering/archive/2010/03/11/9976620.aspx

 

Now that we’ve described which components are involved, and how they interact, let’s see what the resource state machine looks like from the point of view of RHS.

 

RHS interacts with the Resource DLL through the Resource DLL entry points. You can find detailed information about them on MSDN:

http://msdn.microsoft.com/en-us/library/aa372244(VS.85).aspx

 

The list below shows the entry points that a Resource DLL is expected to implement (quorum-capable resources also implement Arbitrate and Release, which this post does not cover). Think of it as a mapping between common cluster operations and how they interact with this new resource type. After loading the Resource DLL, RHS calls Startup to pass the DLL pointers to some RHS functions (the LogEvent and SetResourceStatus callbacks), and to receive from the DLL a function table with pointers to the resource’s entry points.

·         Close removes a resource instance from the cluster

·         IsAlive determines if a resource is actually operational

·         LooksAlive determines if a resource appears to be available for use

·         Offline performs a graceful shutdown of the resource

·         Online starts the resource and makes it available to the cluster

·         Open creates a new resource instance

·         ResourceControl supports resource control codes

·         ResourceTypeControl supports resource type control codes

·         Startup receives the LogEvent and SetResourceStatus callbacks and returns a function table

·         Terminate performs an immediate shutdown of the resource

 

Please note that ResourceControl and ResourceTypeControl are optional. If the Resource DLL does not implement them, it only means that this resource and resource type will not have private properties.
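To make the function table more concrete, here is a minimal sketch in Python. It is only a model, not the real C API documented on MSDN: `FunctionTable`, `startup`, `log_event` and `set_resource_status` are hypothetical stand-ins for the structures and callbacks that RHS and the Resource DLL exchange, and every call completes synchronously. The two optional entry points default to `None`, which corresponds to a resource type without private properties.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class FunctionTable:
    """Hypothetical model of the table a Resource DLL hands back from Startup."""
    open: Callable[[str], Any]
    close: Callable[[Any], None]
    online: Callable[[Any], str]
    offline: Callable[[Any], str]
    terminate: Callable[[Any], None]
    looks_alive: Callable[[Any], bool]
    is_alive: Callable[[Any], bool]
    # Optional entry points: None means no private properties are exposed.
    resource_control: Optional[Callable] = None
    resource_type_control: Optional[Callable] = None


def startup(log_event: Callable[[str], None],
            set_resource_status: Callable[[str], None]) -> FunctionTable:
    """Keep the RHS callbacks and return this DLL's entry points."""

    def open_(name: str) -> dict:
        log_event(f"Open {name}")
        return {"name": name, "online": False}   # per-instance context

    def close(ctx: dict) -> None:
        log_event(f"Close {ctx['name']}")

    def online(ctx: dict) -> str:
        ctx["online"] = True
        set_resource_status("online")            # report the new state to RHS
        return "success"

    def offline(ctx: dict) -> str:
        ctx["online"] = False
        set_resource_status("offline")
        return "success"

    def terminate(ctx: dict) -> None:
        ctx["online"] = False                    # immediate, no graceful steps

    def looks_alive(ctx: dict) -> bool:
        return ctx["online"]                     # cheap check

    def is_alive(ctx: dict) -> bool:
        return ctx["online"]                     # a real DLL would check more

    return FunctionTable(open_, close, online, offline, terminate,
                         looks_alive, is_alive)
```

A real Online that takes a while would typically return a pending status and report progress through SetResourceStatus rather than finishing inside the call.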

 

Open is called to notify the Resource DLL that a new instance of this resource has been created. Once Open has returned, the Resource DLL should be ready to receive ResourceControl calls for that instance. Close is called to notify the Resource DLL that the resource instance is being deleted.

 

At this point the Resource DLL and RHS are set up to start handling resources. For every resource handled in the context of a given RHS process, the appropriate Resource DLL receives an Open call. In the Open call, the Resource DLL should allocate a context to hold information about this instance of the resource. If Open completes successfully, the resource is in the Opened state, Offline sub-state.
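The per-instance context described above can be sketched as follows. This is a hypothetical Python model: `ResourceContext`, `ResourceDll` and their fields are illustrative names, and the integer returned by `open` merely plays the role of the RESID that the real Open entry point returns. Open allocates the context, and Close releases it.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ResourceContext:
    """Per-instance data a Resource DLL might allocate in Open (hypothetical fields)."""
    name: str
    substate: str = "Offline"      # Opened state, Offline sub-state right after Open
    private_props: Dict[str, str] = field(default_factory=dict)


class ResourceDll:
    """Tracks one context per opened resource instance."""

    def __init__(self) -> None:
        self._next_id = itertools.count(1)
        self._contexts: Dict[int, ResourceContext] = {}

    def open(self, name: str) -> int:
        """Allocate a context; the returned id stands in for the RESID."""
        rid = next(self._next_id)
        self._contexts[rid] = ResourceContext(name)
        return rid

    def context(self, rid: int) -> ResourceContext:
        return self._contexts[rid]

    def close(self, rid: int) -> None:
        """Release whatever open() allocated for this instance."""
        del self._contexts[rid]

    def open_count(self) -> int:
        return len(self._contexts)
```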

 

When the cluster stops, or when a resource is moved to another monitor, RCM and RHS make sure the resource is brought to the offline state and that Close is called for it, so the Resource DLL can clean up before the resource is opened again (for example, in the new monitor).

 

In this post we will not discuss the calls that are related to the arbitration of the quorum resources – Arbitrate, and Release.  Instead we will concentrate on the calls Online, Offline, Terminate, IsAlive, LooksAlive and ResourceControl.

 

From the RHS perspective, while in the Opened state, a resource can be in the Offline, Onlining, Online or Offlining sub-state. Onlining and Offlining have sub-states of their own, and the resource is always in exactly one of them. Transitions between states happen as RHS makes calls to the Resource DLL and depend on the results of those calls. Please remember that in this post we are talking about states in RHS. RHS states are internal to the cluster implementation and are not exposed to the user through the UI or APIs. They are different from RCM states, such as “offline”, which you can see in the UI or through programmatic interfaces.
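The top-level sub-states can be modeled as a small transition table. This is a simplified, hypothetical sketch: the event names are made up, the nested sub-states of Onlining and Offlining are left out, and the behavior on a failed Online and on Terminate is an assumption for illustration only.

```python
# Hypothetical, simplified model of the RHS sub-states inside the Opened state.
# The nested sub-states of Onlining and Offlining are deliberately left out.
TRANSITIONS = {
    ("Offline", "Online called"): "Onlining",
    ("Onlining", "Online succeeded"): "Online",
    ("Onlining", "Online failed"): "Offline",      # assumption for this sketch
    ("Online", "Offline called"): "Offlining",
    ("Offlining", "Offline completed"): "Offline",
}


def next_substate(current: str, event: str) -> str:
    """Return the sub-state RHS moves to after `event`."""
    if event == "Terminate completed":
        return "Offline"                            # assumed to apply from any sub-state
    try:
        return TRANSITIONS[(current, event)]
    except KeyError:
        raise ValueError(f"{event!r} is not expected in sub-state {current!r}") from None


state = "Offline"
for event in ("Online called", "Online succeeded", "Offline called", "Offline completed"):
    state = next_substate(state, event)
    print(event, "->", state)
```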

 

We can split the calls into several categories based on how RHS and the Resource DLL handle them.

·         Online and Offline are more complicated because they have the concept of a “pending” state, and RHS handles timeouts of these calls differently

·         ResourceControl can be sent to the resource in any sub-state of the Opened state

·         Terminate can be issued concurrently with any other call, except Open and Close

·         LooksAlive and IsAlive can be sent to the resource only while it is Online
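These categories can be summarized as a predicate over the Opened sub-states. This is a hypothetical Python sketch, not RHS code: the rules for ResourceControl, Terminate, LooksAlive and IsAlive follow the list above, while the rules for Online and Offline (starting only from Offline and Online respectively) are assumptions added for illustration.

```python
OPENED_SUBSTATES = {"Offline", "Onlining", "Online", "Offlining"}


def can_send(call: str, substate: str) -> bool:
    """Return True if RHS may issue `call` while the resource is in `substate`."""
    if substate not in OPENED_SUBSTATES:
        raise ValueError(f"unknown sub-state {substate!r}")
    if call in ("ResourceControl", "Terminate"):
        return True                      # allowed in any Opened sub-state
    if call in ("LooksAlive", "IsAlive"):
        return substate == "Online"      # health checks only while Online
    if call == "Online":
        return substate == "Offline"     # assumption: Online starts from Offline
    if call == "Offline":
        return substate == "Online"      # assumption: Offline starts from Online
    raise ValueError(f"call {call!r} is not modeled here")
```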

 

Below you can see a state chart diagram of the RHS resource state machine. I would like to stress that this is the RHS state machine. The Resource DLL state machine might be based on it, but might add other sub-states specific to the application. To interpret this diagram, you need to be aware that it is a hierarchical state machine diagram, also known as a UML state machine, as described here: http://en.wikipedia.org/wiki/UML_state_machine.

[Diagram: RHS resource state machine (UML state chart)]


 

In the next post in this series we will walk through several scenarios using the above state diagram.

 

Thanks,
Vladimir Petter
Senior Software Development Engineer
Clustering & High-Availability
Microsoft

 
Jun 18, 2010

Troubleshooting: Cluster UI Says a Server is Already Joined to a Cluster

Hi Cluster Fans,

 

In this post I will discuss a deployment issue that is sometimes seen on Windows Server 2008 and 2008 R2, and how to get around it. Let’s say you want to Validate a server, create a cluster or add a node to a cluster. We have a requirement that a node can be a member of only one cluster at a time. When you try one of these operations, you may see a message telling you that one of the servers you want to use, a non-clustered server, is already part of a cluster, so you cannot continue. Why? The message states "The computer '<Server Name>' is joined to a cluster."

 


 

 

This happens because the node was at some point a cluster node, but when the cluster was destroyed or the node was evicted, the cluster components on this node were not cleaned up properly. This can happen if the node was offline while the original cluster was destroyed: the other nodes were cleaned up, but this node was never ‘unclustered’.

 

 

If you try to connect to that node using Failover Cluster Manager you will see that it attempts to connect, then it will time out after about 5 minutes.

 


 

 

If you check the status of the node using PowerShell, it will report that the node is in a ‘Joining’ state.

 


 

 

If you get into this situation, you can run the following eviction commands directly on the node to properly clean up the cluster components from that node so that you can reuse it in another cluster.

 

                PowerShell (R2 only):

   PS> Get-ClusterNode NodeName | Remove-ClusterNode -Force

 

                Cluster.exe:

                                CMD> cluster.exe node NodeName /forcecleanup

 

 

To avoid getting to this state, you should make sure that all of your nodes are online when you destroy the cluster or evict a node, so that the cluster can properly clean up the clustering components on every node.        

 

If you plan to evict a node which has resources on it, you should first gracefully move the resources off it to the best nodes (using Move Group, or live migration for VMs). If you forget, the cluster will still fail over the resources for you, but possibly not in the way you would prefer.
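The “move first, then evict” step amounts to picking a target node for every group the departing node owns. The Python sketch below is hypothetical and only illustrates the planning logic: `plan_moves`, its inputs, and the rule used for “best node” (the first online preferred owner, else any other online node alphabetically) are assumptions, not how Failover Cluster Manager chooses targets.

```python
from typing import Dict, List, Set


def plan_moves(owners: Dict[str, str], preferred: Dict[str, List[str]],
               evicting: str, online_nodes: Set[str]) -> Dict[str, str]:
    """Choose a target node for every group currently owned by `evicting`."""
    plan: Dict[str, str] = {}
    for group, owner in owners.items():
        if owner != evicting:
            continue                                   # group is not affected
        # First choice: the group's preferred owners, in order, that are online.
        targets = [n for n in preferred.get(group, [])
                   if n != evicting and n in online_nodes]
        if not targets:
            # Fallback: any other online node, picked alphabetically.
            targets = sorted(online_nodes - {evicting})
        if not targets:
            raise RuntimeError(f"no online node can host {group!r}")
        plan[group] = targets[0]
    return plan
```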

 

 

Thanks,

Symon Perriman

Program Manager II
Clustering & High-Availability

Microsoft

 
Jun 18, 2010

Creating a Cluster Resource DLL (Part 1)

Hi,

In this series of blog posts we will review the Resource State Machine from the point of view of the Resource Hosting Subsystem (RHS or Resource Monitor). Hopefully it will help you design, develop and debug the Resource DLL you are developing to integrate your application with Windows Server 2008 & 2008 R2 Failover Clustering.

 

Introduction

Let’s first review which components are involved in managing resource state and how they interact with each other.

If you need to integrate your application with Failover Clustering, you have to write a Resource DLL. This allows your custom resource to interact with the cluster and perform common tasks, such as bringing the resource online or offline. The Resource DLL is responsible for translating cluster resource state transition commands into commands specific to the application, and for bringing the application into the state that corresponds to the current resource state.

The image below shows a 2-node cluster interacting with resources.

 

[Image: a two-node cluster running Group 1 with Resource 1, Resource 2 and Resource 3; Resource 3 runs in a separate RHS process]

 

Both nodes have the Cluster Service running. The cluster has one group configured, Group 1. The group contains three resources: Resource 1 depends on Resource 2, and Resource 2 depends on Resource 3. You will typically see this relationship in a group with an Application (Resource 1) that depends on a NetName (Resource 2), which depends on an IP Address (Resource 3).

 

Resource Control Manager (RCM)

In the Cluster Service there is a component called the Resource Control Manager (RCM). The RCM instances on all nodes negotiate which node owns the group. The group owner brings the group to its “persistent state”, which is the last state the user put the group in: Online or Offline. Nodes that do not own the group will not bring it online, which ensures that it is online on at most one node at a time. If the persistent state of Group 1 is Online and Node 1 owns the group, Node 1 brings it online and Node 2 makes sure it is offline.
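The ownership rule can be expressed as a tiny function that each node evaluates. This is a simplified, hypothetical Python sketch: the negotiation that elects the owner is not modeled, and `desired_group_state` and `cluster_view` are illustrative names only.

```python
from typing import Dict, List


def desired_group_state(node: str, owner: str, persistent_state: str) -> str:
    """State RCM on `node` drives the group toward (simplified sketch)."""
    if persistent_state not in ("Online", "Offline"):
        raise ValueError("persistent state must be Online or Offline")
    # Only the owner honors the persistent state; every other node keeps the
    # group offline, so the group is online on at most one node.
    return persistent_state if node == owner else "Offline"


def cluster_view(nodes: List[str], owner: str, persistent_state: str) -> Dict[str, str]:
    """Desired state of the group on every node."""
    return {n: desired_group_state(n, owner, persistent_state) for n in nodes}
```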

To bring the group online, RCM brings all the resources in the group online in dependency order. First, RCM sends the Online command to Resource 3. As soon as Resource 3 is online, RCM sends Online to Resource 2, and as soon as Resource 2 is online, RCM sends Online to Resource 1. Once Resource 1 is online, Group 1 is online. This assumes that no failures happen during the online process.
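Ordering resources by their dependencies is a topological sort. The post does not say how RCM implements it, so the Python sketch below is only an illustration using the standard library’s `graphlib` (Python 3.9+), which raises `graphlib.CycleError` if the dependencies are circular. Treating offline as the reverse order (dependents first) is an assumption of the sketch.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def online_order(depends_on: Dict[str, Set[str]]) -> List[str]:
    """Order resources so that each one comes after everything it depends on."""
    return list(TopologicalSorter(depends_on).static_order())


def offline_order(depends_on: Dict[str, Set[str]]) -> List[str]:
    """Offline runs the other way: dependents first."""
    return list(reversed(online_order(depends_on)))


group1 = {
    "Resource 1": {"Resource 2"},   # Application depends on NetName
    "Resource 2": {"Resource 3"},   # NetName depends on IP Address
    "Resource 3": set(),            # IP Address has no dependencies
}
print(online_order(group1))   # ['Resource 3', 'Resource 2', 'Resource 1']
```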

 

Resource Hosting Subsystem (RHS)

Changing a resource state requires interacting with the actual application. To do that, RCM sends state transition requests to the Resource DLL. Because the Resource DLL interacts with the application, it can experience failures, such as a call taking too long or an exception. Failures in a Resource DLL, such as exceptions or deadlocks, should not bring the Cluster Service down. To achieve that, the Cluster Service never loads a Resource DLL into its own process. Instead, it spawns a child process, the Resource Hosting Subsystem (RHS or Resource Monitor), which loads the Resource DLL.

Normally an RHS process is shared among many resources, but if you see a flaky resource, you can move it to a separate monitor using the resource properties so that it is isolated from stable resources. In the image above, Resource 3 is isolated in a separate RHS process.
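The effect of isolating a resource can be pictured as grouping resources by the RHS process that hosts them. This hypothetical Python sketch models only the per-resource flag (modeled on the SeparateMonitor resource property) and ignores any other reason the cluster might use additional RHS processes; the process names are made up.

```python
from collections import defaultdict
from typing import Dict, List


def assign_monitors(separate_monitor: Dict[str, bool]) -> Dict[str, List[str]]:
    """Group resources by the RHS process that would host them.

    `separate_monitor` maps a resource name to a flag modeled on the
    SeparateMonitor resource property: True means "give it its own RHS".
    """
    monitors: Dict[str, List[str]] = defaultdict(list)
    for name, separate in separate_monitor.items():
        host = f"RHS for {name}" if separate else "shared RHS"
        monitors[host].append(name)
    return dict(monitors)
```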

If your application requires multiple resource types, you can implement them in separate Resource DLLs or place them all in the same Resource DLL.

For more information about RHS, see this blog post: http://blogs.msdn.com/clustering/archive/2009/06/27/9806160.aspx

 

Other Resources

You can find more information about the various cluster components in the following MSDN articles:

·         Cluster Resource Monitor: http://msdn.microsoft.com/en-us/library/aa372266(VS.85).aspx

·         Cluster Resource DLLs: http://msdn.microsoft.com/en-us/library/aa372239(VS.85).aspx

·         Resource DLL Entry Points: http://msdn.microsoft.com/en-us/library/aa372244(VS.85).aspx

·         Implementing Resource Health Monitoring: http://msdn.microsoft.com/en-us/library/aa372255(VS.85).aspx

·         Managing Resource State Transitions: http://msdn.microsoft.com/en-us/library/aa370988(VS.85).aspx

 

In the next posts in this series we will discuss how these components interact with the cluster, describe the entry points and give some examples.

Part 2 is now available: http://blogs.msdn.com/clustering/archive/2010/03/30/9987135.aspx

 

Thanks,
Vladimir Petter
Senior Software Development Engineer
Clustering & High-Availability
Microsoft

 
Jun 18, 2010

Wanted: A Good Cluster Developer to Support 2008 R2

Hi Cluster Fans,

 

Microsoft’s Windows Sustained Engineering (WinSE) organization has a job opening for a developer to support Failover Clustering and Network Load Balancing (NLB) technologies.  WinSE is Microsoft’s servicing branch which takes requests to fix issues after the product has been shipped, such as Hotfixes, QFEs and Service Packs.  The WinSE File Server Technologies team is looking for a developer to support our Windows Server 2008 R2 technology which has recently shipped.  This team also supports other technologies and this information can be found in the job posting.

 

Job ID: 708776

Link: https://careers.microsoft.com/JobDetails.aspx?ss=&pg=0&so=&rw=1&jid=13184&jlang=EN

 

If you or someone you know would be a good candidate, please follow the application instructions in the link to apply for this position.

 

Hopefully we will be working together soon!

 

Thanks,
Symon Perriman
Program Manager II

Clustering & High-Availability
Microsoft

 
Jun 18, 2010

Have a Passion for High-Availability? We are looking for new Cluster MVPs!

Hi Cluster Fans,

 

The MVP program is a way in which we thank active members in the Cluster and HA community for their contributions towards growing and supporting our products (more info).  Our MVP program focuses on the Failover Clustering and Network Load Balancing technologies.  We are looking to grow our MVP program by adding Cluster gurus from all regions of the world, in all positions and expertise, with focuses on a variety of sub-areas (like Storage with Clustering, SQL clustering, Hyper-V clustering, multi-site clustering, etc.).

 

MVPs are people who are genuinely interested in and passionate about Clustering and High Availability. They go out of their way to promote HA, help others, and make clustering a primary consideration in IT deployments and planning. They get feedback from customers about what works, what issues they are finding, and how we can improve our product. Essentially our MVPs are our lead customers: we run numerous ideas by them to get their feedback and let them explore our future products in detail under our NDA, giving them exposure and the ability to influence our technology in ways which the public cannot. Additionally, our MVPs get a direct channel to the product team to troubleshoot issues, suggest improvements, and find solutions to problems that would otherwise have required a call to support.

 

This past week was the annual MVP Summit, where Microsoft brings together technical experts from around the world to our Redmond campus for a week of deep technical discussions and product planning.

 

There are currently 12 Cluster MVPs:

·         William Bressette, Canada, http://www.horn-it.com/

·         Andrew Cheng, Singapore, http://andrewchengnh.spaces.live.com

·         Allan Hirt, USA, http://sqlHA.com  

·         Russ Kaufmann, USA, http://www.clusterhelp.com, http://msmvps.com/blogs/clusterhelp/Default.aspx

·         Joachim Nasslander, Sweden, http://www.nullsession.com/

·         Nail Own, Germany, http://www.cluadmin.de/

·         John Savill, USA, http://www.SavillTech.com

·         Robert Smit, The Netherlands, http://fiberman.spaces.live.com

·         Michael Steineke, USA, http://bighammer.com/

·         John Toner, USA, http://msmvps.com/blogs/jtoner/

·         Edwin van Mierlo, Ireland, http://msmvps.com/blogs/edwinvanmierlo/

·         Hans Vredevoort, The Netherlands, http://hyper-v.nu

 


 

To become an MVP, we would need to see continual community involvement from you over about a 12-month period. Just to set your expectations, this is not something you get ahead of time with the promise that you will be active in the community; it is a reward you receive after you have been active, because that shows you genuinely have a passion for high availability.

 

So, what do we consider ‘community interaction’?  Well there are numerous ways in which you can get involved, including, but not limited to:

  • Activity and helping people in the forums and newsgroups:

·         2008 Forum: http://forums.technet.microsoft.com/en-US/winserverClustering/threads/

·         2008 R2 Forum: http://social.technet.microsoft.com/Forums/en-US/windowsserver2008r2highavailability/threads/

·         2003/2008/R2 Newsgroup: https://www.microsoft.com/technet/community/newsgroups/dgbrowser/en-us/default.mspx?dg=microsoft.public.windows.server.clustering

  • A blog about the topic (you can see our MVPs’ blogs listed above)
  • Give training
  • Give presentations
  • Write articles, books, training guides, whitepapers
  • Anything else where you help others by sharing your expertise with the public to help build their clustering knowledge

 

If you are interested, and whenever you do something in the community (such as give a talk or training), please let me know and I can start tracking your contributions.  Simply send an email to   ClusMVP (at) Microsoft (dot) com   and I will add the information.  No contribution is too small!

 

Please let me know if you have any further questions and thanks for your interest.

 

Sincerely,

Symon Perriman
Program Manager II & MVP Lead

Clustering & High-Availability

Microsoft

 
Jun 18, 2010

New Validation Tests in 2008 R2 Failover Clustering

Hi Cluster Fans,

 

Our Validate a Configuration Wizard was so popular that we’ve improved it in Windows Server 2008 R2.  Validate is the tool which verifies that your entire configuration is suitable for Failover Clustering.  It will test the servers, networks, storage, run a series of failover tests, and inventory all the configuration information into saved reports.  It can be run before, during or after deployment as a troubleshooting tool. 

 

 

Cluster Support

Running ‘Validate’ and making sure that no tests fail is one of only two requirements to have a supported Failover Cluster in Windows Server 2008 / R2.  The other is that each component has a Windows Server 2008 / R2 Logo.

 

More information about the Validate a Configuration Wizard: http://technet.microsoft.com/en-us/library/cc732035(WS.10).aspx

More information about the Logo Program: http://www.microsoft.com/whdc/winlogo/default.mspx

 

 

New Tests for Windows Server 2008 R2

There are 5 categories of tests in Windows Server 2008 R2.  The entire Cluster Configuration category is new, but this set of tests will only be run on clusters which have already been created.  It lists useful information about the current configuration which helps troubleshooters easily understand how it is deployed.  The image below is an example of some of the new, prescriptive information from Validate tests.

 

[Image: example of the new, prescriptive information in a Validate report]

 

The Inventory and Storage categories run the same tests, with some minor changes.  The Network and System Configuration categories have a few additions.

 

In this image, the highlighted tests are new.

 

[Image: list of Validate tests, with the new tests highlighted]

 

Cluster Configuration

       List * – Provides an overview of the Core Group, Networks, Resources, Storage, Services and Applications. These tests give useful information about how the resources are configured and include graphical dependency reports.

       Validate Quorum Configuration – Checks whether the quorum mode in use is the recommended one, based on the number of nodes and the availability of storage, similar to the “Configure Cluster Quorum Wizard”. It also checks the recommended values for quorum arbitration time.

       Validate Resource Status - Validates that cluster resources are online and lists the cluster resources that are running in separate resource monitors. If a resource is running in a separate resource monitor, it is usually because the resource failed and the Cluster service began running it in a separate resource monitor (to make it less likely to affect other resources if it fails again).

       Validate Service Principal Name - Issues a warning if the Service Principal Name (SPN) cannot be found for a Kerberos-enabled network name. The SPN is used to verify the identity of the computer a client is connecting to.

       Validate Volume Consistency - If any volumes are flagged as inconsistent ("dirty"), provides a reminder that running chkdsk is recommended.
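The kind of recommendation the quorum test compares against can be sketched as a simple rule of thumb: an odd number of nodes can use Node Majority, while an even number of nodes adds a witness vote (a disk if shared storage is available, otherwise a file share). This Python sketch is a simplification for illustration; `recommended_quorum_mode` and `quorum_warning` are hypothetical names, and the real wizard and Validate test consider more than these two inputs.

```python
from typing import Optional


def recommended_quorum_mode(node_count: int, has_witness_disk: bool) -> str:
    """Simplified rule of thumb for choosing a quorum mode."""
    if node_count < 1:
        raise ValueError("a cluster needs at least one node")
    if node_count % 2 == 1:
        return "Node Majority"                 # odd node count: nodes alone form a majority
    if has_witness_disk:
        return "Node and Disk Majority"        # even node count: add a disk witness vote
    return "Node and File Share Majority"      # even node count, no shared disk


def quorum_warning(node_count: int, has_witness_disk: bool,
                   current_mode: str) -> Optional[str]:
    """Return a warning if the current mode differs from the recommendation."""
    expected = recommended_quorum_mode(node_count, has_witness_disk)
    if current_mode == expected:
        return None
    return f"Quorum mode {current_mode!r} is not the recommended {expected!r}"
```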

 

Network

       List Network Binding Order - Lists the order in which the network adapters are bound on each node.

       Validate Multiple Subnet Properties - If the cluster spans multiple subnets, retrieves the settings of all network name resources, checks whether the private properties HostRecordTTL, RegisterAllProvidersIP and PublishPTRRecords are optimal for that configuration, and validates that DNS-related settings are configured appropriately for multi-subnet clusters.

 

System Configuration

·         Validate Cluster Service and Driver Settings – Validate startup settings used by services and drivers, including the Cluster service, CSVFilter.sys, NetFT.sys, and Clusdisk.sys.

·         Validate Memory Dump Settings - Validate that none of the nodes currently require a reboot (as part of a software update) and that each node is configured to capture a memory dump if it stops running.

·         Validate System Drive Variable - Validate that all nodes have the same value for the system drive environment variable, such as C:\

·         Validate Operating System Installation Options – Checks that all nodes use the same installation option, either Server Core or Full. All nodes must run the same installation option because not all roles and features are supported on Server Core, so workloads could not fail over to Server Core nodes if the role or feature is not available there.

o   This test replaced the ‘Validate Operating Systems’ test, which was no longer necessary because the x86 architecture is not supported in Windows Server 2008 R2; Validate now checks that all nodes are x64 or IA64 when they are added to the list of servers to validate.
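Several of these System Configuration tests share one idea: collect a setting from every node and flag it if the nodes disagree. The Python sketch below illustrates that idea only; `inconsistent_settings` and the setting names in the inventory are hypothetical, and a setting missing on some node counts as a mismatch.

```python
from typing import Dict, List


def inconsistent_settings(inventory: Dict[str, Dict[str, str]]) -> List[str]:
    """Return the names of settings whose value is not the same on every node.

    A setting missing on some node counts as a mismatch.
    """
    names = set()
    for settings in inventory.values():
        names.update(settings)
    mismatched = []
    for name in sorted(names):
        values = {settings.get(name) for settings in inventory.values()}
        if len(values) > 1:
            mismatched.append(name)
    return mismatched


nodes = {
    "Node 1": {"SystemDrive": "C:", "InstallationOption": "Full"},
    "Node 2": {"SystemDrive": "C:", "InstallationOption": "Server Core"},
}
print(inconsistent_settings(nodes))   # ['InstallationOption']
```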

 

 

For more information about all of the Validation tests, visit: http://technet.microsoft.com/en-us/library/cc726064.aspx

 

We will continue to improve ‘Validate’ in our future products, so send us feedback about which new tests you would like to see.  You can send feedback by clicking on the ‘Email’ link in the upper right corner of this page.

 

Thanks,

Symon Perriman
Program Manager II
Clustering & High-Availability
Microsoft

 




