Hi,

In this series of blog posts we will help you design, develop and debug the Resource DLL that gives your application high availability with Windows Server 2008 & 2008 R2 Failover Clustering. We recommend you start with the first blog post in the series:

· Part 1: http://blogs.msdn.com/clustering/archive/2010/03/11/9976620.aspx

Now that we’ve described which components are involved and how they interact, let’s see what the resource state machine looks like from the point of view of RHS. RHS interacts with the Resource DLL through the Resource DLL entry points. You can find detailed information about them on MSDN: http://msdn.microsoft.com/en-us/library/aa372244(VS.85).aspx

The list below shows all the entry points that a resource DLL is expected to implement. Think of this as a mapping between common cluster operations and how they specifically interact with this new resource type. After the Resource DLL is loaded, RHS will call Startup to give the Resource DLL pointers to some of its functions, and to get from the Resource DLL pointers to the resource’s functions.

· Close removes a resource instance from the cluster
· IsAlive determines if a resource is actually operational
· LooksAlive determines if a resource appears to be available for use
· Offline performs a graceful shutdown of the resource
· Online starts the resource and makes it available to the cluster
· Open creates a new resource instance
· ResourceControl supports resource control codes
· ResourceTypeControl supports resource type control codes
· Startup receives the LogEvent and SetResourceStatus callbacks and returns a function table
· Terminate performs an immediate shutdown of the resource

Please note that ResourceControl and ResourceTypeControl are optional. If you do not provide them, it only means that this resource and resource type will not have private properties.

Open is called to notify the Resource DLL that a new instance of this resource has been created.
After Open has been called, the Resource DLL should be ready to receive ResourceTypeControls. Close is called to notify the Resource DLL that the resource instance is being deleted.

At this point the Resource DLL and RHS are set up to start handling resources. For every resource that will be handled in the context of the given RHS process, the appropriate Resource DLL will get an Open call. In the Open call the Resource DLL should allocate the context that keeps information about this instance of the resource. If Open completes successfully, then the resource is in the Opened state, Offline sub-state. When the cluster stops, RCM and RHS will make sure to bring the resource to the Offline state and to call Close for that resource, so that the Resource DLL can perform clean-up before the resource is opened again in the new monitor.

In this post we will not discuss the calls related to the arbitration of quorum resources, Arbitrate and Release. Instead we will concentrate on the calls Online, Offline, Terminate, IsAlive, LooksAlive and ResourceControl.

From the RHS perspective, while in the Opened state, a resource can be in the Offline, Onlining, Online or Offlining sub-state. Onlining and Offlining have sub-states of their own, and the resource will be in one of them. Transitions between states happen as RHS makes calls to the Resource DLL, based on the results of those calls.

Please remember that in this post we are talking about states in RHS. RHS states are internal to the cluster implementation and are not exposed to the user through the UI or APIs. They are different from RCM states, such as “offline”, which you can see from the UI or programmatic interfaces.

We can split the calls into several categories based on how RHS and the Resource DLL handle them.
· Online and Offline are more complicated because they have a concept of being in a “pending” state, and RHS handles timeouts of these calls differently
· ResourceControl can be sent to the resource in any sub-state of the Opened state
· Terminate can be issued concurrently with any other call, except Open and Close
· LooksAlive and IsAlive can be sent to the resource only while it is Online

Below you can see a state chart diagram that shows the RHS resource state machine. I would like to stress that this is the RHS’s state machine. The Resource DLL state machine might be based on this state machine, but might have some other sub-states specific to the application. To interpret this diagram, you need to be aware that it is a hierarchical state machine diagram, also known as a UML State Machine, as described here: http://en.wikipedia.org/wiki/UML_state_machine.
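As a rough illustration of the call rules listed above, the following Python sketch encodes which calls RHS may send in each sub-state of the Opened state. It is a simplified model, not RHS code: the names SubState, ALLOWED_CALLS and is_allowed are hypothetical, and Online and Offline are left out because the post does not enumerate every sub-state in which they may be issued.

```python
from enum import Enum


class SubState(Enum):
    """Sub-states of the Opened state, as named in the post."""
    OFFLINE = "Offline"
    ONLINING = "Onlining"
    ONLINE = "Online"
    OFFLINING = "Offlining"


# Hypothetical lookup table built only from the rules stated in the post.
ALLOWED_CALLS = {
    # ResourceControl can be sent in any sub-state of the Opened state.
    "ResourceControl": set(SubState),
    # Terminate can be issued concurrently with any call except Open and
    # Close, so it is valid in every sub-state of the Opened state.
    "Terminate": set(SubState),
    # Health checks are sent only while the resource is Online.
    "LooksAlive": {SubState.ONLINE},
    "IsAlive": {SubState.ONLINE},
}


def is_allowed(call, sub_state):
    """Return True if this model lets RHS send `call` in `sub_state`."""
    if call not in ALLOWED_CALLS:
        raise ValueError(f"call not covered by this sketch: {call}")
    return sub_state in ALLOWED_CALLS[call]
```

For example, is_allowed("IsAlive", SubState.OFFLINE) evaluates to False, while is_allowed("Terminate", SubState.ONLINING) evaluates to True. A Resource DLL that keeps its own state machine could use a table like this in debug builds to catch calls it does not expect.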
In the next post in this series we will walk through several scenarios using the above state diagram.

Thanks,
Vladimir Petter
Senior Software Development Engineer
Clustering & High-Availability
Microsoft
Hi Cluster Fans,

In this post I will discuss a deployment issue which is sometimes seen in Windows Server 2008 and 2008 R2, and how to get around it.

Let’s say you want to Validate a server, create a cluster or add a node to a cluster. We have a requirement that a node can be a member of only one cluster at a time. When you try this operation, you may see a message telling you that one of the servers you want to use, a non-clustered server, is already part of a cluster, so you cannot continue. Why? The message states "The computer '<Server Name>' is joined to a cluster."
The reason for this is that the node was at some point a cluster node, but when the cluster was destroyed or the node was evicted, the node was not cleaned up properly. This can happen if the node was offline while the original cluster was destroyed: the other nodes were cleaned up, but this node was never ‘unclustered’. If you try to connect to that node using Failover Cluster Manager, you will see that it attempts to connect and then times out after about 5 minutes.
If you check the status of the node using PowerShell, it will report that the node is in a ‘Joining’ state.
If you get into this situation, you can run the following eviction commands directly on the node to properly clean up the cluster components from that node so that you can reuse it in another cluster.

PowerShell (R2 only):
PS> Get-ClusterNode NodeName | Remove-ClusterNode –Force

Cluster.exe:
CMD> cluster.exe node –NodeName /force

To avoid getting into this state, make sure that all of your nodes are online when you destroy the cluster or evict a node, so that the cluster can properly clean up the clustering components on every node.

If you plan to evict a node which has resources on it, you should first gracefully move the resources off it to the best nodes (using Move Group, or live migration for VMs). If you forget, the cluster will still fail over the resources for you, but perhaps not in the way you would prefer.

Thanks,
Symon Perriman
Program Manager II
Clustering & High-Availability
Microsoft
Hi,
In this series of blog posts we will review the Resource State Machine from the point of view of the Resource Hosting Subsystem (RHS, or Resource Monitor). Hopefully it will help you to design, develop and debug the Resource DLL you are writing to integrate your application with Windows Server 2008 & 2008 R2 Failover Clustering.

Introduction

Let’s first review which components are involved in managing resource state and how they interact with each other. If you need to integrate your application with Failover Clustering, you have to write a Resource DLL. This allows your custom resource to interact with the cluster and perform common functional tasks, like bringing the resource offline or online. The Resource DLL is responsible for translating cluster resource state transition commands into commands specific to that application, and for bringing that application to a state corresponding to the current resource state.

The image below shows a 2-node cluster interacting with resources.

Both nodes have the Cluster Service running. The cluster has one group configured, Group 1. In the group we have three resources. Resource 1 depends on Resource 2, and Resource 2 depends on Resource 3. You will probably see this same relationship when you have a group with an Application (Resource 1) that depends on a NetName (Resource 2), which depends on an IP Address (Resource 3).

Resource Control Manager (RCM)

In the Cluster Service there is a component called Resource Control Manager (RCM). RCM instances on all the nodes negotiate who owns the group. The group owner will bring the group to its “persistent state”. The persistent state is the last state the user put the group into, which could be Online or Offline. All other nodes, which do not own this group, will not bring it online, to ensure that it is online on at most one node at a time.
If the persistent state of Group 1 is Online, then Node 1 will bring it online, and Node 2 will make sure it is offline. To bring the group online, RCM will bring online all the resources in the group in the order of resource dependencies. First RCM will send the Online command to Resource 3. As soon as Resource 3 is online, RCM will send the Online command to Resource 2, and as soon as Resource 2 is online, RCM will send Online to Resource 1. Once Resource 1 is online, Group 1 is online. This assumes that no failures have happened during the online process.

Resource Hosting Subsystem (RHS)

Changing a resource state requires interacting with the actual application. To achieve that, RCM will send a state transition request to the Resource DLL. Since the Resource DLL interacts with the application to perform its task, it can experience failures, such as a call taking too long or an exception. Failures of the Resource DLL such as exceptions or deadlocks should not bring the Cluster Service down. To achieve that, the Cluster Service never loads the Resource DLL in its own process. Instead it spawns a child process, the Resource Hosting Subsystem (RHS, or Resource Monitor). Normally RHS is shared among many resources, but if you see a flaky resource you can move it to a separate monitor using the resource properties, so that it is isolated from stable resources. In the image above you can see that Resource 3 is isolated in a separate RHS process. If your application requires multiple resource types, you can choose to create separate Resource DLLs for them or place them in the same Resource DLL.
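The dependency-driven online order described above can be sketched in a few lines of Python. This is only a model of the ordering rule, not how RCM is implemented: the online_order helper and the dictionary layout are assumptions made for the example, the dependencies follow the Application, NetName and IP Address example, and the sketch relies on graphlib from the Python 3.9+ standard library.

```python
from graphlib import TopologicalSorter

# Each resource maps to the set of resources it depends on.
GROUP_1 = {
    "Resource 1": {"Resource 2"},  # Application depends on NetName
    "Resource 2": {"Resource 3"},  # NetName depends on IP Address
    "Resource 3": set(),           # IP Address has no dependencies
}


def online_order(dependencies):
    """Return the resources so that each one follows its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


print(online_order(GROUP_1))
# prints ['Resource 3', 'Resource 2', 'Resource 1']
```

The result matches the sequence in the text: Resource 3 first, then Resource 2, then Resource 1.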
For more information about RHS, see this blog post: http://blogs.msdn.com/clustering/archive/2009/06/27/9806160.aspx

Other Resources

You can find more information about the various cluster components in the following MSDN articles:

· Cluster Resource Monitor: http://msdn.microsoft.com/en-us/library/aa372266(VS.85).aspx
· Cluster Resource DLLs: http://msdn.microsoft.com/en-us/library/aa372239(VS.85).aspx
· Resource DLL Entry Points: http://msdn.microsoft.com/en-us/library/aa372244(VS.85).aspx
· Implementing Resource Health Monitoring: http://msdn.microsoft.com/en-us/library/aa372255(VS.85).aspx
· Managing Resource State Transitions: http://msdn.microsoft.com/en-us/library/aa370988(VS.85).aspx

In the next posts in the series we will discuss how these components interact with the cluster, describe the entry points and give some examples.

Part 2 is now available: http://blogs.msdn.com/clustering/archive/2010/03/30/9987135.aspx

Thanks,
Vladimir Petter
Senior Software Development Engineer
Clustering & High-Availability
Microsoft
Hi Cluster Fans,

Microsoft’s Windows Sustained Engineering (WinSE) organization has a job opening for a developer to support Failover Clustering and Network Load Balancing (NLB) technologies. WinSE is Microsoft’s servicing branch which takes requests to fix issues after the product has been shipped, such as Hotfixes, QFEs and Service Packs. The WinSE File Server Technologies team is looking for a developer to support our Windows Server 2008 R2 technology which has recently shipped. This team also supports other technologies and this information can be found in the job posting.

Job ID: 708776
Link: https://careers.microsoft.com/JobDetails.aspx?ss=&pg=0&so=&rw=1&jid=13184&jlang=EN

If you or someone you know would be a good candidate, please follow the application instructions in the link to apply for this position. Hopefully we will be working together soon!

Thanks,
Symon Perriman
Program Manager II
Clustering & High-Availability
Microsoft
Hi Cluster Fans,

The MVP program is a way in which we thank active members in the Cluster and HA community for their contributions towards growing and supporting our products (more info). Our MVP program focuses on the Failover Clustering and Network Load Balancing technologies. We are looking to grow our MVP program by adding Cluster gurus from all regions of the world, in all positions and areas of expertise, with a focus on a variety of sub-areas (like Storage with Clustering, SQL clustering, Hyper-V clustering, multi-site clustering, etc.).

MVPs are people who are genuinely interested in and passionate about Clustering and High Availability. They go out of their way to promote HA, help others, and make clustering a primary consideration in IT deployments and planning. They get feedback from customers about what works, what issues they are finding, and how we can improve our product. Essentially our MVPs are our lead customers: we run numerous ideas by them to get their feedback and let them explore our future products in detail under NDA, giving them exposure and the ability to influence our technology in ways which the public cannot. Additionally, our MVPs get a direct channel to the product team to troubleshoot issues, suggest improvements, and find solutions to problems for which they would otherwise have needed to call support.

This past week was the annual MVP Summit, where Microsoft brings together technical experts from around the world on our Redmond campus for a week of deep technical discussions and product planning.
There are currently 12 Cluster MVPs:

· William Bressette, Canada, http://www.horn-it.com/
· Andrew Cheng, Singapore, http://andrewchengnh.spaces.live.com
· Allan Hirt, USA, http://sqlHA.com
· Russ Kaufmann, USA, http://www.clusterhelp.com, http://msmvps.com/blogs/clusterhelp/Default.aspx
· Joachim Nasslander, Sweden, http://www.nullsession.com/
· Nail Own, Germany, http://www.cluadmin.de/
· John Savill, USA, http://www.SavillTech.com
· Robert Smit, The Netherlands, http://fiberman.spaces.live.com
· Michael Steineke, USA, http://bighammer.com/
· John Toner, USA, http://msmvps.com/blogs/jtoner/
· Edwin van Mierlo, Ireland, http://msmvps.com/blogs/edwinvanmierlo/
· Hans Vredevoort, The Netherlands, http://hyper-v.nu
To become an MVP, we would need to see continual community involvement from you over a period of about 12 months. Just to set your expectations: this is not something you get ahead of time on the promise that you will be active in the community. It is something you are awarded after you have been active, as this shows that you genuinely have a passion for high availability.

So, what do we consider ‘community interaction’? There are numerous ways in which you can get involved, including, but not limited to:

- Activity and helping people in the forums and newsgroups:
· 2008 Forum: http://forums.technet.microsoft.com/en-US/winserverClustering/threads/
· 2008 R2 Forum: http://social.technet.microsoft.com/Forums/en-US/windowsserver2008r2highavailability/threads/
· 2003/2008/R2 Newsgroup: https://www.microsoft.com/technet/community/newsgroups/dgbrowser/en-us/default.mspx?dg=microsoft.public.windows.server.clustering
- A blog about the topic (you can see our MVPs’ blogs listed above)
- Give training
- Give presentations
- Write articles, books, training guides, whitepapers
- Anything else where you help others by sharing your expertise with the public to help build their clustering knowledge
If you are interested, and whenever you do something in the community (such as give a talk or training), please let me know and I can start tracking your contributions. Simply send an email to ClusMVP (at) Microsoft (dot) com and I will add the information. No contribution is too small!

Please let me know if you have any further questions and thanks for your interest.

Sincerely,
Symon Perriman
Program Manager II & MVP Lead
Clustering & High-Availability
Microsoft
Hi Cluster Fans,

Our Validate a Configuration Wizard was so popular that we’ve improved it in Windows Server 2008 R2. Validate is the tool which verifies that your entire configuration is suitable for Failover Clustering. It tests the servers, networks and storage, runs a series of failover tests, and inventories all the configuration information into saved reports. It can be run before, during or after deployment as a troubleshooting tool.

Cluster Support

Running ‘Validate’ and making sure that no tests fail is one of only two requirements to have a supported Failover Cluster in Windows Server 2008 / R2. The other is that each component has a Windows Server 2008 / R2 Logo.

More information about the Validate a Configuration Wizard: http://technet.microsoft.com/en-us/library/cc732035(WS.10).aspx
More information about the Logo Program: http://www.microsoft.com/whdc/winlogo/default.mspx

New Tests for Windows Server 2008 R2

There are 5 categories of tests in Windows Server 2008 R2. The entire Cluster Configuration category is new, but this set of tests will only be run on clusters which have already been created. It lists useful information about the current configuration, which helps troubleshooters easily understand how the cluster is deployed. The image below is an example of some of the new, prescriptive information from Validate tests.
The Inventory and Storage categories run the same tests, with some minor changes. The Network and System Configuration categories have a few additions. In this image, the highlighted tests are new.
Cluster Configuration

· List * – Provides an overview of the Core Group, Networks, Resources, Storage, and Services and Applications. It gives useful information about how the resources are configured and includes graphical dependency reports.
· Validate Quorum Configuration – Checks whether the quorum mode in use is the recommended one, based on the number of nodes and the availability of storage, similar to the “Configure Cluster Quorum Wizard”. In addition, it checks the recommended values for quorum arbitration time.
· Validate Resource Status - Validates that cluster resources are online, and lists the cluster resources that are running in separate resource monitors. If a resource is running in a separate resource monitor, it is usually because the resource failed and the Cluster service began running it in a separate resource monitor (to make it less likely to affect other resources if it fails again).
· Validate Service Principal Name - Issues a warning if the Service Principal Name (SPN) cannot be found for a Kerberos-enabled network name. The SPN verifies the identity of the computer to which a client is connecting.
· Validate Volume Consistency - If any volumes are flagged as inconsistent ("dirty"), it provides a reminder that running chkdsk is recommended.

Network

· List Network Binding Order - Lists the order in which networks are bound to the adapters on each node.
· Validate Multiple Subnet Properties - If the cluster is determined to be a multi-subnet cluster, retrieves the settings for all network name resources, determines whether the private properties HostRecordTTL, RegisterAllProvidersIP and PublishPTRRecords are optimal for that configuration, and validates that settings related to DNS are configured appropriately for clusters using multiple subnets.

System Configuration

· Validate Cluster Service and Driver Settings – Validates the startup settings used by services and drivers, including the Cluster service, CSVFilter.sys, NetFT.sys, and Clusdisk.sys.
· Validate Memory Dump Settings - Validates that none of the nodes currently require a reboot (as part of a software update) and that each node is configured to capture a memory dump if it stops running.
· Validate System Drive Variable - Validates that all nodes have the same value for the system drive environment variable, such as C:\
· Validate Operating System Installation Options – Checks that all nodes are using the same installation option, either Core or Full. All nodes are required to run the same installation option because not all roles and features are supported on Core, so workloads would not be able to fail over to Core nodes if the role or feature is not available there.
  o This replaced the ‘Validate Operating Systems’ test, which was no longer necessary since the x86 architecture is no longer supported in Windows Server 2008 R2, and we now check that all nodes are x64 or ia64 when adding them to the list of servers to Validate.

For more information about all of the Validation tests, visit: http://technet.microsoft.com/en-us/library/cc726064.aspx

We will continue to improve ‘Validate’ in our future products, so send us feedback about which new tests you would like to see. You can send feedback by clicking on the ‘Email’ link in the upper right corner of this page.

Thanks,
Symon Perriman
Program Manager II
Clustering & High-Availability
Microsoft