
Site-aware Failover Clusters in Windows Server 2016


Windows Server 2016 introduces site-aware Failover Clusters. Nodes in stretched clusters can now be grouped based on their physical location (site). Cluster site-awareness enhances key operations during the cluster lifecycle such as failover behavior, placement policies, heartbeating between the nodes, and quorum behavior. In the remainder of this blog I will explain how you can configure sites for your cluster, the notion of a “preferred site”, and how site awareness manifests itself in your cluster operations.

Configuring Sites

A node’s site membership is configured by setting the Site node property to a numerical value identifying the site.

For example, in a four-node cluster with nodes Node1, Node2, Node3 and Node4, to assign the nodes to Site 1 and Site 2, do the following:

  • Launch Microsoft PowerShell© as an Administrator and type:

(Get-ClusterNode Node1).Site = 1

(Get-ClusterNode Node2).Site = 1

(Get-ClusterNode Node3).Site = 2

(Get-ClusterNode Node4).Site = 2
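
To confirm the assignment, you can list the nodes together with their Site values (a quick check using the same Site node property described above):

Get-ClusterNode | Format-Table -Property Name, Site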

Configuring sites enhances the operation of your cluster in the following ways:

Failover Affinity

  • Groups fail over to a node within the same site before failing over to a node in a different site
  • During node drain, VMs are moved first to a node within the same site before being moved cross-site
  • The CSV load balancer will distribute ownership within the same site

Storage Affinity

Virtual Machines (VMs) follow their storage and are placed in the same site where their associated storage resides. VMs will begin live migrating to the same site as their associated CSV after 1 minute of the storage being moved.

Cross-Site Heartbeating

You now have the ability to configure the thresholds for heartbeating between sites. These thresholds are controlled by the following new cluster properties:

 

  • CrossSiteDelay – Default value: 1. Frequency at which heartbeats are sent to nodes in different sites.
  • CrossSiteThreshold – Default value: 20. Number of missed heartbeats before an interface to a node in a different site is considered down.

 

To configure the above properties launch PowerShell© as an Administrator and type:

(Get-Cluster).CrossSiteDelay = <value>

(Get-Cluster).CrossSiteThreshold = <value>

You can find more information on other properties controlling failover clustering heartbeating here.
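
To see the values currently in effect alongside the existing subnet thresholds, you can list the cluster properties (the cross-site names are the ones introduced above; the subnet names are the existing heartbeat properties from earlier releases):

Get-Cluster | Format-List -Property CrossSiteDelay, CrossSiteThreshold, SameSubnetDelay, SameSubnetThreshold, CrossSubnetDelay, CrossSubnetThreshold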

The following rules define the applicability of the thresholds controlling heartbeating between two cluster nodes:

  • If the two cluster nodes are in two different sites and two different subnets, then the Cross-Site thresholds will override the Cross-Subnet thresholds.
  • If the two cluster nodes are in two different sites and the same subnet, then the Cross-Site thresholds will override the Same-Subnet thresholds.
  • If the two cluster nodes are in the same site and two different subnets, then the Cross-Subnet thresholds will be effective.
  • If the two cluster nodes are in the same site and the same subnet, then the Same-Subnet thresholds will be effective.

Configuring Preferred Site

In addition to configuring the site a cluster node belongs to, a “Preferred Site” can be configured for the cluster. The Preferred Site is a placement preference and will typically be your primary datacenter site.

Before the Preferred Site can be configured, the site being chosen as the preferred site needs to be assigned to a set of cluster nodes. To configure the Preferred Site for a cluster, launch PowerShell© as an Administrator and type:

(Get-Cluster).PreferredSite = <Site assigned to a set of cluster nodes>

Configuring a Preferred Site for your cluster enhances operation in the following ways:

Cold Start

During a cold start, VMs are placed in the preferred site

Quorum

  • Dynamic Quorum drops weights from the Disaster Recovery site (DR site, i.e. the site which is not designated as the Preferred Site) first, to ensure that the Preferred Site survives if all other things are equal. In addition, nodes are pruned from the DR site first during regroup after events such as asymmetric network connectivity failures.
  • During a quorum split, i.e. an even split of two datacenters with no witness, the Preferred Site is automatically elected to win
    • The nodes in the DR site drop out of cluster membership
    • This allows the cluster to survive a simultaneous 50% loss of votes
    • Note that the LowerQuorumPriorityNodeID property which previously controlled this behavior is deprecated in Windows Server 2016

 

 

Preferred Site and Multi-master Datacenters

The Preferred Site can also be configured at the granularity of a cluster group i.e. a different preferred site can be configured for each group. This enables a datacenter to be active and preferred for specific groups/VMs.

To configure the Preferred Site for a cluster group, launch PowerShell© as an Administrator and type:

(Get-ClusterGroup "GroupName").PreferredSite = <Site assigned to a set of cluster nodes>
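
For example, to make each datacenter the active site for a different set of VMs (the group names below are hypothetical):

(Get-ClusterGroup "VM-Payroll").PreferredSite = 1

(Get-ClusterGroup "VM-Reporting").PreferredSite = 2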

 

Placement Priority

Groups in a cluster are placed based on the following site priority:

  1. Storage affinity site
  2. Group preferred site
  3. Cluster preferred site

Hyper-converged with Windows Server 2016


Windows Server 2016 Technical Preview 3 was just recently released, and one of the big hot features which has me really excited is Storage Spaces Direct (S2D).  With S2D you will be able to create a hyper-converged private cloud.  A hyper-converged infrastructure (HCI) consolidates compute and storage into a common set of servers.  By leveraging internal storage which is replicated, you can create a true Software-Defined Storage (SDS) solution.

This is available in the Windows Server 2016 Technical Preview today!  I encourage you to go try it out and give us some feedback.  Here's where you can learn more:

Presentation from Ignite 2015:

Storage Spaces Direct in Windows Server 2016 Technical Preview
https://channel9.msdn.com/events/Ignite/2015/BRK3474

Deployment guide:

Enabling Private Cloud Storage Using Servers with Local Disks

https://technet.microsoft.com/en-us/library/mt126109.aspx

Claus Joergensen's blog:

Storage Spaces Direct
http://blogs.technet.com/b/clausjor/archive/2015/05/14/storage-spaces-direct.aspx

Thanks!
Elden Christensen
Principal PM Manager
High-Availability & Storage
Microsoft

Configuring site awareness for your multi-active disaggregated datacenters


In a previous blog, I discussed the introduction of site-aware Failover Clusters in Windows Server 2016. In this blog, I am going to walk through how you can configure site-awareness for your multi-active disaggregated datacenters. You can learn more about Software Defined Storage and the advantages of a disaggregated datacenter here.

Consider the following multi-active datacenters, with a compute and a storage cluster stretched across two datacenters. Each cluster has two nodes in each datacenter.

To configure site-awareness for the stretched compute and storage clusters proceed as follows: 

Compute Stretch Cluster

 1)     Assign the nodes in the cluster to one of the two datacenters (sites).

  • Open PowerShell© as an Administrator and type:

(Get-ClusterNode Node1).Site = 1

(Get-ClusterNode Node2).Site = 1

(Get-ClusterNode Node3).Site = 2

(Get-ClusterNode Node4).Site = 2

2)     Configure the site for your primary datacenter.

(Get-Cluster).PreferredSite = 1

Storage Stretch Cluster

In multi-active disaggregated datacenters, the storage stretch cluster hosts a Scale-Out File Server (SoFS). For optimal performance, ensure that the site hosting the Cluster Shared Volumes backing the SoFS follows the site hosting the compute workload. This avoids the cost of cross-datacenter network traffic.

1)     As in the case of the compute cluster, assign the nodes in the storage cluster to one of the two datacenters (sites).

(Get-ClusterNode Node5).Site = 1

(Get-ClusterNode Node6).Site = 1

(Get-ClusterNode Node7).Site = 2

(Get-ClusterNode Node8).Site = 2

2)     For each Cluster Shared Volume (CSV) in the cluster, configure the preferred site for the CSV group to be the same as the preferred site for the Compute Cluster.

$csv1 = Get-ClusterSharedVolume "Cluster Disk 1" | Get-ClusterGroup

($csv1).PreferredSite = 1 

3)  Set each CSV group in the cluster to automatically failback to the preferred site when it is available after a datacenter outage.

($csv1).AutoFailbackType = 1

Note: Step 2 and 3 can also be used to configure the Preferred Site for a CSV group in a hyper-converged data-center deployment. You can learn more about hyper-converged deployments in Windows Server 2016 here.
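
If you have several CSVs, steps 2 and 3 can be applied to all of them in one pass. A minimal sketch, assuming Site 1 is the primary datacenter as in the compute cluster above:

Get-ClusterSharedVolume | ForEach-Object {
    $group = $_ | Get-ClusterGroup          # the cluster group that owns this CSV
    $group.PreferredSite = 1                # keep CSV ownership in the primary site
    $group.AutoFailbackType = 1             # fail back automatically after an outage
}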

 

How can we improve the installation and patching of Windows Server? (Survey Request)


Do you want your server OS deployment and servicing to move faster? We're a team of Microsoft engineers who want to hear your experiences and ideas around solving real problems of deploying and servicing your server OS infrastructure. You don't have to love server OS deployment already, and we’re interested even if you don’t use Windows Server. We need to learn it and earn it.

Click the link below if you wish to fill out a brief survey and perhaps participate in a short phone call.

https://aka.ms/deployland

Many Thanks!!!

-Rob.

Testing Storage Spaces Direct using Windows Server 2016 virtual machines


Windows Server Technical Preview 2 introduces Storage Spaces Direct (S2D), which enables building highly available (HA) storage systems with local storage. This is a significant step forward in Microsoft Windows Server software-defined storage (SDS), as it simplifies the deployment and management of SDS systems and also unlocks the use of new classes of disk devices, such as SATA disk devices, that were previously unavailable to clustered Storage Spaces with shared disks. The following document has more details about the technology, functionality, and how to deploy on physical hardware.

Storage Spaces Direct Experience and Installation Guide

That experience and install guide notes that to be reliable and perform well in production, you need specific hardware (see the document for details).  However, we recognize that you may want to experiment and kick the tires a bit in a test environment, before you go and purchase hardware. Therefore, as long as you understand it’s for basic testing and getting to know the feature, we are OK with you configuring it inside of Virtual Machines. 

If you want to verify specific capabilities, performance, and reliability, you will need to work with your hardware vendor to acquire approved servers and configuration requirements.

Assumptions for this Blog

-         You have a working knowledge of how to configure and manage Virtual Machines (VMs).

-         You have a basic knowledge of Windows Server Failover Clustering (cluster).

Pre-requisites

-        Windows Server 2012 R2 or Windows Server 2016 with the Hyper-V role installed and configured to host VMs. 

-        Enough capacity to host four VMs with the configuration requirements noted below.

-        Hyper-V servers can be part of a host failover cluster, or stand-alone. 

-        VMs can be located on the same server, or distributed across servers (as long as the networking connectivity allows for traffic to be routed to all VMs with as much throughput and lowest latency possible.)

Note:  These instructions and guidance focus on using our latest Windows Server releases as the hypervisor.  Windows Server 2012 R2 and Windows Server 2016 (pre-release) is what I use.  There is nothing that will restrict you from trying this with other private or public clouds.  However, this blog post does not cover those scenarios, and whether or not they work will depend on the environment providing the necessary storage/network and other resources.  We will update our documentation as we verify other private or public clouds.

Overview of Storage Spaces Direct

S2D uses disks that are exclusively connected to one node of a Windows Server 2016 failover cluster and allows Storage Spaces to create pools using those disks. Virtual Disks (Spaces) that are configured on the pool will have their redundant data (mirrors or parity) spread across the nodes of the cluster.  This allows access to data even when a node fails, or is shutdown for maintenance.

You can implement S2D in VMs, with each VM configured with two or more virtual disks connected to the VM’s SCSI controller.  Each node of the cluster running inside of the VMs will be able to connect to its own disks, but S2D will allow all the disks to be used in Storage Pools that span the cluster nodes.

S2D uses SMB as the transport to send redundant data, for the mirror or parity spaces, to be distributed across the nodes.

Effectively, this emulates the configuration in the following diagram:

Configuration 1: Single Hyper-V Server (or Client)

The simplest configuration is one machine hosting all of the VMs used for the S2D system.  In my case, a Windows Server 2016 Technical Preview 2 (TP2) system running on a desktop-class machine with 16GB of RAM and a 4-core modern processor.

The VMs are configured identically. I have a virtual switch connected to the host’s network that goes out to the world for clients to connect, and I created a second virtual switch that is set for Internal network, to provide another network path for S2D to utilize between the VMs.

The configuration looks like the following diagram:

Hyper-V Host Configuration

-         Configure the virtual switches: Configure a virtual switch connected to the machine’s physical NIC, and another virtual switch configured for internal only.

Example: Two virtual switches. One configured to allow network traffic out to the world, which I labeled “Public”.  The other is configured to only allow network traffic between VMs configured on the same host, which I labeled “InternalOnly”.

 

VM Configuration

-         Create four or more Virtual Machines

  • Memory: If using Dynamic Memory, the default of 1024 MB Startup RAM will be sufficient.  If using fixed memory you should configure 4GB or more.
  • Network:  Configure each VM with two network adapters.  One connected to the virtual switch with the external connection, the other connected to the virtual switch that is configured for internal only.
    • It’s always recommended to have more than one network, each connected to a separate virtual switch, so that if one stops flowing network traffic the other(s) can be used and allow the cluster and Storage Spaces Direct system to remain running. 
  • Virtual Disks: Each VM needs a virtual disk that is used as a boot/system disk, and two or more virtual disks to be used for Storage Spaces Direct.
    • Disks used for Storage Spaces Direct must be connected to the VM’s virtual SCSI controller.
    • Like all other systems, each boot/system disk needs a unique SID, meaning the OS needs to be installed from ISO or another install method, or, if using a duplicated VHDx, it needs to have been generalized (for example using Sysprep.exe) before the copy was made.
    • VHDx type and size:  You need at least eight VHDx files (four VMs with two data VHDx each).  The data disks can be either “dynamically expanding” or “fixed size”.  If you use fixed size, then set the size to 8GB or more, and calculate the combined size of the VHDx files so that you don’t exceed the storage available on your system. (See the PowerShell sketch after this list for one way to script this VM setup.)
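
Here is a minimal PowerShell sketch of that VM setup. The VM name, paths, pre-generalized boot VHDX, and the two switch names from the earlier example are assumptions for illustration; adjust them to your environment:

# Hypothetical names/paths: VM01, D:\VMs, a sysprepped boot image, and the Public / InternalOnly switches
New-VM -Name "VM01" -MemoryStartupBytes 4GB -Generation 2 -VHDPath "D:\VMs\VM01\Boot.vhdx" -SwitchName "Public"
Add-VMNetworkAdapter -VMName "VM01" -SwitchName "InternalOnly"

# Two dynamically expanding data disks on the SCSI controller for S2D to use
1..2 | ForEach-Object {
    $path = "D:\VMs\VM01\Data$_.vhdx"
    New-VHD -Path $path -SizeBytes 20GB -Dynamic | Out-Null
    Add-VMHardDiskDrive -VMName "VM01" -ControllerType SCSI -Path $path
}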

Example:  The following is the Settings dialog for a VM that is configured to be part of an S2D system on one of my Hyper-V hosts.  It’s booting from the Windows Server TP2 VHD that I downloaded from Microsoft’s external download site, and that is connected to the IDE Controller 0 (this had to be a Gen1 VM since the TP2 file
that I downloaded is a VHD and not VHDx). I created two VHDx files to be used by S2D, and they are connected to the SCSI Controller.  Also note the VM is connected to the Public and InternalOnly virtual switches.

 

 

Note: Do not enable the virtual machine’s Processor Compatibility setting.  This setting disables certain processor capabilities that S2D requires inside the VM. This option is unchecked by default, and needs to stay that way.  You can see this setting here:

 

Guest Cluster Configuration

Once the VMs are configured, creating and managing the S2D system inside the VMs is almost identical to the steps for supported physical hardware:

  1. Start the VMs
  2. Configure the Storage Spaces Direct system, using the “Installation and Configuration” section of the guide linked here: Storage Spaces Direct Experience and Installation Guide
    1. Since this is in VMs using only VHDx files as storage, there is no SSD or other faster media to allow tiers.  Therefore, skip the steps that enable or configure tiers. (A condensed sketch of these steps from inside the VMs follows this list.)
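
As a rough outline of those steps run from inside the guest VMs (the node names are hypothetical, and the exact command to enable S2D differs between preview builds, so follow the linked guide for your build; in the released product the cmdlet is Enable-ClusterStorageSpacesDirect):

Test-Cluster -Node VM01, VM02, VM03, VM04                      # validate the guest nodes
New-Cluster -Name S2DGuest -Node VM01, VM02, VM03, VM04 -NoStorage
Enable-ClusterStorageSpacesDirect                              # claims the data VHDx disks and builds the pool
New-Volume -StoragePoolFriendlyName "S2D*" -FriendlyName "Data01" -FileSystem CSVFS_ReFS -Size 50GB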

Configuration 2: Two or more Hyper-V Servers

You may not have a single machine with enough resources to host all four VMs, or you may already have a Hyper-V host cluster to deploy on, or more than one Hyper-V server that you want to spread the VMs across.  Here is a diagram showing a configuration spread across two nodes, as an example:

 

This configuration is very similar to the single host configuration.  The differences are:

 

Hyper-V Host Configuration

-         Virtual Switches:  Each host should have a minimum of two virtual switches for the VMs to use, connected externally to different NICs on the systems.  One can be on a network that is routed to the world for client access, and the other can be on a network that is not externally routed.  Or they can both be on externally routed networks.  You can choose to use a single network, but then all the client traffic and S2D traffic will share common bandwidth, and there is no redundancy if that single network goes down, so the S2D VMs may lose connectivity to each other. However, since this is for testing and verification of S2D, you don’t have the resiliency-to-network-loss requirements that we strongly suggest for production deployments.

Example:  On this system I have an internal 10/100 Intel NIC and a dual-port Pro/1000 1gb card. All three NICs have virtual switches. I labeled one Public and connected it to the 10/100 NIC since my connection to the rest of the world is through a 100mb infrastructure.  I then have the 1gb NICs connected to 1gb desktop switches (two different switches), and that provides my hosts two network paths between each other for S2D to use. As noted, three networks is not a requirement, but I have this available on my hosts so I use them all.

VM Configuration

-         Network: If you choose to have a single network, then each VM will only have one network adapter in its configuration.

Example: Below is a snip of a VM configuration on my two host configuration. You will note the following:

-         Memory:  I have this configured with 4GB of RAM instead of dynamic memory.  It was a choice since I have enough memory resources on my nodes to dedicate memory.

-         Boot Disk:  The boot disk is a VHDx, so I was able to use a Gen2 VM. 

-         Data Disks: I chose to configure four data disks per VM.  The minimum is two, but I wanted to try four. All VHDx files are configured on the SCSI controller (which you don’t have a choice about in Gen2 VMs).

-         Network Adapters:  I have three adapters, each connected to one of the three virtual switches on the host to utilize the available network bandwidth that my hosts provide.

General Suggestions:

-         Network.  Since the network between the VMs transports the redundant data for mirror and parity spaces, the bandwidth and latency of the network will be a significant factor in the performance of the system.  Keep this in mind as you experience the system in the test configurations.

-         VHDx location optimization.  If you have a Storage Space that is configured for a three-way mirror, then the writes will be going to three separate disks (implemented as VHDx files on the hosts), each on different nodes of the cluster. Distributing the VHDx files across disks on the Hyper-V hosts will provide better response to the I/Os.  For instance, if you have four disks or CSV volumes available on the Hyper-V hosts, and four VMs, then put the VHDx files for each VM on a separate disk (VM1 using CSV Volume 1, VM2 using CSV Volume 2, etc.). 

FAQ:

How does this differ from what I can do in VMs with Shared VHDx?

Shared VHDx remains a valid and recommended solution to provide shared storage to a guest cluster (cluster running inside of VMs).  It allows a VHDx to be accessed by multiple VMs at the same time in order to provide clustered shared storage.  If any nodes (VMs) fail, the others have access to the VHDx and the clustered roles using the storage in the VMs can continue to access their data.

S2D allows clustered roles access to clustered storage spaces inside of the VMs without provisioning shared VHDx on the host.  With S2D, you can provision VMs with a boot/system disk and then two or more extra VHDx files configured for each VM.  You then create a cluster inside of the VMs, configure S2D and have resilient clustered Storage Spaces to use for your clustered roles inside the VMs. 

References

Storage Spaces Direct Experience and Installation Guide

 

Microsoft Virtual Academy – Learn Failover Clustering & Hyper-V


Would you like to learn how to deploy, manage, and optimize a Windows Server 2012 R2 failover cluster?  The Microsoft Virtual Academy is a free training website for IT Pros with over 2.7 million students.  This technical course can teach you everything you want to know about Failover Clustering and Hyper-V high-availability and disaster recovery, and you don’t even need prior clustering experience!  Start today: http://www.microsoftvirtualacademy.com/training-courses/failover-clustering-in-windows-server-2012-r2.

Join clustering experts Symon Perriman (VP at 5nine Software and former Microsoft Technical Evangelist) and Elden Christensen (Principal Program Manager Lead for Microsoft’s high-availability team) as they explore the basic requirements for a failover cluster and how to deploy and validate it. Learn how to optimize the networking and storage configuration, and create a Scale-Out File Server. Hear the best practices for configuring and optimizing highly available Hyper-V virtual machines (VMs), and explore disaster recovery solutions with both Hyper-V Replica and multi-site clustering. Next look at advanced administration and troubleshooting techniques, then learn how System Center 2012 R2 can be used for large-scale failover cluster management and optimization.

This full day of training includes the following modules:

  1. Introduction to Failover Clustering
  2. Cluster Deployment and Upgrades
  3. Cluster Networking
  4. Cluster Storage & Scale-Out File Server
  5. Hyper-V Clustering
  6. Multi-Site Clustering & Scale-Out File Server
  7. Advanced Cluster Administration & Troubleshooting
  8. Managing Clusters with System Center 2012 R2

Learn everything you need to know about Failover Clustering on the Microsoft Virtual Academy: http://www.microsoftvirtualacademy.com/training-courses/failover-clustering-in-windows-server-2012-r2

Virtual Machine Compute Resiliency in Windows Server 2016


In today’s cloud-scale environments, commonly composed of commodity hardware, transient failures have become more common than hard failures. In these circumstances, reacting aggressively to handle transient failures can cause more downtime than it prevents. Windows Server 2016 therefore introduces increased Virtual Machine (VM) resiliency to intra-cluster communication failures in your compute cluster.

Interesting Transient Failure Scenarios

The following are some potentially transient scenarios where it would be beneficial for your VM to be more resilient to intra-cluster communication failures:  

  • Node disconnected: The cluster service attempts to connect to all active nodes. The disconnected (Isolated) node cannot talk to any node in an active cluster membership.
  • Cluster Service crash: The Cluster Service on a node is down. The node is not communicating with any other node.
  • Asymmetric disconnect: The Cluster Service is attempting to connect to all active nodes. The isolated node can talk to at least one node in active cluster membership.

New Failover Clustering States

In Windows Server 2016, to reflect the new Failover Cluster workflow in the event of transient failures, three new states have been introduced:

  • A new VM state, Unmonitored, has been introduced in Failover Cluster Manager to reflect a VM that is no longer monitored by the cluster service.

  • Two new cluster node states have been introduced to reflect nodes which are not in active membership but were host to VM role(s) before being removed from active membership: 
    • Isolated:
      • The node is no longer in an active membership
      • The node continues to host the VM role

    • Quarantine:
      • The node is no longer allowed to join the cluster for a fixed time period (default: 2 hours)
      • This action prevents flapping nodes from negatively impacting other nodes and the overall cluster health
      • By default, a node is quarantined if it ungracefully leaves the cluster three times within an hour
      • VMs hosted by the node are gracefully drained once it is quarantined
      • No more than 25% of nodes can be quarantined at any given time

    • The node can be brought out of quarantine by running the Failover Clustering PowerShell© cmdlet Start-ClusterNode with the -CQ or -ClearQuarantine flag.
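
For example, to clear quarantine on a node (the node name is hypothetical):

Start-ClusterNode -Name "Node3" -ClearQuarantine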

VM Compute Resiliency Workflow in Windows Server 2016

The VM resiliency workflow in a compute cluster is as follows:

  • In the event of a “transient” intra-cluster communication failure, on a node hosting VMs, the node is placed into an Isolated state and removed from its active cluster membership. The VM on the node is now considered to be in an Unmonitored state by the cluster service.
    • File Storage backed (SMB): The VM continues to run in the Online state.
    • Block Storage backed (FC / FCoE / iSCSI / SAS): The VM is placed in the Paused Critical state. This is because the isolated node no longer has access to the Cluster Shared Volumes in the cluster.
    • Note that you can monitor the “true” state of the VM using the same tools as you would for a stand-alone VM (such as Hyper-V Manager).

  • If the isolated node continues to experience intra-cluster communication failures, after a certain period (default of 4 minutes), the VM is failed over to a suitable node in the cluster, and the node is now moved to a Down state.
  • If a node is isolated a certain number of times (default three times) within an hour, it is placed into a Quarantine state for a certain period (default two hours) and all the VMs from the node are drained to a suitable node in the cluster.

Configuring Node Isolation and Quarantine settings

To achieve the desired Service Level Agreement guarantees for your environment, you can configure the following cluster settings, controlling how your node is placed in isolation or quarantine:

 

  • ResiliencyLevel – Defines how unknown failures are handled. Default: 2.
    Values: 1 – Allow the node to go into the Isolated state only if the node gave a notification and it went away for a known reason (known reasons include a Cluster Service crash or asymmetric connectivity between nodes); otherwise fail over immediately. 2 – Always let a node go into the Isolated state and give it time before taking over ownership of its VMs.
    PowerShell: (Get-Cluster).ResiliencyLevel = <value>

  • ResiliencyPeriod – Duration, in seconds, to allow a VM to run isolated. Default: 240. A value of 0 reverts to pre-Windows Server 2016 behavior.
    PowerShell (cluster property): (Get-Cluster).ResiliencyDefaultPeriod = <value>
    PowerShell (group common property, for granular control): (Get-ClusterGroup “My VM”).ResiliencyPeriod = <value>
    A value of -1 for the group property causes the cluster property to be used.

  • QuarantineThreshold – Number of failures before a node is quarantined. Default: 3.
    PowerShell: (Get-Cluster).QuarantineThreshold = <value>

  • QuarantineDuration – Duration, in seconds, for which a quarantined node is not allowed to rejoin the cluster. Default: 7200. A value of 0xFFFFFFFF means the node is never allowed to rejoin automatically.
    PowerShell: (Get-Cluster).QuarantineDuration = <value>
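
For instance, to react to failures more aggressively than the defaults in a lab environment (the values below are illustrative only):

(Get-Cluster).ResiliencyLevel = 1            # only isolate for known, notified failures
(Get-Cluster).ResiliencyDefaultPeriod = 120  # fail VMs over after 2 minutes of isolation
(Get-Cluster).QuarantineThreshold = 2        # quarantine after two ungraceful exits within an hour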

Cluster Shared Volume – A Systematic Approach to Finding Bottlenecks


In this post we will discuss how to determine whether the performance you observe on a Cluster Shared Volume (CSV) is what you expect, and how to find which layer in your solution may be the bottleneck. This blog assumes you have read the previous blogs in the CSV series (see the bottom of this blog for links to all the blogs in the series). 

Sometimes someone asks why CSV performance does not match their expectations and how to investigate. The answer is that CSV consists of multiple layers, and the most straightforward troubleshooting approach is a process of elimination: first remove all the layers and test the speed of the disk, then add layers one by one until you find the one causing the issue.

You might be tempted to use file copy as a quick way to test performance. While file copy is an important workload, it is not the best way to test your storage performance. Review this blog, which goes into more detail about why it does not work well: http://blogs.technet.com/b/josebda/archive/2014/08/18/using-file-copy-to-measure-storage-performance-why-it-s-not-a-good-idea-and-what-you-should-do-instead.aspx. It is important to understand the file copy performance you can expect from your storage, so I would suggest running file copy after you are done with micro-benchmarks, as a part of workload testing.

To test performance you can use DiskSpd, which is described in this blog post: http://blogs.technet.com/b/josebda/archive/2014/10/13/diskspd-powershell-and-storage-performance-measuring-iops-throughput-and-latency-for-both-local-disks-and-smb-file-shares.aspx.
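
As a starting point, here is a hedged example of a DiskSpd command line similar to the single-threaded, 8K, queue-depth-8, unbuffered write-through tests used later in this post. The target path and file size are hypothetical; check your DiskSpd version's documentation for the exact flags:

diskspd.exe -c64G -b8K -t1 -o8 -h -L -d60 -w0 X:\testfile.dat    # sequential read; add -r for random, raise -w for writes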

When selecting the file size you will run the tests on, be aware of the caches and tiers in your storage. For instance, storage might have a cache on NVRAM or NVMe. All writes that go to the fast tier might be very fast, but once you have used up all the space in the cache you will have to go at the speed of the next, slower tier. If your intention is to test the cache then create a file that fits into the cache; otherwise create a file that is larger than the cache.

Some LUNs might have some offsets mapped to SSDs while others map to HDDs; an example would be a tiered space. When creating a file, be aware of which tier the blocks of the file are located on.

Additionally, when measuring performance, do not assume that if you’ve created two LUNs with similar characteristics you will get identical performance. If the LUNs are laid out on the physical spindles in a different way, that can be enough to cause completely different performance behavior. To avoid surprises, as you run the tests through the different layers (described below) ALWAYS use the same LUN. Several times we’ve seen cases where someone ran tests against one LUN, then ran tests over CSVFS with another, seemingly similar LUN, only to observe worse results in the CSVFS case and incorrectly conclude that CSVFS was the problem, when in the end removing the disk from CSV and running the test directly on the LUN showed that the two LUNs had different performance.

The sample numbers you will see in this post were collected on a 2-node cluster:

  • CPU: Intel(R) Xeon(R) CPU E5-2450L 0 @ 1.80GHz (Intel64 Family 6 Model 45 Stepping 7, GenuineIntel), 2 NUMA nodes with 8 cores each, Hyperthreading disabled
  • RAM: 32 GB DDR3
  • Network: one RDMA Mellanox ConnectX-3 IPoIB adapter (54 Gbps) and one Intel(R) I350 Gigabit network adapter
  • Shared disk: a single HDD connected using SAS, model HP EG0300FBLSE, firmware version HPD6, disk cache disabled

 
 
  
With this hardware my expectation is that the disk should be the bottleneck, and going over the network should not have any impact on throughput.

In the samples you will see below, I was running a single-threaded test application which at any time was keeping eight 8K outstanding IOs on the disk. In your tests you might want to add more variations with different queue depths, different IO sizes, and different numbers of threads/CPU cores utilized. To help, I have provided the table below, which outlines some tests to run and data to capture to get a more exhaustive picture of your disk performance. Running all these variations may take several hours. If you know the IO patterns of your workloads then you can significantly reduce the test matrix.

 

 

 

All tests are unbuffered with write-through. For each combination below, run the test and record the results (for example, IOPS and MB/sec, as in the tables later in this post):

  • IO size: 4K, 8K, 16K, 64K, 128K, 256K, 512K, 1MB
  • IO pattern: sequential read, sequential write, random read, random write, random 70% reads / 30% writes
  • Queue depth: 1, 4, 16, 32, 64, 128, 256

If you have Storage Spaces then it might be useful to first collect performance numbers for the individual disks the Space will be created from. This will help set expectations around what kind of performance you should expect in the best/worst case scenario from the Space.

As you are testing the individual spindles that will be used to build Storage Spaces, pay attention to the different MPIO (Multi-Path IO) modes. For instance, you might expect that round robin over multiple paths would be faster than failover, but for some HDDs you might find that they give you better throughput with failover than with round robin. When it comes to a SAN, the MPIO considerations are different: in the SAN case, MPIO is between the computer and a controller in the SAN storage box, while in the Storage Spaces case MPIO is between the computer and the HDD, so it comes down to how efficiently the HDD’s firmware handles IO arriving on different paths. In production, for a JBOD connected to multiple computers, IO will be coming from different computers, so in any case the HDD firmware needs to be able to efficiently handle IOs coming from multiple computers/paths. As with any kind of performance testing, you should not jump to the conclusion that a particular MPIO mode is good or bad; always test first.

Another commonly discussed topic is what the file system allocation unit size (a.k.a. cluster size) should be. There are a variety of options between 4K and 64K.

 

For starters, CSVFS has no requirements for the underlying file system cluster size.  It is fully compatible with all cluster sizes.  The primary influencer for the cluster size is the workload.  For Hyper-V and SQL Server data and log files it is recommended to use a 64K cluster size.  Since CSV is most commonly used to host VHDs in one form or another, 64K is the recommended allocation unit size.  Another influencer is your storage array, so it is good to have a discussion with your storage vendor about any optimizations unique to your storage device that they recommend.  There are also a few other considerations, so let’s discuss:

  1. File system fragmentation. If, for the moment, we set the storage underneath the file system aside and look only at the file system layer by itself, then:
    1. Smaller blocks mean better space utilization on the disk, because if your file is only 1K then with a 64K cluster size this file will consume 64K on the disk, while with a 4K cluster size it will consume only 4K, and you can fit (64/4) 16 1K files in 64K. If you have lots of small files, then a small cluster size might be a good choice.
    2. On the other hand, if you have large files that are growing, then a smaller cluster size means more fragmentation. For instance, in the worst case scenario a 1 GB file with a 4K cluster size might have up to (1024×1024/4) 262,144 fragments (a.k.a. runs), while with 64K clusters it will have only (1024×1024/64) 16,384 fragments. So why does fragmentation matter?
      1. If you are constrained on RAM you may care more, as more fragments means more RAM is needed to track all this metadata.
      2. If your workload generates IO larger than the cluster size, and if you do not run defrag frequently enough and consequently have lots of fragments, then the workload’s IO might need to be split more often when the cluster size is smaller. For instance, if on average the workload generates a 32K IO, then in the worst case scenario with a 4K cluster size this IO might need to be split into (32/4) 8 4K IOs to the volume, while with a 64K cluster size it would never get split. Why does splitting matter? Usually a production workload will be close to random IO, but the larger the blocks are, the larger the throughput you will see on average, so ideally we should try to avoid splitting IO when it is not necessary. 
      3. If you are using storage copy offload, be aware that some storage boxes support it only at a 64K granularity and it would fail if the cluster size is smaller. You need to check with your storage vendor.
      4. If you anticipate lots of large file-level trim commands (trim is the file system counterpart of storage block UNMAP). You might care about trim if you are using a thinly provisioned LUN or if you have SSDs. SSD garbage collection logic in firmware benefits from knowing that certain blocks are not being used by a workload and can be used for garbage collection. For example, let’s assume we have a VHDX with NTFS inside, and this VHDX file itself is very fragmented. When you run defrag on the NTFS inside the VHDX (most likely inside the VM), then among other steps defrag will do free space consolidation, and then it will issue a file-level trim to reclaim the free blocks. If there is lots of free space this might be a trim for a very large block. This trim will come to the NTFS that hosts the VHDX. NTFS will then need to translate this large file trim into a block unmap for each fragment of the file. If the file is highly fragmented then this may take a significant amount of time. A similar scenario might happen when you delete a large file or lots of files at once.
      5. The list above is not exhaustive by any means; I am focusing on what I view as the most relevant considerations.
      6. From the file system perspective, the rule of thumb would be to prefer a larger cluster size unless you are planning to have lots of tiny files and the disk space saving from the smaller cluster size is important. No matter what cluster size you choose, you will be better off periodically running defrag. You can monitor how much fragmentation is affecting your workload by looking at the CSV File System Split IO and PhysicalDisk Split IO performance counters.
  2. File system block alignment and storage block alignment. When you create a LUN on a SAN or a Storage Space, it may be created out of multiple disks with different performance characteristics. For instance, a mirrored space (http://blogs.msdn.com/b/b8/archive/2012/01/05/virtualizing-storage-for-scale-resiliency-and-efficiency.aspx) would contain slabs on many disks, some slabs acting as mirrors, and then the entire space address range is subdivided into 64K blocks and round-robined across these slabs on different disks in a RAID0 fashion to give you the better aggregated throughput of multiple spindles.

 

This means that if you have a 128K IO it will have to be split into two 64K IOs that go to different spindles. What if your file system is formatted with a cluster size smaller than 64K? That means a contiguous block in the file system might not be 64K aligned. For example, if the file system is formatted with 4K clusters and we have a file that is 128K, then the file can start at a 4K alignment. If my application performs a 128K read, then it is possible this 128K block will map to up to three 64K blocks in the storage space.

 

If you format your file system with a 64K cluster size, then file allocations are always 64K aligned and on average you will see fewer IOPS on the spindles.  The performance difference will be even larger when it comes to writes to parity, RAID5, or RAID6-like LUNs. When you overwrite part of a block, the storage has to do a read-modify-write, multiplying the number of IOPS hitting your spindles; if you overwrite the entire block then it will be exactly one IO.  If you want to be accurate, then you need to evaluate the average block size you expect your workload to produce. If it is larger than 4K then you want the FS cluster size to be at least as large as your average IO size, so that on average it does not get split at the storage layer.  A rule of thumb might be to simply use the same cluster size as the block size used by the storage layer.  Always consult your storage vendor for advice; modern storage arrays have very sophisticated tiering and load balancing logic, and unless you understand everything about how your storage box works you might end up with unexpected results. Alternatively, you can run a variety of performance tests with different cluster sizes and see which one gives you better results. If you do not have time to do that, then I recommend a 64K cluster size.

The performance of an HDD/SSD might change after updating the disk or storage box firmware, so it might save you time to rerun the performance tests after an update.

As you are running the tests you can use the performance counters described here http://blogs.msdn.com/b/clustering/archive/2014/06/05/10531462.aspx to get further insight into the behavior of each layer by monitoring average queue depth, latency, throughput, and IOPS at the CSV, SMB, and Physical Disk layers. For instance, if your disk is the bottleneck then the latency and queue depth at all of these layers will be the same. Once you see that queue depth and latency at a higher layer are above what you see on the disk, that layer might be the bottleneck.
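
For example, a quick way to sample the physical disk counters from PowerShell while a test is running (the CSV and SMB counter sets referenced in the linked post can be added the same way; their exact set names depend on your build):

Get-Counter -Counter "\PhysicalDisk(*)\Avg. Disk Queue Length", "\PhysicalDisk(*)\Avg. Disk sec/Transfer" -SampleInterval 2 -MaxSamples 10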

 

Run performance tests only on hardware that is not currently used by any other workloads/tests, otherwise your results may not be valid because of too much variability. You also might want to rerun each variation several times to make sure there is no variability.

Baseline 1 – No CSV; Measure Performance of NTFS

In this case IO has to traverse the NTFS file system and disk stack in the OS, so conceptually we can represent it this way:

 

For most disks, expectations are that sequential read >= sequential write >= random read >= random write. For an SSD you may observe no difference between random and sequential while for HDD the difference may be significant. Differences between read and write will vary from disk to disk.

As you are running this test, keep an eye on whether you are saturating the CPU. This might happen when your disk is very fast, for instance if you are using a Simple Space backed by 40 SSDs.

Run the baseline tests multiple times. If you see variance at this level then most likely it is coming from the disk, and it will affect the other tests as well.  Below you can see the numbers I’ve collected on my hardware; the results match expectations.

 

Unbuffered write-through, 8K IO, queue depth 8:

                     IOPS   MB/sec
  sequential read   19906      155
  sequential write  17311      135
  random read         359        2
  random write        273        2

 

Baseline 2 – No CSV; Measure SMB Performance between Cluster Nodes

To run this test, online the clustered disk on one cluster node and assign it a drive letter, for example K:. Then run the test from another node over SMB using an admin share; for instance, your path might look like \\Node1\K$. In this case IO has to go over the following layers:

 

You need to be aware of SMB Multichannel and make sure that you are using only the NICs that you expect the cluster to use for intra-node traffic. You can read more about SMB Multichannel in a clustered environment in this blog post: http://blogs.msdn.com/b/emberger/archive/2014/09/15/force-network-traffic-through-a-specific-nic-with-smb-multichannel.aspx

If you have an RDMA network, or when your disk is slower than what SMB can pump through all channels and you have a sufficiently large queue depth, then you might see Baseline 2 come close or even equal to Baseline 1. That means your bottleneck is the disk, and not the network.

Run the baseline test several times. If you see variance at this level then most likely it is coming from the disk or the network, and it will affect the other tests as well. Assuming you’ve already sorted out the variance coming from the disk while collecting Baseline 1, now you should focus on variance caused by the network.

Here are the numbers I’ve collected on my hardware. To make it easier for you to compare I am repeating Baseline 1 numbers here.

 

Unbuffered write-through, 8K IO, queue depth 8:

                             Baseline 2   Baseline 1
  sequential read   IOPS          19821        19906
                    MB/sec          154          155
  sequential write  IOPS            810        17311
                    MB/sec            6          135
  random read       IOPS            353          359
                    MB/sec            2            2
  random write      IOPS            272          273
                    MB/sec            2            2

 

In my case I have verified that IO is going over RDMA and the network indeed adds almost no latency, but there is a difference in sequential write IOPS compared with Baseline 1, which seems odd. First I looked at the performance counters:

Physical disk performance counters for Baseline 1

 

Physical disk and SMB Server Share performance counters for Baseline 2

 

SMB Client Share and SMB Direct Connection performance counters for Baseline 2

 

Observe that in both cases PhysicalDisk\Avg. Disk Queue Length is the same. That tells us SMB does not queue IO, and the disk has all the pending IOs all the time. Second, observe that PhysicalDisk\Avg. Disk sec/Transfer in Baseline 1 is 0 while in Baseline 2 it is 10 milliseconds. Huh! This tells me that the disk got slower because the requests came over SMB!?

The next step was to record a trace using the Windows Performance Toolkit (http://msdn.microsoft.com/en-us/library/windows/hardware/hh162962.aspx) with Disk IO for both Baseline 1 and Baseline 2. Looking at the traces, I noticed the disk service time for some reason got longer for Baseline 2! Then I also noticed that when requests were coming from SMB they hit the disk from 2 threads, while with my test utility all requests were issued from a single thread. Remember that we are investigating sequential write. Even though, when running over SMB, the test is issuing all writes from one thread in sequential order, SMB on the server was dispatching these writes to the disk using 2 threads, and sometimes the writes would get reordered. Consequently, the IOPS I am getting for sequential write are close to random write. To verify that, I reran the test for Baseline 1 with 2 threads, and bingo! I got matching numbers.

Here is what you would see in WPA for IO over SMB.

 

The average disk service time is about 8.1 milliseconds, and IO time is about 9.6 milliseconds. The green and violet colors correspond to IO issued by different threads. If you look closely, expand the table, remove thread ID from the grouping and sort by Init Time, you can see how the IOs are interleaving and Min Offset is not strictly sequential:

 

Without SMB, all IOs came on one thread; the disk service time is about 600 microseconds, and IO time is about 4 milliseconds.

 

If you expand and sort by Init Time you will see that Min Offset is strictly increasing:

 

In production, in most cases you will have a workload that is close to random IO; sequential IO only gives you a theoretical best case scenario.

The next interesting question is why we do not see similar degradation for sequential read. The theory is that in the read case the disk might be reading the entire track and keeping it in its cache, so even when reads are rearranged the track is already in the cache and reads on average are not affected. Since I disabled the disk cache for writes, they always have to hit the spindle and more often pay the seek cost.

Baseline 3 – No CSV; Measure SMB Performance between Compute Nodes and Cluster Nodes

If you are planning to run the workload and storage on the same set of nodes then you can skip this step. If you are planning to disaggregate workload and storage and access storage using a Scale-Out File Server (SoFS), then you should run the same test as Baseline 2, just in this case select a compute node as the client, and make sure that over the network you are using the NICs that will handle compute-to-storage traffic once you create the cluster.

Remember that for reliability reasons files over SoFS are always opened with write-through, so we would suggest always adding write-through to your tests. As an option, you can create a classic singleton (non-SoFS) file server over a clustered disk, create a Continuously Available share on that file server, and run your test there. This will make sure traffic goes only over networks marked in the cluster as public, and because this is a CA share all opens will be write-through.

The layers diagram and performance considerations in this case are exactly the same as in the case of Baseline 2.

CSVFS Case 1 – CSV Direct IO

Now add the disk to CSV.

You can run the same test on the coordinating node and on a non-coordinating node, and you should see the same results; the numbers should match Baseline 1. The length of the code path is the same, just instead of NTFS you will have CSVFS. The following diagram represents the layers IO will be going through:

 

Here are the numbers I’ve collected on my hardware; to make it easier for you to compare, I am repeating the Baseline 1 numbers here. 

On coordinating node:

 

 

Unbuffered write-through, 8K IO, queue depth 8:

                             This test   Baseline 1
  sequential read   IOPS         19808        19906
                    MB/sec         154          155
  sequential write  IOPS         17590        17311
                    MB/sec         137          135
  random read       IOPS           356          359
                    MB/sec           2            2
  random write      IOPS           273          273
                    MB/sec           2            2

 

On non-coordinating node

 

Unbuffered write-through, 8K IO, queue depth 8:

                             This test   Baseline 1
  sequential read   IOPS         19793        19906
                    MB/sec         154          155
  sequential write  IOPS         17788        17311
                    MB/sec         138          135
  random read       IOPS           359          359
                    MB/sec           2            2
  random write      IOPS           273          273
                    MB/sec           2            2

 

CSVFS Case 2 – CSV File System Redirected IO on Coordinating Node

In this case we are not traversing the network, but we do traverse two file systems.  If you are disk bound you should see numbers matching Baseline 1.  If you have very fast storage and you are CPU bound, then you will saturate the CPU a bit faster and will be about 5-10% below Baseline 1.

 

Here are the numbers I’ve got on my hardware. To make it easier for you to compare I am repeating Baseline 1 and Baseline 2 numbers here.

 

Unbuffered write-through, 8K IO, queue depth 8:

                             This test   Baseline 1   Baseline 2
  sequential read   IOPS         19807        19906        19821
                    MB/sec         154          155          154
  sequential write  IOPS          5670        17311          810
                    MB/sec          44          135            6
  random read       IOPS           354          359          353
                    MB/sec           2            2            2
  random write      IOPS           271          273          272
                    MB/sec           2            2            2

 

It looks like some IO reordering is happening in this case too, so the sequential write numbers are somewhere between Baseline 1 and Baseline 2. All the other numbers line up perfectly with expectations.

CSVFS Case 3 – CSV File System Redirected IO on Non-Coordinating Node

You can put a CSV into file system redirected mode using the cluster UI

 

Or by using the PowerShell cmdlet Suspend-ClusterResource with the -RedirectedAccess parameter.
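
For example (the disk name is hypothetical; remember to resume the resource when the test is done):

Suspend-ClusterResource -Name "Cluster Disk 1" -RedirectedAccess   # enter file system redirected mode
Resume-ClusterResource -Name "Cluster Disk 1"                      # return to normal mode afterwards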

This is the longest IO path, where we are not only traversing two file systems but also going over SMB and the network.  If you are network bound, then you should see numbers close to Baseline 2.  If your network is very fast and your bottleneck is storage, then the numbers will be close to Baseline 1.  If the storage is also very fast and you are CPU bound, then the numbers should be 10-15% below Baseline 1.

 

Here are the numbers I’ve got on my hardware. To make it easier for you to compare I am repeating Baseline 1 and Baseline 2 numbers here.

 

 

 

 

Unbuffered write-through, 8K IO, queue depth 8:

                             This test   Baseline 1   Baseline 2
  sequential read   IOPS         19793        19906        19821
                    MB/sec         154          155          154
  sequential write  IOPS           835        17311          810
                    MB/sec           6          135            6
  random read       IOPS           352          359          353
                    MB/sec           2            2            2
  random write      IOPS           273          273          272
                    MB/sec           2            2            2

 

In my case the numbers match Baseline 2 and, in all cases except sequential write, are close to Baseline 1.

CSVFS Case 4 – CSV Block Redirected IO on Non-Coordinating Node

If you have a SAN then you can play with LUN masking to hide the LUN from the node where you will run this test. If you are using Storage Spaces, then a Mirrored Space is always attached only on the coordinator node, and any non-coordinator node will be in block redirected mode as long as you do not have a tiering heatmap enabled on this volume. See this blog post for more details on how Storage Spaces tiering affects the CSV IO mode: http://blogs.msdn.com/b/clustering/archive/2014/03/13/10507826.aspx

Please note that CSV never uses block redirected IO on the coordinator node. Since the disk is always attached on the coordinator node, CSV will always use Direct IO there, so remember to run this test on a non-coordinating node.  If you are network bound, then you should see numbers close to Baseline 2.  If your network is very fast and your bottleneck is storage, then the numbers will be close to Baseline 1.  If the storage is also very fast and you are CPU bound, then the numbers should be about 10-15% below Baseline 1.

 

Here are the numbers I’ve got on my hardware. To make it easier for you to compare I am repeating Baseline 1 and Baseline 2 numbers here.

 

 

 

 

Unbuffered write-through, 8K IO, queue depth 8:

                             This test   Baseline 1   Baseline 2
  sequential read   IOPS         19773        19906        19821
                    MB/sec         154          155          154
  sequential write  IOPS           820        17311          810
                    MB/sec           6          135            6
  random read       IOPS           352          359          353
                    MB/sec           2            2            2
  random write      IOPS           274          273          272
                    MB/sec           2            2            2

 

In my case the numbers match Baseline 2 and are very close to Baseline 1.

Scale-out File Server (SoFS)

To test a Scale-Out File Server, you need to create the SOFS resource using Failover Cluster Manager or PowerShell, and add a share that maps to the same CSV volume that you have been using for the tests so far. Now your baselines will be the CSVFS cases. In the case of SOFS, SMB will deliver IO to CSVFS on the coordinating or a non-coordinating node (depending on where the client is connected; you can use the PowerShell cmdlet Get-SmbWitnessClient to learn client connectivity), and then it will be up to CSVFS to deliver the IO to the disk. The path that CSVFS will take is predictable, but depends on the nature of your storage and the current connectivity. You will need to select a baseline from CSV Cases 1-4.
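
For instance, a quick way to see which file server node each client is connected to (the exact output columns may vary by build):

Get-SmbWitnessClient | Format-Table -Property ClientName, FileServerNodeName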

If you see numbers similar to the CSV baseline, then you know that SMB above CSV is not adding overhead, and you can look at the numbers collected for the CSV baseline to detect where the bottleneck is.  If you see numbers lower than the CSV baseline, then your client network is the bottleneck, and you should validate that it matches the difference between Baseline 3 and Baseline 1.

 

Summary

In this blog post we looked at how to tell if CSVFS read and write performance is at expected levels. You can achieve that by running performance tests before and after adding a disk to CSV. You will use the ‘before’ numbers as your baseline, then add the disk to CSV and test the different IO dispatch modes. Compare the observed numbers to the baselines to learn which layer is your bottleneck.

Thanks!
Vladimir Petter
Principal Software Engineer
High-Availability & Storage
Microsoft

To learn more, here are others in the Cluster Shared Volume (CSV) blog series:

Cluster Shared Volume (CSV) Inside Out
http://blogs.msdn.com/b/clustering/archive/2013/12/02/10473247.aspx
 
Cluster Shared Volume Diagnostics
http://blogs.msdn.com/b/clustering/archive/2014/03/13/10507826.aspx

Cluster Shared Volume Performance Counters
http://blogs.msdn.com/b/clustering/archive/2014/06/05/10531462.aspx

Cluster Shared Volume Failure Handling
http://blogs.msdn.com/b/clustering/archive/2014/10/27/10567706.aspx

Troubleshooting Cluster Shared Volume Auto-Pauses – Event 5120
http://blogs.msdn.com/b/clustering/archive/2014/12/08/10579131.aspx

Troubleshooting Cluster Shared Volume Recovery Failure – System Event 5142
http://blogs.msdn.com/b/clustering/archive/2015/03/26/10603160.aspx


Workgroup and Multi-domain clusters in Windows Server 2016


In Windows Server 2012 R2 and previous versions, a cluster could only be created between member nodes joined to the same domain. Windows Server 2016 breaks down these barriers and introduces the ability to create a Failover Cluster without Active Directory dependencies. Failover Clusters can now therefore be created in the following configurations:

  • Single-domain Clusters: Clusters with all nodes joined to the same domain
  • Multi-domain Clusters: Clusters with nodes which are members of different domains
  • Workgroup Clusters: Clusters with nodes which are member servers / workgroup (not domain joined)

Pre-requisites

The prerequisites for Single-domain clusters are unchanged from previous versions of Windows Server.

In addition to the pre-requisites of Single-domain clusters, the following are the pre-requisites for Multi-domain or Workgroup clusters in the Windows Server 2016 Technical Preview 3 (TP3) release:

  • Management operations may only be performed using Microsoft PowerShell©. The Failover Cluster Manager snap-in tool is not supported in these configurations.
  • To create a new cluster (using the New-Cluster cmdlet) or to add nodes to the cluster (using the Add-ClusterNode cmdlet), a local account needs to be provisioned on all nodes of the cluster (as well as the node from which the operation is invoked) with the following requirements:
    1. Create a local ‘User’ account on each node in the cluster
    2. The username and password of the account must be the same on all nodes
    3. The account is a member of the local ‘Administrators’ group on each node
    4. When using a non-builtin local administrator account to create the cluster, set the LocalAccountTokenFilterPolicy registry policy to 1, on all the nodes of the cluster. Builtin administrator accounts include the ‘Administrator’ account. You can set the LocalAccountTokenFilterPolicy registry policy as follows:
  •  On each node of the cluster launch a Microsoft PowerShell shell as an administrator and type:

 new-itemproperty -path HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System -Name LocalAccountTokenFilterPolicy -Value 1


Without setting this policy, cluster creation will fail with an error when using non-built-in administrator accounts.

  • The Failover Cluster needs to be created as an Active Directory-Detached Cluster without any associated computer objects. Therefore, the cluster needs to have a Cluster Network Name (also known as administrative access point) of type DNS.
  • Each cluster node needs to have a primary DNS suffix.

Deployment

Workgroup and Multi-domain clusters may be deployed using the following steps:

  1. Create consistent local user accounts on all nodes of the cluster. Ensure that the username and password of these accounts are the same on all the nodes, and add the account to the local Administrators group.

 

 

  2. Ensure that each node to be joined to the cluster has a primary DNS suffix.

  3. Create a cluster with the workgroup nodes or nodes joined to different domains. When creating the cluster, use the -AdministrativeAccessPoint parameter to specify a type of DNS so that the cluster does not attempt to create computer objects.

 

New-Cluster -Name <Cluster Name> -Node <Nodes to Cluster> -AdministrativeAccessPoint DNS
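
Putting the steps together, a concrete run might look like the following; the account, node and cluster names are hypothetical, and the primary DNS suffix is set in System Properties (a reboot is required for it to take effect):

# Step 1: on every node, create the same local administrator account
net user clusteradmin <Password> /add
net localgroup Administrators clusteradmin /add

# Step 3: from one of the nodes, create the cluster with a DNS administrative access point
New-Cluster -Name WGCluster -Node Node1,Node2 -AdministrativeAccessPoint DNS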

Workload

The following table summarizes the workload support for Workgroup and Multi-domain clusters.

| Cluster Workload       | Supported/Not Supported        | More Information |
|------------------------|--------------------------------|------------------|
| SQL Server             | Supported                      | We recommend that you use SQL Server Authentication. |
| File Server            | Supported, but not recommended | Kerberos (which is not available) is the preferred authentication protocol for Server Message Block (SMB) traffic. |
| Hyper-V                | Supported, but not recommended | Live migration is not supported. Quick migration is supported. |
| Message Queuing (MSMQ) | Not supported                  | Message Queuing stores properties in AD DS. |

 

Quorum Configuration

The witness type recommended for Workgroup clusters and Multi-domain clusters is a Cloud Witness or Disk Witness.  File Share Witness (FSW) is not supported with a Workgroup or Multi-domain cluster.
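
For example, a Cloud Witness can be configured with Set-ClusterQuorum (the storage account name and access key below are placeholders):

# Configure a Cloud Witness backed by an Azure storage account
Set-ClusterQuorum -CloudWitness -AccountName "<StorageAccountName>" -AccessKey "<StorageAccountAccessKey>"

# Or configure a Disk Witness backed by an available cluster disk
# Set-ClusterQuorum -DiskWitness "Cluster Disk 2"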

Cluster Validation

Cluster Validation for Workgroup and Multi-domain clusters can be run using the Test-Cluster PowerShell cmdlet. Note the following for the Windows Server 2016 TP3 release:

  • The following tests will incorrectly generate an Error and can safely be ignored:
    • Cluster Configuration – Validate Resource Status
    • System Configuration – Validate Active Directory Configuration
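
As a quick sanity check, a validation run might look like the following (the node names are hypothetical):

# Run validation across the nodes of a Workgroup or Multi-domain cluster
Test-Cluster -Node Node1.contoso1.com, Node2.contoso2.com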

Cluster Diagnostics

The Get-ClusterDiagnostics cmdlet is not supported on Workgroup and Multi-domain clusters in the Windows Server 2016 TP3 release. 

Servicing

It is recommended that nodes in a cluster have a consistent configuration.  Multi-domain and Workgroup clusters introduce a higher risk of configuration drift. When deploying, ensure that:

  • The same set of Windows patches is applied to all nodes in the cluster
  • If group policies are rolled out to the cluster nodes, they are not conflicting. 

DNS Replication

Ensure that the cluster node and network names for Workgroup and Multi-domain clusters are replicated to the DNS servers that are authoritative for the cluster nodes.

 

 

Site-aware Failover Clusters in Windows Server 2016


Windows Server 2016 debuts site-aware clusters. Nodes in stretched clusters can now be grouped based on their physical location (site). Cluster site-awareness enhances key operations during the cluster lifecycle such as failover behavior, placement policies, heartbeating between the nodes and quorum behavior. In the remainder of this blog I will explain how you can configure sites for your cluster, the notion of a "preferred site" and how site awareness manifests itself in your cluster operations.

Configuring Sites

A node’s site membership can be configured by setting the Site node property to a unique numerical value.

For example, in a four node cluster with nodes – Node1, Node2, Node3 and Node4, to assign the nodes to Sites 1 and Site 2, do the following:

  • Launch Microsoft PowerShell© as an Administrator and type:

(Get-ClusterNode Node1).Site=1

(Get-ClusterNode Node2).Site=1

(Get-ClusterNode Node3).Site=2

(Get-ClusterNode Node4).Site=2

Configuring sites enhances the operation of your cluster in the following ways:

Failover Affinity

  • Groups failover to a node within the same site, before failing to a node in a different site
  • During Node Drain VMs are moved first to a node within the same site before being moved cross site
  • The CSV load balancer will distribute within the same site

Storage Affinity

Virtual Machines (VMs) follow storage and are placed in same site where their associated storage resides. VMs will begin live migrating to the same site as their associated CSV after 1 minute of the storage being moved.

Cross-Site Heartbeating

You now have the ability to configure the thresholds for heartbeating between sites. These thresholds are controlled by the following new cluster properties:

 

| Property           | Default Value | Description |
|--------------------|---------------|-------------|
| CrossSiteDelay     | 1             | Frequency at which heartbeats are sent to nodes in dissimilar sites |
| CrossSiteThreshold | 20            | Missed heartbeats before an interface is considered down to nodes in dissimilar sites |

 

To configure the above properties launch PowerShell© as an Administrator and type:

(Get-Cluster).CrossSiteDelay = <value>

(Get-Cluster).CrossSiteThreshold = <value>

You can find more information on other properties controlling failover clustering heartbeating here.

The following rules define the applicability of the thresholds controlling heartbeating between two cluster nodes:

  • If the two cluster nodes are in two different sites and two different subnets, then the Cross-Site thresholds will override the Cross-Subnet thresholds.
  • If the two cluster nodes are in two different sites and the same subnets, then the Cross-Site thresholds will override the Same-Subnet thresholds.
  • If the two cluster nodes are in the same site and two different subnets, then the Cross-Subnet thresholds will be effective.
  • If the two cluster nodes are in the same site and the same subnets, then the Same-Subnet thresholds will be effective.

Configuring Preferred Site

In addition to configuring the site a cluster node belongs to, a “Preferred Site” can be configured for the cluster. The Preferred Site is a preference for placement. The Preferred Site will be your Primary datacenter site.

Before the Preferred Site can be configured, the site being chosen as the preferred site needs to be assigned to a set of cluster nodes. To configure the Preferred Site for a cluster, launch PowerShell© as an Administrator and type:

(Get-Cluster).PreferredSite = <Site assigned to a set of cluster nodes>

Configuring a Preferred Site for your cluster enhances operation in the following ways:

Cold Start

During a cold start VMs are placed in the preferred site

Quorum

  • Dynamic Quorum drops weights from the Disaster Recovery site (DR site i.e. the site which is not designated as the Preferred Site) first to ensure that the Preferred Site survives if all things are equal. In addition, nodes are pruned from the DR site first, during regroup after events such as asymmetric network connectivity failures.
  • During a Quorum Split i.e. the even split of two datacenters with no witness, the Preferred Site is automatically elected to win
    • The nodes in the DR site drop out of cluster membership
    • This allows the cluster to survive a simultaneous 50% loss of votes
    • Note that the LowerQuorumPriorityNodeID property previously controlling this behavior is deprecated in Windows Server 2016

 

 

Preferred Site and Multi-master Datacenters

The Preferred Site can also be configured at the granularity of a cluster group i.e. a different preferred site can be configured for each group. This enables a datacenter to be active and preferred for specific groups/VMs.

To configure the Preferred Site for a cluster group, launch PowerShell© as an Administrator and type:

(Get-ClusterGroup <Group Name>).PreferredSite = <Site assigned to a set of cluster nodes>

 

Placement Priority

Groups in a cluster are placed based on the following site priority:

  1. Storage affinity site
  2. Group preferred site
  3. Cluster preferred site

Hyper-converged with Windows Server 2016


Windows Server 2016 Technical Preview 3 just recently released, and one of the big hot features which has me really excited is Storage Spaces Direct (S2D).  With S2D you will be able to create a hyper-converged private cloud.  A hyper-converged infrastructure (HCI) consolidates compute and storage into a common set of servers.  Leveraging internal storage which is replicated, you can create a true Software-defined Storage (SDS) solution.

This is available in the Windows Server 2016 Technical Preview today!  I encourage you to go try it out and give us some feedback.  Here’s where you can learn more:

Presentation from Ignite 2015:

Storage Spaces Direct in Windows Server 2016 Technical Preview
https://channel9.msdn.com/events/Ignite/2015/BRK3474

Deployment guide:

Enabling Private Cloud Storage Using Servers with Local Disks

https://technet.microsoft.com/en-us/library/mt126109.aspx

Claus Joergensen’s blog:

Storage Spaces Direct
http://blogs.technet.com/b/clausjor/archive/2015/05/14/storage-spaces-direct.aspx

Thanks!
Elden Christensen
Principal PM Manager
High-Availability & Storage
Microsoft

Configuring site awareness for your multi-active disaggregated datacenters


In a previous blog, I discussed the introduction of site-aware Failover Clusters in Windows Server 2016. In this blog, I am going to walk through how you can configure site-awareness for your multi-active disaggregated datacenters. You can learn more about Software Defined Storage and the advantages of a disaggregated datacenter here.

Consider the following multi-active datacenters, with a compute and a storage cluster, stretched across two datacenters. Each cluster has two nodes on each datacenter.

To configure site-awareness for the stretched compute and storage clusters proceed as follows: 

Compute Stretch Cluster

 1)     Assign the nodes in the cluster to one of the two datacenters (sites).

  • Open PowerShell© as an Administrator and type:

(Get-ClusterNode Node1).Site = 1

(Get-ClusterNode Node2).Site = 1

(Get-ClusterNode Node3).Site = 2

(Get-ClusterNode Node4).Site = 2

2)     Configure the site for your primary datacenter.

(Get-Cluster).PreferredSite = 1

Storage Stretch Cluster

In multi-active disaggregated datacenters, the storage stretch cluster hosts a Scale-Out File Server (SoFS). For optimal performance, ensure that the site hosting the Cluster Shared Volumes backing the SoFS follows the site hosting the compute workload. This avoids the cost of cross-datacenter network traffic.

1)     As in the case of the compute cluster assign the nodes in the storage cluster to one of the two datacenters (sites).

(Get-ClusterNode Node5).Site = 1

(Get-ClusterNode Node6).Site = 1

(Get-ClusterNode Node7).Site = 2

(Get-ClusterNode Node8).Site = 2

2)     For each Cluster Shared Volume (CSV) in the cluster, configure the preferred site for the CSV group to be the same as the preferred site for the Compute Cluster.

$csv1 = Get-ClusterSharedVolume "Cluster Disk 1" | Get-ClusterGroup

($csv1).PreferredSite = 1 

3)  Set each CSV group in the cluster to automatically failback to the preferred site when it is available after a datacenter outage.

($csv1).AutoFailbackType = 1
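
If the storage cluster has several CSVs, steps 2 and 3 can be applied to all of them in one pass; a minimal sketch, assuming site 1 is the preferred site, is:

# Set the preferred site and automatic failback for every CSV group
Get-ClusterSharedVolume | ForEach-Object {
    $group = $_ | Get-ClusterGroup
    $group.PreferredSite    = 1   # match the compute cluster's preferred site
    $group.AutoFailbackType = 1   # fail back automatically when the preferred site returns
}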

Note: Steps 2 and 3 can also be used to configure the Preferred Site for a CSV group in a hyper-converged datacenter deployment. You can learn more about hyper-converged deployments in Windows Server 2016 here.

 

How can we improve the installation and patching of Windows Server? (Survey Request)


Do you want your server OS deployment and servicing to move faster? We’re a team of Microsoft engineers who want your experiences and ideas around solving real problems of deploying and servicing your server OS infrastructure. We prefer that you don’t love server OS deployment already, and we’re interested even if you don’t use Windows Server. We need to learn it and earn it.

Click the link below if you wish to fill out a brief survey and perhaps participate in a short phone call.

https://aka.ms/deployland

Many Thanks!!!

-Rob.

Managing Failover Clusters with 5nine Manager


Hi Cluster Fans,

It is nice to be back on the Cluster Team Blog!  After founding this blog and working closely with the cluster team for almost eight years, I left Microsoft last year to join a Hyper-V software partner, 5nine Software.  I’ve spoken with thousands of customers and I realized that Failover Clustering is so essential to Hyper-V, that a majority of all VMs are using it, and it is businesses of all sizes that are doing this, not just enterprises.  Most organizations need continual availability for their services to run 24/7, and their customers expect it.  Failover Clustering is now commonplace even amongst small and medium-sized businesses.  I was able to bring my passion for cluster management to 5nine’s engineering team, and into 5nine’s most popular SMB product, 5nine Manager.  This blog provides an overview of how 5nine Manager can help you centralize management of your clustered resources.

 Create a Cluster

5nine Manager lets you discover hosts and create a Failover Cluster.  It will allow you to specify nodes, run Cluster Validation, provide a client access point, and then create the cluster.

Validate a Cluster

Failover Cluster validation is an essential task in all deployments as it is required for a supported cluster.  With 5nine Manager you can test the health of your cluster during configuration, or afterwards as a troubleshooting tool.  You can granularly select the different tests to run, and see the same graphical report that you are familiar with.

Host Best Practice Analyzer

In addition to testing the clustering configuration, you can also run a series of advanced Hyper-V tests on each of the hosts and Scale-Out File Servers through 5nine Manager.  The results will provide recommendations to enhance your node’s stability and performance.

Configure Live Migration Settings

It is important to have a dedicated network for Live Migration to ensure that its traffic does not interfere with cluster heartbeats or other important traffic.  With 5nine Manager you can specify the number of simultaneous Live Migrations and Storage Live Migrations, and even copy those settings to the other cluster nodes.

View Cluster Summary

5nine Manager has a Summary Dashboard which centrally reports the health of the cluster and its VMs.  It quickly identifies nodes or VMs with problems, and lists any alerts from its resources.  This Summary Dashboard can also be refocused at the Datacenter, Cluster, Host, and VM level for more refined results.

Manage Cluster Nodes

Using 5nine Manager you can configure your virtual disk and network settings.  You can also perform standard maintenance tasks, such as to Pause and Resume a cluster node, which can live migrate VMs to other nodes.  A list of active and failed cluster tasks is also displayed through the interface.

Manage Clustered VMs

You can manage any type of virtual machine that is supported by Hyper-V, including Windows Server, Windows, Linux, UNIX, and Windows Server 2016 Nano Server.  5nine Manager lets you centrally manage all your virtual machines, including the latest performance and security features for virtualization.  The full GUI console even runs on all versions of Windows Server, including the otherwise GUI-less Windows Server Core and Hyper-V Server.

Cluster Status Report

It is now easy to create a report about the configuration and health of your cluster, showing you information about the configuration and settings for every resource.  This document can be exported and retained for compliance.

Host Load Balancing

5nine Manager allows you to pair cluster nodes and hosts to form a group that will load balance VMs.  It live migrates the VMs between hosts when customizable thresholds are exceeded.  This type of dynamic optimization ensures that a single host does not get overloaded, providing higher availability and greater performance for the VMs.

Cluster Logs

Sometimes it can be difficult to see all the events from across your cluster.  5nine Manager pulls together all the logs for your clusters, hosts and VMs to simplify troubleshooting.

Cluster Monitoring

5nine Manager provides a Monitor Dashboard to provide current and historical data about the usage of your clusters, hosts and VMs.  It will show you which VMs are consuming the most resources, the latest alarms, and a graphical view of CPU, memory, disk and network usage.  You can also browse through previous performance data to help isolate a past issue.

Hyper-V Replica with Clustering

Hyper-V Replica allows a virtual machine’s virtual hard disk to be copied to a secondary location for disaster recovery.  Using 5nine Manager you can configure the Replication Settings on a host, then apply them to other cluster nodes and hosts. 

 

You can also configure replication on a virtual machine that is running a cluster node with the Hyper-V Replica Broker configured.  The health state of replica is also displayed in the centralized console.

 

Failover Clustering should be an integral part of your virtualized infrastructure, and 5nine Manager provides a way to centrally manage all your clustered VMs.  Failover cluster support will continue to be enhanced in future releases of 5nine Manager.

Thanks!
Symon Perriman
VP, 5nine Software
Hyper-V MVP
@SymonPerriman

Troubleshooting Hangs Using Live Dump


In this blog post https://blogs.msdn.microsoft.com/clustering/2014/12/08/troubleshooting-cluster-shared-volume-auto-pauses-event-5120/ we discussed what a Cluster Shared Volumes (CSV) event ID 5120 means, and how to troubleshoot it. In particular, we discussed the reason for an auto-pause due to STATUS_IO_TIMEOUT (c00000b5), and some options on how to troubleshoot it. In this post we will discuss how to troubleshoot it using LiveDumps, which enable debugging with no downtime for your system.

First let's discuss what LiveDump is. Some of you are probably familiar with kernel crash dumps https://support.microsoft.com/en-us/kb/927069. You might face at least two challenges with a kernel dump:

  1. Bugcheck halts the system resulting in downtime
  2. The entire contents of memory are dumped to a file.  On a system with a lot of memory, you might not have enough space on your system drive for the OS to save the dump

The good news is that LiveDump solves both of these issues. LiveDump was a new feature added in Windows Server 2012 R2. For the purpose of this discussion you can think of LiveDump as an OS feature that allows you to create a consistent snapshot of kernel memory and save it to a dump file for future analysis. Taking this snapshot will NOT cause a bugcheck, so there is no downtime. LiveDump does not include all kernel memory; it excludes information which is not valuable in debugging, such as pages from the standby list and file caches. The kind of LiveDump that the cluster collects for you also would not have pages consumed by the hypervisor, and in Windows Server 2016 the cluster additionally excludes the CSV Cache from the LiveDump. As a result a LiveDump has a much smaller file size compared to what you would get when you bugcheck the server, and does not require as much space on your system drive.  In Windows Server 2016 there is also a new bugcheck option called an "Active Dump", which similarly excludes unnecessary information to create a smaller dump file during bugchecks.

You can create a LiveDump manually using LiveKd from Windows Sysinternals (https://technet.microsoft.com/en-us/sysinternals/bb897415.aspx). To generate a LiveDump, run the command "livekd -ml -o <path to a dump file>" from an elevated command prompt. The path to the dump file does not have to be on the system drive; you can save it to any location. Here is an example of creating a live dump on a Windows 10 desktop with 12 GB of RAM, which resulted in a dump file of under 3 GB.

D:\>livekd -ml -o d1.dmp
LiveKd v5.40 - Execute kd/windbg on a live system
Sysinternals - www.sysinternals.com

Copyright (C) 2000-2015 Mark Russinovich and Ken Johnson

Saving live dump to D:\d1.dmp... done.

D:\>dir *.dmp

Directory of D:\

02/25/2016 12:05 PM     2,773,164,032 d1.dmp
1 File(s) 2,773,164,032 bytes
0 Dir(s) 3,706,838,417,408 bytes free

If you are wondering how much disk space you would need for a LiveDump, you can generate one using LiveKd and check its size.

You might wonder what is so great about LiveDump for troubleshooting. Logs and traces work well when something fails, because hopefully there will be a record in a log where a component admits that it is failing operations and points at whoever caused that. LiveDump is great when we need to troubleshoot a problem where something is taking a long time, and nothing is technically failing. If we start a watchdog when an operation starts, and the watchdog expires before the operation completes, then we can take a dump of the system hoping that we can walk the wait chain for that operation and see who owns it and why it is not completing. Looking at a LiveDump is just like looking at a kernel dump. It requires some skill and an understanding of Windows internals, so it has a steep learning curve for customers, but it is a great tool for Microsoft support and product teams who already have that expertise. If you reach out to Microsoft support with an issue where something is stuck in the kernel, and a LiveDump was taken while it was stuck, then the chances of promptly root-causing the issue are much higher.

Windows Server Failover Clustering has many watchdogs which control how long it should wait for cluster resources to execute calls like resource Online or Offline, or how long it should wait for CSVFS to complete a state transition. From our experience we know that in most cases some of these scenarios will be stuck in the kernel, so we automatically ask Windows Error Reporting (WER) to generate a LiveDump. It is important to note that LiveKd uses a different API that produces a LiveDump without checking any other conditions, while the cluster uses Windows Error Reporting. Windows Error Reporting will throttle LiveDump creation. We are using WER because it manages disk space consumption for us, and it also sends telemetry information about the incident to Microsoft where we can see what issues are affecting customers. This helps us to prioritize and strategize fixes. Starting with Windows Server 2016 you can control WER telemetry through the common telemetry settings, and before that there was a separate control panel applet to control what WER is allowed to share with Microsoft.

By default, Windows Error Reporting will allow only one LiveDump per report type per 7 days and only 1 LiveDump per machine per 5 days. You can change that by setting the following registry keys:

reg add "HKLM\Software\Microsoft\Windows\Windows Error Reporting\FullLiveKernelReports" /v SystemThrottleThreshold /t REG_DWORD /d 0 /f
reg add "HKLM\Software\Microsoft\Windows\Windows Error Reporting\FullLiveKernelReports" /v ComponentThrottleThreshold /t REG_DWORD /d 0 /f

Once a LiveDump is created, WER launches a user mode process that creates a minidump from the LiveDump, and immediately after that deletes the LiveDump. The minidump is only a couple hundred kilobytes, but unfortunately it is not helpful because it would have the call stack of only the thread that invoked LiveDump creation, and we need all the other threads in the kernel to track down where we are stuck. You can tell WER to keep the original LiveDumps using these two registry keys.

reg add "HKLM\Software\Microsoft\Windows\Windows Error Reporting\FullLiveKernelReports" /v FullLiveReportsMax /t REG_DWORD /d 10 /f
reg add "HKLM\SYSTEM\CurrentControlSet\Control\CrashControl" /v AlwaysKeepMemoryDump /t REG_DWORD /d 1 /f

Set FullLiveReportsMax to the number of dumps you want to keep; the decision on how many to keep depends on how much free space you have and the size of a LiveDump.
You need to reboot the machine for the Windows Error Reporting registry keys to take effect.
LiveDumps created by Windows Error Reporting are located in %SystemDrive%\Windows\LiveKernelReports.
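
To see which LiveDumps have been kept on a node, you can simply list that folder, for example:

# List retained LiveDumps, newest first, with their size in GB
Get-ChildItem "$env:SystemDrive\Windows\LiveKernelReports" -Recurse -Filter *.dmp |
    Sort-Object LastWriteTime -Descending |
    Select-Object FullName, LastWriteTime, @{ n = 'SizeGB'; e = { [math]::Round($_.Length / 1GB, 2) } }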

Windows Server 2016

In Windows Server 2016, Failover Cluster LiveDump creation is on by default. You can turn it on/off by manipulating the lowest bit of the cluster DumpPolicy public property. By default, this bit is set, which means the cluster is allowed to generate LiveDumps.

PS C:\Windows\system32> (get-cluster).DumpPolicy
1118489

If you set this bit to 0 then cluster will stop generating LiveDumps.

PS C:\Windows\system32> (get-cluster).DumpPolicy=1118488

You can set it back to 1 to enable it again

PS C:\Windows\system32> (get-cluster).DumpPolicy=1118489

Changes take effect immediately on all cluster nodes. You do NOT need to reboot the cluster nodes.

Here is the list of LiveDump report types generated by the cluster. Dump files will have the report type string as a prefix.

| Report Type | Description |
|-------------|-------------|
| CsvIoT      | A CSV volume AutoPaused due to STATUS_IO_TIMEOUT and the cluster on the coordinating node created a LiveDump |
| CsvStateIT  | CSV state transition to Init state is taking too long |
| CsvStatePT  | CSV state transition to Paused state is taking too long |
| CsvStateDT  | CSV state transition to Draining state is taking too long |
| CsvStateST  | CSV state transition to SetDownLevel state is taking too long |
| CsvStateAT  | CSV state transition to Active state is taking too long |

You can learn more about CSV state transition in this blog post:

Following is the list of LiveDump report types that the cluster generates when a cluster resource call is taking too long:

| Report Type | Description |
|-------------|-------------|
| ClusResCO   | Cluster resource Open call is taking too long |
| ClusResCC   | Cluster resource Close call is taking too long |
| ClusResCU   | Cluster resource Online call is taking too long |
| ClusResCD   | Cluster resource Offline call is taking too long |
| ClusResCK   | Cluster resource Terminate call is taking too long |
| ClusResCA   | Cluster resource Arbitrate call is taking too long |
| ClusResCR   | Cluster resource Control call is taking too long |
| ClusResCT   | Cluster resource Type Control call is taking too long |
| ClusResCI   | Cluster resource IsAlive call is taking too long |
| ClusResCL   | Cluster resource LooksAlive call is taking too long |
| ClusResCF   | Cluster resource Fail call is taking too long |

You can learn more about cluster resource state machine in these two blog posts:

You can control which resource types will generate LiveDumps by changing the value of the lowest bit of the resource type DumpPolicy public property. Here are the default values:

C:\> Get-ClusterResourceType | ft Name,DumpPolicy

Name                                DumpPolicy
----                                ----------
Cloud Witness                       5225058576
DFS Replicated Folder               5225058576
DHCP Service                        5225058576
Disjoint IPv4 Address               5225058576
Disjoint IPv6 Address               5225058576
Distributed File System             5225058576
Distributed Network Name            5225058576
Distributed Transaction Coordinator 5225058576
File Server                         5225058576
File Share Witness                  5225058576
Generic Application                 5225058576
Generic Script                      5225058576
Generic Service                     5225058576
Health Service                      5225058576
IP Address                          5225058576
IPv6 Address                        5225058576
IPv6 Tunnel Address                 5225058576
iSCSI Target Server                 5225058576
Microsoft iSNS                      5225058576
MSMQ                                5225058576
MSMQTriggers                        5225058576
Nat                                 5225058576
Network File System                 5225058577
Network Name                        5225058576
Physical Disk                       5225058577
Provider Address                    5225058576
Scale Out File Server               5225058577
Storage Pool                        5225058577
Storage QoS Policy Manager          5225058577
Storage Replica                     5225058577
Task Scheduler                      5225058576
Virtual Machine                     5225058576
Virtual Machine Cluster WMI         5225058576
Virtual Machine Configuration       5225058576
Virtual Machine Replication Broker  5225058576
Virtual Machine Replication Coor... 5225058576
WINS Service                        5225058576

By default, Physical Disk resources will produce LiveDumps. You can disable that by setting the lowest bit to 0. Here is an example of how to do that for the Physical Disk resource:

(Get-ClusterResourceType -Name "Physical Disk").DumpPolicy=5225058576

You can later enable it again:

(Get-ClusterResourceType -Name "Physical Disk").DumpPolicy=5225058577

Changes take effect immediately on all new calls; there is no need to offline/online the resource or restart the cluster.

The last group is the report types that the cluster service generates when it observes that some operations are taking too long.

| Report Type  | Description |
|--------------|-------------|
| ClusWatchDog | Cluster service watchdog |

Windows Server 2012 R2

We had such a positive experience troubleshooting issues using LiveDump on Windows Server 2016 that we have backported a subset of that functionality to Windows Server 2012 R2. You need to make sure that you have all the recommended patches outlined here. On Windows Server 2012 R2 LiveDump will not be generated by default; it can be enabled using the following PowerShell command:

Get-Cluster | Set-ClusterParameter -create LiveDumpEnabled -value 1

LiveDump can be disabled using the following command:

Get-Cluster | Set-ClusterParameter -create LiveDumpEnabled -value 0

Only the CSV report types were backported; as a result you will not see LiveDumps from cluster resource calls or the cluster service watchdog.  Windows Error Reporting throttling will also need to be adjusted as discussed above.

CSV AutoPause due to STATUS_IO_TIMEOUT (c00000b5)

Let's see how LiveDump helps in troubleshooting this issue. In the blog post https://blogs.msdn.microsoft.com/clustering/2014/12/08/troubleshooting-cluster-shared-volume-auto-pauses-event-5120/ we discussed that it is usually caused by an IO on the coordinating node taking a long time. As a result, CSVFS on a non-coordinating node gets the error STATUS_IO_TIMEOUT. CSVFS notifies the cluster service about that event, and the cluster service creates a LiveDump with report type CsvIoT on the coordinating node where the IO is taking a long time. If we are lucky, and the IO has not completed before the LiveDump has been generated, then we can load the dump using WinDbg, try to find the IO that is taking a long time, and see who owns that IO.

Thanks!
Vladimir Petter
Principal Software Engineer
High-Availability & Storage
Microsoft

 

Additional Resources:

To learn more, here are others in the Cluster Shared Volume (CSV) blog series:

Cluster Shared Volume (CSV) Inside Out
http://blogs.msdn.com/b/clustering/archive/2013/12/02/10473247.aspx

Cluster Shared Volume Diagnostics
http://blogs.msdn.com/b/clustering/archive/2014/03/13/10507826.aspx

Cluster Shared Volume Performance Counters
http://blogs.msdn.com/b/clustering/archive/2014/06/05/10531462.aspx

Cluster Shared Volume Failure Handling
http://blogs.msdn.com/b/clustering/archive/2014/10/27/10567706.aspx

Troubleshooting Cluster Shared Volume Auto-Pauses – Event 5120
http://blogs.msdn.com/b/clustering/archive/2014/12/08/10579131.aspx

Troubleshooting Cluster Shared Volume Recovery Failure – System Event 5142
http://blogs.msdn.com/b/clustering/archive/2015/03/26/10603160.aspx

Cluster Shared Volume – A Systematic Approach to Finding Bottlenecks
https://blogs.msdn.microsoft.com/clustering/2015/07/29/cluster-shared-volume-a-systematic-approach-to-finding-bottlenecks/


Failover Cluster Node Fairness in Windows Server 2016


Windows Server 2016 introduces the Node Fairness feature to optimize the utilization of nodes in a Failover Cluster. During the lifecycle of your private cloud, certain operations (such as rebooting a node for patching) result in the Virtual Machines (VMs) in your cluster being moved. This could result in an unbalanced cluster where some nodes are hosting more VMs and others are underutilized (such as a freshly rebooted server). The Node Fairness feature seeks to identify over-committed nodes and re-distribute VMs from those nodes. VMs are live migrated to idle nodes with no downtime. Failover policies such as anti-affinity, fault domains and possible owners are honored. Thus, the Node Fairness feature seamlessly balances your private cloud.

Heuristics for Balancing

Node Fairness evaluates a node’s load based on the following heuristics:

  1. Current Memory pressure: Memory is the most common resource constraint on a Hyper-V host
  2. CPU utilization of the Node averaged over a 5 minute window: Mitigates a node in the cluster becoming overcommitted

Controlling Aggressiveness of Balancing

The aggressiveness of balancing based on the Memory and CPU heuristics can be configured by the cluster common property 'AutoBalancerLevel'. To control the aggressiveness run the following in PowerShell:

(Get-Cluster).AutoBalancerLevel = <value>

| AutoBalancerLevel | Aggressiveness | Behavior |
|-------------------|----------------|----------|
| 1 (default)       | Low            | Move when host is more than 80% loaded |
| 2                 | Medium         | Move when host is more than 70% loaded |
| 3                 | High           | Move when host is more than 60% loaded |

 


Controlling Node Fairness

Node Fairness is enabled by default and when load balancing occurs can be configured by the cluster common property ‘AutoBalancerMode’. To control when Node Fairness balances the cluster:

Using Failover Cluster Manager:

  1. Right-click on your cluster name and select the “Properties” option


  2. Select the "Balancer" pane


Using PowerShell:

Run the following:

(Get-Cluster).AutoBalancerMode = <value>

| AutoBalancerMode | Behavior |
|------------------|----------|
| 0                | Disabled |
| 1                | Load balance on node join |
| 2 (default)      | Load balance on node join and every 30 minutes |

 

Node Fairness vs. SCVMM Dynamic Optimization

The Node Fairness feature provides in-box functionality targeted towards deployments without System Center Virtual Machine Manager (SCVMM). SCVMM Dynamic Optimization is the recommended mechanism for balancing virtual machine load in your cluster for SCVMM deployments. SCVMM automatically disables the Node Fairness feature when Dynamic Optimization is enabled.

Speeding Up Failover Tips-n-Tricks


From time to time people ask me for suggestions on what tweaks they can make to get a Windows Server Failover Cluster to fail over faster. In this blog I'll discuss a few tips-n-tricks.

  1. Disable NetBIOS over TCP/IP - Unless you want to have >15 character names (Node / Cluster / Network Name) or have some legacy apps / clients, NetBIOS is doing nothing but slow you down. You want to disable NetBIOS in a couple of different places:
    1. Every Cluster IP Address resource - Here is the syntax (again, this needs to be set on all IP Address resources; see the example after this list for applying it to all of them at once).  Note: NetBIOS is disabled on all Cluster IP Addresses in Windows Server 2016 by default.
      Get-ClusterResource "Cluster IP address" | Set-ClusterParameter EnableNetBIOS 0
    2. Base Network Interfaces - In the Advanced TCP/IP Settings, go to the WINS tab, and select "Disable NetBIOS over TCP/IP".  This needs to be done on every network interface.
  2. Go Pure IPv6 - Going pure IPv6 will give faster failover as a result of optimizations in how Duplicate Address Detection (DAD) works in the TCP/IP stack.
  3. Avoid IPSec on Servers - Internet Protocol Security (IPsec) is a great security feature, especially for client scenarios. But it comes at a cost and really shouldn't be used on servers. Specifically, enabling a single IPsec policy will reduce overall network performance by ~30% and significantly delay failover times.
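
Here is one way to apply the EnableNetBIOS setting from tip 1.1 to every IP Address resource in the cluster at once; a small sketch, not the only way to do it:

# Disable NetBIOS on all cluster IP Address resources in one pass
foreach ($res in Get-ClusterResource) {
    if ("$($res.ResourceType)" -eq "IP Address") {
        $res | Set-ClusterParameter -Name EnableNetBIOS -Value 0
    }
}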

These are a few things I've found you can do to speed up failover and reduce downtime.

Thanks!
Elden Christensen
Principal PM Manager
High-Availability & Storage
Microsoft

NetFT Virtual Adapter Performance Filter


In this blog I will discuss what the NetFT Virtual Adapter Performance Filter is and the scenarios when you should or should not enable it.

The Microsoft Failover Cluster Virtual Adapter (NetFT) is a virtual adapter used by the Failover Clustering feature to build fault tolerant communication routes between nodes in a cluster for intra-cluster communication.

When the Cluster Service communicates to another node in the cluster, it sends data down over TCP to the NetFT virtual adapter.  NetFT then sends the data over UDP down to the physical network card, which then sends it over the network to another node.  See the below diagram:

[Diagram: cluster traffic flows from the Cluster Service over TCP to the NetFT virtual adapter, then over UDP through the physical NIC to the other node]

When the data is received by the other node, it follows the same flow.  Up the physical adapter, then to NetFT, and finally up to the Cluster Service.  The NetFT Virtual Adapter Performance Filter is a filter in Windows Server 2012 and Windows Server 2012 R2 which inspects traffic inbound on the physical NIC and then reroutes cluster traffic addressed for NetFT directly to the NetFT driver.  This bypasses the physical NIC UDP / IP stack and delivers increased cluster network performance.


When to Enable the NetFT Virtual Adapter Performance Filter

The NetFT Virtual Adapter Performance Filter is disabled by default.  The filter is disabled because it can cause issues with Hyper-V clusters which have a Guest Cluster running in VMs running on top of them.  Issues have been seen where the NetFT Virtual Adapter Performance Filter in the host incorrectly routes NetFT traffic bound for a guest VM to the host.  This can result in communication issues with the guest cluster in the VM.  More details can be found in this article:

https://support.microsoft.com/en-us/kb/2872325

If you are deploying any workload other than Hyper-V with guest clusters, enabling the NetFT Virtual Adapter Performance Filter will optimize and improve cluster performance.


Windows Server 2016

Due to changes in the networking stack in Windows Server 2016, the NetFT Virtual Adapter Performance Filter has been removed.

Thanks!
Elden Christensen
Principal PM Manager
High-Availability & Storage
Microsoft

Using PowerShell script make any application highly available


Author:
Amitabh Tamhane
Senior Program Manager
Windows Server Microsoft

OS releases: Applicable to Windows Server 2008 R2 or later

Now you can use PowerShell scripts to make any application highly available with Failover Clusters!!!

The Generic Script is a built-in resource type included in Windows Server Failover Clusters. Its advantage is flexibility: you can make applications highly available by writing a simple script. For instance, you can make any PowerShell script highly available! Interested?

We created GenScript in ancient times and it supports only Visual Basic scripts, even in Windows Server 2016. This means you can't directly configure PowerShell as a GenScript resource. However, in this blog post, I'll walk you through a sample Visual Basic script, and associated PS scripts, to build a custom GenScript resource that works well with PowerShell.

Pre-requisites: This blog assumes you have the basic understanding of Windows Server Failover Cluster & built-in resource types.

Disclaimer: Microsoft does not intend to officially support any source code/sample scripts provided as part of this blog. This blog is written only for a quick walk-through on how to run PowerShell scripts using GenScript resource. To make your application highly available, you are expected to modify all the scripts (Visual Basic/PowerShell) as per the needs of your application.

Visual Basic Shell

It so happens that a Visual Basic script can shell out to PowerShell, passing parameters and reading the result back. Here's a Visual Basic sample that uses some custom Private Properties:

 

'<your application name> Resource Type

Function Open( )
    Resource.LogInformation "Enter Open()"

    If Resource.PropertyExists("PSScriptsPath") = False Then
        Resource.AddProperty("PSScriptsPath")
    End If

    If Resource.PropertyExists("Name") = False Then
        Resource.AddProperty("Name")
    End If

    If Resource.PropertyExists("Data1") = False Then
        Resource.AddProperty("Data1")
    End If

    If Resource.PropertyExists("Data2") = False Then
        Resource.AddProperty("Data2")
    End If

    If Resource.PropertyExists("DataStorePath") = False Then
        Resource.AddProperty("DataStorePath")
    End If

    '...Result...
    Open = 0

    Resource.LogInformation "Exit Open()"
End Function


Function Online( )
    Resource.LogInformation "Enter Online()"

    '...Check for required private properties...

    If Resource.PropertyExists("PSScriptsPath") = False Then
        Resource.LogInformation "PSScriptsPath is a required private property."
        Online = 1
        Exit Function
    End If
    '...Resource.LogInformation "PSScriptsPath is " & Resource.PSScriptsPath

    If Resource.PropertyExists("Name") = False Then
        Resource.LogInformation "Name is a required private property."
        Online = 1
        Exit Function
    End If
    Resource.LogInformation "Name is " & Resource.Name

    If Resource.PropertyExists("Data1") = False Then
        Resource.LogInformation "Data1 is a required private property."
        Online = 1
        Exit Function
    End If
    '...Resource.LogInformation "Data1 is " & Resource.Data1

    If Resource.PropertyExists("Data2") = False Then
        Resource.LogInformation "Data2 is a required private property."
        Online = 1
        Exit Function
    End If
    '...Resource.LogInformation "Data2 is " & Resource.Data2

    If Resource.PropertyExists("DataStorePath") = False Then
        Resource.LogInformation "DataStorePath is a required private property."
        Online = 1
        Exit Function
    End If
    '...Resource.LogInformation "DataStorePath is " & Resource.DataStorePath

    PScmd = "powershell.exe -file " & Resource.PSScriptsPath & "\PS_Online.ps1 " & Resource.PSScriptsPath & " " & Resource.Name & " " & Resource.Data1 & " " & Resource.Data2 & " " & Resource.DataStorePath

    Dim WshShell
    Set WshShell = CreateObject("WScript.Shell")

    Resource.LogInformation "Calling Online PS script= " & PSCmd
    rv = WshShell.Run(PScmd, , True)
    Resource.LogInformation "PS return value is: " & rv

    '...Translate result from PowerShell ...
    '...1 (True in PS) == 0 (True in VB)
    '...0 (False in PS) == 1 (False in VB)
    If rv = 1 Then
        Resource.LogInformation "Online Success"
        Online = 0
    Else
        Resource.LogInformation "Online Error"
        Online = 1
    End If

    Resource.LogInformation "Exit Online()"
End Function

Function Offline( )
    Resource.LogInformation "Enter Offline()"

    '...Check for required private properties...

    If Resource.PropertyExists("PSScriptsPath") = False Then
        Resource.LogInformation "PSScriptsPath is a required private property."
        Offline = 1
        Exit Function
    End If
    '...Resource.LogInformation "PSScriptsPath is " & Resource.PSScriptsPath

    If Resource.PropertyExists("Name") = False Then
        Resource.LogInformation "Name is a required private property."
        Offline = 1
        Exit Function
    End If
    Resource.LogInformation "Name is " & Resource.Name

    If Resource.PropertyExists("Data1") = False Then
        Resource.LogInformation "Data1 is a required private property."
        Offline = 1
        Exit Function
    End If
    '...Resource.LogInformation "Data1 is " & Resource.Data1

    If Resource.PropertyExists("Data2") = False Then
        Resource.LogInformation "Data2 is a required private property."
        Offline = 1
        Exit Function
    End If
    '...Resource.LogInformation "Data2 is " & Resource.Data2

    If Resource.PropertyExists("DataStorePath") = False Then
        Resource.LogInformation "DataStorePath is a required private property."
        Offline = 1
        Exit Function
    End If
    '...Resource.LogInformation "DataStorePath is " & Resource.DataStorePath

    PScmd = "powershell.exe -file " & Resource.PSScriptsPath & "\PS_Offline.ps1 " & Resource.PSScriptsPath & " " & Resource.Name & " " & Resource.Data1 & " " & Resource.Data2 & " " & Resource.DataStorePath

    Dim WshShell
    Set WshShell = CreateObject("WScript.Shell")

    Resource.LogInformation "Calling Offline PS script= " & PSCmd
    rv = WshShell.Run(PScmd, , True)
    Resource.LogInformation "PS return value is: " & rv

    '...Translate result from PowerShell ...
    '...1 (True in PS) == 0 (True in VB)
    '...0 (False in PS) == 1 (False in VB)
    If rv = 1 Then
        Resource.LogInformation "Offline Success"
        Offline = 0
    Else
        Resource.LogInformation "Offline Error"
        Offline = 1
    End If

    Resource.LogInformation "Exit Offline()"
End Function

Function LooksAlive( )
    '...Result...
    LooksAlive = 0
End Function

Function IsAlive( )
    Resource.LogInformation "Entering IsAlive"

    '...Check for required private properties...

    If Resource.PropertyExists("PSScriptsPath") = False Then
        Resource.LogInformation "PSScriptsPath is a required private property."
        IsAlive = 1
        Exit Function
    End If
    '...Resource.LogInformation "PSScriptsPath is " & Resource.PSScriptsPath

    If Resource.PropertyExists("Name") = False Then
        Resource.LogInformation "Name is a required private property."
        IsAlive = 1
        Exit Function
    End If
    Resource.LogInformation "Name is " & Resource.Name

    If Resource.PropertyExists("Data1") = False Then
        Resource.LogInformation "Data1 is a required private property."
        IsAlive = 1
        Exit Function
    End If
    '...Resource.LogInformation "Data1 is " & Resource.Data1

    If Resource.PropertyExists("Data2") = False Then
        Resource.LogInformation "Data2 is a required private property."
        IsAlive = 1
        Exit Function
    End If
    '...Resource.LogInformation "Data2 is " & Resource.Data2

    If Resource.PropertyExists("DataStorePath") = False Then
        Resource.LogInformation "DataStorePath is a required private property."
        IsAlive = 1
        Exit Function
    End If
    '...Resource.LogInformation "DataStorePath is " & Resource.DataStorePath

    PScmd = "powershell.exe -file " & Resource.PSScriptsPath & "\PS_IsAlive.ps1 " & Resource.PSScriptsPath & " " & Resource.Name & " " & Resource.Data1 & " " & Resource.Data2 & " " & Resource.DataStorePath

    Dim WshShell
    Set WshShell = CreateObject("WScript.Shell")

    Resource.LogInformation "Calling IsAlive PS script= " & PSCmd
    rv = WshShell.Run(PScmd, , True)
    Resource.LogInformation "PS return value is: " & rv

    '...Translate result from PowerShell ...
    '...1 (True in PS) == 0 (True in VB)
    '...0 (False in PS) == 1 (False in VB)
    If rv = 1 Then
        Resource.LogInformation "IsAlive Success"
        IsAlive = 0
    Else
        Resource.LogInformation "IsAlive Error"
        IsAlive = 1
    End If

    Resource.LogInformation "Exit IsAlive()"
End Function

Function Terminate( )
    Resource.LogInformation "Enter Terminate()"

    '...Check for required private properties...

    If Resource.PropertyExists("PSScriptsPath") = False Then
        Resource.LogInformation "PSScriptsPath is a required private property."
        Terminate = 1
        Exit Function
    End If
    '...Resource.LogInformation "PSScriptsPath is " & Resource.PSScriptsPath

    If Resource.PropertyExists("Name") = False Then
        Resource.LogInformation "Name is a required private property."
        Terminate = 1
        Exit Function
    End If
    Resource.LogInformation "Name is " & Resource.Name

    If Resource.PropertyExists("Data1") = False Then
        Resource.LogInformation "Data1 is a required private property."
        Terminate = 1
        Exit Function
    End If
    '...Resource.LogInformation "Data1 is " & Resource.Data1

    If Resource.PropertyExists("Data2") = False Then
        Resource.LogInformation "Data2 is a required private property."
        Terminate = 1
        Exit Function
    End If
    '...Resource.LogInformation "Data2 is " & Resource.Data2

    If Resource.PropertyExists("DataStorePath") = False Then
        Resource.LogInformation "DataStorePath is a required private property."
        Terminate = 1
        Exit Function
    End If
    '...Resource.LogInformation "DataStorePath is " & Resource.DataStorePath

    PScmd = "powershell.exe -file " & Resource.PSScriptsPath & "\PS_Terminate.ps1 " & Resource.PSScriptsPath & " " & Resource.Name & " " & Resource.Data1 & " " & Resource.Data2 & " " & Resource.DataStorePath

    Dim WshShell
    Set WshShell = CreateObject("WScript.Shell")

    Resource.LogInformation "Calling Terminate PS script= " & PSCmd
    rv = WshShell.Run(PScmd, , True)
    Resource.LogInformation "PS return value is: " & rv

    '...Translate result from PowerShell ...
    '...1 (True in PS) == 0 (True in VB)
    '...0 (False in PS) == 1 (False in VB)
    If rv = 1 Then
        Terminate = 0
    Else
        Terminate = 1
    End If

    Resource.LogInformation "Exit Terminate()"
End Function

Function Close( )
    '...Result...
    Close = 0
End Function

 

Entry Points

In the above sample VB script, the following entry points are defined:

  • Open – Ensures all necessary steps complete before starting your application
  • Online – Function to start your application
  • Offline – Function to stop your application
  • IsAlive – Function to validate your application startup and monitor health
  • Terminate – Function to forcefully cleanup application state (ex: Error during Online/Offline)
  • Close – Ensures all necessary cleanup completes after stopping your application

Each of the above entry points is defined as a function (ex: “Function Online( )”). Failover Cluster then calls these entry point functions as part of the GenScript resource type definition.

Private Properties

For resources of any type, Failover Cluster supports two types of properties:

  • Common Properties – Generic properties that can have unique value for each resource
  • Private Properties – Custom properties that are unique to that resource type. Each resource of that resource type has these private properties.

When writing a GenScript resource, you need to evaluate if you need private properties. In the above VB sample script, I have defined five sample private properties (only as an example):

  • PSScriptsPath – Path to the folder containing PS scripts
  • Name
  • Data1 – some custom data field
  • Data2 – another custom data field
  • DataStorePath – path to a common backend store (if any)

The above private properties are shown as example only & you are expected to modify the above VB script to customize it for your application.

PowerShell Scripts

The Visual Basic script simply connects the Failover Cluster's RHS (Resource Hosting Subsystem) to the PowerShell scripts. You may notice the "PScmd" variable containing the actual PS command that will be called to perform the action (Online, Offline etc.) by calling into the corresponding PS scripts.

For this sample, here are four PowerShell scripts:

  • PS_Online.ps1 – To start your application
  • PS_Offline.ps1 – To stop your application
  • PS_Terminate.ps1 – To forcefully clean up your application
  • PS_IsAlive.ps1 – To monitor the health of your application

Example of PS scripts:

Entry Point: Online

Param(
    # Sample properties…
    [Parameter(Mandatory=$true, Position=0)]
    [ValidateNotNullOrEmpty()]
    [string]
    $PSScriptsPath,

    #
    [Parameter(Mandatory=$true, Position=1)]
    [ValidateNotNullOrEmpty()]
    [string]
    $Name,

    #
    [Parameter(Mandatory=$true, Position=2)]
    [ValidateNotNullOrEmpty()]
    [string]
    $Data1,

    #
    [Parameter(Mandatory=$true, Position=3)]
    [ValidateNotNullOrEmpty()]
    [string]
    $Data2,

    #
    [Parameter(Mandatory=$true, Position=4)]
    [ValidateNotNullOrEmpty()]
    [string]
    $DataStorePath
)

$filePath = Join-Path $PSScriptsPath "Output_Online.log"

@"
    Starting Online...
    Name= $Name
    Data1= $Data1
    Data2= $Data2
    DataStorePath= $DataStorePath
"@ | Out-File -FilePath $filePath

$error.clear()

### Do your online script logic here
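# NOTE: this walk-through expects your logic above to set $errorOut to $true on
# failure; $errorOut is just a placeholder flag for the sample, not a built-in variable.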

if ($errorOut -eq $true)
{
    "Error $error" | Out-File -FilePath $filePath -Append
    exit $false
}

"Success" | Out-File -FilePath $filePath -Append
exit $true

Entry Point: Offline

Param(
    # Sample properties…
    [Parameter(Mandatory=$true, Position=0)]
    [ValidateNotNullOrEmpty()]
    [string]
    $PSScriptsPath,

    #
    [Parameter(Mandatory=$true, Position=1)]
    [ValidateNotNullOrEmpty()]
    [string]
    $Name,

    #
    [Parameter(Mandatory=$true, Position=2)]
    [ValidateNotNullOrEmpty()]
    [string]
    $Data1,

    #
    [Parameter(Mandatory=$true, Position=3)]
    [ValidateNotNullOrEmpty()]
    [string]
    $Data2,

    #
    [Parameter(Mandatory=$true, Position=4)]
    [ValidateNotNullOrEmpty()]
    [string]
    $DataStorePath
)

$filePath = Join-Path $PSScriptsPath "Output_Offline.log"

@"
    Starting Offline...
    Name= $Name
    Data1= $Data1
    Data2= $Data2
    DataStorePath= $DataStorePath
"@ | Out-File -FilePath $filePath

$error.clear()

### Do your offline script logic here

if ($errorOut -eq $true)
{
    "Error $error" | Out-File -FilePath $filePath -Append
    exit $false
}

"Success" | Out-File -FilePath $filePath -Append
exit $true

 

Entry Point: Terminate

Param(
    # Sample properties…
    [Parameter(Mandatory=$true, Position=0)]
    [ValidateNotNullOrEmpty()]
    [string]
    $PSScriptsPath,

    #
    [Parameter(Mandatory=$true, Position=1)]
    [ValidateNotNullOrEmpty()]
    [string]
    $Name,

    #
    [Parameter(Mandatory=$true, Position=2)]
    [ValidateNotNullOrEmpty()]
    [string]
    $Data1,

    #
    [Parameter(Mandatory=$true, Position=3)]
    [ValidateNotNullOrEmpty()]
    [string]
    $Data2,

    #
    [Parameter(Mandatory=$true, Position=4)]
    [ValidateNotNullOrEmpty()]
    [string]
    $DataStorePath
)

$filePath = Join-Path $PSScriptsPath "Output_Terminate.log"

@"
    Starting Terminate...
    Name= $Name
    Data1= $Data1
    Data2= $Data2
    DataStorePath= $DataStorePath
"@ | Out-File -FilePath $filePath

$error.clear()

# Set $errorOut to $true in your logic below if the terminate operation fails
$errorOut = $false

### Do your terminate script logic here

if ($errorOut -eq $true)
{
    "Error $error" | Out-File -FilePath $filePath -Append
    exit $false
}

"Success" | Out-File -FilePath $filePath -Append
exit $true

 

Entry Point: IsAlive

Param(
    # Sample properties…
    [Parameter(Mandatory=$true, Position=0)]
    [ValidateNotNullOrEmpty()]
    [string]
    $PSScriptsPath,

    #
    [Parameter(Mandatory=$true, Position=1)]
    [ValidateNotNullOrEmpty()]
    [string]
    $Name,

    #
    [Parameter(Mandatory=$true, Position=2)]
    [ValidateNotNullOrEmpty()]
    [string]
    $Data1,

    #
    [Parameter(Mandatory=$true, Position=3)]
    [ValidateNotNullOrEmpty()]
    [string]
    $Data2,

    #
    [Parameter(Mandatory=$true, Position=4)]
    [ValidateNotNullOrEmpty()]
    [string]
    $DataStorePath
)

$filePath = Join-Path $PSScriptsPath "Output_IsAlive.log"

@"
    Starting IsAlive...
    Name= $Name
    Data1= $Data1
    Data2= $Data2
    DataStorePath= $DataStorePath
"@ | Out-File -FilePath $filePath

$error.clear()

# Set $errorOut to $true in your logic below if the application is not healthy
$errorOut = $false

### Do your isalive script logic here

if ($errorOut -eq $true)
{
    "Error $error" | Out-File -FilePath $filePath -Append
    exit $false
}

"Success" | Out-File -FilePath $filePath -Append
exit $true

 

Parameters

The private properties are passed in as arguments to the PS scripts. In the sample scripts, these are all string values; you can potentially pass in different value types with more advanced VB script logic.

Note: Another way to simplify this is to write only one PS script in which the entry points are all functions, with a single primary function called by the VB script. To achieve this, you can pass in an additional parameter giving the context of the action expected (ex: Online, Offline, etc.), as in the sketch below.
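
For example, a single script along the lines of the following sketch could dispatch on an extra context parameter passed in by the VB script. The -Action parameter and the Invoke-* function names here are assumptions for illustration; they are not part of the sample scripts above.

Param(
    # Hypothetical extra parameter giving the entry point context
    [Parameter(Mandatory=$true, Position=0)]
    [ValidateSet("Online","Offline","Terminate","IsAlive")]
    [string]
    $Action,

    # The remaining private properties would follow, as in the per-entry-point scripts
    [Parameter(Mandatory=$true, Position=1)]
    [ValidateNotNullOrEmpty()]
    [string]
    $PSScriptsPath,

    [Parameter(Mandatory=$true, Position=2)]
    [ValidateNotNullOrEmpty()]
    [string]
    $Name
)

function Invoke-Online    { return $true }   # start your application here
function Invoke-Offline   { return $true }   # stop your application here
function Invoke-Terminate { return $true }   # forcefully clean up your application here
function Invoke-IsAlive   { return $true }   # check the health of your application here

# Single primary entry point: dispatch to the function matching the requested action
switch ($Action)
{
    "Online"    { exit (Invoke-Online) }
    "Offline"   { exit (Invoke-Offline) }
    "Terminate" { exit (Invoke-Terminate) }
    "IsAlive"   { exit (Invoke-IsAlive) }
}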

Step-By-Step Walk-Through

Great! Now that you have the VB Shell & Entry Point Scripts ready, let’s make the application highly available…

Copy VB + PS Scripts to Server

It is important to copy the VB script & all PS scripts to a folder on each cluster node. Ensure that the scripts are copied to the same folder on all cluster nodes. In this walk-through, the VB + PS scripts are copied to the “C:\SampleScripts” folder:

Copy Scripts
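
A minimal sketch of this copy step, assuming the scripts already sit in C:\SampleScripts on the node you are running from and that the administrative C$ share is reachable on every cluster node:

# Copy the VB + PS scripts to the same folder on every cluster node
$source = "C:\SampleScripts\*"
foreach ($node in Get-ClusterNode)
{
    $dest = "\\$($node.Name)\C$\SampleScripts"
    if (-not (Test-Path $dest)) { New-Item -ItemType Directory -Path $dest | Out-Null }
    Copy-Item -Path $source -Destination $dest -Force
}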

Create Group & Resource

Using PowerShell:

Create Group Resource
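
A minimal sketch of the PowerShell commands for this step, using the group and resource names from this walk-through:

# Create a group (role) and a Generic Script resource inside it
Add-ClusterGroup -Name SampleGroup
Add-ClusterResource -Name SampleResUsingPS -Group SampleGroup -ResourceType "Generic Script"

# At this point only the ScriptFilePath private property is present
Get-ClusterResource SampleResUsingPS | Get-ClusterParameter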

The “ScriptFilePath” private property gets added automatically; this is the path to the VB script file. No other private properties are added at this point (see above).

You can also create Group & Resource using Failover Cluster Manager GUI:

Add Resource

Specify VB Script

To specify the VB script, set the “ScriptFilePath” private property as:

Get Properties - Not Set
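
A minimal sketch of this step; the .vbs file name below is only an example, so substitute the name of your VB shell script:

# Point the resource at the VB shell script (file name is an example)
Get-ClusterResource SampleResUsingPS |
    Set-ClusterParameter -Name ScriptFilePath -Value "C:\SampleScripts\GenScriptSample.vbs"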

Once the VB script is specified, the cluster automatically calls the Open entry point (in the VB script). In the VB script above, the additional private properties are added as part of the Open entry point.

Configure Private Properties

You can configure the private properties defined for the Generic Script resource as:

Configure Properties
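
A minimal sketch of this step; the property values below are placeholders to adapt for your application:

$res = Get-ClusterResource SampleResUsingPS

# Folder containing the PS entry point scripts
$res | Set-ClusterParameter -Name PSScriptsPath -Value "C:\SampleScripts"

# Sample custom properties (placeholder values)
$res | Set-ClusterParameter -Name Name -Value "SampleApp"
$res | Set-ClusterParameter -Name Data1 -Value "Value1"
$res | Set-ClusterParameter -Name Data2 -Value "Value2"
$res | Set-ClusterParameter -Name DataStorePath -Value "C:\SampleStore"

# Verify the configured private properties
$res | Get-ClusterParameter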

In the above example, “PSScriptsPath” was specified as “C:\SampleScripts”, which is the folder where all my PS scripts are stored. The additional example private properties (Name, Data1, Data2 and DataStorePath) are set with custom values as well.

At this point, the Generic Script resource using PS scripts is now ready!

Starting Your Application

To start your application, you simply need to start (i.e., bring online) the group (ex: SampleGroup) or the resource (ex: SampleResUsingPS). You can start the group or resource using PowerShell as:

Start Application
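
For example:

# Bring the whole group online...
Start-ClusterGroup SampleGroup

# ...or bring just the Generic Script resource online
Start-ClusterResource SampleResUsingPS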

You can use Failover Cluster Manager GUI to start your Group/Role as well:

Start Application GUI

To view your application state in Failover Cluster Manager GUI:

View Online Application GUI

Verify PS script output:

In the sample PS scripts, the output log for each entry point is written to the PSScriptsPath folder (the same directory as the PS scripts). You can see the output of the PS scripts for the Online & IsAlive entry points below:

Verify Scripts
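
For example, with PSScriptsPath set to C:\SampleScripts, you can check the logs directly:

# The sample scripts write one log per entry point into PSScriptsPath
Get-ChildItem C:\SampleScripts\Output_*.log

# Inspect, for example, the Online entry point output
Get-Content C:\SampleScripts\Output_Online.log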

Awesome! Now, let’s see what it takes to customize the generic scripts for your application.

Customizing Scripts For Your Application

The sample VB script above is a generic shell that any application can reuse. There are a few important things that you may need to edit:

  1. Defining Custom Private Properties: “Function Open” in the VB script defines the sample private properties. You will need to edit it to add/remove private properties for your application.
  2. Validating Custom Private Properties: “Function Online”, “Function Offline”, “Function Terminate” and “Function IsAlive” validate whether the private properties are set (and whether they are required). You will need to edit these validation checks for any private properties you add or remove.
  3. Calling the PS scripts: The “PSCmd” variable contains the exact syntax of the PS script that gets called. For any private properties added or removed, you will need to edit that PS script syntax as well.
  4. PowerShell scripts: The parameters of the PowerShell scripts will need to be edited for any private properties added or removed. In addition, your application-specific logic needs to be added where indicated by the comments in the PS scripts.

Summary

Now you can use PowerShell scripts to make any application highly available with Failover Clusters!!!

The sample VB script & the corresponding PS scripts allow you to take any custom application & make it highly available using PowerShell scripts.
Thanks,
Amitabh

Failover Clustering @ Ignite 2016


I am packing my bags and getting ready for Ignite 2016 in Atlanta, and I thought I would post all the cluster and related sessions you might want to check out. See you there!
If you couldn’t make it to Ignite this year, don’t worry: you can stream all of these sessions online.

Cluster

  • BRK3196 – Keep the lights on with Windows Server 2016 Failover Clustering
  • BRK2169 – Explore Windows Server 2016 Software Defined Datacenter

Storage Spaces Direct for clusters with no shared storage:

  • BRK3088 – Discover Storage Spaces Direct, the ultimate software-defined storage for Hyper-V
  • BRK2189 – Discover Hyper-converged infrastructure with Windows Server 2016
  • BRK3085 – Optimize your software-defined storage investment with Windows Server 2016
  • BRK2167 – Enterprise-grade Building Blocks for Windows Server 2016 SDDC: Partner Offers

Storage Replica for stretched clusters:

  • BRK3072 – Drill into Storage Replica in Windows Server 2016

SQL Clusters

  • BRK3187 – Learn how SQL Server 2016 on Windows Server 2016 are better together
  • BRK3286 – Design a Private and Hybrid Cloud for High Availability and Disaster Recovery with SQL Server 2016

Thanks!
Elden Christensen
Principal PM Manager
High Availability & Storage
