Channel: Clustering and High-Availability

How to Configure Clustered Tasks with Windows Server 2012


Many customers use the Windows Task Scheduler to perform regularly scheduled maintenance on their servers: running audit checks, generating reports, and even updating application data caches. A task in the Windows Task Scheduler performs an action when a given trigger (condition) has been met.

In previous releases of Windows Server, you could create a task that was local to a single node (server) that was part of a Failover Cluster, but the Task Scheduler did not have any understanding of the entire cluster. Configuring and managing tasks on a large-scale 32- or 64-node cluster can be more challenging than maintaining them on a single machine, and manually copying tasks from one machine to another can become time-consuming and error-prone. In Windows Server 2012, this experience is significantly improved: you can now use Clustered Scheduled Tasks for the tasks that you want to run on your cluster. There are three types of Clustered Scheduled Tasks:

  •          Any Node: One instance of the task is enabled in the cluster, so the task triggers on only one machine. The task remains in the cluster until it is unregistered or the cluster is destroyed.
          
    Example: if you have a program that pulls information out of the cluster and sends out a report, it only needs to run on one machine and you do not care which one. This is a good fit for an Any Node task.

 

  •          Resource Specific: One instance of the task is enabled, bound to a resource in the cluster. The task runs on the same node as the resource, so if the cluster resource moves to another node, so does the task. Unlike an Any Node task, if the resource is removed, so is the task.
          
    Example: if you have a physical disk resource and you want to defragment the disk every month, that is a good fit for a Resource Specific task.

 

  •          Cluster Wide: One instance of the task is enabled on each node of the cluster. When the trigger is met, the action is executed on every node present in the cluster at that time that meets the trigger's condition.
          
    Example: if you want a tool or set of tools to be opened when you log in to any of the nodes, that is the kind of task to add as a Cluster Wide task.

Managing Clustered Tasks using PowerShell:

Now let’s see how to configure and manage clustered tasks using PowerShell. There are four basic PowerShell cmdlets available to configure, query, or modify clustered tasks:

Cmdlet                              Description
Get-ClusteredScheduledTask          Query clustered tasks.
Register-ClusteredScheduledTask     Register a clustered task.
Set-ClusteredScheduledTask          Update an already registered clustered task.
Unregister-ClusteredScheduledTask   Unregister a clustered task.

 

In Windows Server 2012, PowerShell cmdlets from different modules are auto-loaded upon first use. It is important to note that the above cmdlets are available through the ScheduledTasks module. Notice that the cmdlets have "Clustered" as part of the noun in the cmdlet name. If you use the cmdlets without "Clustered" in the name, you won't be creating a clustered task or querying the cluster for tasks; instead you'll be working with a regular node-scoped (server-scoped) task.
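
For example, the contrast in scope looks like this (a minimal sketch; the task names are hypothetical, and $action and $trigger are assumed to have been created with New-ScheduledTaskAction and New-ScheduledTaskTrigger as shown later in this post):

# Node-scoped: the task exists only on the local server
Register-ScheduledTask -TaskName MyLocalTask -Action $action -Trigger $trigger

# Cluster-scoped: the task is registered with the cluster itself
Register-ClusteredScheduledTask -TaskName MyClusterTask -TaskType AnyNode -Action $action -Trigger $trigger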

 

Need more information? Remember you can run any of the following from your PowerShell window:

  •          [name of the cmdlet] -?
  •          Get-Help [name of the cmdlet] -Full

Registering a Cluster Task

In this first entry for clustered tasks, we'll show you how to create a Resource Specific task in three easy steps (with Any Node and Cluster Wide variants at the end):

1. Pick your action

$action = New-ScheduledTaskAction -Execute C:\ClusterStorage\Volume1\myprogram.exe

This creates the action to be performed by my task. As you can see in the value of the Execute parameter, my program is located on a Cluster Shared Volume (CSV) in my cluster, which already makes it highly available and accessible from all nodes.

2. Pick your trigger

$trigger = New-ScheduledTaskTrigger -At 13:00 -Daily

This creates the trigger that starts my task in the cluster. For this example I want to run my program every day at 13:00.

3. Register your task

Register-ClusteredScheduledTask -Cluster MyCluster -TaskName MyResourceSpecificTask -TaskType ResourceSpecific -Resource MyResourceName -Action $action -Trigger $trigger

 

To register an Any Node or Cluster Wide task instead, omit the resource and change the task type:

Register-ClusteredScheduledTask -Cluster MyCluster -TaskName MyAnyNodeTask -TaskType AnyNode -Action $action -Trigger $trigger

Register-ClusteredScheduledTask -Cluster MyCluster -TaskName MyClusterWideTask -TaskType ClusterWide -Action $action -Trigger $trigger

 

And you are done. Your cluster now has a task that will run daily at 13:00.

Once you have your tasks registered, you might still want to query, update, or even unregister them. Here's how:

Querying Cluster Tasks

The Get-ClusteredScheduledTask cmdlet allows you to query the tasks in the cluster in the following ways:

  •          All tasks in the cluster
  •          All tasks of a certain type
  •          A task by its name

To query all cluster tasks:

Get-ClusteredScheduledTask -Cluster MyCluster

To query all tasks of a given type:

Get-ClusteredScheduledTask -Cluster MyCluster -TaskType ResourceSpecific

To query a task by name:

Get-ClusteredScheduledTask -Cluster MyCluster -TaskName MyResourceSpecificTask

 

Updating a Cluster Task

After a task is registered, its actions and triggers can be modified independently. In this case we want to update the trigger so that instead of executing at 13:00, the task executes at 23:00, once everyone is out of the office.

 

$trigger = New-ScheduledTaskTrigger -At 23:00 -Daily

Set-ClusteredScheduledTask -Cluster MyCluster -TaskName MyResourceSpecificTask -Trigger $trigger

Similarly, if you want to update the action, you can create a new action and assign it to the task, as sketched below.
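
As a minimal sketch (the replacement program path is hypothetical, and the new action is assumed to be assigned via the cmdlet's -Action parameter):

# Build a new action and assign it to the existing clustered task
$action = New-ScheduledTaskAction -Execute C:\ClusterStorage\Volume1\mynewprogram.exe
Set-ClusteredScheduledTask -Cluster MyCluster -TaskName MyResourceSpecificTask -Action $action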

 

How to view the values of the trigger

To see the current values of the actions and triggers, look in the 'TaskDefinition' of your task. The task definition contains the Triggers and Actions. Here is an example of how to view the triggers, with the output for the task after we have updated it:

(Get-ClusteredScheduledTask -TaskName MyResourceSpecificTask).TaskDefinition.Triggers

 

Enabled            : True

EndBoundary        :

ExecutionTimeLimit :

Id                 :

Repetition         : MSFT_TaskRepetitionPattern

StartBoundary      : 2012-05-15T23:00:00

RandomDelay        : P0DT0H0M0S

PSComputerName     :

Unregistering a Cluster Task

The Unregister-ClusteredScheduledTask cmdlet allows you to remove a task; invoke it with the cluster and the name of the task to be removed.

Unregister-ClusteredScheduledTask -Cluster MyCluster -TaskName MyResourceSpecificTask

 

Task Scheduler (taskschd.msc) snap-in

While clustered scheduled tasks can be managed only through PowerShell, they also show up in the Task Scheduler UI under the Failover Clustering folder.

Frequently Asked Questions (FAQ):

If the resource for a 'resource specific' task is removed, what happens to my task?

It can happen that the resource we selected for a 'resource specific' task gets removed from the cluster. In this case, the task will also be removed from the cluster.

Evicting a node. Will my tasks still be on the node?

No. As part of the process of evicting a node from the cluster, the tasks get deleted.

Can I change the type of my task after it’s registered?

No. A task that has been registered cannot be converted between the Any Node, Cluster Wide, and Resource Specific types.

Can I still create non-clustered tasks on a cluster node?

Yes. All clustered tasks get created under the path <PATH>. Thus, as long as your non-clustered tasks are under a different path, you can always create them on individual cluster nodes. It is important to note that as part of the process of evicting a node from the cluster, all the tasks under <PATH> are cleaned up.

Summary

In Windows Server 2012, Clustered Scheduled Tasks can be created quickly and easily; they can be used to maintain the cluster, cluster resources such as disks, and even applications that are running on the cluster.

 

Ramón Alcántara
Software Development Engineer in Test
Clustering & High-Availability
Microsoft


How to Configure a Clustered Storage Space in Windows Server 2012


This blog outlines the sequence of steps to configure a Clustered Storage Space in Windows Server 2012 using the Failover Cluster Manager or Windows PowerShell®. You can learn more about Storage Spaces here:

http://blogs.msdn.com/b/b8/archive/2012/01/05/virtualizing-storage-for-scale-resiliency-and-efficiency.aspx

Prerequisites

  •          A minimum of three physical drives, with at least 4 gigabytes (GB) of capacity each, is required to create a storage pool in a Failover Cluster.
  •          The clustered storage pool MUST be comprised of Serial Attached SCSI (SAS) connected physical disks. Layering any form of storage subsystem, whether an internal RAID card or an external RAID box, and regardless of whether it is directly connected or connected via a storage fabric, is not supported.
  •          All physical disks used to create a clustered pool must pass the Failover Cluster validation tests. To run cluster validation tests:
    • Open the Failover Cluster Manager interface (cluadmin.msc) and select the Validate Cluster option, or use the Test-Cluster cmdlet (see the sketch after this list).

  •          Clustered storage spaces must use fixed provisioning.
  •          Simple and mirror storage spaces are supported for use in a Failover Cluster; parity spaces are not supported.
  •          The physical disks used for a clustered pool must be dedicated to the pool. Boot disks should not be added to a clustered pool, nor should a physical disk be shared among multiple clustered pools.
  •          Storage spaces formatted with ReFS cannot be added to Cluster Shared Volumes (CSV).
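
As a minimal PowerShell sketch of running validation (the node names are hypothetical; Test-Cluster runs the full validation suite, including the storage tests):

Test-Cluster -Node Node1,Node2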

Steps to configure using the Failover Cluster Manager

1.       Add the File Services Role and the File Services Role Administration Tools to all nodes in the Failover Cluster

2.       Open the Failover Cluster Manager interface (cluadmin.msc)

3.       In the left-hand pane, expand Storage. Right-click on Pools and select New Storage Pool. This will start the New Storage Pool Wizard

4.       Specify a Name for the Storage Pool and choose the Storage Subsystem that is available to the cluster and click Next

5.       Select the Physical Disks (a minimum of three, with a minimum capacity of 4 GB each and bus type SAS) for the storage pool and confirm the creation of the pool. Once created, the pool will be added to the cluster and brought Online.

6.       The next step is to create a Virtual Disk (storage space) that will be associated with a storage pool. In the Failover Cluster Manager, select the storage pool that will be supporting the Virtual Disk. Right-click and choose New Virtual Disk

7.       This initiates the New Virtual Disk Wizard. Select the server and storage pool for the virtual disk and click Next.  Note that the cluster node hosting the storage pool will be listed.

8.       Provide a name and description for the virtual disk and click Next

9.       Specify the desired Storage Layout (Simple or Mirror; Parity is not supported in a Failover Cluster) and click Next

Note: I/O operations to a CSV mirror space are redirected at the block level through the CSV coordinator node. This may result in different performance characteristics for I/O to the storage, compared to a simple space.

10.   Specify the size of the virtual disk and click Next. After you confirm your selection, the virtual disk is created. The New Volume Wizard is launched if you do not uncheck this option on the confirmation page.

11.   The correct Disk and the Server to provision the disk to should be selected for you. Verify this selection and click Next.

12.   Specify the size of the volume and click Next

13.   Optionally assign a drive letter to the volume and click Next

 

14.   Select the file system settings and click Next and confirm the volume settings. The new volume will be created on the virtual disk and will be added to the Failover Cluster.

Note: The NTFS File System should be selected if the volume is to be added to Cluster Shared Volumes.

 

15.   Your clustered storage space can now be used to host clustered workloads. You can also see the properties of the clustered storage space and the clustered pool that contains it, from the Failover Cluster Manager.

 

Steps to configure using Windows PowerShell®

 

Open a Windows PowerShell® console and run the following steps:

1.       Create a new pool

a.  Select physical disks to add to the pool

$phydisk = Get-PhysicalDisk -CanPool $true | Where-Object BusType -eq "SAS"

b.  Obtain the storage subsystem for the pool

$stsubsys = Get-StorageSubsystem

c.       Create the new storage pool

$pool = New-StoragePool -FriendlyName TestPool -StorageSubsystemFriendlyName $stsubsys.FriendlyName -PhysicalDisks $phydisk -ProvisioningTypeDefault Fixed

d.      Optionally add an additional disk as a hot spare

$hotSpareDisk = Get-PhysicalDisk -CanPool $true | Out-GridView -PassThru

Add-PhysicalDisk -StoragePoolFriendlyName TestPool -PhysicalDisks $hotSpareDisk -Usage HotSpare

 

2.        Now create a Storage Space in the pool created in the previous step

a.  $newSpace = New-VirtualDisk -StoragePoolFriendlyName TestPool -FriendlyName space1 -Size 1GB -ResiliencySettingName Mirror

 

3.       Initialize, partition and format the Storage Space created

a.  $spaceDisk = $newSpace | Get-Disk

b.  Initialize-Disk -Number $spaceDisk.Number -PartitionStyle GPT

c.  $partition = New-Partition -DiskNumber $spaceDisk.Number -DriveLetter $driveletter -Size $spaceDisk.LargestFreeExtent   # assumes you set $driveletter to a free letter first, e.g. $driveletter = 'F'

d.  Format-Volume -Partition $partition -FileSystem NTFS

 

4.       Add the Storage Space created to the Cluster

a.  $space = Get-VirtualDisk -FriendlyName space1

b.  $space | Get-Disk | Add-ClusterDisk   # pipe the disk backed by the space into the cluster
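
If the space will host a Cluster Shared Volume, it can be added to CSV after step 4. A minimal sketch, assuming the cluster assigned the disk the resource name "Cluster Disk 1" (hypothetical; check the actual name with Get-ClusterResource):

Add-ClusterSharedVolume -Name "Cluster Disk 1"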

 

Note:

  • Clustered Spaces can also be created using Server Manager.

  • You can find a full end-to-end Windows PowerShell® sample on setting up a file server cluster with Storage Spaces here.

Troubleshooting tips:

If you come across either of the following errors while attempting to add a storage pool to the cluster, please review the Prerequisites section at the beginning of this blog to determine which requirement was not met:

Failed to add storage pool to cluster – {XXXXXXX-XXXX-XXXX-XXXX-XXXXXXX}

No storage pool suitable for cluster was found.

 

 

Thanks!

Subhasish Bhattacharya
Program Manager
Clustering & High Availability
Microsoft

Failover Cluster Sessions at TechEd Orlando 2012 and TechEd Amsterdam 2012


There were several exciting sessions on Failover Clustering at the sold-out TechEd Orlando from June 10-14th!  If you didn't get a chance to attend the conference in person, the sessions are now posted online so you can watch on demand.  Here are the sessions, their descriptions, and links to watch them.

The sessions from the clustering team at TechEd Orlando:

1)      WSV324  - Building a Highly Available Failover Cluster Solution with Windows Server 2012 from the Ground UP

Windows Server 2012 delivers innovative new capabilities that enable you to build dynamic availability solutions in which workloads, networks, and storage become more flexible, efficient, and available than ever before. This session covers creating a Windows Server 2012 highly available Failover Cluster leveraging the new technologies in Windows Server 2012. This session walks through a demo leveraging a highly available Space, encrypting data with shared BitLocker disks, asymmetrical storage configurations with CSV I/O redirection… from the bottom up to a highly available solution.

 

2)      WSV430 - Cluster Shared Volumes Reborn in Windows Server 2012: Deep Dive

This session takes a deep technical dive into the new Cluster Shared Volumes (CSV) architecture and new features coming in Windows Server 2012. CSV is now a full-blown clustered file system, and all of the challenges of the past have been addressed, along with many enhancements. This is an in-depth session that covers the CSV architecture, CSV backup integration, and integration with a wealth of new features that enhance CSV and its performance.

 

3)      WSV411 - Guest Clustering and VM Monitoring in Windows Server 2012

In Windows Server 2012 there will be new ways to monitor application health state and provide recovery inside of a virtual machine. This session details the new VM Monitoring feature in Windows Server 2012 as well as discusses Guest Clustering and changes in Windows Server 2012 (such as virtual FC), along with pros and cons of when to use each.

 

4)      WSV322 - Update Management in Windows Server 2012: Revealing Cluster-Aware Updating and the New Generation of WSUS

Today, patch management is a required component of any security strategy. In Windows Server 2012, the new Cluster-Aware Updating (CAU) feature delivers Continuous Availability through automated self-updating of failover clusters. In Windows Server 2012, Windows Server Update Services (WSUS) has evolved to become a Server Role with exciting new capabilities. This session introduces CAU with a discussion of its GUI, cmdlets, remote-updating and self-updating capabilities. We then proceed to highlight the main functionalities of WSUS in Windows Server 2012, including the security enhancements, patch deployment automation, and new Windows PowerShell cmdlets to perform maintenance, manage, and deploy updates.

5)      VIR401 - Hyper-V High-Availability and Mobility: Designing the Infrastructure for Your Private Cloud

Private Cloud Technical Evangelist Symon Perriman leads this session discussing Windows Server 2012 and Windows Server 2008 R2 Hyper-V and Failover Clustering design, infrastructure planning and deployment considerations for your highly-available datacenter or Private Cloud. Do you know the pros and cons of how different virtualization solutions can provide continual availability? Do you know how Microsoft System Center 2012 can move the solution closer to a Private Cloud implementation? This session covers licensing, hardware, validation, deployment, upgrades, host clustering, guest clustering, disaster recovery, multi-site clustering, System Center Virtual Machine Manager 2008 and 2012, and offers a wealth of best practices. Prior clustering and Hyper-V knowledge recommended.

You also have another chance to attend the Clustering team sessions at TechEd Amsterdam from June 26-29th! You can register at: http://europe.msteched.com/

 

We will present the following clustering sessions at TechEd Amsterdam:

1)      WSV324  - Building a Highly Available Failover Cluster Solution with Windows Server 2012 from the Ground UP
http://europe.msteched.com/Sessions?q=%23TEWSV324

 

2)      WSV430 - Cluster Shared Volumes Reborn in Windows Server 2012: Deep Dive
http://europe.msteched.com/Sessions?q=%23TEWSV430

 

Thanks!

Subhasish Bhattacharya
Program Manager
Clustering & High Availability
Microsoft

How to Move Highly Available (Clustered) VMs to Windows Server 2012 with the Cluster Migration Wizard


The Windows Server 2012 Cluster Migration Wizard is a powerful and time-saving tool that copies cluster roles from a source cluster to a target cluster. Although the Cluster Migration Wizard can move almost any clustered workload to Windows Server 2012, we get many questions about migrating highly available virtual machines (HA VMs).  There are two ways that you will be able to move HA VMs to a Windows Server 2012 Failover Cluster:

  1.        Windows Server 2012 Cluster Migration Wizard integrated into the Failover Clustering feature
  2.        System Center Virtual Machine Manager 2012 (SCVMM 2012) with Service Pack 1 

In this blog I will focus on using the Cluster Migration Wizard to move HA VMs.  Depending on what operating system version you are running today, there are some considerations:

Tool | Migrate Clustered VMs | Migrate Clustered VMs from Windows Server 2008 R2 SP1 to Windows Server 2012 | Move Clustered VMs from Windows Server 2008 SP2 to Windows Server 2012
Windows Server 2012 Failover Clustering Cluster Migration Wizard | Yes | Yes | Yes
System Center Virtual Machine Manager 2012 (SCVMM 2012) | Yes | Yes | No

Note: Live Migration of virtual machines (VMs) from Windows Server 2008 R2 to Windows Server 2012 is not supported.  As a result, migrating VMs to Windows Server 2012 can be fast, but it is not a zero-downtime event: a brief maintenance window is required to cut over to the new cluster roles. Fortunately, cluster migration can be tested with no impact to a running cluster, so that issues can be identified prior to the actual migration.

Windows Server 2012 Cluster Migration Wizard Source and Target OS Versions

The Windows Server 2012 Cluster Migration Wizard will move VMs from the following Windows Server OS versions:

Source Cluster Node OS       Target Cluster Node OS
Windows Server 2008 SP2      Windows Server 2012
Windows Server 2008 R2 SP1   Windows Server 2012
Windows Server 2012          Windows Server 2012

 

Note: The Windows Server 2012 Cluster Migration Wizard requires that the latest service packs be installed on the source clusters: Windows Server 2008 clusters must be upgraded to Service Pack 2 prior to migration, and Windows Server 2008 R2 clusters must be upgraded to Service Pack 1.

Migration for Highly Available (Clustered) Hyper-V VMs

The following steps are required to prepare a new (target) cluster for the Cluster Migration Wizard; it typically takes approximately two hours to prepare a new Windows Server 2012 cluster with a small number of nodes.  Here is an overview of the process:

  1.        The new (target) cluster nodes need to be physically configured (network, storage); or, in the case of cluster virtualization, the virtual network and storage settings of the VMs need to be configured. Ideally, both the old (source) cluster and the new (target) cluster will see common shared storage, so storage can be reused; this allows for the smoothest migration.
  2.        Windows Server 2012 needs to be installed on all of the nodes in the target cluster, and the Hyper-V server role and Failover Clustering feature should be installed on all nodes as well.
  3.        Create the new Windows Server 2012 target cluster using the Failover Cluster Manager or the New-Cluster PowerShell cmdlet (see the sketch after this list).
  4.        Launch the Cluster Migration Wizard from the Failover Cluster Manager, select the source cluster, and then select the cluster roles on the source cluster that you'd like to migrate to the new cluster.
  5.        The Pre-Migration Report will identify issues that can impact migration of the selected cluster roles. After migrating, a Post-Migration Report will identify any manual steps that are needed to bring the cluster online.
  6.        The new cluster roles are always created offline. When VMs and users are ready, the following steps should be used during a maintenance window:

       i.    The source VMs should be shut down and turned off.

      ii.    The source cluster CSV volumes that have been migrated should be taken offline.

     iii.    The storage that is common to both clusters (LUNs) should be masked (hidden) from the source cluster, to prevent accidental usage by both clusters.

      iv.    The storage that is common to both clusters (LUNs) should be presented to the new cluster.

       v.    The CSV volumes on the target cluster should be brought online.

      vi.    The VMs on the target cluster should be brought online.

     vii.    VMs are migrated and ready for use!
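
As a minimal sketch of step 3 (the cluster name, node names, and static address are hypothetical; omit -StaticAddress on DHCP-configured networks):

# Create the Windows Server 2012 target cluster from two prepared nodes
New-Cluster -Name TargetCluster -Node Node1,Node2 -StaticAddress 192.168.1.100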

Note: If one VM on a CSV disk is selected for migration, the Cluster Migration Wizard will require all VMs on that CSV to be migrated too (and will auto-select them for you).

 

Walk Through: Migrating an HA VM from Windows Server 2008 R2 to Windows Server 2012

    A.      Let's assume that we've completed planning steps 1-3 above, and that we have a highly available VM running on a Windows Server 2008 R2 cluster (the source cluster). Notice that the VM is running, and that it depends on a CSV disk resource:

    B.      On the Windows Server 2012 cluster (the target cluster), from the Failover Cluster Manager, select the cluster and then use the More Actions | Migrate Roles… menu to launch the Cluster Migration Wizard:

    C.      The Cluster Migration Wizard (Migrate a Cluster Wizard) will appear; press Next:

    D.      Specify the name of the source cluster; press Next:

    E.       The source cluster (Windows Server 2008 R2) will be scanned, and the resources that can be moved will be identified. Here I have selected the VM called "VHD_CSV":

    F.       After pressing Next, we see that the Migration Wizard prompts us for the Virtual Network Switch that the VM should use on the new (target) cluster. Here I use the drop-down menu and select "Destination Lab Private":

    G.     Pressing View Report will display the Pre-Migration Report, which shows the Cluster Migration Wizard's analysis of the cluster roles that can be migrated. Note that the Cluster Group and Available Storage are never migrated:

    H.      When you are ready to migrate the resources, press Next:

    I.        After migrating resources, the Post-Migration Report is displayed in the dialog:

    J.        By pressing View Report, the full report will be displayed in the default browser:

    K.      Note that there are two new resources on the target cluster, identical to the source cluster. Under Roles, you will see the VHD_CSV VM; note that it is Off. Migrated VMs are always initially set to Off on the target cluster; this allows you to pre-stage the new cluster while controlling when to make the cutover:

    L.       Under Storage, then Disks, you will see the VHD_CSV-disk Physical Disk resource that was copied to the target cluster:

    M.    Now that the target cluster has been pre-staged, use the following steps during a maintenance window to cut over to the new Windows Server 2012 cluster:

    1.       Shut down all VMs on the source Windows Server 2008 R2 cluster that have been migrated.

    2.       Configure the storage:

    a.       Mask the common shared storage (LUNs) so that it is no longer presented to the Windows Server 2008 R2 source cluster.

             Note:  Data could become corrupt if the LUNs are presented to multiple clusters at the same time.

    b.      Unmask (present) the common shared storage (LUNs) to the Windows Server 2012 target cluster.

    3.       Start all VMs on the target Windows Server 2012 cluster.

Summary

In Windows Server 2012, the Cluster Migration Wizard is a powerful tool that provides agility and flexibility to customers using highly available VMs on Failover Clusters.

To learn how to use the Cluster Migration Wizard to move roles other than Hyper-V virtual machines, see the following step-by-step guide.

Rob Hindman

Program Manager

Clustering & High-Availability

Microsoft

How to Configure BitLocker Encrypted Clustered Disks in Windows Server 2012


Windows Server 2012 introduces the ability to encrypt Cluster Shared Volumes (CSV) using BitLocker®.  You can learn more about BitLocker in Windows Server 2012 here.

Data on lost or stolen storage is vulnerable to unauthorized access, either by running a software attack tool against it or by transferring the storage to a different server. BitLocker helps mitigate unauthorized data access by enhancing file and system protections. BitLocker also helps render data inaccessible when BitLocker-protected storage is decommissioned or recycled. BitLocker on a clustered disk, whether a traditional Physical Disk Resource (PDR) or a Cluster Shared Volume, therefore provides an additional layer of protection for administrators wishing to protect sensitive, highly available data. By adding additional protectors to the clustered volume, administrators can also add a further barrier of security to resources within an organization by allowing only certain user accounts to unlock the BitLocker volume.

This blog outlines the sequence of steps to configure BitLocker on a clustered disk using Windows PowerShell®.

Prerequisites:

  ·        A Windows Server 2012 Domain Controller (DC) is reachable from all nodes in the cluster.

  ·        The BitLocker Drive Encryption feature is installed on all nodes in the cluster. To install, open a Windows PowerShell console and run:

  Add-WindowsFeature BitLocker

  Note: The cluster node will need to be restarted after installing the BitLocker Drive Encryption feature.

  ·        Ensure that the disk to be encrypted is formatted with NTFS. For a traditional PDR you need to assign a drive letter to the disk; for CSV you can use the mount point for the volume. Partition, initialize, and format the disk if required. Open a Windows PowerShell console and run:

  a)  Initialize-Disk -Number <num> -PartitionStyle <style>

  b)  $partition = New-Partition -DiskNumber <num> -DriveLetter <letter>

  c)  Format-Volume -Partition $partition -FileSystem NTFS

Steps to configure using Windows PowerShell

In Windows Server 2012, BitLocker Drive Encryption can be turned on for both traditional failover cluster disks and Cluster Shared Volumes (CSV). BitLocker encrypts at the volume level, so if a clustered disk consists of more than one volume and you want to protect the entire disk, turn on BitLocker protection for each volume of the disk.

Volumes can be encrypted before adding them to a cluster. Additionally, data volumes already in use by clustered workloads can be encrypted.

To configure, open a Windows PowerShell console and run the following steps:

1.      If the clustered disk is currently added to a cluster and Online, put it into maintenance mode.

Traditional PDR: Get-ClusterResource "Cluster Disk 1" | Suspend-ClusterResource

CSV: Get-ClusterSharedVolume "Cluster Disk 1" | Suspend-ClusterResource

2.      Configure BitLocker® on the volume using your choice of protector.

To enable using a password protector:

Enable-BitLocker <drive letter or CSV mount point> -PasswordProtector -Password <password>
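
As a minimal worked example (the CSV mount point is hypothetical, and -Password requires a SecureString):

# Prompt for the volume password as a SecureString, then enable BitLocker
$pw = Read-Host -AsSecureString -Prompt "Volume password"
Enable-BitLocker C:\ClusterStorage\Volume1 -PasswordProtector -Password $pw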

Recovery Password Protector

Creating a recovery password and backing up the password in Active Directory (AD) provides a mechanism to restore access to a BitLocker-protected drive in the event that the drive cannot be unlocked normally. A domain administrator can obtain the recovery password from AD and use it to unlock and access the drive. Some of the reasons a BitLocker recovery might be necessary include:

- The CNO used to establish a SID protector in step 4 has been accidentally deleted from AD.

- An attacker has modified your server. This is applicable for a computer with a Trusted Platform Module (TPM), because the TPM checks the integrity of boot components during startup.

To enable using a recovery password protector and back up the protector to Active Directory:

Enable-BitLocker <drive letter or CSV mount point> -RecoveryPasswordProtector

$protectorId = (Get-BitLockerVolume <drive or CSV mount point>).KeyProtector | Where-Object {$_.KeyProtectorType -eq "RecoveryPassword"}

Backup-BitLockerKeyProtector <drive or CSV mount point> -KeyProtectorId $protectorId.KeyProtectorId

To disable: Disable-BitLocker <drive letter or CSV mount point>

Warning: It is important for you to capture and secure the password protector for future use.

Note: During encryption, a CSV volume will be in redirected mode until BitLocker builds its metadata and watermark on all data present on the encrypted volume. The duration of redirected mode will be proportional to the size of the volume, the actual data size, and the BitLocker encryption mode picked (DataOnly or Full). The BitLocker encryption rate is typically on the order of minutes per gigabyte. The cluster service will switch to direct I/O mode within 3 minutes after encryption has completed.

3.      Determine the Cluster Name Object for your cluster:

$cno = (Get-Cluster).Name + "$"

4.      Add an Active Directory Security Identifier (SID) to the CSV disk using the Cluster Name Object (CNO).

The Active Directory protector is a domain security identifier (SID) based protector for protecting clustered volumes held within the Active Directory infrastructure. It can be bound to a user account, machine account, or group.  When an unlock request is made for a protected volume, the BitLocker service intercepts the request and uses the BitLocker protect/unprotect APIs to unlock or deny the request.  For the cluster service to self-manage BitLocker-enabled disk volumes, an administrator must add the Cluster Name Object (CNO), which is the Active Directory identity associated with the Cluster Network Name, as a BitLocker protector to the target disk volumes.

Add-BitLockerKeyProtector <drive letter or CSV mount point> -ADAccountOrGroupProtector -ADAccountOrGroup $cno

Warning: You must add a SID-based protector using the CNO for an encrypted clustered disk. This is necessary for the cluster service to automatically unlock the BitLocker-protected volumes when surfacing them on one or more nodes of the cluster.

5.      If the disk was put into maintenance mode in step 1, resume operation of the disk.

Traditional PDR: Get-ClusterResource "Cluster Disk 1" | Resume-ClusterResource

CSV: Get-ClusterSharedVolume "Cluster Disk 1" | Resume-ClusterResource

6.      If the disk has not yet been added to the cluster (and optionally to CSV), you may do so now.

Get-Partition -DriveLetter <letter> | Get-Disk | Add-ClusterDisk | Add-ClusterSharedVolume

Thanks!

Subhasish Bhattacharya

Program Manager

Clustering and High Availability

Microsoft

MSMQ Errors in the Cluster.log


After using the Get-ClusterLog cmdlet to generate the Cluster.log, you may notice the following errors in the cluster log:

ERR   [RHS] s_RhsRpcCreateResType: ERROR_NOT_READY(21)' because of 'Startup routine for ResType MSMQ returned 21.'
WARN  [RCM] Failed to load restype 'MSMQ': error 21.

ERR   [RHS] s_RhsRpcCreateResType: ERROR_NOT_READY(21)' because of 'Startup routine for ResType MSMQTriggers returned 21.'
WARN  [RCM] Failed to load restype 'MSMQTriggers': error 21.

Root Cause:

These events are logged because the MSMQ and MSMQ Triggers resource types are registered with the cluster service, but the MSMQ resource DLL cannot be loaded because the MSMQ feature is not installed.  This is the default configuration when the Failover Clustering feature is installed.

Possible Solutions:

1. Ignore - These are benign events in a debug log and can be safely ignored.  They have no impact on the functionality of the cluster, nor do they indicate a failure.
2. Install MSMQ - If you plan to make MSMQ highly available on this cluster, open Server Manager and install the "Message Queuing" feature on all nodes in the cluster.  The above errors will no longer be logged.
3. Unregister MSMQ Resources - If this is a non-MSMQ cluster, you can unregister the MSMQ and MSMQTriggers resource types with the Cluster Service, and the above errors will no longer be logged.  This can be accomplished with the Remove-ClusterResourceType cmdlet.  Open a PowerShell window and type the following:

PS C:\> Remove-ClusterResourceType MSMQ

PS C:\> Remove-ClusterResourceType MSMQTriggers

In summary: just ignore them, they are just noise.  If they annoy you and you don't plan to use MSMQ, then unregister the MSMQ resource types.

Thanks!
Elden Christensen
Principal Program Manager Lead
Clustering & High-Availability
Microsoft

Optimizing CSV Backup Performance


Cluster Shared Volumes (CSV) is a clustered file system available in Windows Server 2012 in which all nodes in a Windows Server Failover Cluster can simultaneously access a common shared NTFS volume.  CSV has a distributed backup infrastructure that enables backups to be taken from any node in the cluster.  In this blog I will discuss some considerations for how backups work with CSV, which can help optimize the performance of backups.

When a volume-level backup is taken, the cluster service returns all the VMs hosted on the volume(s) to the requester (backup application), including VMs running on non-requester nodes. The requester can choose to pick only the VMs that are running on the node where the backup was initiated (a local-node VM backup), or it can choose to include VMs that are running across different nodes (a distributed VM backup).  Snapshot creation differs based on the type of snapshot configured:

• Hardware snapshots - The snapshot will be created and surfaced on the node where the backup was invoked by the requester, which need not be the coordinator node.  The backup will then be taken from the local snapshot.
• Software snapshots - The underlying snapshot device will be created via volsnap.sys on the coordinator node, and a CSV snapshot volume that points to this volsnap device will be surfaced on every node. On non-coordinator nodes, the CSV snapshot device will access the volsnap snapshot over SMB.  Although this is transparent to the requester because the CSV snapshot volume appears to be a local device, all access to the snapshot happens over the network unless the requester happens to be running on the coordinator node.

Considerations:

When taking a backup of a CSV volume, it can be done from any node.  However, when using software snapshots the snapshot device will be created on the coordinator node, and if the backup was initiated on a non-coordinator node the backup data will be accessed remotely.  This means that the data for the backup will be streamed over the network from the coordinator node to the node where the backup was initiated.  If you have maintenance window requirements that require shortening the overall backup time, you may wish to optimize the performance of backups when using software snapshots in one of the following ways:

• Initiate backups on the coordinator node - When using software snapshots, the snapshot device will always be created on the node that currently owns the cluster Physical Disk resource associated with the CSV volume.  If the backup is conducted locally on the coordinator node, then the data access will be local and backup performance may be improved.  This can be achieved either by initiating the backup application on the coordinator node or by moving the Physical Disk resource to the local node before initiating the backup (see the sketch after this list).  CSV ownership can be moved seamlessly with no downtime.
• Scale intra-node communication - If you wish to have the flexibility of invoking backups with software snapshots from any node, scale up the performance of intra-node communication to optimize backup performance.  It is recommended to use a minimum of 10 Gb Ethernet or InfiniBand.  You may also wish to aggregate network bandwidth with NIC Teaming or SMB Multichannel to increase network performance between the nodes in the Failover Cluster.
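
As a minimal sketch of moving CSV ownership before starting a backup (the CSV name is hypothetical; run this on the node that will perform the backup):

# Make the local node the coordinator for this CSV so snapshot access stays local
Move-ClusterSharedVolume -Name "Cluster Disk 1" -Node $env:COMPUTERNAME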

Recommendations:

1. To achieve the highest levels of backup performance on a Cluster Shared Volume, it is recommended to use hardware snapshots over software snapshots.
2. To achieve the highest levels of performance with software snapshots on a Cluster Shared Volume, it is recommended either to initiate the backup locally on the CSV coordinator node or to scale up the bandwidth of intra-node communication.

Thanks!
Elden Christensen
Principal Program Manager Lead
Clustering & High-Availability
Microsoft

Validate Storage Spaces Persistent Reservation Test Results with Warning


I have seen questions from customers who get a warning in their failover cluster validation results indicating that the storage doesn't support persistent reservations for Storage Spaces. They want to know why they got the warning, what it means, and what they should do about it. First, here is the text you will see in the report from the failover cluster validation.  It will be highlighted in yellow, and the test may have a yellow triangle icon next to it:

Validate Storage Spaces Persistent Reservation

Validate that storage supports the SCSI-3 Persistent Reservation commands needed by Storage Spaces to support clustering.

Test Disk <number X> does not support SCSI-3 Persistent Reservations commands needed by clustered storage pools that use the Storage Spaces subsystem. Some storage devices require specific firmware versions or settings to function properly with failover clusters. Contact your storage administrator or storage vendor for help with configuring the storage to function properly with failover clusters that use Storage Spaces.

                   

Question: Why did I get this warning?

Failover Clustering requires a specific set of SCSI-3 persistent reservation commands to be implemented by the storage so that storage spaces can be properly managed as clustered disks.  The commands that are specifically needed for Storage Spaces are tested, and if they are not implemented in the way that the cluster requires, this warning will be given.

Question: What does this mean, and why is it a warning and not a failure?

Failover Clustering has multiple tests that check how the storage implements SCSI-3 persistent reservations.  This particular test for Storage Spaces is a warning instead of a failure because clustered disks that aren't going to use Storage Spaces will work correctly as long as the other SCSI-3 persistent reservation tests pass.

Question:  What should I do when I get this warning?

Check the disks that are identified in the warning message and decide whether you will ever want to use those disks with Storage Spaces.

If you want to use the disks with Storage Spaces on the cluster, then you should check your storage configuration and documentation to see if there are settings or firmware/driver versions required to support clustered storage spaces.

If you aren't going to use Storage Spaces with this cluster and storage, and the other storage validation tests passed, then you can ignore this warning (you can also re-run just this test, as sketched below).
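
As a minimal sketch of re-running only the Storage Spaces reservation test (the node names are hypothetical, and the value passed to -Include is assumed to match the test heading shown in the validation report; adjust it to your report's wording):

Test-Cluster -Node Node1,Node2 -Include "Validate Storage Spaces Persistent Reservation"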

The following note is in the KB article that states the support policy for Windows Server 2012 failover clusters; the yellow yield sign it mentions refers to a warning in the validation test results. http://support.microsoft.com/kb/2775067

Note: The yellow yield sign indicates that the aspect of the proposed failover cluster that is being tested is not in alignment with Microsoft best practices. Investigate this aspect to make sure that the configuration of the cluster is acceptable for the environment of the cluster, for the requirements of the cluster, and for the roles that the cluster hosts.

Here are some links to more information regarding clustered storage spaces, cluster validation, and the support policies regarding the validation tests:

Blog: "How to Configure a Clustered Storage Space in Windows Server 2012"  http://blogs.msdn.com/b/clustering/archive/2012/06/02/10314262.aspx

TechNet: Deploy Clustered Storage Spaces  http://technet.microsoft.com/en-us/library/jj822937.aspx

TechNet: Validate Hardware for a Windows Server 2012 Failover Cluster  http://technet.microsoft.com/en-us/library/jj134244.aspx

Microsoft Knowledge Base Article: The Microsoft support policy for Windows Server 2012 failover clusters  http://support.microsoft.com/kb/2775067

Steven Ekren
Senior Program Manager
Windows Server Failover Clustering


Failover Clustering Sessions @ TechEd 2013


If you were not able to make it to TechEd 2013 this year, you can still watch the sessions and learn about the new enhancements coming.  Here are links to the recorded cluster sessions at TechEd 2013:

Continuous Availability: Deploying and Managing Clusters Using Windows Server 2012 R2
http://channel9.msdn.com/Events/TechEd/NorthAmerica/2013/MDC-B305#fbid=WOoBzkT2vlt

Failover Cluster Networking Essentials
http://channel9.msdn.com/Events/TechEd/NorthAmerica/2013/MDC-B337#fbid=WOoBzkT2vlt

Upgrading Your Private Cloud with Windows Server 2012 R2
http://channel9.msdn.com/Events/TechEd/NorthAmerica/2013/MDC-B331#fbid=WOoBzkT2vlt

Application Availability Strategies for the Private Cloud
http://channel9.msdn.com/Events/TechEd/NorthAmerica/2013/MDC-B311#fbid=WOoBzkT2vlt

Storage and Availability Improvements in Windows Server 2012 R2
http://channel9.msdn.com/Events/TechEd/NorthAmerica/2013/MDC-B333#fbid=WOoBzkT2vlt

Understanding the Hyper-V over SMB Scenario, Configurations, and End-to-End Performance
http://channel9.msdn.com/Events/TechEd/NorthAmerica/2013/MDC-B335#fbid=WOoBzkT2vlt

Thanks!
Elden Christensen
Principal Program Manager Lead
Clustering & High-Availability
Microsoft

How to Properly Shutdown a Failover Cluster or a Node


This blog will discuss the proper process for shutting down an individual node in a Windows Server 2012 R2 Failover Cluster, or the entire cluster with all its nodes.  Note:  While the steps outlined here are specific to Windows Server 2012 R2, the process applies to a cluster of any OS version.

Shutting Down a Node

When shutting down or rebooting a node in a Failover Cluster, you first want to drain (move off) any roles running on that server (such as a virtual machine).  This ensures that shutting down a node is graceful to any applications running on that node.

1. Open Failover Cluster Manager (CluAdmin.msc)
2. Click on "Nodes"
3. Right-click on the node name and under 'Pause' click on 'Drain Roles'
4. Under Status the node will appear as 'Paused'.  At the bottom of the center pane, click on the 'Roles' tab.  Once all roles have moved off this node, it is safe to shut down or reboot the node.

                   

To resume the node after it has been restarted…

When the node is once again powered on and ready to be put back into service, use the Resume action to re-enable the node to host roles.

1. Open Failover Cluster Manager (CluAdmin.msc)
2. Click on "Nodes"
3. Right-click on the node name and select 'Resume', then select either:
  1. 'Fail Roles Back' - This will resume the node and move back any roles that were running on the node prior to the pause.  Caution:  This could incur downtime, depending on the role.
  2. 'Do Not Fail Roles Back' - This will resume the node and not move any roles back.

                   

                   

Shutting Down a Node with Windows PowerShell®

1. Open a PowerShell window as Administrator
2. Type:  Suspend-ClusterNode -Drain
3. Type:  Get-ClusterGroup
  1. Verify that there are no roles listed under "OwnerNode" for that node
  2. This check could be scripted with the following syntax:
    PS C:\> (Get-ClusterGroup).OwnerNode -eq "NodeBeingDrained"
4. Shut down or restart the computer by typing either:
  1. Stop-Computer
  2. Restart-Computer

To resume the node after it has been restarted…

1. Open a PowerShell window as Administrator
2. Type:  Resume-ClusterNode
  1. If you wish to fail back the roles which were previously running on this node, type:

PS C:\> Resume-ClusterNode -Failback Immediate

                   

Shutting Down a Cluster

Shutting down the entire cluster involves stopping all roles and then stopping the Cluster Service on all nodes.  While you can shut down each node in the cluster individually, using the cluster UI will ensure the shutdown is done gracefully.

1. Open Failover Cluster Manager (CluAdmin.msc)
2. Right-click on the cluster name, select 'More Actions', then "Shut Down Cluster…"
3. When prompted if you are sure you want to shut down the cluster, click "Yes"

Shutting Down a Cluster with PowerShell

1. Open a PowerShell window as Administrator
2. Type:  Stop-Cluster

Controlling VM Behavior on Shutdown

When the cluster is shut down, the VMs will be placed in a Saved state.  This can be controlled using the OfflineAction property.

To configure the shutdown action for an individual VM (where "Virtual Machine" is the name of the VM):

PS C:\> Get-ClusterResource "Virtual Machine" | Set-ClusterParameter OfflineAction 1

                   

Value         Effect
0             The VM is turned off
1 (default)   The VM is saved
2             The guest OS is shut down
3             The guest OS is shut down forcibly
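
To change the setting for every clustered VM at once, a sketch along these lines should work (this assumes clustered VMs use the "Virtual Machine" resource type; the value 2 shuts down the guest OS):

PS C:\> Get-ClusterResource | Where-Object {$_.ResourceType.Name -eq "Virtual Machine"} | Set-ClusterParameter OfflineAction 2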

To start the cluster after it has been shut down

1. Type:  Start-Cluster

Thanks!
Elden Christensen
Principal Program Manager Lead
Clustering & High-Availability
Microsoft

Windows Server 2012 R2 Virtual Machine Recovery from Network Disconnects


Overview

Windows Server Failover Clustering has always monitored the running state of virtual machines and the health state of clustered networks and clustered storage.  We are furthering the failure detection to include monitoring of the virtual machine network and virtual switch connectivity.

Windows Server 2012 R2 introduces new functionality that allows a virtual machine (VM) to be moved to another node in a failover cluster, using live migration, if a network that it is using becomes disconnected.  This improves availability in cases where a network connection issue would cut off clients using the services running inside the VM, by moving the VM to a node that can provide the networking access the VM needs.  By default, the Protected Network setting is enabled for all virtual adapters, with the assumption that most networks a VM uses will be important enough that you want to relocate the VM if one becomes disconnected.

The live migration of the VM to another node of the cluster will not occur if the destination node doesn't have the network that is disconnected on the current node.  This avoids moving a virtual machine to a node that doesn't have the resource whose loss triggered the move in the first place.  Another node of the cluster will be selected to move the VM to, unless no nodes of the cluster are available that have the required network and system resources.

VM live migrations are queued if a network issue on a host affects more VMs than can be concurrently live migrated.  If the disconnected network becomes available again while there are VMs in the queue to be live migrated, the pending live migrations are canceled.

The VM's network adapter settings have a new property in the advanced configuration section that allows you to select whether the network that the adapter is connected to is important enough to the availability of the VM to have the VM moved if the network fails.  For instance, if you have an external network where clients connect to the application running inside of the VM, and another network that is used for backups, you can disable the property for the network adapter used for backups but leave it enabled for the external network.  If the backup network becomes disconnected, the VM will not be moved.  If the client access network is disconnected, the VM will be live migrated to a node that has the network available.

It is important to note that we do recommend using network teaming for any critical networks, for redundancy and seamless handling of many network failures.

Walkthrough

Let's walk through some of the concepts to illustrate how this functionality works and ways to configure it.

The diagram below (Diagram 1) shows a simple 2-node cluster with a VM running on it.

(Note: the network configuration depicted in this document is an example; the network configuration for your systems may vary depending on the number of adapters, the speed of the adapters, and other network considerations.)

The parent partition, sometimes referred to as the management partition, has a dedicated network adapter on each node. There is a second adapter on each node that is configured with a Hyper-V virtual switch.  The virtual machine has a synthetic network adapter that is configured to connect to the virtual switch.

If the physical network adapter that the virtual switch is using becomes disconnected, then the virtual machine will be live migrated to node B, since node B still has a connection to the network that the virtual machine uses.  The virtual machine can be live migrated from node A to B because the private network between those servers is still functioning.

                   

                  Diagram 1

Configuring a VM's virtual network adapter to not cause the VM to be moved if it is disconnected

Let's take the same configuration and add another network adapter to each of the nodes and connect it to another virtual switch on each node (see Diagram 2 below).  We then configure the VM with a second virtual adapter and connect it to the new virtual switch.  For this scenario, the network may be used for backups, or for communications between VMs for which a short outage doesn't affect the clients that use the VM.  Let's call this new network "Backup".

Because this new network can tolerate short outages, we want to configure the VM's virtual adapter so that it is not considered a critical network.  That will allow the Backup network to become disconnected without causing the VM to be moved to another node of the cluster.

                  To do this, open the VM’s settings, go to the virtual adapter for the Backup network, and then expand it so you see the “Advanced Features” item.  The option to clear the “Protected Network” check box will be shown (see Screen Shot 1 below).  


                   

                  Diagram 2

                   

                  Screen Shot 1

Configuring a VM's network adapter to not react to a network disconnect using Windows PowerShell

Here is the Windows PowerShell command and output that will show the virtual network adapters for a VM named "VM1". This command will work from any node of the cluster, even if the VM is not hosted on the node that you run the command from.  If you want to run the command from a machine that is not part of the cluster, you can add the Get-Cluster cmdlet at the start of the command line and specify the cluster name (see the sketch after the output below).

                  PS C:\Windows\system32> Get-ClusterGroup VM1 |Get-VM | Get-VMNetworkAdapter | FL VMName,SwitchName,MacAddress,ClusterMonitored

                   

                  VMName           : VM1

                  SwitchName       : Corp

                  MacAddress       : 00155D867239

                  ClusterMonitored : True

                   

                  VMName           : VM1

                  SwitchName       : Storage

                  MacAddress       : 00155D86723A

                  ClusterMonitored : True

                   

                  VMName           : VM1

                  SwitchName       : Private

                  MacAddress       : 00155D86723B

                  ClusterMonitored : True
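If you are running from a machine that is not a member of the cluster, the same query can be built by resolving the cluster first with Get-Cluster. A quick sketch, assuming a cluster named "Cluster1" (substitute your own cluster name):

# Resolve the remote cluster, then query the VM's adapters as before
Get-Cluster -Name Cluster1 | Get-ClusterGroup VM1 | Get-VM | Get-VMNetworkAdapter | FL VMName,SwitchName,MacAddress,ClusterMonitored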

                   

Here is the Windows PowerShell command that will disable the ClusterMonitored property for the network adapter that is configured to use the virtual switch named "Private".

(Note that the property is "ClusterMonitored" but the parameter to change it is "NotMonitoredInCluster".  Therefore, specifying -NotMonitoredInCluster with True actually changes the ClusterMonitored property to False, and vice-versa.):

                  PS C:\Windows\system32> Get-ClusterGroup VM1 |Get-VM | Get-VMNetworkAdapter | Where-Object {$_.SwitchName -eq "Private"} | Set-VmNetworkAdapter -NotMonitoredInCluster $True

                  PS C:\Windows\system32> Get-ClusterGroup VM1 |Get-VM | Get-VMNetworkAdapter | FL VMName,SwitchName,MacAddress,ClusterMonitored

                   

                   

                  VMName           : VM1

                  SwitchName       : Corp

                  MacAddress       : 00155D867239

                  ClusterMonitored : True

                   

                  VMName           : VM1

                  SwitchName       : Storage

                  MacAddress       : 00155D86723A

                  ClusterMonitored : True

                   

                  VMName           : VM1

                  SwitchName       : Private

                  MacAddress       : 00155D86723B

                  ClusterMonitored : False
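To turn Protected Network back on for that adapter, the same parameter works in reverse (per the note above, -NotMonitoredInCluster $False sets the ClusterMonitored property back to True). A sketch:

# Re-enable Protected Network for the adapter on the "Private" switch
Get-ClusterGroup VM1 | Get-VM | Get-VMNetworkAdapter | Where-Object {$_.SwitchName -eq "Private"} | Set-VmNetworkAdapter -NotMonitoredInCluster $False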

                   

                  Testing

                  You can test this behavior by disconnecting the network cable for the physical adapter of a server where a VM is running. 

                  It may take up to 1 minute for the cluster to detect that a virtual machine is affected by a network disconnect.  Each virtual machine on a cluster has a cluster resource that monitors the virtual machine for failures.  By default the cluster resource will check the state of each virtual switch that a VM is using every 60 seconds.  

This means that the time a specific VM takes to identify that a virtual switch is connected to a disconnected physical NIC can range from very short up to 60 seconds, depending on when the disconnect happened relative to the next check for that VM.

                  This means that if you have more than one VM using a switch that becomes disconnected, not all the VMs will go into the state that will cause them to be live migrated at the same time.

As noted previously, if the network becomes connected again, any VMs that are queued to be moved will be removed from the queue and remain on the same server.  Any live migrations already in progress will finish.

                  Steven Ekren
                  Senior Program Manager
                  Windows Server Failover Clustering and High Availability

                  Server Virtualization w/ Windows Server Hyper-V & System Center Jump Start


                  Add Hyper-V to your server virtualization skillset and improve your career options: Register for this free course, led by Microsoft experts Symon Perriman and Corey Hynes: Server Virtualization w/ Windows Server Hyper-V & System Center Jump Start

                  Windows Server 2012 R2 Failover Clustering will be covered in depth, including cluster validation, configuration, management, and best practices for virtualization, from both a Hyper-V and System Center 2012 R2 perspective.

This course helps you prepare for the new Microsoft virtualization certification: Microsoft Specialist Exam 74-409: Server Virtualization with Windows Server Hyper-V and System Center. Event attendees will get a free voucher for the exam* (normally $150).

                  Already familiar with other virtualization platforms such as VMware or Citrix? Upgrading virtualization platforms?  New to virtualization?  If any of these are true, then this course is intended for you. Get expert instruction on Microsoft Server Virtualization with Windows Server 2012 R2 Hyper-V and System Center 2012 R2 Virtual Machine Manager in this two-day Jump Start. You will learn how to configure, manage, and maintain Windows Server 2012 R2 Hyper-V and System Center 2012 R2 Virtual Machine Manager including networking and storage services. You will also learn how to configure key Microsoft server virtualization features such as Generation 2 Virtual Machines, Replication Extension, Online Export, Cross-Version Live Migration, Online VHDX Resizing, Live Migration Performance tuning as well as Dynamic Virtual Switch Load Balancing and virtual Receive Side Scaling (vRSS). 

                  Server Virtualization w/ Windows Server Hyper-V & System Center Jump Start

                  • Date:  November 19 & 20, 2013
                  • Time: 9:00am – 4:30pm
                  • Where: Live, online virtual classroom
                  • Cost: Free!

                  Put your career in hyperdrive and Register now.

                   

                  Additional Resources:

                   

                  *The number of free exams is limited, so be sure to schedule your appointment to lock in your free exam. Vouchers expire and all exams must be taken by June 30, 2014. 

                  Decoding Bugcheck 0x0000009E


                  In the System event log you may find an event similar to the following:

                  Event ID 1001

                  Source:  Microsoft-Windows-WER-SystemErrorReporting               

                  Description:  The computer has rebooted from a bugcheck.  The bugcheck was: 0x0000009e (0x0000000000000000, 0x0000000000000000, 0x0000000000000000, 0x0000000000000000). A dump was saved in: C:\Windows\MEMORY.DMP.

Let's start out discussing what a STOP 0x9e is...  Failover Clustering actively conducts health monitoring of many components at different layers of a server; one of the attributes of a highly available system is having the health detection mechanisms in place to detect when something goes wrong and to react.  Under some conditions, when an extreme failure occurs, the cluster service may intentionally bugcheck the server in an attempt to recover.  The bugcheck will be a USER_MODE_HEALTH_MONITOR (9e) and is invoked by the Failover Cluster kernel mode driver NetFT.sys.

The first and most important thing to understand is that this is normal cluster health detection and recovery; it is intended recovery behavior.  It is not a "bug" in clustering, nor is it a bug in NetFT.sys... it is a feature, not a flaw.  I say this because the most common first troubleshooting step I see is that customers apply the latest hotfix for NetFT.sys… and that won't help.

By far the most common reason for a 0x9e is that Failover Clustering is conducting health monitoring between the NetFT kernel mode driver and the user mode service.  If NetFT stops receiving heartbeats, then user mode is considered to be non-responsive and clustering will bugcheck the box in an effort to force a recovery.

So the next question is what caused user mode to become unresponsive?  In general, you can troubleshoot this like any other user mode hang: you can set up PerfMon and look for memory leaks, and so on.  The most valuable diagnostic tool will be that when clustering bugchecks the box, you can capture a dump and analyze it to reach root cause.  This will involve a call to Microsoft support to help debug the dump.

There are, however, a couple of different conditions that can invoke a bugcheck 0x9e.  In this blog I will discuss the different parameters logged in the Event ID 1001 and what they mean.

                  Decoding STOP 0x0000009E

                  The bugcheck code will have the following format with the following parameters.

                  Stop 0x0000009E ( Parameter1 , Parameter2 , Parameter3 , Parameter4 )

                  Parameter1 value meaning:

                  Process that failed to satisfy a health check within the configured timeout

                  Parameter2 value meaning:

Hex value that defines the time, in seconds, for the timeout that was hit.  This details how long it took for the bugcheck to be invoked.
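Because Parameter2 is logged as a hex value, it can help to convert it to decimal seconds when reading the Event ID 1001 description. A trivial sketch in PowerShell (the value 0x12C is purely illustrative):

# Convert a hex timeout value from the bugcheck parameters to seconds
[Convert]::ToInt64("12C", 16)   # returns 300, i.e. a 5 minute timeout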

                  Parameter3 value meaning:

0x0000000000000000: The source of the reason for the bugcheck was not specified.

0x0000000000000001: The node was bugchecked because the RHS process was attempting to gracefully close and did not complete successfully.

0x0000000000000002: The node was bugchecked because a resource did not respond to a resource entry point call within the configured 'DeadlockTimeout' timeout.  The node was configured to bugcheck by the 'DebugBreakOnDeadlock' registry key being set to a value of 3.

0x0000000000000003: The node was bugchecked because of an unhandled exception in one of the cluster resources, and when attempting to recover, the RHS process did not terminate successfully within 20 minutes.

0x0000000000000004: The node was bugchecked because of an unhandled exception in the Resource Hosting Subsystem (RHS), and when attempting to recover, the RHS process did not terminate successfully within 20 minutes.

0x0000000000000005: The node was bugchecked because a resource did not respond to a resource entry point call within the 'DeadlockTimeout' timeout (5 minutes by default) and an attempt was made to terminate the RHS process to recover.  However, the RHS process did not terminate successfully within the timeout, which is four times the 'DeadlockTimeout' timeout (20 minutes by default).

0x0000000000000006: The node was bugchecked because a resource type did not respond to a resource entry point call within the 'DeadlockTimeout' timeout and an attempt was made to terminate the RHS process to recover.  However, the RHS process did not terminate successfully.

0x0000000000000007: The node was bugchecked because of an unhandled exception in the Cluster Service (ClusSvc), and when attempting to recover, the ClusSvc process did not terminate successfully within 20 minutes.

0x0000000000000008: The node was bugchecked by the request of another node in the Failover Cluster.

0x0000000000000009: The node was bugchecked because the cluster service detected that an internal subcomponent of the cluster service was unresponsive.  The system was configured to bugcheck by the 'HangRecoveryAction' setting being set to a value of 4.

0x000000000000000A: The node was bugchecked because the kernel mode NetFT driver did not receive a heartbeat from the user mode Cluster Service within the configured 'ClusSvcHangTimeout' timeout.  The recovery action was configured to bugcheck by the 'HangRecoveryAction' cluster common property being set to a value of 3 (default) or 4.

                   

                  Note:  Parameter3 is a new value introduced in Windows Server 2012 R2 and will always be 0x0000000000000000 in previous releases.

                  Parameter4 value meaning:  

Currently unused / reserved for future use; it will always be 0x0000000000000000.
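If you want to review how the timeouts and recovery actions referenced above are configured on your cluster, the cluster common properties can be queried directly. A quick sketch using the property names referenced in the table above:

# Inspect the hang detection and recovery configuration of the cluster
Get-Cluster | Format-List DeadlockTimeout, ClusSvcHangTimeout, HangRecoveryAction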

                  Thanks!
                  Elden Christensen
                  Principal Program Manager Lead
                  Clustering & High-Availability
                  Microsoft

                  Cluster Shared Volume (CSV) Inside Out


                  In this blog we will take a look under the hood of the cluster file system in Windows Server 2012 R2 called Cluster Shared Volumes (CSV). This blog post is targeted at developers and ISV’s who are looking to integrate their storage solutions with CSV.

Note: Throughout this blog, I will refer to C:\ClusterStorage, assuming that Windows is installed on the C:\ drive. Windows can be installed on any available drive and the CSV namespace will be built on the system drive, but instead of using %SystemDrive%\ClusterStorage\ I've used C:\ClusterStorage for better readability, since C:\ is used as the system drive most of the time.

                  Components

Cluster Shared Volumes in Windows Server 2012 is a completely re-architected solution from the Cluster Shared Volumes you knew in Windows Server 2008 R2. Although it may look similar in the user experience – just a bunch of volumes mapped under C:\ClusterStorage\, accessed using the regular Windows file system interfaces – under the hood, these are two completely different architectures. One of the main goals is that in Windows Server 2012, CSV has been expanded beyond the Hyper-V workload, for example to the Scale-Out File Server, and in Windows Server 2012 R2 CSV is also supported with SQL Server 2014.

                  First, let us look under the hood of CsvFs at the components that constitute the solution.

                  Figure 1 CSV Components and Data Flow Diagram

The diagram above shows a 3 node cluster. There is one shared disk that is visible to Node 1 and Node 2. Node 3 in this diagram has no direct connectivity to the storage. The disk was first clustered and then added to the Cluster Shared Volume. From the user's perspective, everything will look the same as in Windows Server 2008 R2. On every cluster node you will find a mount point to the volume: C:\ClusterStorage\Volume1. The "VolumeX" naming can be changed; just use Windows Explorer and rename it like you would any other directory.  CSV will then take care of synchronizing the updated name around the cluster to ensure all nodes are consistent.  Now let's look at the components that are backing these mount points.
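Since the mount point is just a directory, the rename can also be scripted rather than done in Windows Explorer. A sketch, assuming you want to rename Volume1 to Accounting:

# Rename the CSV mount point folder; CSV synchronizes the new name across all nodes
Rename-Item -Path C:\ClusterStorage\Volume1 -NewName Accounting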

                  Terminology

The node where NTFS for the clustered CSV disk is mounted is called the Coordinator Node. In this context, any other node that does not have the clustered disk mounted is called a Data Server (DS). Note that the Coordinator Node is always a Data Server node at the same time; in other words, the coordinator is a special Data Server node where NTFS is mounted.

                  If you have multiple disks in CSV, you can place them on different cluster nodes. The node that hosts a disk will be a Coordinator Node only for the volumes that are located on that disk. Since each node might be hosting a disk, each of them might be a Coordinator Node, but for different disks. So technically, to avoid ambiguity, we should always qualify “Coordinator Node” with the volume name. For instance we should say: “Node 2 is a Coordinator Node for the Volume1”. Most of the examples we will go through in this blog post for simplicity will have only one CSV disk in the cluster so we will drop the qualification part and will just say Coordinator Node to refer to the node that has this disk online.

Sometimes we will use the terms "disk" and "volume" interchangeably, because in the samples we will be going through, one disk will have only one NTFS volume, which is the most common deployment configuration. In practice, you can create multiple volumes on a disk and CSV fully supports that as well. When you move disk ownership from one cluster node to another, all the volumes will travel along with the disk and any given node will be the coordinator for all volumes on a given disk. Storage Spaces would be one exception from that model, but we will ignore that possibility for now.
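To see which node currently owns (coordinates) each CSV disk, or to move that role, the Failover Clustering cmdlets can be used. A sketch, assuming a disk named "Cluster Disk 1" and a node named "Node2" (both names are illustrative):

# Which node currently owns each CSV disk?
Get-ClusterSharedVolume | Select-Object Name, OwnerNode

# Move the disk, and with it the coordinator role for its volumes, to another node
Move-ClusterSharedVolume -Name "Cluster Disk 1" -Node Node2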

This diagram is complicated, so let's try to break it up into pieces, discuss each piece separately, and then hopefully the whole picture together will make more sense.

On Node 2, you can see the following stack that represents the mounted NTFS. The cluster guarantees that only one node has NTFS in the state where it can write to the disk; this is important because NTFS is not a clustered file system.  CSV provides a layer of orchestration that enables NTFS or ReFS (with Windows Server 2012 R2) to be accessed concurrently by multiple servers. The following blog post explains how the cluster leverages SCSI-3 Persistent Reservation commands with disks to implement that guarantee: http://blogs.msdn.com/b/clustering/archive/2009/03/02/9453288.aspx

                  Figure 2 CSV NTFS stack

The cluster makes this volume hidden so that the Volume Manager (Volume in the diagram above) does not assign a volume GUID to this volume, and there will be no drive letter assigned. You also would not see this volume using mountvol.exe or using the FindFirstVolume() and FindNextVolume() WIN32 APIs.

On the NTFS stack, the cluster will attach an instance of a file system mini-filter driver called CsvFlt.sys at altitude 404800. You can see that filter attached to the NTFS volume used by CSV if you run the following command:

                  >fltmc.exe instances
                  Filter                Volume Name                              Altitude        Instance Name
                  --------------------  -------------------------------------  ------------  ----------------------
                  <skip>
                  CsvFlt                \Device\HarddiskVolume7                   404800     CsvFlt Instance
                  <skip>

Applications are not expected to access the NTFS stack, and we even go an extra mile to block access to this volume from user mode applications. CsvFlt will check all create requests coming from user mode against the security descriptor that is kept in the cluster public property SharedVolumeSecurityDescriptor. You can use the PowerShell cmdlet "Get-Cluster | fl SharedVolumeSecurityDescriptor" to get to that property. The output of this PowerShell cmdlet shows the value of the security descriptor in self-relative binary format (http://msdn.microsoft.com/en-us/library/windows/desktop/aa374807(v=vs.85).aspx):

                  PS D:\Windows\system32> Get-Cluster | fl SharedVolumeSecurityDescriptor

                  SharedVolumeSecurityDescriptor : {1, 0, 4, 128...}

                   CsvFlt plays several roles:

                  • Provides an extra level of protection for the hidden NTFS volume used for CSV
• Helps provide a local volume experience (after all, CsvFs does look like a local volume). For instance, you cannot open the volume over SMB or read the USN journal. To enable these kinds of scenarios, CsvFs often marshals the operations that need to be performed to CsvFlt, disguising them behind a tunneling file system control. CsvFlt is responsible for converting the tunneled information back to the original request before forwarding it down the stack to NTFS.
                  • It implements several mechanisms to help coordinate certain states across multiple nodes. We will touch on them in the future posts. File Revision Number is one of them for example.

The next stack we will look at is the system volume stack. On the diagram above you see this stack only on the coordinator node, which has NTFS mounted. In practice, exactly the same stack exists on all nodes.

                   

                  Figure 3 System Volume Stack

                  The CSV Namespace Filter (CsvNsFlt.sys) is a file system mini-filter driver at an altitude of 404900:

                  D:\Windows\system32>fltmc instances
                  Filter                Volume Name                              Altitude        Instance Name
                  --------------------  -------------------------------------  ------------  ----------------------
                  <skip>
                  CsvNSFlt              C:                                        404900     CsvNSFlt Instance
                  <skip>

                  CsvNsFlt plays the following roles:

• It protects C:\ClusterStorage by blocking unauthorized attempts that do not come from the cluster service to delete or create any files or subfolders in this folder or change any attributes on the files. Other than opening these folders, about the only other operation that is not blocked is renaming the folders. You can use a command prompt or Explorer to rename C:\ClusterStorage\Volume1 to something like C:\ClusterStorage\Accounting.  The directory name will be synchronized and updated on all nodes in the cluster.
• It helps us to dispatch the block level redirected IO. We will cover this in more detail when we talk about block level redirected IO later in this post.

The last stack we will look at is the stack of the CSV file system. Here you will see two modules: the CSV Volume Manager (csvvbus.sys) and the CSV File System (CsvFs.sys). CsvFs is a file system driver and mounts exclusively to the volumes surfaced up by CsvVbus.

                   

                  Figure 5 CsvFs stack

                  Data Flow

                  Now that we are familiar with the components and how they are related to each other, let’s look at the data flow.

First let's look at how metadata flows. Below you can see the same diagram as in Figure 1; I've kept only the arrows and blocks that are relevant to the metadata flow and removed the rest from the diagram.

                   

                  Figure 6 Metadata Flow

Our definition of a metadata operation is everything except read and write. Examples of metadata operations would be create file, close file, rename file, change file attributes, delete file, change file size, any file system control, etc. Some writes may also, as a side effect, cause a metadata change. For instance, an extending write will cause CsvFs to extend all or some of the following: file allocation size, file size and valid data length. A read might cause CsvFs to query some information from NTFS.

                  On the diagram above you can see that metadata from any node goes to the NTFS stack on Node 2. Data server nodes (Node 1 and Node 3) are using Server Message Block (SMB) as a protocol to forward metadata over.

Metadata is always forwarded to NTFS. On the coordinator node, CsvFs will forward metadata IO directly to the NTFS volume, while other nodes will use SMB to forward the metadata over the network.

Next, let's look at the data flow for Direct IO. The following diagram is produced from the diagram in Figure 1 by removing any blocks and lines that are not relevant to Direct IO. By definition, Direct IO means the reads and writes that never go over the network, but go from CsvFs through CsvVbus straight to the disk stack. To make sure there is no ambiguity, I'll repeat it again: Direct IO bypasses the volume stack and goes directly to the disk.

                   Figure 7 Direct IO Flow

Both Node 1 and Node 2 can see the shared disk, so they can send reads and writes directly to the disk, completely avoiding sending data over the network. Node 3 is not in the diagram in Figure 7 (Direct IO Flow) since it cannot perform Direct IO, but it is still part of the cluster and it will use block level redirected IO for reads and writes.

The next diagram shows how File System Redirected IO requests flow. The diagram and data flow for the redirected IO is very similar to the one for the metadata in Figure 6 (Metadata Flow):

                  Figure 8 File System Redirected IO Flow

                  Later we will discuss when CsvFs uses the file system redirected IO to handle reads and writes and how it compares to what we see on the next diagram – Block Level Redirected IO:

                  Figure 9 Block Level Redirected IO Flow

Note that on this diagram I have completely removed the CsvFs stack and the CSV NTFS stack from the Coordinator Node, leaving only the system volume NTFS stack. The CSV NTFS stack is removed because Block Level Redirected IO completely bypasses it and goes to the disk (yes, like Direct IO it bypasses the volume stack and goes straight to the disk) below the NTFS stack. The CsvFs stack is removed because on the coordinating node CsvFs would never use Block Level Redirected IO, and would always talk to the disk directly. The reason why Node 3 would use Redirected IO is that Node 3 does not have physical connectivity to the disk. A curious reader might wonder why Node 1, which can see the disk, would ever use Block Level Redirected IO. There are at least two cases when this might happen. Although the disk might be visible on the node, it is possible that IO requests will fail because the adapter or a storage network switch is misbehaving. In this case, CsvVbus will first attempt to send IO to the disk and on failure will forward the IO to the Coordinator Node using Block Level Redirected IO. The other example is Storage Spaces: if the disk is a Mirrored Storage Space, then CsvFs will never use Direct IO on a data server node; instead it will send the block level IO to the Coordinating Node using Block Level Redirected IO.  In Windows Server 2012 R2 you can use the Get-ClusterSharedVolumeState cmdlet (http://technet.microsoft.com/en-us/library/dn456528.aspx) to query the CSV state (direct / file level redirected / block level redirected), and if redirected it will state why.

                  Note that CsvFs sends the Block Level Redirected IO to the CsvNsFlt filter attached to the system volume stack on the Coordinating Node. This filter dispatches this IO directly to the disk bypassing NTFS and volume stack so no other filters below the CsvNsFlt on the system volume will see that IO. Since CsvNsFlt sits at a very high altitude, in practice no one besides this filter will see these IO requests. This IO is also completely invisible to the CSV NTFS stack. You can think about Block Level Redirected IO as a Direct IO that CsvVbus is shipping to the Coordinating Node and then with the help of the CsvNsFlt it is dispatched directly to the disk as a Direct IO is dispatched directly to the disk by CsvVbus.

                  What are these SMB shares?

CSV uses the Server Message Block (SMB) protocol to communicate with the Coordinator Node. As you know, SMB3 requires certain configuration to work; for instance, it requires file shares. Let's take a look at how the cluster configures SMB to enable CSV.

If you dump the list of SMB file shares on a cluster node with CSV volumes, you will see the following:

                  > Get-SmbShare
                  Name                          ScopeName                     Path                          Description
                  ----                          ---------                     ----                          -----------
                  ADMIN$                        *                             C:\Windows                    Remote Admin
                  C$                            *                             C:\                           Default share
                  ClusterStorage$               CLUS030512                  C:\ClusterStorage             Cluster Shared Volumes Def...
                  IPC$                          *                                                           Remote IPC

There is a hidden admin share that is created for CSV, shared as ClusterStorage$. This share is created by the cluster to facilitate remote administration. You should use it in the scenarios where you would normally use an admin share on any other volume (such as D$). This share is scoped to the Cluster Name. The Cluster Name is a special kind of Network Name that is designed to be used to manage a cluster. You can learn more about Network Name in the following blog post: http://blogs.msdn.com/b/clustering/archive/2009/07/17/9836756.aspx.  You can access this share using the Cluster Name: \\<cluster name>\ClusterStorage$

Since this is an admin share, it is ACL'd so that only members of the Administrators group have full access to this share. In the output, the access control list is defined using the Security Descriptor Definition Language (SDDL). You can learn more about SDDL here: http://msdn.microsoft.com/en-us/library/windows/desktop/aa379567(v=vs.85).aspx

                  ShareState            : Online
                  ClusterType           : ScaleOut
                  ShareType             : FileSystemDirectory
                  FolderEnumerationMode : Unrestricted
                  CachingMode           : Manual
                  CATimeout             : 0
                  ConcurrentUserLimit   : 0
                  ContinuouslyAvailable : False
                  CurrentUsers          : 0
                  Description           : Cluster Shared Volumes Default Share
                  EncryptData           : False
                  Name                  : ClusterStorage$
                  Path                  : C:\ClusterStorage
                  Scoped                : True
                  ScopeName             : CLUS030512
                  SecurityDescriptor    : D:(A;;FA;;;BA)

There are also a couple of hidden shares that are used by CSV. You can see them if you add the IncludeHidden parameter to the Get-SmbShare cmdlet. These shares are used only on the Coordinator Node. Other nodes either do not have these shares or these shares are not used:

                  > Get-SmbShare -IncludeHidden
                  Name                          ScopeName                     Path                          Description
                  ----                          ---------                     ----                          -----------
                  17f81c5c-b533-43f0-a024-dc... *                             \\?\GLOBALROOT\Device\Hard...
                  ADMIN$                        *                             C:\Windows                    Remote Admin
                  C$                            *                             C:\                           Default share
                  ClusterStorage$               VPCLUS030512                  C:\ClusterStorage             Cluster Shared Volumes Def...
                  CSV$                          *                             C:\ClusterStorage
                  IPC$                          *                                                           Remote IPC

For each Cluster Shared Volume, the cluster creates a share on the coordinating node with a name that looks like a GUID. This is used by CsvFs to communicate with the hidden CSV NTFS stack on the coordinating node. This share points to the hidden NTFS volume used by CSV. Metadata and File System Redirected IO flow to the Coordinating Node using this share.

                  ShareState            : Online
                  ClusterType           : CSV
                  ShareType             : FileSystemDirectory
                  FolderEnumerationMode : Unrestricted
                  CachingMode           : Manual
                  CATimeout             : 0
                  ConcurrentUserLimit   : 0
                  ContinuouslyAvailable : False
                  CurrentUsers          : 0
                  Description           :
                  EncryptData           : False
                  Name                  : 17f81c5c-b533-43f0-a024-dc431b8a7ee9-1048576$
                  Path                  : \\?\GLOBALROOT\Device\Harddisk2\ClusterPartition1\
                  Scoped                : False
                  ScopeName             : *
                  SecurityDescriptor    : O:SYG:SYD:(A;;FA;;;SY)(A;;FA;;;S-1-5-21-2310202761-1163001117-2437225037-1002)
                  ShadowCopy            : False
                  Special               : True
                  Temporary             : True 

On the Coordinating Node you will also see a share with the name CSV$. This share is used to forward Block Level Redirected IO to the Coordinating Node. There is only one CSV$ share on every Coordinating Node:

                  ShareState            : Online
                  ClusterType           : CSV
                  ShareType             : FileSystemDirectory
                  FolderEnumerationMode : Unrestricted
                  CachingMode           : Manual
                  CATimeout             : 0
                  ConcurrentUserLimit   : 0
                  ContinuouslyAvailable : False
                  CurrentUsers          : 0
                  Description           :
                  EncryptData           : False
                  Name                  : CSV$
                  Path                  : C:\ClusterStorage
                  Scoped                : False
                  ScopeName             : *
                  SecurityDescriptor    : O:SYG:SYD:(A;;FA;;;SY)(A;;FA;;;S-1-5-21-2310202761-1163001117-2437225037-1002)
                  ShadowCopy            : False
                  Special               : True
                  Temporary             : True

Users are not expected to use these shares; they are ACL'd so that only Local System and the Failover Cluster Identity user (CLIUSR) have access to the share.

All of these shares are temporary; information about these shares is not kept in any persistent storage, and when a node reboots they will be removed from the Server Service. The cluster takes care of creating the shares every time during CSV start up.

                  Conclusion

You can see that Cluster Shared Volumes in Windows Server 2012 R2 is built on a solid foundation of the Windows storage stack, CSVv1, and SMB3.

                  Thanks!
                  Vladimir Petter
                  Principal Software Development Engineer
                  Clustering & High-Availability
                  Microsoft

                    

                  To learn more, here are others in the Cluster Shared Volume (CSV) blog series:

                  Cluster Shared Volume (CSV) Inside Out
                  http://blogs.msdn.com/b/clustering/archive/2013/12/02/10473247.aspx
                   
                  Cluster Shared Volume Diagnostics
                  http://blogs.msdn.com/b/clustering/archive/2014/03/13/10507826.aspx

                  Cluster Shared Volume Performance Counters
                  http://blogs.msdn.com/b/clustering/archive/2014/06/05/10531462.aspx

                  Cluster Shared Volume Failure Handling
                  http://blogs.msdn.com/b/clustering/archive/2014/10/27/10567706.aspx

                  Understanding the state of your Cluster Shared Volumes in Windows Server 2012 R2


Cluster Shared Volumes (CSV) is the clustered file system for the Microsoft Private Cloud, first introduced in Windows Server 2008 R2. In Windows Server 2012, we radically improved the CSV architecture. We presented a deep dive of these architecture improvements at TechEd 2012. Building on this new and improved architecture, in Windows Server 2012 R2, we have introduced several new CSV features. In this blog, I am going to discuss one of these new features – the new Get-ClusterSharedVolumeState Windows Server Failover Clustering PowerShell® cmdlet. This cmdlet enables you to view the state of your CSV. Understanding the state of your CSV is useful in troubleshooting failures as well as optimizing the performance of your CSV. In the remainder of this blog, I will explain how to use this cmdlet as well as how to interpret the information provided by the cmdlet.

                  Get-ClusterSharedVolumeState Windows PowerShell® cmdlet

                  The Get-ClusterSharedVolumeState cmdlet allows you to view the state of your CSV on a node in the cluster. Note that the state of your CSV can vary between the nodes of a cluster. Therefore, it might be useful to determine the state of your CSV on multiple or all nodes of your cluster.

                  To use the Get-ClusterSharedVolumeState cmdlet open a new Windows PowerShell console and run the following:

                  •          To view the state of all CSVs on all the nodes of your cluster

                  Get-ClusterSharedVolumeState 

                  •          To view the state of all CSVs on a subset of the nodes in your cluster

                  Get-ClusterSharedVolumeState –Node clusternode1,clusternode2

                  •          To view the state of a subset of CSVs on all the nodes of your cluster  

                  Get-ClusterSharedVolumeState –Name "Cluster Disk 2","Cluster Disk 3"

                                  OR

                  Get-ClusterSharedVolume "Cluster Disk 2" | Get-ClusterSharedVolumeState

                    

                  Understanding the state of your CSV 

The Get-ClusterSharedVolumeState cmdlet output provides two important pieces of information for a particular CSV – the state of the CSV and the reason why the CSV is in that particular state. There are three states of a CSV – Direct, File System Redirected and Block Redirected. I will now examine the output of this cmdlet for each of these states.

                  Direct Mode

In Direct Mode, I/O operations from the application on the cluster node can be sent directly to the storage. They therefore bypass the NTFS or ReFS volume stack.

                   

                   

                  File System Redirected Mode

                  In File System Redirected mode, I/O on a cluster node is redirected at the top of the CSV pseudo-file system stack over SMB to the disk. This traffic is written to the disk via the NTFS or ReFS file system stack on the coordinator node.

                   

                   

                   

                   

                  Note:

                  •          When a CSV is in File System Redirected Mode, I/O for the volume will not be cached in the CSV Block Cache.
•          Data deduplication occurs on a per-file basis. Therefore, when a file on a CSV volume is deduped, all I/O for that file will occur in File System Redirected mode. I/O for the file will not be cached in the CSV Block Cache – it will instead be cached in the Deduplication Cache. For the remaining non-deduped files, CSV will be in Direct mode, and the state of the CSV will be reflected as Direct mode.
                  •          The Failover Cluster Manager will show a volume as in Redirected Access only when it is in File System Redirected Mode and the FileSystemRedirectedIOReason is UserRequest.

                    

                  Block Redirected Mode

In Block Redirected mode, I/O passes through the local CSVFS proxy file system stack and is written directly to Disk.sys on the coordinator node. As a result, it avoids traversing the NTFS/ReFS file system stack twice.
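To see both the state and the reason on each node, you can format the cmdlet output. A sketch (the property names shown here are from Windows Server 2012 R2):

# Show the CSV IO mode per node, plus why it is redirected (if it is)
Get-ClusterSharedVolumeState | Format-List Name, Node, StateInfo, FileSystemRedirectedIOReason, BlockRedirectedIOReason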

                   

                    

                  In conclusion, the Get-ClusterSharedVolumeState cmdlet is a powerful tool that enables you to understand the state of your Cluster Shared Volume and thus troubleshoot failures and optimize the performance of your private cloud storage infrastructure.

                   

                  Thanks!
                  Subhasish Bhattacharya
                  Program Manager
                  Clustering and High Availability
                  Microsoft


                  Understanding the Repair Active Directory Object Recovery Action


One of the responsibilities of the cluster Network Name resource is to rotate the password of the computer object in Active Directory associated with it.  When the Network Name resource is online, it will rotate the password according to domain and local machine policy (which is 30 days by default).

If the password is different from what is stored in the cluster database, the cluster service will be unable to log on to the computer object and the Network Name will fail to come online.  This may also cause issues such as Kerberos errors, failures to register in a secure DNS zone, and live migration failures.

                   

                  The Repair Active Directory Object option is a recovery tool to re-synchronize the password for cluster computer objects.  It can be found in Failover Cluster Manager (CluAdmin.msc) by right-clicking on the Network Name, selecting More Actions…, and then clicking Repair Active Directory Object.

• Cluster Name Object (CNO) - The CNO is the computer object associated with the Cluster Name resource.  When using Repair on the Cluster Name, it will use the credentials of the currently logged on user and reset the computer object's password.  To run Repair, you must have "Reset Password" permissions on the CNO computer object.
• Virtual Computer Object (VCO) - The CNO is responsible for managing the passwords on all other computer objects (VCOs) for other cluster network names in the cluster.  If the password for a VCO falls out of sync, the CNO will reset the password and self-heal automatically.  Therefore it is not necessary to run Repair to reset the password for a VCO.  In Windows Server 2012 a Repair action was added for all other cluster Network Names, and it is a little bit different.  Repair will check to see if the associated computer object exists in Active Directory, and if the VCO had been accidentally deleted, Repair will re-create the missing computer object.  The recommended process to recover deleted computer objects is with the AD Recycle Bin feature; using Repair to re-create computer objects when they have been deleted should be a last resort recovery action.  This is because some applications store attributes in the computer object (namely MSMQ), and recreating a new computer object will break the application.  Repair is a safe action to perform on any SQL Server or File Server deployment.  The CNO must have "Create Computer Objects" permissions on the OU in which it resides to recreate the VCOs.

                   


                   

                   

                  To run Repair, the Network Name resource must be in a "Failed" or "Offline" state.   Otherwise the option will be grayed out.

Repair is only available through the Failover Cluster Manager snap-in; there is no PowerShell cmdlet available to script the action.

                   

If you are running Windows Server 2012 and find that you have to repeatedly run Repair every ~30 days, ensure you have hotfix KB2838043 installed.

                   

                   

                  Matt Kurjanowicz
                  Senior Software Development Engineer
                  Clustering & High-Availability
                  Microsoft

                   

                  How to Run ChkDsk and Defrag on Cluster Shared Volumes in Windows Server 2012 R2


Cluster Shared Volumes (CSV) is a layer of abstraction on either the ReFS or NTFS file system (which is used to format the underlying private cloud storage). Just as with a non-CSV volume, at times it may be necessary to run ChkDsk and Defrag on the file system. In this blog, I am going to first address the recommended procedure to run Defrag on your CSV in Windows Server 2012 R2. I will then discuss how ChkDsk is run on your CSVs.

                  Procedure to run Defrag on your CSV:

Fragmentation of files on a CSV can impact the perceived file system performance by increasing the seek time to retrieve file system metadata. It is therefore recommended to periodically run Defrag on your CSV volume. Fragmentation is primarily a concern when running dynamic VHDs and is less prevalent with static VHDs. On a stand-alone server, defrag runs automatically as part of the "Maintenance Task". However, on a CSV volume it will never run automatically, so you need to run it manually or script it to run (potentially using a Clustered Scheduled Task). It is recommended to conduct this process during non-peak production times, as performance may be impacted.
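One way to automate the defragmentation is with a Clustered Scheduled Task, as mentioned above. Below is a minimal sketch; the wrapper script path and the disk name are illustrative assumptions, and the script itself would perform the redirected mode/defrag/resume sequence described in the steps that follow:

# Register a resource-specific clustered task so the defrag job always runs
# on whichever node currently owns the CSV disk (names are illustrative)
$action  = New-ScheduledTaskAction -Execute "powershell.exe" -Argument "-File C:\Scripts\Defrag-Csv.ps1"
$trigger = New-ScheduledTaskTrigger -Weekly -DaysOfWeek Sunday -At 2am
Register-ClusteredScheduledTask -TaskName "CsvDefrag" -TaskType ResourceSpecific -Resource "Cluster Disk 1" -Action $action -Trigger $trigger

The following are the steps to defragment your CSV: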

                  1.       Determine if defragmentation is required for your CSV by running the following on an elevated command prompt:                                                                                

                  Defrag.exe <CSV Mount Point> /A /U /V

                  /A           Perform analysis on the specified volumes

                  /U           Print the progress of the operation on the screen                  

                  /V           Print verbose output containing the fragmentation statistics

                   

                  Note:

• If your CSV is backed by thinly provisioned storage, slab consolidation analysis (not the actual slab consolidation) is run during defrag analysis. Slab consolidation analysis requires the CSV to be placed in redirected mode before execution. Please refer to step 2 for instructions on how to place your CSV into redirected mode.

                   

2.       If defragmentation is required for your CSV, put the CSV into redirected mode.  This can be achieved in either of the following ways:

                  a.       Using Windows PowerShell© open a new elevated Windows PowerShell console and run the following:

Suspend-ClusterResource <Cluster Disk Name> -RedirectedAccess

                   

                  b.      Using the Failover Cluster Manager right-click on the CSV and select “Turn On Redirected Access”:

                  Note:

                  • If you attempt to run Defrag on a CSV without first putting it in redirected mode, it will fail with the following error:

                  CSVFS failed operation as volume is not in redirected mode. (0x8007174F)

                   

                  3.       Run defrag on your CSV by running the following on an elevated command prompt:

                  Defrag.exe <CSV Mount Point>

                   

4.       Once defrag has completed, revert the CSV back into direct mode by using either of the following methods:

                  a.       Using Windows PowerShell© open a new elevated Windows PowerShell console and run the following:

Resume-ClusterResource <Cluster Disk Name>

                   

                  b.      Using the Failover Cluster Manager right-click on the CSV and select “Turn Off Redirected Access”:
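Putting steps 2 through 4 together, the whole operation can be scripted end to end. A sketch, assuming the CSV is backed by a Physical Disk Resource named "Cluster Disk 1" and mounted at C:\ClusterStorage\Volume1 (substitute your own names):

# Put the CSV into redirected mode (required before defrag will run)
Suspend-ClusterResource "Cluster Disk 1" -RedirectedAccess -Force

# Defragment the volume
Defrag.exe C:\ClusterStorage\Volume1

# Revert the CSV back to direct mode
Resume-ClusterResource "Cluster Disk 1"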

                   

                  How is ChkDsk run on your CSV:

During the lifecycle of your file system, corruptions may occur which require resolution through ChkDsk. As you are aware, CSVs in Windows Server 2012 R2 also support the ReFS file system. However, the ReFS file system achieves self-healing through integrity checks on metadata. As a consequence, ChkDsk does not need to be run for CSV volumes with the ReFS file system. Thus, this discussion is scoped to corruptions in CSVs with the NTFS file system. Also note the redesigned ChkDsk operation introduced with Windows Server 2012, which separates the ChkDsk scan for errors (online operation) and the ChkDsk fix (offline operation). This results in higher availability for your Private Cloud storage, since you only need to take your storage offline to fix corruptions (which is a significantly faster process than the scan for corruptions). In Windows Server 2012, we integrated ChkDsk /SpotFix into the cluster IsAlive health check for the Physical Disk Resource corresponding to the CSV. As a consequence, we will now attempt to fix corruptions in your CSV without any perceptible downtime for your application.

                  Detection of Corruptions – ChkDsk /Scan:

                  The following is the workflow on Windows Server 2012 R2 systems to scan for NTFS corruptions:

                   

                  Note:

                  • If the system is never idle it is possible that the ChkDsk scan will never be run. In this case the administrator will need to invoke this operation manually. To invoke this operation manually, on an elevated command prompt run the following:

                  chkdsk.exe <CSV mount point name> /scan

                  Resolution of CSV corruptions during Physical Disk Resource IsAlive Checks:

                  The following is the CSV workflow in Windows Server 2012 R2 to fix corruptions:

                   

                  Note:

• In the rare event that a single CSV corruption takes greater than 15 seconds to fix, the above workflow will not resolve the error.  In this case the administrator will need to manually fix this error.  A CSV does not need to be placed in maintenance or redirected mode before invoking ChkDsk. The CSV will re-establish its state automatically once the ChkDsk run has completed. To invoke this operation manually, on an elevated command prompt run the following:

                                chkdsk.exe <CSV mount point name> /SpotFix

                  Running Defrag or ChkDsk through Repair-ClusterSharedVolume cmdlet:

Running Defrag or ChkDsk on your CSV through the Repair-ClusterSharedVolume cmdlet is deprecated. It is instead highly encouraged to directly use either Defrag.exe or ChkDsk.exe on your CSV, using the procedure indicated in the preceding sections. The use of the Repair-ClusterSharedVolume cmdlet, however, is still supported by Microsoft. To use this cmdlet to run ChkDsk or Defrag, run the following on a new elevated Windows PowerShell console:

                  Repair-ClusterSharedVolume <Cluster Disk Name> -ChkDsk –Parameters <ChkDsk parameters>

                  Repair-ClusterSharedVolume <Cluster Disk Name> –Defrag –Parameters <Defrag parameters> 

                  You can determine the Cluster Disk Name corresponding to your CSV using the Get-ClusterSharedVolume cmdlet by running the following:

                  Get-ClusterSharedVolume | fl *

                   

                  Thanks!

                  Subhasish Bhattacharya
                  Program Manager
                  Clustering and High Availability
                  Microsoft

                  Event ID 5120 in System Event Log


                  When conducting backups of a Windows Server 2012 or later Failover Cluster using Cluster Shared Volumes (CSV), you may encounter the following event in the System event log:

                  Log Name:  System
                  Source:  Microsoft-Windows-FailoverClustering
                  Event ID:  5120
                  Task Category: Cluster Shared Volume
                  Level:  Error
                  Description:  Cluster Shared Volume 'VolumeName' ('ResourceName') is no longer available on this node because of 'STATUS_CLUSTER_CSV_AUTO_PAUSE_ERROR(c0130021)'. All I/O will temporarily be queued until a path to the volume is reestablished.

Having an Event ID 5120 logged may or may not be a sign of a problem with the cluster, depending on the error code logged.  An Event 5120 with an error code of STATUS_CLUSTER_CSV_AUTO_PAUSE_ERROR or the error code c0130021 may be expected and can be safely ignored in most situations.

An Event ID 5120 with an error code of STATUS_CLUSTER_CSV_AUTO_PAUSE_ERROR is logged on the node that owns the cluster Physical Disk resource when a VSS software snapshot that clustering knew of was deleted.  When a snapshot that Failover Clustering had knowledge of is deleted, clustering must resynchronize its view of the snapshots.

One scenario where an Event ID 5120 with an error code of STATUS_CLUSTER_CSV_AUTO_PAUSE_ERROR may be logged is when using System Center Data Protection Manager (DPM), which may delete a software snapshot once a backup has completed.  When DPM requests deletion of a software snapshot, volsnap will mark the software snapshot for deletion.  However, volsnap conducts the deletion in an asynchronous fashion, at a later point in time.  Even though the snapshot has been marked for deletion, clustering will detect that the software snapshot still exists and needs to handle it appropriately.  Eventually volsnap will perform the actual deletion of the software snapshot.  When clustering then notices that a software snapshot it knew of was deleted, it must resynchronize its view of the snapshots.

                  Think of it as clustering getting surprised by an un-notified software snapshot deletion, and the cluster service telling the various internal components of the cluster service that they need to resynchronize their views of the snapshots.

There are also a few other expected scenarios where volsnap will delete snapshots, and as a result clustering will need to resynchronize its snapshot view, such as when a copy-on-write fails due to lack of space or an IO error.  In these conditions volsnap will log an event in the system event log associated with those failures.  So review the system event logs for other events accompanying the Event 5120; these could be logged on any node in the cluster.
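To review which 5120 events have been logged on a node and what error code each carries, you can filter the System log. A sketch using the in-box Get-WinEvent cmdlet:

# List all Event ID 5120 entries from Failover Clustering with their messages
Get-WinEvent -FilterHashtable @{
    LogName      = 'System'
    ProviderName = 'Microsoft-Windows-FailoverClustering'
    Id           = 5120
} | Format-List TimeCreated, Message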

                   

                  Troubleshooting:

1. If you see a few scattered Event 5120s with an error code of STATUS_CLUSTER_CSV_AUTO_PAUSE_ERROR (c0130021), they can be safely ignored.  We recognize this is not optimal, as they create false positives and trigger alerts in management software.  We are investigating breaking out cluster state resynchronization into a separate non-error event in the future.

2. If you are seeing many Event 5120s being logged, clustering is constantly having to resynchronize its snapshot state.  This could be a sign of a problem and may require engaging Microsoft Support for investigation.

3. If you are seeing Event 5120s logged with error codes other than STATUS_CLUSTER_CSV_AUTO_PAUSE_ERROR, it is a sign of a problem.  Review the error code in the description of every 5120 logged to be certain; do not dismiss the events because a single one carries STATUS_CLUSTER_CSV_AUTO_PAUSE_ERROR.  If you see other error codes logged, there are fixes available that need to be applied.  Your first troubleshooting step should be to apply the recommended hotfixes in the appropriate article for your OS version:

                    Recommended hotfixes and updates for Windows Server 2012-based failover clusters
                    http://support.microsoft.com/kb/2784261

  Recommended hotfixes and updates for Windows Server 2012 R2-based failover clusters
                    http://support.microsoft.com/kb/2920151

4. If an Event 5120 is accompanied by other errors, such as an Event 5142 as below, it is a sign of a failure and should not be ignored.

                  Log Name:  System
                  Source:  Microsoft-Windows-FailoverClustering
                  Event ID:  5142
                  Task Category: Cluster Shared Volume
                  Level:  Error
                  Description:  Cluster Shared Volume 'VolumeName' ('ResourceName') is no longer accessible from this cluster node because of error 'ERROR_TIMEOUT(1460)'. Please troubleshoot this node's connectivity to the storage device and network connectivity.

                  Thanks!
                  Elden Christensen
                  Principal Program Manager Lead
                  Clustering & High-Availability
                  Microsoft

                  Cluster Shared Volume Diagnostics


This is the second blog post in a series about Cluster Shared Volumes (CSV). In this post we will go over diagnostics. We assume that the reader is familiar with the previous blog post, which explains the CSV components and the different CSV IO modes: http://blogs.msdn.com/b/clustering/archive/2013/12/02/10473247.aspx

                  Is Direct IO on this Volume Possible?

Let’s assume you have created a cluster, added a disk to Cluster Shared Volumes, the disk is online, and the path to the volume (say, C:\ClusterStorage\Volume1) is accessible.  The very first question you might have is whether Direct IO is even possible on this volume. In Windows Server 2012 R2 there is a PowerShell cmdlet that attempts to answer exactly that question:

Get-ClusterSharedVolumeState [[-Name] <StringCollection>] [-Node <StringCollection>] [-InputObject <psobject>] [-Cluster <string>] [<CommonParameters>]

                   

If you run this cmdlet, providing the name of the cluster Physical Disk resource, then for each cluster node it will tell you whether the volume is in File System Redirected mode or Block Level Redirected mode on that node, and it will tell you the reason.

Here is what the output looks like when Direct IO is possible:

                  PS C:\Windows\system32> get-ClusterSharedVolumeState -Name "Cluster Disk 1"
                  Name                         : Cluster Disk 1
                  VolumeName                   : \\?\Volume{1c67fa80-1171-4a9e-9f41-0bb132e88ee4}\
                  Node                         : clus01
                  StateInfo                    : Direct
                  VolumeFriendlyName           : Volume1
                  FileSystemRedirectedIOReason : NotFileSystemRedirected
                  BlockRedirectedIOReason      : NotBlockRedirected

                  Name                         : Cluster Disk 1
                  VolumeName                   : \\?\Volume{1c67fa80-1171-4a9e-9f41-0bb132e88ee4}\
                  Node                         : clus02
                  StateInfo                    : Direct
                  VolumeFriendlyName           : Volume1
                  FileSystemRedirectedIOReason : NotFileSystemRedirected
                  BlockRedirectedIOReason      : NotBlockRedirected

                   

                  In the output above you can see that Direct IO on this volume is possible on both cluster nodes.

                  If we put this disk in File System Redirected mode using

                  PS C:\Windows\system32> Suspend-ClusterResource -Name "Cluster Disk 1" -RedirectedAccess -Force

                  Name                                    State                                   Node
                  ----                                    -----                                   ----
                  Cluster Disk 1                          Online(Redirected)                      clus01

Then the output of Get-ClusterSharedVolumeState changes to:

                  PS C:\Windows\system32> get-ClusterSharedVolumeState -Name "Cluster Disk 1"

                  Name                         : Cluster Disk 1
                  VolumeName                   : \\?\Volume{1c67fa80-1171-4a9e-9f41-0bb132e88ee4}\
                  Node                         : clus01
                  StateInfo                    : FileSystemRedirected
                  VolumeFriendlyName           : Volume1
                  FileSystemRedirectedIOReason : UserRequest
                  BlockRedirectedIOReason      : NotBlockRedirected

                  Name                         : Cluster Disk 1
                  VolumeName                   : \\?\Volume{1c67fa80-1171-4a9e-9f41-0bb132e88ee4}\
                  Node                         : clus02
                  StateInfo                    : FileSystemRedirected
                  VolumeFriendlyName           : Volume1
                  FileSystemRedirectedIOReason : UserRequest
                  BlockRedirectedIOReason      : NotBlockRedirected

                   

You can turn off File System Redirected mode using the following cmdlet:

                  PS C:\Windows\system32> resume-ClusterResource -Name "Cluster Disk 1"

                  Name                                    State                                   Node
                  ----                                    -----                                   ----
                  Cluster Disk 1                          Online                                  clus01

                   

The state of a CSV volume does not have to be the same on all nodes. For instance, if a disk is not connected to all the nodes, then you might see the volume in Direct mode on the nodes where the disk is connected and in Block Redirected mode on the nodes where it is not.

A CSV volume might be in Block Level Redirected mode for one of the following reasons:

• NoDiskConnectivity – The disk is not visible on, or connected to, this node. You need to validate your SAN settings.
• StorageSpaceNotAttached – The Space is not attached on this node. Many Storage Spaces on-disk formats are not trivial and cannot be accessed for read/write by multiple nodes at the same time. The cluster enforces that a Space is accessible by only one cluster node at a time: the Space is attached only on the node where the corresponding Physical Disk resource is online, and detached on all other nodes. The only type of Space that can be attached on multiple nodes is a Simple Space without write-back cache.

When you are using a Mirrored or Parity Space, you will most often see the volume in Direct IO mode on the coordinator node and in Block Redirected mode on all other nodes, with StorageSpaceNotAttached as the reason. Please note that if a Space uses write-back cache then it will always be in Block Redirected mode, even if it is a Simple Space.
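If you want to check which of your Spaces can be attached on multiple nodes, here is a minimal sketch using the Storage module; interpret the output against the rule above (Simple resiliency and no write-back cache), as this is not a clustering-specific report:

# Resiliency setting and write-back cache size determine whether a Space
# can be attached on more than one node at a time
Get-VirtualDisk | Select-Object FriendlyName, ResiliencySettingName, WriteCacheSize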

                   

A CSV volume might be in File System Redirected mode for one of the following reasons:

• UserRequest – The user put the volume in redirected state. This can be done using the Failover Cluster Manager snap-in or the PowerShell cmdlet Suspend-ClusterResource.
• IncompatibleFileSystemFilter – An incompatible file system filter is attached to the NTFS/REFS file system. Use “fltmc instances”, the System event log, and the cluster log to learn more. Usually this means you have installed a storage solution that uses a file system filter; in the previous blog post you can find samples of fltmc output. To resolve it, you can either disable or uninstall the filter. The presence of a legacy file system filter will always disable Direct IO. If the solution uses a file system minifilter driver, then filters present at the following altitudes will cause CSV to stay in File System Redirected mode:
                    • 300000 – 309999 Replication
                    • 280000 – 289999 Continuous Backup
                    • 180000 – 189999 HSM
                    • 160000 – 169999 Compression
                    • 140000 – 149999 Encryption

The reason is that some of these filters might do something that is not compatible with Direct IO or Block Level Redirected IO. For instance, a replication filter might assume it will observe all IO so that it can replicate data to the remote site. A compression or encryption filter might need to modify data before it goes to or from the disk. If we performed Direct IO or Block Redirected IO we would bypass these filters attached to NTFS and consequently might corrupt data. Our choice is to be safe by default, so we put the volume in File System Redirected mode if we notice a filter attached to the volume at one of the altitudes above. You can explicitly inform the cluster that a filter is compatible with Direct IO by adding the minifilter name to the cluster common property SharedVolumeCompatibleFilters. Conversely, if you have a filter that is not at one of the incompatible altitudes, but you know it is not compatible with Direct IO, you can add that minifilter to the cluster property SharedVolumeIncompatibleFilters (see the sketch after this list).
• IncompatibleVolumeFilter – An incompatible volume filter is attached below NTFS/REFS. Use the System event log and cluster log to learn more. The reasons and solution are similar to what we discussed above.
• FileSystemTiering – The volume is in File System Redirected mode because it is a Tiered Space with heatmap tracking enabled. The tiering heatmap assumes that it can see every IO; information about IO operations is produced by REFS/NTFS. If we performed Direct IO, the statistics would be incorrect and the tiering engine could make incorrect placement decisions, moving hot data to a cold tier or vice versa. You can control whether the per-volume heatmap is enabled or disabled using:

                    fsutil.exe tiering setflags/clearflags with flag /TrNH

  If you choose to disable the heatmap, you can control which files go to which tier by pinning them to a tier using the PowerShell cmdlet Set-FileStorageTier and then running Optimize-Volume with -TierOptimize. Please note that for Optimize-Volume to work on a CSV volume you need to put the volume in File System Redirected mode using Suspend-ClusterResource. You can learn more about Storage Spaces tiering from this blog post: http://blogs.technet.com/b/josebda/archive/2013/08/28/step-by-step-for-storage-spaces-tiering-in-windows-server-2012-r2.aspx
• BitLockerInitializing – The volume is in redirected state because we are waiting for BitLocker to finish the initial encryption of this volume.
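Here is a minimal sketch of how you might inspect the attached minifilters and declare one compatible with Direct IO. The filter name MyReplicationFilter is a hypothetical placeholder; use the name that fltmc reports on your system:

# List file system minifilters and their altitudes on this node
fltmc instances

# Hypothetical example: mark a minifilter as Direct IO compatible by adding
# its name to the cluster common property (assumed assignment syntax)
(Get-Cluster).SharedVolumeCompatibleFilters = @('MyReplicationFilter')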

If Get-ClusterSharedVolumeState says that a volume on a node is in Direct IO state, does that mean absolutely all IO will go the Direct IO way? The answer: it is not so simple.

Here is another blog post that covers the Get-ClusterSharedVolumeState PowerShell cmdlet: http://blogs.msdn.com/b/clustering/archive/2013/12/05/10474312.aspx

                  Is Direct IO on this File Possible?

Even when a CSV volume is in Direct IO or Block Level Redirected mode, a number of preconditions have to be true before Direct IO is possible on a given file:

• CSVFS understands the on-disk file format
  • For example, the file is not sparse, compressed, encrypted, or resilient
• There are no file system filters that might modify the file layout or expect to see all IO
  • For example, file system minifilters that provide compression, encryption, or replication
• There are no file system filters that object to Direct IO on the stream. An example is the Windows Server Deduplication feature: when you install deduplication and enable it on a CSV volume, it will NOT disable Direct IO on all files. Instead it vetoes Direct IO only for the files that have been optimized by dedup.
• CSVFS was able to make sure NTFS/REFS will not change the location of the file data on the volume – the file is pinned. If NTFS relocated the file’s blocks while CSVFS was doing Direct IO, that could result in volume corruption.
• There are no applications that need all IO to be observed by the NTFS/REFS stack. There is an FSCTL an application can send to the file system to keep the file in File System Redirected mode for as long as the application has the file opened; the file switches back out of redirected mode as soon as the application closes it.
• CSVFS has the appropriate oplock level. Oplocks guarantee cross-node cache coherency and are documented on MSDN: http://msdn.microsoft.com/en-us/library/windows/hardware/ff551011(v=vs.85).aspx
  • Read-Write-Handle (RWH) or Read-Write (RW) for writes. If CSVFS was able to obtain this level of oplock, the file is opened only from this node.
  • Read-Write-Handle (RWH), Read-Handle (RH), Read-Write (RW), or Read (R) for reads. If CSVFS was able to obtain only an RH or R oplock, the file is opened from multiple nodes, but all nodes perform only reads or other operations that do not modify the file content.
• CSVFS was able to purge the NTFS/REFS cache, to make sure there is no stale cache on NTFS/REFS.

If any of these preconditions is not true, the IO is dispatched using File System Redirected mode. If all preconditions are true, CSVFS translates the IO from file offsets to volume offsets and sends it to the CSV Volume Manager. Keep in mind that the CSV Volume Manager might send it using Direct IO to the disk when the disk is connected, or it might send it over SMB to the disk on the coordinator node using Block Level Redirected IO. The CSV Volume Manager always prefers Direct IO; Block Level Redirected IO is used only when the disk is not connected or when the disk fails IO.
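As one concrete check against these preconditions, here is a sketch that looks for dedup-optimized files under a CSV path. It relies on the fact that optimized files are reparse points, so treat it as a heuristic rather than an authoritative Direct IO report:

# Dedup-optimized files are reparse points; IO to them is dispatched in File
# System Redirected mode even when the volume itself reports Direct
Get-ChildItem -Recurse C:\ClusterStorage\Volume1 |
    Where-Object { $_.Attributes -band [System.IO.FileAttributes]::ReparsePoint }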

                  Summary

To provide high availability and good performance, CSVFS has several alternative ways in which IO can be dispatched to the disk. This post demonstrated some of the tools you can use to analyze why a CSV volume chooses one IO path over another.

                  Thanks!
                  Vladimir Petter
                  Principal Software Development Engineer
                  Clustering & High-Availability
                  Microsoft

                   

                  To learn more, here are others in the Cluster Shared Volume (CSV) blog series:

                  Cluster Shared Volume (CSV) Inside Out
                  http://blogs.msdn.com/b/clustering/archive/2013/12/02/10473247.aspx
                   
                  Cluster Shared Volume Diagnostics
                  http://blogs.msdn.com/b/clustering/archive/2014/03/13/10507826.aspx

                  Cluster Shared Volume Performance Counters
                  http://blogs.msdn.com/b/clustering/archive/2014/06/05/10531462.aspx

                  Cluster Shared Volume Failure Handling
                  http://blogs.msdn.com/b/clustering/archive/2014/10/27/10567706.aspx

                  Failover Clustering and IPv6 in Windows Server 2012 R2


                  In this blog, I will discuss some common questions pertaining to IPv6 and Windows Server 2012 R2 Failover Clusters.

                  What network protocol does Failover Clustering default to?

If both IPv4 and IPv6 are enabled (the default configuration), IPv6 will always be used by clustering. The key takeaway is that it is not required to configure IPv4 when the IPv6 stack is enabled, and you can go as far as unbinding IPv4. Additionally, you can use link-local (fe80::) IPv6 addresses for your internal cluster traffic, so IPv6 can be used for clustering even if you don’t use IPv6 for your public-facing interfaces. Note that you can only have one cluster network using IPv6 link-local (fe80::) addresses in your cluster. Every network that has IPv6 also has an IPv6 link-local address, which is ignored if any IPv4 address or other IPv6 prefix is present.
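You can see which of your cluster networks carry IPv6, and with what prefixes, by inspecting the cluster network objects; the properties below are standard on Windows Server 2012 R2:

# Show each cluster network with its role and IPv4/IPv6 addressing
Get-ClusterNetwork | Format-List Name, Role, Address, Ipv6Addresses, Ipv6PrefixLengths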

                  Should IPv6 be disabled for Failover Clustering?

The recommendation for Failover Clustering, and Windows in general, starting with Windows Server 2008 RTM, is not to disable IPv6 on your Failover Clusters. The majority of internal testing for Failover Clustering is done with IPv6 enabled, so leaving IPv6 enabled is the safest configuration for your production deployment.

                  Will Failover Clustering cease to work if IPv6 is disabled?

A common misconception is that Failover Clustering will cease to work if IPv6 is disabled. This is incorrect: the Failover Clustering release criteria include functional validation in an IPv4-only environment.

                  How does Failover Clustering handle IPv6 being disabled?

                  There are two levels at which IPv6 can be disabled:

1)      At the adapter level: This is done by unbinding the IPv6 stack: launch ncpa.cpl, open the adapter’s properties, and uncheck “Internet Protocol Version 6 (TCP/IPv6)”. 

                  Failover Clustering behavior: NetFT, the virtual cluster adapter, will still tunnel traffic using IPv6 over IPv4.

2)      At the registry level: This can be done using the following steps:

1. Launch regedit.exe
2. Navigate to the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\TCPIP6\Parameters key.
3. Right-click Parameters in the left sidebar, choose New > DWORD (32-bit) Value, and create an entry named DisabledComponents with the value 0xFF.
4. Restart the computer to disable IPv6.

Failover Clustering behavior: This is the only scenario where NetFT traffic is sent entirely over IPv4. Note that this is not recommended and is not the mainstream tested code path. 
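For reference, here is a sketch of the same registry change made from an elevated PowerShell prompt; as noted above, this configuration is not recommended:

# Disable all IPv6 components (0xFF); a restart is required to take effect
New-ItemProperty -Path 'HKLM:\SYSTEM\CurrentControlSet\Services\TCPIP6\Parameters' `
    -Name 'DisabledComponents' -PropertyType DWord -Value 0xFF -Force
Restart-Computer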

                  Any gotchas with using Symantec Endpoint Protection and Failover Clustering?

A default Symantec Endpoint Protection (SEP) firewall policy has rules that block IPv6 communication and IPv6-over-IPv4 communication, which conflicts with Failover Clustering communication over IPv6 or IPv6 over IPv4. Currently the Symantec Endpoint Protection firewall doesn't support IPv6, as indicated in Symantec's own guidance. By default, the SEP Manager firewall policy includes these blocking rules.

It is therefore recommended that if SEP is used on a Failover Cluster, these rules blocking IPv6 and IPv6-over-IPv4 traffic be disabled. Also refer to the Symantec article "About Windows and Symantec firewalls".

                  Do Failover Clusters support static IPv6 addresses?

Failover Cluster Manager, and clustering in general, is streamlined for the most common case, in which customers do not use static IPv6 addresses. Networks are configured automatically: the cluster will automatically generate IPv6 addresses for the IPv6 Address resources on your networks. If you prefer to select your own statically assigned IPv6 addresses, you can reconfigure the IPv6 Address resources using PowerShell as follows (a static address cannot be specified when the cluster is created):

                  Open a Windows PowerShell® console as an Administrator and do the following:

                  1)  Create a new IPv6 Cluster IP Resource

                  Add-ClusterResource -Name "IPv6 Cluster Address" -ResourceType "IPv6 Address" -Group "Cluster Group"

                  2)  Set the properties for the newly created IP Address resource

Get-ClusterResource "IPv6 Cluster Address" | Set-ClusterParameter -Multiple @{"Network"="Cluster Network 1"; "Address"="2001:4898:28:4::"; "PrefixLength"=64}

                  3)  Stop the netname which corresponds to this static IPv6 address

                  Stop-ClusterResource "Cluster Name"

                  4)  Create a dependency between the netname and the static IPv6 address

                  Set-ClusterResourceDependency "Cluster Name" "[Ipv6 Cluster Address]"

You might consider creating an OR dependency between the netname and both the static IPv6 and IPv4 addresses, as follows:

                  Set-ClusterResourceDependency "Cluster Name" "[Ipv6 Cluster Address] or [Ipv4 Cluster Address]"

                  5)  Restart the netname

                  Start-ClusterResource "Cluster Name"
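Optionally, you can confirm that the dependency expression took effect:

# Display the current dependency expression for the netname
Get-ClusterResourceDependency -Resource "Cluster Name"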

                   

For name resolution, if you prefer not to use dynamic DNS, you can configure DNS mappings for the addresses automatically generated by the cluster, or for your static addresses. Also note that Cluster IPv6 Address resources do not support DHCPv6.

                   

                  Thanks!

                  Subhasish Bhattacharya                                                                                                               

                  Program Manager                                                                                          

                  Clustering & High Availability                                                                                      

                  Microsoft           
