How to change the Failure Condition Level value in SQL AG / Availability groups

To modify the FailureConditionlevel value, use the Failover Cluster Manager and follow these steps:

  1. In the roles tab, find the target AG role. Select the target AG role.
  2. Right-click the AG resource at the bottom of the window and select Properties.Failover cluster manager
  3. In the popup window, navigate to the properties tab and there will be a list of values specific to this AG. Click the FailueConditionLevel to change it.

The failure condition level of the AG changes the failure conditions for the health check. For any failure level, if the AG element is reported unhealthy by sp_server_diagnostics then the health check will fail. Each level inherits all the failure conditions from the levels below it.Expand table

LevelCondition under which the instance is considered dead
1: OnServerDownHealth check takes no action if any resources fail besides the AG. If AG data is not received within 5 intervals, or 5/3 * HealthCheckTimeout
2: OnServerUnresponsiveIf no data is received from sp_server_diagnostics for the HealthCheckTimeout
3: OnCriticalServerError(Default) If the system component reports an error
4: OnModerateServerErrorIf the resource component reports an error
5: OnAnyQualifiedFailureConitionsIf the query processing component reports an error

Failure-Condition Level

Whether the diagnostic data and health information returned by sp_server_diagnostics warrants an automatic failover depends on the failure-condition level of the availability group. The failure-condition level specifies what failure conditions trigger an automatic failover. There are five failure-condition levels, which range from the least restrictive (level one) to the most restrictive (level five). A given level encompasses the less restrictive levels. Thus, the strictest level, five, includes the four less restrictive conditions, and so forth.

 Important

Damaged databases and suspect databases are not detected by any failure-condition level. Therefore, a database that is damaged or suspect (whether due to a hardware failure, data corruption, or other issue) never triggers an automatic failover.

The following table describes the failure-condition that corresponds to each level.Expand table

LevelFailure-conditionTransact-SQL ValuePowerShell Value
OneOn server down. Specifies that an automatic failover is initiated when one the following occurs:

The SQL Server service is down.

The lease of the availability group for connecting to the WSFC cluster expires because no ACK is received from the server instance. For more information, see How It Works: SQL Server Always On Lease Timeout.



This is the least restrictive level.
1OnServerDown
TwoOn server unresponsive. Specifies that an automatic failover is initiated when one of the following occurs:

The instance of SQL Server does not connect to cluster, and the user-specified health check timeout threshold of the availability group is exceeded.

The availability replica is in failed state.
2OnServerUnresponsive
ThreeOn critical server error. Specifies that an automatic failover is initiated on critical SQL Server internal errors, such as orphaned spinlocks, serious write-access violations, or too many memory dumps generated in a short period of time.

This is the default level.
3OnCriticalServerError
FourOn moderate server error. Specifies that an automatic failover is initiated on moderate SQL Server internal errors, such as a persistent out-of-memory condition in the SQL Server internal resource pool.4OnModerateServerError
FiveOn any qualified failure conditions. Specifies that an automatic failover is initiated on any qualified failure conditions, including:

Detection of Scheduler deadlock.

Detection of an unsolvable deadlock.



This is the most restrictive level.
5OnAnyQualifiedFailureConditions

References:

https://learn.microsoft.com/en-us/sql/database-engine/availability-groups/windows/configure-flexible-automatic-failover-policy?view=sql-server-ver16

https://learn.microsoft.com/en-us/sql/database-engine/availability-groups/windows/availability-group-lease-healthcheck-timeout?view=sql-server-ver16

Lease Timeouts and Health Checks in SQL Server Always On Availability Groups (sqlshack.com)

How to stripe SQL Server database backups Using Ola Hallengren’s DatabaseBackup procedure

with the @NumberOfFiles option you can stripe the SQL server Database backups upto 64.

EXECUTE dbo.DatabaseBackup

@Databases = ‘Adventureworks’,

@Directory = ‘E:\Backup\DBBackups\’,

@BackupType = ‘FULL‘,

@Compress = ‘Y‘,

@NumberOfFiles = 8

Find which user database is using more SQL memory

DECLARE @total_buffer INT;

SELECT @total_buffer = cntr_value
FROM sys.dm_os_performance_counters
WHERE RTRIM([object_name]) LIKE ‘%Buffer Manager’
AND counter_name = ‘Database Pages’;

;WITH src AS
(
SELECT
database_id, db_buffer_pages = COUNT_BIG(*)
FROM sys.dm_os_buffer_descriptors
–WHERE database_id BETWEEN 5 AND 32766
GROUP BY database_id
)
SELECT
[db_name] = CASE [database_id] WHEN 32767
THEN ‘Resource DB’
ELSE DB_NAME([database_id]) END,
db_buffer_pages,
db_buffer_MB = db_buffer_pages / 128,
db_buffer_percent = CONVERT(DECIMAL(6,3),
db_buffer_pages * 100.0 / @total_buffer)
FROM src
ORDER BY db_buffer_MB DESC;

Get the instant file initialization status using Powershell

TSQL:

SELECT @@SERVERNAME AS [Server Name] ,
— RIGHT(@@version, LEN(@@version) – 3 – CHARINDEX(‘ ON ‘, @@VERSION)) AS [OS Info] ,
— LEFT(@@VERSION, CHARINDEX(‘-‘, @@VERSION) – 2) + ‘ ‘
— + CAST(SERVERPROPERTY(‘ProductVersion’) AS NVARCHAR(300)) AS [SQL Server Version] ,
service_account ,
instant_file_initialization_enabled
FROM sys.dm_server_services
WHERE servicename LIKE ‘SQL Server (%’

  1. save above Tsql as ifiq.sql
  2. save the server names in a text file as serverlist.txt
  3. make sure all paths are updated and run below script in powershell

$servers = Get-Content “C:\Users\Desktop\serverlist.txt”

$output =foreach ($ser in $servers)
{
Invoke-Sqlcmd -inputfile “C:\Users\Desktop\ifiq.sql” -serverinstance $ser

}

$output | format-table |out-file -FilePath “C:\Users\Desktop\ifiq.txt” -Append

Default Trace in SQL Server

The default trace is a system-defined trace which resides in SQL Server directory ,which consists of 5 files with the size of 20 MB each one.
Below one is the script to find out the path of default trace where it is running or saved.
SELECT * FROM fn_trace_getinfo(default);

traceid property value
1 1 2 Configured trace options
1 2 C:\Program Files\Microsoft SQL Server\MSSQL10_50.SQLEXPRESS\MSSQL\Log\log_16.trc Trace file name
1 3 20 Max file size for the *.trc file
1 4 NULL Stop time for the trace session
1 5 1 Current trace status (1 = On and 0 = Off)

After get the default trace path info, use the below query to fetch the trace information.
SELECT * FROM ::fn_trace_gettable(‘C:\Program Files\Microsoft SQL Server\MSSQL10_50.SQLEXPRESS\MSSQL\Log\log_16.trc’, default)

SQL Server Perfmon Counters

Listed below are few important Perfmon Counters and threshold values for CPU, Memory,Disk trouble shooting.

Metric Object Counter Threshold value
CPU Process (sqlservr) %Processor Time < 80%
System Processor Queue Length < 4 per CPU
Memory Memory Available Mbytes > 100MB
SQL Server: Buffer Manager Lazy Writes/Sec < 20
SQL Server: Buffer Manager Page Life Expectancy (PLE) > 300
SQL Server: Buffer Manager Page Lookups/sec <100
SQL Server: Buffer Manager Page Reads/sec <90
SQL Server: Buffer Manager Readahead/sec < 20% of page Reads/sec
SQL Server: Buffer Manager Buffer Cache Hit Ratio >99 %
Disk Physical Disk Avg. Disk Sec/Read < 8ms
PhysicalDisk Avg. Disk sec/Write < 8ms
Index SQLServer:Access Methods Full Scans / sec 1 Full Scan/sec per 1000 Index Searches/sec
SQLServer:SQL Statistics SQL Compilations/sec < 10% of the number of the Batch Requests/sec
SQLServer:SQL Statistics SQL Re-Compilations/sec < 10% of the number of SQL compilations/sec