
Crafting a Robust Custom IPMI Fan Control Script: Overcoming Challenges and Mitigating Unexpected Pulsing


Managing server hardware efficiently is crucial for maintaining optimal performance and longevity. This article explores the process of writing a custom IPMI fan control script, the challenges encountered, and strategies to address unexpected fan pulsing.

Understanding the Need for Custom Fan Control

Servers operate under varying loads, and their cooling systems must adapt dynamically to maintain safe operating temperatures. Default fan settings are often generic and may not account for specific workload patterns or environmental conditions. Custom fan control scripts allow administrators to optimize cooling efficiency, reduce noise levels, enhance energy savings, and extend hardware lifespan.

The Challenge: Unexpected Fan Pulsing

One of the most perplexing issues encountered was unexpected fan pulsing: rapid fluctuations in fan speed that cause acoustic disturbance and can shorten hardware lifespan. The behavior typically stems from sensor noise, insufficient hysteresis, and aggressive scaling factors. Each of these lets a change of a degree or two in the reading turn into a new fan speed command on every polling cycle.

Core Script Implementation

The foundation of our fan control solution is a bash script that continuously monitors CPU temperatures and adjusts fan speeds accordingly. Here's the core implementation:

#!/bin/bash

# Function to retrieve fan zones using IPMI
get_fan_zones() {
    local output
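    if ! output=$(ipmitool sdr 2>/dev/null); then
        echo "Error: Failed to retrieve IPMI sensor data." >&2
        # This function runs inside $(...), so return rather than exit;
        # the caller's empty-list check handles the failure
        return 1
    fi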
    
    local zones=()
    while IFS= read -r line; do
        if [[ $line =~ ^Fan\ ([0-9][A-Z])\ Tach ]]; then
            zones+=("${BASH_REMATCH[1]}")
        fi
    done <<< "$output"
    
    echo "${zones[@]}"
}

# Initialize fan zones
fan_zones=($(get_fan_zones))
if [[ ${#fan_zones[@]} -eq 0 ]]; then
    echo "Error: No fan zones found." >&2
    exit 1
fi

# Define temperature thresholds
MIN_TEMP=30  # Minimum temperature (°C)
MAX_TEMP=80  # Maximum temperature (°C)

# Scaling factor to moderate fan speed adjustments
SCALING_FACTOR=0.4  # 40% of calculated speed to prevent aggressive changes

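# Hysteresis to prevent rapid fan speed oscillations
HYSTERESIS=5  # Fan speed units on the 0-255 scale (the hysteresis check compares speeds, not temperatures)

# Update frequency
SLEEP_INTERVAL=15  # Seconds between updates

# Minimum allowable fan speed to avoid system warnings
MIN_FAN_SPEED=20  # On the same 0-255 scale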
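
Sensor naming differs between boards, so it can help to check the zone-matching pattern against captured `ipmitool sdr` output before relying on it. The sketch below is a variant of get_fan_zones that reads sensor lines from standard input, which lets it run without a BMC. The function name parse_fan_zones and the sample sensor lines are illustrative assumptions, not output from a specific server.

```shell
#!/bin/bash
# Hypothetical helper: extract zone IDs such as "1A" from sensor lines on stdin.
parse_fan_zones() {
    local line zones=()
    while IFS= read -r line; do
        # Same pattern as get_fan_zones: "Fan <digit><letter> Tach" at line start
        if [[ $line =~ ^Fan\ ([0-9][A-Z])\ Tach ]]; then
            zones+=("${BASH_REMATCH[1]}")
        fi
    done
    echo "${zones[@]}"
}

# Illustrative input only; real sensor names depend on the board.
sample_sdr='Fan 1A Tach      | 4200 RPM          | ok
Fan 2A Tach      | 4100 RPM          | ok
CPU1 Temp        | 45 degrees C      | ok'

parse_fan_zones <<< "$sample_sdr"
```

With the sample input this prints `1A 2A`. If real output yields an empty line, the regular expression needs adjusting for that board's sensor names.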

Temperature Monitoring and Fan Speed Calculation

The script continuously monitors CPU temperatures and calculates appropriate fan speeds using a cubic scaling function for smooth transitions:

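# No previous fan speed exists before the first iteration
prev_fan_speed=-1

while true; do
    # Retrieve temperature sensor data
    if ! ipmi_output=$(ipmitool sdr type temperature 2>/dev/null); then
        echo "Warning: Failed to retrieve IPMI sensor data." >&2
        sleep "$SLEEP_INTERVAL"
        continue
    fi
    
    # Extract CPU temperatures from the fifth pipe-separated field
    cpu_temps=()
    while IFS= read -r line; do
        if [[ $line == *"CPU"* && $line == *"Temp"* ]]; then
            temp=$(echo "$line" | awk -F'|' '{print $5}' | sed 's/[^0-9.]//g')
            # Keep only the integer part; the (( ... )) comparisons below reject decimals
            temp=${temp%%.*}
            if [[ -n $temp ]]; then
                cpu_temps+=("$temp")
            fi
        fi
    done <<< "$ipmi_output"
    
    # Without a valid reading, skip this cycle instead of treating the temperature as 0
    if (( ${#cpu_temps[@]} == 0 )); then
        echo "Warning: No CPU temperature readings found." >&2
        sleep "$SLEEP_INTERVAL"
        continue
    fi
    
    # Determine the highest CPU temperature
    max_temp="${cpu_temps[0]}"
    for temp in "${cpu_temps[@]}"; do
        (( temp > max_temp )) && max_temp=$temp
    done
    
    # Calculate fan speed based on temperature
    if (( max_temp <= MIN_TEMP )); then
        fan_speed=$MIN_FAN_SPEED
    elif (( max_temp >= MAX_TEMP )); then
        fan_speed=255
    else
        # Cubic scaling for non-linear fan speed control
        temp_ratio=$(echo "scale=4; ($max_temp - $MIN_TEMP) / ($MAX_TEMP - $MIN_TEMP)" | bc)
        raw_speed=$(echo "scale=0; 255 * ($temp_ratio ^ 3)" | bc)
        scaled_speed=$(echo "$raw_speed * $SCALING_FACTOR" | bc)
        # bc prints values below 1 without a leading zero (".5100"),
        # so an empty integer part means 0
        fan_speed=${scaled_speed%.*}
        fan_speed=${fan_speed:-0}
        # Enforce the minimum so the low end of the curve never drops below MIN_FAN_SPEED
        (( fan_speed < MIN_FAN_SPEED )) && fan_speed=$MIN_FAN_SPEED
    fi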
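
The branch logic above can be isolated in a function so the curve can be checked at a few temperatures before it drives real fans. This is a minimal sketch, not the script's own code: it swaps the bc calls for bash integer arithmetic, expresses the scaling factor as a percentage (SCALING_PERCENT=40 standing in for SCALING_FACTOR=0.4), and uses a hypothetical function name, calc_fan_speed.

```shell
#!/bin/bash
MIN_TEMP=30
MAX_TEMP=80
MIN_FAN_SPEED=20
SCALING_PERCENT=40   # assumption: integer stand-in for SCALING_FACTOR=0.4

# Hypothetical helper: echo the fan speed (0-255) for an integer temperature.
calc_fan_speed() {
    local temp=$1 delta span speed
    if (( temp <= MIN_TEMP )); then
        echo "$MIN_FAN_SPEED"
        return
    elif (( temp >= MAX_TEMP )); then
        echo 255
        return
    fi
    delta=$(( temp - MIN_TEMP ))
    span=$(( MAX_TEMP - MIN_TEMP ))
    # 255 * (delta/span)^3 * SCALING_PERCENT/100, multiplying before dividing
    speed=$(( 255 * delta * delta * delta * SCALING_PERCENT / (span * span * span * 100) ))
    # Clamp the low end of the curve to the minimum speed
    (( speed < MIN_FAN_SPEED )) && speed=$MIN_FAN_SPEED
    echo "$speed"
}

for t in 30 55 70 79 80; do
    printf '%s C -> %s\n' "$t" "$(calc_fan_speed "$t")"
done
```

With these values the sketch yields 20, 20, 52, 96, and 255. Results can differ by a unit from the bc version because of where truncation happens. The step from 96 at 79 °C to 255 at MAX_TEMP is a consequence of scaling only the middle branch, so a reading that hovers around MAX_TEMP can cause a large swing; whether that jump is acceptable is a tuning decision.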

Implementing Hysteresis for Stability

To prevent rapid fan speed oscillations, we implement hysteresis—a technique that creates a buffer zone around the current fan speed:

    # Apply hysteresis to stabilize fan speed changes
    if (( prev_fan_speed != -1 )); then
        if (( fan_speed > prev_fan_speed && fan_speed - prev_fan_speed < HYSTERESIS )); then
            fan_speed=$prev_fan_speed
        elif (( fan_speed < prev_fan_speed && prev_fan_speed - fan_speed < HYSTERESIS )); then
            fan_speed=$prev_fan_speed
        fi
    fi
    
    # Convert fan speed to hexadecimal for IPMI commands
    fan_speed_hex=$(printf '%02x' "$fan_speed")
    
    # Send IPMI commands to set fan speed for each fan zone.
    # Zones are addressed by their 1-based position, formatted as a two-digit
    # hex byte so that more than nine zones still produce a valid argument.
    for i in "${!fan_zones[@]}"; do
        zone_hex=$(printf '%02x' "$((i + 1))")
        ipmitool raw 0x3a 0x07 "0x$zone_hex" "0x$fan_speed_hex" 0x01 &> /dev/null
    done
    
    # Apply the fan speed changes
    ipmitool raw 0x3a 0x06 &> /dev/null
    
    # Update previous fan speed and wait for next iteration
    prev_fan_speed=$fan_speed
    sleep "$SLEEP_INTERVAL"
done
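
To see what the hysteresis check does to a jittery series of targets, the comparison can be pulled into a small function and fed a test sequence. The following is a sketch under stated assumptions: apply_hysteresis is a hypothetical name, and the target values are made up for illustration.

```shell
#!/bin/bash
HYSTERESIS=5   # fan speed units, matching the script above

# Hypothetical helper: echo the speed to apply, given the newly computed
# target and the previously applied speed (-1 means no previous value).
apply_hysteresis() {
    local target=$1 prev=$2 diff
    if (( prev == -1 )); then
        echo "$target"
        return
    fi
    diff=$(( target - prev ))
    (( diff < 0 )) && diff=$(( -diff ))
    if (( diff < HYSTERESIS )); then
        echo "$prev"      # change too small: hold the current speed
    else
        echo "$target"    # change large enough: follow the target
    fi
}

# Illustrative targets: small jitter around 50, then a genuine rise
prev=-1
applied=()
for target in 50 52 49 53 51 60 58; do
    prev=$(apply_hysteresis "$target" "$prev")
    applied+=("$prev")
done
echo "${applied[*]}"
```

The sequence prints `50 50 50 50 50 60 60`: the jitter is absorbed and only the jump to 60 reaches the fans. Because the comparison is against the last applied speed rather than the last target, a slow drift still gets through once it accumulates to HYSTERESIS units.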

Advanced Improvements: Temperature Averaging

To further reduce fan pulsing caused by sensor noise, we can implement temperature averaging over a rolling window:

# Enhanced version with temperature averaging
AVERAGE_WINDOW=3
temperature_history=()

# Inside the main loop, after max_temp has been determined:
temperature_history+=("$max_temp")
if (( ${#temperature_history[@]} > AVERAGE_WINDOW )); then
    # Drop the oldest reading so the window holds at most AVERAGE_WINDOW samples
    temperature_history=("${temperature_history[@]:1}")
fi

# Compute the average temperature, rounded to the nearest integer because
# the (( ... )) comparisons in the main loop only accept integers
sum=0
for temp in "${temperature_history[@]}"; do
    sum=$(( sum + temp ))
done
count=${#temperature_history[@]}
average_temp=$(( (sum + count / 2) / count ))

# Use average_temp instead of max_temp for fan speed calculation
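
The effect of the rolling window is easiest to judge on a series that contains a single spike. The sketch below wraps the averaging step in a hypothetical function, update_average, and runs it on made-up readings. It assumes integer temperatures and rounds the mean to the nearest integer.

```shell
#!/bin/bash
AVERAGE_WINDOW=3
temperature_history=()

# Hypothetical helper: record a reading and store the rounded mean of the
# current window in the global average_temp.
update_average() {
    temperature_history+=("$1")
    if (( ${#temperature_history[@]} > AVERAGE_WINDOW )); then
        temperature_history=("${temperature_history[@]:1}")
    fi
    local sum=0 t count=${#temperature_history[@]}
    for t in "${temperature_history[@]}"; do
        sum=$(( sum + t ))
    done
    average_temp=$(( (sum + count / 2) / count ))
}

# Illustrative readings with a one-sample spike to 70
averages=()
for reading in 50 51 70 50 51; do
    update_average "$reading"
    averages+=("$average_temp")
done
echo "${averages[*]}"
```

This prints `50 51 57 57 57`. The spike to 70 is damped to 57, but it also lingers for as many cycles as the window is long, which is the responsiveness cost of averaging. The function has to be called directly rather than inside `$(...)`, since a subshell would discard the updated history.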

Key Strategies for Mitigating Fan Pulsing

Through testing and refinement, several key strategies emerged for creating stable fan control:

  • Increased Hysteresis: Expanding the buffer zone reduces sensitivity to minor temperature fluctuations
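  • Conservative Scaling: Lowering the scaling factor (for example, from 0.4 to 0.3) flattens the curve, so each degree of temperature change moves the fan speed less
  • Temperature Averaging: Smoothing sensor readings over multiple samples reduces noise impact
  • Appropriate Sleep Intervals: A longer interval between updates (15 seconds in the script above) trades responsiveness for stability, so it should be short enough to catch real load changes but long enough to ignore momentary spikes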
  • Minimum Speed Enforcement: Preventing fan speeds from dropping too low to avoid system warnings

Best Practices for IPMI Fan Control

When implementing custom fan control solutions, consider these best practices:

  • Test Thoroughly: Validate the script under various load conditions and temperature scenarios
  • Monitor System Health: Implement logging to track temperature trends and fan behavior
  • Fail-Safe Mechanisms: Ensure the system can revert to default fan control if the script fails
  • Hardware-Specific Tuning: Adjust parameters based on your specific server model and cooling architecture
  • Regular Maintenance: Periodically review and update thresholds based on system performance
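
The logging and fail-safe points can be sketched with a log function and an exit trap. This is a sketch under stated assumptions: the function names, the log path, and the log format are all hypothetical, and restore_default_fan_control is deliberately a placeholder, because the command that returns fans to automatic control is vendor-specific and is not given in this article.

```shell
#!/bin/bash
LOG_FILE=${LOG_FILE:-/tmp/fan_control.log}   # assumption: any writable path works

# Placeholder: replace the body with the command your BMC vendor documents
# for returning fans to automatic control. None is assumed here.
restore_default_fan_control() {
    echo "restore default fan control (placeholder)" >> "$LOG_FILE"
}

# Append one timestamped line per control cycle
log_status() {
    printf '%s temp=%sC fan=%s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$1" "$2" >> "$LOG_FILE"
}

# Run the restore step whenever the script exits, including after Ctrl-C
# or a plain kill (SIGTERM). SIGKILL cannot be trapped.
trap restore_default_fan_control EXIT
trap 'exit 1' INT TERM

# Example call, as it might appear at the end of each loop iteration
log_status 55 20
```

In the main loop, log_status would be called with the current maximum temperature and the applied fan speed. Since a trap cannot cover SIGKILL or a hung process, a service manager or watchdog that restarts the script is a reasonable complement.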

Conclusion

Creating a robust IPMI fan control script requires careful consideration of temperature monitoring, fan speed calculation, and stability mechanisms. By implementing hysteresis, temperature averaging, and conservative scaling factors, we can achieve smooth and reliable fan control that optimizes both cooling efficiency and acoustic performance. The key is finding the right balance between responsiveness and stability for your specific hardware and environmental conditions.