Crafting a Robust Custom IPMI Fan Control Script: Overcoming Challenges and Mitigating Unexpected Pulsing
Managing server hardware efficiently is crucial for maintaining optimal performance and longevity. This article explores the process of writing a custom IPMI fan control script, the challenges encountered, and strategies to address unexpected fan pulsing.
Understanding the Need for Custom Fan Control
Servers operate under varying loads, and their cooling systems must adapt dynamically to maintain safe operating temperatures. Default fan settings are often generic and may not account for specific workload patterns or environmental conditions. Custom fan control scripts allow administrators to optimize cooling efficiency, reduce noise levels, enhance energy savings, and extend hardware lifespan.
The Challenge: Unexpected Fan Pulsing
One of the most perplexing issues encountered was unexpected fan pulsing: rapid fluctuations in fan speed that are audibly distracting and can shorten hardware lifespan. This behavior typically stems from sensor noise, insufficient hysteresis, or an aggressive scaling factor that turns small temperature changes into large speed changes.
Core Script Implementation
The foundation of our fan control solution is a bash script that continuously monitors CPU temperatures and adjusts fan speeds accordingly. Here's the core implementation:
#!/bin/bash
# Function to retrieve fan zones using IPMI
get_fan_zones() {
    local output
    if ! output=$(ipmitool sdr 2>/dev/null); then
        echo "Error: Failed to retrieve IPMI sensor data." >&2
        exit 1
    fi
    local zones=()
    while IFS= read -r line; do
        if [[ $line =~ ^Fan\ ([0-9][A-Z])\ Tach ]]; then
            zones+=("${BASH_REMATCH[1]}")
        fi
    done <<< "$output"
    echo "${zones[@]}"
}
# Initialize fan zones
fan_zones=($(get_fan_zones))
if [[ ${#fan_zones[@]} -eq 0 ]]; then
    echo "Error: No fan zones found." >&2
    exit 1
fi
# Define temperature thresholds
MIN_TEMP=30 # Minimum temperature (°C)
MAX_TEMP=80 # Maximum temperature (°C)
# Scaling factor to moderate fan speed adjustments
SCALING_FACTOR=0.4 # 40% of calculated speed to prevent aggressive changes
# Hysteresis to prevent rapid fan speed oscillations
HYSTERESIS=5 # Raw fan speed units (0-255 scale), not degrees
# Update frequency
SLEEP_INTERVAL=15 # Seconds between updates
# Minimum allowable fan speed to avoid system warnings
MIN_FAN_SPEED=20 # Raw value on the 0-255 scale
# Last fan speed that was applied; -1 means nothing has been applied yet
prev_fan_speed=-1
Temperature Monitoring and Fan Speed Calculation
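The script polls CPU temperatures every SLEEP_INTERVAL seconds and maps the hottest reading to a fan speed with a cubic curve, which keeps the fans quiet at low temperatures and ramps up steeply as the temperature approaches MAX_TEMP. Because of SCALING_FACTOR, the curve tops out at roughly 40% of 255 just below MAX_TEMP and then jumps to 255 once MAX_TEMP is reached: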
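while true; do
    # Retrieve temperature sensor data
    if ! ipmi_output=$(ipmitool sdr type temperature 2>/dev/null); then
        echo "Warning: Failed to retrieve IPMI sensor data." >&2
        sleep "$SLEEP_INTERVAL"
        continue
    fi
    # Extract CPU temperatures from the fifth '|'-separated field
    cpu_temps=()
    while IFS= read -r line; do
        if [[ $line == *"CPU"* && $line == *"Temp"* ]]; then
            temp=$(echo "$line" | awk -F'|' '{print $5}' | sed 's/[^0-9.]//g')
            temp=${temp%%.*} # Keep the integer part; (( )) cannot compare decimals
            if [[ -n $temp ]]; then
                cpu_temps+=("$temp")
            fi
        fi
    done <<< "$ipmi_output"
    # Without any reading, assume the worst case instead of treating it as 0 °C
    if [[ ${#cpu_temps[@]} -eq 0 ]]; then
        echo "Warning: No CPU temperature readings; assuming MAX_TEMP." >&2
        cpu_temps=("$MAX_TEMP")
    fi
    # Determine the highest CPU temperature
    max_temp="${cpu_temps[0]}"
    for temp in "${cpu_temps[@]}"; do
        (( temp > max_temp )) && max_temp=$temp
    done
    # Calculate fan speed based on temperature
    if (( max_temp <= MIN_TEMP )); then
        fan_speed=$MIN_FAN_SPEED
    elif (( max_temp >= MAX_TEMP )); then
        fan_speed=255
    else
        # Cubic scaling for non-linear fan speed control
        temp_ratio=$(echo "scale=4; ($max_temp - $MIN_TEMP) / ($MAX_TEMP - $MIN_TEMP)" | bc)
        raw_speed=$(echo "scale=0; 255 * ($temp_ratio ^ 3)" | bc)
        scaled_speed=$(echo "$raw_speed * $SCALING_FACTOR" | bc)
        fan_speed=${scaled_speed%.*}
        # bc prints values below 1 as ".1234", which would leave fan_speed empty
        fan_speed=${fan_speed:-0}
        # Enforce the minimum so the low end of the curve cannot stop the fans
        (( fan_speed < MIN_FAN_SPEED )) && fan_speed=$MIN_FAN_SPEED
    fi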
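The same mapping can also be expressed without bc. The following is a minimal sketch rather than the script's own code: the function name calc_fan_speed, the MAX_FAN_SPEED constant, and SCALING_PERCENT (an integer stand-in for SCALING_FACTOR) are assumptions introduced for illustration. It relies only on bash integer arithmetic and clamps the result to MIN_FAN_SPEED.

```shell
#!/bin/bash
# Thresholds copied from the configuration above.
MIN_TEMP=30
MAX_TEMP=80
MIN_FAN_SPEED=20
MAX_FAN_SPEED=255     # assumption: top of the raw fan speed scale
SCALING_PERCENT=40    # assumption: integer form of SCALING_FACTOR=0.4

# Hypothetical helper: maps an integer temperature (°C) to a raw fan speed.
calc_fan_speed() {
    local temp=$1 span delta speed
    if (( temp <= MIN_TEMP )); then
        echo "$MIN_FAN_SPEED"
        return
    fi
    if (( temp >= MAX_TEMP )); then
        echo "$MAX_FAN_SPEED"
        return
    fi
    span=$(( MAX_TEMP - MIN_TEMP ))
    delta=$(( temp - MIN_TEMP ))
    # 255 * (delta / span)^3 * 40%, multiplying first to limit truncation error
    speed=$(( MAX_FAN_SPEED * delta * delta * delta * SCALING_PERCENT / (span * span * span * 100) ))
    # Clamp the low end of the curve to the minimum speed
    if (( speed < MIN_FAN_SPEED )); then
        speed=$MIN_FAN_SPEED
    fi
    echo "$speed"
}

# Print the curve at a few sample temperatures
for t in 30 55 70 79 80; do
    printf '%d C -> %d\n' "$t" "$(calc_fan_speed "$t")"
done
```

With these thresholds, 70 °C maps to 52 and 79 °C to 96, so the clamp matters mostly in the lower half of the range, where the cubic term is still small.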
Implementing Hysteresis for Stability
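To prevent rapid fan speed oscillations, the script applies hysteresis in the form of a deadband: a newly calculated fan speed is ignored unless it differs from the currently applied speed by at least HYSTERESIS units. The rest of the loop then sends the resulting speed to each fan zone: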
    # Apply hysteresis to stabilize fan speed changes
    if (( prev_fan_speed != -1 )); then
        if (( fan_speed > prev_fan_speed && fan_speed - prev_fan_speed < HYSTERESIS )); then
            fan_speed=$prev_fan_speed
        elif (( fan_speed < prev_fan_speed && prev_fan_speed - fan_speed < HYSTERESIS )); then
            fan_speed=$prev_fan_speed
        fi
    fi
    # Convert fan speed to hexadecimal for IPMI commands
    fan_speed_hex=$(printf '%02x' "$fan_speed")
    # Send IPMI commands to set fan speed for each fan zone
    # (these raw bytes are hardware-specific; verify them for your board)
    for (( i = 1; i <= ${#fan_zones[@]}; i++ )); do
        zone_hex=$(printf '%02x' "$i") # Two hex digits, so zones above 9 stay valid
        ipmitool raw 0x3a 0x07 "0x$zone_hex" "0x$fan_speed_hex" 0x01 &> /dev/null
    done
    # Apply the fan speed changes
    ipmitool raw 0x3a 0x06 &> /dev/null
    # Update previous fan speed and wait for next iteration
    prev_fan_speed=$fan_speed
    sleep "$SLEEP_INTERVAL"
done
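To see the deadband in isolation, the hysteresis check can be pulled into a small function and fed a series of target speeds. This is an illustrative sketch: the function name apply_hysteresis and the sample values are assumptions, and no IPMI commands are issued.

```shell
#!/bin/bash
HYSTERESIS=5   # Same value as in the configuration above

# Hypothetical helper: returns the speed to apply, given the newly
# calculated target and the previously applied speed (-1 = none yet).
apply_hysteresis() {
    local target=$1 previous=$2 diff
    if (( previous < 0 )); then
        echo "$target"
        return
    fi
    diff=$(( target - previous ))
    if (( diff < 0 )); then
        diff=$(( -diff ))
    fi
    if (( diff < HYSTERESIS )); then
        echo "$previous"   # Change is inside the deadband: keep the old speed
    else
        echo "$target"     # Change is large enough: accept the new speed
    fi
}

# Feed a jittery series of targets through the deadband
prev=-1
for target in 50 52 53 56 54 49; do
    prev=$(apply_hysteresis "$target" "$prev")
    printf 'target=%d applied=%d\n' "$target" "$prev"
done
```

For this series the applied speeds are 50, 50, 50, 56, 56, 49: the small moves to 52, 53, and 54 are absorbed, while the larger moves to 56 and 49 pass through.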
Advanced Improvements: Temperature Averaging
To further reduce fan pulsing caused by sensor noise, we can implement temperature averaging over a rolling window:
# Enhanced version with temperature averaging
# Before the main loop:
AVERAGE_WINDOW=3 # Number of recent readings to average
temperature_history=()
# Inside the main loop, after max_temp has been determined:
temperature_history+=("$max_temp")
if [[ ${#temperature_history[@]} -gt $AVERAGE_WINDOW ]]; then
    temperature_history=("${temperature_history[@]:1}")
fi
# Compute the average temperature as an integer so (( )) comparisons still work
sum=0
for temp in "${temperature_history[@]}"; do
    sum=$(( sum + temp ))
done
average_temp=$(( sum / ${#temperature_history[@]} ))
# Use average_temp instead of max_temp for fan speed calculation
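The effect of the rolling window is easiest to see on a short series that contains a single spike. The sketch below wraps the same logic in a function; the name push_temperature, the rounding to the nearest integer, and the sample readings are assumptions made for illustration.

```shell
#!/bin/bash
AVERAGE_WINDOW=3
temperature_history=()
average_temp=0

# Hypothetical helper: records one integer reading, trims the history to
# AVERAGE_WINDOW entries, and stores the rounded mean in average_temp.
push_temperature() {
    local reading=$1 sum=0 t count
    temperature_history+=("$reading")
    if (( ${#temperature_history[@]} > AVERAGE_WINDOW )); then
        temperature_history=("${temperature_history[@]:1}")
    fi
    count=${#temperature_history[@]}
    for t in "${temperature_history[@]}"; do
        sum=$(( sum + t ))
    done
    # Adding count / 2 before dividing rounds to nearest instead of truncating
    average_temp=$(( (sum + count / 2) / count ))
}

# A steady load around 50 °C with one noisy reading of 58 °C
for reading in 50 51 58 50 49; do
    push_temperature "$reading"
    printf 'reading=%d average=%d\n' "$reading" "$average_temp"
done
```

Here the 58 °C spike only raises the average to 53, at the cost of the average staying slightly elevated for the next two cycles. A larger window smooths more but also reacts more slowly to a genuine load increase.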
Key Strategies for Mitigating Fan Pulsing
Through testing and refinement, several key strategies emerged for creating stable fan control:
- Increased Hysteresis: Expanding the buffer zone reduces sensitivity to minor temperature fluctuations
- Conservative Scaling: Using a lower scaling factor (for example, 0.3 instead of the 0.4 used above) flattens the curve, so each degree of change moves the fan speed less
- Temperature Averaging: Smoothing sensor readings over multiple samples reduces noise impact
- Appropriate Sleep Intervals: A longer SLEEP_INTERVAL gives temperatures time to settle between adjustments, at the cost of slower reaction to load spikes
- Minimum Speed Enforcement: Preventing fan speeds from dropping too low to avoid system warnings
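A rough worked calculation shows how the scaling factor and the hysteresis value interact. It treats the fan curve as a continuous function and ignores integer truncation, so the numbers are approximate. The symbols s (raw fan speed), T (temperature), and k (SCALING_FACTOR) are introduced here for illustration.

```latex
s(T) = 255\,k \left( \frac{T - T_{\min}}{T_{\max} - T_{\min}} \right)^{3},
\qquad
\frac{ds}{dT} = \frac{3 \cdot 255\,k\,(T - T_{\min})^{2}}{(T_{\max} - T_{\min})^{3}}

% With T_min = 30, T_max = 80, evaluated at T = 75:
\left. \frac{ds}{dT} \right|_{T=75}
  = \frac{765\,k \cdot 45^{2}}{50^{3}}
  \approx 12.39\,k
  \approx
  \begin{cases}
    4.96 & k = 0.4 \\
    3.72 & k = 0.3
  \end{cases}
```

Near 75 °C with k = 0.4, a one-degree fluctuation therefore shifts the target by about 5 units, right at the HYSTERESIS threshold of 5, so sensor noise can still get through. With k = 0.3 the same fluctuation moves the target by under 4 units and stays inside the deadband.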
Best Practices for IPMI Fan Control
When implementing custom fan control solutions, consider these best practices:
- Test Thoroughly: Validate the script under various load conditions and temperature scenarios
- Monitor System Health: Implement logging to track temperature trends and fan behavior
- Fail-Safe Mechanisms: Ensure the system can revert to default fan control if the script fails
- Hardware-Specific Tuning: Adjust parameters based on your specific server model and cooling architecture
- Regular Maintenance: Periodically review and update thresholds based on system performance
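Logging and a fail-safe can be combined in a small wrapper around the IPMI calls. The sketch below is built on several assumptions: the helper names (log, run_ipmi, set_all_zones, fail_safe), the DRY_RUN switch, the ZONE_COUNT default, and the log path are all hypothetical. The command that returns control to the BMC's automatic mode is hardware-specific and not covered in this article, so the sketch falls back to full speed instead, reusing the raw command from the main script.

```shell
#!/bin/bash
DRY_RUN=${DRY_RUN:-1}                        # 1 = print commands instead of running them
LOG_FILE=${LOG_FILE:-/tmp/fan_control.log}   # hypothetical log location
ZONE_COUNT=${ZONE_COUNT:-2}                  # assumption: number of fan zones

# Append a timestamped message to the log file
log() {
    printf '%s %s\n' "$(date '+%Y-%m-%dT%H:%M:%S')" "$*" >> "$LOG_FILE"
}

# Run ipmitool, or just print the command line in dry-run mode
run_ipmi() {
    if (( DRY_RUN )); then
        echo "ipmitool $*"
    else
        ipmitool "$@" &> /dev/null
    fi
}

# Set every zone to the same raw speed (0-255) and log the change
set_all_zones() {
    local speed=$1 zone
    for (( zone = 1; zone <= ZONE_COUNT; zone++ )); do
        run_ipmi raw 0x3a 0x07 "$(printf '0x%02x' "$zone")" "$(printf '0x%02x' "$speed")" 0x01
    done
    run_ipmi raw 0x3a 0x06
    log "set ${ZONE_COUNT} zone(s) to ${speed}"
}

# Fail-safe: never leave the fans at a low speed when the script stops
fail_safe() {
    log "script exiting; forcing fans to full speed"
    set_all_zones 255
}

trap fail_safe EXIT          # Runs on normal exit and after the signals below
trap 'exit 1' INT TERM       # Turn Ctrl-C and kill into a normal exit
```

Run as-is, the script only prints the full-speed commands on exit, which makes it safe to try on a machine without IPMI access. Setting DRY_RUN=0 sends the commands for real, so the raw bytes should be checked against the target hardware first.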
Conclusion
Creating a robust IPMI fan control script requires careful consideration of temperature monitoring, fan speed calculation, and stability mechanisms. By implementing hysteresis, temperature averaging, and conservative scaling factors, we can achieve smooth and reliable fan control that optimizes both cooling efficiency and acoustic performance. The key is finding the right balance between responsiveness and stability for your specific hardware and environmental conditions.