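Confirming a soft lockup
Next, let's confirm that a lockup is actually detected. Fedora 14 is used here.
We build a kernel module whose initialization code runs an infinite loop. Once this module is loaded, the kernel spins in that loop forever and locks up.
The source file and Makefile are shown below.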
cat Makefile
obj-m := lockup.o
cat lockup.c
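#include <linux/module.h>

/* Module init: spins forever, so insmod never returns. */
static int lockup_init(void)
{
	for (;;)
		;
	return 0;	/* never reached */
}

/* Module exit: nothing to clean up. */
static void lockup_exit(void)
{
}

module_init(lockup_init);
module_exit(lockup_exit);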
Run make to build the kernel module.
make -C /lib/modules/2.6.35.11-83.fc14.x86_64/build M=`pwd` modules
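lockup.ko should now exist in the current directory. Load the faulty module into the kernel with the insmod command. As soon as insmod runs, the kernel enters the infinite loop and locks up. About 60 seconds later the lockup is detected, and the kernel prints a report that begins with BUG: soft lockup.
Because the infinite loop in this module's initialization never ends, reboot the machine once you have confirmed the lockup.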
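The delay of about 60 seconds can be pictured with a small user-space model. This is only a sketch under assumptions: it supposes the kernel remembers when a per-CPU watchdog task last got to run and reports a soft lockup once that timestamp is more than a threshold old. The names stuck_seconds and SOFTLOCKUP_THRESH are hypothetical, not the kernel's own code, and the timestamps in the usage note are illustrative.

```c
/* Assumed threshold in seconds, matching the roughly 60 s delay described above. */
#define SOFTLOCKUP_THRESH 60UL

/*
 * Hypothetical helper: return how long (in seconds) the CPU has gone
 * without running the watchdog task, or 0 if the threshold has not
 * been exceeded yet.
 *
 * touch_ts: time the watchdog task last ran, in seconds
 * now:      current time, in seconds
 */
static unsigned long stuck_seconds(unsigned long touch_ts, unsigned long now)
{
    if (now > touch_ts + SOFTLOCKUP_THRESH)
        return now - touch_ts;
    return 0;
}
```

For example, stuck_seconds(1005, 1066) evaluates to 61, while stuck_seconds(1005, 1065) is 0 because the threshold has only been reached, not exceeded.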
insmod lockup.ko
[ 1001.569249] lockup: module license 'unspecified' taints kernel.
[ 1001.571751] Disabling lock debugging due to kernel taint
[ 1066.226006] BUG: soft lockup - CPU#0 stuck for 61s! [insmod:1374]
[ 1066.226006] Modules linked in: lockup(P+) ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 i2c_piix4 ppdev parport_pc parport virtio_net i2c_core virtio_blk virtio_pci virtio_ring virtio [last unloaded: scsi_wait_scan]
[ 1066.226006] CPU 0
[ 1066.226006] Modules linked in: lockup(P+) ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 i2c_piix4 ppdev parport_pc parport virtio_net i2c_core virtio_blk virtio_pci virtio_ring virtio [last unloaded: scsi_wait_scan]
[ 1066.226006]
[ 1066.226006] Pid: 1374, comm: insmod Tainted: P 2.6.35.11-83.fc14.x86_64 #1 /Bochs
[ 1066.226006] RIP: 0010:[<ffffffffa00fa009>] [<ffffffffa00fa009>] lockup_init+0x9/0xb [lockup]
[ 1066.226006] RSP: 0018:ffff880037b59f08 EFLAGS: 00000246
[ 1066.226006] RAX: ffff880037b59fd8 RBX: ffff880037b59f08 RCX: 0000000000000000
[ 1066.226006] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffffffa00fa000
[ 1066.226006] RBP: ffffffff8100a68e R08: 0000000000000000 R09: ffff88003d51a128
[ 1066.226006] R10: ffffffffa00fa280 R11: 0000000000000000 R12: ffffffffa00fa050
[ 1066.226006] R13: ffffffffa00fa280 R14: ffffffff810c3200 R15: ffff880037b59ea8
[ 1066.226006] FS: 00007fb3b696b720(0000) GS:ffff880002000000(0000) knlGS:0000000000000000
[ 1066.226006] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1066.226006] CR2: 0000000000e58050 CR3: 000000003bbe9000 CR4: 00000000000006f0
[ 1066.226006] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1066.226006] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 1066.226006] Process insmod (pid: 1374, threadinfo ffff880037b58000, task ffff88003b961740)
[ 1066.226006] Stack:
[ 1066.226006]  ffff880037b59f38 ffffffff810021a1 00000000000164e3 ffffffffa00fa050
[ 1066.226006] <0> 0000000000000000 0000000001d0e030 ffff880037b59f78 ffffffff8107cce1
[ 1066.226006] <0> 0000000001d0e010 00000000000164e3 00000000000164e3 00007fffedd1fe69
[ 1066.226006] Call Trace:
[ 1066.226006]  [<ffffffff810021a1>] ? do_one_initcall+0x5e/0x155
[ 1066.226006]  [<ffffffff8107cce1>] ? sys_init_module+0xa6/0x1e4
[ 1066.226006]  [<ffffffff81009cf2>] ? system_call_fastpath+0x16/0x1b
[ 1066.226006] Code: <eb> fe 55 48 89 e5 0f 1f 44 00 00 c9 c3 90 90 04 00 00 00 14 00 00
[ 1066.226006] Call Trace:
[ 1066.226006]  [<ffffffff810021a1>] ? do_one_initcall+0x5e/0x155
[ 1066.226006]  [<ffffffff8107cce1>] ? sys_init_module+0xa6/0x1e4
[ 1066.226006]  [<ffffffff81009cf2>] ? system_call_fastpath+0x16/0x1b
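The RIP and Code lines in the report above already point at the loop. RIP is ffffffffa00fa009 (offset 0x9 inside lockup_init), and the byte marked with angle brackets in the Code line is the one at RIP: eb, followed by fe. On x86, eb is a short jump with a signed 8-bit displacement measured from the next instruction. The sketch below works through that arithmetic; the helper name short_jmp_target is hypothetical and is introduced only for illustration.

```c
#include <stdint.h>

/*
 * Target of an x86 short jump (opcode 0xEB followed by a signed 8-bit
 * displacement). The displacement is relative to the address of the
 * next instruction, which is insn_addr + 2 for this 2-byte encoding.
 */
static uint64_t short_jmp_target(uint64_t insn_addr, uint8_t rel8)
{
    /* Sign-extend the displacement, then add it to the next address. */
    return insn_addr + 2 + (uint64_t)(int64_t)(int8_t)rel8;
}
```

With insn_addr = 0xffffffffa00fa009 and rel8 = 0xfe (that is, -2), the target equals the instruction's own address, so the CPU keeps jumping to the same instruction. This matches the empty for (;;) loop in lockup_init.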
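Soft lockup detection also covers the case where a process running under the real-time FIFO scheduling policy keeps hogging the CPU, as in the program below. However, Fedora 14's default settings limit the CPU time available to real-time tasks, so the lockup is not detected as is. Here we set sched_rt_runtime_us=-1 to remove that limit on CPU time for real-time processes.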
cat lockup.c
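#include <sched.h>

int main(int argc, char **argv)
{
	/* Highest real-time priority under SCHED_FIFO (requires root). */
	struct sched_param p = { .sched_priority = 99 };

	sched_setscheduler(0, SCHED_FIFO, &p);
	for (;;)	/* hog the CPU forever */
		;
	return 0;
}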
gcc -g -O2 -o lockup lockup.c
sysctl -w kernel.sched_rt_runtime_us=-1
kernel.sched_rt_runtime_us = -1
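To see what this setting changes, it can help to express the limit as a share of CPU time. The sketch below is a rough illustration and assumes the limit is applied per period, with the period taken from the companion setting kernel.sched_rt_period_us. The helper name rt_share_percent is hypothetical, and the values 950000 and 1000000 in the usage note are the commonly documented defaults, so check the values on your own system.

```c
/*
 * Percentage of each period that real-time tasks may consume.
 * Returns -1 when runtime_us is negative, which means "no limit".
 *
 * runtime_us: value of kernel.sched_rt_runtime_us
 * period_us:  value of kernel.sched_rt_period_us (assumed companion setting)
 */
static int rt_share_percent(long long runtime_us, long long period_us)
{
    if (runtime_us < 0)
        return -1;      /* unlimited: real-time tasks can use the whole CPU */
    if (period_us <= 0)
        return 0;       /* guard against division by zero */
    return (int)(runtime_us * 100 / period_us);
}
```

With 950000 and 1000000 the result is 95, so a spinning FIFO task is paused for part of every period and other tasks still get to run. With -1 the helper returns -1, the unlimited case in which the program can monopolize the CPU.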
./lockup
[  638.877007] BUG: soft lockup - CPU#0 stuck for 61s! [lockup:1204]
[  638.877007] Modules linked in: ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 ppdev parport_pc parport i2c_piix4 virtio_net i2c_core virtio_blk virtio_pci virtio_ring virtio [last unloaded: scsi_wait_scan]
[  638.877007] CPU 0
[  638.877007] Modules linked in: ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 ppdev parport_pc parport i2c_piix4 virtio_net i2c_core virtio_blk virtio_pci virtio_ring virtio [last unloaded: scsi_wait_scan]
[  638.877007]
[  638.877007] Pid: 1204, comm: lockup Not tainted 2.6.35.11-83.fc14.x86_64 #1 /Bochs
[  638.877007] RIP: 0033:[<00000000004004fa>] [<00000000004004fa>] 0x4004fa
[  638.877007] RSP: 002b:00007fff4e495fb0 EFLAGS: 00000217
[  638.877007] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00007eff7c9395a7
[  638.877007] RDX: 00007fff4e495fb0 RSI: 0000000000000001 RDI: 0000000000000000
[  638.877007] RBP: ffffffff8100a68e R08: 00007eff7cc075e0 R09: 00007eff7cc1b230
[  638.877007] R10: 00007fff4e495d20 R11: 0000000000000202 R12: 00007fff4e4960a0
[  638.877007] R13: 00000000004003f0 R14: 0000000000000000 R15: 00000000004003f0
[  638.877007] FS: 00007eff7ce22720(0000) GS:ffff880002000000(0000) knlGS:0000000000000000
[  638.877007] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  638.877007] CR2: 00007eff7c9395a0 CR3: 000000003750c000 CR4: 00000000000006f0
[  638.877007] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  638.877007] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  638.877007] Process lockup (pid: 1204, threadinfo ffff88003dc38000, task ffff88003b961740)
[  638.877007]
[  638.877007] Call Trace: