5.2.7 由Windows虚拟内存导致的长时间停顿[1]
例如,有一个带心跳检测功能的GUI桌面程序,每15秒会发送一次心跳检测信号,如果对方30秒以内都没有信号返回,那就认为和对方程序的连接已经断开。程序上线后发现心跳检测有误报的概率,查询日志发现误报的原因是程序会偶尔出现间隔约一分钟左右的时间完全无日志输出,处于停顿状态。
因为是桌面程序,所需的内存并不大(-Xmx256m),所以开始并没有想到是GC导致的程序停顿,但是加入参数-XX:+PrintGCApplicationStoppedTime-XX:+PrintGCDateStamps-Xloggc:gclog.log后,从GC日志文件中确认了停顿确实是由GC导致的,大部分GC时间都控制在100毫秒以内,但偶尔就会出现一次接近1分钟的GC。
Total time for which application threads were stopped:0.0112389 seconds
Total time for which application threads were stopped:0.0001335 seconds
Total time for which application threads were stopped:0.0003246 seconds
Total time for which application threads were stopped:41.4731411 seconds
Total time for which application threads were stopped:0.0489481 seconds
Total time for which application threads were stopped:0.1110761 seconds
Total time for which application threads were stopped:0.0007286 seconds
Total time for which application threads were stopped:0.0001268 seconds
从GC日志中找到长时间停顿的具体日志信息(添加了-XX:+PrintReferenceGC参数),找到的日志片段如下所示。从日志中可以看出,真正执行GC动作的时间不是很长,但从准备开始GC,到真正开始GC之间所消耗的时间却占了绝大部分。
2012-08-29T19:14:30.968+0800:10069.800:[GC10099.225:[SoftReference,0 refs,0.0000109 secs]10099.226:[WeakReference,4072 refs,0.0012099 secs]10099.227:[FinalReference,984 refs,1.5822450 secs]10100.809:[PhantomReference,251 refs,0.0001394 secs]10100.809:[JNI Weak Reference,0.0994015 secs][PSYoungGen:175672K->8528K(167360K)]251523K->100182K(353152K),31.1580402 secs][Times:user=0.61 sys=0.52,real=31.16 secs]
除GC日志之外,还观察到这个GUI程序内存变化的一个特点,当它最小化的时候,资源管理中显示的占用内存大幅度减小,但是虚拟内存则没有变化,因此怀疑程序在最小化时它的工作内存被自动交换到磁盘的页面文件之中了,这样发生GC时就有可能因为恢复页面文件的操作而导致不正常的GC停顿。
在MSDN上查证[2]后确认了这种猜想,因此,在Java的GUI程序中要避免这种现象,可以加入参数“-Dsun.awt.keepWorkingSetOnMinimize=true”来解决。这个参数在许多AWT的程序上都有应用,例如JDK自带的Visual VM,用于保证程序在恢复最小化时能够立即响应。在这个案例中加入该参数后,问题得到解决。
[1]本案例来源于HLLVM组群的讨论:http://hllvm.group.iteye.com/group/topic/28745。
[2]http://support.microsoft.com/default.aspx?scid=kb;en-us;293215。