通过Android trace文件分析死锁ANR

对于从事Android开发的人来说,遇到ANR(Application Not Responding)是比较常见的问题。一般情况下,如果有ANR发生,系统都会在/data/anr/目录下生成trace文件,通过分析trace文件,可以定位产生ANR的原因。产生ANR的原因有很多,比如CPU使用过高、事件没有得到及时的响应、死锁等,下面将通过一次因为死锁导致的ANR问题,来说明如何通过trace文件分析ANR问题。
 
对应的部分trace文件内容如下:
 
"PowerManagerService" prio=5 tid=24 MONITOR
  | group="main" sCount=1 dsCount=0 obj=0x41dd0eb0 self=0x5241b218
  | sysTid=567 nice=0 sched=0/0 cgrp=apps handle=1380038664
  | state=S schedstat=( 6682116007 11324451214 33313 ) utm=450 stm=219 core=1
  at com.android.server.am.ActivityManagerService.broadcastIntent(ActivityManagerService.java:~13045)
  - waiting to lock <0x41a874a0> (a com.android.server.am.ActivityManagerService) held by tid=12 (android.server.ServerThread)
  at android.app.ContextImpl.sendBroadcast(ContextImpl.java:1144)
  at com.android.server.power.PowerManagerService$DisplayBlankerImpl.unblankAllDisplays(PowerManagerService.java:3442)
  at com.android.server.power.DisplayPowerState$PhotonicModulator$1.run(DisplayPowerState.java:456)
  at android.os.Handler.handleCallback(Handler.java:800)
  at android.os.Handler.dispatchMessage(Handler.java:100)
  at android.os.Looper.loop(Looper.java:194)
  at android.os.HandlerThread.run(HandlerThread.java:60)
 
  "Binder_B" prio=5 tid=85 MONITOR
  | group="main" sCount=1 dsCount=0 obj=0x42744770 self=0x58329e88
  | sysTid=3700 nice=-20 sched=0/0 cgrp=apps handle=1471424616
  | state=S schedstat=( 1663727513 2044643318 6806 ) utm=132 stm=34 core=1
  at com.android.server.power.PowerManagerService$DisplayBlankerImpl.toString(PowerManagerService.java:~3449)
  - waiting to lock <0x41a7e420> (a com.android.server.power.PowerManagerService$DisplayBlankerImpl) held by tid=24 (PowerManagerService)
  at java.lang.StringBuilder.append(StringBuilder.java:202)
  at com.android.server.power.PowerManagerService.dump(PowerManagerService.java:3052)
  at android.os.Binder.dump(Binder.java:264)
  at android.os.Binder.onTransact(Binder.java:236)
  at android.os.IPowerManager$Stub.onTransact(IPowerManager.java:373)
  at android.os.Binder.execTransact(Binder.java:351)
  at dalvik.system.NativeStart.run(Native Method)
 
 "android.server.ServerThread" prio=5 tid=12 MONITOR
  | group="main" sCount=1 dsCount=0 obj=0x41a76178 self=0x507837a8
  | sysTid=545 nice=-2 sched=0/0 cgrp=apps handle=1349936616
  | state=S schedstat=( 15368096286 21707846934 69485 ) utm=1226 stm=310 core=0
  at com.android.server.power.PowerManagerService.isScreenOnInternal(PowerManagerService.java:~2529)
  - waiting to lock <0x41a7e2e8> (a java.lang.Object) held by tid=85 (Binder_B)
  at com.android.server.power.PowerManagerService.isScreenOn(PowerManagerService.java:2522)
  at com.android.server.wm.WindowManagerService.sendScreenStatusToClientsLocked(WindowManagerService.java:7749)
  at com.android.server.wm.WindowManagerService.setEventDispatching(WindowManagerService.java:7628)
  at com.android.server.am.ActivityManagerService.updateEventDispatchingLocked(ActivityManagerService.java:8083)
  at com.android.server.am.ActivityManagerService.wakingUp(ActivityManagerService.java:8077)
  at com.android.server.power.Notifier.sendWakeUpBroadcast(Notifier.java:474)
  at com.android.server.power.Notifier.sendNextBroadcast(Notifier.java:455)
  at com.android.server.power.Notifier.access$700(Notifier.java:62)
  at com.android.server.power.Notifier$NotifierHandler.handleMessage(Notifier.java:600)
  at android.os.Handler.dispatchMessage(Handler.java:107)
  at android.os.Looper.loop(Looper.java:194)
  at com.android.server.ServerThread.run(SystemServer.java:1328)
 
           
 
  从trace文件看,是因为TID为24的线程等待一个TID为12的线程持有的锁,TID为12的线程等待一个TID为85的线程持有的锁,而TID为85的线程确等待一个TID为24的线程持有的锁,导致了循环等待的现象,对应的trace文件的语句如下:
 
TID 24:- waiting to lock <0x41a874a0> (a com.android.server.am.ActivityManagerService) held by tid=12 (android.server.ServerThread)
 

TID 12: - waiting to lock <0x41a7e2e8> (a java.lang.Object) held by tid=85 (Binder_B)
 
TID 85:- waiting to lock <0x41a7e420> (a com.android.server.power.PowerManagerService$DisplayBlankerImpl) held by tid=24 (PowerManagerService)
 

既然是死锁,那么先看各线程都有那些锁。
 
先看TID=24的线程的栈顶,ActivityManagerService的broadcastIntent函数代码如下:
 
    public final int broadcastIntent(IApplicationThread caller,
            Intent intent, String resolvedType, IIntentReceiver resultTo,
            int resultCode, String resultData, Bundle map,
            String requiredPermission, boolean serialized, boolean sticky, int userId) {
        enforceNotIsolatedCaller("broadcastIntent");
        synchronized(this) {
            intent = verifyBroadcastLocked(intent);
           
            final ProcessRecord callerApp = getRecordForAppLocked(caller);
            final int callingPid = Binder.getCallingPid();
            final int callingUid = Binder.getCallingUid();
            final long origId = Binder.clearCallingIdentity();
            int res = broadcastIntentLocked(callerApp,
                    callerApp != null ? callerApp.info.packageName : null,
                    intent, resolvedType, resultTo,
                    resultCode, resultData, map, requiredPermission, serialized, sticky,
                    callingPid, callingUid, userId);
            Binder.restoreCallingIdentity(origId);
            return res;
        }

linux

内容版权声明:除非注明,否则皆为本站原创文章。

转载注明出处:http://www.heiqu.com/cd7dda13bc37dd7898a4c14d63f8aa3d.html