
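A few days ago I ran into a thread deadlock that caused a system ANR. I pulled the /data/anr/traces.txt log file from the device, analyzed it, found the cause, and fixed the problem.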

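Java threads deadlock for one of two reasons: either all four deadlock conditions (mutual exclusion, hold and wait, no preemption, and circular wait) hold at the same time, or, as in this case, Android's own mechanisms treat something that behaves like a deadlock (a main thread that stays blocked for too long) as a hang and raise an ANR.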
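For contrast, here is a minimal sketch of a classic deadlock that meets all four conditions: two threads take two locks in opposite order, so each holds one lock and waits forever for the other (circular wait). It runs on a desktop JVM and uses `ThreadMXBean.findDeadlockedThreads()` from `java.lang.management` to confirm the cycle; that package is not part of the Android SDK, and the class and lock names are made up for the example.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

// Sketch: two threads acquire LOCK_A and LOCK_B in opposite order and deadlock.
class ClassicDeadlock {

    private static final Object LOCK_A = new Object();
    private static final Object LOCK_B = new Object();

    /** Starts the two deadlocking threads and returns how many deadlocked threads the JVM reports. */
    static int reproduce() throws InterruptedException {
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);
        Thread t1 = new Thread(() -> lockBoth(LOCK_A, LOCK_B, bothHoldFirstLock), "t1");
        Thread t2 = new Thread(() -> lockBoth(LOCK_B, LOCK_A, bothHoldFirstLock), "t2");
        // Daemon threads, so the stuck pair does not keep the JVM alive.
        t1.setDaemon(true);
        t2.setDaemon(true);
        t1.start();
        t2.start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Poll until the JVM sees the cycle (null means "no deadlock found yet").
        for (int i = 0; i < 100; i++) {
            long[] ids = mx.findDeadlockedThreads();
            if (ids != null) {
                return ids.length;
            }
            Thread.sleep(20);
        }
        return 0;
    }

    private static void lockBoth(Object first, Object second, CountDownLatch latch) {
        synchronized (first) {          // hold one resource ...
            latch.countDown();
            try {
                latch.await();          // ... until the other thread holds its first lock too
            } catch (InterruptedException e) {
                return;
            }
            synchronized (second) {     // ... then wait for the other one, which is never released
                System.out.println("unreachable");
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("deadlocked threads: " + reproduce());
    }
}
```

The ANR described in this article has no such cycle: only the main thread waits, and the thread holding the lock is not waiting for any lock itself, so a cycle detector like this one would not flag it.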

Clearly, since the result was a system ANR, the UI thread must be involved. First, here is the code that produced the deadlock:

```java
import android.os.Process;
import android.util.Log;

import org.json.JSONException;
import org.json.JSONObject;

import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;

/**
 * HTTP request handling (original version, which caused the ANR).
 * PushRequestParams, RequestCallback and streamToString() are defined elsewhere in the project (not shown).
 */
public class HttpRequest {

    /**
     * Request counter
     */
    private static int mRequestCount;

    private static final Object mSyncObject = new Object();

    // static synchronized: the caller must acquire the HttpRequest.class lock,
    // even though this method only starts a thread
    public synchronized static void request(final PushRequestParams params, final RequestCallback callback) {
        new Thread(new Runnable() {
            @Override
            public void run() {
                Process.setThreadPriority(Process.THREAD_PRIORITY_BACKGROUND);
                synchronized (mSyncObject) {
                    mRequestCount = 0;
                    final RequestCallback cb = new RequestCallback() {
                        @Override
                        public void onRequestCallback(int state, String s) {
                            if (state == STATE_SUCCESS) {
                                // Request succeeded
                                if (callback != null) {
                                    callback.onRequestCallback(state, s);
                                }
                            } else {
                                // This must stay before the while loop
                                if (mRequestCount >= 3 && callback != null) {
                                    // Request failed
                                    callback.onRequestCallback(state, s);
                                }

                                // Retry three times after a failure
                                while (mRequestCount < 3) {
                                    ++mRequestCount;
                                    request(params.toString(), params.sessionId, this);
                                }
                            }
                        }
                    };

                    request(params.toString(), params.sessionId, cb);
                }
            }
        }).start();
    }

    // Also static synchronized: the HttpRequest.class lock is held for the whole network call
    private synchronized static void request(String host, String sessionId, RequestCallback callback) {
        boolean isSuccess = false;
        String s = null;
        try {
            URL url = new URL(host);
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("GET");
            conn.setConnectTimeout(30 * 1000);
            conn.setReadTimeout(30 * 1000);
            conn.setRequestProperty("Cookie", "JSESSIONID=" + sessionId);
            conn.connect(); // may block up to the connect timeout while holding the lock
            int state = conn.getResponseCode();
            if (BuildConfig.DEBUG) {
                Log.d("may", "request state: " + state);
            }

            if (state == HttpURLConnection.HTTP_OK) {
                s = streamToString(conn.getInputStream());
                if (BuildConfig.DEBUG) {
                    Log.d("may", "result: " + s);
                }

                try {
                    JSONObject jobj = new JSONObject(s);
                    int code = jobj.optInt("code", -1);
                    if (code == 0) {
                        isSuccess = true;
                    }
                } catch (JSONException e) {
                    e.printStackTrace();
                }
            }

        } catch (MalformedURLException e) {
            e.printStackTrace();
            s = e.toString();
        } catch (IOException e) {
            e.printStackTrace();
            s = e.toString();
        } finally {
            if (callback != null) {
                callback.onRequestCallback(isSuccess ? RequestCallback.STATE_SUCCESS : RequestCallback.STATE_ERROR, s == null ? "" : s);
            }
        }
    }
}
```

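The code above is meant to ensure that the request counter mRequestCount never exceeds 3, and, to account for multiple threads, it synchronizes every modification of that value.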

Next, let's look at the ANR report. Here is the relevant part of the log:

```text
"main" prio=5 tid=1 Blocked
  | group="main" sCount=1 dsCount=0 obj=0x750d06e8 self=0xe7685400
  | sysTid=29016 nice=-10 cgrp=default sched=0/0 handle=0xea2f5534
  | state=S schedstat=( 83840100682 22307763410 181443 ) utm=6702 stm=1680 core=1 HZ=100
  | stack=0xff53a000-0xff53c000 stackSize=8MB
  | held mutexes=
  at *.*.HttpRequest.request(HttpRequest.java:33)
  - waiting to lock <0x08483a94> (a java.lang.Class<*.*.HttpRequest>) held by thread 24
  at android.app.LoadedApk$ReceiverDispatcher$Args.run(LoadedApk.java:1122)
  at android.os.Handler.handleCallback(Handler.java:751)
  at android.os.Handler.dispatchMessage(Handler.java:95)
  at android.os.Looper.loop(Looper.java:154)
  at android.app.ActivityThread.main(ActivityThread.java:6119)
  at java.lang.reflect.Method.invoke!(Native method)
  at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:886)
  at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:776)
```

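Finding the log above in traces.txt takes patience, because the file is very large and the relevant entry is not necessarily at the beginning. If you're lucky, you'll see it right after opening the file. A faster approach is to search for strings like `"main" prio=5 tid=1 Blocked`, or for the main package name of your own classes (though you may still come up empty :) ).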
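The search can also be scripted. The following is a hypothetical helper (class and method names are made up) that returns one thread's section from the dump. It assumes the layout shown above: a header line starting with the quoted thread name, and a blank line after the thread's last frame. A traces.txt file can hold dumps of several processes, each with its own "main" thread, so this sketch returns only the first match.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

class TraceThreadFinder {

    /** Returns the lines of the first thread section whose header starts with the quoted name. */
    static List<String> findThread(List<String> traceLines, String threadName) {
        // Headers look like: "main" prio=5 tid=1 Blocked  (daemon threads add "daemon" after the name)
        String headerPrefix = "\"" + threadName + "\" ";
        List<String> block = new ArrayList<>();
        boolean inBlock = false;
        for (String line : traceLines) {
            if (!inBlock && line.startsWith(headerPrefix)) {
                inBlock = true;
            }
            if (inBlock) {
                if (line.trim().isEmpty()) {
                    break; // a blank line ends the thread's section
                }
                block.add(line);
            }
        }
        return block;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java TraceThreadFinder traces.txt
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        findThread(lines, "main").forEach(System.out::println);
    }
}
```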

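The log above already tells us a fair amount. `"main" prio=5 tid=1 Blocked` means the thread is Blocked; its name is main, i.e. the UI main thread; prio=5 is its priority; and tid=1 is its thread ID. The lines `at *.*.HttpRequest.request(HttpRequest.java:33)` and `- waiting to lock <0x08483a94> (a java.lang.Class<*.*.HttpRequest>) held by thread 24` state the exact cause: the thread is blocked at HttpRequest.java:33; `waiting to lock <0x08483a94>` means it is waiting for the lock 0x08483a94 to be released; `(a java.lang.Class<*.*.HttpRequest>)` shows that this lock is a Class lock, i.e. the monitor of the HttpRequest class object; and `held by thread 24` means the lock is held by the thread whose tid is 24.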
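The same situation can be observed on a desktop JVM. The sketch below (class names are made up; it uses `java.lang.management`, which Android does not ship) keeps one thread inside a `static synchronized` method and then calls a second `static synchronized` method of the same class from another thread. `ThreadInfo` then reports the second thread as `BLOCKED` on a `java.lang.Class` monitor owned by the first, which is what `waiting to lock <...> (a java.lang.Class<...>) held by thread 24` expresses in ART's dump format.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.concurrent.CountDownLatch;

class BlockedOnClassLock {

    // Holds BlockedOnClassLock.class until `release` opens (CountDownLatch.await() keeps the monitor).
    static synchronized void holdLock(CountDownLatch entered, CountDownLatch release) throws InterruptedException {
        entered.countDown();
        release.await();
    }

    // Needs the same Class lock, so it blocks while holdLock() is running.
    static synchronized void needLock() {
    }

    /** Returns the JVM's view of a thread that is blocked on the Class lock. */
    static ThreadInfo inspectBlockedThread() throws InterruptedException {
        CountDownLatch entered = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread holder = new Thread(() -> {
            try {
                holdLock(entered, release);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "holder");
        holder.start();
        entered.await(); // "holder" now owns BlockedOnClassLock.class

        Thread waiter = new Thread(BlockedOnClassLock::needLock, "waiter");
        waiter.start();
        while (waiter.getState() != Thread.State.BLOCKED) {
            Thread.sleep(10);
        }
        ThreadInfo info = ManagementFactory.getThreadMXBean().getThreadInfo(waiter.getId());

        release.countDown(); // let both threads finish
        holder.join();
        waiter.join();
        return info;
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadInfo info = inspectBlockedThread();
        System.out.println(info.getThreadName() + " is " + info.getThreadState()
                + ", waiting to lock " + info.getLockName()
                + ", held by " + info.getLockOwnerName() + " (id " + info.getLockOwnerId() + ")");
    }
}
```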

So I searched for the keyword `<0x08483a94>` and found the following section of the log.

```text
"Thread-173" prio=5 tid=24 Native
  | group="main" sCount=1 dsCount=0 obj=0x33824e50 self=0xb6cf2800
  | sysTid=11277 nice=10 cgrp=bg_non_interactive sched=0/0 handle=0xc4453920
  | state=S schedstat=( 8781561 3715104 14 ) utm=0 stm=0 core=2 HZ=100
  | stack=0xc4351000-0xc4353000 stackSize=1038KB
  | held mutexes=
  kernel: __switch_to+0x8c/0x98
  kernel: poll_schedule_timeout+0x54/0xbc
  kernel: do_sys_poll+0x2c4/0x384
  kernel: compat_sys_ppoll+0x134/0x1e8
  kernel: cpu_switch_to+0x48/0x4c
  native: #00 pc 00048684 /system/lib/libc.so (__ppoll+20)
  native: #01 pc 0001cf67 /system/lib/libc.so (poll+46)
  native: #02 pc 0000e6bd /system/lib/libopenjdk.so (NET_Poll+52)
  native: #03 pc 0001888f /system/lib/libopenjdk.so (PlainSocketImpl_socketConnect+286)
  native: #04 pc 000b2cb7 /system/framework/arm/boot.oat (Java_java_net_PlainSocketImpl_socketConnect__Ljava_net_InetAddress_2II+114)
  at java.net.PlainSocketImpl.socketConnect(Native method)
  at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:334)
  - locked <0x09ddcdde> (a java.net.SocksSocketImpl)
  at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:196)
  at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:178)
  at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:356)
  at java.net.Socket.connect(Socket.java:605)
  at com.android.okhttp.internal.Platform.connectSocket(Platform.java:113)
  at com.android.okhttp.Connection.connectSocket(Connection.java:196)
  at com.android.okhttp.Connection.connect(Connection.java:172)
  at com.android.okhttp.Connection.connectAndSetOwner(Connection.java:367)
  at com.android.okhttp.OkHttpClient$1.connectAndSetOwner(OkHttpClient.java:130)
  at com.android.okhttp.internal.http.HttpEngine.connect(HttpEngine.java:329)
  at com.android.okhttp.internal.http.HttpEngine.sendRequest(HttpEngine.java:246)
  at com.android.okhttp.internal.huc.HttpURLConnectionImpl.execute(HttpURLConnectionImpl.java:457)
  at com.android.okhttp.internal.huc.HttpURLConnectionImpl.connect(HttpURLConnectionImpl.java:126)
  at *.*.HttpRequest.request(PushRequest.java:79)
  - locked <0x08483a94> (a java.lang.Class<*.*.HttpRequest>)
  at *.*.HttpRequest.access$200(PushRequest.java:23)
  at *.*.HttpRequest$1.run(PushRequest.java:63)
  - locked <0x0fde7bbf> (a java.lang.Object)
  at java.lang.Thread.run(Thread.java:761)
```

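Searching for that keyword leads to `- locked <0x08483a94> (a java.lang.Class<*.*.HttpRequest>)`, and the frame that took the lock is `at *.*.HttpRequest.request(PushRequest.java:79)`. At this point the problem is easy to understand. This thread (tid=24, state Native, stuck in `PlainSocketImpl.socketConnect`) is executing `conn.connect()`, and the request method containing that line is a synchronized static method, so the thread holds the Class lock. How long `conn.connect()` takes is unpredictable and depends on the network; on a poor connection it can wait for the full 30-second connect timeout.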
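Finding the holder by searching for `<0x08483a94>` can be scripted as well. This is a hypothetical helper (names made up) that assumes the dump layout shown above: it remembers the most recent thread header (a line starting with `"`) and returns it when a `- locked <address>` line for the given address turns up. The main thread's own `- waiting to lock <...>` line contains the same address but is not mistaken for the holder.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

class LockOwnerFinder {

    /** Returns the header of the first thread whose stack shows "- locked <address>", or null. */
    static String findHolderHeader(List<String> traceLines, String address) {
        String lockedMarker = "- locked <" + address + ">";
        String currentHeader = null;
        for (String line : traceLines) {
            if (line.startsWith("\"")) {
                currentHeader = line;                 // start of a new thread section
            } else if (line.trim().startsWith(lockedMarker)) {
                return currentHeader;                 // this thread holds the monitor
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java LockOwnerFinder traces.txt 0x08483a94
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        System.out.println(findHolderHeader(lines, args[1]));
    }
}
```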

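When the main thread calls this class's `public synchronized static void request(final PushRequestParams params, final RequestCallback callback)`, that method is synchronized too, and since it is static, it synchronizes on the Class object, i.e. it needs the Class lock. That lock is already held by the thread running `private synchronized static void request(String host, String sessionId, RequestCallback callback)`, which is in the middle of an HTTP request of unpredictable duration. The two methods therefore exclude each other, even though the public one does nothing more than start a new thread.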
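The effect can be reproduced in a few lines of plain Java. In this sketch (names made up, and `Thread.sleep` standing in for a slow `conn.connect()`), one thread sits inside a `static synchronized` method, and a caller of a different `static synchronized` method of the same class has to wait until it returns, because both lock the same `Class` object. `Thread.holdsLock` confirms which object that lock is.

```java
import java.util.concurrent.CountDownLatch;

class ClassLockContention {

    // Same as: static void slowRequest(...) { synchronized (ClassLockContention.class) { ... } }
    static synchronized void slowRequest(CountDownLatch entered, long millis) {
        entered.countDown();
        try {
            Thread.sleep(millis); // like a blocking connect(), sleep() does not release the monitor
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Plays the role of the public request(...): it does almost nothing,
    // but it still has to acquire ClassLockContention.class first.
    static synchronized boolean publicRequest() {
        return Thread.holdsLock(ClassLockContention.class);
    }

    /** Returns how long, in milliseconds, the calling thread waited to enter publicRequest(). */
    static long measureWait(long slowMillis) throws InterruptedException {
        CountDownLatch entered = new CountDownLatch(1);
        Thread worker = new Thread(() -> slowRequest(entered, slowMillis), "worker");
        worker.start();
        entered.await();                 // the worker now holds the Class lock
        long start = System.nanoTime();
        publicRequest();                 // blocks until slowRequest() returns
        long waitedMillis = (System.nanoTime() - start) / 1_000_000;
        worker.join();
        return waitedMillis;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("holds the Class lock: " + publicRequest());
        System.out.println("waited ~" + measureWait(2000) + " ms behind a 2000 ms request");
    }
}
```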

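This is not mutual exclusion in the strict sense: the HTTP request is bounded by its timeouts (30 seconds to connect, 30 seconds to read), after which the lock is released and other threads can acquire it. But it breaks a rule of the system: outside this class, request is called on the main thread (here from a broadcast receiver, as `LoadedApk$ReceiverDispatcher$Args.run` in the main thread's stack shows), so the main thread has to wait. Under Android's ANR mechanism, once the main thread is blocked longer than the allowed limit (5 seconds for an input event, 10 seconds for a BroadcastReceiver), no other events can be processed, and the system reports an ANR.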
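One more detail can keep the lock held longer than a single request: in the original code the retries are started from the callback, which is invoked inside the `static synchronized` method (in its `finally` block) and re-enters it on the same thread. Java monitors are reentrant, so the Class lock is never released between attempts. The sketch below checks this with made-up names and an attempt that always fails; for brevity the callback retries with an `if` rather than the original `while`. A second thread that starts waiting during the first attempt only gets the lock after the initial attempt and all three retries have finished, so it sees 4 attempts.

```java
import java.util.concurrent.CountDownLatch;

class ReentrantRetry {

    interface Callback {
        void onResult(boolean success);
    }

    private static int attempts = 0; // guarded by ReentrantRetry.class

    // Stand-in for the private static synchronized request(...): every attempt fails.
    static synchronized void request(CountDownLatch firstAttempt, Callback cb) {
        attempts++;
        firstAttempt.countDown();
        try {
            Thread.sleep(50); // simulated network time, with the Class lock held
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        cb.onResult(false); // the callback runs while the lock is still held
    }

    static synchronized int attemptsSoFar() {
        return attempts;
    }

    /** Returns how many attempts had run by the time another thread could take the lock. */
    static int attemptsSeenByOtherThread() throws InterruptedException {
        CountDownLatch firstAttempt = new CountDownLatch(1);
        Callback retryUpToThreeTimes = new Callback() {
            private int retries = 0;

            @Override
            public void onResult(boolean success) {
                if (!success && retries < 3) {
                    retries++;
                    request(firstAttempt, this); // same thread re-enters: monitors are reentrant
                }
            }
        };
        Thread worker = new Thread(() -> request(firstAttempt, retryUpToThreeTimes), "worker");
        worker.start();
        firstAttempt.await();       // the worker is inside its first attempt
        int seen = attemptsSoFar(); // blocks until the whole retry chain has unwound
        worker.join();
        return seen;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("attempts before the lock was free: " + attemptsSeenByOtherThread());
    }
}
```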

With the cause known, I changed the code as follows:

```java
import android.os.Process;
import android.util.Log;

import org.json.JSONException;
import org.json.JSONObject;

import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/**
 * HTTP request handling.
 * PushRequestParams, RequestCallback and streamToString() are defined elsewhere in the project (not shown).
 */
public class HttpRequest {

    // A single worker thread: calls from outside are queued instead of waiting on a lock.
    private static final ThreadPoolExecutor mThreadPool = new ThreadPoolExecutor(1, 1, 10, TimeUnit.SECONDS,
            new LinkedBlockingQueue<Runnable>(), Executors.defaultThreadFactory(),
            new ThreadPoolExecutor.DiscardOldestPolicy());

    // No longer synchronized: the caller (e.g. the main thread) only enqueues a task and returns.
    public static void request(final PushRequestParams params, final RequestCallback callback) {
        mThreadPool.execute(new Runnable() {

            // Per-task retry counter, so no cross-thread synchronization is needed.
            int requestCount = 0;

            @Override
            public void run() {
                Process.setThreadPriority(Process.THREAD_PRIORITY_BACKGROUND);
                requestCount = 0;
                final RequestCallback cb = new RequestCallback() {
                    @Override
                    public void onRequestCallback(int state, String s) {
                        if (state == STATE_SUCCESS) {
                            // Request succeeded
                            if (callback != null) {
                                callback.onRequestCallback(state, s);
                            }
                        } else {
                            // This check must come before the retry below
                            if (requestCount >= 3 && callback != null) {
                                // Request failed after all retries
                                callback.onRequestCallback(state, s);
                            }

                            // Retry up to three times after a failure. This is an `if`, not the
                            // original `while`: each retry re-enters this callback recursively, and a
                            // `while` would keep sending requests (and success callbacks) after a
                            // retry had already succeeded.
                            if (requestCount < 3) {
                                ++requestCount;
                                request(params.toString(), params.sessionId, this);
                            }
                        }
                    }
                };

                request(params.toString(), params.sessionId, cb);
            }
        });
    }

    private static void request(String host, String sessionId, RequestCallback callback) {
        boolean isSuccess = false;
        String s = null;
        try {
            URL url = new URL(host);
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("GET");
            conn.setConnectTimeout(30 * 1000);
            conn.setReadTimeout(30 * 1000);
            conn.setRequestProperty("Cookie", "JSESSIONID=" + sessionId);
            conn.connect(); // still slow on a bad network, but no lock is held any more
            int state = conn.getResponseCode();
            if (BuildConfig.DEBUG) {
                Log.d("may", "request state: " + state);
            }

            if (state == HttpURLConnection.HTTP_OK) {
                s = streamToString(conn.getInputStream());
                if (BuildConfig.DEBUG) {
                    Log.d("may", "result: " + s);
                }

                try {
                    JSONObject jobj = new JSONObject(s);
                    int code = jobj.optInt("code", -1);
                    if (code == 0) {
                        isSuccess = true;
                    }
                } catch (JSONException e) {
                    e.printStackTrace();
                }
            }

        } catch (MalformedURLException e) {
            e.printStackTrace();
            s = e.toString();
        } catch (IOException e) {
            e.printStackTrace();
            s = e.toString();
        } finally {
            if (callback != null) {
                callback.onRequestCallback(isSuccess ? RequestCallback.STATE_SUCCESS : RequestCallback.STATE_ERROR, s == null ? "" : s);
            }
        }
    }
}
```

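The fix uses a thread pool with a single thread: when request is called repeatedly from outside, the tasks simply wait in the queue. The public request method is no longer synchronized, so the caller (the main thread) only enqueues a task and never waits for a lock. The retry counter requestCount now lives in the newly created Runnable object, so each request has its own counter and there is no need to worry about multi-threaded synchronization.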
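For comparison, the same idea can be written without recursion. This is a sketch, not the author's code: `Executors.newSingleThreadExecutor()` replaces the hand-built pool, a loop replaces the recursive callback, and a `BooleanSupplier` stands in for one HTTP attempt (a hypothetical stand-in for `request(host, sessionId, callback)`). The caller never blocks, each task keeps its own attempt counter, and the final callback fires exactly once.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.BooleanSupplier;
import java.util.function.Consumer;

class SerialRetryingRequester {

    private static final int MAX_RETRIES = 3;

    // One worker thread; tasks submitted while it is busy wait in an unbounded queue.
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    /**
     * Queues one request and returns immediately, so a main-thread caller never waits on a lock.
     * {@code attempt} performs one try and reports success; {@code callback} gets the final result.
     */
    void request(BooleanSupplier attempt, Consumer<Boolean> callback) {
        executor.execute(() -> {
            boolean success = false;
            // The counter is local to this task: nothing is shared, nothing needs synchronizing.
            for (int tries = 0; tries <= MAX_RETRIES && !success; tries++) {
                success = attempt.getAsBoolean(); // 1 initial try + up to MAX_RETRIES retries
            }
            callback.accept(success); // exactly once, on the worker thread
        });
    }

    void shutdown() {
        executor.shutdown();
    }
}
```

Two design notes. Like the author's `ThreadPoolExecutor`, `newSingleThreadExecutor()` uses an unbounded `LinkedBlockingQueue`, so tasks are never rejected while the pool is running; with such a queue the `DiscardOldestPolicy` in the original never gets to discard anything. And the callback runs on the background thread, so on Android any UI work inside it still has to be posted back to the main thread, for example through a `Handler` created with `Looper.getMainLooper()`.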

There was a small detour along the way.

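When I ran `adb pull /data/anr/traces.txt D:\` to fetch the ANR log, it failed with `adb: error: cannot create file/directory 'D:\': No such file or directory`. My first reaction was to point it at the `D:\temp` directory instead, which does exist on the PC's D: drive, so I ran `adb pull /data/anr/traces.txt D:\temp`, and it worked. Later, someone in a chat group hit the same error, asked how to fix it, and then vented about how unreliable online articles are, since following them step by step still led to errors. My advice: when something goes wrong, think it through first and read the error message before rushing to a search engine; the answer is in the error message.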
