Monday, October 24, 2011

[C++] pthread_cancel issue

Today I ran into a strange situation: a program aborted when it called pthread_cancel. Below is a simplified version of the code. It creates a thread and cancels it shortly afterwards.


#include <iostream>
#include <pthread.h>
#include <unistd.h>
using namespace std;

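void* sleepyThread(void*)
{
    try
    {
        cerr << "enter sleep" << endl;
        sleep(20);  // cancellation point: the cancel request is acted on here
    }
    catch(...)
    {
        // Catches the forced unwind and does not re-throw: this is what triggers the abort
        cerr << "catch all";
    }
    return NULL;
}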

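int main()
{
    pthread_t thread;
    // pthread_create returns 0 on success (an error number otherwise), not a thread id
    int rc = pthread_create(&thread, NULL, &sleepyThread, NULL);

    cerr << "lets try to cancel it..." << rc << endl;
    sleep(1);  // give the thread time to enter sleep()
    pthread_cancel(thread);
    pthread_join(thread, NULL);
    return 0;
}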

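After some experiments, I found that the program runs normally if sleepyThread does not catch all exceptions, or if pthread_join is not called. Otherwise, as soon as the child thread has a catch-all handler, it aborts.
From http://stackoverflow.com/questions/4766768/unhandled-forced-unwind-causes-abort I learned that calling pthread_cancel produces an unwind exception in the target thread, and a catch block that catches it must re-throw it, otherwise the program aborts.
Basically, pthread_cancel works asynchronously: it only marks the thread as canceled, and the thread acts on the request later, at a cancellation point. So removing pthread_join in the earlier experiment avoided the abort only because main returned and the process exited before the thread got to act on the cancellation request.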
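As a minimal sketch of the re-throw rule above, assuming glibc/NPTL and GCC: the catch-all handler does its logging and then re-throws with `throw;`, so the forced unwind can continue and the thread ends as canceled instead of aborting. The names politeThread and cancelPolitely are hypothetical, made up for this illustration.

```cpp
#include <iostream>
#include <pthread.h>
#include <unistd.h>

// Hypothetical thread function: same shape as sleepyThread, but the
// catch-all handler re-throws so the forced unwind can continue.
void* politeThread(void*)
{
    try
    {
        sleep(20);  // cancellation point
    }
    catch (...)
    {
        std::cerr << "cleaning up, then rethrowing" << std::endl;
        throw;      // required: do not swallow the cancellation unwind
    }
    return NULL;
}

// Hypothetical helper: creates the thread, cancels it, and reports
// whether the thread really terminated through cancellation.
bool cancelPolitely()
{
    pthread_t thread;
    if (pthread_create(&thread, NULL, &politeThread, NULL) != 0)
        return false;

    sleep(1);  // give the thread time to enter sleep()
    pthread_cancel(thread);

    void* result = NULL;
    if (pthread_join(thread, &result) != 0)
        return false;
    return result == PTHREAD_CANCELED;
}
```

Note that a bare `throw;` also lets ordinary C++ exceptions escape the thread function, which would terminate the program. The Stack Overflow thread linked above discusses ways to tell the forced unwind apart from ordinary exceptions.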

Let's look at the backtrace at the time of the abort. gdb shows that the last function called before abort is unwind_cleanup:


(gdb) r
Starting program: /home/ytshen/a.out
[Thread debugging using libthread_db enabled]
[New Thread 0x7ffff709c700 (LWP 17360)]
lets try to cancel it...0
enter sleep
helloFATAL: exception not rethrown

Program received signal SIGABRT, Aborted.
[Switching to Thread 0x7ffff709c700 (LWP 17360)]
0x00007ffff70d0ba5 in raise () from /lib/libc.so.6
(gdb) bt
#0 0x00007ffff70d0ba5 in raise () from /lib/libc.so.6
#1 0x00007ffff70d46b0 in abort () from /lib/libc.so.6
#2 0x00007ffff7bcd311 in unwind_cleanup () from /lib/libpthread.so.0
#3 0x0000000000400b81 in sleepyThread(void*) ()
#4 0x00007ffff7bc6971 in start_thread () from /lib/libpthread.so.0
#5 0x00007ffff718392d in clone () from /lib/libc.so.6
#6 0x0000000000000000 in ?? ()

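So next I went straight to the pthread code. It registers unwind_cleanup, which is called when a catch block catches the exception and does not re-throw it.
Reading the pthread_cancel function shows that underneath it looks up the thread id and sends that thread a signal. (From http://sourceware.org/git/?p=glibc.git;a=blob_plain;f=nptl/pthread_cancel.c;hb=HEAD)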

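(pthread is pure C, but when it is used from C++, the cancellation goes through a signal handler that throws an exception. This is presumably because C++ requires every local variable to be destroyed when its scope is left, so an exception-style unwind is used. The pthread_create function below hints at this: it resets the cancellation signal mask inherited from the parent.)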
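A small sketch of what this means for local variables, assuming glibc/NPTL and GCC (other C libraries may cancel a thread without unwinding its stack at all). ScopeGuard, guardedThread, cancelWithLocalObject and g_destructorRan are hypothetical names used only for this illustration.

```cpp
#include <pthread.h>
#include <unistd.h>

// Set by the destructor below; read only after pthread_join.
static bool g_destructorRan = false;

// Hypothetical local object whose destructor records that it ran.
struct ScopeGuard
{
    ~ScopeGuard() { g_destructorRan = true; }
};

void* guardedThread(void*)
{
    ScopeGuard guard;  // lives on the thread's stack
    sleep(20);         // cancellation point: the unwind starts here
    return NULL;
}

// Hypothetical helper: cancels the thread and reports whether it
// terminated through cancellation.
bool cancelWithLocalObject()
{
    pthread_t thread;
    if (pthread_create(&thread, NULL, &guardedThread, NULL) != 0)
        return false;

    pthread_cancel(thread);  // stays pending until sleep() is reached

    void* result = NULL;
    if (pthread_join(thread, &result) != 0)
        return false;
    return result == PTHREAD_CANCELED;
}
```

Under the stated assumptions, g_destructorRan should be true after cancelWithLocalObject() returns, because the unwind runs the destructor of guard on its way out. Since that is implementation-specific behavior, the helper itself only reports that the thread was canceled.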


/* If the parent was running cancellation handlers while creating
   the thread the new thread inherited the signal mask. Reset the
   cancellation signal mask. */
if (__builtin_expect (pd->parent_cancelhandling & CANCELING_BITMASK, 0))
...
__sigemptyset (&mask);
__sigaddset (&mask, SIGCANCEL);

Ans: 

http://kolpackov.net/projects/glibc/cxx-unwind/

http://groups.google.com/group/comp.programming.threads/browse_thread/thread/652bcf186fbbf697/f63757846514e5e5?pli=1

 

The code below shows that when the exception is caught and not re-thrown, abort is called:


// From http://sourceware.org/git/?p=glibc.git;a=blob;f=nptl/unwind.c
unwind_cleanup (_Unwind_Reason_Code reason, struct _Unwind_Exception *exc)
{
  /* When we get here a C++ catch block didn't rethrow the object. We
     cannot handle this case and therefore abort. */
# define STR_N_LEN(str) str, strlen (str)
  INTERNAL_SYSCALL_DECL (err);
  INTERNAL_SYSCALL (write, err, 3, STDERR_FILENO,
                    STR_N_LEN ("FATAL: exception not rethrown\n"));
  abort ();
}
...
void
__cleanup_fct_attribute __attribute ((noreturn))
__pthread_unwind (__pthread_unwind_buf_t *buf)
{
  struct pthread_unwind_buf *ibuf = (struct pthread_unwind_buf *) buf;
  struct pthread *self = THREAD_SELF;

#ifdef HAVE_FORCED_UNWIND
  /* This is not a catchable exception, so don't provide any details about
     the exception type. We do need to initialize the field though. */
  THREAD_SETMEM (self, exc.exception_class, 0);
  THREAD_SETMEM (self, exc.exception_cleanup, unwind_cleanup);

  _Unwind_ForcedUnwind (&self->exc, unwind_stop, ibuf);
#else

Conclusion:
When writing C++ code, a catch-all handler must still re-throw at the end. Alternatively, don't cancel threads in C++ at all. Otherwise unexpected behavior can show up in multi-threaded programs!
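For the second option, one common alternative to pthread_cancel is a stop flag that the worker polls. This is only a sketch of that idea and is not something taken from the glibc sources or the references: g_stopRequested, cooperativeWorker and stopCooperatively are hypothetical names, and the 10 ms usleep stands in for real work.

```cpp
#include <atomic>
#include <pthread.h>
#include <unistd.h>

// Hypothetical stop flag polled by the worker instead of being canceled.
static std::atomic<bool> g_stopRequested(false);
// Set by the worker on its normal exit path; read only after pthread_join.
static bool g_exitedNormally = false;

void* cooperativeWorker(void*)
{
    try
    {
        while (!g_stopRequested.load())
        {
            usleep(10 * 1000);  // placeholder for one unit of real work
        }
    }
    catch (...)
    {
        // Only ordinary C++ exceptions can arrive here: the thread is
        // never canceled, so no forced unwind has to be re-thrown.
    }
    g_exitedNormally = true;
    return NULL;
}

// Hypothetical helper: asks the worker to stop and waits for it.
bool stopCooperatively()
{
    pthread_t thread;
    if (pthread_create(&thread, NULL, &cooperativeWorker, NULL) != 0)
        return false;

    usleep(50 * 1000);            // let the worker run for a moment
    g_stopRequested.store(true);  // request a stop instead of canceling

    void* result = NULL;
    if (pthread_join(thread, &result) != 0)
        return false;
    return g_exitedNormally && result == NULL;
}
```

The trade-off is that the worker only stops between units of work, so a long blocking call delays the shutdown. In exchange, the catch-all handler here is harmless.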

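Question:
Why does the abort only happen with pthread_join? Judging from the code, pthread_cancel should already trigger the catch-all handler, so why does it only happen at pthread_join...
I did not see any similar code in pthread_join, but everyone recommends against canceling threads in C++ anyway!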
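One observation that may help with this question: in the gdb session above, SIGABRT is raised in the created thread (LWP 17360), with sleepyThread and unwind_cleanup on its stack, not inside pthread_join. pthread_cancel only posts a request, and with the default deferred cancellation the target thread acts on it when it reaches a cancellation point. pthread_join may simply keep the process alive long enough for that to happen. The sketch below illustrates the deferral; spinningThread, cancelIsDeferred and the g_ flags are hypothetical names.

```cpp
#include <atomic>
#include <pthread.h>

// Set by the main thread right after it has called pthread_cancel.
static std::atomic<bool> g_cancelRequested(false);
// Progress markers written by the thread; read only after pthread_join.
static bool g_ranAfterCancel = false;
static bool g_ranAfterTestcancel = false;

void* spinningThread(void*)
{
    // Busy loop with no cancellation point: a pending cancel cannot act here.
    while (!g_cancelRequested.load())
    {
    }
    g_ranAfterCancel = true;      // still running although cancel was requested

    pthread_testcancel();         // cancellation point: the thread ends here
    g_ranAfterTestcancel = true;  // not reached once the cancel is pending
    return NULL;
}

// Hypothetical helper: returns true if the thread kept running after
// pthread_cancel and stopped only at the cancellation point.
bool cancelIsDeferred()
{
    pthread_t thread;
    if (pthread_create(&thread, NULL, &spinningThread, NULL) != 0)
        return false;

    pthread_cancel(thread);         // posts the request and returns
    g_cancelRequested.store(true);  // now let the thread leave its loop

    void* result = NULL;
    if (pthread_join(thread, &result) != 0)
        return false;
    return result == PTHREAD_CANCELED
        && g_ranAfterCancel
        && !g_ranAfterTestcancel;
}
```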

Reference:

  • http://stackoverflow.com/questions/4766768/unhandled-forced-unwind-causes-abort
  • http://stackoverflow.com/questions/4760687/cancelling-a-thread-using-pthread-cancel-good-practice-or-bad
  • http://udrepper.livejournal.com/21541.html
  • http://sourceware.org/git/?p=glibc.git;a=blob;f=nptl/pthread_cancel.c;h=55bb0da922ba1ed1c4bd33478075e1b41f2baaff;hb=3a33e487eeb65e2f1f633581c56bee2c60d0ca43
  • http://skaark.wordpress.com/2010/08/26/pthread_cancel-considered-harmful/