Command Not Found: October 2011

Monday, October 31, 2011

[C++] Wrapping a C++ method in C (inline)

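Conclusion first: this is probably not feasible, because it amounts to embedding C++ directly in C. An inline function normally has to be defined in a header file so that users can include it directly.

In this case that would mean the C program includes the C++ library directly, so it cannot work.

What if the function is instead defined in a .cpp file, compiled to a .o, and then linked with the C program?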


// this is foo.cpp
#ifdef __cplusplus
extern "C" {
#endif

#include "foo.h"

int foo()
{
    int i = 0;

    return i++;
}

#ifdef __cplusplus
}
#endif


// this is foo.h
inline int foo();

Next, main.c:


#include <stdio.h>
#include "foo.h"

int main()
{
    int t = foo();

    printf("%d\n", t);
    return 0;
}

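However, linking main.o and foo.o fails because the symbol foo cannot be found, and running strings on foo.o does not find it either. The reason is that foo is declared inline in foo.cpp: a C++ compiler normally emits an out-of-line copy of an inline function only if the translation unit actually uses it, and nothing in foo.cpp calls foo, so the function never appears in foo.o.

So wrapping a C++ library in C and keeping it inline is probably not feasible. It can be done, however, with a wrapper function that returns a function pointer.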
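The function-pointer approach could look roughly like the sketch below. The names foo_fn, foo_impl and get_foo are hypothetical and chosen only for illustration; the idea is that taking the address of the inline function forces the compiler to emit it, while the non-inline wrapper with C linkage is the symbol the C program links against.

```cpp
// wrapper.cpp: a minimal sketch; foo_fn, foo_impl and get_foo are hypothetical names
extern "C" {

typedef int (*foo_fn)(void);

// Still inline on the C++ side. Taking its address in get_foo() below
// forces the compiler to emit an out-of-line copy into the object file.
inline int foo_impl(void)
{
    int i = 0;

    return i++;   // post-increment: returns 0, same as the original foo()
}

// Non-inline wrapper with C linkage: this is the symbol the C code links against.
foo_fn get_foo(void)
{
    return &foo_impl;
}

}  // extern "C"
```

On the C side the caller would declare the same typedef plus `foo_fn get_foo(void);` and call `get_foo()()`. The call goes through a pointer, so the C compiler does not inline it; only the C++ side keeps the inline definition.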

Monday, October 24, 2011

[C++] pthread_cancel issue

Today I ran into something strange: the program aborted when calling pthread_cancel. Below is a simplified version of the code, which creates a thread and then cancels it.


#include <iostream>
#include <pthread.h>
#include <unistd.h>
using namespace std;

void* sleepyThread(void*)
{
  try
  {
      cerr << "enter sleep" << endl;
      sleep(20);
  }
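  catch(...)
  {
     // swallowing the forced-unwind exception here (no rethrow) is what leads to the abort
     cerr <<"catch all";
  }
  return NULL;
}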

int main()
{
  pthread_t thread;
  int id=pthread_create(&thread, NULL, &sleepyThread, NULL);

  cerr<<"lets try to cancel it..."<< id << endl;
  sleep(1);
  pthread_cancel(thread);
  pthread_join(thread, NULL);
}

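Experimenting showed that everything works if sleepyThread does not catch all exceptions, or if pthread_join is not called. Otherwise, as soon as the child thread has a catch-all handler, the program aborts.
According to http://stackoverflow.com/questions/4766768/unhandled-forced-unwind-causes-abort,
calling pthread_cancel raises a forced-unwind exception in the target thread, and that exception must be rethrown or things go wrong.
pthread_cancel itself is asynchronous: it only marks the thread as cancelled and returns, and the thread acts on the request later, at a cancellation point. So the earlier experiment without pthread_join probably did not abort only because the thread had not yet acted on the request before main returned and the process exited.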
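To make the notion of a cancellation point concrete, here is a minimal sketch, assuming the default deferred cancellation type. The function names spinUntilCancelled and cancel_spinning_thread are hypothetical. The loop itself never blocks, so the request from pthread_cancel would stay pending forever without the explicit pthread_testcancel call.

```cpp
#include <cstddef>
#include <pthread.h>

// Hypothetical worker: a busy loop that contains no blocking call.
void* spinUntilCancelled(void*)
{
  for (;;)
  {
      // Explicit cancellation point: a pending cancel request is acted on here.
      pthread_testcancel();
  }
  return NULL;  // never reached
}

// Creates the worker, requests cancellation and waits for it to finish.
// Returns true if the thread really ended because of the cancel request.
bool cancel_spinning_thread()
{
  pthread_t thread;
  if (pthread_create(&thread, NULL, &spinUntilCancelled, NULL) != 0)
      return false;

  pthread_cancel(thread);         // only marks the thread as cancelled
  void* result = NULL;
  pthread_join(thread, &result);  // returns once the thread has unwound
  return result == PTHREAD_CANCELED;
}
```

A cancelled thread reports PTHREAD_CANCELED as its exit value, which is what the join checks.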

Let's look at the backtrace at the time of the abort. gdb shows that the last thing called was unwind_cleanup:


(gdb) r
Starting program: /home/ytshen/a.out
[Thread debugging using libthread_db enabled]
[New Thread 0x7ffff709c700 (LWP 17360)]
lets try to cancel it...0
enter sleep
helloFATAL: exception not rethrown

Program received signal SIGABRT, Aborted.
[Switching to Thread 0x7ffff709c700 (LWP 17360)]
0x00007ffff70d0ba5 in raise () from /lib/libc.so.6
(gdb) bt
#0  0x00007ffff70d0ba5 in raise () from /lib/libc.so.6
#1  0x00007ffff70d46b0 in abort () from /lib/libc.so.6
#2  0x00007ffff7bcd311 in unwind_cleanup () from /lib/libpthread.so.0
#3  0x0000000000400b81 in sleepyThread(void*) ()
#4  0x00007ffff7bc6971 in start_thread () from /lib/libpthread.so.0
#5  0x00007ffff718392d in clone () from /lib/libc.so.6
#6  0x0000000000000000 in ?? ()

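So next I went straight to the pthread source. It registers unwind_cleanup, which gets called if the exception is caught and not rethrown.
Looking at the pthread_cancel function, what it does underneath is find the target thread's id and send it a signal. (From http://sourceware.org/git/?p=glibc.git;a=blob_plain;f=nptl/pthread_cancel.c;hb=HEAD)

(pthread is pure C, but when it is used from C++, the implementation uses a signal handler and throws an exception from inside it. This is presumably because C++ has to destroy all local variables when a scope is left, so an exception-style unwind is used. This can be seen in the pthread_create code below, which resets the signal mask inherited from the parent.)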


/* If the parent was running cancellation handlers while creating
     the thread the new thread inherited the signal mask.  Reset the
     cancellation signal mask.  */
 if (__builtin_expect (pd->parent_cancelhandling & CANCELING_BITMASK, 0))
...
__sigemptyset (&mask);
__sigaddset (&mask, SIGCANCEL);

Ans:

http://kolpackov.net/projects/glibc/cxx-unwind/

http://groups.google.com/group/comp.programming.threads/browse_thread/thread/652bcf186fbbf697/f63757846514e5e5?pli=1

The code below shows that when the exception is caught and not rethrown, abort is called:


// From http://sourceware.org/git/?p=glibc.git;a=blob;f=nptl/unwind.c
unwind_cleanup (_Unwind_Reason_Code reason, struct _Unwind_Exception *exc)
{
  /* When we get here a C++ catch block didn't rethrow the object.  We
     cannot handle this case and therefore abort.  */
# define STR_N_LEN(str) str, strlen (str)
  INTERNAL_SYSCALL_DECL (err);
  INTERNAL_SYSCALL (write, err, 3, STDERR_FILENO,
                    STR_N_LEN ("FATAL: exception not rethrown\n"));
  abort ();
}
...
void
__cleanup_fct_attribute __attribute ((noreturn))
__pthread_unwind (__pthread_unwind_buf_t *buf)
{
  struct pthread_unwind_buf *ibuf = (struct pthread_unwind_buf *) buf;
  struct pthread *self = THREAD_SELF;

#ifdef HAVE_FORCED_UNWIND
  /* This is not a catchable exception, so don't provide any details about
     the exception type.  We do need to initialize the field though.  */
  THREAD_SETMEM (self, exc.exception_class, 0);
  THREAD_SETMEM (self, exc.exception_cleanup, unwind_cleanup);

  _Unwind_ForcedUnwind (&self->exc, unwind_stop, ibuf);
#else

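Conclusion:
When writing C++ code that catches all exceptions, you still have to rethrow at the end, or else simply never cancel threads in C++ at all. Otherwise unexpected things can happen in multi-threaded programs!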
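A minimal sketch of the rethrow variant follows, assuming GCC with libstdc++ on Linux/glibc, where the forced unwind can be caught as abi::__forced_unwind from <cxxabi.h>. The Guard type, the g_cleanup_ran flag and cancel_sleepy_thread are hypothetical names added only to show that local destructors still run during cancellation.

```cpp
#include <cstddef>
#include <cxxabi.h>   // abi::__forced_unwind (GCC/libstdc++ specific)
#include <pthread.h>
#include <unistd.h>

static bool g_cleanup_ran = false;  // hypothetical flag, set by Guard's destructor

struct Guard
{
  ~Guard() { g_cleanup_ran = true; }
};

void* sleepyThread(void*)
{
  Guard guard;  // destroyed while the cancellation unwinds the stack
  try
  {
      sleep(20);  // cancellation point: the cancel request is acted on here
  }
  catch (abi::__forced_unwind&)
  {
      throw;  // cancellation in progress: must be rethrown
  }
  catch (...)
  {
      // ordinary C++ exceptions can still be handled here
  }
  return NULL;
}

// Returns true if the thread was cancelled cleanly and its locals were destroyed.
bool cancel_sleepy_thread()
{
  pthread_t thread;
  if (pthread_create(&thread, NULL, &sleepyThread, NULL) != 0)
      return false;

  pthread_cancel(thread);
  void* result = NULL;
  pthread_join(thread, &result);
  return result == PTHREAD_CANCELED && g_cleanup_ran;
}
```

A plain `catch (...) { throw; }` would also avoid the abort; catching abi::__forced_unwind separately just keeps the catch-all free to swallow ordinary exceptions.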

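Question:
Why does the abort only happen with pthread_join? Judging from the code, pthread_cancel alone should already trigger the catch-all handler, so why does it only show up with pthread_join...
I did not find any similar code in pthread_join, but everyone recommends against cancelling threads in C++!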
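One common alternative to cancelling a thread is to ask it to stop and let it return normally. The sketch below assumes C++11 (std::thread and std::atomic), which is newer than the pthread code in this post; Worker and run_and_stop_worker are hypothetical names.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical worker that polls a stop flag instead of being cancelled.
struct Worker
{
  std::atomic<bool> stop_requested{false};
  std::atomic<int> iterations{0};

  void run()
  {
      while (!stop_requested.load())
      {
          ++iterations;  // stand-in for one unit of real work
          std::this_thread::sleep_for(std::chrono::milliseconds(1));
      }
      // Leaving run() normally destroys locals without any forced unwind.
  }
};

// Starts the worker, waits until it has done some work, then stops it.
// Returns the number of loop iterations the worker completed.
int run_and_stop_worker()
{
  Worker w;
  std::thread t(&Worker::run, &w);

  while (w.iterations.load() == 0)
      std::this_thread::yield();

  w.stop_requested.store(true);  // cooperative stop request
  t.join();
  return w.iterations.load();
}
```

The cost is that the worker only notices the request between units of work, so a long blocking call delays shutdown; in exchange no exception has to pass through catch-all handlers.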

Reference:

http://stackoverflow.com/questions/4766768/unhandled-forced-unwind-causes-abort
http://stackoverflow.com/questions/4760687/cancelling-a-thread-using-pthread-cancel-good-practice-or-bad
http://udrepper.livejournal.com/21541.html
http://sourceware.org/git/?p=glibc.git;a=blob;f=nptl/pthread_cancel.c;h=55bb0da922ba1ed1c4bd33478075e1b41f2baaff;hb=3a33e487eeb65e2f1f633581c56bee2c60d0ca43
http://skaark.wordpress.com/2010/08/26/pthread_cancel-considered-harmful/

Sunday, October 16, 2011

[RabbitMQ] rabbit_mnesia module

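All of RabbitMQ's metadata (e.g. exchanges, users, virtual hosts), in other words the server state, is stored in Mnesia, Erlang's built-in distributed database, and is kept on every node in the cluster. This module is mainly responsible for initializing the Mnesia database when a node starts: if other cluster nodes already exist, it synchronizes with them; otherwise it creates new tables itself.

It also checks whether the Erlang versions match, and handles what has to happen when a node is reset or switches between RAM mode and disc mode.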

Saturday, October 15, 2011

[RabbitMQ] HA on cluster

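It turns out that when RabbitMQ runs as a cluster, all metadata is stored on every node, with the exception of queues. So when a node goes down, all messages in the queues on that node are lost.

Starting with 2.6, RabbitMQ supports active-active HA. Using it is simple: just add the "x-ha-policy" argument when declaring a queue.

HA is achieved with mirrored queues, which means you can choose which other nodes a given queue is mirrored to.

As for how mirroring works: the master node is mirrored to the slave nodes, and syncing only starts once a slave has joined. Messages that already existed on the master before that are not synced.

If the master goes down, the slave node that has existed the longest is chosen as the next master.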
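The two rules above (a slave only receives messages published after it joined, and the oldest slave is promoted) can be illustrated with a toy model. This is not RabbitMQ code; MirroredQueue, Mirror and every member name are hypothetical and only model the behaviour described in this post.

```cpp
#include <string>
#include <vector>

// Toy model only: one copy of the queue living on one node.
struct Mirror
{
  std::string node;
  std::vector<std::string> messages;
};

class MirroredQueue
{
public:
  explicit MirroredQueue(const std::string& master_node)
  {
      mirrors_.push_back(Mirror{master_node, {}});
  }

  // A new slave starts empty: messages already on the master are not copied.
  void add_slave(const std::string& node)
  {
      mirrors_.push_back(Mirror{node, {}});
  }

  // A publish reaches the master and every slave that has joined so far.
  void publish(const std::string& msg)
  {
      for (Mirror& m : mirrors_)
          m.messages.push_back(msg);
  }

  // Drop the master; the oldest slave (next in join order) takes over.
  bool fail_master()
  {
      if (mirrors_.size() < 2)
          return false;  // no slave left to promote
      mirrors_.erase(mirrors_.begin());
      return true;
  }

  const Mirror& master() const { return mirrors_.front(); }

private:
  std::vector<Mirror> mirrors_;  // index 0 is the master, the rest are slaves in join order
};
```

In this model a message published before any slave joined is lost when the master fails, which matches the note that pre-existing messages are not synced.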

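As of the current 2.6.1, "x-ha-policy" supports two modes:

1. all

Messages arriving in the queue from now on are mirrored to all other nodes in the cluster!

2. nodes

You choose which nodes to mirror to, even nodes that are not currently in the cluster; mirroring starts once such a node joins the cluster.
This needs one additional argument, "x-ha-policy-params", with the node list as its value.
!! This currently has a bug and will not be fixed until the next release !!

Below is an example using pika, a Python AMQP client API: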

    channel.exchange_declare(exchange='test.topic', type='topic')
    channel.queue_declare(queue="test_topic", durable=True,
                          exclusive=False, auto_delete=False,
                          arguments={'x-ha-policy': 'all'},
                          callback=on_queue_declared)

Reference:
http://www.rabbitmq.com/ha.html
