September 19, 2005

A Brief Introduction to Priority Inversion

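In an earlier blog post, [Slides: Approaches to Realtime Linux], I shared the slides from an internal talk I gave. To my surprise, I woke up this morning to several emails, some of them from friends abroad. That also made me quite curious about who reads this blog, since most of its topics are fairly obscure...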

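Evidently, these emails all brought up the priority inversion that occurred during the Mars exploration mission. The slides describe it only roughly, mainly because the talk had to fit in 70 minutes, so this interesting topic got just a few words. In this era when "finding material is far faster than reading it", I think it is best to point directly to the following resources: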
  • [Introduction to Priority Inversion] by Michael Barr
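      [Michael Barr] needs no introduction from me; anyone immersed in the embedded systems field has probably read his work, and [Michael Barr's Embedded Systems Glossary], which he edits, is a classic. This article briefly notes that one of the most important features of a real-time system is multitasking support, which brings in resource sharing and locking, a challenge for a priority-based scheduler. Michael Barr illustrates it with the following diagram: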

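      Consider this situation: a low-priority task (L) holds a resource that a high-priority task (H) is waiting for, and a medium-priority task (M) preempts L. H and L share the resource, so H can proceed only after L finishes with it and releases it. Being preempted does not make L give the resource up; on the contrary, L cannot release it until it gets to run again. M never touches the resource, yet by preempting L it takes away the CPU time L needs to release it, and this is where things get interesting. H outranks M, but H now waits for as long as M chooses to run. If there are many tasks like M, or if M is a non-RT task, the delay can make H miss its deadline and effectively inverts the execution order of H relative to the lower-priority tasks. Worse, if H then fails, a watchdog may step in, and H's nominally high priority ends up counting for nothing.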

      In NASA's Mars mission, evidently, the landing software was not the problem; the trouble arose during operations on the Martian surface.

      On the spacecraft, multiple devices exchange data over a MIL-STD-1553 data bus. For the actual configuration, see [What really happened on Mars ? -- Authoritative Account] (by Glenn Reeves, Mars Pathfinder Flight Software Cognizant Engineer).

      The article describes the problem as follows:
        Activity on this bus was managed by a pair of high-priority tasks. One of the bus manager tasks communicated through a pipe with a low-priority meteorological science task.

        On Earth, the software mostly ran without incident. On Mars, however, a problem developed that was serious enough to trigger a series of software resets during the mission. The sequence of events leading to each reset began when the low-priority science task was preempted by a couple of medium-priority tasks while it held a mutex related to the pipe. While the low-priority task was preempted, the high-priority bus distribution manager tried to send more data to it over the same pipe. Because the mutex was still held by the science task, the bus distribution manager was made to wait. Shortly thereafter, the other bus scheduler became active. It noticed that the distribution manager hadn't completed its work for that bus cycle and forced a system reset.
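      During bus transactions, the task that handles distribution (bc_dist) plays the role of H above. The most critical task, bc_sched, controls the transactions on the 1553 data bus and naturally has the highest priority, while bc_dist has the third highest. bc_dist is expected to finish before bc_sched starts; if it has not, the system resets, a check designed in early on to guarantee that no data is lost. But once bc_dist is blocked by the much lower-priority ASI/MET task (L above), priority inversion sets in. A workaround for this does exist, but because the IPC mechanism is used extensively inside the software, all of this happened: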
        When the bc_sched task was activated, to setup the transactions for the next 1553 bus cycle, it detected that the bc_dist task had not completed its execution. The resource that caused this problem was a mutual exclusion semaphore used within the select() mechanism to control access to the list of file descriptors that the select() mechanism was to wait on.

        The select mechanism creates a mutual exclusion semaphore to protect the "wait list" of file descriptors for those devices which support select. The vxWorks pipe() mechanism is such a device and the IPC mechanism we used is based on using pipes. The ASI/MET task had called select, which had called pipeIoctl(), which had called selNodeAdd(), which was in the process of giving the mutex semaphore. The ASI/MET task was preempted and semGive() was not completed. Several medium priority tasks ran until the bc_dist task was activated. The bc_dist task attempted to send the newest ASI/MET data via the IPC mechanism which called pipeWrite(). pipeWrite() blocked, taking the mutex semaphore. More of the medium priority tasks ran, still not allowing the ASI/MET task to run, until the bc_sched task was awakened. At that point, the bc_sched task determined that the bc_dist task had not completed its cycle (a hard deadline in the system) and declared the error that initiated the reset.
      That is how each link in the chain led to the next.
  • [What really happened on Mars?]
      An analysis compiled by Michael B. Jones of Microsoft Research; the Related Links that follow it are also well worth reading.
  • [Priority inversion]
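
To make the H / M / L scenario from the first link concrete, here is a minimal sketch of a tick-based, fixed-priority preemptive scheduler with a single mutex. It is a toy simulation, not VxWorks or flight code: the names Task, simulate and scenario, the tick counts, and the task programs are all hypothetical and chosen only for illustration. The text above says only that a workaround exists; the inherit=True branch uses priority inheritance, one well-known remedy for priority inversion, as an assumed example of such a fix.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    name: str
    priority: int                      # larger number = higher priority
    release: int                       # tick at which the task becomes ready
    program: str                       # one char per CPU tick: 'c' = compute, 'x' = holds the mutex
    pc: int = 0                        # index of the next tick to execute
    finished_at: Optional[int] = None  # tick at which the task completed


def simulate(tasks, inherit=False, max_ticks=100):
    """Run the toy scheduler; return (timeline string, {name: finish tick})."""
    owner = None                       # task currently holding the single mutex
    timeline = []
    for tick in range(max_ticks):
        live = [t for t in tasks if t.release <= tick and t.finished_at is None]
        if not live:
            if all(t.finished_at is not None for t in tasks):
                break
            timeline.append("-")       # idle tick: nothing released yet
            continue

        def blocked(t):
            # A task is blocked if its next tick needs the mutex and someone else owns it.
            return t.program[t.pc] == "x" and owner is not None and owner is not t

        def effective(t):
            # With inheritance, the owner runs at the priority of the
            # highest-priority task it is currently blocking.
            if inherit and t is owner:
                return max([t.priority] + [w.priority for w in live if blocked(w)])
            return t.priority

        current = max((t for t in live if not blocked(t)), key=effective)
        if current.program[current.pc] == "x":
            owner = current            # take (or keep) the mutex
        current.pc += 1
        timeline.append(current.name)

        done = current.pc == len(current.program)
        if owner is current and (done or current.program[current.pc] != "x"):
            owner = None               # leaving the critical section releases the mutex
        if done:
            current.finished_at = tick + 1
    return "".join(timeline), {t.name: t.finished_at for t in tasks}


def scenario():
    """Hypothetical task set: L grabs the mutex, H then needs it, M just computes."""
    return [
        Task("L", priority=1, release=0, program="xxx"),
        Task("H", priority=3, release=1, program="xc"),
        Task("M", priority=2, release=2, program="cccc"),
    ]


if __name__ == "__main__":
    for inherit in (False, True):
        timeline, finished = simulate(scenario(), inherit=inherit)
        print(f"inherit={inherit}: {timeline} H finishes at tick {finished['H']}")
```

With these assumed numbers, the run without inheritance produces the timeline LLMMMMLHH: M occupies the CPU for four ticks while H sits blocked behind L, and H finishes only at tick 9. With inheritance the timeline becomes LLLHHMMMM and H finishes at tick 5, because L is temporarily boosted to H's priority and M can no longer preempt it. If H had a hard deadline between those two values, only the first run would miss it, which is the kind of miss that bc_sched detected before declaring the reset.
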
TODO: put together a well-illustrated slide deck
Posted by jserv on September 19, 2005 02:13 PM
Comments

Thank you for sharing. I would like to ask one thing:

What is the difference between mutual exclusion and a semaphore?

Posted by bwhite on October 7, 2005 05:04 PM