Performance Measurements for Multithreaded Programs

Minwen Ji, Edward W. Felten and Kai Li
Department of Computer Science, Princeton University, Princeton, NJ 08544, USA
{mji, felten, li}@
Abstract

Multithreaded programming is an effective way to exploit concurrency, but it is difficult to debug and tune a highly threaded program. This paper describes a performance tool called Tmon for monitoring, analyzing and tuning the performance of multithreaded programs. The performance tool has two novel features: it uses "thread waiting time" as a measure and constructs thread waiting graphs to show thread dependencies and thus performance bottlenecks, and it identifies "semi-busy-waiting" points where CPU cycles are wasted in condition checking and context switching. We have implemented the Tmon tool and used it to measure and tune a heavily threaded file system. We used four workloads to tune different aspects of the file system. We were able to improve the file system bandwidth and throughput significantly. In one case, we were able to improve the bandwidth by two orders of magnitude.

1 Introduction

Multithreading is a powerful technique to exploit parallelism on multiprocessors and to speed up execution on uniprocessors by overlapping CPU computation with I/O operations [4, 11, 12]. Nowadays, many systems and applications, such as Microsoft Windows, Microsoft Windows applications and Netscape Navigator, are multithreaded. However, it is nontrivial to achieve high performance in multithreading [15]. A good performance tool is important for performance tuning of multithreaded programs. A multithreaded program confronts more problems than a sequential program: competition for resources among threads, slowdown due to synchronization, context-switching overheads, conflicting considerations in scheduling, trade-offs in task distribution, nonoverlapped I/Os [2, 15], and higher-level interactions between these factors. In other words, multithreaded programs are difficult to debug and tune because of the existence of multiple threads of control and their data and timing dependencies.

An important issue in designing a performance tool is to decide what measures it should use for evaluating performance. The simplest approach is to look at the processor time taken by each part of the code, such as the context-switching overheads, synchronization overheads, and so on. These can be computed by profiling tools for sequential programs such as gprof, pixie, VTune and Etch. The drawback of this approach is that it does not measure the concurrency of a multithreaded program. Another way is to use the computation-to-communication ratio to measure the achieved parallelism [8]. This method is for parallel programs on parallel machines with explicit communication. It is not suitable for multithreaded programs on uniprocessors, because threads are in the same address space and share the same global variables, so there is no explicit communication among them. Other methods include computing the normalized processor time for each thread [2] and defining the speedup of P processors as the ratio of the execution time on a uniprocessor to that on P processors [5]. These measures alone are not enough to reveal the interactions and dependencies among threads.

This paper describes two measures used in our performance tool. The first is waiting time. A thread is said to be waiting when it is not consuming CPU cycles; unlike busy-waiting in the multiprocessor case, a thread's waiting time therefore does not show up in profiles. How much time do threads spend in waiting? Assume we have n threads in a process running on a uniprocessor and the total elapsed time for the whole process is T. Let T_run be the total running time for all n threads and T_wait be the total waiting time for all n threads. Assuming no other processes are running, we have T_run = T and T_wait = (n - 1) T. The waiting-to-running ratio thus increases linearly with n.
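To make the accounting above concrete (this is only a restatement of the argument, with an illustrative numerical example that is not taken from the paper): on a uniprocessor exactly one of the n threads can run at any instant while the remaining n - 1 must wait, so integrating over the elapsed time T gives

% Totals for n threads on a uniprocessor with no other processes running:
% at every instant exactly 1 thread runs and the other n-1 threads wait.
\begin{align*}
  T_{\mathrm{run}}  &= \int_{0}^{T} 1 \, dt = T, \\
  T_{\mathrm{wait}} &= \int_{0}^{T} (n-1) \, dt = (n-1)\,T, \\
  \frac{T_{\mathrm{wait}}}{T_{\mathrm{run}}} &= n-1 .
\end{align*}
% Example: n = 8 threads over T = 10 seconds of elapsed time accumulate
% 10 s of running time and 70 s of waiting time (ratio 7).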
Another intuitive reason for using waiting time as a measure is that most performance problems in multithreaded programs, such as competition for resources, synchronization and scheduling problems, result in excess waiting time. If threads frequently wait for a thread B, then improving B's performance is likely to speed up the application. One of our goals is to identify the most-waited-for threads and speed them up. Waiting times can be used to derive waiting graphs showing timing dependencies among threads.

An essential difference between waiting time and running time as a measure is that how long a thread has to wait is not directly decided by its own code. A thread often cannot continue its work due to some other part of the application that is not necessarily related to what this thread is doing. This makes it hard to explain why a thread spends so much time waiting by just looking at its own code. Therefore, we want to view waiting as waiting for "somebody," i.e. a thread, instead of waiting for "something" such as a lock or an event. Such a view can help us derive the relationships among waiting threads (waiters) and threads being waited for (waitees), and track down the real bottlenecks.

The second measure is called "semi-busy-waiting" time. When a thread is awakened, it is not guaranteed that it can continue its work. For example, when a thread waits for a resource, it may or may not be able to seize the resource when it is awakened, because another waiting thread may seize the resource before this thread does. Therefore, the thread needs to re-test the condition and go back to sleep again if the resource is unavailable. If such re-testing and sleeping happens often, it can be nearly as bad as busy-waiting. We call this semi-busy-waiting.
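As an illustration of where these wasted cycles come from, the sketch below shows the classic condition-variable re-test loop in POSIX threads. It is a generic example, not code from Tmon or from the file system studied in the paper, and the pool_t, pool_get and pool_put names are hypothetical.

#include <pthread.h>

/* Hypothetical buffer pool shared by several threads. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  available;
    int             free_buffers;   /* number of free buffers */
} pool_t;

/* Acquire one buffer, blocking until one is free. */
void pool_get(pool_t *p)
{
    pthread_mutex_lock(&p->lock);
    /* Every iteration of this loop after the first is a
       "semi-busy-waiting" point: the thread was woken up, re-tested
       the condition, found that another waiter had already taken the
       buffer, and must go back to sleep. */
    while (p->free_buffers == 0)
        pthread_cond_wait(&p->available, &p->lock);
    p->free_buffers--;
    pthread_mutex_unlock(&p->lock);
}

/* Return a buffer and wake the waiters.  With pthread_cond_broadcast,
   every waiter wakes up but at most one of them can get the buffer. */
void pool_put(pool_t *p)
{
    pthread_mutex_lock(&p->lock);
    p->free_buffers++;
    pthread_cond_broadcast(&p->available);
    pthread_mutex_unlock(&p->lock);
}

Each extra trip around the while loop is exactly the kind of semi-busy-waiting point described above: the thread pays for a wakeup and a context switch, re-checks the condition, and sleeps again without making progress. Waking a single waiter with pthread_cond_signal instead of broadcasting to all of them is one common way to reduce such wasted wakeups.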