.de Sp .if t .sp .5v .if n .sp .. .TH mprof-report 1 "" .SH The Mono log profiler .PP The Mono \f[I]log\f[] profiler can be used to collect a lot of information about a program running in the Mono runtime. This data can be used (both while the process is running and later) to do analyses of the program behaviour, determine resource usage, performance issues or even look for particular execution patterns. .PP This is accomplished by logging the events provided by the Mono runtime through the profiling interface and periodically writing them to a file which can be later inspected with the command line \f[I]mprof-report\f[] program or with a GUI (not developed yet). .PP The events collected include (among others): .IP \[bu] 2 method enter and leave .IP \[bu] 2 object allocation .IP \[bu] 2 garbage collection .IP \[bu] 2 JIT compilation .IP \[bu] 2 metadata loading .IP \[bu] 2 lock contention .IP \[bu] 2 exceptions .PP In addition, the profiler can periodically collect info about all the objects present in the heap at the end of a garbage collection (this is called heap shot and currently implemented only for the sgen garbage collector). Another available profiler mode is the \f[I]sampling\f[] or \f[I]statistical\f[] mode: periodically the program is sampled and the information about what the program was busy with is saved. This allows to get information about the program behaviour without degrading its performance too much (usually less than 10%). .SS Basic profiler usage .PP The simpler way to use the profiler is the following: .PP \f[B]mono\ --profile=log\ program.exe\f[] .PP At the end of the execution the file \f[I]output.mlpd\f[] will be found in the current directory. A summary report of the data can be printed by running: .PP \f[B]mprof-report\ output.mlpd\f[] .PP With this invocation a huge amount of data is collected about the program execution and collecting and saving this data can significantly slow down program execution. If saving the profiling data is not needed, a report can be generated directly with: .PP \f[B]mono\ --profile=log:report\ program.exe\f[] .PP If the information about allocations is not of interest, it can be excluded: .PP \f[B]mono\ --profile=log:noalloc\ program.exe\f[] .PP On the other hand, if method call timing is not important, while allocations are, the needed info can be gathered with: .PP \f[B]mono\ --profile=log:nocalls\ program.exe\f[] .PP You will still be able to inspect information about the sequence of calls that lead to each allocation because at each object allocation a stack trace is collected if full enter/leave information is not available. .PP To periodically collect heap shots (and exclude method and allocation events) use the following options (making sure you run with the sgen garbage collector): .PP \f[B]mono\ --gc=sgen\ --profile=log:heapshot\ program.exe\f[] .PP To perform a sampling profiler run, use the \f[I]sample\f[] option: .PP \f[B]mono\ --profile=log:sample\ program.exe\f[] .SS Profiler option documentation .PP By default the \f[I]log\f[] profiler will gather all the events provided by the Mono runtime and write them to a file named \f[I]output.mlpd\f[]. When no option is specified, it is equivalent to using: .PP \f[B]--profile=log:calls,alloc,output=output.mlpd,maxframes=32,calldepth=100\f[] .PP The following options can be used to modify this default behaviour. Each option is separated from the next by a \f[B],\f[] character, with no spaces and all the options are included after the \f[I]log:\f[] profile module specifier. .IP \[bu] 2 \f[I]help\f[]: display concise help info about each available option .IP \[bu] 2 \f[I][no]alloc\f[]: \f[I]noalloc\f[] disables collecting object allocation info, \f[I]alloc\f[] enables it if it was disabled by another option like \f[I]heapshot\f[]. .IP \[bu] 2 \f[I][no]calls\f[]: \f[I]nocalls\f[] disables collecting method enter and leave events. When this option is used at each object allocation and at some other events (like lock contentions and exception throws) a stack trace is collected by default. See the \f[I]maxframes\f[] option to control this behaviour. \f[I]calls\f[] enables method enter/leave events if they were disabled by another option like \f[I]heapshot\f[]. .IP \[bu] 2 \f[I]heapshot[=MODE]\f[]: collect heap shot data at each major collection. The frequency of the heap shots can be changed with the \f[I]MODE\f[] parameter. When this option is used allocation events and method enter/leave events are not recorded by default: if they are needed, they need to be enabled explicitly. The optional parameter \f[I]MODE\f[] can modify the default heap shot frequency. heapshot can be used multiple times with different modes: in that case a heap shot is taken if either of the conditions are met. MODE can be one of: .RS 2 .IP \[bu] 2 \f[I]NUM\f[]ms: perform a heap shot if at least \f[I]NUM\f[] milliseconds passed since the last one. .IP \[bu] 2 \f[I]NUM\f[]gc: perform a heap shot every \f[I]NUM\f[] major garbage collections .IP \[bu] 2 \f[I]ondemand\f[]: perform a heap shot when such a command is sent to the control port .RE .IP \[bu] 2 \f[I]sample[=FREQ]\f[]: collect statistical samples of the program behaviour. The default is to collect a 100 times per second (100 Hz) the instruction pointer. This is equivalent to the value \[lq]100\[rq]. A value of zero for \f[I]FREQ\f[] effectively disables sampling. .IP \[bu] 2 \f[I]heapshot-on-shutdown\f[]: collect heap shot data when the runtime shuts down. .IP \[bu] 2 \f[I]maxframes=NUM\f[]: when a stack trace needs to be performed, collect \f[I]NUM\f[] frames at the most. The default is 32. .IP \[bu] 2 \f[I]maxsamples=NUM\f[]: stop allocating reusable sample events once \f[I]NUM\f[] events have been allocated (a value of zero for all intents and purposes means unlimited). By default, the value of this setting is the number of CPU cores multiplied by 1000. This is usually a good enough value for typical desktop and mobile apps. If you're losing too many samples due to this default (which is possible in apps with an unusually high amount of threads), you may want to tinker with this value to find a good balance between sample hit rate and performance impact on the app. The way it works is that sample events are enqueued for reuse after they're flushed to the output file; if a thread gets a sampling signal but there are no sample events in the reuse queue and the profiler has reached the maximum number of sample allocations, the sample gets dropped. So a higher number for this setting will increase the chance that a thread is able to collect a sample, but also necessarily means that there will be more work done by the profiler. You can run Mono with the \f[I]--stats\f[] option to see statistics about sample events. .IP \[bu] 2 \f[I]calldepth=NUM\f[]: ignore method enter/leave events when the call chain depth is bigger than NUM. .IP \[bu] 2 \f[I]zip\f[]: automatically compress the output data in gzip format. .IP \[bu] 2 \f[I]output=OUTSPEC\f[]: instead of writing the profiling data to the output.mlpd file, substitute \f[I]%p\f[] in \f[I]OUTSPEC\f[] with the current process id and \f[I]%t\f[] with the current date and time, then do according to \f[I]OUTSPEC\f[]: .RS 2 .IP \[bu] 2 If \f[I]OUTSPEC\f[] begins with a \f[I]|\f[] character, execute the rest as a program and feed the data to its standard input. .IP \[bu] 2 If \f[I]OUTSPEC\f[] begins with a \f[I]#\f[] character, parse the rest as a file descriptor number, and feed the data to this file descriptor. .IP \[bu] 2 otherwise write the data the the named file: note that is a file by that name already exists, it is truncated. .RE .IP \[bu] 2 \f[I]report\f[]: the profiling data is sent to mprof-report, which will print a summary report. This is equivalent to the option: \f[B]output=mprof-report\ -\f[]. If the \f[I]output\f[] option is specified as well, the report will be written to the output file instead of the console. .IP \[bu] 2 \f[I]port=PORT\f[]: specify the tcp/ip port to use for the listening command server. Currently not available for windows. This server is started for example when heapshot=ondemand is used: it will read commands line by line. The following commands are available: .RS 2 .IP \[bu] 2 \f[I]heapshot\f[]: perform a heapshot as soon as possible .RE .IP \[bu] 2 \f[I]nocounters\f[]: disables sampling of runtime and performance counters, which is normally done every 1 second. .RE .SS Analyzing the profile data .PP Currently there is a command line program (\f[I]mprof-report\f[]) to analyze the data produced by the profiler. This is ran automatically when the \f[I]report\f[] profiler option is used. Simply run: .PP \f[B]mprof-report\ output.mlpd\f[] .PP to see a summary report of the data included in the file. .SS Trace information for events .PP Often it is important for some events, like allocations, lock contention and exception throws to know where they happened. Or we may want to see what sequence of calls leads to a particular method invocation. To see this info invoke mprof-report as follows: .PP \f[B]mprof-report\ --traces\ output.mlpd\f[] .PP The maximum number of methods in each stack trace can be specified with the \f[I]--maxframes=NUM\f[] option: .PP \f[B]mprof-report\ --traces\ --maxframes=4\ output.mlpd\f[] .PP The stack trace info will be available if method enter/leave events have been recorded or if stack trace collection wasn't explicitly disabled with the \f[I]maxframes=0\f[] profiler option. .PP The \f[I]--traces\f[] option also controls the reverse reference feature in the heapshot report: for each class it reports how many references to objects of that class come from other classes. .SS Sort order for methods and allocations .PP When a list of methods is printed the default sort order is based on the total time spent in the method. This time is wall clock time (that is, it includes the time spent, for example, in a sleep call, even if actual cpu time would be basically 0). Also, if the method has been ran on different threads, the time will be a sum of the time used in each thread. .PP To change the sort order, use the option: .PP \f[B]--method-sort=MODE\f[] .PP where \f[I]MODE\f[] can be: .IP \[bu] 2 \f[I]self\f[]: amount of time spent in the method itself and not in its callees .IP \[bu] 2 \f[I]calls\f[]: the number of method invocations .IP \[bu] 2 \f[I]total\f[]: the total time spent in the method. .PP Object allocation lists are sorted by default depending on the total amount of bytes used by each type. .PP To change the sort order of object allocations, use the option: .PP \f[B]--alloc-sort=MODE\f[] .PP where \f[I]MODE\f[] can be: .IP \[bu] 2 \f[I]count\f[]: the number of allocated objects of the given type .IP \[bu] 2 \f[I]bytes\f[]: the total number of bytes used by objects of the given type .PP To change the sort order of counters, use the option: .PP \f[B]--counters-sort=MODE\f[] .PP where \f[I]MODE\f[] can be: .IP \[bu] 2 \f[I]time\f[]: sort values by time then category .IP \[bu] 2 \f[I]category\f[]: sort values by category then time .SS Selecting what data to report .PP The profiler by default collects data about many runtime subsystems and mprof-report prints a summary of all the subsystems that are found in the data file. It is possible to tell mprof-report to only show information about some of them with the following option: .PP \f[B]--reports=R1[,R2...]\f[] .PP where the report names R1, R2 etc. can be: .IP \[bu] 2 \f[I]header\f[]: information about program startup and profiler version .IP \[bu] 2 \f[I]jit\f[]: JIT compiler information .IP \[bu] 2 \f[I]sample\f[]: statistical sampling information .IP \[bu] 2 \f[I]gc\f[]: garbage collection information .IP \[bu] 2 \f[I]alloc\f[]: object allocation information .IP \[bu] 2 \f[I]call\f[]: method profiling information .IP \[bu] 2 \f[I]metadata\f[]: metadata events like image loads .IP \[bu] 2 \f[I]exception\f[]: exception throw and handling information .IP \[bu] 2 \f[I]monitor\f[]: lock contention information .IP \[bu] 2 \f[I]thread\f[]: thread information .IP \[bu] 2 \f[I]domain\f[]: app domain information .IP \[bu] 2 \f[I]context\f[]: remoting context information .IP \[bu] 2 \f[I]heapshot\f[]: live heap usage at heap shots .IP \[bu] 2 \f[I]counters\f[]: counters samples .IP \[bu] 2 \f[I]stats\f[]: event statistics .PP It is possible to limit some of the data displayed to a timeframe of the program execution with the option: .PP \f[B]--time=FROM-TO\f[] .PP where \f[I]FROM\f[] and \f[I]TO\f[] are seconds since application startup (they can be floating point numbers). .PP Another interesting option is to consider only events happening on a particular thread with the following option: .PP \f[B]--thread=THREADID\f[] .PP where \f[I]THREADID\f[] is one of the numbers listed in the thread summary report (or a thread name when present). .PP By default long lists of methods or other information like object allocations are limited to the most important data. To increase the amount of information printed you can use the option: .PP \f[B]--verbose\f[] .SS Track individual objects .PP Instead of printing the usual reports from the profiler data, it is possible to track some interesting information about some specific object addresses. The objects are selected based on their address with the \f[I]--track\f[] option as follows: .PP \f[B]--track=0xaddr1[,0xaddr2,...]\f[] .PP The reported info (if available in the data file), will be class name, size, creation time, stack trace of creation (with the \f[I]--traces\f[] option), etc. If heapshot data is available it will be possible to also track what other objects reference one of the listed addresses. .PP The object addresses can be gathered either from the profiler report in some cases (like in the monitor lock report), from the live application or they can be selected with the \f[I]--find=FINDSPEC\f[] option. FINDSPEC can be one of the following: .IP \[bu] 2 \f[I]S:SIZE\f[]: where the object is selected if its size is at least \f[I]SIZE\f[] .IP \[bu] 2 \f[I]T:NAME\f[]: where the object is selected if \f[I]NAME\f[] partially matches its class name .PP This option can be specified multiple times with one of the different kinds of FINDSPEC. For example, the following: .PP \f[B]--find=S:10000\ --find=T:Byte[]\f[] .PP will find all the byte arrays that are at least 10000 bytes in size. .PP Note that with a moving garbage collector the object address can change, so you may need to track the changed address manually. It can also happen that multiple objects are allocated at the same address, so the output from this option can become large. .SS Saving a profiler report .PP By default mprof-report will print the summary data to the console. To print it to a file, instead, use the option: .PP \f[B]--out=FILENAME\f[] .SS Dealing with profiler slowness .PP If the profiler needs to collect lots of data, the execution of the program will slow down significantly, usually 10 to 20 times slower. There are several ways to reduce the impact of the profiler on the program execution. .IP "\f[I]Use the statistical sampling mode\f[]" 4 .Sp Statistical sampling allows executing a program under the profiler with minimal performance overhead (usually less than 10%). This mode allows checking where the program is spending most of its execution time without significantly perturbing its behaviour. .IP "\f[I]Collect less data\f[]" 4 .Sp Collecting method enter/leave events can be very expensive, especially in programs that perform many millions of tiny calls. The profiler option \f[I]nocalls\f[] can be used to avoid collecting this data or it can be limited to only a few call levels with the \f[I]calldepth\f[] option. .Sp Object allocation information is expensive as well, though much less than method enter/leave events. If it's not needed, it can be skipped with the \f[I]noalloc\f[] profiler option. Note that when method enter/leave events are discarded, by default stack traces are collected at each allocation and this can be expensive as well. The impact of stack trace information can be reduced by setting a low value with the \f[I]maxframes\f[] option or by eliminating them completely, by setting it to 0. .Sp The other major source of data is the \f[I]heapshot\f[] profiler option: especially if the managed heap is big, since every object needs to be inspected. The \f[I]MODE\f[] parameter of the \f[I]heapshot\f[] option can be used to reduce the frequency of the heap shots. .SS Dealing with the size of the data files .PP When collecting a lot of information about a profiled program, huge data files can be generated. There are a few ways to minimize the amount of data, for example by not collecting some of the more space-consuming information or by compressing the information on the fly or by just generating a summary report. .IP "\f[I]Reducing the amount of data\f[]" 4 .Sp Method enter/leave events can be excluded completely with the \f[I]nocalls\f[] option or they can be limited to just a few levels of calls with the \f[I]calldepth\f[] option. For example, the option: .Sp \f[B]calldepth=10\f[] .Sp will ignore the method events when there are more than 10 managed stack frames. This is very useful for programs that have deep recursion or for programs that perform many millions of tiny calls deep enough in the call stack. The optimal number for the calldepth option depends on the program and it needs to be balanced between providing enough profiling information and allowing fast execution speed. .Sp Note that by default, if method events are not recorded at all, the profiler will collect stack trace information at events like allocations. To avoid gathering this data, use the \f[I]maxframes=0\f[] profiler option. .Sp Allocation events can be eliminated with the \f[I]noalloc\f[] option. .Sp Heap shot data can also be huge: by default it is collected at each major collection. To reduce the frequency, you can specify a heapshot mode: for example to collect every 5 collections (including major and minor): .Sp \f[B]heapshot=5gc\f[] .Sp or when at least 5 seconds passed since the last heap shot: .Sp \f[B]heapshot=5000ms\f[] .IP "\f[I]Compressing the data\f[]" 4 .Sp To reduce the amout of disk space used by the data, the data can be compressed either after it has been generated with the gzip command: .Sp \f[B]gzip\ -9\ output.mlpd\f[] .Sp or it can be compressed automatically by using the \f[I]zip\f[] profiler option. Note that in this case there could be a significant slowdown of the profiled program. .Sp The mprof-report program will tranparently deal with either compressed or uncompressed data files. .IP "\f[I]Generating only a summary report\f[]" 4 .Sp Often it's enough to look at the profiler summary report to diagnose an issue and in this case it's possible to avoid saving the profiler data file to disk. This can be accomplished with the \f[I]report\f[] profiler option, which will basically send the data to the mprof-report program for display. .Sp To have more control of what summary information is reported (or to use a completely different program to decode the profiler data), the \f[I]output\f[] profiler option can be used, with \f[B]|\f[] as the first character: the rest of the output name will be executed as a program with the data fed in on the standard input. .Sp For example, to print only the Monitor summary with stack trace information, you could use it like this: .Sp \f[B]output=|mprof-report\ --reports=monitor\ --traces\ -\f[] .SH WEB SITE http://www.mono-project.com/docs/debug+profile/profile/profiler/ .SH SEE ALSO .PP mono(1) .SH AUTHORS Paolo Molaro, Alex Rønne Petersen