man/mprof-report.1

   1 .de Sp
   2 .if t .sp .5v
   3 .if n .sp
   4 ..
   5 .TH mprof-report 1 ""
   6 .SH The Mono log profiler
   7 .PP
   8 The Mono \f[I]log\f[] profiler can be used to collect a lot of
   9 information about a program running in the Mono runtime.
  10 This data can be used (both while the process is running and later)
  11 to analyze the program behaviour, determine resource usage, find
  12 performance issues or even look for particular execution patterns.
  13 .PP
  14 This is accomplished by logging the events provided by the Mono
  15 runtime through the profiling interface and periodically writing
  16 them to a file which can be later inspected with the command line
  17 \f[I]mprof-report\f[] program or with a GUI (not developed yet).
  18 .PP
  19 The events collected include (among others):
  20 .IP \[bu] 2
  21 method enter and leave
  22 .IP \[bu] 2
  23 object allocation
  24 .IP \[bu] 2
  25 garbage collection
  26 .IP \[bu] 2
  27 JIT compilation
  28 .IP \[bu] 2
  29 metadata loading
  30 .IP \[bu] 2
  31 lock contention
  32 .IP \[bu] 2
  33 exceptions
  34 .PP
  35 In addition, the profiler can periodically collect info about all
  36 the objects present in the heap at the end of a garbage collection
  37 (this is called a heap shot and is currently implemented only for
  38 the sgen garbage collector).
  39 Another available profiler mode is the \f[I]sampling\f[] or
  40 \f[I]statistical\f[] mode: periodically the program is sampled and
  41 the information about what the program was busy with is saved.
  42 This makes it possible to get information about the program behaviour
  43 without degrading its performance too much (usually less than 10%).
  44 .SS Basic profiler usage
  45 .PP
  46 The simplest way to use the profiler is the following:
  47 .PP
  48 \f[B]mono\ --profile=log\ program.exe\f[]
  49 .PP
  50 At the end of the execution the file \f[I]output.mlpd\f[] will be
  51 found in the current directory.
  52 A summary report of the data can be printed by running:
  53 .PP
  54 \f[B]mprof-report\ output.mlpd\f[]
  55 .PP
  56 This invocation collects a huge amount of data about the program
  57 execution, and collecting and saving this data can significantly
  58 slow down the program.
  59 If saving the profiling data is not needed, a report can be
  60 generated directly with:
  61 .PP
  62 \f[B]mono\ --profile=log:report\ program.exe\f[]
  63 .PP
  64 If the information about allocations is not of interest, it can be
  65 excluded:
  66 .PP
  67 \f[B]mono\ --profile=log:noalloc\ program.exe\f[]
  68 .PP
  69 On the other hand, if method call timing is not important, while
  70 allocations are, the needed info can be gathered with:
  71 .PP
  72 \f[B]mono\ --profile=log:nocalls\ program.exe\f[]
  73 .PP
  74 You will still be able to inspect information about the sequence of
  75 calls that leads to each allocation, because a stack trace is
  76 collected at each object allocation if full enter/leave
  77 information is not available.
  78 .PP
  79 To periodically collect heap shots (and exclude method and
  80 allocation events) use the following options (making sure you run
  81 with the sgen garbage collector):
  82 .PP
  83 \f[B]mono\ --gc=sgen\ --profile=log:heapshot\ program.exe\f[]
  84 .PP
  85 To perform a sampling profiler run, use the \f[I]sample\f[] option:
  86 .PP
  87 \f[B]mono\ --profile=log:sample\ program.exe\f[]
  88 .SS Profiler option documentation
  89 .PP
  90 By default the \f[I]log\f[] profiler will gather all the events
  91 provided by the Mono runtime and write them to a file named
  92 \f[I]output.mlpd\f[].
  93 When no option is specified, it is equivalent to using:
  94 .PP
  95 \f[B]--profile=log:calls,alloc,output=output.mlpd,maxframes=32,calldepth=100\f[]
  96 .PP
  97 The following options can be used to modify this default behaviour.
  98 Each option is separated from the next by a \f[B],\f[] character
  99 with no spaces, and all the options are included after the
 100 \f[I]log:\f[] profile module specifier.
 101 .IP \[bu] 2
 102 \f[I]help\f[]: display concise help info about each available
 103 option
 104 .IP \[bu] 2
 105 \f[I][no]alloc\f[]: \f[I]noalloc\f[] disables collecting object
 106 allocation info, \f[I]alloc\f[] enables it if it was disabled by
 107 another option like \f[I]heapshot\f[].
 108 .IP \[bu] 2
 109 \f[I][no]calls\f[]: \f[I]nocalls\f[] disables collecting method
 110 enter and leave events.
 111 When this option is used, a stack trace is collected by default at
 112 each object allocation and at some other events (like lock
 113 contentions and exception throws).
 114 See the \f[I]maxframes\f[] option to control this behaviour.
 115 \f[I]calls\f[] enables method enter/leave events if they were
 116 disabled by another option like \f[I]heapshot\f[].
 117 .IP \[bu] 2
 118 \f[I]heapshot[=MODE]\f[]: collect heap shot data at each major
 119 collection.
 120 When this option is used, allocation events and method enter/leave
 121 events are not recorded by default: if they are needed, they must
 122 be enabled explicitly.
 123 The optional \f[I]MODE\f[] parameter changes the default heap
 124 shot frequency.
 125 The \f[I]heapshot\f[] option can be given multiple times with
 126 different modes:
 127 in that case a heap shot is taken whenever any one of the
 128 conditions is met.
 129 \f[I]MODE\f[] can be one of:
 130 .RS 2
 131 .IP \[bu] 2
 132 \f[I]NUM\f[]ms: perform a heap shot if at least \f[I]NUM\f[]
 133 milliseconds have passed since the last one.
 134 .IP \[bu] 2
 135 \f[I]NUM\f[]gc: perform a heap shot every \f[I]NUM\f[] major
 136 garbage collections
 137 .IP \[bu] 2
 138 \f[I]ondemand\f[]: perform a heap shot when such a command is sent
 139 to the control port
 140 .RE
 141 .IP \[bu] 2
 142 \f[I]sample[=FREQ]\f[]: collect statistical samples of the
 143 program behaviour.
 144 The default is to sample the instruction pointer 100 times per
 145 second (100 Hz).
 146 This is equivalent to the value \[lq]100\[rq].
 147 A value of zero for \f[I]FREQ\f[] effectively disables sampling.
 148 .IP \[bu] 2
 149 \f[I]maxframes=NUM\f[]: when a stack trace needs to be performed,
 150 collect \f[I]NUM\f[] frames at the most.
 151 The default is 32.
 152 .IP \[bu] 2
 153 \f[I]maxsamples=NUM\f[]: stop allocating reusable sample events
 154 once \f[I]NUM\f[] events have been allocated (a value of zero for
 155 all intents and purposes means unlimited). By default, the value
 156 of this setting is the number of CPU cores multiplied by 1000. This
 157 is usually a good enough value for typical desktop and mobile apps.
 158 If you're losing too many samples due to this default (which is
 159 possible in apps with an unusually high number of threads), you
 160 may want to tinker with this value to find a good balance between
 161 sample hit rate and performance impact on the app. The way it works
 162 is that sample events are enqueued for reuse after they're flushed
 163 to the output file; if a thread gets a sampling signal but there are
 164 no sample events in the reuse queue and the profiler has reached the
 165 maximum number of sample allocations, the sample gets dropped. So a
 166 higher number for this setting will increase the chance that a
 167 thread is able to collect a sample, but also necessarily means that
 168 there will be more work done by the profiler. You can run Mono with
 169 the \f[I]--stats\f[] option to see statistics about sample events.
 170 .IP \[bu] 2
 171 \f[I]calldepth=NUM\f[]: ignore method enter/leave events when the
 172 call chain depth is greater than \f[I]NUM\f[].
 173 .IP \[bu] 2
 174 \f[I]zip\f[]: automatically compress the output data in gzip
 175 format.
 176 .IP \[bu] 2
 177 \f[I]output=OUTSPEC\f[]: instead of writing the profiling data to
 178 the output.mlpd file, substitute \f[I]%p\f[] in \f[I]OUTSPEC\f[]
 179 with the current process id and \f[I]%t\f[] with the current date
 180 and time, then proceed according to \f[I]OUTSPEC\f[]:
 181 .RS 2
 182 .IP \[bu] 2
 183 if \f[I]OUTSPEC\f[] begins with a \f[I]|\f[] character, execute the
 184 rest as a program and feed the data to its standard input
 185 .IP \[bu] 2
 186 if \f[I]OUTSPEC\f[] begins with a \f[I]-\f[] character, use the
 187 rest of OUTSPEC as the filename, but force overwrite any existing
 188 file by that name
 189 .IP \[bu] 2
 190 otherwise write the data to the named file: note that if a file by
 191 that name already exists, a warning is issued and profiling is
 192 disabled.
 193 .RE
 194 .IP \[bu] 2
 195 \f[I]report\f[]: the profiling data is sent to mprof-report, which
 196 will print a summary report.
 197 This is equivalent to the option: \f[B]output=|mprof-report\ -\f[].
 198 If the \f[I]output\f[] option is specified as well, the report will
 199 be written to the output file instead of the console.
 200 .IP \[bu] 2
 201 \f[I]port=PORT\f[]: specify the TCP/IP port to use for the
 202 listening command server.
 203 Currently not available on Windows.
 204 This server is started for example when heapshot=ondemand is used:
 205 it will read commands line by line.
 206 The following commands are available:
 207 .RS 2
 208 .IP \[bu] 2
 209 \f[I]heapshot\f[]: perform a heapshot as soon as possible
 210 .RE
 211 .IP \[bu] 2
 212 \f[I]nocounters\f[]: disables sampling of runtime and performance
 213 counters, which is normally done every 1 second.
 214 .IP \[bu] 2
 215 \f[I]coverage\f[]: collect code coverage data. This implies enabling
 216 the \f[I]calls\f[] option.
 217 .IP \[bu] 2
 218 \f[I]onlycoverage\f[]: can only be used with \f[I]coverage\f[]. This
 219 disables most other events so that the profiler mostly only collects
 220 coverage data.
 221 .RE
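.PP
As an illustration of how the options above combine, the following
shell sketch wraps one possible invocation in a function.
The function name \f[I]profile_heap\f[] and the file name pattern
\f[I]heap-%p.mlpd\f[] are made up for this example; the profiler
options themselves are the ones documented in this section.
.PP
.nf
```shell
# Hypothetical helper: run a program under the log profiler and take a
# heap shot every 5 major collections, or once at least 10 seconds have
# passed since the last one.
# "alloc" re-enables allocation events, which heapshot turns off by default.
# The leading "-" in output= forces overwriting an existing file, and the
# profiler replaces %p with the process id.
profile_heap() {
    heap_profile_opts="log:heapshot=5gc,heapshot=10000ms,alloc,output=-heap-%p.mlpd"
    mono --gc=sgen "--profile=$heap_profile_opts" "$@"
}

# Usage: profile_heap program.exe
```
.fi
.PP
Keeping the whole option list in one quoted string makes it easy to see
that the options are separated by commas with no spaces.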
 222 .SS Analyzing the profile data
 223 .PP
 224 Currently there is a command line program (\f[I]mprof-report\f[])
 225 to analyze the data produced by the profiler.
 226 This is run automatically when the \f[I]report\f[] profiler option
 227 is used.
 228 Simply run:
 229 .PP
 230 \f[B]mprof-report\ output.mlpd\f[]
 231 .PP
 232 to see a summary report of the data included in the file.
 233 .SS Trace information for events
 234 .PP
 235 For some events, like allocations, lock contention and exception
 236 throws, it is often important to know where they happened.
 237 Or we may want to see what sequence of calls leads to a particular
 238 method invocation.
 239 To see this info invoke mprof-report as follows:
 240 .PP
 241 \f[B]mprof-report\ --traces\ output.mlpd\f[]
 242 .PP
 243 The maximum number of methods in each stack trace can be specified
 244 with the \f[I]--maxframes=NUM\f[] option:
 245 .PP
 246 \f[B]mprof-report\ --traces\ --maxframes=4\ output.mlpd\f[]
 247 .PP
 248 The stack trace info will be available if method enter/leave events
 249 have been recorded or if stack trace collection wasn't explicitly
 250 disabled with the \f[I]maxframes=0\f[] profiler option.
 251 .PP
 252 The \f[I]--traces\f[] option also controls the reverse reference
 253 feature in the heapshot report: for each class it reports how many
 254 references to objects of that class come from other classes.
 255 .SS Sort order for methods and allocations
 256 .PP
 257 When a list of methods is printed the default sort order is based
 258 on the total time spent in the method.
 259 This time is wall clock time (that is, it includes the time spent,
 260 for example, in a sleep call, even if actual cpu time would be
 261 basically 0).
 262 Also, if the method has been run on different threads, the time
 263 will be a sum of the time used in each thread.
 264 .PP
 265 To change the sort order, use the option:
 266 .PP
 267 \f[B]--method-sort=MODE\f[]
 268 .PP
 269 where \f[I]MODE\f[] can be:
 270 .IP \[bu] 2
 271 \f[I]self\f[]: amount of time spent in the method itself and not in
 272 its callees
 273 .IP \[bu] 2
 274 \f[I]calls\f[]: the number of method invocations
 275 .IP \[bu] 2
 276 \f[I]total\f[]: the total time spent in the method.
 277 .PP
 278 Object allocation lists are sorted by default by the total number
 279 of bytes used by each type.
 280 .PP
 281 To change the sort order of object allocations, use the option:
 282 .PP
 283 \f[B]--alloc-sort=MODE\f[]
 284 .PP
 285 where \f[I]MODE\f[] can be:
 286 .IP \[bu] 2
 287 \f[I]count\f[]: the number of allocated objects of the given type
 288 .IP \[bu] 2
 289 \f[I]bytes\f[]: the total number of bytes used by objects of the
 290 given type
 291 .PP
 292 To change the sort order of counters, use the option:
 293 .PP
 294 \f[B]--counters-sort=MODE\f[]
 295 .PP
 296 where \f[I]MODE\f[] can be:
 297 .IP \[bu] 2
 298 \f[I]time\f[]: sort values by time then category
 299 .IP \[bu] 2
 300 \f[I]category\f[]: sort values by category then time
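.PP
The three sort options are independent of each other, so they can
presumably be given together in one invocation.
The sketch below assumes that; the function name
\f[I]report_sorted\f[] is made up for this example.
.PP
.nf
```shell
# Hypothetical helper: print a report with methods sorted by self time,
# allocations sorted by object count and counters sorted by category.
report_sorted() {
    mprof-report --method-sort=self --alloc-sort=count --counters-sort=category "$@"
}

# Usage: report_sorted output.mlpd
```
.fi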
 301 .SS Selecting what data to report
 302 .PP
 303 The profiler by default collects data about many runtime subsystems
 304 and mprof-report prints a summary of all the subsystems that are
 305 found in the data file.
 306 It is possible to tell mprof-report to only show information about
 307 some of them with the following option:
 308 .PP
 309 \f[B]--reports=R1[,R2...]\f[]
 310 .PP
 311 where the report names R1, R2 and so on
 312 can be:
 313 .IP \[bu] 2
 314 \f[I]header\f[]: information about program startup and profiler
 315 version
 316 .IP \[bu] 2
 317 \f[I]jit\f[]: JIT compiler information
 318 .IP \[bu] 2
 319 \f[I]sample\f[]: statistical sampling information
 320 .IP \[bu] 2
 321 \f[I]gc\f[]: garbage collection information
 322 .IP \[bu] 2
 323 \f[I]alloc\f[]: object allocation information
 324 .IP \[bu] 2
 325 \f[I]call\f[]: method profiling information
 326 .IP \[bu] 2
 327 \f[I]metadata\f[]: metadata events like image loads
 328 .IP \[bu] 2
 329 \f[I]exception\f[]: exception throw and handling information
 330 .IP \[bu] 2
 331 \f[I]monitor\f[]: lock contention information
 332 .IP \[bu] 2
 333 \f[I]thread\f[]: thread information
 334 .IP \[bu] 2
 335 \f[I]domain\f[]: app domain information
 336 .IP \[bu] 2
 337 \f[I]context\f[]: remoting context information
 338 .IP \[bu] 2
 339 \f[I]heapshot\f[]: live heap usage at heap shots
 340 .IP \[bu] 2
 341 \f[I]counters\f[]: counter samples
 342 .IP \[bu] 2
 343 \f[I]coverage\f[]: code coverage data
 344 .IP \[bu] 2
 345 \f[I]stats\f[]: event statistics
 346 .PP
 347 It is possible to limit some of the data displayed to a timeframe
 348 of the program execution with the option:
 349 .PP
 350 \f[B]--time=FROM-TO\f[]
 351 .PP
 352 where \f[I]FROM\f[] and \f[I]TO\f[] are seconds since application
 353 startup (they can be floating point numbers).
 354 .PP
 355 Another interesting option is to consider only events happening on
 356 a particular thread with the following option:
 357 .PP
 358 \f[B]--thread=THREADID\f[]
 359 .PP
 360 where \f[I]THREADID\f[] is one of the numbers listed in the thread
 361 summary report (or a thread name when present).
 362 .PP
 363 By default long lists of methods or other information like object
 364 allocations are limited to the most important data.
 365 To increase the amount of information printed you can use the
 366 option:
 367 .PP
 368 \f[B]--verbose\f[]
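.PP
The selection options described in this section can be combined.
The following shell sketch shows one such combination; the function
name \f[I]report_alloc_window\f[] and the chosen time window are
only examples.
.PP
.nf
```shell
# Hypothetical helper: show only the allocation and GC reports, limited
# to the interval between 2.5 and 10 seconds after application startup,
# without shortening long lists.
report_alloc_window() {
    mprof-report --reports=alloc,gc --time=2.5-10 --verbose "$@"
}

# Usage: report_alloc_window output.mlpd
```
.fi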
 369 .SS Track individual objects
 370 .PP
 371 Instead of printing the usual reports from the profiler data, it is
 372 possible to track some interesting information about some specific
 373 object addresses.
 374 The objects are selected based on their address with the
 375 \f[I]--track\f[] option as follows:
 376 .PP
 377 \f[B]--track=0xaddr1[,0xaddr2,...]\f[]
 378 .PP
 379 The reported info (if available in the data file) will be class
 380 name, size, creation time, stack trace of creation (with the
 381 \f[I]--traces\f[] option), etc.
 382 If heapshot data is available it will be possible to also track
 383 what other objects reference one of the listed addresses.
 384 .PP
 385 The object addresses can be gathered from the profiler report in
 386 some cases (like in the monitor lock report) or from the live
 387 application, or they can be selected with the
 388 \f[I]--find=FINDSPEC\f[] option.
 389 FINDSPEC can be one of the following:
 390 .IP \[bu] 2
 391 \f[I]S:SIZE\f[]: where the object is selected if its size is at
 392 least \f[I]SIZE\f[]
 393 .IP \[bu] 2
 394 \f[I]T:NAME\f[]: where the object is selected if \f[I]NAME\f[]
 395 partially matches its class name
 396 .PP
 397 This option can be specified multiple times with one of the
 398 different kinds of FINDSPEC.
 399 For example, the following:
 400 .PP
 401 \f[B]--find=S:10000\ --find=T:Byte[]\f[]
 402 .PP
 403 will find all the byte arrays that are at least 10000 bytes in
 404 size.
 405 .PP
 406 Note that with a moving garbage collector the object address can
 407 change, so you may need to track the changed address manually.
 408 It can also happen that multiple objects are allocated at the same
 409 address, so the output from this option can become large.
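.PP
When these options are typed in a shell, an argument such as
\f[I]T:Byte[]\f[] is best quoted, because some shells treat the
square brackets as pattern characters.
The sketch below shows both ways of selecting objects; the function
names and the addresses in the usage line are placeholders.
.PP
.nf
```shell
# Hypothetical helper: select byte arrays of at least 10000 bytes and
# include the stack trace of their creation.
find_big_byte_arrays() {
    mprof-report --find=S:10000 '--find=T:Byte[]' --traces "$@"
}

# Hypothetical helper: track explicitly listed addresses.
# The first argument is a comma-separated address list.
track_objects() {
    tracked_addrs=$1
    shift
    mprof-report --traces "--track=$tracked_addrs" "$@"
}

# Usage: find_big_byte_arrays output.mlpd
# Usage: track_objects 0x1000,0x2000 output.mlpd   (placeholder addresses)
```
.fi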
 410 .SS Saving a profiler report
 411 .PP
 412 By default mprof-report will print the summary data to the console.
 413 To print it to a file, instead, use the option:
 414 .PP
 415 \f[B]--out=FILENAME\f[]
 416 .SS Processing code coverage data
 417 .PP
 418 If you ran the profiler with the \f[I]coverage\f[] option, you can
 419 process the collected coverage data into an XML file by running
 420 mprof-report like this:
 421 .PP
 422 \f[B]mprof-report\ --coverage-out=coverage.xml\ output.mlpd\f[]
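.PP
The two steps, collecting coverage data and converting it, can be
chained in a small shell function.
This is only a sketch: the function name \f[I]coverage_run\f[] and
the file name \f[I]coverage.mlpd\f[] are made up for this example.
.PP
.nf
```shell
# Hypothetical helper: collect mostly coverage data into coverage.mlpd
# (overwriting an older file because of the leading "-"), then convert
# it to XML if the profiled program exited successfully.
coverage_run() {
    mono "--profile=log:coverage,onlycoverage,output=-coverage.mlpd" "$@" &&
        mprof-report --coverage-out=coverage.xml coverage.mlpd
}

# Usage: coverage_run program.exe
```
.fi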
 423 .SS Dealing with profiler slowness
 424 .PP
 425 If the profiler needs to collect lots of data, the execution of the
 426 program will slow down significantly, usually by a factor of 10 to
 427 20.
 428 There are several ways to reduce the impact of the profiler on the
 429 program execution.
 430 .IP "\f[I]Use the statistical sampling mode\f[]" 4
 431 .Sp
 432 Statistical sampling allows executing a program under the profiler
 433 with minimal performance overhead (usually less than 10%).
 434 This mode allows checking where the program is spending most of
 435 its execution time without significantly perturbing its behaviour.
 436 .IP "\f[I]Collect less data\f[]" 4
 437 .Sp
 438 Collecting method enter/leave events can be very expensive,
 439 especially in programs that perform many millions of tiny calls.
 440 The profiler option \f[I]nocalls\f[] can be used to avoid
 441 collecting this data or it can be limited to only a few call levels
 442 with the \f[I]calldepth\f[] option.
 443 .Sp
 444 Object allocation information is expensive as well, though much
 445 less than method enter/leave events.
 446 If it's not needed, it can be skipped with the \f[I]noalloc\f[]
 447 profiler option.
 448 Note that when method enter/leave events are discarded, by default
 449 stack traces are collected at each allocation and this can be
 450 expensive as well.
 451 The impact of stack trace information can be reduced by setting a
 452 low value with the \f[I]maxframes\f[] option or by eliminating them
 453 completely, by setting it to 0.
 454 .Sp
 455 The other major source of data is the \f[I]heapshot\f[] profiler
 456 option, especially if the managed heap is big, since every object
 457 needs to be inspected.
 458 The \f[I]MODE\f[] parameter of the \f[I]heapshot\f[] option can be
 459 used to reduce the frequency of the heap shots.
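.Sp
One way to keep heap shots rare is the \f[I]ondemand\f[] mode
together with the \f[I]port\f[] option.
The sketch below assumes that the profiled program runs on the same
machine, that port 9999 is free, and that a netcat-style \f[I]nc\f[]
command is installed (it is not part of Mono); the function name
\f[I]request_heapshot\f[] is made up for this example.
.Sp
.nf
```shell
# Start the program with on-demand heap shots and a command server:
#   mono --gc=sgen --profile=log:heapshot=ondemand,port=9999 program.exe

# Hypothetical helper: ask the running profiler for a heap shot by
# sending the "heapshot" command as one line to the control port.
# The optional argument is the port number (9999 if omitted).
request_heapshot() {
    echo heapshot | nc -w 1 127.0.0.1 "${1:-9999}"
}

# Usage: request_heapshot 9999
```
.fi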
 460 .SS Dealing with the size of the data files
 461 .PP
 462 When collecting a lot of information about a profiled program, huge
 463 data files can be generated.
 464 There are a few ways to minimize the amount of data: for example, by
 465 not collecting some of the more space-consuming information, by
 466 compressing the information on the fly, or by just generating a
 467 summary report.
 468 .IP "\f[I]Reducing the amount of data\f[]" 4
 469 .Sp
 470 Method enter/leave events can be excluded completely with the
 471 \f[I]nocalls\f[] option or they can be limited to just a few levels
 472 of calls with the \f[I]calldepth\f[] option.
 473 For example, the option:
 474 .Sp
 475 \f[B]calldepth=10\f[]
 476 .Sp
 477 will ignore the method events when there are more than 10 managed
 478 stack frames.
 479 This is very useful for programs that have deep recursion or for
 480 programs that perform many millions of tiny calls deep enough in
 481 the call stack.
 482 The optimal number for the calldepth option depends on the program
 483 and it needs to be balanced between providing enough profiling
 484 information and allowing fast execution speed.
 485 .Sp
 486 Note that by default, if method events are not recorded at all, the
 487 profiler will collect stack trace information at events like
 488 allocations.
 489 To avoid gathering this data, use the \f[I]maxframes=0\f[] profiler
 490 option.
 491 .Sp
 492 Allocation events can be eliminated with the \f[I]noalloc\f[]
 493 option.
 494 .Sp
 495 Heap shot data can also be huge: by default it is collected at each
 496 major collection.
 497 To reduce the frequency, you can specify a heapshot mode: for
 498 example to collect every 5 major collections:
 499 .Sp
 500 \f[B]heapshot=5gc\f[]
 501 .Sp
 502 or when at least 5 seconds have passed since the last heap shot:
 503 .Sp
 504 \f[B]heapshot=5000ms\f[]
 505 .IP "\f[I]Compressing the data\f[]" 4
 506 .Sp
 507 To reduce the amount of disk space used by the data, the data can be
 508 compressed either after it has been generated with the gzip
 509 command:
 510 .Sp
 511 \f[B]gzip\ -9\ output.mlpd\f[]
 512 .Sp
 513 or it can be compressed automatically by using the \f[I]zip\f[]
 514 profiler option.
 515 Note that in this case there could be a significant slowdown of the
 516 profiled program.
 517 .Sp
 518 The mprof-report program will transparently deal with either
 519 compressed or uncompressed data files.
 520 .IP "\f[I]Generating only a summary report\f[]" 4
 521 .Sp
 522 Often it's enough to look at the profiler summary report to
 523 diagnose an issue and in this case it's possible to avoid saving
 524 the profiler data file to disk.
 525 This can be accomplished with the \f[I]report\f[] profiler option,
 526 which will basically send the data to the mprof-report program for
 527 display.
 528 .Sp
 529 To have more control of what summary information is reported (or to
 530 use a completely different program to decode the profiler data),
 531 the \f[I]output\f[] profiler option can be used, with \f[B]|\f[] as
 532 the first character: the rest of the output name will be executed
 533 as a program with the data fed in on the standard input.
 534 .Sp
 535 For example, to print only the Monitor summary with stack trace
 536 information, you could use it like this:
 537 .Sp
 538 \f[B]output=|mprof-report\ --reports=monitor\ --traces\ -\f[]
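.Sp
When this option is typed in a shell, the whole \f[I]--profile\f[]
argument needs to be quoted, because it contains a \f[B]|\f[]
character and spaces that the shell would otherwise interpret itself.
A minimal sketch, with a made-up function name:
.Sp
.nf
```shell
# Hypothetical helper: run a program and print only the monitor summary
# with stack traces, without saving a data file.
# The double quotes pass "|" and the spaces through to the profiler.
monitor_report() {
    mono "--profile=log:output=|mprof-report --reports=monitor --traces -" "$@"
}

# Usage: monitor_report program.exe
```
.fi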
 539 .SH WEB SITE
 540 http://www.mono-project.com/docs/debug+profile/profile/profiler/
 541 .SH SEE ALSO
 542 .PP
 543 mono(1)
 544 .SH AUTHORS
 545 Paolo Molaro, Alex Rønne Petersen