man/mprof-report.1

   1 .TH mprof-report 1 ""
   2 .SH The Mono log profiler
   3 .PP
   4 The Mono \f[I]log\f[] profiler can be used to collect a lot of
   5 information about a program running in the Mono runtime.
   6 This data can be used (both while the process is running and later)
   7 to do analyses of the program behaviour, determine resource usage,
   8 performance issues or even look for particular execution patterns.
   9 .PP
  10 This is accomplished by logging the events provided by the Mono
  11 runtime through the profiling interface and periodically writing
  12 them to a file which can be later inspected with the command line
  13 \f[I]mprof-report\f[] program or with a GUI (not developed yet).
  14 .PP
  15 The events collected include (among others):
  16 .IP \[bu] 2
  17 method enter and leave
  18 .IP \[bu] 2
  19 object allocation
  20 .IP \[bu] 2
  21 garbage collection
  22 .IP \[bu] 2
  23 JIT compilation
  24 .IP \[bu] 2
  25 metadata loading
  26 .IP \[bu] 2
  27 lock contention
  28 .IP \[bu] 2
  29 exceptions
  30 .PP
  31 In addition, the profiler can periodically collect info about all
  32 the objects present in the heap at the end of a garbage collection
  33 (this is called a heap shot and is currently implemented only for the
  34 sgen garbage collector).
  35 Another available profiler mode is the \f[I]sampling\f[] or
  36 \f[I]statistical\f[] mode: periodically the program is sampled and
  37 the information about what the program was busy with is saved.
  38 This makes it possible to get information about the program behaviour without
  39 degrading its performance too much (usually less than 10%).
  40 .SS Basic profiler usage
  41 .PP
  42 The simplest way to use the profiler is the following:
  43 .PP
  44 \f[B]mono\ --profile=log\ program.exe\f[]
  45 .PP
  46 At the end of the execution the file \f[I]output.mlpd\f[] will be
  47 found in the current directory.
  48 A summary report of the data can be printed by running:
  49 .PP
  50 \f[B]mprof-report\ output.mlpd\f[]
  51 .PP
  52 With this invocation a huge amount of data is collected about the
  53 program execution, and collecting and saving this data can
  54 significantly slow the program down.
  55 If saving the profiling data is not needed, a report can be
  56 generated directly with:
  57 .PP
  58 \f[B]mono\ --profile=log:report\ program.exe\f[]
  59 .PP
  60 If the information about allocations is not of interest, it can be
  61 excluded:
  62 .PP
  63 \f[B]mono\ --profile=log:noalloc\ program.exe\f[]
  64 .PP
  65 On the other hand, if method call timing is not important, while
  66 allocations are, the needed info can be gathered with:
  67 .PP
  68 \f[B]mono\ --profile=log:nocalls\ program.exe\f[]
  69 .PP
  70 You will still be able to inspect information about the sequence of
  71 calls that lead to each allocation because at each object
  72 allocation a stack trace is collected if full enter/leave
  73 information is not available.
  74 .PP
  75 To periodically collect heap shots (and exclude method and
  76 allocation events) use the following options (making sure you run
  77 with the sgen garbage collector):
  78 .PP
  79 \f[B]mono\ --gc=sgen\ --profile=log:heapshot\ program.exe\f[]
  80 .PP
  81 To perform a sampling profiler run, use the \f[I]sample\f[] option:
  82 .PP
  83 \f[B]mono\ --profile=log:sample\ program.exe\f[]
  84 .SS Profiler option documentation
  85 .PP
  86 By default the \f[I]log\f[] profiler will gather all the events
  87 provided by the Mono runtime and write them to a file named
  88 \f[I]output.mlpd\f[].
  89 When no option is specified, it is equivalent to using:
  90 .PP
  91 \f[B]--profile=log:calls,alloc,output=output.mlpd,maxframes=8,calldepth=100\f[]
  92 .PP
  93 The following options can be used to modify this default behaviour.
  94 Each option is separated from the next by a \f[B],\f[] character,
  95 with no spaces and all the options are included after the
  96 \f[I]log:\f[] profile module specifier.
  97 .IP \[bu] 2
  98 \f[I]help\f[]: display concise help info about each available
  99 option
 100 .IP \[bu] 2
 101 \f[I][no]alloc\f[]: \f[I]noalloc\f[] disables collecting object
 102 allocation info, \f[I]alloc\f[] enables it if it was disabled by
 103 another option like \f[I]heapshot\f[].
 104 .IP \[bu] 2
 105 \f[I][no]calls\f[]: \f[I]nocalls\f[] disables collecting method
 106 enter and leave events.
 107 When this option is used, a stack trace is collected by default at
 108 each object allocation and at some other events (like lock
 109 contentions and exception throws).
 110 See the \f[I]maxframes\f[] option to control this behaviour.
 111 \f[I]calls\f[] enables method enter/leave events if they were
 112 disabled by another option like \f[I]heapshot\f[].
 113 .IP \[bu] 2
 114 \f[I]heapshot[=MODE]\f[]: collect heap shot data at each major
 115 collection.
 116 The frequency of the heap shots can be changed with the
 117 \f[I]MODE\f[] parameter.
 118 When this option is used allocation events and method enter/leave
 119 events are not recorded by default: if they are needed, they need
 120 to be enabled explicitly.
 121 The optional parameter \f[I]MODE\f[] can modify the default heap
 122 shot frequency.
 123 heapshot can be used multiple times with different modes: in that
 124 case a heap shot is taken if any of the conditions is met.
 125 MODE can be one of:
 126 .RS 2
 127 .IP \[bu] 2
 128 \f[I]NUM\f[]ms: perform a heap shot if at least \f[I]NUM\f[]
 129 milliseconds have passed since the last one.
 130 .IP \[bu] 2
 131 \f[I]NUM\f[]gc: perform a heap shot every \f[I]NUM\f[] major
 132 garbage collections
 133 .IP \[bu] 2
 134 \f[I]ondemand\f[]: perform a heap shot when such a command is sent
 135 to the control port
 136 .RE
 137 .IP \[bu] 2
 138 \f[I]sample[=TYPE[/FREQ]]\f[]: collect statistical samples of the
 139 program behaviour.
 140 The default is to collect the instruction pointer 100 times per
 141 second (100 Hz).
 142 This is equivalent to the value \[lq]cycles/100\[rq] for
 143 \f[I]TYPE\f[].
 144 On some systems, such as those with recent Linux kernels, it is
 145 possible to cause the sampling to happen for other events provided by
 146 the performance counters of the CPU.
 147 In this case, \f[I]TYPE\f[] can be one of:
 148 .RS 2
 149 .IP \[bu] 2
 150 \f[I]cycles\f[]: processor cycles
 151 .IP \[bu] 2
 152 \f[I]instr\f[]: executed instructions
 153 .IP \[bu] 2
 154 \f[I]cacherefs\f[]: cache references
 155 .IP \[bu] 2
 156 \f[I]cachemiss\f[]: cache misses
 157 .IP \[bu] 2
 158 \f[I]branches\f[]: executed branches
 159 .IP \[bu] 2
 160 \f[I]branchmiss\f[]: mispredicted branches
 161 .RE
 162 .IP \[bu] 2
 163 \f[I]time=TIMER\f[]: use the TIMER timestamp mode.
 164 TIMER can have the following values:
 165 .RS 2
 166 .IP \[bu] 2
 167 \f[I]fast\f[]: a usually faster but possibly more inaccurate timer
 168 .RE
 169 .IP \[bu] 2
 170 \f[I]maxframes=NUM\f[]: when a stack trace needs to be performed,
 171 collect \f[I]NUM\f[] frames at the most.
 172 The default is 8.
 173 .IP \[bu] 2
 174 \f[I]maxsamples=NUM\f[]: stop allocating reusable sample events
 175 once \f[I]NUM\f[] events have been allocated (a value of zero for
 176 all intents and purposes means unlimited). By default, the value
 177 of this setting is the number of CPU cores multiplied by 1000. This
 178 is usually a good enough value for typical desktop and mobile apps.
 179 If you're losing too many samples due to this default (which is
 180 possible in apps with an unusually high number of threads), you
 181 may want to tinker with this value to find a good balance between
 182 sample hit rate and performance impact on the app. The way it works
 183 is that sample events are enqueued for reuse after they're flushed
 184 to the output file; if a thread gets a sampling signal but there are
 185 no sample events in the reuse queue and the profiler has reached the
 186 maximum number of sample allocations, the sample gets dropped. So a
 187 higher number for this setting will increase the chance that a
 188 thread is able to collect a sample, but also necessarily means that
 189 there will be more work done by the profiler. You can run Mono with
 190 the \f[I]--stats\f[] option to see statistics about sample events.
 191 .IP \[bu] 2
 192 \f[I]calldepth=NUM\f[]: ignore method enter/leave events when the
 193 call chain depth is bigger than NUM.
 194 .IP \[bu] 2
 195 \f[I]zip\f[]: automatically compress the output data in gzip
 196 format.
 197 .IP \[bu] 2
 198 \f[I]output=OUTSPEC\f[]: instead of writing the profiling data to
 199 the output.mlpd file, substitute \f[I]%p\f[] in \f[I]OUTSPEC\f[]
 200 with the current process id and \f[I]%t\f[] with the current date
 201 and time, then act according to \f[I]OUTSPEC\f[]:
 202 .RS 2
 203 .IP \[bu] 2
 204 if \f[I]OUTSPEC\f[] begins with a \f[I]|\f[] character, execute the
 205 rest as a program and feed the data to its standard input
 206 .IP \[bu] 2
 207 if \f[I]OUTSPEC\f[] begins with a \f[I]-\f[] character, use the
 208 rest of OUTSPEC as the filename, but force overwrite any existing
 209 file by that name
 210 .IP \[bu] 2
 211 otherwise write the data to the named file: note that if a file by
 212 that name already exists, a warning is issued and profiling is
 213 disabled.
 214 .RE
 215 .IP \[bu] 2
 216 \f[I]report\f[]: the profiling data is sent to mprof-report, which
 217 will print a summary report.
 218 This is equivalent to the option: \f[B]output=|mprof-report\ -\f[].
 219 If the \f[I]output\f[] option is specified as well, the report will
 220 be written to the output file instead of the console.
 221 .IP \[bu] 2
 222 \f[I]port=PORT\f[]: specify the tcp/ip port to use for the
 223 listening command server.
 224 Currently not available on Windows.
 225 This server is started for example when heapshot=ondemand is used:
 226 it will read commands line by line.
 227 The following commands are available:
 228 .RS 2
 229 .IP \[bu] 2
 230 \f[I]heapshot\f[]: perform a heapshot as soon as possible
 231 .RE
 232 .IP \[bu] 2
 233 \f[I]counters\f[]: sample counter values every second. This is
 234 a really lightweight way to gain insight into some of the key runtime
 235 metrics. Counters displayed in non-verbose mode are: Methods from AOT,
 236 Methods JITted using mono JIT, Methods JITted using LLVM, Total time
 237 spent JITting (sec), User Time, System Time, Total Time, Working Set,
 238 Private Bytes, Virtual Bytes, Page Faults and CPU Load Average (1min,
 239 5min and 15min).
 240 .RE
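.PP
As a minimal sketch of the option syntax described above, the
following shell helper joins individual options with commas (no
spaces) and places them after the \f[I]log:\f[] specifier.
The function name \f[I]log_profile_arg\f[] is hypothetical and used
only for illustration; the sketch prints the resulting command line
and does not run Mono.
.PP
.nf
```shell
# Hypothetical helper: build a --profile argument for the log profiler.
# Options are joined with commas and no spaces, after the "log:" prefix.
log_profile_arg() {
    opts=""
    for opt in "$@"; do
        if [ -z "$opts" ]; then
            opts="$opt"
        else
            opts="$opts,$opt"
        fi
    done
    if [ -z "$opts" ]; then
        echo "--profile=log"
    else
        echo "--profile=log:$opts"
    fi
}

# Print the command line instead of running it, so the sketch works
# even where mono is not installed.
echo mono "$(log_profile_arg nocalls maxframes=4 zip)" program.exe
```
.fi
.PP
Building the argument from separate words keeps stray spaces out of
the option list, which the syntax above does not allow.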
 241 .SS Analyzing the profile data
 242 .PP
 243 Currently there is a command line program (\f[I]mprof-report\f[])
 244 to analyze the data produced by the profiler.
 245 This is run automatically when the \f[I]report\f[] profiler option
 246 is used.
 247 Simply run:
 248 .PP
 249 \f[B]mprof-report\ output.mlpd\f[]
 250 .PP
 251 to see a summary report of the data included in the file.
 252 .SS Trace information for events
 253 .PP
 254 For some events, such as allocations, lock contention and exception
 255 throws, it is often important to know where they happened.
 256 Or we may want to see what sequence of calls leads to a particular
 257 method invocation.
 258 To see this info invoke mprof-report as follows:
 259 .PP
 260 \f[B]mprof-report\ --traces\ output.mlpd\f[]
 261 .PP
 262 The maximum number of methods in each stack trace can be specified
 263 with the \f[I]--maxframes=NUM\f[] option:
 264 .PP
 265 \f[B]mprof-report\ --traces\ --maxframes=4\ output.mlpd\f[]
 266 .PP
 267 The stack trace info will be available if method enter/leave events
 268 have been recorded or if stack trace collection wasn't explicitly
 269 disabled with the \f[I]maxframes=0\f[] profiler option.
 270 Note that the profiler will collect up to 8 frames by default at
 271 specific events when the \f[I]nocalls\f[] option is used, so in
 272 that case, if more stack frames are required in mprof-report, a
 273 bigger value for \f[I]maxframes\f[] must also be used when profiling.
 274 .PP
 275 The \f[I]--traces\f[] option also controls the reverse reference
 276 feature in the heapshot report: for each class it reports how many
 277 references to objects of that class come from other classes.
 278 .SS Sort order for methods and allocations
 279 .PP
 280 When a list of methods is printed the default sort order is based
 281 on the total time spent in the method.
 282 This time is wall clock time (that is, it includes the time spent,
 283 for example, in a sleep call, even if actual cpu time would be
 284 basically 0).
 285 Also, if the method has been run on different threads, the time
 286 will be a sum of the time used in each thread.
 287 .PP
 288 To change the sort order, use the option:
 289 .PP
 290 \f[B]--method-sort=MODE\f[]
 291 .PP
 292 where \f[I]MODE\f[] can be:
 293 .IP \[bu] 2
 294 \f[I]self\f[]: amount of time spent in the method itself and not in
 295 its callees
 296 .IP \[bu] 2
 297 \f[I]calls\f[]: the number of method invocations
 298 .IP \[bu] 2
 299 \f[I]total\f[]: the total time spent in the method.
 300 .PP
 301 Object allocation lists are sorted by default depending on the
 302 total amount of bytes used by each type.
 303 .PP
 304 To change the sort order of object allocations, use the option:
 305 .PP
 306 \f[B]--alloc-sort=MODE\f[]
 307 .PP
 308 where \f[I]MODE\f[] can be:
 309 .IP \[bu] 2
 310 \f[I]count\f[]: the number of allocated objects of the given type
 311 .IP \[bu] 2
 312 \f[I]bytes\f[]: the total number of bytes used by objects of the
 313 given type
 314 .PP
 315 To change the sort order of counters, use the option:
 316 .PP
 317 \f[B]--counters-sort=MODE\f[]
 318 .PP
 319 where \f[I]MODE\f[] can be:
 320 .IP \[bu] 2
 321 \f[I]time\f[]: sort values by time then category
 322 .IP \[bu] 2
 323 \f[I]category\f[]: sort values by category then time
 324 .SS Selecting what data to report
 325 .PP
 326 The profiler by default collects data about many runtime subsystems
 327 and mprof-report prints a summary of all the subsystems that are
 328 found in the data file.
 329 It is possible to tell mprof-report to only show information about
 330 some of them with the following option:
 331 .PP
 332 \f[B]--reports=R1[,R2...]\f[]
 333 .PP
 334 where the report names R1, R2 etc.
 335 can be:
 336 .IP \[bu] 2
 337 \f[I]header\f[]: information about program startup and profiler
 338 version
 339 .IP \[bu] 2
 340 \f[I]jit\f[]: JIT compiler information
 341 .IP \[bu] 2
 342 \f[I]sample\f[]: statistical sampling information
 343 .IP \[bu] 2
 344 \f[I]gc\f[]: garbage collection information
 345 .IP \[bu] 2
 346 \f[I]alloc\f[]: object allocation information
 347 .IP \[bu] 2
 348 \f[I]call\f[]: method profiling information
 349 .IP \[bu] 2
 350 \f[I]metadata\f[]: metadata events like image loads
 351 .IP \[bu] 2
 352 \f[I]exception\f[]: exception throw and handling information
 353 .IP \[bu] 2
 354 \f[I]monitor\f[]: lock contention information
 355 .IP \[bu] 2
 356 \f[I]thread\f[]: thread information
 357 .IP \[bu] 2
 358 \f[I]domain\f[]: app domain information
 359 .IP \[bu] 2
 360 \f[I]context\f[]: remoting context information
 361 .IP \[bu] 2
 362 \f[I]heapshot\f[]: live heap usage at heap shots
 363 .IP \[bu] 2
 364 \f[I]counters\f[]: counters samples
 365 .PP
 366 It is possible to limit some of the data displayed to a timeframe
 367 of the program execution with the option:
 368 .PP
 369 \f[B]--time=FROM-TO\f[]
 370 .PP
 371 where \f[I]FROM\f[] and \f[I]TO\f[] are seconds since application
 372 startup (they can be floating point numbers).
 373 .PP
 374 Another interesting option is to consider only events happening on
 375 a particular thread with the following option:
 376 .PP
 377 \f[B]--thread=THREADID\f[]
 378 .PP
 379 where \f[I]THREADID\f[] is one of the numbers listed in the thread
 380 summary report (or a thread name when present).
 381 .PP
 382 By default long lists of methods or other information like object
 383 allocations are limited to the most important data.
 384 To increase the amount of information printed you can use the
 385 option:
 386 .PP
 387 \f[B]--verbose\f[]
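.PP
The report selection described above can be sketched as a small shell
helper that checks each requested name against the list of report
names in this section before building the \f[B]--reports\f[] option.
The helper name \f[I]reports_arg\f[] and the \f[I]KNOWN_REPORTS\f[]
variable are hypothetical; the sketch only prints the command line
and does not run mprof-report.
.PP
.nf
```shell
# Report names documented in this section.
KNOWN_REPORTS="header jit sample gc alloc call metadata exception monitor thread domain context heapshot counters"

# Hypothetical helper: print a --reports option built from its
# arguments, or fail if a name is not in the documented list.
reports_arg() {
    [ "$#" -gt 0 ] || return 1
    list=""
    for name in "$@"; do
        case " $KNOWN_REPORTS " in
            *" $name "*) ;;
            *) echo "unknown report: $name" >&2; return 1 ;;
        esac
        list="${list:+$list,}$name"
    done
    echo "--reports=$list"
}

# Dry run: show the resulting command line.
echo mprof-report "$(reports_arg gc alloc)" --verbose output.mlpd
```
.fi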
 388 .SS Track individual objects
 389 .PP
 390 Instead of printing the usual reports from the profiler data, it is
 391 possible to track some interesting information about some specific
 392 object addresses.
 393 The objects are selected based on their address with the
 394 \f[I]--track\f[] option as follows:
 395 .PP
 396 \f[B]--track=0xaddr1[,0xaddr2,...]\f[]
 397 .PP
 398 The reported info (if available in the data file), will be class
 399 name, size, creation time, stack trace of creation (with the
 400 \f[I]--traces\f[] option), etc.
 401 If heapshot data is available it will be possible to also track
 402 what other objects reference one of the listed addresses.
 403 .PP
 404 The object addresses can be gathered either from the profiler
 405 report in some cases (like in the monitor lock report), from the
 406 live application or they can be selected with the
 407 \f[I]--find=FINDSPEC\f[] option.
 408 FINDSPEC can be one of the following:
 409 .IP \[bu] 2
 410 \f[I]S:SIZE\f[]: where the object is selected if its size is at
 411 least \f[I]SIZE\f[]
 412 .IP \[bu] 2
 413 \f[I]T:NAME\f[]: where the object is selected if \f[I]NAME\f[]
 414 partially matches its class name
 415 .PP
 416 This option can be specified multiple times with one of the
 417 different kinds of FINDSPEC.
 418 For example, the following:
 419 .PP
 420 \f[B]--find=S:10000\ --find=T:Byte[]\f[]
 421 .PP
 422 will find all the byte arrays that are at least 10000 bytes in
 423 size.
 424 .PP
 425 Note that with a moving garbage collector the object address can
 426 change, so you may need to track the changed address manually.
 427 It can also happen that multiple objects are allocated at the same
 428 address, so the output from this option can become large.
 429 .SS Saving a profiler report
 430 .PP
 431 By default mprof-report will print the summary data to the console.
 432 To print it to a file, instead, use the option:
 433 .PP
 434 \f[B]--out=FILENAME\f[]
 435 .SS Dealing with profiler slowness
 436 .PP
 437 If the profiler needs to collect lots of data, the execution of the
 438 program will slow down significantly, usually 10 to 20 times
 439 slower.
 440 There are several ways to reduce the impact of the profiler on the
 441 program execution.
 442 .SS Use the statistical sampling mode
 443 .PP
 444 Statistical sampling allows executing a program under the profiler
 445 with minimal performance overhead (usually less than 10%).
 446 This mode allows checking where the program is spending most of
 447 its execution time without significantly perturbing its behaviour.
 448 .SS Collect less data
 449 .PP
 450 Collecting method enter/leave events can be very expensive,
 451 especially in programs that perform many millions of tiny calls.
 452 The profiler option \f[I]nocalls\f[] can be used to avoid
 453 collecting this data or it can be limited to only a few call levels
 454 with the \f[I]calldepth\f[] option.
 455 .PP
 456 Object allocation information is expensive as well, though much
 457 less than method enter/leave events.
 458 If it's not needed, it can be skipped with the \f[I]noalloc\f[]
 459 profiler option.
 460 Note that when method enter/leave events are discarded, by default
 461 stack traces are collected at each allocation and this can be
 462 expensive as well.
 463 The impact of stack trace information can be reduced by setting a
 464 low value with the \f[I]maxframes\f[] option or by eliminating them
 465 completely, by setting it to 0.
 466 .PP
 467 The other major source of data is the heapshot profiler option:
 468 especially if the managed heap is big, since every object needs to
 469 be inspected.
 470 The \f[I]MODE\f[] parameter of the \f[I]heapshot\f[] option can be
 471 used to reduce the frequency of the heap shots.
 472 .SS Reduce the timestamp overhead
 473 .PP
 474 On many operating systems or architectures what actually slows down
 475 profiling is the function provided by the system to get timestamp
 476 information.
 477 The \f[I]time=fast\f[] profiler option can usually be used to speed
 478 up this operation, but, depending on the system, time accounting
 479 may have some level of approximation (though statistically the data
 480 should still be fairly valuable).
 481 .SS Dealing with the size of the data files
 482 .PP
 483 When collecting a lot of information about a profiled program, huge
 484 data files can be generated.
 485 There are a few ways to minimize the amount of data, for example by
 486 not collecting some of the more space-consuming information or by
 487 compressing the information on the fly or by just generating a
 488 summary report.
 489 .SS Reducing the amount of data
 490 .PP
 491 Method enter/leave events can be excluded completely with the
 492 \f[I]nocalls\f[] option or they can be limited to just a few levels
 493 of calls with the \f[I]calldepth\f[] option.
 494 For example, the option:
 495 .PP
 496 \f[B]calldepth=10\f[]
 497 .PP
 498 will ignore the method events when there are more than 10 managed
 499 stack frames.
 500 This is very useful for programs that have deep recursion or for
 501 programs that perform many millions of tiny calls deep enough in
 502 the call stack.
 503 The optimal number for the calldepth option depends on the program
 504 and it needs to be balanced between providing enough profiling
 505 information and allowing fast execution speed.
 506 .PP
 507 Note that by default, if method events are not recorded at all, the
 508 profiler will collect stack trace information at events like
 509 allocations.
 510 To avoid gathering this data, use the \f[I]maxframes=0\f[] profiler
 511 option.
 512 .PP
 513 Allocation events can be eliminated with the \f[I]noalloc\f[]
 514 option.
 515 .PP
 516 Heap shot data can also be huge: by default it is collected at each
 517 major collection.
 518 To reduce the frequency, you can specify a heapshot mode: for
 519 example, to collect one every 5 major collections:
 520 .PP
 521 \f[B]heapshot=5gc\f[]
 522 .PP
 523 or when at least 5 seconds have passed since the last heap shot:
 524 .PP
 525 \f[B]heapshot=5000ms\f[]
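.PP
Since the time-based mode is expressed in milliseconds, a small
conversion helps avoid mistakes when thinking in seconds.
The helper below is a hypothetical sketch (the name
\f[I]heapshot_every_seconds\f[] is not part of Mono); it only prints
option text and assumes a whole number of seconds.
.PP
.nf
```shell
# Hypothetical helper: turn a whole number of seconds into a
# heapshot=NUMms option (1 second = 1000 milliseconds).
heapshot_every_seconds() {
    echo "heapshot=$(( $1 * 1000 ))ms"
}

# A single time-based mode.
heapshot_every_seconds 5

# heapshot can be given several times with different modes; a heap
# shot is then taken when any of the conditions is met.
echo "--profile=log:$(heapshot_every_seconds 30),heapshot=10gc"
```
.fi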
 526 .SS Compressing the data
 527 .PP
 528 To reduce the amount of disk space used by the data, the data can be
 529 compressed either after it has been generated with the gzip
 530 command:
 531 .PP
 532 \f[B]gzip\ -9\ output.mlpd\f[]
 533 .PP
 534 or it can be compressed automatically by using the \f[I]zip\f[]
 535 profiler option.
 536 Note that in this case there could be a significant slowdown of the
 537 profiled program.
 538 .PP
 539 The mprof-report program will transparently deal with either
 540 compressed or uncompressed data files.
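.PP
The effect of compressing after the fact can be sketched with a
stand-in file.
The file contents below are a placeholder, not real profiler data,
and the temporary directory is only there to keep the example
self-contained.
.PP
.nf
```shell
# Stand-in for a real data file; a real run would produce output.mlpd.
workdir=$(mktemp -d)
echo "placeholder profiler data" > "$workdir/output.mlpd"
cp "$workdir/output.mlpd" "$workdir/original.copy"

# gzip replaces output.mlpd with output.mlpd.gz.
gzip -9 "$workdir/output.mlpd"
ls "$workdir"

# Decompressing to standard output gives back the original bytes.
gzip -dc "$workdir/output.mlpd.gz" | cmp - "$workdir/original.copy" && echo "round trip ok"
```
.fi
.PP
Note that gzip removes the uncompressed file, so later commands need
to refer to the new \f[I].gz\f[] name.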
 541 .SS Generating only a summary report
 542 .PP
 543 Often it's enough to look at the profiler summary report to
 544 diagnose an issue and in this case it's possible to avoid saving
 545 the profiler data file to disk.
 546 This can be accomplished with the \f[I]report\f[] profiler option,
 547 which will basically send the data to the mprof-report program for
 548 display.
 549 .PP
 550 To have more control of what summary information is reported (or to
 551 use a completely different program to decode the profiler data),
 552 the \f[I]output\f[] profiler option can be used, with \f[B]|\f[] as
 553 the first character: the rest of the output name will be executed
 554 as a program with the data fed in on the standard input.
 555 .PP
 556 For example, to print only the Monitor summary with stack trace
 557 information, you could use it like this:
 558 .PP
 559 \f[B]output=|mprof-report\ --reports=monitor\ --traces\ -\f[]
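.PP
When such an option is typed at a shell prompt, the \f[B]|\f[]
character and the spaces are special to the shell, so the whole
\f[B]--profile\f[] argument presumably needs to be quoted.
The following dry run is a sketch assuming a POSIX shell; it does not
start Mono and only shows how the shell splits the command line.
.PP
.nf
```shell
# Quote the whole option so the shell passes it to mono as one word;
# unquoted, the shell would treat "|" as a pipeline and split on spaces.
profile_arg='--profile=log:output=|mprof-report --reports=monitor --traces -'

# Dry run: set the positional parameters to the intended command line
# and show how many words it consists of.
set -- mono "$profile_arg" program.exe
word_count=$#
echo "$word_count words"
for word in "$@"; do
    echo "[$word]"
done
```
.fi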
 560 .SH WEB SITE
 561 http://www.mono-project.com/docs/debug+profile/profile/profiler/
 562 .SH SEE ALSO
 563 .PP
 564 mono(1)
 565 .SH AUTHORS
 566 Paolo Molaro.
 567