[runtime] Move mempools up, don't free methodheaders pointing into it after, etc

diff --git a/man/mprof-report.1 b/man/mprof-report.1

index 142a916da10ddf0d3643e4372cffc876365e932a..af61efa37cb84a91b735581972a4f8dec8565285 100644
--- a/man/mprof-report.1
+++ b/man/mprof-report.1
@@ -32,6 +32,11 @@ In addition, the profiler can periodically collect info about all
  the objects present in the heap at the end of a garbage collection
  (this is called heap shot and currently implemented only for the
  sgen garbage collector).
+Another available profiler mode is the \f[I]sampling\f[] or
+\f[I]statistical\f[] mode: periodically the program is sampled and
+the information about what the program was busy with is saved.
+This makes it possible to get information about the program behaviour
+without degrading its performance too much (usually less than 10%).
  .SS Basic profiler usage
  .PP
  The simpler way to use the profiler is the following:
@@ -64,13 +69,18 @@ allocations are, the needed info can be gathered with:
  .PP
  You will still be able to inspect information about the sequence of
  calls that lead to each allocation because at each object
-allocation a stack trace is collected as well.
+allocation a stack trace is collected if full enter/leave
+information is not available.
  .PP
  To periodically collect heap shots (and exclude method and
  allocation events) use the following options (making sure you run
  with the sgen garbage collector):
  .PP
  \f[B]mono\ --gc=sgen\ --profile=log:heapshot\ program.exe\f[]
+.PP
+To perform a sampling profiler run, use the \f[I]sample\f[] option:
+.PP
+\f[B]mono\ --profile=log:sample\ program.exe\f[]
  .SS Profiler option documentation
  .PP
  By default the \f[I]log\f[] profiler will gather all the events
@@ -101,16 +111,16 @@ See the \f[I]maxframes\f[] option to control this behaviour.
  \f[I]calls\f[] enables method enter/leave events if they were
  disabled by another option like \f[I]heapshot\f[].
  .IP \[bu] 2
-\f[I]heapshot\f[]: collect heap shot data at each major collection.
+\f[I]heapshot[=MODE]\f[]: collect heap shot data at each major
+collection.
  The frequency of the heap shots can be changed with the
-\f[I]hsmode\f[] option below.
+\f[I]MODE\f[] parameter.
  When this option is used allocation events and method enter/leave
  events are not recorded by default: if they are needed, they need
  to be enabled explicitly.
-.IP \[bu] 2
-\f[I]hsmode=MODE\f[]: modify the default heap shot frequency
-according to MODE.
-hsmode can be used multiple times with different modes: in that
+The optional parameter \f[I]MODE\f[] can modify the default heap
+shot frequency.
+heapshot can be used multiple times with different modes: in that
  case a heap shot is taken if either of the conditions are met.
  MODE can be one of:
  .RS 2
@@ -118,8 +128,36 @@ MODE can be one of:
  \f[I]NUM\f[]ms: perform a heap shot if at least \f[I]NUM\f[]
  milliseconds passed since the last one.
  .IP \[bu] 2
-\f[I]NUM\f[]gc: perform a heap shot every \f[I]NUM\f[] garbage
-collections (either minor or major).
+\f[I]NUM\f[]gc: perform a heap shot every \f[I]NUM\f[] major
+garbage collections
+.IP \[bu] 2
+\f[I]ondemand\f[]: perform a heap shot when such a command is sent
+to the control port
+.RE
+.IP \[bu] 2
+\f[I]sample[=TYPE[/FREQ]]\f[]: collect statistical samples of the
+program behaviour.
+The default is to collect the instruction pointer 100 times per
+second (100 Hz).
+This is equivalent to the value \[lq]cycles/100\[rq] for
+\f[I]TYPE\f[].
+On some systems, like with recent Linux kernels, it is possible to
+cause the sampling to happen for other events provided by the
+performance counters of the CPU.
+In this case, \f[I]TYPE\f[] can be one of:
+.RS 2
+.IP \[bu] 2
+\f[I]cycles\f[]: processor cycles
+.IP \[bu] 2
+\f[I]instr\f[]: executed instructions
+.IP \[bu] 2
+\f[I]cacherefs\f[]: cache references
+.IP \[bu] 2
+\f[I]cachemiss\f[]: cache misses
+.IP \[bu] 2
+\f[I]branches\f[]: executed branches
+.IP \[bu] 2
+\f[I]branchmiss\f[]: mispredicted branches
  .RE
  .IP \[bu] 2
  \f[I]time=TIMER\f[]: use the TIMER timestamp mode.
@@ -133,6 +171,24 @@ TIMER can have the following values:
  collect \f[I]NUM\f[] frames at the most.
  The default is 8.
  .IP \[bu] 2
+\f[I]maxsamples=NUM\f[]: stop allocating reusable sample events
+once \f[I]NUM\f[] events have been allocated (a value of zero for
+all intents and purposes means unlimited). By default, the value
+of this setting is the number of CPU cores multiplied by 1000. This
+is usually a good enough value for typical desktop and mobile apps.
+If you're losing too many samples due to this default (which is
+possible in apps with an unusually high number of threads), you
+may want to tinker with this value to find a good balance between
+sample hit rate and performance impact on the app. The way it works
+is that sample events are enqueued for reuse after they're flushed
+to the output file; if a thread gets a sampling signal but there are
+no sample events in the reuse queue and the profiler has reached the
+maximum number of sample allocations, the sample gets dropped. So a
+higher number for this setting will increase the chance that a
+thread is able to collect a sample, but also necessarily means that
+there will be more work done by the profiler. You can run Mono with
+the \f[I]--stats\f[] option to see statistics about sample events.
+.IP \[bu] 2
  \f[I]calldepth=NUM\f[]: ignore method enter/leave events when the
  call chain depth is bigger than NUM.
  .IP \[bu] 2
@@ -140,18 +196,48 @@ call chain depth is bigger than NUM.
  format.
  .IP \[bu] 2
  \f[I]output=OUTSPEC\f[]: instead of writing the profiling data to
-the output.mlpd file, do according to \f[I]OUTSPEC\f[]:
+the output.mlpd file, substitute \f[I]%p\f[] in \f[I]OUTSPEC\f[]
+with the current process id and \f[I]%t\f[] with the current date
+and time, then do according to \f[I]OUTSPEC\f[]:
  .RS 2
  .IP \[bu] 2
  if \f[I]OUTSPEC\f[] begins with a \f[I]|\f[] character, execute the
  rest as a program and feed the data to its standard input
  .IP \[bu] 2
-otherwise write the data the the named file
+if \f[I]OUTSPEC\f[] begins with a \f[I]-\f[] character, use the
+rest of OUTSPEC as the filename, but force overwrite any existing
+file by that name
+.IP \[bu] 2
+otherwise write the data to the named file: note that if a file by
+that name already exists, a warning is issued and profiling is
+disabled.
  .RE
  .IP \[bu] 2
  \f[I]report\f[]: the profiling data is sent to mprof-report, which
  will print a summary report.
  This is equivalent to the option: \f[B]output=mprof-report\ -\f[].
+If the \f[I]output\f[] option is specified as well, the report will
+be written to the output file instead of the console.
+.IP \[bu] 2
+\f[I]port=PORT\f[]: specify the TCP/IP port to use for the
+listening command server.
+Currently not available on Windows.
+This server is started for example when heapshot=ondemand is used:
+it will read commands line by line.
+The following commands are available:
+.RS 2
+.IP \[bu] 2
+\f[I]heapshot\f[]: perform a heapshot as soon as possible
+.RE
+.IP \[bu] 2
+\f[I]counters\f[]: sample counter values every second. This is
+a really lightweight way to gain insight into some key runtime
+metrics. Counters displayed in non-verbose mode are: Methods from AOT,
+Methods JITted using mono JIT, Methods JITted using LLVM, Total time
+spent JITting (sec), User Time, System Time, Total Time, Working Set,
+Private Bytes, Virtual Bytes, Page Faults and CPU Load Average (1min,
+5min and 15min).
+.RE
  .SS Analyzing the profile data
  .PP
  Currently there is a command line program (\f[I]mprof-report\f[])
@@ -225,6 +311,16 @@ where \f[I]MODE\f[] can be:
  .IP \[bu] 2
  \f[I]bytes\f[]: the total number of bytes used by objects of the
  given type
+.PP
+To change the sort order of counters, use the option:
+.PP
+\f[B]--counters-sort=MODE\f[]
+.PP
+where \f[I]MODE\f[] can be:
+.IP \[bu] 2
+\f[I]time\f[]: sort values by time then category
+.IP \[bu] 2
+\f[I]category\f[]: sort values by category then time
  .SS Selecting what data to report
  .PP
  The profiler by default collects data about many runtime subsystems
@@ -238,6 +334,13 @@ some of them with the following option:
  where the report names R1, R2 etc.
  can be:
  .IP \[bu] 2
+\f[I]header\f[]: information about program startup and profiler
+version
+.IP \[bu] 2
+\f[I]jit\f[]: JIT compiler information
+.IP \[bu] 2
+\f[I]sample\f[]: statistical sampling information
+.IP \[bu] 2
  \f[I]gc\f[]: garbage collection information
  .IP \[bu] 2
  \f[I]alloc\f[]: object allocation information
@@ -252,7 +355,13 @@ can be:
  .IP \[bu] 2
  \f[I]thread\f[]: thread information
  .IP \[bu] 2
+\f[I]domain\f[]: app domain information
+.IP \[bu] 2
+\f[I]context\f[]: remoting context information
+.IP \[bu] 2
  \f[I]heapshot\f[]: live heap usage at heap shots
+.IP \[bu] 2
+\f[I]counters\f[]: counter samples
  .PP
  It is possible to limit some of the data displayed to a timeframe
  of the program execution with the option:
@@ -312,6 +421,11 @@ For example, the following:
  .PP
  will find all the byte arrays that are at least 10000 bytes in
  size.
+.PP
+Note that with a moving garbage collector the object address can
+change, so you may need to track the changed address manually.
+It can also happen that multiple objects are allocated at the same
+address, so the output from this option can become large.
  .SS Saving a profiler report
  .PP
  By default mprof-report will print the summary data to the console.
@@ -325,6 +439,12 @@ program will slow down significantly, usually 10 to 20 times
  slower.
  There are several ways to reduce the impact of the profiler on the
  program execution.
+.SS Use the statistical sampling mode
+.PP
+Statistical sampling allows executing a program under the profiler
+with minimal performance overhead (usually less than 10%).
+This mode allows checking where the program is spending most of
+its execution time without significantly perturbing its behaviour.
  .SS Collect less data
  .PP
  Collecting method enter/leave events can be very expensive,
@@ -347,8 +467,8 @@ completely, by setting it to 0.
  The other major source of data is the heapshot profiler option:
  especially if the managed heap is big, since every object needs to
  be inspected.
-The \f[I]hsmode\f[] option can be used to reduce the frequency of
-the heap shots.
+The \f[I]MODE\f[] parameter of the \f[I]heapshot\f[] option can be
+used to reduce the frequency of the heap shots.
  .SS Reduce the timestamp overhead
  .PP
  On many operating systems or architectures what actually slows down
@@ -358,12 +478,6 @@ The \f[I]time=fast\f[] profiler option can be usually used to speed
  up this operation, but, depending on the system, time accounting
  may have some level of approximation (though statistically the data
  should be still fairly valuable).
-.SS Use a statistical profiler instead
-.PP
-See the mono manpage for the use of a statistical (sampling)
-profiler.
-The \f[I]log\f[] profiler will be enhanced to provide sampling info
-in the future.
  .SS Dealing with the size of the data files
  .PP
  When collecting a lot of information about a profiled program, huge
@@ -401,15 +515,14 @@ option.
  .PP
  Heap shot data can also be huge: by default it is collected at each
  major collection.
-To reduce the frequency, you can use the \f[I]hsmode\f[] profiler
-option to collect for example every 5 collections (including major
-and minor):
+To reduce the frequency, you can specify a heapshot mode: for
+example to collect every 5 major collections:
  .PP
-\f[B]hsmode=5gc\f[]
+\f[B]heapshot=5gc\f[]
  .PP
  or when at least 5 seconds passed since the last heap shot:
  .PP
-\f[B]hsmode=5000ms\f[]
+\f[B]heapshot=5000ms\f[]
  .SS Compressing the data
  .PP
  To reduce the amout of disk space used by the data, the data can be
@@ -445,7 +558,7 @@ information, you could use it like this:
  .PP
  \f[B]output=|mprof-report\ --reports=monitor\ --traces\ -\f[]
  .SH WEB SITE
-http://www.mono-project.com/Profiler
+http://www.mono-project.com/docs/debug+profile/profile/profiler/
  .SH SEE ALSO
  .PP
  mono(1)
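
Note on the new port=PORT and heapshot=ondemand options: the added text says the command server reads commands line by line and currently accepts a single command, heapshot. A minimal client sketch follows, under some assumptions that the man page does not state: the server is reachable on the loopback address, commands are plain ASCII terminated by a newline, and no reply needs to be read. The function name send_profiler_command is hypothetical and not part of Mono.

```python
import socket


def send_profiler_command(port, command="heapshot", host="127.0.0.1", timeout=5.0):
    """Send one newline-terminated command to the profiler command server.

    Returns the bytes that were sent, which is handy for logging.
    """
    # The man page says commands are read line by line, so terminate the
    # command with a newline. ASCII encoding is an assumption.
    payload = (command + "\n").encode("ascii")
    # Open a TCP connection, send the command, and close the connection.
    # No response is read, since the man page does not describe one.
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(payload)
    return payload
```

For example, if the program was started with heapshot=ondemand and port=9999 in the log profiler options (9999 is an arbitrary choice), calling send_profiler_command(9999) would request a heap shot as soon as possible.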