the objects present in the heap at the end of a garbage collection
(this is called a heap shot and is currently implemented only for
the sgen garbage collector).
+Another available profiler mode is the \f[I]sampling\f[] or
+\f[I]statistical\f[] mode: the program is sampled periodically and
+information about what the program was busy with is saved.
+This makes it possible to gather information about the program's
+behaviour without degrading its performance too much (the overhead
+is usually less than 10%).
.SS Basic profiler usage
.PP
The simplest way to use the profiler is the following:
.PP
You will still be able to inspect information about the sequence of
calls that lead to each allocation because at each object
-allocation a stack trace is collected as well.
+allocation a stack trace is collected if full enter/leave
+information is not available.
.PP
To periodically collect heap shots (and exclude method and
allocation events) use the following options (making sure you run
with the sgen garbage collector):
.PP
\f[B]mono\ --gc=sgen\ --profile=log:heapshot\ program.exe\f[]
+.PP
+To perform a sampling profiler run, use the \f[I]sample\f[] option:
+.PP
+\f[B]mono\ --profile=log:sample\ program.exe\f[]
.SS Profiler option documentation
.PP
By default the \f[I]log\f[] profiler will gather all the events
\f[I]NUM\f[]ms: perform a heap shot if at least \f[I]NUM\f[]
milliseconds passed since the last one.
.IP \[bu] 2
-\f[I]NUM\f[]gc: perform a heap shot every \f[I]NUM\f[] garbage
-collections (either minor or major).
+\f[I]NUM\f[]gc: perform a heap shot every \f[I]NUM\f[] major
+garbage collections.
+.IP \[bu] 2
+\f[I]ondemand\f[]: perform a heap shot when such a command is sent
+to the control port.
+.RE
+.IP \[bu] 2
+\f[I]sample[=TYPE[/FREQ]]\f[]: collect statistical samples of the
+program behaviour.
+The default is to sample the instruction pointer 100 times per
+second (100 Hz).
+This is equivalent to the value \[lq]cycles/100\[rq] for
+\f[I]TYPE\f[].
+On some systems, such as those running recent Linux kernels, the
+sampling can be driven by other events provided by the CPU's
+performance counters.
+In this case, \f[I]TYPE\f[] can be one of:
+.RS 2
+.IP \[bu] 2
+\f[I]cycles\f[]: processor cycles
+.IP \[bu] 2
+\f[I]instr\f[]: executed instructions
+.IP \[bu] 2
+\f[I]cacherefs\f[]: cache references
+.IP \[bu] 2
+\f[I]cachemiss\f[]: cache misses
+.IP \[bu] 2
+\f[I]branches\f[]: executed branches
+.IP \[bu] 2
+\f[I]branchmiss\f[]: mispredicted branches
.RE
.IP \[bu] 2
\f[I]time=TIMER\f[]: use the TIMER timestamp mode.
collect at most \f[I]NUM\f[] frames.
The default is 8.
.IP \[bu] 2
+\f[I]maxsamples=NUM\f[]: stop allocating reusable sample events
+once \f[I]NUM\f[] events have been allocated (a value of zero
+effectively means unlimited). By default, the value of this setting
+is the number of CPU cores multiplied by 1000. This is usually a
+good enough value for typical desktop and mobile apps.
+If you're losing too many samples due to this default (which is
+possible in apps with an unusually high number of threads), you
+may want to tune this value to find a good balance between
+sample hit rate and performance impact on the app. The way it works
+is that sample events are enqueued for reuse after they're flushed
+to the output file; if a thread gets a sampling signal but there are
+no sample events in the reuse queue and the profiler has reached the
+maximum number of sample allocations, the sample is dropped. So a
+higher value for this setting increases the chance that a thread is
+able to collect a sample, but also necessarily means more work done
+by the profiler. You can run Mono with the \f[I]--stats\f[] option
+to see statistics about sample events.
+.IP \[bu] 2
\f[I]calldepth=NUM\f[]: ignore method enter/leave events when the
call chain depth is bigger than NUM.
.IP \[bu] 2
This is equivalent to the option: \f[B]output=mprof-report\ -\f[].
If the \f[I]output\f[] option is specified as well, the report will
be written to the output file instead of the console.
+.IP \[bu] 2
+\f[I]port=PORT\f[]: specify the tcp/ip port to use for the
+listening command server.
+Currently not available on Windows.
+This server is started, for example, when heapshot=ondemand is
+used; it reads commands line by line.
+The following commands are available:
+.RS 2
+.IP \[bu] 2
+\f[I]heapshot\f[]: perform a heapshot as soon as possible
+.RE
+.IP \[bu] 2
+\f[I]counters\f[]: sample counter values every second. This
+provides a very lightweight way to gain insight into some key
+runtime metrics. The counters displayed in non-verbose mode are:
+Methods from AOT, Methods JITted using mono JIT, Methods JITted
+using LLVM, Total time spent JITting (sec), User Time, System Time,
+Total Time, Working Set, Private Bytes, Virtual Bytes, Page Faults
+and CPU Load Average (1min, 5min and 15min).
+.RE
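+.PP
+Multiple profiler options can be combined with commas.
+For example, to sample mispredicted branches 1000 times per second
+while raising the cap on reusable sample events (the values here
+are chosen purely for illustration):
+.PP
+\f[B]mono\ --profile=log:sample=branchmiss/1000,maxsamples=8000\ program.exe\f[]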
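+.PP
+When the command server is enabled, commands can be sent with any
+TCP client.
+For example, assuming the server listens on a hypothetical port
+8888, a heap shot can be requested with netcat:
+.PP
+\f[B]mono\ --gc=sgen\ --profile=log:heapshot=ondemand,port=8888\ program.exe\f[]
+.PP
+\f[B]echo\ heapshot\ |\ nc\ localhost\ 8888\f[]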
.SS Analyzing the profile data
.PP
Currently there is a command line program (\f[I]mprof-report\f[])
.IP \[bu] 2
\f[I]bytes\f[]: the total number of bytes used by objects of the
given type
+.PP
+To change the sort order of counters, use the option:
+.PP
+\f[B]--counters-sort=MODE\f[]
+.PP
+where \f[I]MODE\f[] can be:
+.IP \[bu] 2
+\f[I]time\f[]: sort values by time then category
+.IP \[bu] 2
+\f[I]category\f[]: sort values by category then time
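+.PP
+For example, to sort counter samples by category first (here
+\f[I]output.mlpd\f[] is a hypothetical profiler data file):
+.PP
+\f[B]mprof-report\ --counters-sort=category\ output.mlpd\f[]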
.SS Selecting what data to report
.PP
The profiler by default collects data about many runtime subsystems
where the report names R1, R2 etc.
can be:
.IP \[bu] 2
+\f[I]header\f[]: information about program startup and profiler
+version
+.IP \[bu] 2
+\f[I]jit\f[]: JIT compiler information
+.IP \[bu] 2
+\f[I]sample\f[]: statistical sampling information
+.IP \[bu] 2
\f[I]gc\f[]: garbage collection information
.IP \[bu] 2
\f[I]alloc\f[]: object allocation information
.IP \[bu] 2
\f[I]thread\f[]: thread information
.IP \[bu] 2
+\f[I]domain\f[]: app domain information
+.IP \[bu] 2
+\f[I]context\f[]: remoting context information
+.IP \[bu] 2
\f[I]heapshot\f[]: live heap usage at heap shots
+.IP \[bu] 2
+\f[I]counters\f[]: counters samples
.PP
It is possible to limit some of the data displayed to a timeframe
of the program execution with the option:
.PP
will find all the byte arrays that are at least 10000 bytes in
size.
+.PP
+Note that with a moving garbage collector the object address can
+change, so you may need to track the changed address manually.
+It can also happen that multiple objects are allocated at the same
+address, so the output from this option can become large.
.SS Saving a profiler report
.PP
By default mprof-report will print the summary data to the console.
slower.
There are several ways to reduce the impact of the profiler on the
program execution.
+.SS Use the statistical sampling mode
+.PP
+Statistical sampling allows executing a program under the profiler
+with minimal performance overhead (usually less than 10%).
+This mode allows checking where the program is spending most of
+its execution time without significantly perturbing its behaviour.
.SS Collect less data
.PP
Collecting method enter/leave events can be very expensive,
up this operation, but, depending on the system, time accounting
may have some level of approximation (though statistically the data
should be still fairly valuable).
-.SS Use a statistical profiler instead
-.PP
-See the mono manpage for the use of a statistical (sampling)
-profiler.
-The \f[I]log\f[] profiler will be enhanced to provide sampling info
-in the future.
.SS Dealing with the size of the data files
.PP
When collecting a lot of information about a profiled program, huge
.PP
\f[B]output=|mprof-report\ --reports=monitor\ --traces\ -\f[]
.SH WEB SITE
-http://www.mono-project.com/Profiler
+http://www.mono-project.com/docs/debug+profile/profile/profiler/
.SH SEE ALSO
.PP
mono(1)