Performance engineering with performance profiling tools (PEWPPT)

This project shows examples of performance optimization with the help of perf, which is a performance profiling tool.
You can read the presentation PDF and run the demos on your Linux machine.

Perf

Perf is a performance measurement tool that uses the profiling infrastructure built into the Linux kernel. It is easy to install and run, and can be used either for a quick analysis of what the whole system is doing (perf top) or for advanced analysis and comparison of multiple benchmarking runs of a program (perf record).

Installing

You need to install the kernel-specific perf binary package.

sudo apt-get update
sudo apt-get install linux-tools-common linux-tools-generic linux-tools-`uname -r`

uname -r
Prints the version of your running Linux kernel. It is important to install the linux-tools package that matches your kernel version.
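
As a small sketch of the version check described above, the following snippet derives the kernel-specific package name from `uname -r` and reports whether a `perf` binary is already on the `PATH`. The variable names `kernel_release` and `perf_pkg` are illustrative, and the `linux-tools-<release>` naming assumes a Debian or Ubuntu style system, as in the apt-get commands above.

```shell
# Derive the kernel-specific package name from the running kernel release.
kernel_release="$(uname -r)"
perf_pkg="linux-tools-${kernel_release}"
echo "Kernel-specific package: ${perf_pkg}"

# Report whether a perf binary is already available on the PATH.
if command -v perf >/dev/null 2>&1; then
    echo "perf found at: $(command -v perf)"
else
    echo "perf not found; try installing ${perf_pkg}"
fi
```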

You should now have the perf command. Run it as root, for example:

sudo perf --help

Perf top

Running perf top provides a real-time aggregate of the functions where most of the time is spent, across all CPUs and processes.

sudo perf top

Perf record

To isolate profiling to a single process or program, use perf record. You can launch a process under perf record or attach to an already running process.

Run a program with perf record and profile until the program exits:

sudo perf record ./test-program

Attach to a running program:

sudo perf record --pid `pidof program-name` # ctrl-c to stop recording
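
`pidof` prints several space-separated PIDs when more than one instance of the program is running, while `--pid` expects a comma-separated list. A minimal sketch of a conversion helper follows; the function name `pids_to_csv` is hypothetical and not part of perf.

```shell
# Hypothetical helper: turn space-separated PIDs on stdin into a
# comma-separated list suitable for --pid.
pids_to_csv() {
    tr -s ' ' ','
}

# Example with fixed input, so it runs without any target program:
echo "1234 5678" | pids_to_csv
# prints 1234,5678
```

With a real target this could be used as `sudo perf record --pid "$(pidof program-name | pids_to_csv)"`.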

Handy options to perf record are:

-a: collect system-wide, from all CPUs
-g: record call graph (stack trace) information

NOTE: depending on the perf version, the unwinding method may need to be given explicitly, either as an argument after -g or via --call-graph (for example --call-graph dwarf).
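
The note above can be sketched as a small wrapper that always states the unwinding method explicitly through `--call-graph`, which accepts modes such as `fp`, `dwarf`, and `lbr`. The function name `record_with_callgraph` and the `PERF` override variable are hypothetical conveniences, not part of perf, and the wrapper assumes the shell already has permission to profile.

```shell
# Hypothetical wrapper: record a command with an explicit call-graph mode.
# PERF can be overridden (for example PERF=echo) to preview the arguments.
record_with_callgraph() {
    mode="$1"
    shift
    "${PERF:-perf}" record --call-graph "$mode" -- "$@"
}

# Dry run: print the arguments that would be passed to perf.
PERF=echo record_with_callgraph dwarf ./test-program
```

Dropping the `PERF=echo` prefix runs the real recording, for example `record_with_callgraph dwarf ./test-program`.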

Perf report

To view the profiling results you use perf report.

To bring up an interactive console interface with details do:

sudo perf report

To dump out all the details of the report to a shareable text file do:

sudo perf report > profiling-results.txt

Perf additional docs

The developers' notes: http://lxr.free-electrons.com/source/tools/perf/Documentation/examples.txt
Compared to oprofile/gprof: http://rhaas.blogspot.com/2012/06/perf-good-bad-ugly.html
Detailed wiki with tutorial: https://perf.wiki.kernel.org/index.php/Main_Page

Requirements

Install `pmu-tools`

git clone https://github.com/andikleen/pmu-tools

Install `autofdo`

sudo apt-get install autofdo

Presentation Demos

C1

With Clock

##### PART1 #####
# Compile sort.c
gcc sort.c -o sort.o
# Run
./sort.o
# Run with input
./sort.o 100000

##### PART2 #####
# Run with Perf
sudo perf stat ./sort.o
# Compile sort.c with debug flag
gcc -g sort.c -o sort.o

C2

https://gcc.gnu.org/wiki/AutoFDO/Tutorial

##### PART1 #####
# Compile with optimization flags
gcc -O3 sort.c -o sort_optimized.o
# Run
sudo perf stat ./sort_optimized.o

##### PART2 #####
# Compile with gcc built-in profiler
gcc sort.c -o sort_instrumented.o -fprofile-generate
# Run
sudo perf stat ./sort_instrumented.o
# Give gcda file to compiler
gcc -O3 sort.c -o sort_fdo.o -fprofile-use=sort.gcda
# Run
sudo perf stat ./sort_fdo.o

##### PART3 #####
# AutoFDO
gcc sort.c -o sort.o
# Profiling
ocperf.py record -b -e br_inst_retired.near_taken:pp -- ./sort.o
# Generate GCC standard profiled file
create_gcov --binary=./sort.o --profile=perf.data --gcov=sort.gcov -gcov_version=1
# Use Profile file
gcc -O3 -fauto-profile=sort.gcov sort.c -o sort_autofdo.o
# Run 
sudo perf stat ./sort_autofdo.o

C3

##### PART1 #####
# Compile with the default array length
gcc sort.c -o sort.o
# OR compile with a custom array length
gcc sort.c -o sort.o -DARRAY_LEN=10000000
# Run
sudo perf stat ./sort.o

##### PART2 #####
gcc sort_branchless.c -o sort_branchless.o -g
# Run
sudo perf stat ./sort_branchless.o
# Record program
sudo perf record ./sort_branchless.o
# Show Annotate
sudo perf annotate

C4

# NOT COMPLETED

C5

# example1 compile
gcc example.c -o example.o
# Run example1 (keep it running, e.g. in another terminal)
./example.o
# Run perf top
sudo perf top --pid `pidof example.o`
# Run perf record
sudo perf record --pid `pidof example.o`
# OR record the running process for 10 seconds
sudo perf record --pid `pidof example.o` sleep 10
# Repeat the measurement twice
sudo perf stat -r 2 ./example.o

Use perf with vpp

Get vpp_main process stats with perf:

perf stat -e task-clock,cycles,instructions,cache-references,cache-misses,LLC-loads,LLC-load-misses -p `pgrep vpp_main` #-t `ps -eL | grep vpp_wk_0 | cut -d" " -f 2`

Get vpp_main process stats with per-thread info:

perf stat --per-thread -e task-clock,cycles,instructions,cache-references,cache-misses,LLC-loads,LLC-load-misses -p `pgrep vpp_main` #-t `ps -eL | grep vpp_wk_0 | cut -d" " -f 2`
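
The commented-out `-t` alternative in the commands above selects a single thread by ID. Cutting `ps` output on single spaces is fragile because the columns are padded, so the sketch below matches the thread name with `awk` instead. The helper name `tid_by_name` is hypothetical, the sample lines are made up for illustration, and the worker thread name `vpp_wk_0` is taken from the command above.

```shell
# Hypothetical helper: read "TID NAME" lines on stdin and print the first TID
# whose thread name matches the argument exactly.
tid_by_name() {
    awk -v name="$1" '$2 == name { print $1; exit }'
}

# Example with made-up sample lines, so it runs without a VPP process:
printf '101 vpp_main\n102 vpp_wk_0\n103 vpp_wk_1\n' | tid_by_name vpp_wk_0
# prints 102
```

On a live system the input could come from `ps -eL -o tid= -o comm=`, and the resulting ID could be passed to `perf stat -t`.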
