Guillaume Chatelet aba80d0734 [llvm-libc] Add memory function benchmarks
Summary:
This patch adds a benchmarking infrastructure for llvm-libc memory functions.

In a nutshell, the code can benchmark small and large buffers for the memcpy, memset and memcmp functions.
It also produces graphs of size vs latency by running targets of the form `render-libc-{memcpy|memset|memcmp}-benchmark-{small|big}`.

The configurations are provided as JSON files and the benchmark also produces a JSON file.
This file is then parsed and rendered as a PNG file via the `render.py3` script (make sure to run `pip3 install matplotlib scipy numpy` first).
The script can take several JSON files as input and will superimpose the curves if they are from the same host.

TODO:
 - The code benchmarks whatever implementation is available on the host; it should be configured to benchmark the llvm-libc memory functions once they are added.
 - Add a README file with instructions and rationale.
 - Produce scores to track the performance of the functions over time to allow for regression detection.

Reviewers: sivachandra, ckennelly

Subscribers: mgorny, MaskRay, libc-commits

Tags: #libc-project

Differential Revision: https://reviews.llvm.org/D72516
2020-01-24 11:30:58 +01:00
CMakeLists.txt
configuration_big.json
configuration_small.json
JSON.cpp
JSON.h
JSONTest.cpp
LibcBenchmark.cpp
LibcBenchmark.h
LibcBenchmarkTest.cpp
LibcMemoryBenchmark.cpp
LibcMemoryBenchmark.h
LibcMemoryBenchmarkMain.cpp
LibcMemoryBenchmarkMain.h
LibcMemoryBenchmarkTest.cpp
Memcmp.cpp
Memcpy.cpp
Memset.cpp
RATIONALE.md
README.md
render.py3

Libc mem* benchmarks

This framework has been designed to evaluate and compare the relative performance of memory function implementations on a particular host.

It will also be used to track the performance of implementations over time.

Quick start

Setup

Python 2 being deprecated, it is advised to use Python 3.

Then make sure matplotlib, scipy, and numpy are set up correctly:

apt-get install python3-pip
pip3 install matplotlib scipy numpy

To get good reproducibility, it is important to make sure that the system runs in performance mode. This is achieved by running:

cpupower frequency-set --governor performance
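
To check that the governor is in effect, the current policy can be queried via cpupower's frequency-info subcommand (shipped with the same tool; the exact output format varies across kernels):

cpupower frequency-info --policy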

Run and display memcpy benchmark

The following commands will run the benchmark and display a 95% confidence interval curve of time per copied byte. The graph also features host information and the benchmarking configuration.

cd llvm-project
cmake -B/tmp/build -Sllvm -DLLVM_ENABLE_PROJECTS=libc -DCMAKE_BUILD_TYPE=Release
make -C /tmp/build -j display-libc-memcpy-benchmark-small
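
The same naming scheme extends to the other functions and configurations, as detailed under Benchmarking targets below. For instance, the following variant displays the curve for the big configuration instead:

make -C /tmp/build -j display-libc-memcpy-benchmark-big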

Benchmarking regimes

Using a profiler to observe size distributions for calls into libc functions, it was found that most operations act on a small number of bytes.

Function    % of calls with size ≤ 128    % of calls with size ≤ 1024
memcpy      96%                           99%
memset      91%                           99.9%
memcmp¹     99.5%                         ~100%

Benchmarking configurations come in two flavors:

  • small
    • Exercises sizes up to 1KiB, representative of normal usage
    • The data is kept in the L1 cache to prevent measuring the memory subsystem
  • big
    • Exercises sizes up to 32MiB to test large operations
    • Caching effects can show up here, which prevents comparing different hosts

¹ The size refers to the size of the buffers to compare, not the number of bytes until the first difference.

Benchmarking targets

The benchmarking process occurs in two steps:

  1. Benchmark the functions and produce a JSON file
  2. Display (or render) the JSON file

Targets are of the form <action>-libc-<function>-benchmark-<configuration>, where:

  • action is one of:
    • run, which runs the benchmark and writes the JSON file
    • display, which displays the graph on screen
    • render, which renders the graph to disk as a PNG file
  • function is one of: memcpy, memcmp, memset
  • configuration is one of: small, big
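
Putting the pieces together, the following invocations are both valid instances of this scheme, assuming the build directory from the quick start:

make -C /tmp/build run-libc-memset-benchmark-big
make -C /tmp/build render-libc-memcmp-benchmark-small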

Superposing curves

It is possible to merge several JSON files into a single graph. This is useful to compare implementations.

In the following example we superpose the curves for memcpy, memset and memcmp:

> make -C /tmp/build run-libc-memcpy-benchmark-small run-libc-memcmp-benchmark-small run-libc-memset-benchmark-small
> python3 libc/utils/benchmarks/render.py3 /tmp/last-libc-memcpy-benchmark-small.json /tmp/last-libc-memcmp-benchmark-small.json /tmp/last-libc-memset-benchmark-small.json

Useful render.py3 flags

  • To save the produced graph, pass --output=/tmp/benchmark_curve.png.
  • To prevent the graph from appearing on screen, pass --headless.
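
For instance, combining both flags renders a previously produced file straight to disk without opening a window; the JSON path below follows the example from the previous section:

python3 libc/utils/benchmarks/render.py3 /tmp/last-libc-memcpy-benchmark-small.json --output=/tmp/benchmark_curve.png --headless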

Under the hood

To learn more about the design decisions behind the benchmarking framework, have a look at the RATIONALE.md file.