Bentley Rules for Optimizing Work: Performance Engineering
The work of a program on a given input is the sum total of all the operations executed by the program.
In this blog I will be taking notes from Lecture 2 of the Performance Engineering course on MIT OCW.
Algorithm design can produce dramatic reductions in the amount of work it takes to solve a problem, as when an O(n log n) algorithm replaces an O(n²) one.
Reducing the work of a program does not automatically reduce its running time, however, due to the complex nature of modern computer hardware: caching, vectorization, speculation and branch prediction, and instruction-level parallelism (ILP).
Nevertheless, reducing the work serves as a good heuristic for reducing overall running time.
Classic Bentley rules for dealing with work can be found here — http://www.new-npac.org/projects/cdroms/cewes-1999-06-vol1/nhse/hpccsurvey/orgs/sgi/bentley.html
The professor divides the proposed New Bentley Rules into four categories —
1. Data Structures —
- Packing and Encoding — Packing means storing more than one data value in a machine word. Encoding means converting data values into a representation requiring fewer bits. For example, store a date in a packed struct rather than as a string.
- Augmentation — Add information to a data structure to make common operations do less work.
- Precomputation — Perform calculations in advance so as to avoid doing them at “mission-critical” times. (Look into metaprogramming here.)
- Compile-time initialisation — Store the values of constants during compilation, saving work at execution time. (Look into metaprogramming here.)
- Caching — Store results that have been accessed recently so that the program need not compute them again. The savings depend on a high cache hit rate.
- Lazy Evaluation — Defer a computation until its result is actually needed, so that work on results that are never used is never done.
- Sparsity — Avoid storing and computing on zeroes. (e.g., the Compressed Sparse Row format for matrices)
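As a small sketch of packing and encoding (a Python illustration of my own; the function names here are mine, not from the lecture), a date like “June 15, 2024” can be encoded in a single integer with bit fields instead of a many-byte string:

```python
def encode_date(year, month, day):
    # Pack a date into one integer: 5 bits for day (1-31),
    # 4 bits for month (1-12), and the remaining bits for the year.
    return (year << 9) | (month << 5) | day

def decode_date(packed):
    # Unpack by shifting and masking the same bit fields.
    return (packed >> 9, (packed >> 5) & 0xF, packed & 0x1F)

packed = encode_date(2024, 6, 15)
assert decode_date(packed) == (2024, 6, 15)
```

The packed form fits comfortably in one machine word, so comparing or copying dates becomes a single integer operation instead of a string operation.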
2. Loops —
- Hoisting — Avoid recomputing loop-invariant code each time through the body of a loop by computing it once, before the loop.
- Sentinels — Special dummy values placed in a data structure to simplify the logic of boundary conditions, in particular the handling of loop-exit tests.
- Loop unrolling — Save work by combining several consecutive iterations of a loop into a single iteration, reducing loop overhead and enabling more compiler optimizations.
- Loop fusion — Combine multiple loops over the same index range into a single loop body, saving loop-control overhead.
- Eliminating wasted iterations — Modify loop bounds to avoid executing iterations over essentially empty loop bodies.
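Hoisting is the easiest of these to show. A minimal Python sketch (my own illustration, loosely modeled on the lecture’s array-scaling example): the factor below depends only on `n`, not on the loop variable, so it can be computed once before the loop.

```python
import math

def scale_naive(x, n):
    y = []
    for xi in x:
        # math.sqrt(n) is loop-invariant but recomputed every iteration
        y.append(xi * math.sqrt(n))
    return y

def scale_hoisted(x, n):
    factor = math.sqrt(n)  # hoisted: computed once, outside the loop
    y = []
    for xi in x:
        y.append(xi * factor)
    return y

assert scale_naive([1.0, 4.0], 9) == scale_hoisted([1.0, 4.0], 9)
```

An optimizing compiler can often hoist such expressions itself, but only when it can prove the value really is invariant; doing it by hand makes the saving certain.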
3. Logic —
- Constant folding and propagation — Evaluate constant expressions and substitute the results into further expressions, during compilation.
- Common-subexpression elimination — Avoid computing same expression multiple times by evaluating the expression once and storing the result for later use.
- Algebraic identities — Replace expensive algebraic expressions with algebraic equivalents that require less work. For example, instead of comparing square roots, compare squares.
- Short-circuiting — When performing a series of tests, stop evaluating as soon as the answer is known.
- Ordering tests — Tests that are more often “successful” (i.e., that decide the outcome) should precede tests that are rarely successful. Similarly, inexpensive tests should precede expensive ones.
- Creating a fast path — Handle the common, easy cases with a cheap check before falling back to the full, expensive computation.
- Combining tests — Replace a sequence of tests with one test or switch.
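The square-root identity can be made concrete in a few lines of Python (my own sketch, with hypothetical function names). Testing whether a point lies within distance `r` of the origin does not need a square root at all:

```python
def closer_than_naive(x, y, r):
    # Expensive version: computes a square root on every call.
    return (x * x + y * y) ** 0.5 < r

def closer_than(x, y, r):
    # Algebraic identity: for r >= 0, sqrt(s) < r  <=>  s < r * r,
    # so we compare squared distances and skip the square root.
    return x * x + y * y < r * r

assert closer_than(3, 4, 6) == closer_than_naive(3, 4, 6)
assert closer_than(3, 4, 5) == closer_than_naive(3, 4, 5)
```

The same trick applies wherever a monotonic function (sqrt, log, exp) appears on both sides of a comparison: apply its inverse to both sides and compare the cheaper quantities.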
4. Functions —
- Inlining — Avoid function-call overhead by replacing a call to the function with the body of the function itself. Inlined functions can be as efficient as macros, and they are better structured.
- Tail-recursion elimination — Replace a recursive call that occurs as the last step of a function with a branch, saving function-call overhead.
- Coarsening recursion — Increase the size of the base case of a recursive function and handle it with more efficient code, avoiding the function-call overhead of very deep recursion.
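Tail-recursion elimination can be sketched in Python (my own illustration; Python itself does not perform this transformation, which is exactly why doing it by hand matters there). Euclid's gcd is naturally tail-recursive, and the recursive call turns into a loop:

```python
def gcd_recursive(a, b):
    # The recursive call is the last step of the function: a tail call.
    return a if b == 0 else gcd_recursive(b, a % b)

def gcd_iterative(a, b):
    # Tail-recursion eliminated: the call becomes a branch back to the
    # top of a loop, reusing the same stack frame.
    while b != 0:
        a, b = b, a % b
    return a

assert gcd_iterative(48, 18) == gcd_recursive(48, 18) == 6
```

The iterative version does the same arithmetic but avoids one stack frame per step, which also removes the risk of overflowing the recursion limit on large inputs.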
Important Advice —
- Avoid premature optimization.
- Reducing the work of a program does not necessarily decrease its running time!
- The compiler automates many low-level optimizations.
- Look at the assembly code to tell if the compiler is actually performing a particular optimization.
Read my other works —
- https://vaibhaw-vipul.medium.com/matrix-multiplication-optimizing-the-code-from-6-hours-to-1-sec-70889d33dcfa
- https://medium.com/@vaibhaw.vipul/writing-synthetic-data-generator-using-opencv-e2658a63f195
- https://medium.com/@vaibhaw.vipul/building-autoencoder-in-pytorch-34052d1d280c
I hope you liked the article! Keep learning and sharing knowledge. Follow me on GitHub, LinkedIn, or Twitter.