Exceptions in C++ and their Costs

Some shortcuts :

Various Usage Scenarios
- The Expected Ideal Case for Exceptions
Stack Unwinding
Impact on the Size of Binaries
Summary

This document aims to provide data on the cost of exceptions. It has been written to help discussions in SG14, and as such is written in English; be warned though that most links therein lead to my personal site, which is in French as this is the language I use when I teach most of the time.

TODO list :

Add a test path without any throw whatsoever (thanks Sean Parent)
Check worst case memory consumption (thanks Billy Baker)

This might take a few weeks, so be patient; I'll post updates on the mailing list when I'll get around to doing them.

Various Usage Scenarios

What is the relative cost of exceptions compared to returning and validating error codes? What follows makes the following assumptions:

The client code will manage the errors returned just as it would process the exceptions; there would be no need for exceptions in the first place if the problem it expresses could be ignored without adverse effects
The comparative costs of error management and exception handling are influenced by the probability of a problem being signalled. Exceptions are meant to represent exceptional situations, after all

For these tests, my work hypothesis were:

An exception is expected to occur rarely (exceptionally)
- Should an exception occur too frequently, then it is probably not an exception and other error handling mechanisms are expected to perform better
- I have not explored the question of « how frequent is too frequent »
The try block should be a fast path
A catch block is expected to be a slow path
There is a non-null cost to error handling, which has to be accounted for in the tests

I hoped to measure the costs of exception handling and error handling in situations which should favor one and situations that should favor the other.

I used 64 bit binairies on a 64 bit machine, as SG14 members have expressed the opinion that this is the most relevant use case

There is an aesthetical aspect to the debate: exceptions do not deface a function's interface the way error handling does, for example; also, the non-local aspect of exception handling mechanisms avoid cluttering client code with error management code. These tests do not explore these issues

	Using Exceptions	Using Error Codes
In both cases, the test data will be generated in the same fashion. Generation of test data is not included in the timing measurement below: We create a vector<int> of size n We fill it with n consecutive integer values starting at -100 We shuffle the values	`#include <chrono> #include <iostream> #include <vector> #include <algorithm> #include <numeric> #include <random> #include <functional> #include <cmath> using namespace std; using namespace std::chrono; auto generate_test_data(int n) { vector<int> v(n); generate_n(begin(v), n, [cur = -100]() mutable { return cur++; }); random_device rd; mt19937 prng{ rd() }; shuffle(begin(v), end(v), prng); return v; }`	`#include <chrono> #include <iostream> #include <vector> #include <algorithm> #include <numeric> #include <random> #include <functional> #include <cmath> using namespace std; using namespace std::chrono; auto generate_test_data(int n) { vector<int> v(n); generate_n(begin(v), n, [cur = -100]() mutable { return cur++; }); random_device rd; mt19937 prng{ rd() }; shuffle(begin(v), end(v), prng); return v; }`
We use similar functions for the exception-based tests and the error-baseds tests : With exceptions, we call process_or_throw(n) which throws invalid_data if n is negative With errors, we call process_or_signal(n,result) which returns false if n is negative, true otherwise. Since the return value is used to signal success of failure, an additional argument (result, passed by reference) is required for the function to provide its intended service. Alternatives would be returning a pair<bool,int> or using a form of expected<int> Both functions perform the exact same processing in the case where n is valid.	`class invalid_data {}; int process_or_throw(int n) { if (n < 0) throw invalid_data{}; return n * 2; }`	`bool process_or_signal(int n, int &result) { if (n < 0) return false; result = n * 2; return true; }`
The test named check_every_iteration(v) is meant to provide a worst-case scenario for exceptions. It behaves as follows: Both functions keep a count in invalid values found in v. In our tests, the number of times this incrementation will be performed is identical in both tests Each value is in v is passed to the processing function of its test The success or failure of the processing function is checked in both cases using the mechanism measured At the end, to avoid unwanted optimizations, a sum of the processed values is performed and this sum is « output » to a null output stream Both functions return a pair containing the time elapsed and the error count.	`auto check_every_iteration(vector<int> v) { clog.rdbuf(nullptr); auto before = high_resolution_clock::now(); int nthrows = 0; for (auto & val : v) { try { val = process_or_throw(val); } catch (...) { ++nthrows; } } auto after = high_resolution_clock::now(); clog << accumulate(begin(v), end(v), 0); return make_pair(after - before, nthrows); }`	`auto check_every_iteration(vector<int> v) { clog.rdbuf(nullptr); auto before = high_resolution_clock::now(); int nerrs = 0; for (auto & val : v) { int result; if (process_or_signal(val, result)) { val = result; } else { ++nerrs; } } auto after = high_resolution_clock::now(); clog << accumulate(begin(v), end(v), 0); return make_pair(after - before, nerrs); }`
The test named check_general_early_exit(v) is meant to provide a close-to-worst-case scenario for exceptions. It behaves as follows: Values in v are sorted in order to put a problem case right at the beginning A global try...catch is used in the exception case, to amortize the cost of installing the exception-handling mechanics An immediate exit mechanism is used in the error handling case to break out of the loop as soon as a problem is detected Each value is in v is passed to the processing function of its test At the end, to avoid unwanted optimizations, a sum of the processed values is performed and this sum is « output » to a null output stream Both functions return a pair containing the time elapsed and the element at which the first problem has been detected.	`auto check_general_early_exit(vector<int> v) { clog.rdbuf(nullptr); sort(begin(v), end(v)); auto before = high_resolution_clock::now(); vector<int>::size_type i = 0; try { for (; i != v.size(); ++i) v[i] = process_or_throw(v[i]); } catch (...) { } auto after = high_resolution_clock::now(); clog << accumulate(begin(v), end(v), 0); return make_pair(after - before, i); }`	`auto check_general_early_exit(vector<int> v) { clog.rdbuf(nullptr); sort(begin(v), end(v)); auto before = high_resolution_clock::now(); vector<int>::size_type i = 0; for (; i != v.size(); ++i) { int result; if (!process_or_signal(v[i], result)) break; v[i] = result; } auto after = high_resolution_clock::now(); clog << accumulate(begin(v), end(v), 0); return make_pair(after - before, i); }`
The test named check_general_late_exit(v) is meant to provide a case scenario suited for exceptions. It behaves as follows: Values in v are sorted in order to put problem cases at the end A global try...catch is used in the exception case, to amortize the cost of installing the exception-handling mechanics An immediate exit mechanism is used in the error handling case to break out of the loop as soon as a problem is detected Each value is in v is passed to the processing function of its test At the end, to avoid unwanted optimizations, a sum of the processed values is performed and this sum is « output » to a null output stream Both functions return a pair containing the time elapsed and the element at which the first problem has been detected.	`auto check_general_late_exit(vector<int> v) { clog.rdbuf(nullptr); sort(begin(v), end(v), greater<>{}); auto before = high_resolution_clock::now(); vector<int>::size_type i = 0; try { for (; i != v.size(); ++i) v[i] = process_or_throw(v[i]); } catch (...) { } auto after = high_resolution_clock::now(); clog << accumulate(begin(v), end(v), 0); return make_pair(after - before, i); }`	`auto check_general_late_exit(vector<int> v) { clog.rdbuf(nullptr); sort(begin(v), end(v), greater<>{}); auto before = high_resolution_clock::now(); vector<int>::size_type i = 0; for (; i != v.size(); ++i) { int result; if (!process_or_signal(v[i], result)) break; v[i] = result; } auto after = high_resolution_clock::now(); clog << accumulate(begin(v), end(v), 0); return make_pair(after - before, i); }`
The test named check_general_no_exit(v) is meant to provide a close-to-ideal (see below for a discussion of this) case scenario suited for exceptions, as code always uses what is presumed to be the « fast path ». It behaves as follows: Values in v are transformed to ensure there is no error case The code is considered to be unaware of this and uses its normal exception- or error-handling mechanisms A global try...catch is used in the exception case, to amortize the cost of installing the exception-handling mechanics An immediate exit mechanism is used in the error handling case to break out of the loop as soon as a problem is detected Each value is in v is passed to the processing function of its test At the end, to avoid unwanted optimizations, a sum of the processed values is performed and this sum is « output » to a null output stream Both functions return a pair containing the time elapsed and the element at which the first problem has been detected.	`auto check_general_no_exit(vector<int> v) { clog.rdbuf(nullptr); transform(begin(v), end(v), begin(v), [](int n) { return abs(n); }); auto before = high_resolution_clock::now(); vector<int>::size_type i = 0; try { for (; i != v.size(); ++i) v[i] = process_or_throw(v[i]); } catch (...) { } auto after = high_resolution_clock::now(); clog << accumulate(begin(v), end(v), 0); return make_pair(after - before, i); }`	`auto check_general_no_exit(vector<int> v) { clog.rdbuf(nullptr); transform(begin(v), end(v), begin(v), [](int n) { return abs(n); }); auto before = high_resolution_clock::now(); vector<int>::size_type i = 0; for (; i != v.size(); ++i) { int result; if (!process_or_signal(v[i], result)) break; v[i] = result; } auto after = high_resolution_clock::now(); clog << accumulate(begin(v), end(v), 0); return make_pair(after - before, i); }}`
Both main() functions are identical, and run the same tests (except for the differences mentioned above) on the same data.	int main(){ enum { N = 5000000 }; auto v = generate_test_data(N); cout << "\nChecking for exceptions every iteration\n\n"; auto every_it = check_every_iteration(v); cout << "Processed " << v.size() << " elements, of which " << every_it.second << " led to an exception being thrown\n" << "Elapsed time: " << duration_cast<microseconds>(every_it.first).count() << " microseconds" << endl; cout << "\nChecking for exceptions globally, early exit\n\n"; auto early_exit = check_general_early_exit(v); cout << "Processed " << v.size() << " elements, of which element " << early_exit.second << " led to an exception being thrown\n" << "Elapsed time: " << duration_cast<microseconds>(early_exit.first).count() << " microseconds" << endl; cout << "\nChecking for exceptions globally, late exit\n\n"; auto late_exit = check_general_late_exit(v); cout << "Processed " << v.size() << " elements, of which element " << late_exit.second << " led to an exception being thrown\n" << "Elapsed time: " << duration_cast<microseconds>(late_exit.first).count() << " microseconds" << endl; cout << "\nChecking for exceptions globally, no exit\n\n"; auto no_check = check_general_no_exit(v); cout << "Processed " << v.size() << " elements\n" << "Elapsed time: " << duration_cast<microseconds>(no_check.first).count() << " microseconds" << endl; }	int main() { enum { N = 5000000 }; auto v = generate_test_data(N); cout << "\nChecking for errors every iteration\n\n"; auto every_it = check_every_iteration(v); cout << "Processed " << v.size() << " elements, of which " << every_it.second << " led to an error being signalled\n" << "Elapsed time: " << duration_cast<microseconds>(every_it.first).count() << " microseconds" << endl; cout << "\nChecking for errors globally, early exit\n\n"; auto early_exit = check_general_early_exit(v); cout << "Processed " << v.size() << " elements, of which element " << early_exit.second << " led to an error being signalled\n" << "Elapsed time: " << duration_cast<microseconds>(early_exit.first).count() << " microseconds" << endl; cout << "\nChecking for errors globally, late exit\n\n"; auto late_exit = check_general_late_exit(v); cout << "Processed " << v.size() << " elements, of which element " << late_exit.second << " led to an error being signalled\n" << "Elapsed time: " << duration_cast<microseconds>(late_exit.first).count() << " microseconds" << endl; cout << "\nChecking for errors globally, no exit\n\n"; auto no_check = check_general_no_exit(v); cout << "Processed " << v.size() << " elements\n" << "Elapsed time: " << duration_cast<microseconds>(no_check.first).count() << " microseconds" << endl; }
I ran the tests on two platforms. The output visible on the right comes from code generated with MSVC 2015, Community Edition, Release, 64 bits. I used my personal laptop for these tests (Windows 7, 2.6 GHz, 8GB of RAM). We can see that in this case, even in the scenarios most favorable to exceptions, error handling was faster.	Checking for exceptions every iteration Processed 5000000 elements, of which 100 led to an exception being thrown Elapsed time: 7749 microseconds Checking for exceptions globally, early exit Processed 5000000 elements, of which element 0 led to an exception being thrown Elapsed time: 40 microseconds Checking for exceptions globally, late exit Processed 5000000 elements, of which element 499900 led to an exception being thrown Elapsed time: 6034 microseconds Checking for exceptions globally, no exit Processed 5000000 elements Elapsed time: 5970 microseconds	Checking for errors every iteration Processed 5000000 elements, of which 100 led to an error being signalled Elapsed time: 4553 microseconds Checking for errors globally, early exit Processed 5000000 elements, of which element 0 led to an error being signalled Elapsed time: 0 microseconds Checking for errors globally, late exit Processed 5000000 elements, of which element 499900 led to an error being signalled Elapsed time: 5277 microseconds Checking for errors globally, no exit Processed 5000000 elements Elapsed time: 5239 microseconds
The output on the right in this case comes from the Coliru online compiler. The code was compiled with the following options : `g++ -std=c++14 -O2 -Wall -pedantic -pthread` This test shows that in the case most favorable to exceptions (the one where they are effectively exceptional), the exception-handling code is faster than the error-handling one.	Checking for exceptions every iteration Processed 5000000 elements, of which 100 led to an exception being thrown Elapsed time: 11958 microseconds Checking for exceptions globally, early exit Processed 5000000 elements, of which element 0 led to an exception being thrown Elapsed time: 42 microseconds Checking for exceptions globally, late exit Processed 5000000 elements, of which element 4999900 led to an exception being thrown Elapsed time: 13429 microseconds Checking for exceptions globally, no exit Processed 5000000 elements Elapsed time: 9792 microseconds	Checking for errors every iteration Processed 5000000 elements, of which 100 led to an error being signalled Elapsed time: 10210 microseconds Checking for errors globally, early exit Processed 5000000 elements, of which element 0 led to an error being signalled Elapsed time: 1 microseconds Checking for errors globally, late exit Processed 5000000 elements, of which element 4999900 led to an error being signalled Elapsed time: 9979 microseconds Checking for errors globally, no exit Processed 5000000 elements Elapsed time: 11623 microseconds

Some remarks:

Exception handling in the exception-favorable case is consistently slower with exceptions with one compiler, and consistenly faster with exceptions on another compiler
- This suggests a quality-of-implementation issue
- Without knowing the hardware on which the Coliru compiler executes the compiled code, it is difficult to make an informed judgement in this case
In the exception-unfavorable cases, where problems are not exceptional, exceptions are most probably the wrong tool to use, as our metrics show
- The test code used here required us to deface our processing function's signature a bit, which is unpleasant
- For those cases ill-suited for exceptions, all other things considered, the attraction of maybe<T>, optional<T> or expected<T> is understandable, and I think a point can clearly be made for such a facility, regardless of this group's stance on exceptions

The Expected Ideal Case for Exceptions

What would be the ideal case for exceptions? Simply : no try or catch blocks in the client code, unless there really is a need to (a) perform some cleanup code, e.g. in standard generic functions that are not expected to leave the program in an undefined state, or (b) in containers that have invariants to maintain. These algorithms and containers should then be exception-neutral and in general, once their cleanup stages have completed, let the exception escape where it should. We write code as if nothing bad happened. Should something occur indeed, if we handle it, it's because we care, otherwise, we crash due to an unhandled exception or to a call to std::terminate().

Stack Unwinding

When a deeply nested function faces a problem, it is sometimes necessary to unwind the stack up to the point, either one or many layers above, where the problem can be handled.

The following test uses a recursive function in which non-trivial local variables are instantiated, and provokes a problem situation at various depths down the call stack. In the case where an exception is raised, stack unwinding occurs at each level until someone catches the exception or the program crashes; in the case where an error is returned, said error needs to be treated at each level, and local variables still need to be destroyed as the return statements bring control back up the stack.

Stack Unwinding – With SSO

The test code I used follows.

For these tests, my work hypothesis were:

There is a cost in installing exception handling code at some call site
There is a stack unwinding cost to exceptions being raised deep in a call stack, due to the need to finalize local variables in the function call sequence that has to be traversed backwards
The stack unwinding cost is non-null for error handling as the same variables have to be finalized in that case
Using SSO string instances is expected to reduce the need for dynamic memory allocation, and accelerate the stack unwinding process in both cases

	Using Exceptions	Using Error Codes
The code relies strictly on standard classes and algorithms.	`#include <algorithm> #include <iostream> #include <iomanip> #include <string> #include <chrono> #include <numeric> #include <vector> #include <iterator> using namespace std; using namespace std::chrono;`	`#include <algorithm> #include <iostream> #include <iomanip> #include <string> #include <chrono> #include <numeric> #include <vector> #include <iterator> using namespace std; using namespace std::chrono;`
Function test(gen,n,m,limit) creates a vector of n elements all initialized with gen(). Then, it computes the sum of the sizes of these string instances and performs Variable rest is used here to ensure that the evaluation order of functions test() and accumulate() does not alter our metrics.	`class limit_reached {}; template <class G> string::size_type test(G gen, int n, int m, int limit) { if (m == limit) throw limit_reached{}; vector<string> v; v.reserve(n); generate_n(back_inserter(v), n, gen); auto rest = test(gen, n, m + 1, limit); return rest + accumulate(begin(v), end(v), string::size_type{}, [](string::size_type sz, const string &s) { return sz + s.size(); }); }`	`template <class G> string::size_type test(G gen, int n, int m, int limit) { if (m == limit) return string::npos; vector<string> v; v.reserve(n); generate_n(back_inserter(v), n, gen); auto rest = test(gen, n, m + 1, limit); if (rest == string::npos) return rest; return rest + accumulate(begin(v), end(v), string::size_type{}, [](string::size_type sz, const string &s) { return sz + s.size(); }); }`
Both programs essentially do the same thing, but with exception handling on the left and with error handling on the right.	int main() { string::size_type total{}; enum { NTESTS = 10000, LIMIT = 64 }; for (int limit = 1; limit < LIMIT; ++limit) { auto before = high_resolution_clock::now(); for (int i = 0; i < NTESTS; ++i) { try { auto var = test( [] { return string("Yo"); }, 16, 0, limit ); total += var; } catch (...) { } } auto after = high_resolution_clock::now(); cout << "With exceptions, " << NTESTS << " tests, depth " << setw(2) << limit << ": " << setw(8) << duration_cast<microseconds>(after - before).count() << " us" << endl; } clog.rdbuf(nullptr); clog << total; }	int main() { string::size_type total{}; enum { NTESTS = 10000, LIMIT = 64 }; for (int limit = 1; limit < LIMIT; ++limit) { auto before = high_resolution_clock::now(); for (int i = 0; i < NTESTS; ++i) { auto var = test( [] { return string("Yo"); }, 16, 0, limit ); if (var != string::npos) total += var; } auto after = high_resolution_clock::now(); cout << "With errors, " << NTESTS << " tests, depth " << setw(2) << limit << ": " << setw(8) << duration_cast<microseconds>(after - before).count() << " us" << endl; } clog.rdbuf(nullptr); clog << total; }
I ran the tests on two platforms. The output visible on the right comes from code generated with MSVC 2015, Community Edition, Release, 64 bits. I used my personal laptop for these tests (Windows 7, 2.6 GHz, 8GB of RAM). We can see that in this case, the code with exceptions is slower than the code with errors by a factor of more than two. This is more than most would have expected.	With exceptions, 10000 tests, depth 1: 41327 us With exceptions, 10000 tests, depth 2: 52459 us With exceptions, 10000 tests, depth 3: 63306 us With exceptions, 10000 tests, depth 4: 75269 us With exceptions, 10000 tests, depth 5: 87089 us With exceptions, 10000 tests, depth 6: 98491 us With exceptions, 10000 tests, depth 7: 108981 us With exceptions, 10000 tests, depth 8: 120747 us With exceptions, 10000 tests, depth 9: 132695 us With exceptions, 10000 tests, depth 10: 145205 us With exceptions, 10000 tests, depth 11: 156599 us With exceptions, 10000 tests, depth 12: 168957 us With exceptions, 10000 tests, depth 13: 180864 us With exceptions, 10000 tests, depth 14: 191661 us With exceptions, 10000 tests, depth 15: 204039 us With exceptions, 10000 tests, depth 16: 216198 us With exceptions, 10000 tests, depth 17: 226950 us With exceptions, 10000 tests, depth 18: 239023 us With exceptions, 10000 tests, depth 19: 251166 us With exceptions, 10000 tests, depth 20: 262936 us With exceptions, 10000 tests, depth 21: 274267 us With exceptions, 10000 tests, depth 22: 284392 us With exceptions, 10000 tests, depth 23: 296929 us With exceptions, 10000 tests, depth 24: 307826 us With exceptions, 10000 tests, depth 25: 321245 us With exceptions, 10000 tests, depth 26: 332741 us With exceptions, 10000 tests, depth 27: 343095 us With exceptions, 10000 tests, depth 28: 357019 us With exceptions, 10000 tests, depth 29: 367148 us With exceptions, 10000 tests, depth 30: 378649 us With exceptions, 10000 tests, depth 31: 395339 us	With errors, 10000 tests, depth 1: 6206 us With errors, 10000 tests, depth 2: 11396 us With errors, 10000 tests, depth 3: 16764 us With errors, 10000 tests, depth 4: 22296 us With errors, 10000 tests, depth 5: 28065 us With errors, 10000 tests, depth 6: 33469 us With errors, 10000 tests, depth 7: 39449 us With errors, 10000 tests, depth 8: 44370 us With errors, 10000 tests, depth 9: 50794 us With errors, 10000 tests, depth 10: 55546 us With errors, 10000 tests, depth 11: 61550 us With errors, 10000 tests, depth 12: 67278 us With errors, 10000 tests, depth 13: 72448 us With errors, 10000 tests, depth 14: 78609 us With errors, 10000 tests, depth 15: 84046 us With errors, 10000 tests, depth 16: 89776 us With errors, 10000 tests, depth 17: 95685 us With errors, 10000 tests, depth 18: 100960 us With errors, 10000 tests, depth 19: 106435 us With errors, 10000 tests, depth 20: 111976 us With errors, 10000 tests, depth 21: 118207 us With errors, 10000 tests, depth 22: 124841 us With errors, 10000 tests, depth 23: 128929 us With errors, 10000 tests, depth 24: 135166 us With errors, 10000 tests, depth 25: 140394 us With errors, 10000 tests, depth 26: 146619 us With errors, 10000 tests, depth 27: 151445 us With errors, 10000 tests, depth 28: 158840 us With errors, 10000 tests, depth 29: 163460 us With errors, 10000 tests, depth 30: 169006 us With errors, 10000 tests, depth 31: 180258 us
The output on the right in this case comes from the Coliru online compiler. The code was compiled with the following options : `g++ -std=c++14 -O2 -Wall -pedantic -pthread` Note that since I used an online compiler, execution runs on a limited budget, which was exceeded in the exceptions case. In this case, the exceptions case is significantly slower...	With exceptions, 10000 tests, depth 1: 95439 us With exceptions, 10000 tests, depth 2: 133593 us With exceptions, 10000 tests, depth 3: 171083 us With exceptions, 10000 tests, depth 4: 208772 us With exceptions, 10000 tests, depth 5: 249903 us With exceptions, 10000 tests, depth 6: 287059 us With exceptions, 10000 tests, depth 7: 327783 us With exceptions, 10000 tests, depth 8: 365278 us With exceptions, 10000 tests, depth 9: 402601 us With exceptions, 10000 tests, depth 10: 442150 us With exceptions, 10000 tests, depth 11: 477466 us With exceptions, 10000 tests, depth 12: 514387 us With exceptions, 10000 tests, depth 13: 554215 us With exceptions, 10000 tests, depth 14: 592790 us With exceptions, 10000 tests, depth 15: 633356 us With exceptions, 10000 tests, depth 16: 668736 us With exceptions, 10000 tests, depth 17: 704979 us With exceptions, 10000 tests, depth 18: 744218 us With exceptions, 10000 tests, depth 19: 781855 us With exceptions, 10000 tests, depth 20: 828158 us With exceptions, 10000 tests, depth 21: 855371 us With exceptions, 10000 tests, depth 22: 895674 us With exceptions, 10000 tests, depth 23: 947332 us With exceptions, 10000 tests, depth 24: 976160 us With exceptions, 10000 tests, depth 25: 1015844 us With exceptions, 10000 tests, depth 26: 1058750 us With exceptions, 10000 tests, depth 27: 1086016 us With exceptions, 10000 tests, depth 28: 1125130 us	With errors, 10000 tests, depth 1: 2597 us With errors, 10000 tests, depth 2: 5256 us With errors, 10000 tests, depth 3: 7866 us With errors, 10000 tests, depth 4: 10517 us With errors, 10000 tests, depth 5: 13051 us With errors, 10000 tests, depth 6: 15589 us With errors, 10000 tests, depth 7: 18104 us With errors, 10000 tests, depth 8: 20642 us With errors, 10000 tests, depth 9: 23138 us With errors, 10000 tests, depth 10: 25701 us With errors, 10000 tests, depth 11: 28141 us With errors, 10000 tests, depth 12: 30821 us With errors, 10000 tests, depth 13: 33157 us With errors, 10000 tests, depth 14: 35895 us With errors, 10000 tests, depth 15: 38441 us With errors, 10000 tests, depth 16: 40979 us With errors, 10000 tests, depth 17: 43506 us With errors, 10000 tests, depth 18: 46140 us With errors, 10000 tests, depth 19: 48465 us With errors, 10000 tests, depth 20: 50831 us With errors, 10000 tests, depth 21: 53425 us With errors, 10000 tests, depth 22: 56314 us With errors, 10000 tests, depth 23: 58935 us With errors, 10000 tests, depth 24: 61356 us With errors, 10000 tests, depth 25: 63722 us With errors, 10000 tests, depth 26: 66624 us With errors, 10000 tests, depth 27: 69257 us With errors, 10000 tests, depth 28: 71843 us With errors, 10000 tests, depth 29: 74940 us With errors, 10000 tests, depth 30: 76695 us With errors, 10000 tests, depth 31: 79393 us

Stack Unwinding – Without SSO

I tried the exact same code but with a longer string, "I love my teacher because he's very cool", in order to avoid the Small String Optimization (SSO). Thus, the only change I made was:

Before	After
`auto var = test( [] { return string("Yo"); }, 16, 0, limit );`	`auto var = test( [] { return string("I love my teacher because he's very cool"); }, 16, 0, limit );`

Let's see what the impact of this change is.

For these tests, my work hypothesis were:

There is a cost in installing exception handling code at some call site
There is a stack unwinding cost to exceptions being raised deep in a call stack, due to the need to finalize local variables in the function call sequence that has to be traversed backwards
The stack unwinding cost is non-null for error handling as the same variables have to be finalized in that case
Using non-SSO string instances is expected to induce a need for dynamic memory allocation, and slow down the stack unwinding process in both cases

I ran the tests on two platforms.

The output visible on the right comes from code generated with MSVC 2015, Community Edition, Release, 64 bits. I used my personal laptop for these tests (Windows 7, 2.6 GHz, 8GB of RAM).

In this case, the code with exceptions is faster than the code with errors. The main difference between this test and the previous one is that this test relies on dynamic memory allocation within each string whereas the previous one did not.

With exceptions, 10000 tests, depth  1:    41327 us
With exceptions, 10000 tests, depth  2:    52459 us
With exceptions, 10000 tests, depth  3:    63306 us
With exceptions, 10000 tests, depth  4:    75269 us
With exceptions, 10000 tests, depth  5:    87089 us
With exceptions, 10000 tests, depth  6:    98491 us
With exceptions, 10000 tests, depth  7:   108981 us
With exceptions, 10000 tests, depth  8:   120747 us
With exceptions, 10000 tests, depth  9:   132695 us
With exceptions, 10000 tests, depth 10:   145205 us
With exceptions, 10000 tests, depth 11:   156599 us
With exceptions, 10000 tests, depth 12:   168957 us
With exceptions, 10000 tests, depth 13:   180864 us
With exceptions, 10000 tests, depth 14:   191661 us
With exceptions, 10000 tests, depth 15:   204039 us
With exceptions, 10000 tests, depth 16:   216198 us
With exceptions, 10000 tests, depth 17:   226950 us
With exceptions, 10000 tests, depth 18:   239023 us
With exceptions, 10000 tests, depth 19:   251166 us
With exceptions, 10000 tests, depth 20:   262936 us
With exceptions, 10000 tests, depth 21:   274267 us
With exceptions, 10000 tests, depth 22:   284392 us
With exceptions, 10000 tests, depth 23:   296929 us
With exceptions, 10000 tests, depth 24:   307826 us
With exceptions, 10000 tests, depth 25:   321245 us
With exceptions, 10000 tests, depth 26:   332741 us
With exceptions, 10000 tests, depth 27:   343095 us
With exceptions, 10000 tests, depth 28:   357019 us
With exceptions, 10000 tests, depth 29:   367148 us
With exceptions, 10000 tests, depth 30:   378649 us
With exceptions, 10000 tests, depth 31:   395339 us

With errors, 10000 tests, depth  1:    20827 us
With errors, 10000 tests, depth  2:    34465 us
With errors, 10000 tests, depth  3:    47497 us
With errors, 10000 tests, depth  4:    61512 us
With errors, 10000 tests, depth  5:    79681 us
With errors, 10000 tests, depth  6:    93367 us
With errors, 10000 tests, depth  7:   106139 us
With errors, 10000 tests, depth  8:   118248 us
With errors, 10000 tests, depth  9:   138902 us
With errors, 10000 tests, depth 10:   152578 us
With errors, 10000 tests, depth 11:   165008 us
With errors, 10000 tests, depth 12:   177862 us
With errors, 10000 tests, depth 13:   201981 us
With errors, 10000 tests, depth 14:   214070 us
With errors, 10000 tests, depth 15:   227796 us
With errors, 10000 tests, depth 16:   240052 us
With errors, 10000 tests, depth 17:   260457 us
With errors, 10000 tests, depth 18:   272261 us
With errors, 10000 tests, depth 19:   286947 us
With errors, 10000 tests, depth 20:   297481 us
With errors, 10000 tests, depth 21:   322884 us
With errors, 10000 tests, depth 22:   336485 us
With errors, 10000 tests, depth 23:   347553 us
With errors, 10000 tests, depth 24:   361057 us
With errors, 10000 tests, depth 25:   380685 us
With errors, 10000 tests, depth 26:   392628 us
With errors, 10000 tests, depth 27:   404773 us
With errors, 10000 tests, depth 28:   418660 us
With errors, 10000 tests, depth 29:   444442 us
With errors, 10000 tests, depth 30:   457051 us
With errors, 10000 tests, depth 31:   472818 us

The output on the right in this case comes from the Coliru online compiler. The code was compiled with the following options :

g++ -std=c++14 -O2 -Wall -pedantic -pthread

Note that since I used an online compiler, execution runs on a limited budget, which was exceeded in the exceptions case. In this case again, the exceptions case is significantly slower. Could this be due to the quality on the string implementation? Hard to judge from this viewpoint.

With exceptions, 10000 tests, depth  1:   107870 us
With exceptions, 10000 tests, depth  2:   161449 us
With exceptions, 10000 tests, depth  3:   209145 us
With exceptions, 10000 tests, depth  4:   262849 us
With exceptions, 10000 tests, depth  5:   310293 us
With exceptions, 10000 tests, depth  6:   365627 us
With exceptions, 10000 tests, depth  7:   413848 us
With exceptions, 10000 tests, depth  8:   456456 us
With exceptions, 10000 tests, depth  9:   510788 us
With exceptions, 10000 tests, depth 10:   564795 us
With exceptions, 10000 tests, depth 11:   611621 us
With exceptions, 10000 tests, depth 12:   665246 us
With exceptions, 10000 tests, depth 13:   724558 us
With exceptions, 10000 tests, depth 14:   772821 us
With exceptions, 10000 tests, depth 15:   821923 us
With exceptions, 10000 tests, depth 16:   867255 us
With exceptions, 10000 tests, depth 17:   921280 us
With exceptions, 10000 tests, depth 18:   964334 us
With exceptions, 10000 tests, depth 19:  1016856 us
With exceptions, 10000 tests, depth 20:  1068728 us
With exceptions, 10000 tests, depth 21:  1113028 us
With exceptions, 10000 tests, depth 22:  1168277 us
With exceptions, 10000 tests, depth 23:  1224574 us
With exceptions, 10000 tests, depth 24:  1268895 us

With errors, 10000 tests, depth  1:     9855 us
With errors, 10000 tests, depth  2:    19209 us
With errors, 10000 tests, depth  3:    29060 us
With errors, 10000 tests, depth  4:    39017 us
With errors, 10000 tests, depth  5:    47589 us
With errors, 10000 tests, depth  6:    57281 us
With errors, 10000 tests, depth  7:    66952 us
With errors, 10000 tests, depth  8:    76470 us
With errors, 10000 tests, depth  9:    85957 us
With errors, 10000 tests, depth 10:    95373 us
With errors, 10000 tests, depth 11:   104931 us
With errors, 10000 tests, depth 12:   114036 us
With errors, 10000 tests, depth 13:   124811 us
With errors, 10000 tests, depth 14:   132930 us
With errors, 10000 tests, depth 15:   143297 us
With errors, 10000 tests, depth 16:   152643 us
With errors, 10000 tests, depth 17:   161585 us
With errors, 10000 tests, depth 18:   171137 us
With errors, 10000 tests, depth 19:   181506 us
With errors, 10000 tests, depth 20:   192037 us
With errors, 10000 tests, depth 21:   200857 us
With errors, 10000 tests, depth 22:   210361 us
With errors, 10000 tests, depth 23:   220087 us
With errors, 10000 tests, depth 24:   228843 us
With errors, 10000 tests, depth 25:   238381 us
With errors, 10000 tests, depth 26:   248758 us
With errors, 10000 tests, depth 27:   258322 us
With errors, 10000 tests, depth 28:   269589 us
With errors, 10000 tests, depth 29:   278104 us
With errors, 10000 tests, depth 30:   287786 us
With errors, 10000 tests, depth 31:   294860 us

Stack Unwinding – Replacing std::string with std::vector<char>

There is a risk that the results of the previous test could be influenced by the implementation of std::string. To verify this, I used a slightly different version with std::vector<char> instead.

For these tests, my work hypothesis were:

The string case on gcc in previous tests seemed to have bad performance characteristics
This could be due to QoI considerations
Type std::vector<char> is expected to be well tested and optimized
Behavior on MSVC is not expected to change much
Behavior on gcc is expected to be better

	Using Exceptions	Using Error Codes
The code relies strictly on standard classes and algorithms.	`#include <algorithm> #include <iostream> #include <iomanip> #include <string> #include <chrono> #include <numeric> #include <vector> #include <iterator> using namespace std; using namespace std::chrono;`	`#include <algorithm> #include <iostream> #include <iomanip> #include <string> #include <chrono> #include <numeric> #include <vector> #include <iterator> using namespace std; using namespace std::chrono;`
Function test(gen,n,m,limit) creates a vector of n elements all initialized with gen(). Then, it computes the sum of the sizes of these string instances and performs Variable rest is used here to ensure that the evaluation order of functions test() and accumulate() does not alter our metrics.	`class limit_reached {}; template <class G> string::size_type test(G gen, int n, int m, int limit) { if (m == limit) throw limit_reached{}; vector<vector<char>> v; v.reserve(n); generate_n(back_inserter(v), n, gen); auto var = test(gen, n, m + 1, limit); return accumulate(begin(v), end(v), vector<char>::size_type{}, [](vector<char>::size_type sz, const vector<char> &s) { return sz + s.size(); }) + var; }`	`template <class G> string::size_type test(G gen, int n, int m, int limit) { if (m == limit) return string::npos; vector<vector<char>> v; v.reserve(n); generate_n(back_inserter(v), n, gen); auto var = test(gen, n, m + 1, limit); return accumulate(begin(v), end(v), vector<char>::size_type{}, [](vector<char>::size_type sz, const vector<char> &s) { return sz + s.size(); }) + var; }`
Both programs essentially do the same thing, but with exception handling on the left and with error handling on the right.	int main() { const char src[] = "I love my teacher because he's very cool"; string::size_type total{}; enum { NTESTS = 10000, LIMIT = 32 }; for (int limit = 1; limit < LIMIT; ++limit) { auto before = high_resolution_clock::now(); for (int i = 0; i < NTESTS; ++i) { try { auto var = test( [&src] { return vector<char>(begin(src), end(src)); }, 16, 0, limit ); total += var; } catch (...) { } } auto after = high_resolution_clock::now(); cout << "With exceptions, " << NTESTS << " tests, depth " << setw(2) << limit << ": " << setw(8) << duration_cast<microseconds>(after - before).count() << " us" << endl; } clog.rdbuf(nullptr); clog << total; }	int main() { const char src[] = "I love my teacher because he's very cool"; string::size_type total{}; enum { NTESTS = 10000, LIMIT = 32 }; for (int limit = 1; limit < LIMIT; ++limit) { auto before = high_resolution_clock::now(); for (int i = 0; i < NTESTS; ++i) { auto var = test( [&src] { return vector<char>(begin(src), end(src)); }, 16, 0, limit ); if (var != string::npos) total += var; } auto after = high_resolution_clock::now(); cout << "With errors, " << NTESTS << " tests, depth " << setw(2) << limit << ": " << setw(8) << duration_cast<microseconds>(after - before).count() << " us" << endl; } clog.rdbuf(nullptr); clog << total; }
I ran the tests on two platforms. The output visible on the right comes from code generated with MSVC 2015, Community Edition, Release, 64 bits. I used my personal laptop for these tests (Windows 7, 2.6 GHz, 8GB of RAM). We can see that in this case, the code with exceptions is slower than the code with errors by a factor of approximately 1.5. This is more than most would have expected.	With exceptions, 10000 tests, depth 1: 54386 us With exceptions, 10000 tests, depth 2: 69768 us With exceptions, 10000 tests, depth 3: 84226 us With exceptions, 10000 tests, depth 4: 99579 us With exceptions, 10000 tests, depth 5: 123263 us With exceptions, 10000 tests, depth 6: 138252 us With exceptions, 10000 tests, depth 7: 153631 us With exceptions, 10000 tests, depth 8: 169649 us With exceptions, 10000 tests, depth 9: 193553 us With exceptions, 10000 tests, depth 10: 207907 us With exceptions, 10000 tests, depth 11: 227961 us With exceptions, 10000 tests, depth 12: 243927 us With exceptions, 10000 tests, depth 13: 273530 us With exceptions, 10000 tests, depth 14: 289658 us With exceptions, 10000 tests, depth 15: 305411 us With exceptions, 10000 tests, depth 16: 320519 us With exceptions, 10000 tests, depth 17: 345276 us With exceptions, 10000 tests, depth 18: 360545 us With exceptions, 10000 tests, depth 19: 376973 us With exceptions, 10000 tests, depth 20: 391885 us With exceptions, 10000 tests, depth 21: 418899 us With exceptions, 10000 tests, depth 22: 435420 us With exceptions, 10000 tests, depth 23: 450171 us With exceptions, 10000 tests, depth 24: 466055 us With exceptions, 10000 tests, depth 25: 487559 us With exceptions, 10000 tests, depth 26: 506852 us With exceptions, 10000 tests, depth 27: 524694 us With exceptions, 10000 tests, depth 28: 555336 us With exceptions, 10000 tests, depth 29: 569889 us With exceptions, 10000 tests, depth 30: 580770 us With exceptions, 10000 tests, depth 31: 598920 us	With errors, 10000 tests, depth 1: 17811 us With errors, 10000 tests, depth 2: 27918 us With errors, 10000 tests, depth 3: 38182 us With errors, 10000 tests, depth 4: 48061 us With errors, 10000 tests, depth 5: 66364 us With errors, 10000 tests, depth 6: 77027 us With errors, 10000 tests, depth 7: 86517 us With errors, 10000 tests, depth 8: 96786 us With errors, 10000 tests, depth 9: 114991 us With errors, 10000 tests, depth 10: 125874 us With errors, 10000 tests, depth 11: 138727 us With errors, 10000 tests, depth 12: 159482 us With errors, 10000 tests, depth 13: 170084 us With errors, 10000 tests, depth 14: 180867 us With errors, 10000 tests, depth 15: 190786 us With errors, 10000 tests, depth 16: 208081 us With errors, 10000 tests, depth 17: 218213 us With errors, 10000 tests, depth 18: 228715 us With errors, 10000 tests, depth 19: 239488 us With errors, 10000 tests, depth 20: 260270 us With errors, 10000 tests, depth 21: 271453 us With errors, 10000 tests, depth 22: 280826 us With errors, 10000 tests, depth 23: 292126 us With errors, 10000 tests, depth 24: 308666 us With errors, 10000 tests, depth 25: 318486 us With errors, 10000 tests, depth 26: 329852 us With errors, 10000 tests, depth 27: 339771 us With errors, 10000 tests, depth 28: 363561 us With errors, 10000 tests, depth 29: 375061 us With errors, 10000 tests, depth 30: 383913 us With errors, 10000 tests, depth 31: 398376 us
The output on the right in this case comes from the Coliru online compiler. The code was compiled with the following options : `g++ -std=c++14 -O2 -Wall -pedantic -pthread` In this case, the exceptions case is significantly slower...	With exceptions, 10000 tests, depth 1: 109584 us With exceptions, 10000 tests, depth 2: 160912 us With exceptions, 10000 tests, depth 3: 210568 us With exceptions, 10000 tests, depth 4: 261759 us With exceptions, 10000 tests, depth 5: 313235 us With exceptions, 10000 tests, depth 6: 362863 us With exceptions, 10000 tests, depth 7: 415040 us With exceptions, 10000 tests, depth 8: 464001 us With exceptions, 10000 tests, depth 9: 515755 us With exceptions, 10000 tests, depth 10: 569331 us With exceptions, 10000 tests, depth 11: 620290 us With exceptions, 10000 tests, depth 12: 674356 us With exceptions, 10000 tests, depth 13: 720531 us With exceptions, 10000 tests, depth 14: 771169 us With exceptions, 10000 tests, depth 15: 824409 us With exceptions, 10000 tests, depth 16: 873789 us With exceptions, 10000 tests, depth 17: 924100 us With exceptions, 10000 tests, depth 18: 976170 us With exceptions, 10000 tests, depth 19: 1026879 us With exceptions, 10000 tests, depth 20: 1076055 us With exceptions, 10000 tests, depth 21: 1126992 us With exceptions, 10000 tests, depth 22: 1180019 us With exceptions, 10000 tests, depth 23: 1226846 us With exceptions, 10000 tests, depth 24: 1282984 us With exceptions, 10000 tests, depth 25: 1328822 us With exceptions, 10000 tests, depth 26: 1380397 us With exceptions, 10000 tests, depth 27: 1431215 us With exceptions, 10000 tests, depth 28: 1489763 us With exceptions, 10000 tests, depth 29: 1538279 us With exceptions, 10000 tests, depth 30: 1590279 us With exceptions, 10000 tests, depth 31: 1637463 us	With errors, 10000 tests, depth 1: 10128 us With errors, 10000 tests, depth 2: 19673 us With errors, 10000 tests, depth 3: 29573 us With errors, 10000 tests, depth 4: 39263 us With errors, 10000 tests, depth 5: 48943 us With errors, 10000 tests, depth 6: 58651 us With errors, 10000 tests, depth 7: 68287 us With errors, 10000 tests, depth 8: 77962 us With errors, 10000 tests, depth 9: 87720 us With errors, 10000 tests, depth 10: 97372 us With errors, 10000 tests, depth 11: 107013 us With errors, 10000 tests, depth 12: 116845 us With errors, 10000 tests, depth 13: 126548 us With errors, 10000 tests, depth 14: 136250 us With errors, 10000 tests, depth 15: 145945 us With errors, 10000 tests, depth 16: 155688 us With errors, 10000 tests, depth 17: 165587 us With errors, 10000 tests, depth 18: 175600 us With errors, 10000 tests, depth 19: 185300 us With errors, 10000 tests, depth 20: 194976 us With errors, 10000 tests, depth 21: 204779 us With errors, 10000 tests, depth 22: 216605 us With errors, 10000 tests, depth 23: 226768 us With errors, 10000 tests, depth 24: 234541 us With errors, 10000 tests, depth 25: 244356 us With errors, 10000 tests, depth 26: 255442 us With errors, 10000 tests, depth 27: 265386 us With errors, 10000 tests, depth 28: 276807 us With errors, 10000 tests, depth 29: 286500 us With errors, 10000 tests, depth 30: 295008 us With errors, 10000 tests, depth 31: 305390 us

Impact on the Size of Binaries

I cannot compare the impact of exception-handling code on binary size from compiler to compiler right now (I used to have gcc and MSVC on my laptop but had a hard drive crash last week and have not gone around to reinstalling gcc yet; it's a matter of time). I can however compare its impact on the binary sizes with MSVC 2015. Thus, for the test sets we have on this page, we have :

Test Set	Binary Size with Exception Handling	Binary Size with Error Handling
Problem checked at various stages	$30208$ bytes	$29696$ bytes
Recursive function, problem detected at various depths, with SSO	$23040$ bytes	$22528$ bytes
Recursive function, problem detected at various depths, without SSO	$23040$ bytes	$22528$ bytes
Recursive function, problem detected at various depths, vector<char>	$21504$ bytes	$20992$ bytes

Using SSO or not has an impact on the programs runtime memory footprint, but not on its binary size. This was to be expected.

Summary

Let's take a more visual look at the data we have.

In the stack unwinding case with SSO-enabled string instances, we have (lower is better, vertical axis is in mocroseconds, horizontal axis is function call depth):

For this test, g++ with error handling is much better that g++ with exception handling. With MSVC, error handling is also faster than exception handling, but by a much smaller margin.

In the stack unwinding case with non-SSO-enabled string instances, we have (lower is better, vertical axis is in mocroseconds, horizontal axis is function call depth):

This case shows that the situation with g++ does not really change much, whereas code generated MSVC quite consistently performs better with exception handling than with error handling, except for very small depths (in which case the overhead on installing the initial try...catch block probably still shows).

In the stack unwinding case with vector<char> instances, we have (lower is better, vertical axis is in mocroseconds, horizontal axis is function call depth):

The pattern with vector<char> is very similar to the pattern with vector<string> given SSO-enabled string instances.

As far as executable binary size goes, we have (lower is better):

There is a difference in binary executable size. Legend has it that the difference is on the order of 10%, but that is not what my (small) tests seem to show. Indeed, here, the cost in binary size varies between 1.69% and 2.38%.

In summary:

There is, in general, a non-null cost to using exceptions
There ae use cases where exception handling is consistently faster that error handling
The cost in terms of binary size exist, but does not seem to be as high as some would expect
Important variations between compilers seem to suggest that implementations could be improved, at least as far as execution speed goes

You can use the code above and play with it. Should you find faults with the methodology or the test code, or should you find the time to run tests on other platforms, I would like to know what you get and if it diverges from what my personal experimentations seem to show.