Exceptions in C++ and their Costs

Some shortcuts :

This document aims to provide data on the cost of exceptions. It has been written to help discussions in SG14, and as such is written in English; be warned though that most links therein lead to my personal site, which is in French as this is the language I use when I teach most of the time.

TODO list :

This might take a few weeks, so be patient; I'll post updates on the mailing list when I'll get around to doing them.

Various Usage Scenarios

What is the relative cost of exceptions compared to returning and validating error codes? What follows makes the following assumptions:

For these tests, my work hypothesis were:

I hoped to measure the costs of exception handling and error handling in situations which should favor one and situations that should favor the other.

I used 64 bit binairies on a 64 bit machine, as SG14 members have expressed the opinion that this is the most relevant use case

There is an aesthetical aspect to the debate: exceptions do not deface a function's interface the way error handling does, for example; also, the non-local aspect of exception handling mechanisms avoid cluttering client code with error management code. These tests do not explore these issues

  Using Exceptions Using Error Codes

In both cases, the test data will be generated in the same fashion. Generation of test data is not included in the timing measurement below:

  • We create a vector<int> of size n
  • We fill it with n consecutive integer values starting at -100
  • We shuffle the values
#include <chrono>
#include <iostream>
#include <vector>
#include <algorithm>
#include <numeric>
#include <random>
#include <functional>
#include <cmath>
using namespace std;
using namespace std::chrono;
auto generate_test_data(int n)
{
   vector<int> v(n);
   generate_n(begin(v), n, [cur = -100]() mutable { return cur++; });
   random_device rd;
   mt19937 prng{ rd() };
   shuffle(begin(v), end(v), prng);
   return v;
}
#include <chrono>
#include <iostream>
#include <vector>
#include <algorithm>
#include <numeric>
#include <random>
#include <functional>
#include <cmath>
using namespace std;
using namespace std::chrono;

auto generate_test_data(int n)
{
   vector<int> v(n);
   generate_n(begin(v), n, [cur = -100]() mutable { return cur++; });
   random_device rd;
   mt19937 prng{ rd() };
   shuffle(begin(v), end(v), prng);
   return v;
}

We use similar functions for the exception-based tests and the error-baseds tests :

  • With exceptions, we call process_or_throw(n) which throws invalid_data if n is negative
  • With errors, we call process_or_signal(n,result) which returns false if n is negative, true otherwise. Since the return value is used to signal success of failure, an additional argument (result, passed by reference) is required for the function to provide its intended service. Alternatives would be returning a pair<bool,int> or using a form of expected<int>

Both functions perform the exact same processing in the case where n is valid.

class invalid_data {};

int process_or_throw(int n)
{
   if (n < 0) throw invalid_data{};
   return n * 2;
}
bool process_or_signal(int n, int &result)
{
   if (n < 0) return false;
   result = n * 2;
   return true;
}

The test named check_every_iteration(v) is meant to provide a worst-case scenario for exceptions. It behaves as follows:

  • Both functions keep a count in invalid values found in v. In our tests, the number of times this incrementation will be performed is identical in both tests
  • Each value is in v is passed to the processing function of its test
  • The success or failure of the processing function is checked in both cases using the mechanism measured
  • At the end, to avoid unwanted optimizations, a sum of the processed values is performed and this sum is « output » to a null output stream

Both functions return a pair containing the time elapsed and the error count.

auto check_every_iteration(vector<int> v)
{
   clog.rdbuf(nullptr);
   auto before = high_resolution_clock::now();
   int nthrows = 0;
   for (auto & val : v)
   {
      try
      {
         val = process_or_throw(val);
      }
      catch (...)
      {
         ++nthrows;
      }
   }
   auto after = high_resolution_clock::now();
   clog << accumulate(begin(v), end(v), 0);
   return make_pair(after - before, nthrows);
}
auto check_every_iteration(vector<int> v)
{
   clog.rdbuf(nullptr);
   auto before = high_resolution_clock::now();
   int nerrs = 0;
   for (auto & val : v)
   {
      int result;
      if (process_or_signal(val, result))
      {
         val = result;
      }
      else
      {
         ++nerrs;
      }
   }
   auto after = high_resolution_clock::now();
   clog << accumulate(begin(v), end(v), 0);
   return make_pair(after - before, nerrs);
}

The test named check_general_early_exit(v) is meant to provide a close-to-worst-case scenario for exceptions. It behaves as follows:

  • Values in v are sorted in order to put a problem case right at the beginning
  • A global try...catch is used in the exception case, to amortize the cost of installing the exception-handling mechanics
  • An immediate exit mechanism is used in the error handling case to break out of the loop as soon as a problem is detected
  • Each value is in v is passed to the processing function of its test
  • At the end, to avoid unwanted optimizations, a sum of the processed values is performed and this sum is « output » to a null output stream

Both functions return a pair containing the time elapsed and the element at which the first problem has been detected.

auto check_general_early_exit(vector<int> v)
{
   clog.rdbuf(nullptr);
   sort(begin(v), end(v));
   auto before = high_resolution_clock::now();
   vector<int>::size_type i = 0;
   try
   {
      for (; i != v.size(); ++i)
         v[i] = process_or_throw(v[i]);
   }
   catch (...)
   {
   }
   auto after = high_resolution_clock::now();
   clog << accumulate(begin(v), end(v), 0);
   return make_pair(after - before, i);
}
auto check_general_early_exit(vector<int> v)
{
   clog.rdbuf(nullptr);
   sort(begin(v), end(v));
   auto before = high_resolution_clock::now();
   vector<int>::size_type i = 0;
   for (; i != v.size(); ++i)
   {
      int result;
      if (!process_or_signal(v[i], result)) break;
      v[i] = result;
   }
   auto after = high_resolution_clock::now();
   clog << accumulate(begin(v), end(v), 0);
   return make_pair(after - before, i);
}

The test named check_general_late_exit(v) is meant to provide a case scenario suited for exceptions. It behaves as follows:

  • Values in v are sorted in order to put problem cases at the end
  • A global try...catch is used in the exception case, to amortize the cost of installing the exception-handling mechanics
  • An immediate exit mechanism is used in the error handling case to break out of the loop as soon as a problem is detected
  • Each value is in v is passed to the processing function of its test
  • At the end, to avoid unwanted optimizations, a sum of the processed values is performed and this sum is « output » to a null output stream

Both functions return a pair containing the time elapsed and the element at which the first problem has been detected.

auto check_general_late_exit(vector<int> v)
{
   clog.rdbuf(nullptr);
   sort(begin(v), end(v), greater<>{});
   auto before = high_resolution_clock::now();
   vector<int>::size_type i = 0;
   try
   {
      for (; i != v.size(); ++i)
         v[i] = process_or_throw(v[i]);
   }
   catch (...)
   {
   }
   auto after = high_resolution_clock::now();
   clog << accumulate(begin(v), end(v), 0);
   return make_pair(after - before, i);
}
auto check_general_late_exit(vector<int> v)
{
   clog.rdbuf(nullptr);
   sort(begin(v), end(v), greater<>{});
   auto before = high_resolution_clock::now();
   vector<int>::size_type i = 0;
   for (; i != v.size(); ++i)
   {
      int result;
      if (!process_or_signal(v[i], result)) break;
      v[i] = result;
   }
   auto after = high_resolution_clock::now();
   clog << accumulate(begin(v), end(v), 0);
   return make_pair(after - before, i);
}

The test named check_general_no_exit(v) is meant to provide a close-to-ideal (see below for a discussion of this) case scenario suited for exceptions, as code always uses what is presumed to be the « fast path ». It behaves as follows:

  • Values in v are transformed to ensure there is no error case
  • The code is considered to be unaware of this and uses its normal exception- or error-handling mechanisms
  • A global try...catch is used in the exception case, to amortize the cost of installing the exception-handling mechanics
  • An immediate exit mechanism is used in the error handling case to break out of the loop as soon as a problem is detected
  • Each value is in v is passed to the processing function of its test
  • At the end, to avoid unwanted optimizations, a sum of the processed values is performed and this sum is « output » to a null output stream

Both functions return a pair containing the time elapsed and the element at which the first problem has been detected.

auto check_general_no_exit(vector<int> v)
{
   clog.rdbuf(nullptr);
   transform(begin(v), end(v), begin(v), [](int n) { return abs(n); });
   auto before = high_resolution_clock::now();
   vector<int>::size_type i = 0;
   try
   {
      for (; i != v.size(); ++i)
         v[i] = process_or_throw(v[i]);
   }
   catch (...)
   {
   }
   auto after = high_resolution_clock::now();
   clog << accumulate(begin(v), end(v), 0);
   return make_pair(after - before, i);
}
auto check_general_no_exit(vector<int> v)
{
   clog.rdbuf(nullptr);
   transform(begin(v), end(v), begin(v), [](int n) { return abs(n); });
   auto before = high_resolution_clock::now();
   vector<int>::size_type i = 0;
   for (; i != v.size(); ++i)
   {
      int result;
      if (!process_or_signal(v[i], result)) break;
      v[i] = result;
   }
   auto after = high_resolution_clock::now();
   clog << accumulate(begin(v), end(v), 0);
   return make_pair(after - before, i);
}}

Both main() functions are identical, and run the same tests (except for the differences mentioned above) on the same data.

int main(){
   enum { N = 5000000 };
   auto v = generate_test_data(N);
   cout << "\nChecking for exceptions every iteration\n\n";
   auto every_it = check_every_iteration(v);
   cout << "Processed " << v.size() << " elements, of which "
        << every_it.second << " led to an exception being thrown\n"
        << "Elapsed time: " << duration_cast<microseconds>(every_it.first).count()
        << " microseconds" << endl;
   cout << "\nChecking for exceptions globally, early exit\n\n";
   auto early_exit = check_general_early_exit(v);
   cout << "Processed " << v.size() << " elements, of which element "
        << early_exit.second << " led to an exception being thrown\n"
        << "Elapsed time: " << duration_cast<microseconds>(early_exit.first).count()
        << " microseconds" << endl;
   cout << "\nChecking for exceptions globally, late exit\n\n";
   auto late_exit = check_general_late_exit(v);
   cout << "Processed " << v.size() << " elements, of which element "
        << late_exit.second << " led to an exception being thrown\n"
        << "Elapsed time: " << duration_cast<microseconds>(late_exit.first).count()
        << " microseconds" << endl;
   cout << "\nChecking for exceptions globally, no exit\n\n";
   auto no_check = check_general_no_exit(v);
   cout << "Processed " << v.size() << " elements\n"
        << "Elapsed time: " << duration_cast<microseconds>(no_check.first).count()
        << " microseconds" << endl;
}
int main()
{
   enum { N = 5000000 };
   auto v = generate_test_data(N);
   cout << "\nChecking for errors every iteration\n\n";
   auto every_it = check_every_iteration(v);
   cout << "Processed " << v.size() << " elements, of which "
        << every_it.second << " led to an error being signalled\n"
        << "Elapsed time: " << duration_cast<microseconds>(every_it.first).count()
        << " microseconds" << endl;
   cout << "\nChecking for errors globally, early exit\n\n";
   auto early_exit = check_general_early_exit(v);
   cout << "Processed " << v.size() << " elements, of which element "
        << early_exit.second << " led to an error being signalled\n"
        << "Elapsed time: " << duration_cast<microseconds>(early_exit.first).count()
        << " microseconds" << endl;
   cout << "\nChecking for errors globally, late exit\n\n";
   auto late_exit = check_general_late_exit(v);
   cout << "Processed " << v.size() << " elements, of which element "
        << late_exit.second << " led to an error being signalled\n"
        << "Elapsed time: " << duration_cast<microseconds>(late_exit.first).count()
        << " microseconds" << endl;
   cout << "\nChecking for errors globally, no exit\n\n";
   auto no_check = check_general_no_exit(v);
   cout << "Processed " << v.size() << " elements\n"
        << "Elapsed time: " << duration_cast<microseconds>(no_check.first).count()
        << " microseconds" << endl;
}

I ran the tests on two platforms.

The output visible on the right comes from code generated with MSVC 2015, Community Edition, Release, 64 bits. I used my personal laptop for these tests (Windows 7, 2.6 GHz, 8GB of RAM).

We can see that in this case, even in the scenarios most favorable to exceptions, error handling was faster.

Checking for exceptions every iteration

Processed 5000000 elements, of which 100 led to an exception being thrown
Elapsed time: 7749 microseconds

Checking for exceptions globally, early exit

Processed 5000000 elements, of which element 0 led to an exception being thrown
Elapsed time: 40 microseconds

Checking for exceptions globally, late exit

Processed 5000000 elements, of which element 499900 led to an exception being thrown
Elapsed time: 6034 microseconds

Checking for exceptions globally, no exit

Processed 5000000 elements
Elapsed time: 5970 microseconds
Checking for errors every iteration

Processed 5000000 elements, of which 100 led to an error being signalled
Elapsed time: 4553 microseconds

Checking for errors globally, early exit

Processed 5000000 elements, of which element 0 led to an error being signalled
Elapsed time: 0 microseconds

Checking for errors globally, late exit

Processed 5000000 elements, of which element 499900 led to an error being signalled
Elapsed time: 5277 microseconds

Checking for errors globally, no exit

Processed 5000000 elements
Elapsed time: 5239 microseconds

The output on the right in this case comes from the Coliru online compiler. The code was compiled with the following options :

g++ -std=c++14 -O2 -Wall -pedantic -pthread

This test shows that in the case most favorable to exceptions (the one where they are effectively exceptional), the exception-handling code is faster than the error-handling one.

Checking for exceptions every iteration

Processed 5000000 elements, of which 100 led to an exception being thrown
Elapsed time: 11958 microseconds

Checking for exceptions globally, early exit

Processed 5000000 elements, of which element 0 led to an exception being thrown
Elapsed time: 42 microseconds

Checking for exceptions globally, late exit

Processed 5000000 elements, of which element 4999900 led to an exception being thrown
Elapsed time: 13429 microseconds

Checking for exceptions globally, no exit

Processed 5000000 elements
Elapsed time: 9792 microseconds
Checking for errors every iteration

Processed 5000000 elements, of which 100 led to an error being signalled
Elapsed time: 10210 microseconds

Checking for errors globally, early exit

Processed 5000000 elements, of which element 0 led to an error being signalled
Elapsed time: 1 microseconds

Checking for errors globally, late exit

Processed 5000000 elements, of which element 4999900 led to an error being signalled
Elapsed time: 9979 microseconds

Checking for errors globally, no exit

Processed 5000000 elements
Elapsed time: 11623 microseconds

Some remarks:

The Expected Ideal Case for Exceptions

What would be the ideal case for exceptions? Simply : no try or catch blocks in the client code, unless there really is a need to (a) perform some cleanup code, e.g. in standard generic functions that are not expected to leave the program in an undefined state, or (b) in containers that have invariants to maintain. These algorithms and containers should then be exception-neutral and in general, once their cleanup stages have completed, let the exception escape where it should. We write code as if nothing bad happened. Should something occur indeed, if we handle it, it's because we care, otherwise, we crash due to an unhandled exception or to a call to std::terminate().

Stack Unwinding

When a deeply nested function faces a problem, it is sometimes necessary to unwind the stack up to the point, either one or many layers above, where the problem can be handled.

The following test uses a recursive function in which non-trivial local variables are instantiated, and provokes a problem situation at various depths down the call stack. In the case where an exception is raised, stack unwinding occurs at each level until someone catches the exception or the program crashes; in the case where an error is returned, said error needs to be treated at each level, and local variables still need to be destroyed as the return statements bring control back up the stack.

Stack Unwinding – With SSO

The test code I used follows.

For these tests, my work hypothesis were:

  Using Exceptions Using Error Codes

The code relies strictly on standard classes and algorithms.

#include <algorithm>
#include <iostream>
#include <iomanip>
#include <string>
#include <chrono>
#include <numeric>
#include <vector>
#include <iterator>
using namespace std;
using namespace std::chrono;
#include <algorithm>
#include <iostream>
#include <iomanip>
#include <string>
#include <chrono>
#include <numeric>
#include <vector>
#include <iterator>
using namespace std;
using namespace std::chrono;

Function test(gen,n,m,limit) creates a vector of n elements all initialized with gen(). Then, it computes the sum of the sizes of these string instances and performs

Variable rest is used here to ensure that the evaluation order of functions test() and accumulate() does not alter our metrics.

class limit_reached {};

template <class G>
   string::size_type test(G gen, int n, int m, int limit)
   {
      if (m == limit) throw limit_reached{};
      vector<string> v;
      v.reserve(n);
      generate_n(back_inserter(v), n, gen);
      auto rest = test(gen, n, m + 1, limit);
      return rest + accumulate(begin(v), end(v), string::size_type{}, [](string::size_type sz, const string &s) {
         return sz + s.size();
      });
   }
template <class G>
   string::size_type test(G gen, int n, int m, int limit)
   {
      if (m == limit) return string::npos;
      vector<string> v;
      v.reserve(n);
      generate_n(back_inserter(v), n, gen);
      auto rest = test(gen, n, m + 1, limit);
      if (rest == string::npos) return rest;
      return rest + accumulate(begin(v), end(v), string::size_type{}, [](string::size_type sz, const string &s) {
         return sz + s.size();
      });
   }

Both programs essentially do the same thing, but with exception handling on the left and with error handling on the right.

int main()
{
   string::size_type total{};
   enum { NTESTS = 10000, LIMIT = 64 };
   for (int limit = 1; limit < LIMIT; ++limit)
   {
      auto before = high_resolution_clock::now();
      for (int i = 0; i < NTESTS; ++i)
      {
         try
         {
            auto var = test(
               [] { return string("Yo"); }, 16, 0, limit
            );
            total += var;
         }
         catch (...)
         {
         }
      }
      auto after = high_resolution_clock::now();
      cout << "With exceptions, " << NTESTS << " tests, depth " << setw(2) << limit << ": "
           << setw(8) << duration_cast<microseconds>(after - before).count() << " us" << endl;
   }
   clog.rdbuf(nullptr);
   clog << total;
}
int main()
{
   string::size_type total{};
   enum { NTESTS = 10000, LIMIT = 64 };
   for (int limit = 1; limit < LIMIT; ++limit)
   {
      auto before = high_resolution_clock::now();
      for (int i = 0; i < NTESTS; ++i)
      {
         auto var = test(
            [] { return string("Yo"); }, 16, 0, limit
         );
         if (var != string::npos)
            total += var;
      }
      auto after = high_resolution_clock::now();
      cout << "With errors, " << NTESTS << " tests, depth " << setw(2) << limit << ": "
           << setw(8) << duration_cast<microseconds>(after - before).count() << " us" << endl;
   }
   clog.rdbuf(nullptr);
   clog << total;
}

I ran the tests on two platforms.

The output visible on the right comes from code generated with MSVC 2015, Community Edition, Release, 64 bits. I used my personal laptop for these tests (Windows 7, 2.6 GHz, 8GB of RAM).

We can see that in this case, the code with exceptions is slower than the code with errors by a factor of more than two. This is more than most would have expected.

With exceptions, 10000 tests, depth  1:    41327 us
With exceptions, 10000 tests, depth  2:    52459 us
With exceptions, 10000 tests, depth  3:    63306 us
With exceptions, 10000 tests, depth  4:    75269 us
With exceptions, 10000 tests, depth  5:    87089 us
With exceptions, 10000 tests, depth  6:    98491 us
With exceptions, 10000 tests, depth  7:   108981 us
With exceptions, 10000 tests, depth  8:   120747 us
With exceptions, 10000 tests, depth  9:   132695 us
With exceptions, 10000 tests, depth 10:   145205 us
With exceptions, 10000 tests, depth 11:   156599 us
With exceptions, 10000 tests, depth 12:   168957 us
With exceptions, 10000 tests, depth 13:   180864 us
With exceptions, 10000 tests, depth 14:   191661 us
With exceptions, 10000 tests, depth 15:   204039 us
With exceptions, 10000 tests, depth 16:   216198 us
With exceptions, 10000 tests, depth 17:   226950 us
With exceptions, 10000 tests, depth 18:   239023 us
With exceptions, 10000 tests, depth 19:   251166 us
With exceptions, 10000 tests, depth 20:   262936 us
With exceptions, 10000 tests, depth 21:   274267 us
With exceptions, 10000 tests, depth 22:   284392 us
With exceptions, 10000 tests, depth 23:   296929 us
With exceptions, 10000 tests, depth 24:   307826 us
With exceptions, 10000 tests, depth 25:   321245 us
With exceptions, 10000 tests, depth 26:   332741 us
With exceptions, 10000 tests, depth 27:   343095 us
With exceptions, 10000 tests, depth 28:   357019 us
With exceptions, 10000 tests, depth 29:   367148 us
With exceptions, 10000 tests, depth 30:   378649 us
With exceptions, 10000 tests, depth 31:   395339 us
With errors, 10000 tests, depth  1:     6206 us
With errors, 10000 tests, depth  2:    11396 us
With errors, 10000 tests, depth  3:    16764 us
With errors, 10000 tests, depth  4:    22296 us
With errors, 10000 tests, depth  5:    28065 us
With errors, 10000 tests, depth  6:    33469 us
With errors, 10000 tests, depth  7:    39449 us
With errors, 10000 tests, depth  8:    44370 us
With errors, 10000 tests, depth  9:    50794 us
With errors, 10000 tests, depth 10:    55546 us
With errors, 10000 tests, depth 11:    61550 us
With errors, 10000 tests, depth 12:    67278 us
With errors, 10000 tests, depth 13:    72448 us
With errors, 10000 tests, depth 14:    78609 us
With errors, 10000 tests, depth 15:    84046 us
With errors, 10000 tests, depth 16:    89776 us
With errors, 10000 tests, depth 17:    95685 us
With errors, 10000 tests, depth 18:   100960 us
With errors, 10000 tests, depth 19:   106435 us
With errors, 10000 tests, depth 20:   111976 us
With errors, 10000 tests, depth 21:   118207 us
With errors, 10000 tests, depth 22:   124841 us
With errors, 10000 tests, depth 23:   128929 us
With errors, 10000 tests, depth 24:   135166 us
With errors, 10000 tests, depth 25:   140394 us
With errors, 10000 tests, depth 26:   146619 us
With errors, 10000 tests, depth 27:   151445 us
With errors, 10000 tests, depth 28:   158840 us
With errors, 10000 tests, depth 29:   163460 us
With errors, 10000 tests, depth 30:   169006 us
With errors, 10000 tests, depth 31:   180258 us

The output on the right in this case comes from the Coliru online compiler. The code was compiled with the following options :

g++ -std=c++14 -O2 -Wall -pedantic -pthread

Note that since I used an online compiler, execution runs on a limited budget, which was exceeded in the exceptions case. In this case, the exceptions case is significantly slower...

With exceptions, 10000 tests, depth  1:    95439 us
With exceptions, 10000 tests, depth  2:   133593 us
With exceptions, 10000 tests, depth  3:   171083 us
With exceptions, 10000 tests, depth  4:   208772 us
With exceptions, 10000 tests, depth  5:   249903 us
With exceptions, 10000 tests, depth  6:   287059 us
With exceptions, 10000 tests, depth  7:   327783 us
With exceptions, 10000 tests, depth  8:   365278 us
With exceptions, 10000 tests, depth  9:   402601 us
With exceptions, 10000 tests, depth 10:   442150 us
With exceptions, 10000 tests, depth 11:   477466 us
With exceptions, 10000 tests, depth 12:   514387 us
With exceptions, 10000 tests, depth 13:   554215 us
With exceptions, 10000 tests, depth 14:   592790 us
With exceptions, 10000 tests, depth 15:   633356 us
With exceptions, 10000 tests, depth 16:   668736 us
With exceptions, 10000 tests, depth 17:   704979 us
With exceptions, 10000 tests, depth 18:   744218 us
With exceptions, 10000 tests, depth 19:   781855 us
With exceptions, 10000 tests, depth 20:   828158 us
With exceptions, 10000 tests, depth 21:   855371 us
With exceptions, 10000 tests, depth 22:   895674 us
With exceptions, 10000 tests, depth 23:   947332 us
With exceptions, 10000 tests, depth 24:   976160 us
With exceptions, 10000 tests, depth 25:  1015844 us
With exceptions, 10000 tests, depth 26:  1058750 us
With exceptions, 10000 tests, depth 27:  1086016 us
With exceptions, 10000 tests, depth 28:  1125130 us
With errors, 10000 tests, depth  1:     2597 us
With errors, 10000 tests, depth  2:     5256 us
With errors, 10000 tests, depth  3:     7866 us
With errors, 10000 tests, depth  4:    10517 us
With errors, 10000 tests, depth  5:    13051 us
With errors, 10000 tests, depth  6:    15589 us
With errors, 10000 tests, depth  7:    18104 us
With errors, 10000 tests, depth  8:    20642 us
With errors, 10000 tests, depth  9:    23138 us
With errors, 10000 tests, depth 10:    25701 us
With errors, 10000 tests, depth 11:    28141 us
With errors, 10000 tests, depth 12:    30821 us
With errors, 10000 tests, depth 13:    33157 us
With errors, 10000 tests, depth 14:    35895 us
With errors, 10000 tests, depth 15:    38441 us
With errors, 10000 tests, depth 16:    40979 us
With errors, 10000 tests, depth 17:    43506 us
With errors, 10000 tests, depth 18:    46140 us
With errors, 10000 tests, depth 19:    48465 us
With errors, 10000 tests, depth 20:    50831 us
With errors, 10000 tests, depth 21:    53425 us
With errors, 10000 tests, depth 22:    56314 us
With errors, 10000 tests, depth 23:    58935 us
With errors, 10000 tests, depth 24:    61356 us
With errors, 10000 tests, depth 25:    63722 us
With errors, 10000 tests, depth 26:    66624 us
With errors, 10000 tests, depth 27:    69257 us
With errors, 10000 tests, depth 28:    71843 us
With errors, 10000 tests, depth 29:    74940 us
With errors, 10000 tests, depth 30:    76695 us
With errors, 10000 tests, depth 31:    79393 us

Stack Unwinding – Without SSO

I tried the exact same code but with a longer string, "I love my teacher because he's very cool", in order to avoid the Small String Optimization (SSO). Thus, the only change I made was:

Before After
auto var = test(
   [] { return string("Yo"); }, 16, 0, limit
);
auto var = test(
   [] { return string("I love my teacher because he's very cool"); }, 16, 0, limit
);

Let's see what the impact of this change is.

For these tests, my work hypothesis were:

I ran the tests on two platforms.

The output visible on the right comes from code generated with MSVC 2015, Community Edition, Release, 64 bits. I used my personal laptop for these tests (Windows 7, 2.6 GHz, 8GB of RAM).

In this case, the code with exceptions is faster than the code with errors. The main difference between this test and the previous one is that this test relies on dynamic memory allocation within each string whereas the previous one did not.

With exceptions, 10000 tests, depth  1:    41327 us
With exceptions, 10000 tests, depth  2:    52459 us
With exceptions, 10000 tests, depth  3:    63306 us
With exceptions, 10000 tests, depth  4:    75269 us
With exceptions, 10000 tests, depth  5:    87089 us
With exceptions, 10000 tests, depth  6:    98491 us
With exceptions, 10000 tests, depth  7:   108981 us
With exceptions, 10000 tests, depth  8:   120747 us
With exceptions, 10000 tests, depth  9:   132695 us
With exceptions, 10000 tests, depth 10:   145205 us
With exceptions, 10000 tests, depth 11:   156599 us
With exceptions, 10000 tests, depth 12:   168957 us
With exceptions, 10000 tests, depth 13:   180864 us
With exceptions, 10000 tests, depth 14:   191661 us
With exceptions, 10000 tests, depth 15:   204039 us
With exceptions, 10000 tests, depth 16:   216198 us
With exceptions, 10000 tests, depth 17:   226950 us
With exceptions, 10000 tests, depth 18:   239023 us
With exceptions, 10000 tests, depth 19:   251166 us
With exceptions, 10000 tests, depth 20:   262936 us
With exceptions, 10000 tests, depth 21:   274267 us
With exceptions, 10000 tests, depth 22:   284392 us
With exceptions, 10000 tests, depth 23:   296929 us
With exceptions, 10000 tests, depth 24:   307826 us
With exceptions, 10000 tests, depth 25:   321245 us
With exceptions, 10000 tests, depth 26:   332741 us
With exceptions, 10000 tests, depth 27:   343095 us
With exceptions, 10000 tests, depth 28:   357019 us
With exceptions, 10000 tests, depth 29:   367148 us
With exceptions, 10000 tests, depth 30:   378649 us
With exceptions, 10000 tests, depth 31:   395339 us
With errors, 10000 tests, depth  1:    20827 us
With errors, 10000 tests, depth  2:    34465 us
With errors, 10000 tests, depth  3:    47497 us
With errors, 10000 tests, depth  4:    61512 us
With errors, 10000 tests, depth  5:    79681 us
With errors, 10000 tests, depth  6:    93367 us
With errors, 10000 tests, depth  7:   106139 us
With errors, 10000 tests, depth  8:   118248 us
With errors, 10000 tests, depth  9:   138902 us
With errors, 10000 tests, depth 10:   152578 us
With errors, 10000 tests, depth 11:   165008 us
With errors, 10000 tests, depth 12:   177862 us
With errors, 10000 tests, depth 13:   201981 us
With errors, 10000 tests, depth 14:   214070 us
With errors, 10000 tests, depth 15:   227796 us
With errors, 10000 tests, depth 16:   240052 us
With errors, 10000 tests, depth 17:   260457 us
With errors, 10000 tests, depth 18:   272261 us
With errors, 10000 tests, depth 19:   286947 us
With errors, 10000 tests, depth 20:   297481 us
With errors, 10000 tests, depth 21:   322884 us
With errors, 10000 tests, depth 22:   336485 us
With errors, 10000 tests, depth 23:   347553 us
With errors, 10000 tests, depth 24:   361057 us
With errors, 10000 tests, depth 25:   380685 us
With errors, 10000 tests, depth 26:   392628 us
With errors, 10000 tests, depth 27:   404773 us
With errors, 10000 tests, depth 28:   418660 us
With errors, 10000 tests, depth 29:   444442 us
With errors, 10000 tests, depth 30:   457051 us
With errors, 10000 tests, depth 31:   472818 us

The output on the right in this case comes from the Coliru online compiler. The code was compiled with the following options :

g++ -std=c++14 -O2 -Wall -pedantic -pthread

Note that since I used an online compiler, execution runs on a limited budget, which was exceeded in the exceptions case. In this case again, the exceptions case is significantly slower. Could this be due to the quality on the string implementation? Hard to judge from this viewpoint.

With exceptions, 10000 tests, depth  1:   107870 us
With exceptions, 10000 tests, depth  2:   161449 us
With exceptions, 10000 tests, depth  3:   209145 us
With exceptions, 10000 tests, depth  4:   262849 us
With exceptions, 10000 tests, depth  5:   310293 us
With exceptions, 10000 tests, depth  6:   365627 us
With exceptions, 10000 tests, depth  7:   413848 us
With exceptions, 10000 tests, depth  8:   456456 us
With exceptions, 10000 tests, depth  9:   510788 us
With exceptions, 10000 tests, depth 10:   564795 us
With exceptions, 10000 tests, depth 11:   611621 us
With exceptions, 10000 tests, depth 12:   665246 us
With exceptions, 10000 tests, depth 13:   724558 us
With exceptions, 10000 tests, depth 14:   772821 us
With exceptions, 10000 tests, depth 15:   821923 us
With exceptions, 10000 tests, depth 16:   867255 us
With exceptions, 10000 tests, depth 17:   921280 us
With exceptions, 10000 tests, depth 18:   964334 us
With exceptions, 10000 tests, depth 19:  1016856 us
With exceptions, 10000 tests, depth 20:  1068728 us
With exceptions, 10000 tests, depth 21:  1113028 us
With exceptions, 10000 tests, depth 22:  1168277 us
With exceptions, 10000 tests, depth 23:  1224574 us
With exceptions, 10000 tests, depth 24:  1268895 us
With errors, 10000 tests, depth  1:     9855 us
With errors, 10000 tests, depth  2:    19209 us
With errors, 10000 tests, depth  3:    29060 us
With errors, 10000 tests, depth  4:    39017 us
With errors, 10000 tests, depth  5:    47589 us
With errors, 10000 tests, depth  6:    57281 us
With errors, 10000 tests, depth  7:    66952 us
With errors, 10000 tests, depth  8:    76470 us
With errors, 10000 tests, depth  9:    85957 us
With errors, 10000 tests, depth 10:    95373 us
With errors, 10000 tests, depth 11:   104931 us
With errors, 10000 tests, depth 12:   114036 us
With errors, 10000 tests, depth 13:   124811 us
With errors, 10000 tests, depth 14:   132930 us
With errors, 10000 tests, depth 15:   143297 us
With errors, 10000 tests, depth 16:   152643 us
With errors, 10000 tests, depth 17:   161585 us
With errors, 10000 tests, depth 18:   171137 us
With errors, 10000 tests, depth 19:   181506 us
With errors, 10000 tests, depth 20:   192037 us
With errors, 10000 tests, depth 21:   200857 us
With errors, 10000 tests, depth 22:   210361 us
With errors, 10000 tests, depth 23:   220087 us
With errors, 10000 tests, depth 24:   228843 us
With errors, 10000 tests, depth 25:   238381 us
With errors, 10000 tests, depth 26:   248758 us
With errors, 10000 tests, depth 27:   258322 us
With errors, 10000 tests, depth 28:   269589 us
With errors, 10000 tests, depth 29:   278104 us
With errors, 10000 tests, depth 30:   287786 us
With errors, 10000 tests, depth 31:   294860 us

Stack Unwinding – Replacing std::string with std::vector<char>

There is a risk that the results of the previous test could be influenced by the implementation of std::string. To verify this, I used a slightly different version with std::vector<char> instead.

For these tests, my work hypothesis were:

  Using Exceptions Using Error Codes

The code relies strictly on standard classes and algorithms.

#include <algorithm>
#include <iostream>
#include <iomanip>
#include <string>
#include <chrono>
#include <numeric>
#include <vector>
#include <iterator>
using namespace std;
using namespace std::chrono;
#include <algorithm>
#include <iostream>
#include <iomanip>
#include <string>
#include <chrono>
#include <numeric>
#include <vector>
#include <iterator>
using namespace std;
using namespace std::chrono;

Function test(gen,n,m,limit) creates a vector of n elements all initialized with gen(). Then, it computes the sum of the sizes of these string instances and performs

Variable rest is used here to ensure that the evaluation order of functions test() and accumulate() does not alter our metrics.

class limit_reached {};

template <class G>
string::size_type test(G gen, int n, int m, int limit)
{
   if (m == limit) throw limit_reached{};
   vector<vector<char>> v;
   v.reserve(n);
   generate_n(back_inserter(v), n, gen);
   auto var = test(gen, n, m + 1, limit);
   return accumulate(begin(v), end(v), vector<char>::size_type{}, [](vector<char>::size_type sz, const vector<char> &s) {
      return sz + s.size();
   }) + var;
}
template <class G>
string::size_type test(G gen, int n, int m, int limit)
{
   if (m == limit) return string::npos;
   vector<vector<char>> v;
   v.reserve(n);
   generate_n(back_inserter(v), n, gen);
   auto var = test(gen, n, m + 1, limit);
   return accumulate(begin(v), end(v), vector<char>::size_type{}, [](vector<char>::size_type sz, const vector<char> &s) {
      return sz + s.size();
   }) + var;
}

Both programs essentially do the same thing, but with exception handling on the left and with error handling on the right.

int main()
{
   const char src[] = "I love my teacher because he's very cool";
   string::size_type total{};
   enum { NTESTS = 10000, LIMIT = 32 };
   for (int limit = 1; limit < LIMIT; ++limit)
   {
      auto before = high_resolution_clock::now();
      for (int i = 0; i < NTESTS; ++i)
      {
         try
         {
            auto var = test(
               [&src] { return vector<char>(begin(src), end(src)); }, 16, 0, limit
            );
            total += var;
         }
         catch (...)
         {
         }
      }
      auto after = high_resolution_clock::now();
      cout << "With exceptions, " << NTESTS << " tests, depth " << setw(2) << limit << ": "
         << setw(8) << duration_cast<microseconds>(after - before).count() << " us" << endl;
   }
   clog.rdbuf(nullptr);
   clog << total;
}
int main()
{
   const char src[] = "I love my teacher because he's very cool";
   string::size_type total{};
   enum { NTESTS = 10000, LIMIT = 32 };
   for (int limit = 1; limit < LIMIT; ++limit)
   {
      auto before = high_resolution_clock::now();
      for (int i = 0; i < NTESTS; ++i)
      {
         auto var = test(
            [&src] { return vector<char>(begin(src), end(src)); }, 16, 0, limit
         );
         if (var != string::npos)
            total += var;
      }
      auto after = high_resolution_clock::now();
      cout << "With errors, " << NTESTS << " tests, depth " << setw(2) << limit << ": "
         << setw(8) << duration_cast<microseconds>(after - before).count() << " us" << endl;
   }
   clog.rdbuf(nullptr);
   clog << total;
}

I ran the tests on two platforms.

The output visible on the right comes from code generated with MSVC 2015, Community Edition, Release, 64 bits. I used my personal laptop for these tests (Windows 7, 2.6 GHz, 8GB of RAM).

We can see that in this case, the code with exceptions is slower than the code with errors by a factor of approximately 1.5. This is more than most would have expected.

With exceptions, 10000 tests, depth  1:    54386 us
With exceptions, 10000 tests, depth  2:    69768 us
With exceptions, 10000 tests, depth  3:    84226 us
With exceptions, 10000 tests, depth  4:    99579 us
With exceptions, 10000 tests, depth  5:   123263 us
With exceptions, 10000 tests, depth  6:   138252 us
With exceptions, 10000 tests, depth  7:   153631 us
With exceptions, 10000 tests, depth  8:   169649 us
With exceptions, 10000 tests, depth  9:   193553 us
With exceptions, 10000 tests, depth 10:   207907 us
With exceptions, 10000 tests, depth 11:   227961 us
With exceptions, 10000 tests, depth 12:   243927 us
With exceptions, 10000 tests, depth 13:   273530 us
With exceptions, 10000 tests, depth 14:   289658 us
With exceptions, 10000 tests, depth 15:   305411 us
With exceptions, 10000 tests, depth 16:   320519 us
With exceptions, 10000 tests, depth 17:   345276 us
With exceptions, 10000 tests, depth 18:   360545 us
With exceptions, 10000 tests, depth 19:   376973 us
With exceptions, 10000 tests, depth 20:   391885 us
With exceptions, 10000 tests, depth 21:   418899 us
With exceptions, 10000 tests, depth 22:   435420 us
With exceptions, 10000 tests, depth 23:   450171 us
With exceptions, 10000 tests, depth 24:   466055 us
With exceptions, 10000 tests, depth 25:   487559 us
With exceptions, 10000 tests, depth 26:   506852 us
With exceptions, 10000 tests, depth 27:   524694 us
With exceptions, 10000 tests, depth 28:   555336 us
With exceptions, 10000 tests, depth 29:   569889 us
With exceptions, 10000 tests, depth 30:   580770 us
With exceptions, 10000 tests, depth 31:   598920 us
With errors, 10000 tests, depth  1:    17811 us
With errors, 10000 tests, depth  2:    27918 us
With errors, 10000 tests, depth  3:    38182 us
With errors, 10000 tests, depth  4:    48061 us
With errors, 10000 tests, depth  5:    66364 us
With errors, 10000 tests, depth  6:    77027 us
With errors, 10000 tests, depth  7:    86517 us
With errors, 10000 tests, depth  8:    96786 us
With errors, 10000 tests, depth  9:   114991 us
With errors, 10000 tests, depth 10:   125874 us
With errors, 10000 tests, depth 11:   138727 us
With errors, 10000 tests, depth 12:   159482 us
With errors, 10000 tests, depth 13:   170084 us
With errors, 10000 tests, depth 14:   180867 us
With errors, 10000 tests, depth 15:   190786 us
With errors, 10000 tests, depth 16:   208081 us
With errors, 10000 tests, depth 17:   218213 us
With errors, 10000 tests, depth 18:   228715 us
With errors, 10000 tests, depth 19:   239488 us
With errors, 10000 tests, depth 20:   260270 us
With errors, 10000 tests, depth 21:   271453 us
With errors, 10000 tests, depth 22:   280826 us
With errors, 10000 tests, depth 23:   292126 us
With errors, 10000 tests, depth 24:   308666 us
With errors, 10000 tests, depth 25:   318486 us
With errors, 10000 tests, depth 26:   329852 us
With errors, 10000 tests, depth 27:   339771 us
With errors, 10000 tests, depth 28:   363561 us
With errors, 10000 tests, depth 29:   375061 us
With errors, 10000 tests, depth 30:   383913 us
With errors, 10000 tests, depth 31:   398376 us

The output on the right in this case comes from the Coliru online compiler. The code was compiled with the following options :

g++ -std=c++14 -O2 -Wall -pedantic -pthread

In this case, the exceptions case is significantly slower...

With exceptions, 10000 tests, depth  1:   109584 us
With exceptions, 10000 tests, depth  2:   160912 us
With exceptions, 10000 tests, depth  3:   210568 us
With exceptions, 10000 tests, depth  4:   261759 us
With exceptions, 10000 tests, depth  5:   313235 us
With exceptions, 10000 tests, depth  6:   362863 us
With exceptions, 10000 tests, depth  7:   415040 us
With exceptions, 10000 tests, depth  8:   464001 us
With exceptions, 10000 tests, depth  9:   515755 us
With exceptions, 10000 tests, depth 10:   569331 us
With exceptions, 10000 tests, depth 11:   620290 us
With exceptions, 10000 tests, depth 12:   674356 us
With exceptions, 10000 tests, depth 13:   720531 us
With exceptions, 10000 tests, depth 14:   771169 us
With exceptions, 10000 tests, depth 15:   824409 us
With exceptions, 10000 tests, depth 16:   873789 us
With exceptions, 10000 tests, depth 17:   924100 us
With exceptions, 10000 tests, depth 18:   976170 us
With exceptions, 10000 tests, depth 19:  1026879 us
With exceptions, 10000 tests, depth 20:  1076055 us
With exceptions, 10000 tests, depth 21:  1126992 us
With exceptions, 10000 tests, depth 22:  1180019 us
With exceptions, 10000 tests, depth 23:  1226846 us
With exceptions, 10000 tests, depth 24:  1282984 us
With exceptions, 10000 tests, depth 25:  1328822 us
With exceptions, 10000 tests, depth 26:  1380397 us
With exceptions, 10000 tests, depth 27:  1431215 us
With exceptions, 10000 tests, depth 28:  1489763 us
With exceptions, 10000 tests, depth 29:  1538279 us
With exceptions, 10000 tests, depth 30:  1590279 us
With exceptions, 10000 tests, depth 31:  1637463 us
With errors, 10000 tests, depth  1:    10128 us
With errors, 10000 tests, depth  2:    19673 us
With errors, 10000 tests, depth  3:    29573 us
With errors, 10000 tests, depth  4:    39263 us
With errors, 10000 tests, depth  5:    48943 us
With errors, 10000 tests, depth  6:    58651 us
With errors, 10000 tests, depth  7:    68287 us
With errors, 10000 tests, depth  8:    77962 us
With errors, 10000 tests, depth  9:    87720 us
With errors, 10000 tests, depth 10:    97372 us
With errors, 10000 tests, depth 11:   107013 us
With errors, 10000 tests, depth 12:   116845 us
With errors, 10000 tests, depth 13:   126548 us
With errors, 10000 tests, depth 14:   136250 us
With errors, 10000 tests, depth 15:   145945 us
With errors, 10000 tests, depth 16:   155688 us
With errors, 10000 tests, depth 17:   165587 us
With errors, 10000 tests, depth 18:   175600 us
With errors, 10000 tests, depth 19:   185300 us
With errors, 10000 tests, depth 20:   194976 us
With errors, 10000 tests, depth 21:   204779 us
With errors, 10000 tests, depth 22:   216605 us
With errors, 10000 tests, depth 23:   226768 us
With errors, 10000 tests, depth 24:   234541 us
With errors, 10000 tests, depth 25:   244356 us
With errors, 10000 tests, depth 26:   255442 us
With errors, 10000 tests, depth 27:   265386 us
With errors, 10000 tests, depth 28:   276807 us
With errors, 10000 tests, depth 29:   286500 us
With errors, 10000 tests, depth 30:   295008 us
With errors, 10000 tests, depth 31:   305390 us

Impact on the Size of Binaries

I cannot compare the impact of exception-handling code on binary size from compiler to compiler right now (I used to have gcc and MSVC on my laptop but had a hard drive crash last week and have not gone around to reinstalling gcc yet; it's a matter of time). I can however compare its impact on the binary sizes with MSVC 2015. Thus, for the test sets we have on this page, we have :

Test Set Binary Size with Exception Handling Binary Size with Error Handling

Problem checked at various stages

bytes

bytes

Recursive function, problem detected at various depths, with SSO

bytes

bytes

Recursive function, problem detected at various depths, without SSO

bytes

bytes

Recursive function, problem detected at various depths, vector<char>

bytes

bytes

Using SSO or not has an impact on the programs runtime memory footprint, but not on its binary size. This was to be expected.

Summary

Let's take a more visual look at the data we have.

In the stack unwinding case with SSO-enabled string instances, we have (lower is better, vertical axis is in mocroseconds, horizontal axis is function call depth):

For this test, g++ with error handling is much better that g++ with exception handling. With MSVC, error handling is also faster than exception handling, but by a much smaller margin.

In the stack unwinding case with non-SSO-enabled string instances, we have (lower is better, vertical axis is in mocroseconds, horizontal axis is function call depth):

This case shows that the situation with g++ does not really change much, whereas code generated MSVC quite consistently performs better with exception handling than with error handling, except for very small depths (in which case the overhead on installing the initial try...catch block probably still shows).

In the stack unwinding case with vector<char> instances, we have (lower is better, vertical axis is in mocroseconds, horizontal axis is function call depth):

The pattern with vector<char> is very similar to the pattern with vector<string> given SSO-enabled string instances.

As far as executable binary size goes, we have (lower is better):

There is a difference in binary executable size. Legend has it that the difference is on the order of 10%, but that is not what my (small) tests seem to show. Indeed, here, the cost in binary size varies between 1.69% and 2.38%.

In summary:

You can use the code above and play with it. Should you find faults with the methodology or the test code, or should you find the time to run tests on other platforms, I would like to know what you get and if it diverges from what my personal experimentations seem to show.


Valid XHTML 1.0 Transitional

CSS Valide !