Welcome to part 3 of Expressive C++, a series of articles devoted to Embedded Domain-Specific Languages (EDSLs)1 and Boost.Proto, a library for implementing them in C++. The title of this article is intentionally provocative to give me the creative license I need to get this righteous rage out of my system, lay some much-deserved blame, and, after my catharsis, offer some constructive suggestions for improving the situation. You might be surprised at where I direct my ire, and also pleased to know that whether you’re a library author or a user, there are things you can do to help improve the situation.
Eventually, we’ll bring the discussion back to EDSLs and apply my recommendations to the simple string formatting routine we developed in the last installment. By the end of the article, you’ll know how to syntax-check an EDSL by defining its grammar, validate that an expression matches the grammar, and issue a short and meaningful diagnostic if it doesn’t.
Template Errors: A Sad State Of Affairs
Breaking news from the C++ meta-verse2:
TEMPLATE ERROR MESSAGES ARE TERRIBLE!
<yawn>. That’s no news to anyone who’s used C++ in the last 10 years or so. Even simple misuses of template libraries can lead to hundreds of KB of compiler spew. Who’s to blame? Take your pick: library authors, compiler vendors, or the C++ standardization committee? They’ve each felt their share of the heat. The key selling point of C++0x concepts (R.I.P.) was improved template errors. And one of the key selling points of clang, an exciting, new C/C++ compiler in active development, is better error messages. But in my experience as a library developer, this problem begins at home: with poorly designed and implemented template libraries.
Library techniques for improved compile-time error detection and reporting have existed for a while, but they’re not commonly known or widely used.3 If folks only knew how much better the world would be if these techniques were consistently applied, we wouldn’t settle for hundreds of KB of compiler spew. We’d be outraged.
If I’m not being clear enough, let me say it explicitly, and in a way that’s likely to raise a few eyebrows:
Bad template errors are library bugs and should be reported as such.
The implication for library users is simple: stop cursing the darkness and start cursing library authors. Well, don’t curse them because they might be me. File bugs instead. Yes, really. (And if you just can’t wait for the bugs to be fixed, switch to clang or install STLFilt.)
What are the implications for library writers? What could a library author possibly do to fix these so-called “bugs”? And hey, why are bad template errors endemic in the first place? Simply put: a total lack of parameter validation.
Remedial Software 101
When you first started writing code, someone probably told you how important it is to validate parameters at (runtime) API boundaries. Null pointers, out-of-bounds indices, incorrectly escaped URLs—if you fail to check for them, you’ll end up with runtime bugs that hackers can exploit. Any programmer worth his salt will tell you this.
But when those same programmers sit down to write a template, many tend to forget this very basic advice and blithely accept user-supplied types without doing any parameter checking at all. The result is a car wreck of epic proportions.
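To make the advice concrete outside of any EDSL, here is a minimal sketch of compile-time parameter validation in a plain function template. The names sum_all and sum_of_first_three are hypothetical, invented for this illustration, and the code assumes a C++11 compiler with static_assert and <type_traits>:

```cpp
#include <cassert>
#include <numeric>
#include <type_traits>
#include <vector>

// Hypothetical helper, not from the article: sums the elements of a container.
template <class Container>
typename Container::value_type sum_all(Container const& c)
{
    typedef typename Container::value_type value_type;

    // Validate the template parameter at the API boundary, before the
    // implementation below gets a chance to fail in some obscure way.
    static_assert(std::is_arithmetic<value_type>::value,
                  "sum_all requires a container of arithmetic values");

    return std::accumulate(c.begin(), c.end(), value_type());
}

// Example use with a valid argument: 1 + 2 + 3.
inline int sum_of_first_three()
{
    std::vector<int> v;
    v.push_back(1);
    v.push_back(2);
    v.push_back(3);
    return sum_all(v);
}
```

Passing something like a std::vector<std::string> to sum_all fails to compile with the message above, which names the function and the requirement that was violated.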
Let’s take the Boost.Spirit example from the Intro and modify it slightly:
#include <boost/spirit/home/qi.hpp>

int main()
{
    using namespace boost::spirit::qi;
    rule<char const *> expression, term, factor;

    expression = term >> *( ( '+' >> term ) | ( '-' >> term ) ) ;
    term = factor >> *( ( '*' >> ~factor ) | ( '/' >> factor ) ) ;
    factor = uint_ | '(' >> expression >> ')' | '-' >> factor ;
}
Can you spot the typo in the code? It’s the tilde before factor on line 9, in the definition of term. The resulting 160 KB of compiler spew is enough to make a sane programmer run screaming4:
In file included from /home/Eric/boost/org/trunk/boost/spir
it/home/qi/char.hpp:14:0,
from /home/Eric/boost/org/trunk/boost/spir
it/home/qi.hpp:17,
from main.cpp:1:
/home/Eric/boost/org/trunk/boost/spirit/home/qi/char/char_p
arser.hpp: In instantiation of ‘boost::spirit::qi::make_com
posite<boost::proto::tag::complement, boost::fusion::cons<b
oost::spirit::qi::reference<const boost::spirit::qi::rule<c
onst char*> >, boost::fusion::nil>, boost::fusion::unused_t
ype, void>’:
/home/Eric/boost/org/trunk/boost/spirit/home/qi/meta_compil
er.hpp:103:13: instantiated from ...
<snip enormous template instantiation backtrace>
/home/Eric/boost/org/trunk/boost/mpl/if.hpp:70:41: error: ‘
value’ is not a member of ‘boost::spirit::traits::has_no_un
used<boost::fusion::transform_view<boost::fusion::cons<boos
t::spirit::qi::sequence<boost::fusion::cons<boost::spirit::
qi::literal_char<boost::spirit::char_encoding::standard, tr
ue, false>, boost::fusion::cons<boost::spirit::qi::negated_
char_parser<boost::spirit::qi::reference<const boost::spiri
t::qi::rule<const char*> > >, boost::fusion::nil> > >, boos
t::fusion::cons<boost::spirit::qi::sequence<boost::fusion::
cons<boost::spirit::qi::literal_char<boost::spirit::char_en
coding::standard, true, false>, boost::fusion::cons<boost::
spirit::qi::reference<const boost::spirit::qi::rule<const c
har*> >, boost::fusion::nil> > >, boost::fusion::nil> >, bo
ost::spirit::traits::build_attribute_sequence<boost::fusion
::cons<boost::spirit::qi::sequence<boost::fusion::cons<boos
t::spirit::qi::literal_char<boost::spirit::char_encoding::s
tandard, true, false>, boost::fusion::cons<boost::spirit::q
i::negated_char_parser<boost::spirit::qi::reference<const b
oost::spirit::qi::rule<const char*> > >, boost::fusion::nil
> > >, boost::fusion::cons<boost::spirit::qi::sequence<boos
t::fusion::cons<boost::spirit::qi::literal_char<boost::spir
it::char_encoding::standard, true, false>, boost::fusion::c
ons<boost::spirit::qi::reference<const boost::spirit::qi::r
ule<const char*> >, boost::fusion::nil> > >, boost::fusion:
:nil> >, boost::spirit::context<boost::fusion::cons<boost::
fusion::unused_type&, boost::fusion::nil>, boost::fusion::v
ector0<> >, boost::mpl::identity, const char*>::element_att
ribute, boost::fusion::void_> >’
The mistake is that the definition of the term rule is invalid, but it’s not easy to infer that from this mountain of spew.
The problem of error detection and reporting is particularly acute in EDSLs. A domain-specific language will typically have domain-specific errors about which the C++ compiler is ignorant. Any errors the compiler is allowed to emit are likely to be too low-level to make any sense to the EDSL user (like that horrific Spirit error). The library needs to detect and report domain-specific errors. Fortunately, if you are using Boost.Proto, you have some powerful tools at your disposal.6 Let’s see in detail what my advice means for the EDSL I developed in the previous article.
Mad Libs7 Formatting, Revisited
The Mad Libs-like string formatting API from the last article lets users format strings and specify map-like relationships inline. A typical usage looks like this:
std::cout << format("The home directory of {user} is {home}\n"
, map("user", "eric")
("home", "/home/eric") );
This expression should print:
The home directory of eric is /home/eric
The EDSL part is the second argument to format. Since we’ll be referring back to it a lot, I’ll duplicate the complete example from the last article:
#include <map>
#include <string>
#include <iostream>
#include <boost/proto/proto.hpp>
#include <boost/xpressive/xpressive.hpp>
#include <boost/xpressive/regex_actions.hpp>

struct map_ {};
boost::proto::terminal<map_>::type map = {};

typedef std::map<std::string, std::string> string_map;

// Recursive function used to fill the map
template< class Expr >
void fill_map( Expr const & expr, string_map & subs )
{
    using boost::proto::value; // read a value from a terminal
    using boost::proto::child_c; // get the Nth child of a non-terminal
    subs[ value(child_c<1>( expr )) ] = value(child_c<2>(expr));
    fill_map( child_c<0>(expr), subs );
}

// The 'map' terminal ends the recursion
void fill_map( boost::proto::terminal<map_>::type const &, string_map & )
{}

// The old format API that accepts a map of string substitutions
std::string format( std::string fmt, string_map & subs )
{
    namespace xp = boost::xpressive;
    using namespace xp;
    sregex const rx = '{' >> (s1= +_w) >> '}'; // like "{(\\w+)}"
    return regex_replace(fmt, rx, xp::ref(subs)[s1]);
}

// The new format API that forwards to the old one
template< class Expr >
std::string format( std::string fmt, Expr const & expr )
{
    string_map subs;
    fill_map( expr, subs );
    return format( fmt, subs );
}

int main()
{
    std::cout << format("The home directory of {user} is {home}\n"
                       , map("user", "eric")
                            ("home", "/home/eric") );
}
The fill_map function expects to be given expression trees of a certain form. But notice how the second format overload takes the map expression and simply forwards it to fill_map on line 41 without any parameter validation at all. Let’s mess with the expression tree and see what happens:
std::cout << format("The home directory of {user} is {home}\n"
, map("user", L"eric")
("home", "/home/eric") );
Notice that I changed one string literal from narrow to wide. When I recompile the code with this most recent change, I get a 50+ line error message8.
The error occurs deep within our EDSL implementation. Had we validated the expr parameter before calling fill_map, we could have done much better. Let’s see how.
Proto Grammars
At first blush, validating the expr parameter looks difficult. After all, the user could pass one of an infinite number of map expressions of arbitrary depth. But when we put our language-design goggles on, this problem looks much simpler: we just need to find the grammar to which all map expressions must conform. Then we just check that the expression matches the grammar.
This expression:
map("user", "eric")
("home", "/home/eric")
…builds a Proto expression tree that looks like this:
Figure 1: A map expression tree
In plain English, we can describe the structure of map expression trees as follows: a map expression is either:
- A map terminal, or
- A ternary function call where:
  - The 0th child is a valid map expression tree (note recursion),
  - The 1st child is a string, and
  - The 2nd child is also a string
Using Proto’s support for defining grammars, we can define the MapGrammar as follows (to be explained below):
// Define the grammar of map expressions
struct MapGrammar
  : proto::or_<
        // A map expression is a map terminal, or
        proto::terminal<map_>
        // ... a ternary function non-terminal where the child nodes are
      , proto::function<
            // ... a map expression,
            MapGrammar
            // ... a narrow string terminal, and
          , proto::terminal<char const *>
            // ... another narrow string terminal.
          , proto::terminal<char const *>
        >
    >
{};
Let’s take this one piece at a time:
- Line 2: Proto grammars are simple user-defined structs.
- Line 3: Inheritance is used to say that MapGrammar is expressed in terms of proto::or_. The proto::or_ template is used for grammar alternation, like the | operator in EBNF. An expression is allowed to match this or that. In Proto, alternate grammars are tried in order.
- Line 5: A map expression can be a simple map terminal. Note that proto::terminal<map_> was used to define the global map object on line 9 of the complete example. It is also used here as a grammar that matches that terminal.
- Line 7: The proto::function template defines a grammar that matches Proto expression nodes created by overloaded function-call operators. Proto provides templates like function for all the operators that Proto overloads. A complete list can be found in Proto’s documentation.
- Line 9: The zeroth child of the function node must match MapGrammar. This is interesting! It looks like we’re recursively defining MapGrammar in terms of itself. Surprisingly, this is legal. In fact, you may already be familiar with this technique. It’s called the Curiously Recurring Template Pattern, or CRTP. It gives Proto a natural way to define recursive grammars.
- Lines 11-16: Nothing too surprising here. The other two children must be narrow string terminals. The MapGrammar struct itself is empty. That is always the case for Proto grammars.
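For readers who want to see the idea without Proto’s machinery, the following sketch imitates recursive grammar matching with ordinary template specialization. It is not how Proto implements its grammars. The names terminal, call, map_tag and is_map_expr are hypothetical stand-ins invented here, and this toy version requires leaf types to match exactly:

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-ins for expression tree nodes.
struct map_tag {};                                        // plays the role of the map terminal's value
template <class T> struct terminal {};                    // a leaf holding a T
template <class C0, class C1, class C2> struct call {};   // a ternary function-call node

// Primary template: anything not listed below is not a map expression.
template <class Expr>
struct is_map_expr : std::false_type {};

// Alternative 1: the map terminal itself.
template <>
struct is_map_expr< terminal<map_tag> > : std::true_type {};

// Alternative 2: call(map expression, narrow string, narrow string).
// The 0th child is checked recursively, mirroring the recursion in MapGrammar.
template <class C0>
struct is_map_expr< call<C0, terminal<char const*>, terminal<char const*> > >
  : is_map_expr<C0> {};

typedef terminal<char const*>    narrow;
typedef terminal<wchar_t const*> wide;

// Shapes corresponding to map(a, b)(c, d), with and without a wide string.
typedef call< call< terminal<map_tag>, narrow, narrow >, narrow, narrow > good_expr;
typedef call< call< terminal<map_tag>, narrow, wide   >, narrow, narrow > bad_expr;

static_assert( is_map_expr<good_expr>::value, "nested calls on map match");
static_assert(!is_map_expr<bad_expr>::value,  "a wide string child does not match");
```

The two specializations play the part of the two alternatives inside proto::or_, and the recursion on the 0th child is what lets a single definition cover map expressions of any depth.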
Pause To Consider
By now, you might be feeling a bit overwhelmed. We just covered a lot of new ground, and this coding style might feel strange to you. But consider for a moment what we’ve just expressed and how concisely we’ve expressed it: we’ve defined in code the grammar for valid map expressions, and it took only a pittance of code to do it. This is quite an accomplishment. Take a moment to become comfortable with the definition of MapGrammar. Grammars are the central pillar of Proto, and once you get grammars under your belt (is that a pillar under your belt? har har), you’ll really be able to do some neat things. In fact, all the powerful and interesting things you can do with Proto begin right here with grammars.
Validating Expressions Against Grammars
No doubt you’re wondering what we actually do with MapGrammar. Proto provides a trait called proto::matches for determining at compile time whether an expression type matches a given grammar. We can use proto::matches in conjunction with C++0x’s static_assert or various C++03 approximations of it (see note below) to halt compilation as soon as an invalid expression is detected.
With static_assert, proto::matches and MapGrammar, we can modify our format overload to validate the expr parameter before passing it to the fill_map function:
template< class Expr >
std::string format( std::string fmt, Expr const & expr )
{
    /* READ THIS IF YOUR COMPILE BREAKS ON THE FOLLOWING LINE
     *
     * You have passed to format an invalid map expression.
     * They should be of the form:
     * map("this", "that")("here", "there")
     */
    static_assert(
        proto::matches<Expr, MapGrammar>::value
      , "The map expression passed to format does not match MapGrammar");
    string_map subs;
    fill_map( expr, subs );
    return format( fmt, subs );
}
When we pass our invalid expression to format now, our error goes from 50+ lines to about 10, including this message9:
c:\scratch.cpp(94): error C2338: The map expression passed
to format does not match MapGrammar
This error is nicer because:
- It is shorter!
- The error message itself indicates what the problem might be.
- The error happens at the API boundary, not on some random line of code deep in the library’s guts.
- We’ve helpfully left a comment by the assert to let people know what’s wrong in case the assertion fails, and what to do to fix it.
If you don’t have a C++0x compiler with static_assert, I recommend using Boost.MPL’s BOOST_MPL_ASSERT_MSG macro, which accepts a compile-time Boolean and a message to display if the Boolean is false. The static assertion on line 10 above would instead look like this:
BOOST_MPL_ASSERT_MSG(
(proto::matches<Expr, MapGrammar>::value),
THE_MAP_EXPRESSION_PASSED_TO_FORMAT_DOES_NOT_MATCH_MAPGRAMMAR,
(MapGrammar));
When this assertion fails, it emits an error like:
c:\scratch.cpp(115): error C2664: 'boost::mpl::assertion_fa
iled' : cannot convert parameter 1 from 'boost::mpl::failed
************(__thiscall format::THE_MAP_EXPRESSION_PASSED_T
O_FORMAT_DOES_NOT_MATCH_MAPGRAMMAR::* ***********)(MapGramm
ar)' to 'boost::mpl::assert::type'
Avoid Follow-on Errors
If you try the above example on gcc-4.5, you’ll find that rather than a shorter error, the static_assert gives a longer one!
What’s going on here? If you trawl through the error spew, you can see the nice message from the static assertion, but it’s buried in a lot of other junk with two other errors from the guts of our EDSL implementation. Let’s look again at the new implementation of format:
template< class Expr >
std::string format( std::string fmt, Expr const & expr )
{
    /* READ THIS IF YOUR COMPILE BREAKS ON THE FOLLOWING LINE
     *
     * You have passed to format an invalid map expression.
     * They should be of the form:
     * map("this", "that")("here", "there")
     */
    static_assert(
        proto::matches<Expr, MapGrammar>::value
      , "The map expression passed to format does not match MapGrammar");
    string_map subs;
    fill_map( expr, subs );
    return format( fmt, subs );
}
The static_assert on line 10 causes the nice diagnostic, but gcc helpfully keeps right on compiling, eventually reaching the call to fill_map on line 14. We’ve already established that the call will fail to compile, but nobody told gcc it was OK to stop!
In general, it’s not enough to issue a diagnostic for the known errors. We must also avoid the follow-on diagnostics from overeager compilers like gcc. The answer is usually quite simple: move the guts to a separate function, and use static dispatch to call that function or an empty one depending on whether validation succeeded. A little code should make it clear:
template< class Expr >
std::string format_impl( std::string fmt, Expr const & expr, boost::mpl::true_ )
{
    string_map subs;
    fill_map( expr, subs );
    return format( fmt, subs );
}

template< class Expr >
std::string format_impl( std::string fmt, Expr const & expr, boost::mpl::false_ )
{
    return std::string(); // never called for valid input
}

template< class Expr >
std::string format( std::string fmt, Expr const & expr )
{
    /* READ THIS IF YOUR COMPILE BREAKS ON THE FOLLOWING LINE
     *
     * You have passed to format an invalid map expression.
     * They should be of the form:
     * map("this", "that")("here", "there")
     */
    static_assert(
        proto::matches<Expr, MapGrammar>::value
      , "The map expression passed to format does not match MapGrammar");

    /* Dispatch to the real implementation or a stub depending on
       whether our parameters are valid or not.
    */
    return format_impl( fmt, expr, proto::matches<Expr, MapGrammar>() );
}
We added two overloads of a new function format_impl. The first takes an extra argument of type boost::mpl::true_ and does the real work. The second takes boost::mpl::false_ and simply returns an empty string. The original format function is now just a shell that (maybe) issues a diagnostic and dispatches to one or the other overload. The proto::matches trait conveniently inherits from mpl::true_ or mpl::false_ accordingly to make this possible. With this change, the full error is much shorter:
scratch.cpp: In function ‘std::string format(std::string, c
onst Expr&) [with Expr = boost::proto::exprns_::expr<boost:
:proto::tag::function, boost::proto::argsns_::list3<const b
oost::proto::exprns_::expr<boost::proto::tag::function, boo
st::proto::argsns_::list3<boost::proto::exprns_::expr<boost
::proto::tag::terminal, boost::proto::argsns_::term<map_>,
0l>&, boost::proto::exprns_::expr<boost::proto::tag::termin
al, boost::proto::argsns_::term<const char (&)[5]>, 0l>, bo
ost::proto::exprns_::expr<boost::proto::tag::terminal, boos
t::proto::argsns_::term<const wchar_t (&)[5]>, 0l> >, 3l>&,
boost::proto::exprns_::expr<boost::proto::tag::terminal, bo
ost::proto::argsns_::term<const char (&)[5]>, 0l>, boost::p
roto::exprns_::expr<boost::proto::tag::terminal, boost::pro
to::argsns_::term<const char (&)[11]>, 0l> >, 3l>, std::str
ing = std::basic_string<char>]’:
scratch.cpp:126:55: instantiated from here
scratch.cpp:112:9: error: static assertion failed: "The map
expression passed to format does not match MapGrammar"
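The same validate-then-dispatch shape can be tried without Boost. This sketch assumes C++11 and swaps in std::true_type and std::false_type for the MPL types. The names shout, shout_impl and is_valid_arg are hypothetical, with is_valid_arg standing in for proto::matches<Expr, MapGrammar>:

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical validation trait: here "valid" simply means
// "implicitly convertible to std::string".
template <class T>
struct is_valid_arg : std::is_convertible<T, std::string> {};

// Real implementation, selected when validation succeeded.
template <class T>
std::string shout_impl(T const& text, std::true_type)
{
    return std::string(text) + "!";
}

// Stub, selected when validation failed. It never uses its argument,
// so it gives the compiler nothing further to reject.
template <class T>
std::string shout_impl(T const&, std::false_type)
{
    return std::string();
}

// Public entry point: issue one readable diagnostic, then dispatch.
template <class T>
std::string shout(T const& text)
{
    static_assert(is_valid_arg<T>::value,
                  "shout requires an argument convertible to std::string");
    // is_valid_arg<T> derives from std::true_type or std::false_type,
    // so overload resolution picks the matching shout_impl.
    return shout_impl(text, is_valid_arg<T>());
}
```

With this in place, shout(std::string("hi")) returns "hi!", while a call such as shout(42) should be rejected by the static assertion alone, because the stub overload is the one that gets instantiated.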
Conclusions and What's To Come
Thanks for reading. Since I'm priming you guys to be library authors, I feel obligated to give you the tools to make your libraries user-friendly. As you can see, we had to be a bit proactive about making our code behave well when passed garbage, but it wasn't so hard. Although I talked mostly about EDSLs and Proto, these techniques are applicable very broadly:
- Validate template parameters at API boundaries.
- Use C++0x's static_assert or a C++03 equivalent to issue readable diagnostics.
- Leave detailed comments by the static assertions to let people know what has gone wrong and how to fix it.
- Dispatch to stubs on invalid input to avoid follow-on failures.
These techniques can greatly reduce the amount of compiler spew C++ programmers encounter on a daily basis.
Proto grammars make validating expression trees easy and (dare I say it?) fun. But they are far more useful than that. You can use Proto grammars to restrict Proto's operator overloads to only those that create valid trees. And by embedding semantic actions (a.k.a. transforms) within Proto grammars, you can write algorithms that manipulate trees and generate code in powerful ways. In future articles, we'll dig deep into Proto grammars and transforms. But first, we'll take a closer look at Proto expressions and how to extend them, adding and customizing member functions, making them anything but dumb, static trees.
Until next time, don't forget to validate your parameters. And if you see any bad template errors, file a bug!