As a C++ project grows and matures, the following line is inevitably spoken: “The build is too slow”. It doesn’t really matter how long the build actually takes; it’s just taking longer than it used to. Things like this are an inevitability as a project grows in size and scope.

In this post I’ll talk specifically about my recent use of **forward declarations** to vastly improve build times on one of those projects, and how you can too.

What are forward declarations?

A forward declaration in C++ is when you declare something before its implementation. For example:

```cpp
class Foo; // a forward declaration for class Foo

// ...

class Foo { // the actual definition of class Foo
    int member_one;
    // ...
};
```

You can forward declare more than just a class, but in this article I’m only referring to class forward declarations.

When you forward declare a class, the class type is considered to be “incomplete” (the compiler knows about the name, but nothing else). You cannot do much with an incomplete type besides use pointers (or references) to that type, but pointers are all that we will need. (More on that in a bit.)

When the compiler is creating your class, it doesn’t actually care about very much. Its goal is ultimately to determine the class’ layout in memory, and to do that, it needs to know the size of your class’ data members. For example:

```cpp
struct Foo {
    int a;
    int b;
};
```

Our class `Foo` has two integer members. When the compiler creates a layout for this class, it will allocate approximately `sizeof(int) + sizeof(int)` contiguous bytes for it (padding and custom-alignment directives notwithstanding).

When `Foo` has a dependency on `Bar`, the compiler needs to know the size of `Bar` as it compiles `Foo`:

```cpp
struct Bar {
    int a;
};

struct Foo {
    int a;
    int b;
    Bar c;
};
```

In the code above, when the compiler reaches `Foo`, it already knows the size and alignment of `Bar`. (“Alignment” is a property of a type that dictates the addresses at which the compiler may place it. A thorough discussion is outside the scope of this article, but The Lost Art of C Structure Packing gives it a good treatment.)

If we reversed the order like so:

```cpp
struct Foo {
    int a;
    int b;
    Bar c;
};

struct Bar {
    int a;
};
```

we would get a compiler error, because the compiler cannot possibly determine a layout for `Foo` without first knowing the layout of `Bar`. If `Bar` were in its own header file, we would need to include it in Foo’s header file:

```cpp
#include "Bar.h"

struct Foo {
    int a;
    int b;
    Bar c;
};
```

So now Foo.h has a dependency on Bar.h.

And what if we complicate `Bar` by giving it another member, `Baz`?

```cpp
#include "Baz.h"

struct Bar {
    int a;
    Baz b;
};
```

Now Bar.h depends on Baz.h. Foo.h directly depends on Bar.h, and indirectly on Baz.h. You can see the beginnings of a “dependency graph” forming here. As your codebase grows, you can imagine how large these dependency graphs might get.

Why is this a bad thing? The C++ compiler takes a simplistic approach to handling these dependency graphs: during the preprocessing stage of compilation, it just copy-pastes one header into another, collapsing the graph into one gargantuan source file. Just check the documentation for what `#include` actually does!

The `Foo` class might not actually care at all about the `Baz` class; the `Bar` class (which has a `Baz` member) may only use it internally. So at compilation time, `Foo` is paying for the compiler to parse something it doesn’t even care about! This violates one of the core tenets of C++: “Only pay for what you use”.

What’s worse, if Baz.h changes, then the compiler must recompile `Foo`! Not only does it take longer to compile `Foo`, but we must also compile it more often. Good grief.

We’ve decided that we don’t like how Foo.h depends on Baz.h through Bar.h, so we solve the problem with a little forward declaration. If `Bar` forward-declares `Baz` and then uses a pointer to `Baz`, the compiler no longer needs to know anything about the size and layout of `Baz` when creating a layout for `Bar`:

```cpp
// Bar.h
class Baz;

struct Bar {
    int a;
    Baz* b;
};

// Bar.cpp
#include "Bar.h"
#include "Baz.h"

// ... (use our pointer to Baz)
```

This works because, from a size perspective, all pointers are exactly the same. That means we don’t need the full definition of `Baz` until we try to access one of its members. `Bar` still depends on `Baz`, since the translation unit is per .cpp file, but the dependency is kept out of Bar.h.

This causes something interesting to happen to Foo.h:

```cpp
// Foo.h
#include "Bar.h"

struct Foo {
    int a;
    int b;
    Bar c;
};

// Foo.cpp
#include "Foo.h"

// ...
```

Nowhere in the included files for Foo.h will we find Baz.h. This means that:

- if Baz.h changes, only Bar.cpp will recompile
- the preprocessed source file for `Foo` will not include the contents of Baz.h

Now that there’s less work for the preprocessor and compiler to do for `Foo`, compilation goes faster. It takes up less memory. `Foo` needs to be rebuilt *less often*! With forward declarations we’ve improved both full rebuilds and incremental rebuilds.

The Google style guide recommends against using forward declarations, and for good reasons:

- If someone forward declares something from namespace std, then your code exhibits undefined behavior (but will likely work).
- Forward declarations can easily become redundant when an API is changed such that it’s unavoidable to know the full size and alignment of a dependent type. You may end up with both a #include and a forward declaration in your header in this case.
- There are some rare cases where your code may behave differently. You may not be bringing in additional function overloads or template specializations that you previously relied upon, or you lose inheritance information which can cause a different overload or specialization to be called in the first place.

Also, notice that when we transformed a class member into a pointer, we likely had to start dealing with heap-allocated memory for each instance of `Foo`. If `Foo` needs to access its `Bar` pointer very often, the small overhead of a pointer indirection can add up. It’s also not very cache-friendly; the members of `Foo` are no longer together in memory, which can cause a cache miss when trying to access a member of `Bar` at runtime (very expensive).

Like any other technique, forward declarations must be used carefully.

The Modules TS may present another, safer alternative for improving build times by removing the need for the preprocessor to paste entire headers in, over and over again, for different translation units.

I was recently tasked with the onerous job of “improve the build” for a mid-sized code base.

How did I decide to start with forward declarations?

For starters, forward declarations are low-hanging fruit as far as improving build time goes. It’s much easier to routinely go through the code adding forward declarations than it is to change an interface, pull files out into new libraries, or build façades. In short, they are a lazy programmer’s best friend.

Also, having had some experience with the code base, I knew that the code had:

- many classes that only used pointers to our types
- many unnecessary includes (for historical reasons, laziness, or naiveté…)
- plenty of automated unit tests to ensure I didn’t accidentally break something

I also had a bit of a hint that our header dependencies were a little bloated when I found that Visual Studio’s built-in dependency graph generator consistently crashed when I tried to run it on our code base. Still, I didn’t have any real proof that forward declarations would actually improve anything at all. Just an intuition. So we decided to be Agile about it.

I took the top 5-10 headers that were most often included, and I made it a challenge to replace them with forward declarations wherever I could. If doing this improved things at all, then I could go ahead and take a more comprehensive approach.

What we found was that we could do a full rebuild of our C++ code **10% faster**! Along the way I gained even more confidence that a more comprehensive approach would yield additional gains.

Before I continue, here are some of the pain points you’ll discover when you set about replacing your headers with forward declarations:

- other random files will start breaking from missing includes (that they used to get indirectly).

This is frustrating, but on a positive note, it forces your codebase to follow a best practice — a translation unit should be self-contained. Never should you rely on an indirect include because future refactoring efforts will needlessly break your code and cause headaches for other programmers.

Because you will likely have to fix unrelated code…

- work like this ends up touching way more files than you initially thought.

This is a bit of a nightmare for your code reviewers. Everyone groans when they see hundreds of changed files in a code review. The solution to this is to communicate to your reviewers what’s going on ahead of time; they don’t need to look at every single file in the review. Perhaps a random sampling, or just some of the more important headers.

After I showed the team our 10% speedup on rebuild, they were as hungry as I was to see more. I got the go-ahead to spend a week touching as many headers as I could for forward declaration work. At the end of it all, I had gone through perhaps two-thirds of all header files. The result? An additional 30% faster compile time for a total of **40% faster C++ compile times**! (Plus Visual Studio stopped crashing when generating the dependency graph.)

This result was quite surprising; I had thought we would already see diminishing returns after the first pass. I must admit that I didn’t have the luxury of being purely scientific; I was also removing unnecessary headers along the way, but I will assert that the work was predominantly forward declarations.

Best practices have their flip-sides. The Google style guide (and a few of my coworkers) made some good points against the usage of forward declarations, but the real world results of the technique are undeniable. All the developers are happier; the build => test => run cycle is faster for them. The automated builds are faster. The compiler’s memory usage is down.

The point about memory usage becomes more important for large parallel builds. In fact, we were occasionally running out of memory, and this work has abated those issues (for the time being; forward declarations are really just a band-aid on an architectural issue).

Time is money, and in a larger project with many well-paid people, saving even a small amount of time has an economy-of-scale effect; provably thousands or hundreds of thousands of dollars saved in development time.

Oftentimes I see questions on StackOverflow asking something to the effect of

The canonical, final, never-going-to-change answer to this question is a thorough

C++ is a statically-typed language. A vector will hold objects of a single type, and only a single type.

Of course there are ways to work around this. You can hide types within types! In this post I will discuss the existing popular workarounds to the problem, as well as describe my own radical new heterogeneous container that has a much simpler interface from a client’s perspective.

boost::variant, for one (and now std::variant), has allowed us to specify a type-safe union that holds a flag to indicate which type is “active”.

The downside to variant data structures is that you’re forced to specify a list of allowed types ahead of time, e.g.,

```cpp
std::variant<int, double, MyType> myVariant;
```

To handle a type that could hold “any” type, Boost then created boost::any (and with C++17 we’ll get std::any).

This construct allows us to really, actually hold any type. The downside here is that now the client generally has to track which type is held inside. You also pay a bit of overhead in the underlying polymorphism that comes with it, as well as the cost of an “any_cast” to get your typed object back out.

The real “but…” portion of the answer to these StackOverflow questions will state that you *could* use polymorphism to create a common base class for any type that your vector will hold, but that is ridiculous, especially if you want to hold primitive types like int and double. Assuming your design isn’t totally off-base, what you really wanted to do was use a std::variant or std::any and apply the Visitor Pattern to process it.

The visitor pattern implementation for this scenario basically works like this:

- Create a “callable” (a class with an overloaded or templated function call operator, or a polymorphic lambda) that can be called for any of the types.
- “Visit” the collection, invoking your callable for each element within.

I’ll demonstrate using std::variant.

First we start by creating our variant:

```cpp
std::variant<int, double, std::string> myVariant;
myVariant = 1; // initially it's an integer
```

Then we write our callable (our visitor):

```cpp
struct MyVisitor {
    void operator()(int& _in) { _in += _in; }
    void operator()(double& _in) { _in += _in; }
    void operator()(std::string& _in) { _in += _in; }
};
```

This visitor will double the element that it visits. That is, “1” shall become “2”, or “MyString” shall become “MyStringMyString”.

Next we invoke our visitor using std::visit:

```cpp
std::visit(MyVisitor{}, myVariant);
```

That’s it! You’ll notice that all the code for each type in MyVisitor is identical, so we could replace it all with a template. Let’s go with that template idea and create a visitor that will print the active element:

```cpp
struct PrintVisitor {
    template <class T>
    void operator()(T&& _in) { std::cout << _in; }
};
```

Or we could similarly have created an identical polymorphic lambda like so:

```cpp
auto lambdaPrintVisitor = [](auto&& _in) { std::cout << _in; };
```

Applying our PrintVisitor or lambdaPrintVisitor will have the same effect:

```cpp
std::visit(PrintVisitor{}, myVariant);     // will print "2"
std::visit(lambdaPrintVisitor, myVariant); // will also print "2"
```

And for a string:

```cpp
myVariant = "foo";
std::visit(MyVisitor{}, myVariant);        // doubles to "foofoo"
std::visit(lambdaPrintVisitor, myVariant); // prints "foofoo"
```

Here’s a working demo on Wandbox that demonstrates the above code snippets.

Moving from a single variant to a collection of variants is pretty easy — you shove the variant into a std::vector:

```cpp
std::vector<std::variant<int, double, std::string>> variantCollection;
variantCollection.emplace_back(1);
variantCollection.emplace_back(2.2);
variantCollection.emplace_back("foo");
```

And now to visit the collection, we apply our visitors to each element in the collection:

```cpp
// print them
for (const auto& nextVariant : variantCollection) {
    std::visit(lambdaPrintVisitor, nextVariant);
    std::cout << " ";
}
std::cout << std::endl;

// double them
for (auto& nextVariant : variantCollection) {
    std::visit(MyVisitor{}, nextVariant);
}

// print again
for (const auto& nextVariant : variantCollection) {
    std::visit(lambdaPrintVisitor, nextVariant);
    std::cout << " ";
}
std::cout << std::endl;
```

Here’s a live demo of that code.

Now we begin to see how tedious it is to write those loops manually, so we encapsulate the vector into a class with a visit mechanism. We make it a template so that the client can specify the underlying types that go into the variant:

```cpp
template <class... T>
struct VariantContainer {
    template <class V>
    void visit(V&& visitor) {
        for (auto& object : objects) {
            std::visit(visitor, object);
        }
    }

    using value_type = std::variant<T...>;
    std::vector<value_type> objects;
};
```

And then our code from above to visit it is nicely shortened to:

```cpp
VariantContainer<int, double, std::string> variantCollection;
variantCollection.objects.emplace_back(1);
variantCollection.objects.emplace_back(2.2);
variantCollection.objects.emplace_back("foo");

// print them
variantCollection.visit(lambdaPrintVisitor);
std::cout << std::endl;

// double them
variantCollection.visit(MyVisitor{});

// print again
variantCollection.visit(lambdaPrintVisitor);
std::cout << std::endl;
```

At this point we’re thinking we are pretty smart. The client can simply direct us on how heterogeneous they want that container to be! And then we start wrapping up the vector a little nicer, hiding the underlying storage, and replicating the rest of std::vector’s interface so someone can call “emplace_back” directly on our container etc, and we’re rolling!

```cpp
heterogeneous_container c;
c.push_back('a');                // char
c.push_back(1);                  // int
c.push_back(2.0);                // double
c.push_back(3);                  // another int
c.push_back(std::string{"foo"}); // a string

// print it all
c.visit(print_visitor{}); // prints "a 1 2 3 foo"
```

A C++ programmer would tell you that this can’t work, “unless you’re doing a bunch of very expensive cast-and-check operations somewhere” (e.g. with std::any).

But I will now demonstrate how such an interface is possible (without RTTI) using C++14 and C++17 features (the C++17 features are not necessary, just nice to haves).

(**Author’s note:** the following is intended as a toy, not to be used in any real implementation. It has a gaping security hole in it. Think of this more as an exercise in what we can do with C++14 and C++17.)

Now, admittedly, we can never be as flexible as a duck-typed language such as Python. We cannot create new types at runtime and add them to our container, and we can’t easily iterate over the container; we must still use the **visitor pattern**.

Let’s begin with a feature that was added in C++14: **Variable templates**

If you’ve done any templating in C++ before, you’re familiar with the typical syntax to template a function so that it can operate on many types:

```cpp
template <class T>
T Add(const T& _left, const T& _right) {
    return _left + _right;
}
```

But variable templates allow us to interpret a *variable* differently depending on a type. Here’s an example where we can interpret the mathematical constant π (pi) differently:

```cpp
template <class T>
constexpr T pi = T(3.1415926535897932385);
```

And now we can explicitly refer to pi<double> or pi<float> to more easily express the amount of precision we need.

Recall that when you instantiate a template, you are really telling the compiler to copy-paste the template code, substituting for each type. Variable templates are the same way. That is, “pi<double>” and “pi<float>” are two separate variables.

What happens when we move a variable template into a class?

Well, the rules of C++ dictate that such a variable template must be static, so each instantiation of the template creates a single variable shared across all class instances. But with our heterogeneous container we want instances to only know or care about the types that have been used *for that specific instance*! So we abuse the language and create a mapping from container pointers to vectors:

```cpp
namespace andyg {

struct heterogeneous_container {
private:
    template <class T>
    static std::unordered_map<const heterogeneous_container*, std::vector<T>> items;

public:
    template <class T>
    void push_back(const T& _t) {
        items<T>[this].push_back(_t);
    }
};

// storage for our static members
template <class T>
std::unordered_map<const heterogeneous_container*, std::vector<T>> heterogeneous_container::items;

} // namespace andyg
```

And now suddenly we have a class which we can add members to *after* creating an instance of! We can even declare a struct later and add that in, too:

```cpp
andyg::heterogeneous_container c;
c.push_back(1);
c.push_back(2.f);
c.push_back('c');

struct LocalStruct {};
c.push_back(LocalStruct{});
```

There are quite a few shortcomings we need to address before our container is really useful in any way. One is that when an instance of andyg::heterogeneous_container goes out of scope, all of its data remains in the static map.

To address this, we will need to somehow track which types we received, and delete the appropriate vectors. Fortunately we can write a lambda to do this and store it in a std::function. Let’s augment our push_back function:

```cpp
template <class T>
void push_back(const T& _t) {
    // don't have this type yet, so create a function for destroying it
    if (items<T>.find(this) == std::end(items<T>)) {
        clear_functions.emplace_back(
            [](heterogeneous_container& _c) { items<T>.erase(&_c); });
    }
    items<T>[this].push_back(_t);
}
```

Where “clear_functions” is a new data member of the class that looks like this:

```cpp
std::vector<std::function<void(heterogeneous_container&)>> clear_functions;
```

Whenever we want to destroy all elements of a given andyg::heterogeneous_container, we can call all of its clear_functions. So now our class can look like this:

```cpp
namespace andyg {

struct heterogeneous_container {
public:
    heterogeneous_container() = default;

    template <class T>
    void push_back(const T& _t) {
        // don't have this type yet, so create a function for destroying it
        if (items<T>.find(this) == std::end(items<T>)) {
            clear_functions.emplace_back(
                [](heterogeneous_container& _c) { items<T>.erase(&_c); });
        }
        items<T>[this].push_back(_t);
    }

    void clear() {
        for (auto&& clear_func : clear_functions) {
            clear_func(*this);
        }
    }

    ~heterogeneous_container() {
        clear();
    }

private:
    template <class T>
    static std::unordered_map<const heterogeneous_container*, std::vector<T>> items;

    std::vector<std::function<void(heterogeneous_container&)>> clear_functions;
};

template <class T>
std::unordered_map<const heterogeneous_container*, std::vector<T>> heterogeneous_container::items;

} // namespace andyg
```

Our class is starting to become pretty useful, but we still have issues with copying. We’d have some pretty disastrous results if we tried this:

```cpp
andyg::heterogeneous_container c;
c.push_back(1);
// more push_back calls...
{
    andyg::heterogeneous_container c2 = c;
}
```

The solution is fairly straightforward; we follow the same pattern as when we implemented “clear”, with some additional work for a copy constructor and copy assignment operator. On push_back, we’ll create another function that can copy a vector&lt;T&gt; from one heterogeneous_container to another, and in copy construction/assignment, we’ll call each of our copy functions:

```cpp
namespace andyg {

struct heterogeneous_container {
public:
    heterogeneous_container() = default;

    heterogeneous_container(const heterogeneous_container& _other) {
        *this = _other;
    }

    heterogeneous_container& operator=(const heterogeneous_container& _other) {
        clear();
        clear_functions = _other.clear_functions;
        copy_functions = _other.copy_functions;
        for (auto&& copy_function : copy_functions) {
            copy_function(_other, *this);
        }
        return *this;
    }

    template <class T>
    void push_back(const T& _t) {
        // don't have this type yet, so create functions for copying and destroying it
        if (items<T>.find(this) == std::end(items<T>)) {
            clear_functions.emplace_back(
                [](heterogeneous_container& _c) { items<T>.erase(&_c); });
            // if someone copies me, they need to call each copy_function and pass themselves
            copy_functions.emplace_back(
                [](const heterogeneous_container& _from, heterogeneous_container& _to) {
                    items<T>[&_to] = items<T>[&_from];
                });
        }
        items<T>[this].push_back(_t);
    }

    void clear() {
        for (auto&& clear_func : clear_functions) {
            clear_func(*this);
        }
    }

    ~heterogeneous_container() {
        clear();
    }

private:
    template <class T>
    static std::unordered_map<const heterogeneous_container*, std::vector<T>> items;

    std::vector<std::function<void(heterogeneous_container&)>> clear_functions;
    std::vector<std::function<void(const heterogeneous_container&, heterogeneous_container&)>> copy_functions;
};

template <class T>
std::unordered_map<const heterogeneous_container*, std::vector<T>> heterogeneous_container::items;

} // namespace andyg
```

And now our container is starting to become useful. What other function do you think you could implement to follow the pattern we established with “clear” and “copy”? Perhaps a “size” function? A “number_of<T>” function? A “gather<T>” function to gather all elements of type T for that class?

We can’t really do anything with our andyg::heterogeneous_container yet, because we cannot iterate over it. We need a way to *visit* the container so that useful computation can be performed.

Unfortunately we cannot do it as easily as calling std::visit on a std::variant. Why? Because a std::variant implicitly advertises the types it holds within. Our andyg::heterogeneous_container does not (and cannot). So we need to push this advertisement onto the visitor. That is, the *visitor* will advertise the types it is capable of visiting. Of course this is a pain point of using an andyg::heterogeneous_container, but alas, tradeoffs must be made.

So how do we make it easy for the client to write a visitor? We can use some lightweight inheritance to automatically provide client visitors a way to publish the types they can visit.

First we’ll start with a generic “type list” kind of class that takes advantage of the variadic template feature that arrived with C++11:

```cpp
template <class...>
struct type_list {};
```

And then we’ll create a templated base class for visitors that defines a type_list:

```cpp
template <class... TYPES>
struct visitor_base {
    using types = andyg::type_list<TYPES...>;
};
```

How can we use this base class? Let’s demonstrate with an example.

Say I have some heterogeneous container:

```cpp
andyg::heterogeneous_container c;
c.push_back(1);   // int
c.push_back(2.2); // double
```

and now I want to visit this container so that I can double its members. Similar to how we wrote a visitor class for std::variant, we’ll write a structure that overloads the function call operator for each type. This structure should inherit from our “visitor_base”:

```cpp
struct my_visitor : andyg::visitor_base<int, double> {
    void operator()(int& _i) { _i += _i; }
    void operator()(double& _d) { _d += _d; }
};
```

And now our visitor implicitly defines a type called “types” that is templated on “int” and “double”.

We could rewrite our visitor with a template instead, like before:

```cpp
struct my_visitor : andyg::visitor_base<int, double> {
    template <class T>
    void operator()(T& _in) { _in += _in; }
};
```

Although you wouldn’t be allowed to declare this struct locally within a function on account of the template.

Next up: writing the “visit” method of the andyg::heterogeneous_container class.

Like we did for our “VariantContainer” above, the visitor pattern basically amounts to calling std::visit for each element in the container. The problem is that std::visit won’t work here, so we have to implement the visitation ourselves.

Our strategy will be to use the types published by the visitor class and invoke the function call operator for each type. This is easy if we use some helper functions. But first, the main entry point, “visit()”:

```cpp
template <class T>
void visit(T&& visitor) {
    visit_impl(visitor, typename std::decay_t<T>::types{});
}
```

Note that I don’t really constrain the template at all. So long as T has a nested type named “types”, I use it. In this way, one isn’t forced to use an andyg::visitor_base.

Next you’ll notice that the call to “visit_impl” not only passes on the “visitor” received in “visit”, but additionally *constructs an instance* of “T::types”. Why? Well, the reasoning is similar to the reasoning behind tag dispatching, but not quite. We simply want an easy way to pass our type list (“types”) as a template parameter to “visit_impl”.

I think it’s better explained if we look at the declaration for “visit_impl”:

```cpp
template <class T, template <class...> class TLIST, class... TYPES>
void visit_impl(T&& visitor, TLIST<TYPES...>)
```

Like before, we receive our visitor as “T”, but then we use a template template argument to indicate that the incoming “types” object itself is templated, and that those types it is templated on can be referred to as “TYPES”. When we call “visit” with the “my_visitor” defined above, the resulting type substitution will appear as:

```cpp
void visit_impl(my_visitor& visitor, andyg::type_list<int, double>)
```

The second argument is unnamed, indicating that it is unused and really only there to help us out in our metaprogramming. An optimizing compiler should be able to optimize out any real construction of the type, but even if it doesn’t we only lose 1 byte because the class itself is empty.

Essentially we want to say “for each type in TYPES, iterate over the associated vector<type> and visit each element”. However C++ doesn’t allow “iteration” over types. In the past we’ve been forced to use tail recursion — receive a parameter pack like “<class HEAD, class… TAIL>” process “HEAD”, then recurse for “TAIL”, eventually reaching a base case of a single type. Such expansion creates a lot of work for the compiler, and the separation of functions makes the code that much harder to read.

An alternative used in C++11 and C++14 was called “simple expansion”; it involved abusing the comma operator and placing a function call into an array initializer list, with the array cast to void so that the compiler would not actually allocate it. It sounds complicated because it is. It would have looked like this:

```cpp
using swallow = int[];
(void)swallow{0, (void(visit_impl_help<T, TYPES>(visitor)), 0)...};
```

Fortunately for us, C++17 made such hackery redundant with the introduction of fold expressions, which, for simplicity’s sake you can imagine as calling a single function for each type in a parameter pack. The syntax does take a little getting used to, but here’s what the implementation of “visit_impl” looks like:

```cpp
template <class T, template <class...> class TLIST, class... TYPES>
void visit_impl(T&& visitor, TLIST<TYPES...>) {
    (..., visit_impl_help<std::decay_t<T>, TYPES>(visitor));
}
```

This is formally called a unary left fold and what happens is that the resulting expression for our “my_visitor” will cause it to be expanded as such:

```cpp
visit_impl_help<T, int>(visitor), visit_impl_help<T, double>(visitor);
```

This approach also abuses the comma operator. You can see that the introduction of more types in “TYPES” would result in more commas and expressions between those. The result is a function call for each type.

You’ll again notice that I’m calling *yet another helper* and I promise it’s the last one! I wanted a clean looking function to iterate over a vector. Here’s its definition:

```cpp
template <class T, class U>
void visit_impl_help(T& visitor) {
    for (auto&& element : items<U>[this]) {
        visitor(element);
    }
}
```

By overloading the function call operator in our visitor class, we can treat “visitor” itself as a function here and simply call it for each element. That’s pretty neat! In the final demo code, I additionally added a static_assert (which itself uses some complicated metaprogramming to detect a proper overloaded function call operator) to visit_impl_help so that clients don’t get stuck in template error hell if they miswrote their visitor class.

At this point we can basically do anything we want with the class, creation, copying, destroying, assignment, and visiting (everything else is gold plating). We can even declare a new class after creating our container instance, and then add an instance of that new class into the container. Whoa.

Here’s a sample code run:

```cpp
// my_visitor defined as above, and print_visitor pretty obvious
auto print_container = [](andyg::heterogeneous_container& _in) {
    _in.visit(print_visitor{});
    std::cout << std::endl;
};

andyg::heterogeneous_container c;
c.push_back('a');
c.push_back(1);
c.push_back(2.0);
c.push_back(3);
c.push_back(std::string{"foo"});

std::cout << "c: ";
print_container(c);

andyg::heterogeneous_container c2 = c;
std::cout << "c2: ";
print_container(c2);

c.clear();
std::cout << "c after clearing c: ";
c.visit(print_visitor{});
std::cout << std::endl;

std::cout << "c2 after clearing c: ";
print_container(c2);

c = c2;
std::cout << "c after assignment to c2: ";
print_container(c);

my_visitor v;
std::cout << "Visiting c (should double ints and doubles)\n";
c.visit(v);
std::cout << "c: ";
print_container(c);

struct SomeRandomNewStruct {};
c.push_back(SomeRandomNewStruct{});
```

And its output:

```
c: 1 3 2 a foo
c2: 1 3 2 a foo
c after clearing c:
c2 after clearing c: 1 3 2 a foo
c after assignment to c2: 1 3 2 a foo
Visiting c (should double ints and doubles)
c: 2 6 4 a foo
```

And there you have it. Here’s a live running demo on WandBox. The live demo includes some additional tests and exercise of some “nice-to-have” features of a heterogeneous container. (For you experts, of course the templating is simplified and in a production environment there should be better forwarding semantics, static_asserts, and flexibility in the templating).

The primary difference in storage between an andyg::heterogeneous_container and a std::vector<std::any> is that, in an andyg::heterogeneous_container, all elements of the same type are stored contiguously. This allows extremely fast iteration as compared to a std::vector<std::any>, which gets bogged down by a number of try-catches during visitation. By “extremely fast”, I mean that it’s actually an order of magnitude faster. Run the timings yourself here.

C++14 and C++17 offer us some pretty powerful new metaprogramming tools in the form of template variables, variadic lambdas, standardized variants, and fold expressions, among others. By experimenting with these we can begin pushing the boundaries of what we thought was possible in C++. In this post we created a new kind of visitor pattern that has very nice syntax and won’t be slowed down by any run-time type inference (all the visiting knows exactly which types it’s iterating over already), but also has some drawbacks (not the least of which is the gaping security hole where one andyg::heterogeneous_container can see the contents of *any other* andyg::heterogeneous_container).

Even though I consider the container I just wrote to be an incomplete plaything that is interesting only for the concepts it demonstrates, depending on your use case you might actually want to “borrow” it to satisfy yet another impossible requirement your customer gives you.

When you are watching a digitally-rendered battle onscreen in the latest blockbuster movie, you don’t always think about the “camera” moving about that scene. In the real world, cameras have a *field of view* that dictates how much of the world about them they can see. Virtual cameras have a similar concept (called the *viewing frustum*) whereby they can only show so much of the digital scene. Everything else gets chopped off, so to speak. Because rendering a digital scene is a laborious task, computer scientists are very interested in making it go faster. Understandably, they only want to spend time drawing pixels that you’ll see in the final picture and forget about everything else (outside the field of view, or occluded (hidden) behind something else).

Our friends in the digital graphics world make heavy use of *planes* every day, and being able to test for plane-plane intersection is extremely important.

In this post, I’ll try to break down what a plane is in understandable terms, how we can create one given a triangle, and how we would go about testing for the intersection between two of them.

(This article was originally written in LaTeX and transcribed here. To get a hard copy that you can also cite, grab the pdf here.)

Say you’re given a triangle in 3D space. It consists of three points, $p_1$, $p_2$, and $p_3$, that each have $x$, $y$, and $z$ components:

$$p_1 = (x_1, y_1, z_1), \qquad p_2 = (x_2, y_2, z_2), \qquad p_3 = (x_3, y_3, z_3)$$

For simplicity’s sake, we’ll assume that, moving “counter-clockwise”, the triangle’s points go in the order $p_1$, $p_2$, and then $p_3$ (and then back to $p_1$).

A triangle is distinctly “flat”. It’s like a perfectly constructed Dorito. Also, no matter which way you turn it and rotate it, there’s always a definite front and back. That is, if you extended the corners of your Dorito off towards infinity, then you could completely bisect the room, your state, the planet, the universe!

This ability to split a space in two is a salient feature of a plane, and we can construct one with the triangle we’ve just defined. In the next few sections we’ll go more into depth in exactly how to do that.

In the previous section I said that our gargantuan flat Dorito had two distinct sides, front and back. That means you could draw a straight line from the “back” that eventually comes out the “front”. Imagine that our Dorito is so large that it is its own planet orbiting the Sun. You could hop in a space ship, fly up to that Dorito, take one small step for man, and then plant your flag on top. In geometric terms, your flag pole would be the **normal vector** to the plane’s (Dorito’s) surface.

A **normal vector** to a plane is a line that is perfectly perpendicular to any other line drawn entirely within the plane. We call it a vector because it has a direction associated with it. That is, the flag you planted on the Dorito planet points “up”, but you could just as easily have landed on the other side and planted the flag pointing “down”, and it would still be perfectly valid; all we need is just to choose one.

Our normal vector will also have $x$, $y$, and $z$ components, but we’ll label them $A$, $B$, and $C$ to distinguish them from the points in the triangle:

$$\vec{n} = (A, B, C)$$

At this point we *have* a triangle, but we only *know about* what a normal vector is. How do we compute the normal vector given a triangle’s points?

Our normal vector is perpendicular to any line we draw in the plane. In computer graphics terms, we say the normal vector is “orthogonal” to our plane. A nice example of orthogonality is that the Y-axis is perpendicular to the X-axis (and the Z-axis is perpendicular to both!).

In geometry, we know how to obtain a perpendicular vector (a normal vector) so long as we have 2 non-parallel vectors in the plane. But first we need to get those two vectors in the plane. So how do we do that?

Remember from before that we defined our plane originally using three points, $p_1$, $p_2$, and $p_3$. We can use these points to compute our vectors.

Think about it this way. The center of the earth is position $(0, 0, 0)$, and you’re standing at some random point $p_1$. Your friend Andy is standing at position $p_2$. Vectors imply a *direction*, so how do you get a direction from you to your friend? Subtraction! You can compute your friend’s offset from yourself by subtracting your own position from Andy’s: $p_2 - p_1$. The resulting offset is a direction out from the center of the earth in the same direction as the direction from you to your friend.

If your other friend Bernhard is standing at position $p_3$, you’d do the exact same thing to get a vector representing the direction from you to him. So long as all three of you are not standing in a line, you now have two non-parallel vectors. Fortunately, our original points $p_1$, $p_2$, and $p_3$ define a triangle, and none of the sides of that triangle are parallel, so we’re good.

In our example, you are standing at $p_1$, Andy is at $p_2$, and Bernhard is at $p_3$. We can compute two non-parallel vectors, $\vec{u}$ and $\vec{v}$, as

$$\vec{u} = p_2 - p_1, \qquad \vec{v} = p_3 - p_1$$

In geometry there is this fancy thing called a **cross product** that enables us to compute an orthogonal (perpendicular) vector so long as we are given two input vectors. What does this mean? Well, there are a couple different interpretations, and I’ll work up to them after first discussing something called a *dot product*.

The cross product is a special kind of multiplication. The multiplication you were taught in grade school resulted in numbers many times larger than they were before. For example, $5 \times 6 = 30$, which is quite a bit larger than either $5$ or $6$. The “problem” here is that $30$ doesn’t have any sense of **direction** to it!

Our grade school multiplication has a ready analog in the world of vectors (things that have direction) called the **dot product**: we start by multiplying the components! Afterwards we add them all together to get a single value. The dot product can be thought of as “multiplication in the same direction” because we multiply the $x$ components together, the $y$ components together, as well as the $z$ components:

$$\vec{u} \cdot \vec{v} = u_xv_x + u_yv_y + u_zv_z$$

An alternate interpretation of the dot product is that it’s a measure of just “how much” in the same direction your two vectors are:

$$\vec{u} \cdot \vec{v} = \|\vec{u}\| \, \|\vec{v}\| \cos\theta$$

Where $\|\vec{u}\|$ and $\|\vec{v}\|$ are the **magnitudes** of the vectors:

$$\|\vec{u}\| = \sqrt{u_x^2 + u_y^2 + u_z^2}$$

and $\theta$ is the angle between them.

When $\theta$ is 0, the vectors are in the same direction (parallel), meaning the cosine is $1$ (its largest value) and therefore the result is the largest possible dot product between the two vectors. (Data miners love to use this and call it “cosine similarity”). But if the vectors were orthogonal (perpendicular), the dot product would be zero.
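The component-wise dot product above translates directly to code. A minimal sketch (the `Vec3` and `dot` names are mine):

```cpp
struct Vec3 { double x, y, z; };

// Dot product: multiply matching components, then sum to a single scalar.
double dot(const Vec3& u, const Vec3& v)
{
    return u.x * v.x + u.y * v.y + u.z * v.z;
}
```

A zero result means the inputs are orthogonal, which is exactly the property the plane equation below relies on.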

The **cross product** is similar to the dot product except it’s more of a measure of how *different* two vectors are. The data miners might choose the following representation of a cross product:

$$\|\vec{u} \times \vec{v}\| = \|\vec{u}\| \, \|\vec{v}\| \sin\theta$$

When the $\sin\theta$ term is 0, the vectors are in the same direction (parallel), which maximizes the dot product, but the cross product is zero! And when $\theta$ is 90 degrees ($\frac{\pi}{2}$ radians), the vectors are perpendicular and the cross product is maximized! (The dot product, sadly, becomes 0).

There exists another way to compute a cross product where the result is not a single number, but rather a vector that is orthogonal to both the input vectors:

**(1)**

$$\vec{u} \times \vec{v} = (u_yv_z - u_zv_y,\; u_zv_x - u_xv_z,\; u_xv_y - u_yv_x)$$

It’s not the most intuitive computation. Personally, I enjoyed the “xyzxyz” explanation at Better Explained.

All we need to know is that the result of this equation is a shiny new orthogonal vector.
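Equation 1 also drops straight into code. A sketch (again, `Vec3` and `cross` are my own names):

```cpp
struct Vec3 { double x, y, z; };

// Cross product (equation 1): returns a vector orthogonal to both inputs.
Vec3 cross(const Vec3& u, const Vec3& v)
{
    return { u.y * v.z - u.z * v.y,
             u.z * v.x - u.x * v.z,
             u.x * v.y - u.y * v.x };
}
```

For example, crossing the X-axis direction with the Y-axis direction yields the Z-axis direction.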

We now have two vectors in our plane, $\vec{u}$ and $\vec{v}$, which we computed using our triangle’s points $p_1$, $p_2$, and $p_3$. We also know how to take two vectors and compute an orthogonal vector. Our *normal vector* is exactly that: an orthogonal vector to our plane, so when we apply the cross product to $\vec{u}$ and $\vec{v}$, we obtain our normal vector $\vec{n}$:

$$\vec{n} = \vec{u} \times \vec{v}$$

Which gives us

$$\vec{n} = (A, B, C) = (u_yv_z - u_zv_y,\; u_zv_x - u_xv_z,\; u_xv_y - u_yv_x)$$

Now that we have a normal vector, we can define our plane intuitively. We know that our normal vector is perpendicular to all vectors in the plane, and as we saw before, the **dot product** of any two vectors is zero if the vectors are orthogonal. We can therefore say that, given our point $p_1$ on the plane, and any other point $p = (x, y, z)$ on the plane, the dot product between $\vec{n}$ and $p - p_1$ (subtracting two points results in a vector) is $0$:

**(2)**

$$\vec{n} \cdot (p - p_1) = 0$$

Which is our generic representation of a plane. This is a nice way to think about a plane, but alone it doesn’t help us find the intersection between two planes. Ultimately what we want is a system of linear equations like the title talked about.

Given our plane equation

$$\vec{n} \cdot (p - p_1) = 0$$

Let’s expand the terms.

$$(A, B, C) \cdot (x - x_1,\; y - y_1,\; z - z_1) = 0$$

After performing the dot product:

$$A(x - x_1) + B(y - y_1) + C(z - z_1) = 0$$

Distributing $A$, $B$, and $C$:

$$Ax - Ax_1 + By - By_1 + Cz - Cz_1 = 0$$

We know what the values of $\vec{n}$ and $p_1$ are (they’re constants) because they were given to us in the plane’s definition, so let’s move the constant terms to the other side of the $=$ sign in order to have only variables on one side and only constants on the other:

$$Ax + By + Cz = Ax_1 + By_1 + Cz_1$$

and since $Ax_1 + By_1 + Cz_1$ is constant, it will be easier to relabel it as $D$ instead of writing it out every time. That is,

$$D = Ax_1 + By_1 + Cz_1$$

Which results in a linear equation that looks like this:

$$Ax + By + Cz = D$$

If we have two planes, then we’ll distinguish between their definitions via subscripts on our constants:

**(3)**

$$\begin{aligned} A_1x + B_1y + C_1z &= D_1 \\ A_2x + B_2y + C_2z &= D_2 \end{aligned}$$

Finally we have a system of linear equations. Great! We have ways to solve those! However, this particular one presents a little problem. (A tiny one that we’ll make go away)

In linear algebra, we are often provided with a number of equations and an equal number of unknowns (variables) we must solve for. However, in *our* system, we appear to have more variables than equations!

Let’s take another look.

$$\begin{aligned} A_1x + B_1y + C_1z &= D_1 \\ A_2x + B_2y + C_2z &= D_2 \end{aligned}$$

We have three variables to solve for — $x$, $y$, $z$ — and yet only two equations. This is called an **under-determined system**, meaning there are *infinite* solutions (assuming the planes intersect at all). Infinite? You see, when a line intersects a line, they intersect at a single point, but when a plane intersects another plane, that intersection is a *line*, which has infinitely many points. We would need three planes to intersect before we could find a single point of intersection.

There are more math-intensive approaches to solving under-determined systems that involve computing something called a **pseudo-inverse**, but for our purposes here, we can take advantage of our knowledge of the domain! (Also, a Moore-Penrose pseudo-inverse would only provide an approximate solution whereas we can compute an exact one here).

Let’s assume that the planes we’re intersecting are not parallel; they have a “nice” intersection. If this is true, then we’ll end up with a *line*. Because planes are infinitely large, the resulting line will be infinitely long. At some point, this line will cross at least one of the x, y or z axes.

How can I be so sure of that? Start by thinking of it this way: we can represent a line as a point plus a direction of travel. We can move forwards and backwards in that direction (this is called the **vector equation of a line**), as shown below:

$$p = p_0 + t\vec{d}$$

If we don’t want our line to pass through $x = 0$, then we set $x$ to some arbitrary constant like $x = 1$. In two dimensions, this would give us a vertical line. In three, it gives us a line with direction as a function of $y$ and $z$. If we don’t want the line to cross the $y$ axis, then we fix $y$ to some arbitrary constant like $y = 1$. Now our line varies only in the $z$ direction. However, if we didn’t want the line to cross the $z$ axis, and tried to assign it to some arbitrary constant as well, say $z = 1$, then we’d end up with a single point $(1, 1, 1)$, not a line. If we allow the line’s direction to be a function of $z$, then $z = 0$ is a feasible value, and if we allow it to be a function of $y$ and $z$, then both $y = 0$ and $z = 0$ are feasible values. Thus we can always guarantee that our line passes through $0$ for at least one of our axes. In summary, in an $n$-dimensional system, if we fix $n - 1$ dimensions to non-zero values and allow the last dimension to vary, $0$ is included in the last dimension’s possible values; otherwise you would not have a line. Graphically, we can show this like so:

When a line crosses through $0$ for one of our axes, this is great! It’s great because if a component at that point has a value of zero, then its associated term in equation 3 will drop out. This leaves us with a system that is *not* under-determined; we will have two variables and two linear equations. And as I discussed in my previous post, we know how to solve for that. At this point we’ve obtained a point on the line of intersection!

Okay, we know how to solve a 2D system of linear equations to get a single point, but what use is that? Indeed, a point by itself isn’t a solution; we need a line. The point we obtained, however, is in fact the first part in determining the line of intersection between two planes.

We still have two problems to solve:

- How do we discover which component will become zero?
- Even if we obtained a single point on the line, how do we figure out the rest of the line?

Let’s solve each of these in turn.

I said before that we could use our knowledge of the domain to solve this plane-plane intersection problem. That knowledge is that, if the planes intersect “nicely”, then eventually that line will pass through zero for at least one of $x$, $y$, or $z$. So how do we discover if it’s $x$, $y$, or $z$? The short answer is: why not try them all?

First we set $x$ to $0$ (causing its term to drop out) in equation 3, leaving us with two lines. We can test to see if the resulting two lines intersect at all by testing to see if they’re parallel via the cross-product. If that doesn’t work, we move onto $y$, and then $z$.

The result is that we’ll have a point on the line of intersection where one of the components is zero, like $(0, y_0, z_0)$. All that remains is to discover the direction of the line.

I actually hinted at this earlier, and you may have caught it if you were reading closely.

Up until now, we’ve assumed that the two planes intersect and are not co-planar. In other words, the planes are not parallel.

How can we make sure of that? Well, we already know our two planes have normal vectors $\vec{n_1}$ and $\vec{n_2}$, respectively. Also, we know that the cross product (equation 1) between these normal vectors gives us a new vector that is orthogonal to both. What happens when you compute the cross product of a vector with another in the same direction? Well, since the angle between them is $0$ (or $180$) degrees, the resulting vector is the zero vector $\vec{0}$, with magnitude $0$.

For our purposes, when we encounter co-planar planes, we will disregard this case (the intersection is a plane equal to one of the input planes).

After we’ve ensured that the cross product between the two planes’ normal vectors is non-zero, we are left with a vector that’s orthogonal to both. How is this useful?

Well, we know from our earlier discussion that a normal vector is orthogonal to any vector we could draw in a plane. So it follows that a vector that is orthogonal to *two* normal vectors must lie in both those planes! And since our planes intersect at exactly *one* line, our fancy new vector we got from the cross product of the normals **must** be in the same direction as the line of intersection between the planes!

Armed with this information, we now have a point *and* a direction to represent our line of intersection.

If you’ll allow me, let the point on the line be $p_0$, and the direction it travels in be $\vec{d}$. Now we can represent any point on our line as:

**(4)**

$$p = p_0 + t\vec{d}$$

Where $p$ is a resulting point, and $t$ is a value from $-\infty$ to $\infty$. Let’s go one step further and set $t$ to $1$ so that we can obtain a second point on the line, which we’ll call $p_1$. And now we have two points on the line. We can use those two points to transform our line representation into Two-Point Form (like I use in my previous post), where a subscript of $0$ indicates a component of $p_0$ ($x_0$, $y_0$, and $z_0$), and a subscript of $1$ is a component of $p_1$:

**(5)**

$$\frac{x - x_0}{x_1 - x_0} = \frac{y - y_0}{y_1 - y_0} = \frac{z - z_0}{z_1 - z_0}$$

And thus we’ve computed the line of intersection between two planes.
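The whole procedure can be consolidated into one sketch: take the cross product of the normals for the direction, then find a point by zeroing one coordinate at a time and solving the remaining 2×2 system. All names and the epsilon tolerance here are my own choices, not from the post:

```cpp
#include <cmath>
#include <optional>

struct Vec3 { double x, y, z; };

// A plane stored as Ax + By + Cz = D.
struct Plane { double A, B, C, D; };

// A line as a point plus a direction (equation 4).
struct Line { Vec3 point, dir; };

static Vec3 cross3(const Vec3& u, const Vec3& v)
{
    return { u.y*v.z - u.z*v.y, u.z*v.x - u.x*v.z, u.x*v.y - u.y*v.x };
}

// Solve a1*s + b1*t = c1 and a2*s + b2*t = c2; false if singular.
static bool solve2x2(double a1, double b1, double c1,
                     double a2, double b2, double c2,
                     double& s, double& t)
{
    const double det = a1*b2 - a2*b1;
    if (std::fabs(det) < 1e-12) return false;
    s = (c1*b2 - c2*b1) / det;
    t = (a1*c2 - a2*c1) / det;
    return true;
}

std::optional<Line> intersect(const Plane& p, const Plane& q)
{
    const Vec3 dir = cross3({p.A, p.B, p.C}, {q.A, q.B, q.C});
    if (std::fabs(dir.x) < 1e-12 && std::fabs(dir.y) < 1e-12 &&
        std::fabs(dir.z) < 1e-12)
        return std::nullopt; // parallel (or coplanar) planes

    double u = 0, v = 0;
    // Try x = 0, then y = 0, then z = 0, until one 2x2 system is solvable.
    if (solve2x2(p.B, p.C, p.D, q.B, q.C, q.D, u, v))
        return Line{ {0, u, v}, dir };
    if (solve2x2(p.A, p.C, p.D, q.A, q.C, q.D, u, v))
        return Line{ {u, 0, v}, dir };
    solve2x2(p.A, p.B, p.D, q.A, q.B, q.D, u, v);
    return Line{ {u, v, 0}, dir };
}
```

Note that at least one of the three 2×2 systems must be solvable whenever the normals’ cross product is non-zero, because each system’s determinant is (up to sign) one component of that cross product.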

Finding the line of intersection between two planes is generally done the same way as you’d intersect any two geometric objects — set their equations equal to each other and solve. We discovered here that the result was an under-determined system and we overcame that. This is exciting! From this understanding, we are ready to take the next steps and handle “real world” situations that involve line *segments* and triangles instead of mathematical lines and planes.

(Note: this article was originally written in LaTeX and transcribed to WordPress, so forgive the equation alignment. Get the original.)

In a previous post, I outlined an analytical solution for intersecting lines and ellipses. In this post I’m doing much the same thing but rather with lines on lines. I’ll point out why the normal slope-intercept form for a line is a poor representation, and what we can do about that.

In computer graphics, or just geometry in general, you often find yourself in a scenario where you want to know if and how two or more objects intersect. For example, in the latest shooting game you’re playing perhaps a bullet is represented as a sphere and a target is represented as a disc. You want to know if the bullet you’ve fired has struck the target, and if so, was it a bulls-eye?

In this post, I’ll step you through one way we can accomplish discovering intersection points between two lines, being sure to carefully walk through each step of the calculation so that you don’t get lost.

We’re going to fly in the face of many approaches to line-line intersection here and try to ultimately wind up with a system of linear equations to solve. This is different from the “usual” way of finding intersections by crafting an equation to represent each geometric body, and then somehow setting those equations equal to each other. For example, if we used the familiar slope-intercept form to represent a line, we’d end up representing the lines as

**(1)**

$$y = m_1x + b_1$$

**(2)**

$$y = m_2x + b_2$$

Then afterwards we could solve for $x$ by assigning the equations to each other:

$$m_1x + b_1 = m_2x + b_2$$

And with some algebraic fiddling, we could get $x = \frac{b_2 - b_1}{m_1 - m_2}$, and then take that and insert it into either line equation above to get $y$.

But here are some questions to think about.

Slope-intercept form assumes two things:

- Every line has a slope
- Every line has a y-intercept

This is all well and good for most lines. Even horizontal lines have a slope of 0, and a y-intercept somewhere. Our problem is with perfectly vertical lines:

What is the slope of a vertical line, since the “run” part of rise-over-run is zero? Vertical lines don’t touch the y-axis either, unless they’re collinear with it, and even then there wouldn’t be just a single y-intercept.

Vertical lines are better represented as a function of $x$, like $x = 5$. The $y$ part isn’t even here; it just doesn’t exist! That’s because there are infinite values of $y$ for our given $x$.

Vertical lines are the bane of slope-intercept form’s existence. If both lines were vertical, we could test for that and then not bother with testing for intersection, but what if just one of them were vertical? How would we check for the intersection point then?

Well, let’s examine an alternative representation of a line. Those of you who have ever taken a linear algebra course should be familiar with it:

**(3)**

$$ax + by = c$$

Where $a$, $b$, and $c$ are known constant values, and we’re solving for $x$ and $y$.

Okay, what the heck is that? In the next section I’ll explain exactly how this solves our vertical line problem, but first I need to demonstrate to you how we can even *get* a line into that format.

“What? We don’t always use slope-intercept form??? All my teachers have lied to me!” In fact, there are many ways we can represent a line, and like any tool there’s a time and a place for each of them.

The representation we’re interested in here is called **Two-point form**. And we can derive it if we already have two points on the line (which is common if you have a bunch of line segments in the plane, like sketch strokes).

Given two points on the line,

$$p_1 = (x_1, y_1), \qquad p_2 = (x_2, y_2)$$

we can represent a line in the following form, parameterized on $x$ and $y$:

**(4)**

$$y - y_1 = \frac{y_2 - y_1}{x_2 - x_1}(x - x_1)$$

Still doesn’t seem to help us, does it? We’ve gotten rid of the intercept, but we still have a slope, which becomes a big problem when $x_2 - x_1 = 0$. So let’s fix that by multiplying both sides of the equation by $x_2 - x_1$:

**(5)**

$$(y - y_1)(x_2 - x_1) = (y_2 - y_1)(x - x_1)$$

(This is called **Symmetric form**.) Let’s manipulate Equation 5 so that $x$ and $y$ (no subscripts) appear only once. Start by multiplying everything through:

$$yx_2 - yx_1 - y_1x_2 + y_1x_1 = xy_2 - xy_1 - x_1y_2 + x_1y_1$$

We notice that the $y_1x_1$ terms can cancel out:

$$yx_2 - yx_1 - y_1x_2 = xy_2 - xy_1 - x_1y_2$$

Now we rearrange the left side in terms of $y$:

$$y(x_2 - x_1) - y_1x_2 = xy_2 - xy_1 - x_1y_2$$

And the right side in terms of $x$:

$$y(x_2 - x_1) - y_1x_2 = x(y_2 - y_1) - x_1y_2$$

Subtract the $y(x_2 - x_1)$ term from both sides:

$$-y_1x_2 = x(y_2 - y_1) - x_1y_2 - y(x_2 - x_1)$$

and add the $x_1y_2$ term:

$$x_1y_2 - y_1x_2 = x(y_2 - y_1) - y(x_2 - x_1)$$

Let’s move the equals sign to the other side (that is, swap the two sides):

$$x(y_2 - y_1) - y(x_2 - x_1) = x_1y_2 - y_1x_2$$

And move the $x$ and $y$ to the other side of the parentheses:

$$(y_2 - y_1)x - (x_2 - x_1)y = x_1y_2 - y_1x_2$$

Notice now how similar this equation is to Equation 3.

We can define:

$$a = y_2 - y_1, \qquad b = -(x_2 - x_1), \qquad c = x_1y_2 - y_1x_2$$

Distributing the negative through for $b$:

**(6)**

$$a = y_2 - y_1, \qquad b = x_1 - x_2, \qquad c = x_1y_2 - y_1x_2$$

To be left with the same equation (restated here):

$$ax + by = c$$

Now that we can represent a single line as a linear equation in two variables, $x$ and $y$, we can represent the intersection of two lines as a system of linear equations in two variables:

$$\begin{aligned} a_1x + b_1y &= c_1 \\ a_2x + b_2y &= c_2 \end{aligned}$$

Where we compute each line’s $a$, $b$, and $c$ by using the two-point form mathematics from the previous section. With two equations and two unknowns, we can compute $x$ and $y$.
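Computing the coefficients from two points (equation 6) is mechanical. A sketch with my own names:

```cpp
struct Point { double x, y; };

// Coefficients a, b, c of ax + by = c for the line through p1 and p2
// (equation 6: a = y2 - y1, b = x1 - x2, c = x1*y2 - y1*x2).
struct LineCoeffs { double a, b, c; };

LineCoeffs from_two_points(const Point& p1, const Point& p2)
{
    return { p2.y - p1.y,
             p1.x - p2.x,
             p1.x * p2.y - p1.y * p2.x };
}
```

Notice that a vertical line simply produces $b = 0$; nothing blows up, which is the whole point of this representation.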

In the next part I will slowly walk through how we will solve this system using basic techniques from linear algebra.

(Warning! Lots of math incoming!).

The idea behind solving a system of equations like this is to get it into something called *row-echelon form*, which is a fancy way of saying “I want the coefficient on $x$ in the top equation to be $1$, and the coefficient of $y$ in the bottom equation to be $1$, with the other coefficients of $x$ and $y$ to be zero”:

$$\begin{aligned} 1x + 0y &= c_1' \\ 0x + 1y &= c_2' \end{aligned}$$

Let’s begin with getting the coefficient on $x$ in the top row to be $1$. First divide through the top equation by $a_1$:

$$\begin{aligned} \frac{a_1}{a_1}x + \frac{b_1}{a_1}y &= \frac{c_1}{a_1} \\ a_2x + b_2y &= c_2 \end{aligned}$$

Simplifying:

$$\begin{aligned} x + \frac{b_1}{a_1}y &= \frac{c_1}{a_1} \\ a_2x + b_2y &= c_2 \end{aligned}$$

Notice that at this point, we’ve succeeded in getting a coefficient of $1$ on $x$ in the top equation.

The next thing we do is notice that, since we have a bare $x$ in the top equation, we could multiply it by $a_2$ and then subtract it from the $a_2x$ in the bottom equation to get the bottom one’s coefficient on $x$ to be zero. However, we cannot do this solely on $x$. Instead, we’re restricted to what are called *Elementary Row Operations* that limit what we can do while preserving the correctness of the equations. So what we have to do is multiply the *entire* top row by $a_2$, and then subtract the top row from the bottom row, which looks like this:

$$\begin{aligned} x + \frac{b_1}{a_1}y &= \frac{c_1}{a_1} \\ (a_2 - a_2)x + \left(b_2 - \frac{a_2b_1}{a_1}\right)y &= c_2 - \frac{a_2c_1}{a_1} \end{aligned}$$

Which gives us a coefficient of $0$ on $x$ in the second row!

Let’s simplify the $y$ and constant terms in the second row to have a common denominator:

$$\begin{aligned} x + \frac{b_1}{a_1}y &= \frac{c_1}{a_1} \\ \frac{a_1b_2 - a_2b_1}{a_1}y &= \frac{a_1c_2 - a_2c_1}{a_1} \end{aligned}$$

The next step is to get a coefficient of $1$ on the $y$ term in the second row by dividing through the row by $\frac{a_1b_2 - a_2b_1}{a_1}$, which results in us replacing the coefficient on $y$ with a $1$, and then multiplying the third term by the flipped fraction (remember that when we divide two fractions, we flip the second one and then multiply):

$$\begin{aligned} x + \frac{b_1}{a_1}y &= \frac{c_1}{a_1} \\ y &= \frac{a_1c_2 - a_2c_1}{a_1} \cdot \frac{a_1}{a_1b_2 - a_2b_1} \end{aligned}$$

We can cancel the $a_1$ terms in the multiplication, and the resulting equation becomes:

$$y = \frac{a_1c_2 - a_2c_1}{a_1b_2 - a_2b_1}$$

At this point, we’ve solved for $y$.

We *could* substitute for $y$ in the top row to solve for $x$, but a more linear-algebra-ish way would be to perform another elementary row operation — multiply the bottom row by $\frac{b_1}{a_1}$, and then subtract it from the top row. Here’s what that looks like:

$$x + \left(\frac{b_1}{a_1} - \frac{b_1}{a_1}\right)y = \frac{c_1}{a_1} - \frac{b_1}{a_1} \cdot \frac{a_1c_2 - a_2c_1}{a_1b_2 - a_2b_1}$$

Now we’ve achieved a coefficient of $0$ on the $y$ term in the top row!

We’ve essentially solved for $x$ at this point, but the term on the other side of the $=$ is a little ugly, and we can simplify it. Let’s start by making the two parts of the term have the same denominator, which means we need to multiply $\frac{c_1}{a_1}$ by $\frac{a_1b_2 - a_2b_1}{a_1b_2 - a_2b_1}$ (which is really just multiplying by 1!):

$$x = \frac{c_1(a_1b_2 - a_2b_1)}{a_1(a_1b_2 - a_2b_1)} - \frac{b_1(a_1c_2 - a_2c_1)}{a_1(a_1b_2 - a_2b_1)}$$

Distributing the terms in the numerators, and combining them because they have the same denominator:

$$x = \frac{a_1b_2c_1 - a_2b_1c_1 - (a_1b_1c_2 - a_2b_1c_1)}{a_1(a_1b_2 - a_2b_1)}$$

When we distribute the negative in the numerator, the second $a_2b_1c_1$ term becomes positive:

$$x = \frac{a_1b_2c_1 - a_2b_1c_1 - a_1b_1c_2 + a_2b_1c_1}{a_1(a_1b_2 - a_2b_1)}$$

Which leaves us with both $-a_2b_1c_1$ and $+a_2b_1c_1$ in the numerator, which cancel:

$$x = \frac{a_1b_2c_1 - a_1b_1c_2}{a_1(a_1b_2 - a_2b_1)}$$

We can factor out $a_1$ in the numerator:

$$x = \frac{a_1(b_2c_1 - b_1c_2)}{a_1(a_1b_2 - a_2b_1)}$$

Finally, the $a_1$ terms in the numerator and denominator cancel, leaving us with:

$$x = \frac{b_2c_1 - b_1c_2}{a_1b_2 - a_2b_1}$$

Now we’re ready to say we’ve solved the linear system for $x$ and $y$, leaving us with

**(7)**

$$x = \frac{b_2c_1 - b_1c_2}{a_1b_2 - a_2b_1}, \qquad y = \frac{a_1c_2 - a_2c_1}{a_1b_2 - a_2b_1}$$

Substituting the values from Equation 6 into Equation 7, with line 1 through points $(x_1, y_1)$ and $(x_2, y_2)$ and line 2 through points $(x_3, y_3)$ and $(x_4, y_4)$, yields:

**(8)**

$$x = \frac{(x_3 - x_4)(x_1y_2 - y_1x_2) - (x_1 - x_2)(x_3y_4 - y_3x_4)}{(y_2 - y_1)(x_3 - x_4) - (y_4 - y_3)(x_1 - x_2)}$$

$$y = \frac{(y_2 - y_1)(x_3y_4 - y_3x_4) - (y_4 - y_3)(x_1y_2 - y_1x_2)}{(y_2 - y_1)(x_3 - x_4) - (y_4 - y_3)(x_1 - x_2)}$$

This formulation is identical to the one you’ll find on Wikipedia (they’ve arranged the terms slightly differently, but it’s still mathematically equivalent).
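Equation 7 drops straight into code once we have the coefficients. A sketch (names and the tolerance are mine); it reports failure when the denominator is zero, which is the parallel/collinear case discussed next:

```cpp
#include <cmath>

// Solve a1*x + b1*y = c1 and a2*x + b2*y = c2 (equation 7).
// Returns false when the lines are parallel or collinear.
bool solve_lines(double a1, double b1, double c1,
                 double a2, double b2, double c2,
                 double& x, double& y)
{
    const double det = a1*b2 - a2*b1;
    if (std::fabs(det) < 1e-12) return false;
    x = (b2*c1 - b1*c2) / det;
    y = (a1*c2 - a2*c1) / det;
    return true;
}
```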

Ironically, despite all that extra math, we’ll still have problems with vertical lines, but only when those lines are *parallel*, which means there are either infinitely many solutions (lines are collinear) or zero solutions (lines are parallel but not collinear, like in the figure below).

Checking for parallel lines is fortunately pretty simple.

Let’s say we have two line segments floating around the Cartesian plane:

To check for whether these are parallel, first imagine translating the lines so that they both have an endpoint at the origin $(0, 0)$:

There will be some angle between the two lines:

If this angle is $0$, or $\pi$ radians ($180°$), then $\sin\theta$ will be $0$.

Given that we are representing our lines using two points, we can use those points to create *vectors*, which are like our lines drawn from the origin, having a direction and a magnitude.

To create a vector given points $p_1$ and $p_2$, we just subtract one from the other (for our purposes here it doesn’t matter the order!):

**(9)**

$$\vec{v} = p_2 - p_1$$

We draw them with little arrows to indicate a direction:

After we’ve created vectors for both our lines, we can take advantage of a mathematical relationship between them called the cross product to find whether they’re parallel or not. For two vectors $\vec{u}$ and $\vec{v}$, their cross product is defined as

**(10)**

$$\|\vec{u} \times \vec{v}\| = \|\vec{u}\| \, \|\vec{v}\| \sin\theta$$

Where $\|\vec{u}\|$ and $\|\vec{v}\|$ are the **magnitudes** of the vectors, which can be thought of as a vector’s “length”. For some vector $\vec{v}$, it’s defined as

**(11)**

$$\|\vec{v}\| = \sqrt{v_x^2 + v_y^2}$$

However, for the sake of testing for parallel lines, what we’re really interested in is the $\sin\theta$ term in the cross product, where $\theta$ is the smallest angle between the vectors (the vectors are simultaneously separated by $\theta$ and $2\pi - \theta$ radians).

Assuming our vectors don’t have magnitudes of zero, when the cross product between them is zero (or really really close to zero) we consider the lines to be parallel because the angle between them must be zero. This tells us that the lines either never collide, or they’re infinitely colliding because they’re the same line (collinear). I leave checking for collinearity as an exercise to you.
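In 2D the whole test reduces to one number: the z-component of the cross product of the two direction vectors. A sketch (the names and the tolerance are my own choices):

```cpp
#include <cmath>

struct Point { double x, y; };

// True if segment p1->p2 is parallel to segment p3->p4.
bool parallel(Point p1, Point p2, Point p3, Point p4)
{
    const double ux = p2.x - p1.x, uy = p2.y - p1.y; // vector for line 1
    const double vx = p4.x - p3.x, vy = p4.y - p3.y; // vector for line 2
    return std::fabs(ux*vy - uy*vx) < 1e-12;         // 2D cross product
}
```

Note that `ux*vy - uy*vx` is exactly the denominator of equation 8, which is why parallel lines make that formula blow up.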

One might say we solved our system of linear equations the long way. In fact, since we had exactly two equations and exactly two unknowns, we could have leveraged a mathematical technique known as Cramer’s rule. The method is actually not so hard to apply, but perhaps its correctness is a little more difficult to understand.

Checking for line-line intersections is a harder problem than it appears to be on the surface, but once we give it a little thought it all boils down to algebra.

One application for line-line intersection testing is in computer graphics. Testing for intersections is one of the foundational subroutines for computing Binary Space Partitioning (BSP) Trees, which are a way to efficiently represent a graphical scene, or even an individual object within a scene. BSP Trees are perhaps most famously known for their use by John Carmack in the Doom games.

In closing, we can consider a line in 2D to actually be a hyperplane that partitions the plane into areas “on one side or the other” of the line. Therefore, there are analogs in 3D space where we use a 3D hyperplane (what we typically think of as a plane) to partition the space, again into areas “on one side or the other” of the plane.

The triangular numbers are an interesting mathematical phenomenon that appears constantly in computer science. When you, the programmer, talk about the Big-Oh complexity of a nested for loop that gets executed $\sum_{i=1}^{n} i$ times, you might just slap $O(n^2)$ on it and call it a day.

But do you ever think about what that summation *actually* is? In this article I’ll present an alternative formulation of the series that I think is satisfying from a programmer’s point of view, and also present some interesting results from looking at the series in different ways.

(Note: this article was originally written in LaTeX and transcribed to WordPress. Get it as a .pdf here.)

Consider the following algorithm:
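Judging from the walkthrough below, the algorithm is a nested loop where the inner loop runs $i$ times on the $i$-th outer pass. Here it is as a sketch, wrapped in a counting helper (my own framing) so the totals are easy to check:

```cpp
// The inner loop runs i times on the i-th outer iteration,
// so Foo() runs 1 + 2 + ... + n times in total.
int count_calls(int n)
{
    int calls = 0;
    for (int i = 1; i <= n; ++i)
        for (int j = 1; j <= i; ++j)
            ++calls; // Foo();
    return calls;
}
```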

How many times does Foo() execute? Let’s just walk through the case where $n = 2$.

- $i = 1$, $j = 1$: Foo();
- $i = 2$, $j = 1$: Foo();
- $i = 2$, $j = 2$: Foo();
- $i = 3 > n$: stop

So, when $n = 2$, Foo() will be executed 3 times. What sort of pattern did it follow, though? Let’s break it down:

- first iteration: 1 time ($i = 1$ case)
- second iteration: 2 times ($i = 2$ case)

It would be reasonable to assume that this pattern continues. That is, for $n$ iterations we will see the following pattern:

- iteration 1: 1 time
- iteration 2: 2 times
- iteration 3: 3 times
- iteration $n - 1$: $n - 1$ times
- iteration $n$: $n$ times

So the total number of times Foo() gets called is

$$1 + 2 + 3 + \dots + (n - 1) + n$$

Which we’ll wrap up nicely as the following summation:

$$\sum_{i=1}^{n} i$$

At this point you may be saying “Yes, I understand what the summation *looks like*, but what does it sum up to!?”, and I’m getting to that. If you will, though, please allow me one more aside. Let’s look at a visualization of this series.

Now you see why they’re called the triangular numbers. Most visualizations will show you them in this format.

Perhaps the most common visual approach to computing triangular numbers is with some geometric help. You see, the triangular numbers, being a Figurate Number (sometimes called a Polygonal Number), are easy to visualize. To compute the triangular numbers using some geometric help, let us first recognize that a (right) triangle is half of a square.

Next, we recognize that the number of dots in the square is equal to its area, or simply the number of dots along one side multiplied by itself. More formally, the area of an $n$ by $n$ square is $n^2$. For our 5×5 square, this gives 25 dots.

$n^2$ gives us the area of a square, and $\frac{n^2}{2}$ gives us half of it. But check out the figure below! Taking just half of the area only gives us half of the dots along the diagonal of the square! We want all of the dots along the diagonal of the square.

How many dots are in the diagonal of a square? Well, there’s one dot for each row in the triangle, therefore there are $n$ dots. We only captured half of their areas when we got $\frac{n^2}{2}$, so we need to add in that other $\frac{n}{2}$. This gives us a final computed number of dots of $\frac{n^2}{2} + \frac{n}{2}$, or simplified, $\frac{n(n + 1)}{2}$, which is the commonly used formula for computing triangular numbers, and is the exact summation we will get in the next section when approaching the summation from another angle.

Pretty awesome, right?
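The formula is also trivial to sanity-check against the raw summation in code. A throwaway sketch (function names are mine):

```cpp
// Triangular number two ways: the raw loop and the closed-form formula.
long long tri_loop(long long n)
{
    long long sum = 0;
    for (long long i = 1; i <= n; ++i)
        sum += i;
    return sum;
}

long long tri_formula(long long n)
{
    return n * (n + 1) / 2;
}
```

The loop mirrors the summation directly, while the formula is the closed form the geometric argument just produced.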

In this section we’ll actually derive the result of the summation in a novel way. A way that might appeal to a programmer who has to constantly look at loops all day.

I’ll pose a question to you:

“How many times can you make $n$ from the series $1 + 2 + 3 + \dots + n$?”

For the purpose of our “programmer’s proof”, let’s consider a slightly different visualization of the series than we saw in the previous section. First, we’ll make our equilateral triangle a right triangle.

Next, we’ll take the top of the triangle and align it to the gaps below

A little weird, huh? However, I think it helps us to answer our question. (We’ll play Tetris with it in the next subsection!) Let’s answer the question of “How many times can we get $n$ dots from our triangle?” Trivially, we can do it at least once, because the very last row has $n$ dots in it. The second last row has $n-1$, and conveniently we can see that the first row only has $1$, so we can combine them to obtain $n$ for a second time.

Finally, we notice that the second and third rows added together also equal $n$. So we’ve made $n$ three times for $n = 5$. The implication is that the triangular number summation for $n = 5$ is $3 \times 5 = 15$. Nice. Let’s try to derive a formula.

For $n$ elements, how many times can we make $n$? Start pulling terms simultaneously off the left and right ends of

$$0 + 1 + 2 + \dots + n$$

Notice I made a bit of a modification to the series by adding $0$ to it. Now we have $n+1$ terms in our series instead of $n$. Adding the $0$ term does not change the value of the summation; all we’re saying is that the triangular numbers for $n$ will result in a series containing $n+1$ terms.

Pulling from both sides at once, we get the following summations:

$$(0 + n) + (1 + (n-1)) + (2 + (n-2)) + \dots = n + n + n + \dots$$

So, for a sequence of $n$ items, we can make a sum of $n$ exactly $\frac{n+1}{2}$ times, leading to the solution of

$$\sum_{i=1}^{n} i = n \cdot \frac{n+1}{2} = \frac{n(n+1)}{2}$$
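The pairing argument translates directly into a closed form we can sanity-check against the brute-force loop (a quick sketch, not from the original post):

```python
def triangular(n):
    # n+1 terms (counting the added 0) pair up into (n+1)/2 sums of n each.
    return n * (n + 1) // 2

# Matches the brute-force summation for every small n:
assert all(triangular(n) == sum(range(1, n + 1)) for n in range(100))
```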

As a programmer, I find this a very satisfying approach. It feels like we’re writing a method to perform the computation in $O(1)$ time instead of $O(n)$ time.

Let’s visualize what we just did by playing Tetris.

- Take our alternative figure of a triangle from above and horizontally flip the portion we set aside
- Then we’ll flip it vertically. Now we can see that it will fit very nicely in the gaps with the rest of the dots.
- Translate the portion down, giving us a nice rectangle that is $n$ long by $\frac{n+1}{2}$ tall, whose area is $n \cdot \frac{n+1}{2} = \frac{n(n+1)}{2}$ (look familiar!?!?).

*(Author’s note: Feel free to skip this section, it gets a little mathy)*

Approaching the summation from the other side:

$$n + (n-1) + (n-2) + \dots + 1$$

is a little more challenging, but it yields an interesting series that is equivalent to the $\frac{n(n+1)}{2}$ we derived in previous sections.

Let’s first try to factor the $n$ out of each term:

$$(n - 0) + (n - 1) + (n - 2) + \dots + (n - (n-1))$$

And the last term, $n - (n-1)$, evaluates to $1$, so we have:

$$(n - 0) + (n - 1) + (n - 2) + \dots + 1$$

Which may not appear very interesting at first. Let’s begin combining terms in the interior and see what we can come up with:

$$(n + n - 1) + (n - 2) + (n - 3) + \dots + 1$$

Aha! Now we’re getting somewhere. Let’s simplify the first term in the parentheses:

$$(2n - 1) + (n - 2) + (n - 3) + \dots + 1$$

And add the next term into the first term (we’re starting to see a pattern emerging):

$$(3n - (1 + 2)) + (n - 3) + \dots + 1$$

As we combine our terms, the coefficient on $n$ within the parentheses grows by $1$ for each new term, and given that there are $n$ terms (we factored out $n$, we didn’t remove it altogether), we can deduce that the final coefficient will be $n$.

As for the right hand side of the difference, it appears to be growing as $1 + 2 + 3 + \dots$, which is again our triangular numbers, just up to $n-1$! The formula begins to look like this:

$$n \cdot n - (1 + 2 + \dots + (n-1)) = n \cdot n - \sum_{i=1}^{n-1} i$$

Which we can simplify by distributing the $n$ term, giving a new solution of

$$\sum_{i=1}^{n} i = n^2 - \sum_{i=1}^{n-1} i$$

Which ostensibly will cause a repetition of what we just did, giving:

$$\sum_{i=1}^{n} i = n^2 - \left( (n-1)^2 - \sum_{i=1}^{n-2} i \right)$$

And so on. Eventually we stop at the last term, which is $1^2$, so, for $n = 5$ our series looks like this:

$$5^2 - (4^2 - (3^2 - (2^2 - 1^2)))$$

And when we distribute the subtraction, we get an interesting alternating series that looks like this:

$$n^2 - (n-1)^2 + (n-2)^2 - (n-3)^2 + \dots$$

Just as a sanity check, we can fill in the actual values for $n = 5$:

$$25 - 16 + 9 - 4 + 1 = 15$$

Which clearly sums to $15$, the same as if we applied our magic formula: $\frac{5(5+1)}{2} = 15$.

Let’s formalize our alternating series as

$$\sum_{i=1}^{n} (-1)^{n-i} \, i^2$$

And now formally state the equivalence between our two series:

$$\sum_{i=1}^{n} i = \sum_{i=1}^{n} (-1)^{n-i} \, i^2$$

*(Proof by the commutative property of addition)*

Which is really a fancy way of saying

$$\frac{n(n+1)}{2} = n^2 - (n-1)^2 + (n-2)^2 - (n-3)^2 + \dots$$
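The equivalence is easy to check numerically (a quick sketch of my own, not from the original post):

```python
def tri(n):
    """The usual closed form for the nth triangular number."""
    return n * (n + 1) // 2

def alternating(n):
    """n^2 - (n-1)^2 + (n-2)^2 - ... with the sign alternating down to 1^2."""
    return sum((-1) ** (n - i) * i * i for i in range(1, n + 1))

# The two series agree for every n we try:
assert all(tri(n) == alternating(n) for n in range(1, 50))
```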

I think this revelation is the most interesting part of the entire article! (Okay, playing Tetris to prove the summation of the triangular numbers was also pretty awesome.)

The triangular numbers are a series that appears all the time in your average programmer’s life, without them really realizing it. It’s easy to write off the summation’s value as “some value less-than-or-equal-to $n^2$”, but understanding the summation a little more deeply can also bear fruit, like the solution to a common interviewing problem.

Thank you for reading and sharing a love of all things technical.

An ellipse is defined by a long axis and a short axis, called the semi-major and semi-minor axes, respectively. Usually people use the variable $a$ to represent the length of the semi-major axis, and $b$ to represent the length of the semi-minor axis. In this article I’ll use $a$ to represent only the **horizontal** axis and $b$ to represent only the **vertical** axis. That said, the formal equation for an ellipse is this:

$$\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1 \tag{1}$$

And the equation for a line is this:

$$y = mx + b$$

To avoid confusion about what $b$ means, I’ll use the term $c$ to represent the y-intercept instead:

$$y = mx + c \tag{2}$$

To find your potentially two intersecting points, you need to solve for $x$ and then use the values you found for $x$ (there will be two) to find corresponding values for $y$. That is, you need to simultaneously solve equations 1 and 2. But first, let’s discuss our line.

You’re given two points, $(x_1, y_1)$ and $(x_2, y_2)$, and you need to find values for slope and y-intercept like in Eqn. 2. Well slope, $m$, is simply the change in $y$ over the change in $x$:

$$m = \frac{y_2 - y_1}{x_2 - x_1} \tag{3}$$

The actual order of the two points in Eqn. 3 doesn’t matter; you can have $\frac{y_1 - y_2}{x_1 - x_2}$ or vice versa and you’ll get the same slope. Now to find the y-intercept, which we’re referring to as $c$, we just take one of our points (arbitrarily choose $(x_1, y_1)$) and plug it into our equation to solve for $c$:

$$y_1 = m x_1 + c$$

Subtracting $m x_1$ from both sides leaves us with our y-intercept $c$:

$$c = y_1 - m x_1 \tag{4}$$

Now that we know our values for $a$, $b$, $m$, and $c$, we are ready to solve for the intersection points between the line and the ellipse. First, substitute the line equation (Eqn. 2) into the ellipse equation (Eqn. 1) so that we can solve for $x$:

$$\frac{x^2}{a^2} + \frac{(mx + c)^2}{b^2} = 1$$

Expanding the square:

$$\frac{x^2}{a^2} + \frac{m^2x^2 + 2mcx + c^2}{b^2} = 1$$

We want to have a common denominator for both fractions on the left-hand side, so we’ll multiply the first term by $\frac{b^2}{b^2}$ and the second term by $\frac{a^2}{a^2}$:

$$\frac{b^2x^2 + a^2m^2x^2 + 2a^2mcx + a^2c^2}{a^2b^2} = 1$$

Now we can multiply both sides of the equation by $a^2b^2$ so we don’t have a fraction on the left-hand side:

$$b^2x^2 + a^2m^2x^2 + 2a^2mcx + a^2c^2 = a^2b^2$$

Notice how the first two terms on the left-hand side have a common term, $x^2$; let’s factor that out:

$$(b^2 + a^2m^2)x^2 + 2a^2mcx + a^2c^2 = a^2b^2$$

Now let’s notice that the terms $(b^2 + a^2m^2)$ and $2a^2mc$ both consist of only our known constants. To make the rest of our solution simpler, let’s label these constants $A$ and $B$. That is:

$$A = b^2 + a^2m^2 \tag{5}$$

$$B = 2a^2mc \tag{6}$$

With our constant-naming out of the way, let’s re-examine our equation:

$$Ax^2 + Bx + a^2c^2 = a^2b^2$$

That’s much cleaner, isn’t it? Okay, next let’s move the $a^2c^2$ term to the other side:

$$Ax^2 + Bx = a^2b^2 - a^2c^2$$

The left-hand side is very clean now: just a quadratic equation. I’m going to use a trick called completing the square to help us solve for $x$. If we first divide everything by $A$ we get:

$$x^2 + \frac{B}{A}x = \frac{a^2b^2 - a^2c^2}{A}$$

Which is of the form $x^2 + bx = c$. ($b$ and $c$ in this case refer to constants of a quadratic equation, not the same variables we’re using.)
Because we have it in this form, we know that if we add $\left(\frac{b}{2}\right)^2$ to both sides, then we can easily factor the left side:

$$x^2 + \frac{B}{A}x + \left(\frac{B}{2A}\right)^2 = \frac{a^2b^2 - a^2c^2}{A} + \left(\frac{B}{2A}\right)^2$$

Becomes:

$$\left(x + \frac{B}{2A}\right)^2 = \frac{a^2b^2 - a^2c^2}{A} + \frac{B^2}{4A^2}$$

Now, since we’re interested in finding the value of $x$, we need to take a square root of both sides:

$$\sqrt{\left(x + \frac{B}{2A}\right)^2} = \sqrt{\frac{a^2b^2 - a^2c^2}{A} + \frac{B^2}{4A^2}}$$

Evaluating the left hand side:

$$\left|x + \frac{B}{2A}\right| = \sqrt{\frac{a^2b^2 - a^2c^2}{A} + \frac{B^2}{4A^2}}$$

Now we want to find the value for $x$, not $\left|x + \frac{B}{2A}\right|$, so let’s drop the absolute value and track both the positive and negative roots:

$$x + \frac{B}{2A} = \pm\sqrt{\frac{a^2b^2 - a^2c^2}{A} + \frac{B^2}{4A^2}}$$

Let’s get $x$ by itself on the left-hand side by subtracting $\frac{B}{2A}$ from both sides:

$$x = -\frac{B}{2A} \pm \sqrt{\frac{a^2b^2 - a^2c^2}{A} + \frac{B^2}{4A^2}}$$

Because things are getting kind of messy with that big square root, I’m going to notice that it’s simply a square root of constants that we already know, and label the whole thing $D$. That is,

$$D = \sqrt{\frac{a^2b^2 - a^2c^2}{A} + \frac{B^2}{4A^2}} \tag{7}$$

This makes our equation much cleaner:

$$x = -\frac{B}{2A} \pm D$$

Later I’ll resubstitute for those constants, but bear with me as I use them to solve for $x$ and $y$. We now know that $x$ has two solutions:

$$x = -\frac{B}{2A} + D \quad \text{and} \quad x = -\frac{B}{2A} - D$$

If we take these values for $x$ along with our equation for a line (Eqn. 2), then we can solve for $y$:

$$y = mx + c$$

which yields solutions for $y$:

$$y = m\left(-\frac{B}{2A} + D\right) + c \quad \text{and} \quad y = m\left(-\frac{B}{2A} - D\right) + c$$

This gives us our final intersection points of

$$\left(-\frac{B}{2A} + D,\; m\left(-\frac{B}{2A} + D\right) + c\right) \quad \text{and} \quad \left(-\frac{B}{2A} - D,\; m\left(-\frac{B}{2A} - D\right) + c\right)$$

If we resubstitute back in for $D$ we can simplify it ever so slightly. From Eqn. 7:

$$D = \sqrt{\frac{a^2b^2 - a^2c^2}{A} + \frac{B^2}{4A^2}}$$

Let’s substitute back in for $A$ and $B$ on the first term (refer to Eqns. 5 and 6):

$$D = \sqrt{\frac{a^2b^2 - a^2c^2}{b^2 + a^2m^2} + \frac{B^2}{4A^2}}$$

Notice how we have a common $a^2$ term in the numerator; let’s factor it out:

$$D = \sqrt{\frac{a^2(b^2 - c^2)}{b^2 + a^2m^2} + \frac{B^2}{4A^2}}$$

Now let’s substitute on the second term:

$$D = \sqrt{\frac{a^2(b^2 - c^2)}{b^2 + a^2m^2} + \frac{4a^4m^2c^2}{4(b^2 + a^2m^2)^2}}$$

Notice again how there is a common $a^2$ term in the numerator. Now we can factor an $a^2$ out of both, and get an $a$ term outside the radical:

$$D = a\sqrt{\frac{b^2 - c^2}{b^2 + a^2m^2} + \frac{a^2m^2c^2}{(b^2 + a^2m^2)^2}}$$

Finally, to resubstitute everything back into our point equations, our two potential intersection points are:

$$\left(-\frac{a^2mc}{b^2 + a^2m^2} + D,\; m\left(-\frac{a^2mc}{b^2 + a^2m^2} + D\right) + c\right) \tag{8}$$

And

$$\left(-\frac{a^2mc}{b^2 + a^2m^2} - D,\; m\left(-\frac{a^2mc}{b^2 + a^2m^2} - D\right) + c\right) \tag{9}$$

The final equation for the points isn’t really the cleanest, is it? I myself prefer to keep the constants $A$, $B$, and $D$ that I defined. Note that the points you’ve discovered won’t necessarily lie on the ellipse if the line doesn’t intersect the ellipse at all; you should be able to substitute your discovered $x$ and $y$ values into Eqn. 1 and check whether it still equals 1. So there you have it, an analytic solution for the intersection points of a line with an ellipse, in a convenient equation for you to translate into code. Thanks for reading!
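Here is one possible translation into Python, following the constants $A$, $B$, and $D$ from the derivation above. The function name and the no-intersection handling (a negative radicand under $D$) are my own additions, and vertical lines are not handled since the slope-intercept form above can’t represent them:

```python
import math

def line_ellipse_intersection(a, b, m, c):
    """Intersect y = m*x + c with x^2/a^2 + y^2/b^2 = 1.

    Returns a list of 0, 1, or 2 (x, y) points, sorted by x.
    """
    A = b * b + a * a * m * m            # Eqn. 5
    B = 2 * a * a * m * c                # Eqn. 6
    # Radicand of D (Eqn. 7); negative means the line misses the ellipse.
    rad = (a * a * b * b - a * a * c * c) / A + (B * B) / (4 * A * A)
    if rad < 0:
        return []
    D = math.sqrt(rad)
    xs = {-B / (2 * A) + D, -B / (2 * A) - D}   # one element when tangent
    return [(x, m * x + c) for x in sorted(xs)]
```

For example, the horizontal line $y = 0$ crosses the unit circle ($a = b = 1$) at $(-1, 0)$ and $(1, 0)$, while $y = 2$ misses it entirely and returns an empty list.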


Our game features Noah, a biologist with an animal philanthropist streak (so a bio-philo-zoo-thropist?). Noah is travelling to the world Xion because the planet’s integrity has been compromised (by human mining operations) and the whole thing is about to explode! Perhaps a bit overly dramatic, but you, Noah, have come to Xion to save its creatures. Being a biologist, you can’t stand to see this kind of fauna go extinct. So you chartered a huge ship to bring all the creatures aboard for preservation and future study, and conveniently named it the Ark.

The only thing is, these creatures have no idea that you’re trying to *save* them. In fact, they’ve come to dislike humans in general, so they won’t be too receptive to your heroism. Indeed, they’ll try to kill you. Armed with your Bio-Rifle, you must stun the creatures, and then get close enough to them so you can beam them aboard the Ark.

While I may not be very good at art, animation, or sound, I am at least a decent programmer! (Or clever enough to fool the other team members into thinking I am.) They let me take on the role of AI Programmer, which was very exciting.

If you take a peek at the in-game screenshot from above, you’ll notice that there are at least 9 enemies on-screen. In fact, our game needed to handle about 30 enemies on-screen simultaneously, and on iPad hardware. On top of rendering the models, doing shading and lighting calculations, sound, physics, and special effects, AI doesn’t usually get a lot of room to work. I was fortunate enough to have a bit of leeway on this game because AI is the crux of the gameplay, but I still wanted to keep things simple and quick.

I’m separating this topic into three parts, to be covered here and in two more blog posts. First, how to efficiently handle a pool of objects, like enemies (this post). Next post will be about finite state machines and behavior trees, and then finally I’ll talk a bit about implementation details using Unity’s powerful scripting engine in the last post.

What is an object pool? The name is quite intuitive; it’s a notion of a container full of *things* that you pull from when you want to use one, and when you’re done you put it back for re-use later. To use an example from the Unity forums, let’s say your character has a gun. Guns shoot bullets. Each time you shoot a bullet, you want to show that bullet travel along its path. After a bullet has hit something, it goes away.

A really inefficient solution to this problem is to simply create a new “bullet” object each time the player pulls the trigger, and then destroy it when it has completed its trajectory. A much more efficient solution is to create, say, 1000 bullet objects at game start, hide them offscreen somewhere, and then teleport them in as they’re needed. When the bullet finishes its trajectory, *Poof!* it’s deactivated and teleported back into the bullet pool.
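The idea can be sketched in a few lines (Python here for brevity; the `factory`/`spawn`/`despawn` API and the dict standing in for a Bullet are hypothetical, not Unity’s):

```python
class Pool:
    """Preallocate objects up front; check them out and back in
    instead of creating and destroying one per use."""

    def __init__(self, factory, size):
        self._free = [factory() for _ in range(size)]  # created once, held "offscreen"

    def spawn(self):
        # Reuse an idle object instead of allocating a new one.
        return self._free.pop() if self._free else None

    def despawn(self, obj):
        # Poof! Back into the pool for later reuse.
        self._free.append(obj)

pool = Pool(factory=dict, size=1000)  # dict stands in for a Bullet object
bullet = pool.spawn()
pool.despawn(bullet)                  # the same object comes back next time
```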

Having no experience in object pools myself, I turned to the Unity forums, which had a few good posts on the matter.

Our gameplay draws inspiration from survival-type games like the (in)famous I MAED A GAM3 W1TH ZOMBIES 1N IT!!!1. I wanted the player to be initially unchallenged by only a few basic enemies, but eventually be overwhelmed by a horde as time progressed. For that, I needed an enemy pool, and an intelligent spawning strategy.

Like The Director in Valve and Turtle Rock Studios’ Left 4 Dead series, I wanted enemy spawning decisions to be handled intelligently and on the fly. The AI Manager was the solution to that.

The AI Manager has only a few very simple tasks:

- Create all our creature pools
- Spawn enemies at appropriate times and in appropriate locations
- Return creatures to the pool when they’ve been teleported or killed

While we discouraged the player from killing the creatures on the planet by subtracting points, it did happen. I suppose players today have been overexposed to games focused on killing things.

Creating the creature pools was pretty easy; creatures are objects with predefined (tuned) attributes like speed, attack damage, attack range, and health. That means all the AI manager had to do was create a bunch of instances of them and plop them into a list.

Spawning the creatures was a little more difficult. We wanted to spawn the creatures off-screen, on the terrain, and not intersecting with other creatures or objects. Finding an area off-screen was not too difficult. Instead of the more difficult calculation of the portion of the terrain not viewable in the camera frustum, we settled on a variable minimum distance from the player’s current position; so long as the camera didn’t zoom way out (it didn’t), we were fine.

Spawning the creatures at least a minimum radius away from the player solves the issue of spawning them off-screen, but we still had to worry about them being **on** the terrain. Our world wasn’t a totally flat plane; it had small hills here and there. If I spawned an enemy on the same plane as the player, it could either be a little above the ground (not a problem with physics that will bring it down), or a little below the ground (a much bigger problem, as the physics will cause the creature to fall to its doom).

To solve this issue I did the following:

- Define some point above the highest point on the terrain. Say, 60m above the origin.
- At the planar point where I want to spawn the creature, project that point vertically 60m so it’s guaranteed to be above all the ground.
- Cast a ray downward from that point until it intersects the terrain. The intersection point is where we want to spawn our creature.

Thankfully the Unity engine has support in place for this technique in its Physics.Raycast function.
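Outside Unity, the same three steps can be sketched with a height function standing in for the downward raycast (everything here, including `terrain_height` and the 60m ceiling default, is a hypothetical stand-in rather than engine API):

```python
def find_spawn_point(x, z, terrain_height, ceiling=60.0):
    """Project the candidate (x, z) up to a ceiling above all terrain,
    then 'cast' straight down to find where the ground is."""
    ground_y = terrain_height(x, z)   # where a downward ray from the ceiling would hit
    if ground_y > ceiling:
        raise ValueError("terrain is taller than the assumed ceiling")
    return (x, ground_y, z)           # spawn exactly on the surface

# A bumpy-but-low terrain function for illustration:
hills = lambda x, z: 2.0 + abs(x - z) % 3
print(find_spawn_point(4.0, 1.0, hills))  # (4.0, 2.0, 1.0)
```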

So now we’ve taken care of spawning our creature off camera and on top of the terrain, so all that’s left is making sure it’s not in the middle of a tree or another creature or something. How can we tell if we’re intersecting a creature or not? Well, Unity has a nice physics system in place, so let’s let **it** tell us!

I *could* move the entire creature object to my desired spawn position, test for collision, and then move it back to the pool upon failure, but there are a few problems with that. The physics engine in Unity runs on a different, slower timestep than the rendering engine. Moving the creature model in place, even briefly, runs the risk of an unsightly visual artifact as we test for collision. Additionally, there’s no reason to test collision on the whole creature; it’s faster to test collisions against spheres.

A better solution, then, is to define an invisible sphere that could contain our creature, and use that instead. For each differently-sized creature, I created a differently sized sphere, which I called a *spawn tester*. Now I can invisibly place a sphere that can collide with other objects (but not vice versa! We don’t want the player running into a mystical invisible teleporting ball), and I can know for certain whether an area is clear for spawning.

After a creature is teleported to the Ark, we shouldn’t be able to see it anymore. When this happens, a creature tells the AI Manager that it has been teleported. The AI Manager waits a second or two for the animation and sound to play, disables the creature, and then literally teleports it to our offscreen holding area. The deactivated enemy is added back into the creature pool, where it is now available to be reset and reactivated later.

This is a common thing to do in games, but it has an interesting philosophical implication; **you are constantly killing the same enemies over and over again!** It’s like they’re stuck in a cruel universe where they don’t ever really die, they only go to some weird catatonic limbo until a maleficent force reanimates them. In our game at least, most of the creatures aren’t being killed, merely stunned.

So that wraps up Part 1 of this series, I hope it was informative and exciting!

Dev Team (in alphabetical order)

- Stephen Aldriedge
- Cameron Coker
- Rachel Cunningham
- Jack Eggebrecht
- Me (Andy G)
- Jay Jackson
- Chiang Leng
- Sterling Morris
- John Pettingill
- Chris Potter
- Jake Ross
- Brian Smith
- Sterling Smith
- Jacob Zimmer

Computers can seem pretty dumb sometimes, can’t they? Why can’t they just learn how to do things like we do? Learning comes so effortlessly to us humans; we don’t even remember learning something as extraordinarily complicated as speech – it just sort of happened. If I showed you 10 pictures, 5 with cats in them and 5 without (actually this is the internet, so 11 of those 10 pictures would have cats in them, but bear with me) you could easily identify which images contained cats. Because computers are basically math machines, unless you can very precisely define what a cat *is*, then a computer will not be very good at such a task. That’s where neural networks come in – what if we could simulate a human brain? And like a human brain, what if we could purpose our simulation to only look at cats?

My previous post, Decision Tree Learning, briefly alluded to neural networks as an alternative machine learning technique. At their core, neural networks seek to very coarsely emulate a brain. You probably know that the brain has neurons in it. Neurons are little cells that can send an electrical signal to another neuron. Some neurons are more strongly connected to each other than others and therefore the messages they send to each other have a larger effect. If a neuron receives a strong enough message from all its neighbors, it will in turn “activate” and send a message. So how can we use this simplified concept of neurons to identify cats?

Let’s begin with the simplest of neural networks: 1 neuron. In machine learning terms, this is called a Perceptron. The neuron receives a signal, and based on that signal it either fires or it doesn’t. Let’s talk a little more about this “signal” the neuron receives.

In our neural network, a neuron can receive a signal from a variety of sources. Imagine your brain was only a single neuron instead of the billions it actually has. This neuron receives signals from your eyes, your nose, your mouth, your ears, etc. Your eyes tell you that you are sensing something with four legs. Your nose smells kitty litter. Your ears hear a “meow”, and your mouth….? Let’s hope you’re not using that sense to identify if something is a cat or not.

Anyway, all these senses get passed directly into your brain, or single neuron in this case. When a neuron receives a signal, or combination of signals, this may cause it to fire. That is, the signals it receives combine in some way to form a message that means “you too, should fire”. If your brain, like many human brains, is made for saying “YES, that’s a kitty!” or “No, not a kitty.” then the neuron firing is akin to saying “Yes”, while not firing is akin to saying “No”.

Let’s take away some of the mystique surrounding the perceptron, and neural networks in general. In a nutshell, a perceptron is a function that converts your input values into some output value.

That’s it! We make a fancy function, then feed it some numbers that describe the object we’re trying to identify, and it spits out some numbers, which we then interpret as a classification prediction. In slightly more formal terms, the “signal” that our lonely perceptron receives is a list of values (called a vector) from all the attributes of the object we’re trying to classify. Based on those values, we classify the object as “Cat” or “Not a cat”.

On top of simply receiving the values, we may also want to weight certain values higher than others. For example, there are thousands of animals that have four legs, but very few will regularly smell like kitty litter. Whether the object smells like kitty litter or not, then, should probably be more important to our final decision. That is, it has more *weight*. Giving an attribute more weight is analogous to thickening the synapse between two neurons, strengthening the connection between them.

Here’s where I start getting a bit technical, but I’ll try to explain everything clearly. Feel free to skip all the math; I’ve tried to write this article to give you an intuition on how things work without it. Head to the comments to tell me what I need to clarify!

For each input attribute we need to determine a weight such that the sum of all weighted attributes is higher than the neuron’s threshold when it **should** fire, and below that threshold when the neuron **should not** fire.

Now, like all machine learners, we need a set of training data. That is, a bunch of objects that *are* cats, and bunch of objects that *aren’t* cats. Associated with each example are attribute values like shape, smell, sound, etc. Let’s call the set of training data **T**.

Each individual training example can be labeled t_{i}, where *i* indicates the example’s position in **T**’s list. If we have 3 attributes (shape, smell, sound) for each training example, then the attributes for t_{i} can be labeled as a_{i1}, a_{i2}, a_{i3}. More generally, there may be *k* attributes for each training example, so we can refer to them as a_{i1}, a_{i2}, …, a_{ij}, …, a_{ik}.

Now that we know what our input looks like, the objective is to determine how we should weight attributes. Some attributes may help our classification, and some may hurt it. Those that help it should be given a positive weight, and those that hurt it should be given a negative weight. Usually we start with some really small nonzero weights that we increase or decrease over time.

Now, the easiest way to apply a numerical weight to some attribute like “smell” would be to multiply them together. The only problem is, what does it mean to multiply English by some number? It doesn’t make any sense!

Therefore, we should **convert **these English descriptions to numbers. For example, let’s say you have values for your attribute “Smell” of “Kitty Litter”, “Grass”, “Hay”, and “Mud” that you want to change to numbers. The most logical thing to do is assign them values of 1, 2, 3, and 4, because that’s the order they appear in. I’ll talk a bit shortly about why this is a bad idea, and how we will change the values, but for now let’s go with it.

**Recap**: Okay, at this point we have training examples, which themselves have numeric attributes, and we have weights on each attribute. How does all this come together to give us “Cat” or “Not Cat”?

What we’ll do to actually get an answer to the question “Is it a Cat?” is combine our attributes with the weights we have on them through multiplication. Then we’ll sum up each product to get a final value, which we’ll interpret as our answer (more on the interpretation part in a sec).

Clearly, the animal we’re trying to classify is a horse, not a cat. How do we interpret that output value of 3.1 as “Not a cat”, though? What a perceptron does is this: define some number to be a *threshold* value; values that lie on one side of it are Cats, and values on the other side are Not Cats. Usually this value is 0.0:

From the figures above you can see what a perceptron does: the multiplication of our attributes by weights forms a line. Some of the elements on this line are above 0, and some are below 0. Those above 0 we’ll say are cats, and those below we’ll say aren’t. For funsies, we add another weight to our perceptron:

This new attribute, Attribute 0, is considered to always be on. The reason for adding it (and an associated weight) is so that our line from above doesn’t always necessarily go exactly through (0, 0):

Notice how our line has moved up, but the area we classify as Cats has not? That’s right; the number of items that will map to a point on the line above “0” is higher! We could have just as easily gone the other way to make fewer items be classified as cats.

For one training example let’s compute a sum of all weights multiplied by their attributes:

Sum = w_{1} a_{i1} + w_{2} a_{i2} + … + w_{k} a_{ik}

Because I don’t like the term “Sum”, and it’s not really used in the literature, let’s replace it with the term “*y*”, and give it a subscript to indicate that it’s the sum for training example *i*:

y_{i} = w_{1} a_{i1} + w_{2} a_{i2} + … + w_{k} a_{ik}

This equation is actually the same as the equation for a line through the origin. This observation helps us to realize that what we’re really doing is trying to draw a line that best separates our “cat” and “not cat” examples. (See figures above).

To add a little extra ability to this function, it’s common to add one more “bias” weight that we’ll call weight 0 (w_{0}). What this term does is give us the y-intercept of our line, allowing it to not be forced to go through (0, 0). In neural network terms, we’re adding a new attribute a_{i0} that always takes on a value of 1, and w_{0} is our weight for it.

Those familiar with vector math might see that this summation of products is really the same as a vector multiplication:

y_{i} = **w** · **a**_{i}

**To recap**: For a training example that has attributes a_{i1}, …, a_{ik}, and neural network weights w_{1}, …, w_{k} on each attribute, we generate a linear sum y_{i}. What does this sum mean? It’s just a number, after all. Let’s say this: if y_{i} is greater than 0, we’ll say that the neuron fires (yes, the object is a cat!), else it doesn’t (no, it’s not a cat). That is, we can just check the sign of y_{i}, s(y_{i}):
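In code, the whole perceptron so far is just a weighted sum and a sign check (a sketch of my own; `weights[0]` is the bias w_{0} applied to the always-on attribute):

```python
def predict(weights, attributes):
    """Perceptron output: 1 ("Cat") if the weighted sum is positive, else 0."""
    x = [1.0] + list(attributes)                 # prepend the always-1 bias attribute
    y = sum(w * a for w, a in zip(weights, x))   # the linear sum y_i
    return 1 if y > 0 else 0                     # s(y_i): fire, or don't
```

For example, with a zero bias and a single weight of 1.0, any positive attribute value fires the neuron and any negative value does not.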

Alright, we now know how to classify our object if we have the numerical attribute values, and appropriate weights on those attributes. The purpose of a neural network is to learn these appropriate weights, which I’ll get to in a minute. First, let’s return to why the numerical attribute values we chose before were bad.

When we’re performing our multiplication of attribute values by their weights, won’t “Mud” inherently have a higher weight than “Kitty Litter”? Our neural network should theoretically be able to compensate by modifying the weights on these attributes, but we should help it out ourselves; it’s generally accepted that neural networks work better on normalized input. That is, we need to “normalize” our attribute values.

Normalization means that we change our input attribute values from the ones given into ones relative to each other. Quite often storing them relative to each other causes them to be in a smaller range. For example, if you have values that go from -1000000 to 1000000, it might be better to just divide everything by a million to get the range (-1.0, 1.0) (this is called Min-Max normalization).

There’s a variety of other ways to normalize data. The way I was taught, and the way I’ll describe here, is *z-score, *or* statistical* normalization. This kind of normalization treats all the given values of attributes in our training set as a population in the statistical sense. When I say “population” you might think of humans, and this is an appropriate intuition here. That is, for a population of humans, you’ll have an average height, average weight, etc. You can think of each humans’ height in terms relative to this average. That’s exactly what z-score normalization does; it replaces a value with one relative to the average.

In more technical terms, the z-score is defined as:

z = (x − m) / s

z is our z-score, m is our mean, s is our standard deviation, and x is the attribute value we are replacing. Let’s change our mapping from above to reflect our new normalized values (assume that mean = 1.4, std. deviation = 0.25, calculated from some training set not shown here):
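As a small sketch, z-score normalizing a list of attribute values looks like this (using the population standard deviation, matching the definition above):

```python
def z_scores(values):
    """Replace each value with its distance from the mean,
    measured in standard deviations: z = (x - m) / s."""
    n = len(values)
    m = sum(values) / n
    s = (sum((v - m) ** 2 for v in values) / n) ** 0.5  # population std. dev.
    return [(v - m) / s for v in values]
```

A value equal to the mean maps to 0, and values symmetric about the mean map to z-scores that cancel out.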

The fact that normalization usually puts numbers into a smaller range can be beneficial because computers have a limit on the biggest number they can represent. Go over that number, and you come around the other side – a really really high positive number will become a really big negative number. Because we do a lot of multiplication and squaring operations in our neural network, it’s good to keep attribute values small.

In our case, normalization expanded the range, but if you notice, Kitty Litter became the only negative value. With negative weights on the Smell attribute, we’ll easily compensate for this to show that most objects being classified that smell like Kitty Litter are cats.

Okay, at this point we have our normalized, numerical input attributes, and a way to combine them with weights to generate a classification of “Cat” or “Not a cat”. It’s about time we learn how to assign the proper weights!

Here’s an idea: let’s start with some random weights on our synapses. If we run a training example through the neuron, and the neuron says “Cat!”, but we know it’s not really a cat, then we’ll modify the weights a little so that next time we might get it right.

So how do we provide some negative reinforcement to our neuron? Imagine if you will, a hill. You are told that you need to get to the bottom of this hill, and you have to do so with a blindfold on. It’s actually pretty easy, right? You feel the pull of gravity in one direction so you walk that way until the pull of gravity goes away. You *descended* the hill. Based on how steep the *gradient* of the hill was, it took you longer or shorter to get to the bottom. Training our perceptron to use the appropriate weights follows the same principle! In fact, it’s called *gradient descent.* We start with some random initial weight, and this weight gets pulled towards its true value.

In the figure above, we have some random initial weight that has an error value assigned to it. We want to “pull” that weight to the right so as to decrease its error measure, until we reach a point where we can’t improve it anymore.

More formally, we define a quadratic error function for our prediction. Taking the derivative of this function at a point gives us the slope of the function at that point. We want the slope to be 0, and if it is, we know that we’ve minimized our error (reached a local or global min). Now, this next part requires you to know what a partial derivative is.

Given a training example and weights on its attributes, we already know that the function we use to compute a prediction is the summation of attributes multiplied by their weights:

y_{i} = w_{0} + w_{1} a_{i1} + w_{2} a_{i2} + … + w_{k} a_{ik}

After we *threshold* the output using the sign function s(y_{i}), we have a prediction of 1 (it’s a cat!) or 0 (it’s not a cat…). Well, because all the training examples are labeled with what we **should** have predicted, we know whether the output *should* have been 1, or *should* have been 0. Let’s let the **real classification** of training example *i* be denoted r_{i}. And now let’s define our error as:

E(**w**) = ½ (r_{i} − s(y_{i}))²

What the Error equation is saying is that we define the error on our weight vector (given the attributes and real classification of training example i) as proportional to the square of the difference between the real classification and our prediction. Because we only have 2 possible classes (cat, not a cat), the squared difference is only ever 1 or 0, but if we had many different classes this error could get much larger. The ½ that gets multiplied to it is completely man-made; it was only put there to make the next part (taking a derivative) easy.

Turning this error equation into a weight update equation requires us to take a partial derivative with respect to the weight term. Taking the derivative with respect to w_{h} then requires us to use the chain rule twice, eventually leaving us with:

w_{h} = w_{h} + (r_{i} − s(y_{i})) a_{ih}

What this equation is saying is that we modify weight *h* based on the difference between the real classification and our prediction, multiplied by the value for attribute h. For example, this could be the weight on “smell”, where the attribute value was “Mud”.

One problem with the equation is that it assumes equal responsibility for error on behalf of all weights, when in fact this may not be the case. This is an artifact of us trying to optimize potentially many weights at the same time. Changing each weight by the full amount the update equation suggests, then, could make our weights “bounce around” their correct values. To address this issue, let’s only change each weight by a small percentage of what the equation tells us. That is, let’s multiply the right-hand side of the equation by a value in the range [0.0, 1.0), which we’ll label η:

This value, the learning rate η, is typically small, but the best value of η is yet another optimization problem. A different value tends to work better on different data sets, so knowing it beforehand is nigh impossible. Programmers use a few strategies here, from specifying a really small value (which makes training take much longer), to specifying a slightly larger value and decreasing it over time, to letting the program adapt η based on the network’s ongoing performance. I myself have tried all of the above, and found that an adaptive value seems to be the best generic strategy. I won’t get into how to adapt η in this article for simplicity, but once you get the basics of neural networks down, it’s not too hard to implement.
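To make the update rule concrete, here is a minimal Python sketch of a perceptron’s prediction and weight update. The function names, the 1/0 thresholding at zero, and the use of a leading bias attribute are my own illustration, not code from this article:

```python
def predict(weights, attributes):
    """Weighted sum of attribute values, thresholded to 1 ("cat") or 0 ("not a cat")."""
    y = sum(w * x for w, x in zip(weights, attributes))
    return 1 if y >= 0 else 0

def update_weights(weights, attributes, real_class, eta):
    """Move each weight by eta * (real - predicted) * attribute value."""
    predicted = predict(weights, attributes)
    return [w + eta * (real_class - predicted) * x
            for w, x in zip(weights, attributes)]
```

A couple of passes over a tiny toy set like `[((1, 2), 1), ((1, -2), 0)]` (the leading 1 acts as a bias attribute) is already enough to separate the two classes.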

**Recap:**

Alright, at this point we know the following:

- How to change qualitative attribute values into quantitative ones
- How to normalize quantitative attribute values
- How to combine a training example’s attribute values with weights to generate a classification prediction
- How to specify the error on our prediction
- How to update our weights in response to error on our prediction

The only question that remains is this: how do we know when we’ve finished training our perceptron?

What we do is this: we take a portion of all our training data, up to 1/3 of it, and we set it aside. Let’s call this our *validation set*. So now we have our training set, which is 2/3 of our training examples, and our validation set, which is 1/3 of our training examples.

What we do is this:

- Run each training example through our perceptron, calculating prediction error and updating weights in response to that error
- After we’ve done that, run each example from the validation set through our perceptron, calculating prediction error, but **NOT** updating weights in response to that error
- Repeat

Each time we repeat the above is called a “training epoch”. We’ll stop training our perceptron when it stops improving. That is, when the error on the validation set doesn’t improve for a few epochs. At this point, we know that we’re not getting any better, so we might as well stop. The perceptron’s performance over time will look something like this:

Notice how the early stages have a lot of variation, but the performance is generally improving. Near the end, though, it actually **decreases**! At this point we know that we’ve overfit the network; it’s too well trained for the training set, and so it performs poorly on the validation set.

When we compute the prediction error on the validation set, we do it on the set as a whole; we add up the error for each validation set example, and then divide by the size of the validation set. This is called the mean squared error (MSE): MSE = (1/N) Σ_{i} (r_{i} − s(y_{i}))², where N is the size of the validation set.

The equation for the MSE is very similar to the error equation we used for deriving our weight update equation above.

The logic I used for training my neural network was this:

- Run for at least 100 epochs to get past the typically noisy beginning
- Each time we run our validation set through the perceptron, compute the perceptron’s mean squared error (MSE) on the validation set. Track the epoch at which we saw the lowest MSE.
- Let Epoch_{best} be the training epoch at which we saw the lowest MSE. If 2 * Epoch_{best} epochs have passed without finding a new minimum MSE, OR 10,000 epochs total have passed, terminate. (I also experimented with up to 100k training epochs, but saw no difference.)
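The stopping logic I just described can be sketched in a few lines of Python. `train_fn` and `validate_mse_fn` here are hypothetical callbacks of my own invention, standing in for one weight-updating pass over the training set and one weight-frozen pass over the validation set:

```python
def train_with_early_stopping(train_fn, validate_mse_fn,
                              min_epochs=100, max_epochs=10_000):
    """Stop once 2 * best_epoch epochs have passed without a new minimum
    validation MSE, or after max_epochs epochs total."""
    best_mse = float("inf")
    best_epoch = 0
    for epoch in range(1, max_epochs + 1):
        train_fn()               # one pass over the training set, updating weights
        mse = validate_mse_fn()  # one pass over the validation set, weights frozen
        if mse < best_mse:
            best_mse, best_epoch = mse, epoch
        if epoch >= min_epochs and epoch >= 2 * best_epoch:
            break
    return best_epoch, best_mse
```

The `epoch >= 2 * best_epoch` check is exactly the “2 * Epoch_best epochs without a new minimum” rule, and `min_epochs` keeps us from quitting during the noisy start.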

For some perspective, I compared the performance of our lowly perceptron against a decision tree approach. I also compared against a Multi-Layer Perceptron (MLP), which is a more complex neural network consisting of multiple perceptron layers. What I found may surprise you; despite the fact that the perceptron is effectively a brain consisting of only a single neuron, it performs nearly as well as the other approaches.

The data sets we compared were taken from the UCI Machine Learning Repository. Some are more difficult than others; as you can see, predicting heart disease is a difficult task on which even the MLP only achieved about 65% accuracy.

What’s a perceptron good for? You don’t exactly think of a brain as being composed of a single neuron, and on top of that, a perceptron can only learn a linear discriminant. They’re surprisingly accurate classifiers, though, and due to their size they’re fast too.

If you want to make your perceptron bigger and learn more complex things (and multiple class labels), it takes a little more work. You have to add more layers to the network, creating a *multi-layer perceptron* (MLP). You have an input layer that receives your training examples, and then you have a series of *hidden layers* that feed into each other, finally going into an *output layer* that will produce your prediction for each class you’re interested in (dog, cat, human, etc). Also, you can’t simply use a linear function as the output of your neurons; the output of your hidden layers needs to be thresholded using a nonlinear function like tanh or the logistic function. Finally, all this added complexity makes defining your weight update equations more difficult with each additional layer:

The above figure shows an example MLP network with a hidden layer size half that of the input layer size. The network is trying to learn four class labels: Cat, Dog, Horse, and Pig. The σ symbol implies I’m using the logistic (sigmoid) thresholding function in the hidden layers. The next layer outputs some value for each class label that implies how certain the network is about the example belonging to that particular class. The final prediction takes the most likely of these using the softmax function.
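A single forward pass through a network like the one in the figure can be sketched as follows. The layer sizes, weight layout, and function names here are illustrative assumptions of mine, not the article’s code:

```python
import math

def sigmoid(x):
    """The logistic thresholding function used in the hidden layers."""
    return 1.0 / (1.0 + math.exp(-x))

def softmax(scores):
    """Turn raw per-class scores into probabilities that sum to 1."""
    shifted = [math.exp(s - max(scores)) for s in scores]  # shift for numerical stability
    total = sum(shifted)
    return [s / total for s in shifted]

def forward(example, hidden_weights, output_weights):
    """Inputs -> sigmoid-thresholded hidden layer -> softmax over class scores."""
    hidden = [sigmoid(sum(w * x for w, x in zip(ws, example)))
              for ws in hidden_weights]
    scores = [sum(w * h for w, h in zip(ws, hidden))
              for ws in output_weights]
    return softmax(scores)  # one probability per class label
```

The softmax output gives one number per class label (Cat, Dog, Horse, Pig), and the final prediction is simply the class with the highest probability.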

In my own experiments I found that sometimes a perceptron performs just as well as an MLP, and other times the MLP significantly outperforms the perceptron. It all comes down to the problem domain you’re trying to learn. This means messing around with different parameters like the number of hidden layers, the number of neurons per hidden layer, values for η, training epochs, and thresholding functions until you end up with something that suits your needs.

Neural networks as a whole are very useful, and the subject of much research. It’s coincidental that I wrote this article just as Google is telling the world that it has made a neural network with a billion nodes in it! It’s using networks like this to identify arbitrary objects in pictures (yes, including cats!). The biggest network I created had fewer than 50 neurons in it.

- Alpaydin, E. *Introduction to Machine Learning*. 2nd ed. The MIT Press, 2012. 233-245. Print.
- Priddy, Kevin L., and Paul E. Keller. *Artificial Neural Networks: An Introduction*. Vol. 68. Society of Photo Optical, 2005.

Remember my post on the Dubin’s car? It’s a car that can either go forward, turn fully left, or turn fully right. In that post I explained that the shortest path from one point to another via the Dubin’s car could be described by one of six control sequences. The Reeds-Shepp car is essentially the same thing as the Dubin’s car, except it can also move backwards! This post isn’t quite the comprehensive step-by-step guide that the Dubin’s one was, but I’ll give you a great overview of the paths, point you to some great online resources, and give you some tips if you plan on implementing them yourself!

The Reeds-Shepp car is named after two guys, J.A. Reeds and L.A. Shepp, who in 1990 published a really comprehensive paper that showed we could classify all the control sequences that account for the shortest paths of the Dubin’s car if we permitted it to go in reverse. However, instead of 6 paths, they showed there to be 48, which is quite a few more if you’re implementing them one by one. (Fortunately they also showed an easier way to implement things such that we’d only have to write 8 or 9 functions and reuse them cleverly).

However, a year later two *other* people (H.J. Sussmann and G. Tang) showed that 48 is actually **too many** control sequences, and that the real number is 46. The proofs for all these things are quite complicated, so I won’t try to explain them here (and I don’t fully understand all of them myself!).

The paths/control sequences I’ve been mentioning are fairly well explained by Steven LaValle’s webpage. Also, Mark Moll at Lydia Kavraki’s lab wrote an implementation of the Reeds-Shepp paths for their Open Motion Planning Library. Steven LaValle also has his own implementation that became part of NASA’s CLARAty project.

Clearly there’s tons of information, and even code, online about the Reeds-Shepp cars, so why write about them? Well, because I implemented them myself, and learned a few things that might help you:

- There are typos in the original Reeds-Shepp paper
  - This is important. Besides some formatting mistakes that they make here and there, the major issue is with the paths described by equations 8.3 and 8.4. These paths include C | C | C (for example Left forwards, Right backwards, Left forwards) and C | C C (Left forwards, Right backwards, Left backwards). If you try to implement things as they stand, it **WILL NOT WORK**!
- (Update 5-30-2013: I have been in contact with the OMPL developers and have determined that the following claim was made in error, so I am retracting it)
  ~~The OMPL library tries to address these issues, but in my experience **these paths don’t work either**! The lengths of the paths being returned by their implementation seem correct, but the actual controls for those paths seem incorrect at times (or didn’t work for me at least).~~

- Steven LaValle’s implementations of 8.3 and 8.4 **do** work, though with some minor modifications.
  - I don’t claim to be the expert on how his code was supposed to be used, as I was doing my own thing, but if you try to grab the source, you may need to change some “eta” values to “y – 1.0 + cos(phi)”.

- You can be more efficient than all of these implementations!
  - For one, Dr. LaValle’s implementation has many redundant functions that you can skip by reusing the other ones, as recommended in the Reeds-Shepp paper (see the section on reflecting, timeflipping, and backwards paths).
  - For another, yet another paper, published in 1998 by P. Soueres and J. Boissonnat, gave us some strategies for ignoring entire swaths of potential paths, because we can know *a priori* that they won’t be optimal. Basically, this works by defining theta to be the angle from the start to the goal configuration, and then performing some really simple checks on the value of theta.

- Finally, a word of caution — the formulas provided by Reeds and Shepp assume the start configuration to be at position (0, 0) with rotation 0. If you use a global coordinate system, you need to convert the goal coordinates to relative ones, which is not quite as simple as (goal.x – start.x), (goal.y – start.y), (goal.theta – start.theta). Well, it’s almost that simple: you also need to rotate the (relativeX, relativeY) point by -start.theta so that it’s truly relative to (0, 0, 0).
- Also, the formulas all assume a unit turning radius, but you can easily account for this. (sidenote: they also assume unit velocity, i.e. no accelerating or braking, but you can also fake this afterwards!)
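The coordinate conversion described in the caution above can be sketched like this in Python; the configuration tuples and the function name are my own:

```python
import math

def to_local_frame(start, goal):
    """Express the goal configuration (x, y, theta) relative to the start,
    so the start sits at (0, 0) with rotation 0. The translated point must
    also be rotated by -start_theta, not just subtracted."""
    sx, sy, st = start
    gx, gy, gt = goal
    dx, dy = gx - sx, gy - sy
    local_x = math.cos(-st) * dx - math.sin(-st) * dy
    local_y = math.sin(-st) * dx + math.cos(-st) * dy
    local_theta = (gt - st) % (2.0 * math.pi)
    return local_x, local_y, local_theta
```

As a sanity check, a goal one unit directly ahead of the start should come out as (1, 0, 0) no matter where the start sits in the global frame.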

That’s pretty much it. The academic publishing scene is a great venue for showing people what **does** work, but not so great for discussing what **doesn’t** work, or for providing the tips and tricks that get things working better. That’s what blogs are for.

One word of advice: if you do use someone else’s open source code, always make sure to provide any disclaimers they require, as well as attribute the pieces you used to the original authors!

Imagine you have a point, a single little dot on a piece of paper. What’s the quickest way to go from that point to another point on the piece of paper? You (the reader) sigh and answer “A straight line” because it’s completely obvious; even first graders know that. Now let’s imagine you have an open parking lot, with a human standing in it. What’s the quickest way for the human to get from one side of the parking lot to the other? The answer is again obvious to you, so you get a little annoyed and half-shout “A straight line again, duh!”. Okay, okay, enough toss-up questions. Now what if I gave you a car in the parking lot and asked you what was the quickest way for that car to get into a parking spot? Hmm, a little harder now.

You can’t say a straight line because what if the car isn’t facing directly towards the parking space? Cars don’t just slide horizontally and then turn in place, so planning for them seems to be a lot more difficult than for a human. But! We can make planning for a car just about as easy as for a human if we consider the car to be a special type of car we’ll call a **Dubin’s Car**. Interested in knowing how? Then read on!

In robotics, we would consider our human from the example above to be a type of holonomic agent. Don’t let this terminology scare you; I’ll explain it all. A holonomic agent can simply be considered an agent who we have full control over. That is, if we treat the human as a point in the x-y plane, we are able to go from any coordinate to any other coordinate always. Humans can turn on a dime, walk straight forward, side-step, walk backwards, etc. They really have full control over their coordinates.

A car however, is not holonomic because we don’t have full control over it at all times. Therefore we call a car a type of nonholonomic agent.

Cars can only turn about some minimal radius circle, therefore they can’t move straight towards a point just barely to their left or right. As you can imagine, planning paths for nonholonomic agents is much more difficult than for holonomic ones.

Let’s simplify our model of our car. How about this: it can only move forwards — never backwards, and it’s always moving at a unit velocity so we don’t have to worry about braking or accelerating. What I just described is known as a Dubin’s car, and planning shortest paths for it is a well studied problem in robotics and control theory. What makes the Dubin’s car attractive is that its shortest paths can be solved exactly using relatively simple geometry, whereas planning for many other dynamical systems requires some pretty high level and complicated matrix operations.

The Dubin’s Car was introduced into the literature by Lester Dubins, a famous mathematician and statistician, in a paper published in 1957. The cars essentially have only 3 controls: “turn left at maximum”, “turn right at maximum”, and “go straight”. All the paths traced out by the Dubin’s car are combinations of these three controls. Let’s name the controls: “turn left at maximum” will be L, “turn right at maximum” will be R, and “go straight” will be S. We can make things even more general: left and right turns both describe curves, so let’s group them under a single category that we’ll call C (“curve”). Lester Dubins proved in his paper that there are only 6 combinations of these controls that describe ALL the shortest paths, and they are: **RSR**, **LSL**, **RSL**, **LSR**, **RLR**, and **LRL**. Or using our more general terminology, there are only two classes: **CSC** and **CCC**.

Despite being relatively well studied, with all sorts of geometric proofs out there and qualitative statements about what the shortest paths look like, I could not find one single source on the internet that described in depth how to actually compute these shortest paths! It’s all well and good to know that the car should turn left for some amount of time, then move straight, and then turn right — I want to know exactly how much I need to turn, and how long I need to turn for. And if you’re like me in the least, you tend to forget some geometry that you learned in high school, so doing these computations isn’t as “trivial” for you as all the other online sources make it out to be.

If you’re looking for the actual computations needed to compute these shortest paths, then you’ve come to the right place. The following will be my longest article to-date, and is basically a how-to guide on computing the geometry required for the Dubin’s shortest paths.

This blog post was originally written as a standalone document and then transcribed here to WordPress. If you’d like a more portable, possibly easier to read version, then download the PDF version here (you may also cite this PDF if you’re writing a bibliography).

Let’s first talk about the dynamics of our system, and then describe in general terms what the shortest paths look like. Afterwards, I’ll delve into the actual calculations. Car-like robots all have one thing in common — they have a minimum turning radius. Think of this minimum turning radius as a circle next to the car such that, try as it might, the robot can only go around the circle’s circumference if it turns as hard as it can. The radius of this circle is the car’s minimum turning radius. This radius is determined by the physics of the car: the maximum angle the tires can deviate from “forward”, and the car’s wheelbase, or the length from the front axle to the rear axle. Marco Monster has a really good descriptive article that talks about these car dynamics.

I already mentioned that the Dubin’s car can only move forward at a unit velocity, and by “forward” I mean it can’t change gears into reverse. Let’s describe the vehicle dynamics more formally. A car’s configuration can be described by the triplet (x, y, θ). Define the car’s velocity (actually speed, because it will be a scalar quantity) as v. When a car is moving at velocity v about a circle of radius r, it will have angular velocity ω = v / r.

Now let’s define how our system evolves over time. If you know even just the basics of differential equations, this will be cake for you. If not, I’ll explain. When we want to describe how our system evolves over time, we use notation like dx/dt to talk about how our x coordinate changes over time.

In basic vector math, if our car is at position (x, y) with heading θ, and we move straight forward for a single timestep, then our new configuration is (x + v·cos(θ), y + v·sin(θ), θ). Note how our x coordinate changes as a function of cos(θ). Therefore, we’d say that dx/dt = v·cos(θ). A full system description would read like this: dx/dt = v·cos(θ), dy/dt = v·sin(θ), dθ/dt = ω

I’d like to point out that this equation is linear, in that the car’s new position is some straight line away from the previous one. In reality, this is not the case; cars are nonlinear systems. Therefore, the dynamics I described above are merely an approximation of the true system dynamics. Linear systems are much easier to handle, and if we do things right, they can approximate the true dynamics so well that you’d never be able to tell the difference. The dynamics equations translate nicely into update equations. These update equations describe what actually happens each time you want to update the car’s configuration. That is, if you’re stepping through your simulator, at each timestep you’d want to update the car. As the Dubin’s car only has unit velocity, we’ll replace all instances of v with 1: x_{n+1} = x_{n} + cos(θ_{n})·Δt, y_{n+1} = y_{n} + sin(θ_{n})·Δt, θ_{n+1} = θ_{n} + ω·Δt

Note that I actually replaced instances of dt with the symbol Δt. What’s with that? Well, remember when I said that our dynamics are a linear approximation of a nonlinear system? Every time we update, we are moving the car along a line. If we only move a very short amount on this line, then we approximate turns more finely. Think of it this way: you are trying to draw a circle using a series of evenly-spaced points. If you only use 3 points, you get a triangle. If you use 4 you get a square, and with 8 you get an octagon. As you add more and more points you get something that more closely resembles the circle. This is the same concept. Δt essentially tells us the amount of spacing between our points. If it is really small, then the points are close together and we get overall motion that looks closer to reality.
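A single update step of this discretized system might look like the following Python sketch; the function name and tuple layout are my own:

```python
import math

def step(x, y, theta, v, omega, dt):
    """One Euler-integration step of the approximate car dynamics:
    dx/dt = v*cos(theta), dy/dt = v*sin(theta), dtheta/dt = omega."""
    return (x + v * math.cos(theta) * dt,
            y + v * math.sin(theta) * dt,
            theta + omega * dt)
```

Shrinking `dt` is exactly the “more points on the circle” idea: the smaller the timestep, the closer the polyline traced by repeated calls to `step` hugs the true arc.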

In this section I’ll briefly go over what each of the 6 trajectories (RSR, LSL, RSL, LSR, RLR, LRL) looks like.

The **CSC** trajectories include **RSR**, **LSL**, **RSL**, and **LSR** — a turn followed by a straight line followed by another turn (shown in Figure 1).

Pick a position and orientation for your start and goal configurations. Draw your start and goal configurations as points in the plane with arrows extending out in the direction the car is facing. Next, draw circles to the left and right of the car with radius r_min, the minimum turning radius. The circles should be tangent at the location of the car. Draw tangent lines from the circles at the starting configuration to the circles at the goal configuration. In the next section I’ll discuss how to compute this, but for now just draw them.

For each pair of circles (RR), (LL), (RL), (LR), there should be four possible tangent lines, but note that there is only one valid line for each pair (shown in Figure 2). That is, for the RR circles, only one line extending from the agent’s circle meets the goal’s circle such that everything is going in the correct direction. Therefore, for any of the **CSC** trajectories, there is a unique tangent line to follow. This tangent line makes up the ‘S’ portion of the trajectory. The points at which the line is tangent to the circles are the points the agent must pass through to complete its trajectory. Therefore, solving these trajectories basically boils down to correctly computing these tangents.

**CCC** trajectories are slightly different. They consist of a turn in one direction followed by a turn in the opposite direction, and then another turn in the original direction, like the **RLR** trajectory shown in Figure 3. They are only valid when the agent and its goal are relatively close to each other; otherwise one circle would need to have a radius larger than r_min, and if we must do that, then the **CCC** trajectory is sub-optimal. For the Dubin’s car there are only 2 **CCC** trajectories: **RLR** and **LRL**. Computing the tangent lines between the RR or LL circles won’t help us this time around. The third circle that we turn about is still tangent to the agent and goal turning circles, but these tangent points are not the same as those from the tangent line calculations. Therefore, solving these trajectories boils down to correctly computing the location of this third tangent circle.

In this section, I will discuss the geometry in constructing tangent lines between two circles. First I’ll discuss how to do so geometrically, and then I’ll show how to do so in a more efficient, vector-based manner.

Given: two circles C1 and C2, with radii r1 and r2 respectively.

Consider the center of C1 as p1, the center of C2 as p2, and so on (Figure 4).

1. First draw a vector V1 from p1 to p2: V1 = p2 − p1. This vector has a magnitude: D = |V1|

2. Construct a circle C3 centered at the midpoint p3 of V1, with radius D/2.

That is, p3 = (p1 + p2) / 2 and r3 = D / 2.

3. Construct a circle C4 centered at C1’s center p1, with radius r4 = r1 + r2

4. Construct a vector V2 from p1 to the “top” point of intersection, pt, between C3 and C4. If we can compute this vector, then we can find the first tangent point, because V2 points straight at it. For visual reference, see Figure 7.

5. This is accomplished by first drawing a triangle from p1 to p3 to pt, like the one shown in Figure 8. The segments p3p1 and p3pt each have magnitude D/2. The segment p1pt has magnitude r4 = r1 + r2. We are interested in the angle γ between V1 and V2; γ = arccos(r4 / D) gives us the angle that vector V1 would need to rotate through to point in the same direction as vector V2. We obtain the full amount of rotation about the z-axis, θ, for V2, by the equation θ = γ + atan2(V1y, V1x) (*Note:* the atan2() function first takes the y-component, and then the x-component). pt is therefore obtained by traversing V2 for a distance of r4. That is, pt = p1 + r4·(cos(θ), sin(θ))

6. To find the first inner tangent point pit1 on C1, we follow a similar procedure to how we obtained pt — we travel along V2 from p1, but we only go a distance of r1. Because we know θ, we are now able to actually compute V2. Now, we need to normalize V2 and then multiply it by r1 to obtain a vector V3 that takes us to pit1 from p1. (**Remember** that, to normalize a vector, you divide each element by the vector’s magnitude.) pit1 follows simply: pit1 = p1 + V3

It may be a bit of an abuse of notation to add a vector to a point, but when we add the components of the point to the components of the vector, we get our new point and everything works out.

7. Now that we have pt, we can draw a vector V4 from pt to p2, like in Figure 9. Note that this vector is parallel to an inner tangent between C1 and C2.

We can take advantage of its magnitude and direction to find an inner tangent point on C2. Given that we’ve already calculated pit1, getting its associated tangent point pit2 on C2 is as easy as: pit2 = pit1 + V4

I hope this is clear enough for you to be able to compute the other inner tangent point. The trick is to use the “bottom” intersection point between C3 and C4, and then work everything out exactly the same to get the two tangent points that define the other inner tangent line.

Constructing outer tangents is very similar to constructing the inner tangents. Given the same two circles C1 and C2 as before, and assuming r1 ≥ r2: remember how we started off by making C4 centered at p1 with radius r1 + r2? Well this time around we’ll construct it a bit differently. C4 is centered at p1 as before, but this time it has radius r4 = r1 − r2.

Follow the steps we performed for the interior tangent points, constructing C3 on the midpoint of V1 exactly the same as before. Find the intersection between C3 and C4 to get pt, just as before as well. We obtain the first outer tangent point, pot1, by following V2 for a distance of r1, just as before. I just wanted to note that the magnitude of V2 before normalization is r1 − r2 instead of r1 + r2. To get pot2, the accompanying tangent point on C2, we perform the addition: pot2 = pot1 + V4

This is exactly the same as before. In essence, the only step that changes between calculating outer tangents as opposed to inner tangents is how C4 is constructed; all other steps remain exactly the same.

Now that we understand geometrically how to get tangent lines between two circles, I’ll show you a more efficient way to do so using vectors, with no need for constructing circles C3 or C4. We didn’t work through the example in the previous section for nothing — we had to work up to Step 7 so that you could have some intuition on why the following vector method works.

1. Draw your circles C1 and C2 as before (Figure 4).

2. Draw vector V1 from p1 to p2. This has magnitude D as before.

3. Draw a vector V2 between your tangent points. In this case, it’s easier to start with the outer tangent points, which we’ll call pot1 and pot2. So, V2 = pot2 − pot1.

4. Draw a unit vector perpendicular to V2. We’ll call it n (the **n**ormal vector).

Figure 10 depicts our setup. In the general case, the radii of C1 and C2 will not be equivalent, but for us they will be. This method works for both cases.

5. Let’s consider some relationships between these vectors.

• The dot product between n and V2 is 0, because the vectors are perpendicular: n · V2 = 0

• |n| = 1, because n is a unit vector.

• We can modify vector V1 to be parallel to V2 quite easily by subtraction: V2 = V1 − (r1 − r2)·n

Refer to Figure 10: Setup for computing tangents via the vector method.

Why does this work? For some intuition, consider Step 7 in the previous section: we drew a vector from pt to C2’s center, and then I stated that the vector was parallel to the vector between the tangent points. Well, both centers are equidistant from their respective tangent points along n; we travel a distance of r1 along n from p1 to get to the first tangent point, and likewise we need to translate p2 (the center of the other circle) “down” by the corresponding amount. Using the head-to-tail method of vector addition, you can see that we modified V1 to be this same vector! Now that I’ve convinced you that this modification causes V1 to be parallel to V2, the following statement holds: (V1 − (r1 − r2)·n) · n = 0

• Simplifying the above equation by distributing n: V1 · n − (r1 − r2)(n · n) = 0

• Further simplifying the equation: V1 · n = r1 − r2

• Let’s normalize V1 by dividing it by its magnitude, D. This doesn’t change the fact that n and V2 are perpendicular, but to keep the equality we also have to divide the right-hand side by D: (V1 / D) · n = (r1 − r2) / D

I’ll refer to the normalized V1 simply as V1 from here on.

• Now we can solve for n, because it is the only unknown in the equation.

• The dot product between two vectors a and b is defined as: a · b = |a| · |b| · cos(θ), where θ is the angle between them. Therefore, in the equation above, (r1 − r2) / D is the cosine of the angle between V1 and n. Because it’s constant, and I don’t want to keep re-typing it, let’s define c = (r1 − r2) / D (**c**onstant).

• All that remains now is to rotate V1 through the angle θ, and it will be equivalent to n. Rotating a vector is a well-known operation in mathematics. n is, therefore: nx = V1x · c − V1y · √(1 − c²), ny = V1x · √(1 − c²) + V1y · c. Remember that c is the cosine of the angle between n and V1; therefore the sine of the angle is √(1 − c²), because sin²(θ) + cos²(θ) = 1.

6. Knowing n as well as V2 (*remember when we modified V1 to be parallel to V2?*) allows us to very easily calculate the tangent points: first go from the center of C1 along n for a distance of r1 to get the first tangent point, and then from there follow V2 to get the second tangent point.

The other tangent points are easily calculated from this vector-based formalism, and it’s much faster than all the geometry transformations we did with Method 1. Refer to freely available code online for how to actually implement the vector method.
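Here is one possible Python rendering of the vector method for a pair of outer tangent points, using the names from the derivation above (V1, n, c). Treat it as a sketch rather than a canonical implementation:

```python
import math

def outer_tangent(p1, r1, p2, r2):
    """First outer tangent points on circles (p1, r1) and (p2, r2).
    Returns (t1, t2), the tangent point on each circle."""
    v1x, v1y = p2[0] - p1[0], p2[1] - p1[1]
    d = math.hypot(v1x, v1y)          # D, the distance between centers
    c = (r1 - r2) / d                 # cosine of the angle between V1 and n
    s = math.sqrt(1.0 - c * c)        # sine of that angle
    v1x, v1y = v1x / d, v1y / d       # normalize V1
    nx = v1x * c - v1y * s            # rotate the normalized V1 to get n
    ny = v1x * s + v1y * c
    t1 = (p1[0] + r1 * nx, p1[1] + r1 * ny)  # go along n for a distance of r1
    t2 = (p2[0] + r2 * nx, p2[1] + r2 * ny)
    return t1, t2
```

Rotating the normalized V1 the other way (negating `s`) yields the other outer tangent pair, and using `(r1 + r2) / d` for `c` (and subtracting `r2 * n` at the second circle) gives the inner tangents.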

The only reason we needed to compute tangent points between circles is so that we could find the ‘S’ portion of our **CSC** curves. Knowing the tangent points on the circles we need to plan through, computing **CSC** paths simply becomes a matter of turning at the minimum turning radius until the first tangent point is reached, travelling straight until the second tangent point is reached, and then turning again at the minimum turning radius until the goal configuration is reached. There’s really just one gotcha to worry about at this step, and it has to do with the directionality of the circles. When an agent wants to traverse the arc of a circle between two points on the circle’s circumference, it must follow the circle’s direction. That is, an agent cannot turn left (positive) on a right-turn-only (negative) circle. This can become an issue when computing the arc lengths between points on a circle. The thing about all your trigonometry functions is that they will only return the short angle between things. With directed circles, though, very often you actually want to be traversing the long angle between two points on the circle. Fortunately there is a simple fix.

Given: a circle centered at point c, with points p1 and p2 along its circumference, radius r, and circle directionality d, which can be “left” or “right”.

Arc lengths of circles are computed by multiplying the radius of the circle by the angle between the points along the circumference. Therefore the arc length is L = r·θ, and θ can be naively computed using the law of cosines after you compute the Euclidean distance between the points. I say naively because this method will only give you the shorter arc length, which will not always be correct for your directed circles (Figure 11).

A better strategy would be to generate vectors from the center of the circle to the points along the circumference. That is, V1 = p1 − c and V2 = p2 − c. Let’s assume that the arc you want to traverse is from p1 to p2 with direction d.

We can use the $\text{atan2}$ function to compute the small angle between the points as before, but the difference is that $\text{atan2}$ gives us directional information. That is, $\theta = \text{atan2}(v_2) - \text{atan2}(v_1)$ will be positive or negative depending on what **direction** $v_1$ rotated to end up at $v_2$. A positive rotation is a “left” turn and a negative rotation is a “right” turn. We can check the sign of this angle against $d$. If our angle’s sign disagrees with the direction we’d like to turn, we’ll correct it by either adding or subtracting $2\pi$.

That is: if $d$ is “left” and $\theta < 0$, add $2\pi$ to $\theta$; if $d$ is “right” and $\theta > 0$, subtract $2\pi$ from $\theta$. The arc length is then $|\theta| \cdot r$.

With a function like this to compute your arc lengths, you will now very easily be able to compute the duration of time over which to apply your “turn left” or “turn right” controls within a circle.
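A sketch of such a directed arc-length function, using atan2 and the sign correction just described (the names here are my own):

```python
import math

def arc_length(center, p1, p2, r, direction):
    """Directed arc length from p1 to p2 on a circle of radius r.
    direction is 'left' (counter-clockwise, positive) or 'right' (clockwise)."""
    v1 = (p1[0] - center[0], p1[1] - center[1])
    v2 = (p2[0] - center[0], p2[1] - center[1])
    # signed angle that v1 must rotate through to line up with v2
    theta = math.atan2(v2[1], v2[0]) - math.atan2(v1[1], v1[0])
    if theta < 0 and direction == 'left':
        theta += 2 * math.pi      # force a counter-clockwise sweep
    elif theta > 0 and direction == 'right':
        theta -= 2 * math.pi      # force a clockwise sweep
    return abs(theta) * r
```

Note how the same pair of points yields different lengths depending on the circle’s direction, which is exactly the long-angle behavior we need.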

**RSR**, **LSL**, **RSL**, and **LSR** trajectories are all calculated similarly to each other, so I’ll only describe the **RSR** trajectory (Figure 12).

Given starting configuration $s = (x_s, y_s, \theta_s)$ and goal configuration $g = (x_g, y_g, \theta_g)$, we wish to compute a trajectory that consists of first turning right (negative) at a minimum turning radius, driving straight, and then turning right (negative) again at minimum turning radius until the goal configuration is reached. Assume that the minimum turning radius is $r_{min}$.

First, we must calculate the circles about which the agent will turn: a circle $C_1$ of minimum radius to the “right” of $s$, as well as a circle $C_2$ of minimum radius to the “right” of $g$. The center of $C_1$, $P_{C_1}$, becomes

$$P_{C_1} = \left(x_s + r_{min}\cos\left(\theta_s - \tfrac{\pi}{2}\right),\; y_s + r_{min}\sin\left(\theta_s - \tfrac{\pi}{2}\right)\right)$$

Similarly, the center of $C_2$ becomes

$$P_{C_2} = \left(x_g + r_{min}\cos\left(\theta_g - \tfrac{\pi}{2}\right),\; y_g + r_{min}\sin\left(\theta_g - \tfrac{\pi}{2}\right)\right)$$
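As a hedged sketch of that center computation (the function name and `direction` parameter are my own), valid for both left- and right-turn circles:

```python
import math

def turning_circle_center(x, y, theta, r, direction):
    """Center of the minimum-radius turning circle to the 'left' or
    'right' of a configuration (x, y, theta)."""
    # the center sits a distance r perpendicular to the heading
    offset = math.pi / 2 if direction == 'left' else -math.pi / 2
    return (x + r * math.cos(theta + offset),
            y + r * math.sin(theta + offset))
```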

Now that we’ve obtained the circles about which the agent will turn, we need to find the outer tangent points we can traverse. I demonstrated earlier how one would go about calculating all the tangent points given two circles. For a right-right (RR) connection, though, only one set of outer tangents is valid. If you implement your tangent-point computing function appropriately, it will always return tangent points in a reliable order. At this point, I’ll assume you are able to obtain the proper tangent points for a right-circle to right-circle connection.

Your tangent point function should return points $p_{t1}$ and $p_{t2}$, which respectively define the appropriate tangent points on $C_1$, the agent’s right-turn circle, and $C_2$, the query configuration’s right-turn circle.

Now that we’ve calculated geometrically the points we need to travel through, we need to transform them into a control the agent can understand. Such a control should be in the form of “do this action at this timestep”. For the Dubins car, this is pretty easy. If we define a control as a pair of (steering angle, timesteps), then we can define an **RSR** trajectory as an array of 3 controls: $(-\theta_{max}, t_1)$, $(0, t_2)$, and $(-\theta_{max}, t_3)$. We still need to compute the number of timesteps, but this isn’t so bad.
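As an illustrative sketch, such a control array might look like the following; `max_steer` and the three timestep counts are placeholder values of my own, not outputs of an actual planner:

```python
# A control is a (steering angle, timesteps) pair; right turns are negative.
max_steer = 0.5           # radians; assumed maximum steering angle
t1, t2, t3 = 40, 100, 25  # timestep counts for each segment (placeholders)

rsr_controls = [
    (-max_steer, t1),  # turn right onto the first tangent point
    (0.0,        t2),  # drive straight along the tangent line
    (-max_steer, t3),  # turn right again until the goal is reached
]
```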

We know $(x_s, y_s)$, the position of our starting configuration, as well as our outer tangent point $p_{t1}$. Both of these points lie on the circumference of $C_1$, the minimum turning-radius “right-turn only” circle to the right of $s$, the starting configuration. Now we can take advantage of our ArcLength function to compute the distance between the two points given a right turn. Typically, simulations don’t update entire seconds at once, but rather some small timestep $\delta t$. This is a type of integration called Euler integration that allows us to closely approximate the actual trajectory the car should follow, though not perfectly so. Because we’ve defined the Dubins car’s update function as a linear function of the form

$$x_{t+1} = x_t + \cos(\theta_t),\quad y_{t+1} = y_t + \sin(\theta_t),\quad \theta_{t+1} = \theta_t + u$$

then we actually travel in a straight line from timestep to timestep. A big $\delta t$ therefore results in a poor approximation of reality. As $\delta t$ approaches 0, though, we more closely approximate reality. Typically a delta in the range of [0.01, 0.05] is appropriate, but you may need finer or coarser control. Accounting for $\delta t$, our update equations become:

$$x_{t+1} = x_t + \delta t \cos(\theta_t),\quad y_{t+1} = y_t + \delta t \sin(\theta_t),\quad \theta_{t+1} = \theta_t + \delta t \cdot u$$

Now that we have decided on a value for $\delta t$, we can compute the number of timesteps over which to apply each steering angle. Given the length $L$ of an arc (computed using the ArcLength function) or of a straight segment (just simply the magnitude of the tangent line), the number of timesteps is $n = L / \delta t$, assuming unit speed.
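A minimal sketch of this Euler integration, assuming unit speed and treating the steering control directly as a turn rate (the function names are my own):

```python
import math

def simulate(x, y, theta, controls, delta_t=0.05):
    """Euler-integrate a unit-speed car through a list of
    (steering, timesteps) controls. The steering value is applied
    directly as the turn rate u in theta' = theta + delta_t * u."""
    for steer, steps in controls:
        for _ in range(steps):
            x += delta_t * math.cos(theta)
            y += delta_t * math.sin(theta)
            theta += delta_t * steer
    return x, y, theta

def timesteps_for(length, delta_t=0.05):
    """Number of Euler steps needed to cover an arc or segment
    at unit speed."""
    return round(length / delta_t)
```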

Computing the other trajectories now becomes a matter of getting the right tangent points to plan towards. The **RSR** and **LSL** trajectories will use outer tangents while the **RSL** and **LSR** trajectories use the inner tangents.

Two of the Dubins shortest paths don’t use the tangent lines at all. These are the **RLR** and **LRL** trajectories, which consist of three tangential, minimum-radius turning circles. Very often, these trajectories aren’t even valid, because of how close $s$ and $g$ must be to each other for such an arrangement of the three circles to be possible. If the distance between the centers of the agent’s and the query configuration’s turning circles is less than $4 r_{min}$, then a **CCC** curve is valid. I say less than $4 r_{min}$, because if the distance were exactly equal, then a **CSC** trajectory is more optimal.
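As a small sketch of that validity check (the function name is my own):

```python
import math

def ccc_possible(c1, c2, r):
    """A CCC (RLR/LRL) arrangement exists only when the two
    turning-circle centers are strictly closer than 4 * r."""
    d = math.hypot(c2[0] - c1[0], c2[1] - c1[1])
    return d < 4.0 * r
```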

For **CCC** trajectories, we must calculate the location of the third circle, as well as its tangent points to the circles near s and g. To be more concrete, I’ll address the **LRL** case, shown in Figure 13.

Consider the minimum-radius turning circle to the “left” of $s$ to be $C_1$ and the minimum-radius turning circle to the “left” of $g$ to be $C_2$. Our task is now to compute $C_3$, a minimum-radius turning circle tangent to both $C_1$ and $C_2$, plus the points $p_{t1}$ and $p_{t2}$, which are the respective points of intersection between $C_1$ and $C_3$, and between $C_2$ and $C_3$. Let $P_1$, $P_2$, and $P_3$ be the centers of $C_1$, $C_2$, and $C_3$, respectively. We can form the triangle $\triangle P_1 P_2 P_3$ using these points. Because $C_3$ is tangent to both $C_1$ and $C_2$, we already know the lengths of all three sides.

Segments $\overline{P_1 P_3}$ and $\overline{P_2 P_3}$ have length $2 r_{min}$, and segment $\overline{P_1 P_2}$ has length $D = \|P_2 - P_1\|$. We are interested in the angle $\theta$ at $P_1$, because that is the angle that the line between $P_1$ and $P_2$ (call it $V_1$) must rotate to face the center of $C_3$, which will allow us to calculate $P_3$. Using the law of cosines, we can determine that

$$\theta = \arccos\left(\frac{D}{4 r_{min}}\right)$$

In an **LRL** trajectory, we’re interested in a $C_3$ that is “left” of $V_1$. At the moment, we have an angle $\theta$ that represents the amount of rotation that vector $V_1$ must rotate through to point at the center of $C_3$. However, $\theta$’s value is only valid if $V_1$ is parallel to the x-axis. Otherwise, we need to account for the amount $V_1$ is rotated using the $\text{atan2}$ function: $\text{atan2}(V_1)$. For an **LRL** trajectory, we want to add $\theta$ to this value, but for an **RLR** trajectory we want to subtract $\theta$ from it, to obtain a circle “to the right” of $V_1$. (*Remember that we consider left, or counter-clockwise, turns to be positive.*)

Now that $\theta$ represents the absolute amount of rotation, we can compute

$$P_3 = \left(x_{P_1} + 2 r_{min} \cos\theta,\; y_{P_1} + 2 r_{min} \sin\theta\right)$$

Computing the tangent points $p_{t1}$ and $p_{t2}$ becomes easy; we can define vectors from $P_1$ and $P_2$ to $P_3$, and walk down them a distance of $r_{min}$. As an example, I’ll calculate $p_{t1}$. First, obtain the vector $V$ from $P_1$ to $P_3$: $V = P_3 - P_1$. Next, change the vector’s magnitude to $r_{min}$ by normalizing it and multiplying by $r_{min}$ (or just dividing its components by 2, as its magnitude should already be $2 r_{min}$). Next, compute $p_{t1}$ using the new $V$:

$$p_{t1} = P_1 + V$$

Computing $p_{t2}$ follows a similar procedure. At this point we have everything we need for our **CCC** trajectory! We have three arcs as before: from $s$ to $p_{t1}$, from $p_{t1}$ to $p_{t2}$, and from $p_{t2}$ to $g$. One can compute arc lengths and durations as before to finish things off.
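Putting the LRL construction together, a hedged sketch might look like this (the names are my own; it assumes the CCC arrangement is valid, i.e. the centers are closer than four radii):

```python
import math

def lrl_third_circle(p1, p2, r):
    """Center of the middle circle C3 and the tangent points pt1, pt2
    for an LRL trajectory, given the centers p1 and p2 of the two
    left-turn circles, each of radius r. Assumes |p2 - p1| < 4r."""
    v1 = (p2[0] - p1[0], p2[1] - p1[1])
    d = math.hypot(v1[0], v1[1])
    # interior angle at P1 of the isosceles triangle P1-P2-P3 (law of cosines)
    theta = math.acos(d / (4.0 * r))
    # measure theta from the x-axis by adding atan2(v1); LRL adds, RLR subtracts
    theta += math.atan2(v1[1], v1[0])
    p3 = (p1[0] + 2.0 * r * math.cos(theta),
          p1[1] + 2.0 * r * math.sin(theta))
    # tangent points sit halfway between the centers of tangent circles
    pt1 = ((p1[0] + p3[0]) / 2.0, (p1[1] + p3[1]) / 2.0)
    pt2 = ((p2[0] + p3[0]) / 2.0, (p2[1] + p3[1]) / 2.0)
    return p3, pt1, pt2
```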

So now that we’ve computed all of the shortest paths for the Dubins car, what can we do next? Well, you could use them to write a car simulator. The paths are very quick to compute, but they are what’s known as “open-loop” trajectories. An open-loop trajectory is only valid if you apply the desired controls exactly as stated, something only a computer could do. Even if you used a computer to control your car, this would still only work perfectly in simulation. In the real world we have all kinds of sources of error, which we represent as uncertainty in our controls. If a human driver tried to apply our Dubins shortest-path controls, they wouldn’t end up exactly on target, would they? Perhaps in a later article I will discuss motion planning under uncertainty, but for now I refer you to the internet.

Another application for these shortest-path calculations is something known as a rapidly exploring random tree, or RRT. An RRT uses the know-how of planning between two points to find reachable points in an environment that has obstacles. The resulting structure of connected points is a tree rooted at our start, and the paths one gets from the tree can look pretty good; imagine a car deftly maneuvering around things in its way. RRTs are a big topic of research right now, and Steven LaValle has a wealth of information about them.

Following from the Dubins RRT idea, you might one day work with a much more complicated car-like robot, and for it you may also want to use an RRT. Remember when I said that the resulting structure of an RRT is a tree with its root at our start? Well, to “grow” this tree, one has to continue adding points to it, and part of that process is determining which point in the tree to connect to. Typically this is done by just choosing the closest point in the tree. “Closest” usually means “the shortest straight line between myself and any point in the tree”. For a lot of robots (holonomic ones especially), this straight-line distance is just fine, but for car-like (nonholonomic) robots it may not be good enough, because you can’t always move in a straight line between start and goal. Therefore, you could use the length of the Dubins shortest path instead of the straight-line shortest path as your measure of distance! (We’re using the Dubins shortest paths as a “distance metric”.)

In reality, the Dubins shortest paths are not a true distance metric because they are not “symmetric”, which means that the distance from A to B isn’t always the same as the distance from B to A. Because the Dubins car can only move forward, its paths aren’t symmetric. However, if you extended the Dubins car to be able to move backwards, you’d have created a Reeds-Shepp car, and the Reeds-Shepp shortest paths are symmetric.

I’d like to thank my colleague Jory Denny for LaTeX sorcery and various grammar edits.

I’ve made the code (sans graphics stuff) available on my GitHub account: https://github.com/gieseanw/Dubins
