Often when you’re trying to debug a piece of code, the debugger steps into a block you didn’t expect it to. Most of the time this is because your code has a logic error, and it wasn’t doing what your mental model thought it would.

Other times it’s because you accidentally compiled with optimizations on, and the compiler did some magic to make the outcome the same even if the code was different.

This post talks about a scenario where it’s *neither* of those things. Instead it’s something much more dangerous and difficult to detect — a violation of the One Definition Rule (ODR).

In this post I’ll talk a little about C++’s One Definition Rule, and then discuss how a violation manifested itself in a project I was working on. I’ll talk about how to detect such violations, and how I resolved mine.

(I talk a lot about violating ODR in the context of a tool I use (SWIG), but the manner in which I violated ODR is applicable to any C++ library that links to another.)

SWIG is a handy little tool I use to generate Python bindings for my C++ code. If I have the following code in a header file:

```cpp
template<class T>
class AndyGsFancyContainer {
public:
    void insert(int key, T value) { /*...*/ }
    T get(int key) { /*...*/ }
    //...
};
```

I can write a nifty SWIG interface file like so:

```
// swig.i
%module fancy_container
%{
#include "AndyGsFancyContainer.h"
%}
%include "AndyGsFancyContainer.h"
%template(AndyGsFancyCharContainer) AndyGsFancyContainer<char>;
```

When I run SWIG with swig.i as input, it spits out a bunch of Python C API code that I can compile into a .pyd library (really just a renamed .dll). It also spits out a Python module named fancy_container.py that I can import and call through to my C++.

When a Python script calls into my fancy C++ container, the behind-the-scenes control flow looks something like this: Python script → fancy_container.py → SWIG-generated wrapper (.pyd) → my C++ code.

You don’t always want SWIG to wrap *every* single nook and cranny of your header file, so there are ways to tell SWIG to ignore those bits:

```cpp
#ifndef SWIG
void RazzleDazzle(std::unique_ptr<std::vector<int>> vals); // make sparks come out of the nearest outlet!
#endif
```

Our RazzleDazzle function is awe-inspiring, but perhaps not something we want Python users to be able to call. Mostly because we don’t want to jump through all the hoops of telling SWIG how to transform unique_ptr and std::vector into types that Python can understand.

Using this preprocessor symbol is perfectly fine for our code, perhaps to some readers’ surprise. SWIG has its own preprocessor that it runs on your C++ code before spitting out what it wants to wrap, and it behaves as if the “SWIG” symbol is always defined at this time.

The “SWIG” symbol never actually gets defined either in the code or on the compiler command line.

We run afoul of the C++ Standard when we decide that we want to hide code from Python specifically (versus, say, C# bindings).

To hide code from a specific target language, SWIG utilizes alternative preprocessor symbols:

```cpp
template<class T>
class AndyGsFancyContainer {
public:
#ifndef SWIGPYTHON
    // too dangerous in the hands of a Python developer
    virtual void RazzleDazzle(std::unique_ptr<std::vector<int>> vals) // make sparks come out of the nearest outlet!
    { /*...*/ }
#endif
    void insert(int key, T value) { /*...*/ }
    T get(int key) { /*...*/ }
    //...
};
```

Why is this a problem? Because, while the C++ that SWIG spits out doesn’t define the “SWIG” symbol, it *does* define the “SWIGPYTHON” symbol. And we’ve silently violated the almighty One Definition Rule.

A silent violation of the One Definition Rule is something you’ll rarely encounter in day-to-day development. Most of the time the compiler or linker will scream and yell at you about multiply-defined symbols until you fix it.

```
error LNK2005: "int variable_defined_in_header" (?variable_defined_in_header@@3HA) already defined in Source.obj
```

The silent (but deadly) ones are often a byproduct of a mistake in your build process. You may run into it when attempting to integrate third party libraries that were compiled differently or on a different version of the same compiler. You may run into it in your own code when you apply different optimization switches to different parts of the code, or if you inconsistently define preprocessor symbols that affect which code gets compiled at all.

The code above runs afoul of the last one. But let’s not get ahead of ourselves: what is the One Definition Rule in the first place? It says that you can only have one definition for any type, variable, function, etc. Let’s elaborate.

A .cpp file in your project is its own little universe. It has no care at all for what other .cpp files are doing. It does, however, care about types and functions declared in its dependent header (.h) files. Some of these types and functions it will create definitions for. This is why you end up with .h/.cpp pairs like Foo.h and Foo.cpp; Foo.cpp creates definitions for the declarations in the header.

For the types and functions that the .cpp file does NOT define, though, it’s generally good enough for the .cpp file to assume that someone else is going to create the definitions for them.

When compilation time rolls around, the compiler takes each .cpp file independently and compiles it into its own binary format (.obj in MSVC, .o in gcc). What does it do about the functions that weren’t defined, though?

It marks those symbols in some way to imply nothing is known about them yet. That’s where the linker comes in.

After compilation, the linker gathers up all your object files with the goal of creating your library (.dll or .so). Its job is to aggregate all definitions from all the .cpp files, and make sure that all the “nothing known about yet” areas get resolved.

If there’s a spot where a definition doesn’t get resolved, you get a linker error. The linker error will often say something along the lines of “undefined external symbol ‘blah'” which, following my explanation above, is perfectly clear; you have a .cpp file that relies on a type or function whose definition couldn’t be found by the linker.

Often it’s just because you have a typo, like you declared some function Blarg() in a header, but then typed it as Blargh() in the .cpp file.

Like I said above, the linker takes all the definitions and aggregates them into a single .dll or .so file. What happens if you have two definitions of the same thing?

According to the C++ standard, a single .cpp (more formally, translation unit) shall not contain more than a single definition of anything. From a practical standpoint, this is something the compiler can detect while it’s compiling that .cpp file because it has both definitions readily available.

In this situation, the compiler will be nice enough to give you an error:

```cpp
// Foo.h
struct Foo {
    // declaration
    void DoAThing();
};

// Foo.cpp
#include "Foo.h"

// first definition
void Foo::DoAThing() { std::cout << "Doing a thing!\n"; }

// second definition; the compiler will give you an error
void Foo::DoAThing() { std::cout << "Doing a thing!!!\n"; }
```

That's great and all, but what if you put the two definitions into different .cpp files?

```cpp
// Foo.h
struct Foo {
    // declaration
    void DoAThing();
};

// Foo_one.cpp
#include "Foo.h"

// first definition
void Foo::DoAThing() {
    std::cout << "Doing a thing!\n";
}

// Foo_two.cpp
#include "Foo.h"

// second definition
void Foo::DoAThing() {
    std::cout << "Second Doing a thing!\n";
}
```

What will happen? The compiler only knows about a single .cpp file at a time, so it goes on its merry way creating Foo_one.obj and Foo_two.obj. Then the linker comes in and sees that there are two definitions for Foo::DoAThing(). How does the linker know which one is right?

This, of course, is a violation of the One Definition Rule. We are required to provide only a single definition of Foo::DoAThing(), and yet we provided two.

However, we're still in pretty safe* territory because your linker will give you an error along the lines of "One or more multiply defined symbols found".

The real nastiness happens when we start introducing templates (or inline functions, but I won’t talk about those so much).

When you use a template in C++, the compiler is going to instantiate the definition of that template right then and there. You don’t think about it, but you’re using templates from the standard template library all the time.

```cpp
// SourceFile1.cpp
std::vector<int> some_data;

// SourceFile2.cpp
std::vector<int> more_data;
```

Whoops, now we have a definition of an integer std::vector in two different translation units. Isn’t this an error?

It’s not, because the One Definition Rule makes special exceptions for templates (for obvious reasons). You’re allowed to have more than one definition of a template in your program so long as they’re all the same.

How similar qualifies as “the same”? The standard is pretty strict about this. All definitions must consist of “the same sequence of tokens”. On top of that, each name used by the template must resolve to the exact same entity. So if your template refers to “Bob”, it cannot refer to “namespace_a::Bob” in one translation unit, and “namespace_b::Bob” in another. There are other requirements, but you get the gist of it.

Think about trying to enforce these requirements from the compiler and linker’s standpoint. The linker would have to pretty deeply examine every single template instantiation to enforce these rules.

At scale, you’re probably using many templates over and over. To save on time, the compiler may cache template instantiations for reuse. The linker, upon seeing the same definition twice, may simply discard the newer one.

So, for performance reasons, the C++ Standard dictates that if you **violate** the One Definition Rule with respect to templates, the behavior is simply undefined. Since the linker doesn’t care, violations of this rule are **silent**.

By hand-waving violations of ODR away as undefined, compilers and linkers have an easier job to do, but it makes our job as C++ programmers a little harder (as if C++ development wasn’t hard enough!).

There are many ways you may silently run afoul of ODR in C++, but it almost always boils down to this:

You compile different translation units with different compiler options, and then you use them together.

That’s pretty much it! To avoid ODR violations, you need to be very sure that the same type appears the same and is compiled the same way in every place that type is used.

Let’s take a look at some concrete examples of how one might accidentally run afoul of this (seemingly) simple rule.

The size of a class in C++ is often thought of as the sum of the sizes of its constituent members, but this is not always the case.

The compiler has free rein to (and usually does) insert padding between class members so that they align to certain byte boundaries in order to make fetching them more efficient.

Take this contrived class for example:

```cpp
struct RedditPost {
    int user_id;      // who made the post
    double post_time; // seconds since epoch
    int upvotes;      // negative implies downvotes
};
```

If I tell you that sizeof(int) == 4, and sizeof(double) == 8, then what is sizeof(RedditPost)? You might justifiably say it’s 4 + 8 + 4 = 16 bytes, but in reality most compilers will tell you that sizeof(RedditPost) == 24.

Why? Alignment rules are complicated, and I’m not going to go into great detail about them, but I’ll say this: the compiler wants to align individual members of a class for easy access, as well as instances of the class as a whole in order to optimize access in the context of an array of that class type. (One of the best articles I ever read about alignment was by Eric S. Raymond in his article “The Lost Art of Structure Packing”.)

What this boils down to saying is that the compiler treats our RedditPost class like so:

```cpp
struct RedditPost {
    int user_id;               // 4 bytes (total so far: 4)
    unsigned char padding1[4]; // post_time needs to start at an address divisible by 8, so add 4 bytes (total so far: 8)
    double post_time;          // 8 bytes (total so far: 16)
    int upvotes;               // 4 bytes; no padding because 16 is divisible by 4 (total so far: 20)
    unsigned char padding2[4]; // add 4 bytes to get to 24, the nearest multiple of 8 (total: 24 bytes)
};
```

(See the comments within the code for why).

For the sake of completeness, note that we could “fix” things by grouping like types:

```cpp
struct RedditPost {
    int user_id;      // 4 bytes (total: 4)
    int upvotes;      // 4 bytes (total: 8)
    double post_time; // 8 bytes (total: 16); no padding needed because we're already on an 8-byte boundary
};
```

But let’s ignore that for a minute.

The compiler doesn’t have to insert this padding. In fact, you sometimes want to tell the compiler NOT to add any padding so that your class is exactly the size of its constituent members. You might need this if you are modeling a network packet following the standardized description in an official RFC.

For this reason, all compilers have some option for you to tell the compiler not to add any padding. (In MSVC you can do this via the compiler switch /Zp.)

So what happens if we have two interdependent libraries, library1.cpp and library2.cpp, that both use our RedditPost class, and we compile like so:

```
library1.cpp -o library1.dll /Zp1
library2.cpp -o library2.dll
```

(Note that MSVC, gcc, and clang all have ways to specify padding directly in your source files. In MSVC this is via #pragma pack(N), and in gcc/clang it’s __attribute__((packed, aligned(1))) )

Library1 thinks that RedditPost is 16 bytes and Library2 thinks it’s 24 bytes. Why is this a problem? If one DLL creates a RedditPost instance that the other is supposed to access, that access will not be correct. The resulting data that is accessed is undefined, and if you’re lucky then you’ll get a segfault instead of silent garbage.

Perhaps some well-intentioned developer added some diagnostic information to a class like so:

```cpp
struct Person {
    std::string name;
#if defined(DEBUG)
    int callCount = 0;
#endif
    void Print() {
        std::cout << "name is " << name << std::endl;
#if defined(DEBUG)
        std::cout << "This function has been called " << callCount << " times\n";
#endif
    }
};
```

What happens in this situation?

```
library1.cpp -o library1.dll
library2.cpp -o library2.dll -DDEBUG
```

Since we’ve mixed a debug and non-debug compiled library, the sizeof(Person) is different because in library2 it has an extra integer member.

Hiding virtual member functions behind preprocessor flags can be similarly problematic; the compiler has to build a virtual table for your classes that have virtual members, which changes the class’ size and layout. I’ll talk a little more about this in a second.

Recall that the definitions of your classes must be symbol-for-symbol identical irrespective of translation unit. (Mark Nelson wrote about an interesting experience he had with violating this rule in a less malignant way in 2014.)

The list goes on and on, with the main theme being that the code itself is fine but the build process is broken.

Since ODR violations are undefined behavior, there’s no set way that they will surface. Oftentimes it may be a segfault that is indistinguishable from e.g., accessing one past the end of an array.

Like in Mark Nelson’s case, it may just be that you are expecting output A from some function, but instead see output B, and it has nothing to do with branching.

When I hid my virtual function RazzleDazzle from SWIG’s C++ translation unit, it affected how SWIG built its virtual table. The way this manifested was via a call from a base class pointer to a derived class’s overridden method. Instead of entering the RazzleDazzle function, control entered a different virtual function altogether! On top of that, these functions had compatible signatures. The result was that completely unexpected code was being executed, which eventually put an object into a bad state where the runtime finally, thankfully, segfaulted.

Your compilers have tools to help you out here. Remember earlier how I said that linkers discard multiple definitions as an optimization technique?

In GCC you can basically turn that off with the `-flto-odr-type-merging` switch. And when you turn that switch on, since GCC 5.1 the (poorly documented at this time of writing) `-Wodr` switch gets turned on with it to give you the actual ODR violation warnings.

Clang’s `-Wodr` switch attempts to do the same thing as GCC’s flag.

While I haven’t used it, Google’s ASAN appears to be the perfect tool for the job, performing granular testing of multiply defined symbols to ensure they’re the same.

In MSVC, there is the officially unofficially supported /d1reportSingleClassLayout that prints out a handy ASCII representation of your class layout.

```
1>class RedditPost size(16):
1>  +---
1>  0 | user_id
1>  4 | upvotes
1>  8 | post_time
1>  +---
```

I used this tool successfully to prove my SWIG-related ODR violation, but it’s only really useful after the ODR violation has manifested itself.

To detect ODR violations, Microsoft recommends #pragma detect_mismatch (Stephan T. Lavavej gives a better description of this pragma around the 29:00 mark in this video). In my opinion, it’s pretty intrusive.
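For reference, here’s roughly what that looks like. The pragma records a name/value pair in each object file, and the MSVC linker errors out if two object files recorded different values for the same name. A sketch (the header and key names are mine; guarded so other compilers skip the MSVC-specific pragma):

```cpp
// Person.h (hypothetical shared header)
#ifdef _MSC_VER
  #if defined(DEBUG)
    #pragma detect_mismatch("person_abi", "debug")
  #else
    #pragma detect_mismatch("person_abi", "release")
  #endif
#endif

struct Person { /* ... */ };
```

Every translation unit that includes this header stamps its object file; mixing a -DDEBUG library with a release one now fails at link time instead of silently corrupting memory.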

The One Definition Rule permeates all the C++ code you write, subconsciously or not, and it’s just *so easy* to violate it. Fortunately, the compiler can catch a lot of the simple mistakes, but for a staggering amount of code the compiler and linker tell you that you’re on your own. What’s worse is the embarrassing lack of built-in diagnostic support your linkers provide in the current landscape.

Nailing down your build process should make things an order of magnitude safer. If you don’t already have an automated build, get one. I cannot stress enough how important it is to give your build some TLC. If your management balks at hiring *at least* one FTE to manage your build, remind them that it’s the only reason you have a product, no build = no product.

(*for inlined functions maybe not so safe)

As a C++ project grows and matures, the following line is inevitably spoken: “The build is too slow”. It doesn’t really matter how long the build actually takes; it is just taking longer than it used to. Things like this are an inevitability as the project grows in size and scope.

In this post I’ll talk specifically about my recent use of **forward declarations** to vastly improve build times on one of those projects, and how you can too.

What are forward declarations?

A forward declaration in C++ is when you declare something before its implementation. For example:

```cpp
class Foo; // a forward declaration for class Foo

// ...

class Foo { // the actual declaration for class Foo
    int member_one;
    // ...
};
```

You can forward declare more than just a class, but in this article I’m only referring to class forward declarations.

When you forward declare a class, the class type is considered to be “incomplete” (the compiler knows about the name, but nothing else). You cannot do much with an incomplete type besides use pointers (or references) to that type, but pointers are all that we will need. (More on that in a bit.)

When the compiler is creating your class, it doesn’t actually care about very much. Its goal is ultimately to determine the class’ layout in memory, and to do that, it needs to know the size of your class’ data members. For example:

```cpp
struct Foo {
    int a;
    int b;
};
```

Our class `Foo` has two integer members. When the compiler creates a layout for this class, it will allocate approximately `sizeof(int) + sizeof(int)` contiguous space for it (padding and custom-alignment directives notwithstanding).

When `Foo` has a dependency on `Bar`, the compiler needs to know the size of `Bar` as it compiles `Foo`:

```cpp
struct Bar {
    int a;
};

struct Foo {
    int a;
    int b;
    Bar c;
};
```

In the code above, when the compiler reaches `Foo`, it already knows what the size and alignment of `Bar` is. (“Alignment” is a property of a class that dictates how much space the compiler will allocate for it. A thorough discussion is outside the scope of this article, but The Lost Art of C Structure Packing gives it a good treatment.)

If we reversed the order like so:

```cpp
struct Foo {
    int a;
    int b;
    Bar c;
};

struct Bar {
    int a;
};
```

We would likely end up with a compiler error, because the compiler cannot possibly determine a layout for `Foo` without first knowing the layout for `Bar`. If `Bar` was in its own header file, we would need to include it in Foo’s header file:

#include "Bar.h" struct Foo{ int a; int b; Bar c; };

So now Foo.h has a dependency on Bar.h.

And what if we complicate `Bar` to have another member, `Baz`?

#include "Baz.h" struct Bar{ int a; Baz b; };

Now Bar.h depends on Baz.h. Foo.h directly depends on Bar.h, and indirectly on Baz.h. You can see the beginnings of a “dependency graph” forming here. As your codebase grows, you can imagine how large these dependency graphs might get.

Why is this a bad thing? The C++ compiler takes a simplistic approach to handling these dependency graphs — during the “pre-processing” stage of compilation it just copy-pastes one header into another, collapsing the graph into one gargantuan source file. Just check the documentation for what “#include” actually does!

The `Foo` class might not actually care at all about the `Baz` class; the `Bar` class (which has a `Baz` member) may only use it internally. So at compilation time, `Foo` is paying for the compiler to parse something it doesn’t even care about! This violates one of the core tenets of C++: “Only pay for what you use”.

What’s worse, if Baz.h changes, then the compiler must recompile `Foo`! Not only does it take longer to compile `Foo`, but we must also compile more often. Good grief.

We’ve decided that we don’t like how Foo.h depends on Baz.h through Bar.h, so we decide to solve the problem with a little forward declaration. If `Bar` forward-declares `Baz`, and then uses a pointer to `Baz`, then the compiler no longer needs to know anything about the size and layout of `Baz` when creating a layout for `Bar`:

```cpp
// Bar.h
class Baz;

struct Bar {
    int a;
    Baz* b;
};

// Bar.cpp
#include "Bar.h"
#include "Baz.h"
// ... (use our pointer to Baz)
```

This works because, from a size perspective, all pointers are exactly the same. That means we don’t need to know the full definition of `Baz` until we try to access one of its members. `Bar` still depends on `Baz`, since the translation unit is per .cpp file, but the dependency is left out of Bar.h.

This causes something interesting to happen to Foo.h:

// Foo.h #include "Bar.h" struct Foo{ int a; int b; Bar c; }; // Foo.cpp #inlude "Foo.h" // ...

Nowhere in the included files for Foo.h will we find Baz.h. This means that:

- if Baz.h changes, only Bar.cpp will recompile
- the preprocessed source file for `Foo` will not include the contents of Baz.h

Now that there’s less work for the preprocessor and compiler to do for `Foo`, it goes faster. It takes up less memory. It needs to be rebuilt *less often*! With forward declarations we’ve improved both full rebuilds and incremental rebuilds.

The Google style guide recommends against using forward declarations, and for good reasons:

- If someone forward declares something from namespace std, then your code exhibits undefined behavior (but will likely work).
- Forward declarations can easily become redundant when an API is changed such that it’s unavoidable to know the full size and alignment of a dependent type. You may end up with both a #include and a forward declaration in your header in this case.
- There are some rare cases where your code may behave differently. You may not be bringing in additional function overloads or template specializations that you previously relied upon, or you lose inheritance information which can cause a different overload or specialization to be called in the first place.

Also, notice that when we transformed a class member to a pointer, we likely had to start dealing with heap-allocated memory for each instance of `Foo`. If `Foo` needs to access its `Bar` pointer very often, the small overhead of a pointer indirection can add up. It’s also not very cache-friendly; members of `Foo` are not together in memory, which could cause a cache miss when trying to access a member of `Bar` at runtime (very expensive).

Like any other technique, forward declarations must be used carefully.

The Modules TS may present another safer alternative to improving build times by removing the need for the preprocessor to paste in entire headers over and over again for different translation units.

I was recently tasked with the onerous job of “improve the build” for a mid-sized code base.

How did I decide to start with forward declarations?

For starters, forward declarations are low-hanging fruit as far as improving build time goes. It’s much easier to routinely go through the code adding forward declarations than it is to change an interface, pull files out into new libraries, or build facades. In short, they are a lazy programmer’s best friend.

Also, having had some experience with the code base, I knew that the code had:

- many classes that only used pointers to our types
- many unnecessary includes (for historical reasons, laziness, or naiveté…)
- plenty of automated unit tests to ensure I didn’t accidentally break something

I also had a bit of a hint that our header dependencies were a little bloated when I found that Visual Studio’s built-in dependency graph generator consistently crashed when I tried to run it on our code base. Still, I didn’t have any real proof that forward declarations would actually improve anything at all. Just an intuition. So we decided to be Agile about it.

I took the top 5-10 headers that were most often included, and I made it a challenge to replace them with forward declarations wherever I could. If doing this improved things at all, then I could go ahead and take a more comprehensive approach.

What we found was that we could do a full rebuild of our C++ **10% faster!** Along the way I gained even more confidence that a more comprehensive approach would yield additional gains.

Before I continue, here are some of the pain points you’ll discover when you set about replacing your headers with forward declarations:

- other random files will start breaking from missing includes (that they used to indirectly have).

This is frustrating, but on a positive note, it forces your codebase to follow a best practice — a translation unit should be self-contained. Never should you rely on an indirect include because future refactoring efforts will needlessly break your code and cause headaches for other programmers.

Because you will likely have to fix unrelated code…

- work like this ends up touching way more files than you initially thought.

This is a bit of a nightmare for your code reviewers. Everyone groans when they see hundreds of changed files in a code review. The solution to this is to communicate to your reviewers what’s going on ahead of time; they don’t need to look at every single file in the review. Perhaps a random sampling, or just some of the more important headers.

After I showed the team our 10% speedup on rebuild, they were as hungry as I was to see more. I got the go-ahead to spend a week touching as many headers as I could for forward declaration work. At the end of it all, I had gone through perhaps two-thirds of all header files. The result? An additional 30% faster compile time for a total of **40% faster C++ compile times**! (Plus Visual Studio stopped crashing when generating the dependency graph.)

This result was quite surprising; I had thought we would already start to see diminishing returns after the first go. I must admit that I didn’t have the luxury of being purely scientific; I was also removing unnecessary headers along the way, but I will assert that the work was predominantly forward declarations.

Best practices have their flip-sides. The Google style guide (and a few of my coworkers) made some good points against the usage of forward declarations, but the real world results of the technique are undeniable. All the developers are happier; the build => test => run cycle is faster for them. The automated builds are faster. The compiler’s memory usage is down.

The point about memory usage becomes more important for large parallel builds. In fact, we were occasionally running out of memory, and this work has abated those issues (for the time being; forward declarations are really just a band-aid on an architectural issue).

Time is money, and in a larger project with many well-paid people, saving even a small amount of time has an economy-of-scale effect; provably thousands or hundreds of thousands of dollars saved in development time.

Oftentimes I see questions on StackOverflow asking something to the effect of “how do I store multiple types in the same vector?”

The canonical, final, never-going-to-change answer to this question is a thorough “you can’t”.

C++ is a statically-typed language. A vector will hold an object of a single type, and only a single type.

Of course there are ways to work around this. You can hide types within types! In this post I will discuss the existing popular workarounds to the problem, as well as describe my own radical new heterogeneous container that has a much simpler interface from a client’s perspective.

boost::variant, for one (and now std::variant), has allowed us to specify a type-safe union that holds a flag to indicate which type is “active”.

The downside to variant data structures is that you’re forced to specify a list of allowed types ahead of time, e.g.,

```cpp
std::variant<int, double, MyType> myVariant;
```

To handle a type that could hold “any” type, Boost then created boost::any (and with C++17 we’ll get std::any).

This construct allows us to really, actually hold any type. The downside here is that now the client generally has to track which type is held inside. You also pay a bit of overhead in the underlying polymorphism that comes with it, as well as the cost of an “any_cast” to get your typed object back out.

The real “but…” portion of the answer to these StackOverflow questions will state that you *could* use polymorphism to create a common base class for any type that your vector will hold, but that is ridiculous, especially if you want to hold primitive types like int and double. Assuming your design isn’t totally off-base, what you really wanted to do was use a std::variant or std::any and apply the Visitor Pattern to process it.

The visitor pattern implementation for this scenario basically works like this:

- Create a “callable” (a class with an overloaded or templated function call operator, or a polymorphic lambda) that can be called for any of the types.
- “Visit” the collection, invoking your callable for each element within.

I’ll demonstrate using std::variant.

First we start by creating our variant:

```cpp
std::variant<int, double, std::string> myVariant;
myVariant = 1; // initially it's an integer
```

Then we write our callable (our visitor):

```cpp
struct MyVisitor {
    void operator()(int& _in) { _in += _in; }
    void operator()(double& _in) { _in += _in; }
    void operator()(std::string& _in) { _in += _in; }
};
```

This visitor will double the element that it visits. That is, “1” shall become “2”, or “MyString” shall become “MyStringMyString”.

Next we invoke our visitor using std::visit:

```cpp
std::visit(MyVisitor{}, myVariant);
```

That’s it! You’ll notice that all the code for each type in MyVisitor is identical, so we could replace it all with a template. Let’s go with that template idea and create a visitor that will print the active element:

```cpp
struct PrintVisitor {
    template <class T>
    void operator()(T&& _in) { std::cout << _in; }
};
```

Or we could similarly have created an identical polymorphic lambda like so:

```cpp
auto lambdaPrintVisitor = [](auto&& _in) { std::cout << _in; };
```

Applying our PrintVisitor or lambdaPrintVisitor will have the same effect:

```cpp
std::visit(PrintVisitor{}, myVariant);     // will print "2"
std::visit(lambdaPrintVisitor, myVariant); // will also print "2"
```

And for a string:

myVariant = "foo"; std::visit(MyVisitor{}, myVariant); // doubles to "foofoo" std::visit(lambdaPrintVisitor, myVariant); // prints "foofoo"

Here’s a working demo on Wandbox that demonstrates the above code snippets.

Moving from a single variant to a collection of variants is pretty easy — you shove the variant into a std::vector:

```
std::vector<std::variant<int, double, std::string>> variantCollection;
variantCollection.emplace_back(1);
variantCollection.emplace_back(2.2);
variantCollection.emplace_back("foo");
```

And now to visit the collection, we apply our visitors to each element in the collection:

```
// print them
for (const auto& nextVariant : variantCollection)
{
    std::visit(lambdaPrintVisitor, nextVariant);
    std::cout << " ";
}
std::cout << std::endl;

// double them
for (auto& nextVariant : variantCollection)
{
    std::visit(MyVisitor{}, nextVariant);
}

// print again
for (const auto& nextVariant : variantCollection)
{
    std::visit(lambdaPrintVisitor, nextVariant);
    std::cout << " ";
}
std::cout << std::endl;
```

Here’s a live demo of that code.

Now we begin to see how tedious it is to write those loops manually, so we encapsulate the vector into a class with a visit mechanism. We make it a template so that the client can specify the underlying types that go into the variant:

```
template <class... T>
struct VariantContainer
{
    template <class V>
    void visit(V&& visitor)
    {
        for (auto& object : objects)
        {
            std::visit(visitor, object);
        }
    }

    using value_type = std::variant<T...>;
    std::vector<value_type> objects;
};
```

And then our code from above to visit it is nicely shortened to:

```
VariantContainer<int, double, std::string> variantCollection;
variantCollection.objects.emplace_back(1);
variantCollection.objects.emplace_back(2.2);
variantCollection.objects.emplace_back("foo");

// print them
variantCollection.visit(lambdaPrintVisitor);
std::cout << std::endl;

// double them
variantCollection.visit(MyVisitor{});

// print again
variantCollection.visit(lambdaPrintVisitor);
std::cout << std::endl;
```

At this point we’re thinking we are pretty smart. The client can simply direct us on how heterogeneous they want that container to be! And then we start wrapping up the vector a little nicer, hiding the underlying storage, and replicating the rest of std::vector’s interface so someone can call “emplace_back” directly on our container etc, and we’re rolling!

```
heterogeneous_container c;
c.push_back('a');                // char
c.push_back(1);                  // int
c.push_back(2.0);                // double
c.push_back(3);                  // another int
c.push_back(std::string{"foo"}); // a string

// print it all
c.visit(print_visitor{}); // prints "a 1 2 3 foo"
```

A C++ programmer would tell you that this isn't possible, "unless you're doing a bunch of very expensive cast-and-checks somewhere" (e.g. with std::any).

But I will now demonstrate how such an interface is possible (without RTTI) using C++14 and C++17 features (the C++17 features are not necessary, just nice-to-haves).

(**Author's note:** The following is intended as a toy, not to be used in any real implementation. It has a gaping security hole in it. Think of this more as an exercise in what we can do with C++14 and C++17.)

Now, admittedly, we can never be as flexible as a duck-typed language such as Python. We cannot create new types at runtime and add them to our container, and we can't easily iterate over the container; we must still use a **visitor pattern**.

Let’s begin with a feature that was added in C++14: **Variable templates**

If you’ve done any templating in C++ before, you’re familiar with the typical syntax to template a function so that it can operate on many types:

```
template <class T>
T Add(const T& _left, const T& _right)
{
    return _left + _right;
}
```

But variable templates allow us to interpret a *variable* differently depending on a type. Here’s an example where we can interpret the mathematical constant π (pi) differently:

template<class T> constexpr T pi = T(3.1415926535897932385);

And now we can explicitly refer to pi<double> or pi<float> to more easily express the amount of precision we need.

Recall that when you instantiate a template, you are really telling the compiler to copy-paste the template code, substituting for each type. Variable templates are the same way. That is, “pi<double>” and “pi<float>” are two separate variables.

What happens when we move a variable template into a class?

Well, the rules of C++ dictate that such a variable template must be static, so any instantiation of the template will create a new member shared across all class instances. But with our heterogeneous container we want instances to only know or care about the types that have been used *for that specific instance*! So we abuse the language and create a mapping of container pointers to vectors:

```
namespace andyg{
struct heterogeneous_container
{
private:
    template<class T>
    static std::unordered_map<const heterogeneous_container*, std::vector<T>> items;

public:
    template <class T>
    void push_back(const T& _t)
    {
        items<T>[this].push_back(_t);
    }
};

// storage for our static members
template<class T>
std::unordered_map<const heterogeneous_container*, std::vector<T>> heterogeneous_container::items;
} // andyg namespace
```

And now suddenly we have a class which we can add members to *after* creating an instance of! We can even declare a struct later and add that in, too:

```
andyg::heterogeneous_container c;
c.push_back(1);
c.push_back(2.f);
c.push_back('c');

struct LocalStruct{};
c.push_back(LocalStruct{});
```

There are quite a few shortcomings we still need to address before our container is really useful in any way. One of them is the fact that when an instance of andyg::heterogeneous_container goes out of scope, all of its data still remains within the static map.

To address this, we will need to somehow track which types we received, and delete the appropriate vectors. Fortunately we can write a lambda to do this and store it in a std::function. Let’s augment our push_back function:

```
template<class T>
void push_back(const T& _t)
{
    // don't have it yet, so create functions for destroying
    if (items<T>.find(this) == std::end(items<T>))
    {
        clear_functions.emplace_back(
            [](heterogeneous_container& _c){ items<T>.erase(&_c); });
    }
    items<T>[this].push_back(_t);
}
```

Where "clear_functions" is a new member of the class that looks like this:

std::vector<std::function<void(heterogeneous_container&)>> clear_functions;

Whenever we want to destroy all elements of a given andyg::heterogeneous_container, we can call all of its clear_functions. So now our class can look like this:

```
struct heterogeneous_container
{
public:
    heterogeneous_container() = default;

    template<class T>
    void push_back(const T& _t)
    {
        // don't have it yet, so create a function for destroying
        if (items<T>.find(this) == std::end(items<T>))
        {
            clear_functions.emplace_back(
                [](heterogeneous_container& _c){ items<T>.erase(&_c); });
        }
        items<T>[this].push_back(_t);
    }

    void clear()
    {
        for (auto&& clear_func : clear_functions)
        {
            clear_func(*this);
        }
    }

    ~heterogeneous_container()
    {
        clear();
    }

private:
    template<class T>
    static std::unordered_map<const heterogeneous_container*, std::vector<T>> items;

    std::vector<std::function<void(heterogeneous_container&)>> clear_functions;
};

template<class T>
std::unordered_map<const heterogeneous_container*, std::vector<T>> heterogeneous_container::items;
} // andyg namespace
```

Our class is starting to become pretty useful, but we still have issues with copying. We’d have some pretty disastrous results if we tried this:

```
andyg::heterogeneous_container c;
c.push_back(1);
// more push_back

{
    andyg::heterogeneous_container c2 = c;
}
```

The solution is fairly straightforward; we follow the same pattern as when we implemented "clear", with some additional work to be done for a copy constructor and copy assignment operator. In push_back, we'll create another function that can copy a vector<T> from one heterogeneous_container to another, and in copy construction/assignment, we'll call each of our copy functions:

```
struct heterogeneous_container
{
public:
    heterogeneous_container() = default;

    heterogeneous_container(const heterogeneous_container& _other)
    {
        *this = _other;
    }

    heterogeneous_container& operator=(const heterogeneous_container& _other)
    {
        clear();
        clear_functions = _other.clear_functions;
        copy_functions = _other.copy_functions;
        for (auto&& copy_function : copy_functions)
        {
            copy_function(_other, *this);
        }
        return *this;
    }

    template<class T>
    void push_back(const T& _t)
    {
        // don't have it yet, so create functions for copying and destroying
        if (items<T>.find(this) == std::end(items<T>))
        {
            clear_functions.emplace_back(
                [](heterogeneous_container& _c){ items<T>.erase(&_c); });
            // if someone copies me, they need to call each copy_function and pass themself
            copy_functions.emplace_back(
                [](const heterogeneous_container& _from, heterogeneous_container& _to)
                {
                    items<T>[&_to] = items<T>[&_from];
                });
        }
        items<T>[this].push_back(_t);
    }

    void clear()
    {
        for (auto&& clear_func : clear_functions)
        {
            clear_func(*this);
        }
    }

    ~heterogeneous_container()
    {
        clear();
    }

private:
    template<class T>
    static std::unordered_map<const heterogeneous_container*, std::vector<T>> items;

    std::vector<std::function<void(heterogeneous_container&)>> clear_functions;
    std::vector<std::function<void(const heterogeneous_container&, heterogeneous_container&)>> copy_functions;
};

template<class T>
std::unordered_map<const heterogeneous_container*, std::vector<T>> heterogeneous_container::items;
} // andyg namespace
```

And now our container is starting to become useful. What other function do you think you could implement to follow the pattern we established with “clear” and “copy”? Perhaps a “size” function? A “number_of<T>” function? A “gather<T>” function to gather all elements of type T for that class?

We can't really do anything with our andyg::heterogeneous_container yet, because we cannot iterate over it. We need a way to *visit* the container so that useful computation can be performed.

Unfortunately we cannot do it as easily as calling std::visit on a std::variant. Why? Because a std::variant implicitly advertises the types it holds within. Our andyg::heterogeneous_container does not (and cannot). So we need to push this advertisement onto the visitor. That is, the *visitor* will advertise the types it is capable of visiting. Of course this is a pain point of using an andyg::heterogeneous_container, but alas, tradeoffs must be made.

So how do we make it easy for the client to write a visitor? We can use some lightweight inheritance to automatically provide client visitors a way to publish the types they can visit.

First we'll start with a generic "type list" kind of class that takes advantage of the variadic template feature that arrived with C++11:

template<class...> struct type_list{};

And then we’ll create a templated base class for visitors that defines a type_list:

```
template<class... TYPES>
struct visitor_base
{
    using types = andyg::type_list<TYPES...>;
};
```

How can we use this base class? Let’s demonstrate with an example.

Say I have some heterogeneous container:

```
andyg::heterogeneous_container c;
c.push_back(1);   // int
c.push_back(2.2); // double
```

and now I want to visit this class so that I can double the members. Similar to how we wrote a visitor class for std::variant, we'll write a structure that overloads the function call operator for each type. This structure should inherit from our "visitor_base":

```
struct my_visitor : andyg::visitor_base<int, double>
{
    void operator()(int& _i) { _i += _i; }
    void operator()(double& _d) { _d += _d; }
};
```

And now our visitor implicitly defines a type called “types” that is templated on “int” and “double”.

We could rewrite our visitor with a template instead, like before:

```
struct my_visitor : andyg::visitor_base<int, double>
{
    template<class T>
    void operator()(T& _in) { _in += _in; }
};
```

Although you wouldn’t be allowed to declare this struct locally within a function on account of the template.

Next up: writing the "visit" method of the andyg::heterogeneous_container class.

Like we did for our "VariantContainer" above, the visitor pattern basically amounts to calling std::visit for each element in the container. Problem is, std::visit won't work here, so we have to implement it ourselves.

Our strategy will be to use the types published by the visitor class and invoke the function call operator for each type. This is easy if we use some helper functions. But first, the main entry point, “visit()”:

```
template<class T>
void visit(T&& visitor)
{
    visit_impl(visitor, typename std::decay_t<T>::types{});
}
```

Note that I don't really constrain the template at all. So long as T has a type named "types", I use it. In this way, one wouldn't be constrained to use an andyg::visitor_base.

Next you'll notice that the call to "visit_impl" doesn't just pass on the "visitor" received in "visit"; it additionally *constructs an instance* of "T::types". Why? Well, the reasoning is similar to the reasoning behind tag dispatching, but not quite. We simply want an easy way to pass our typelist ("types") as a template parameter to "visit_impl".

I think it’s better explained if we look at the declaration for “visit_impl”:

```
template<class T, template<class...> class TLIST, class... TYPES>
void visit_impl(T&& visitor, TLIST<TYPES...>)
```

Like before, we receive our visitor as “T”, but then we use a template template argument to indicate that the incoming “types” object itself is templated, and that those types it is templated on can be referred to as “TYPES”. When we call “visit” with the “my_visitor” defined above, the resulting type substitution will appear as:

void visit_impl(my_visitor& visitor, andyg::type_list<int, double>)

The second argument is unnamed, indicating that it is unused and really only there to help us out in our metaprogramming. An optimizing compiler should be able to optimize out any real construction of the type, but even if it doesn’t we only lose 1 byte because the class itself is empty.

Essentially we want to say “for each type in TYPES, iterate over the associated vector<type> and visit each element”. However C++ doesn’t allow “iteration” over types. In the past we’ve been forced to use tail recursion — receive a parameter pack like “<class HEAD, class… TAIL>” process “HEAD”, then recurse for “TAIL”, eventually reaching a base case of a single type. Such expansion creates a lot of work for the compiler, and the separation of functions makes the code that much harder to read.

An alternative that was done in C++11 and C++14 was called “simple expansion” and involved abusing the comma operator, and placing a function call into an array initializer list, for which the array was cast to void so that the compiler would not actually allocate. It sounds complicated because it is. It would have looked like this:

```
using swallow = int[];
(void)swallow{0, (void(visit_impl_help<T, TYPES>(visitor)), 0)...};
```

Fortunately for us, C++17 made such hackery redundant with the introduction of fold expressions, which, for simplicity’s sake you can imagine as calling a single function for each type in a parameter pack. The syntax does take a little getting used to, but here’s what the implementation of “visit_impl” looks like:

```
template<class T, template<class...> class TLIST, class... TYPES>
void visit_impl(T&& visitor, TLIST<TYPES...>)
{
    (..., visit_impl_help<std::decay_t<T>, TYPES>(visitor));
}
```

This is formally called a unary left fold and what happens is that the resulting expression for our “my_visitor” will cause it to be expanded as such:

```
visit_impl_help<T, int>(visitor), visit_impl_help<T, double>(visitor);
```

This approach also abuses the comma operator. You can see that the introduction of more types in “TYPES” would result in more commas and expressions between those. The result is a function call for each type.

You’ll again notice that I’m calling *yet another helper* and I promise it’s the last one! I wanted a clean looking function to iterate over a vector. Here’s its definition:

```
template<class T, class U>
void visit_impl_help(T& visitor)
{
    for (auto&& element : items<U>[this])
    {
        visitor(element);
    }
}
```

By overloading the function call operator in our visitor class, we can treat “visitor” itself as a function here and simply call it for each element. That’s pretty neat! In the final demo code, I additionally added a static_assert (which itself uses some complicated metaprogramming to detect a proper overloaded function call operator) to visit_impl_help so that clients don’t get stuck in template error hell if they miswrote their visitor class.

At this point we can basically do anything we want with the class: creation, copying, destroying, assignment, and visiting (everything else is gold plating). We can even declare a new class after creating our container instance, and then add an instance of that new class into the container. Whoa.

Here’s a sample code run:

```
// my_visitor defined as above, and print_visitor pretty obvious
auto print_container = [](andyg::heterogeneous_container& _in)
{
    _in.visit(print_visitor{});
    std::cout << std::endl;
};

andyg::heterogeneous_container c;
c.push_back('a');
c.push_back(1);
c.push_back(2.0);
c.push_back(3);
c.push_back(std::string{"foo"});

std::cout << "c: ";
print_container(c);

andyg::heterogeneous_container c2 = c;
std::cout << "c2: ";
print_container(c2);

c.clear();
std::cout << "c after clearing c: ";
c.visit(print_visitor{});
std::cout << std::endl;

std::cout << "c2 after clearing c: ";
print_container(c2);

c = c2;
std::cout << "c after assignment to c2: ";
print_container(c);

my_visitor v;
std::cout << "Visiting c (should double ints and doubles)\n";
c.visit(v);
std::cout << "c: ";
print_container(c);

struct SomeRandomNewStruct{};
c.push_back(SomeRandomNewStruct{});
```

And its output:

```
c: 1 3 2 a foo
c2: 1 3 2 a foo
c after clearing c:
c2 after clearing c: 1 3 2 a foo
c after assignment to c2: 1 3 2 a foo
Visiting c (should double ints and doubles)
c: 2 6 4 a foo
```

And there you have it. Here’s a live running demo on WandBox. The live demo includes some additional tests and exercise of some “nice-to-have” features of a heterogeneous container. (For you experts, of course the templating is simplified and in a production environment there should be better forwarding semantics, static_asserts, and flexibility in the templating).

The primary difference in storage between an andyg::heterogeneous_container and a std::vector<std::any> is that, in an andyg::heterogeneous_container, all elements of the same type are stored contiguously. This allows extremely fast iteration as compared to a std::vector<std::any>, which gets bogged down by a number of try-catches during visitation. By "extremely fast", I mean that it's actually an order of magnitude faster. Run the timings yourself here.

C++14 and C++17 offer us some pretty powerful new metaprogramming tools in the form of variable templates, variadic lambdas, standardized variants, and fold expressions, among others. By experimenting with these we can begin pushing the boundaries of what we thought was possible in C++. In this post we created a new kind of visitor pattern that has very nice syntax and won't be slowed down by any run-time type checks (the visiting already knows exactly which types it's iterating over), but also has some drawbacks (the least of which is the gaping security hole where one andyg::heterogeneous_container can see the contents of *any other* andyg::heterogeneous_container).

Even though I consider the container I just wrote to be an incomplete plaything that is interesting only for the concepts it demonstrates, depending on your use case you might actually want to "borrow" it to satisfy yet another impossible requirement your customer gives you.

When you are watching a digitally-rendered battle onscreen in the latest blockbuster movie, you don’t always think about the “camera” moving about that scene. In the real-world, cameras have a *field of view* that dictates how much of the world about them they can see. Virtual cameras have a similar concept (called the *viewing frustum*) whereby they can only show so much of the digital scene. Everything else gets chopped off, so to speak. Because rendering a digital scene is a laborious task, computer scientists are very interested in making it go faster. Understandably, they only want to spend time drawing pixels that you’ll see in the final picture and forget about everything else (outside the field of view, or occluded (hidden) behind something else).

Our friends in the digital graphics world make heavy use of *planes* everyday, and being able to test for plane-plane intersection is extremely important.

In this post, I’ll try to break down what a plane is in understandable terms, how we can create one given a triangle, and how we would go about testing for the intersection between two of them.

(This article was originally written in LaTeX and transcribed here. To get a hard copy that you can also cite, grab the pdf here.)

Say you’re given a triangle in 3D space. It consists of three points, A, B, and C, that each have x, y, and z components:

A = (A_x, A_y, A_z)
B = (B_x, B_y, B_z)
C = (C_x, C_y, C_z)

For simplicity’s sake, we’ll assume that, moving “counter-clockwise”, the triangle’s points go in the order A, B, and then C (and then back to A).

A triangle is distinctly “flat”. It’s like a perfectly constructed Dorito. Also, no matter which way you turn it and rotate it, there’s always a definite front and back. That is, if you extended the corners of your Dorito off towards infinity, then you could completely bisect the room, your state, the planet, the universe!

This ability to split a space in two is a salient feature of a plane, and we can construct one with the triangle we’ve just defined. In the next few sections we’ll go more into depth in exactly how to do that.

In the previous section I said that our gargantuan flat Dorito had two distinct sides, front and back. That means you could draw a straight line from the “back” that eventually comes out the “front”. Imagine that our Dorito is so large that it is its own planet orbiting the Sun. You could hop in a space ship, fly up to that Dorito, take one small step for man, and then plant your flag on top. In geometric terms, your flag pole would be the **normal vector** to the plane’s (Dorito’s) surface.

A **normal vector** to a plane is a line that is perfectly perpendicular to any other line drawn entirely within the plane. We call it a vector because it has a direction associated with it. That is, the flag you planted on the Dorito planet points “up”, but you could just have easily landed on the other side and planted the flag pointing “down”, and it would still be perfectly valid; all we need is just to choose one.

Our normal vector will also have x, y, and z components, but we’ll label them a, b, and c to distinguish them from the points in the triangle:

N = (a, b, c)

At this point we *have* a triangle, but we only *know about* what a normal vector is. How do we compute the normal vector given a triangle’s points?

Our normal vector is perpendicular to any line we draw in the plane. In computer graphics terms, we say the normal vector is “orthogonal” to our plane. A nice example of orthogonality is that the Y-axis is perpendicular to the X-axis (and the Z-axis is perpendicular to both!).

In geometry, we know how to obtain a perpendicular vector (a normal vector) so long as we have 2 non-parallel vectors in the plane. But first we need to get those two vectors in the plane. So how do we do that?

Remember from before that we defined our plane originally using three points, A, B, and C. We can use these points to compute our vectors.

Think about it this way. The center of the earth is position (0, 0, 0), and you’re standing at some random point A. Your friend Andy is standing at position B. Vectors imply a *direction*, so how do you get a direction from you to your friend? Subtraction! You can compute your friend’s offset from yourself by subtracting your own position from Andy’s: B − A. The resulting offset is a direction out from the center of the earth in the same direction as the direction from you to your friend.

If your other friend Bernhard is standing at position C, you’d do the exact same thing to get a vector representing the direction from you to him. So long as all three of you are not standing in a line, you now have two non-parallel vectors. Fortunately, our original points A, B, and C define a triangle, and none of the sides of that triangle are parallel, so we’re good.

In our example, you are standing at A, Andy is at B, and Bernhard is at C. We can compute two non-parallel vectors, u and v, as

u = B − A
v = C − A

In geometry there is this fancy thing called a **cross product** that enables us to compute an orthogonal (perpendicular) vector so long as we are given two input vectors. What does this mean? Well, there are a couple different interpretations, and I’ll work up to them after first discussing something called a *dot product*.

The cross product is a special kind of multiplication. The multiplication you were taught in grade school resulted in numbers many times larger than they were before. For example, 9 × 8 = 72, which is quite a bit larger than either 9 or 8. The “problem” here is that 72 doesn’t have any sense of **direction** to it!

Our grade school multiplication has a ready analog in the world of vectors (things that have direction) called the **dot product**: we start by multiplying the matching components! Afterwards we add them all together to get a single value:

u · v = u_x·v_x + u_y·v_y + u_z·v_z

The dot product can be thought of as “multiplication in the same direction” because we multiply the x components together, the y components together, as well as the z components.

An alternate interpretation of the dot product is that it’s a measure of just “how much” in the same direction your two vectors are:

u · v = |u| |v| cos(θ)

Where |u| and |v| are the **magnitudes** of the vectors:

|u| = √(u_x² + u_y² + u_z²)

and θ is the angle between them.

When θ is 0, the vectors are in the same direction (parallel), meaning the cosine is 1 (its largest value) and therefore the result is the largest possible dot product between the two vectors. (Data miners love to use this and call it “cosine similarity”). But if the vectors were orthogonal (perpendicular), the dot product would be zero.

The **cross product** is similar to the dot product except it’s more of a measure of how *different* two vectors are. The data miners might choose the following representation of a cross product:

|u × v| = |u| |v| sin(θ)

When the sin(θ) term is 0, the vectors are in the same direction (parallel), which maximizes the dot product, but the cross product is zero! And when θ is 90 degrees (π/2 radians), the vectors are perpendicular and the cross product is maximized! (The dot product, sadly, becomes 0).

There exists another way to compute a cross product where the result is not a single number, but rather a vector that is orthogonal to both the input vectors:

u × v = (u_y·v_z − u_z·v_y, u_z·v_x − u_x·v_z, u_x·v_y − u_y·v_x)   **(1)**

It’s not the most intuitive computation. Personally, I enjoyed the “xyzxyz” explanation at Better Explained.

All we need to know is that the result of this equation is a shiny new orthogonal vector.

We now have two vectors in our plane, u and v, which we computed using our triangle’s points A, B, and C. We also know how to take two vectors and compute an orthogonal vector. Our *normal vector* is exactly that; an orthogonal vector to our plane, so when we apply the cross product to u and v, we obtain our normal vector N:

Which gives us

N = u × v = (a, b, c)

Now that we have a normal vector, we can define our plane intuitively. We know that our normal vector is perpendicular to all vectors in the plane, and as we saw before, the **dot product** of any two vectors is zero if the vectors are orthogonal. We can therefore say that, given our point A on the plane, and any other point P = (x, y, z) on the plane, the dot product between N and P − A (subtracting two points results in a vector) is 0:

N · (P − A) = 0   **(2)**

Which is our generic representation of a plane. This is a nice way to think about a plane, but alone it doesn’t help us find the intersection between two planes. Ultimately what we want is a system of linear equations like the title talked about.

Given our plane equation

N · (P − A) = 0

Let’s expand the terms.

After performing the dot product:

a·(x − A_x) + b·(y − A_y) + c·(z − A_z) = 0

Distributing a, b, and c:

a·x − a·A_x + b·y − b·A_y + c·z − c·A_z = 0

We know what the values of N and A are (they’re constants) because they were given to us in the plane’s definition, so let’s move them to the other side of the = sign in order to have only variables on one side and only constants on the other:

a·x + b·y + c·z = a·A_x + b·A_y + c·A_z

and since a·A_x + b·A_y + c·A_z is constant, it will be easier to relabel it as d instead of writing it out every time. That is,

d = a·A_x + b·A_y + c·A_z

Which results in a linear equation that looks like this:

a·x + b·y + c·z = d

If we have two planes, then we’ll distinguish between their definitions via subscripts on our constants:

a1·x + b1·y + c1·z = d1
a2·x + b2·y + c2·z = d2   **(3)**

Finally we have a system of linear equations. Great! We have ways to solve those! However, this particular one presents a little problem. (A tiny one that we’ll make go away)

In linear algebra, we are often provided with a number of equations and an equal number of unknowns (variables) we must solve for. However, in *our* system, we appear to have more variables than equations!

Let’s take another look.

We have three variables to solve for (x, y, and z), and yet only two equations. This is called an **under-determined system**, meaning there are *infinite* solutions (assuming the planes intersect at all). Infinite? You see, when a line intersects a line, they intersect at a single point, but when a plane intersects another plane, that intersection is a *line*, which has infinitely many points. We would need three planes to intersect before we could find a single point of intersection.

There are more math-intensive approaches to solving under-determined systems that involve computing something called a **pseudo-inverse**, but for our purposes here, we can take advantage of our knowledge of the domain! (Also, a Moore-Penrose pseudo-inverse would only provide an approximate solution whereas we can compute an exact one here).

Let’s assume that the planes we’re intersecting are not parallel; they have a “nice” intersection. If this is true, then we’ll end up with a *line*. Because planes are infinitely large, the resulting line will be infinitely long. At some point, this line will cross at least one of the x, y or z axes.

How can I be so sure of that? Start by thinking of it this way: we can represent a line as a point plus a direction of travel. We can move forwards and backwards in that direction (this is called the **vector equation of a line**), as shown below:

L(t) = P0 + t·D

where P0 is a point on the line, D is the direction of travel, and the scalar t slides us along the line.

If we don’t want our line to pass through x = 0, then we fix x to some arbitrary non-zero constant like x = 1. In two dimensions, this would give us a vertical line. In three, it gives us a line with direction as a function of y and z. If we don’t want the line to cross y = 0 either, then we fix y to some arbitrary constant like y = 1. Now our line varies only in the z direction. However, if we didn’t want the line to cross z = 0, and tried to assign it to some arbitrary constant as well, say z = 1, then we’d end up with a single point (1, 1, 1), not a line. If we allow the line’s direction to be a function of z, then z = 0 is a feasible value, and if we allow it to be a function of y and z, then both y = 0 and z = 0 are feasible values. Thus we can always guarantee that our line passes through zero for at least one of our axes. In summary, in an n-dimensional system, if we fix n − 1 dimensions to non-zero values and allow the last dimension to vary, 0 is included in the last dimension’s possible values, otherwise you would not have a line.

When a line crosses through 0 for one of our axes, this is great! It’s great because if a component at that point has a value of zero, then its associated term in equation 3 will drop out. This leaves us with a system that is *not* under-determined; we will have two variables and two linear equations. And as I discussed in my previous post, we know how to solve for that. At this point we’ve obtained a point on the line of intersection!

Okay, we know how to solve a 2D system of linear equations to get a single point, but what use is that? Indeed a point by itself isn’t a solution, we need a line. The point we obtained, however, is in fact the first part in determining the line of intersection between two planes.

We still have two problems to solve:

- How do we discover which component will become zero?
- Even if we obtained a single point on the line, how do we figure out the rest of the line?

Let’s solve each of these in turn.

I said before that we could use our knowledge of the domain to solve this plane-plane intersection problem. That knowledge is that, if the planes intersect “nicely”, then eventually that line will pass through zero for at least one of $x$, $y$, or $z$. So how do we discover if it’s $x$, $y$, or $z$? The short answer is: why not try them all?

First we set $x$ to $0$ (causing its term to drop out) in equation 3, leaving us with two lines. We can test to see if the resulting two lines intersect at all by testing to see if they’re parallel via the cross-product. If that doesn’t work, we move on to $y$, and then $z$.
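The try-them-all strategy is straightforward to sketch in code. Here I assume each plane is stored as the coefficients $(a, b, c, d)$ of $ax + by + cz = d$; that representation is my own choice for the sketch:

```python
def point_on_intersection(p1, p2, eps=1e-9):
    """Find one point on the intersection line of two planes by fixing
    x = 0, then y = 0, then z = 0, and solving the leftover 2x2 system."""
    d1, d2 = p1[3], p2[3]
    for i in range(3):                       # the coordinate we pin to zero
        j, k = [c for c in range(3) if c != i]
        det = p1[j] * p2[k] - p1[k] * p2[j]
        if abs(det) > eps:                   # residual 2D lines not parallel
            point = [0.0, 0.0, 0.0]
            point[j] = (d1 * p2[k] - p1[k] * d2) / det   # Cramer's rule
            point[k] = (p1[j] * d2 - d1 * p2[j]) / det
            return point
    return None                              # the planes are parallel

# The planes y = 0 and z = 0 intersect along the x axis.
print(point_on_intersection((0, 1, 0, 0), (0, 0, 1, 0)))
```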

The result is that we’ll have a point on the line of intersection where one of the components is zero, like $(0, y_0, z_0)$. All that remains is to discover the direction of the line.

I actually hinted at this earlier, and you may have caught it if you were reading closely.

Up until now, we’ve assumed that the two planes intersect and are not co-planar. In other words, the planes are not parallel.

How can we make sure of that? Well, we already know our two planes have normal vectors $\vec{n}_1$ and $\vec{n}_2$, respectively. Also, we know that the cross product (equation 1) between these normal vectors gives us a new vector that is orthogonal to both. What happens when you compute the cross product of a vector with another in the same direction? Well, since the angle between them is $0$ (or $180$) degrees, the resulting vector is the zero vector, with magnitude $0$.

For our purposes, when we encounter parallel planes, we will disregard this case (if the planes are co-planar, the intersection is a plane equal to one of the input planes; otherwise there is no intersection at all).

After we’ve ensured that the cross product between the two planes’ normal vectors is non-zero, we are left with a vector that’s orthogonal to both. How is this useful?

Well, we know from our earlier discussion that a normal vector is orthogonal to any vector we could draw in a plane. So it follows that a vector that is orthogonal to *two* normal vectors must lie in both those planes! And since our planes intersect at exactly *one* line, our fancy new vector we got from the cross product of the normals **must** be in the same direction as the line of intersection between the planes!

Armed with this information, we now have a point *and* a direction to represent our line of intersection.

If you’ll allow me, let the point on the line be $\vec{p}$, and the direction it travels in be $\vec{d}$. Now we can represent any point on our line as:

$\vec{r} = \vec{p} + t\vec{d}$ **(4)**

Where $\vec{r}$ is a resulting point, and $t$ is a value from $-\infty$ to $\infty$. Let’s go one step further and set $t$ to $1$ so that we can obtain a second point on the line, which we’ll call $\vec{q}$. And now we have two points on the line. We can use those two points to transform our line representation into Two-Point Form (like I use in my previous post), where a subscript of $p$ indicates a component of $\vec{p}$ ($p_x$, $p_y$, and $p_z$), and a subscript of $q$ indicates a component of $\vec{q}$:

$\frac{x - p_x}{q_x - p_x} = \frac{y - p_y}{q_y - p_y} = \frac{z - p_z}{q_z - p_z}$ **(5)**

And thus we’ve computed the line of intersection between two planes.
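Putting the point and the direction together, here is a sketch of the whole procedure. I assume each plane is given as a normal vector $\vec{n}$ and offset $d$ with $\vec{n} \cdot \vec{r} = d$ (my representation for the sketch):

```python
def cross(u, v):
    """Cross product of two 3D vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def intersection_line(n1, d1, n2, d2, eps=1e-9):
    """Return (point, direction) for the intersection of the planes
    n1 . r = d1 and n2 . r = d2, or None if they are parallel."""
    direction = cross(n1, n2)                  # parallel to both planes
    if max(abs(c) for c in direction) < eps:
        return None                            # parallel or co-planar planes
    # Zero the coordinate matching the cross product's largest component;
    # that component (up to sign) *is* the 2x2 determinant below, so the
    # remaining system is guaranteed solvable.
    i = max(range(3), key=lambda m: abs(direction[m]))
    j, k = [m for m in range(3) if m != i]
    det = n1[j] * n2[k] - n1[k] * n2[j]
    point = [0.0, 0.0, 0.0]
    point[j] = (d1 * n2[k] - n1[k] * d2) / det
    point[k] = (n1[j] * d2 - d1 * n2[j]) / det
    return point, direction

# Planes z = 0 and y = 0 meet along the x axis.
print(intersection_line((0, 0, 1), 0, (0, 1, 0), 0))
```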

Finding the line of intersection between two planes is generally done the same way as you’d intersect any two geometric objects — set their equations equal to each other and solve. We discovered here that the result was an under-determined system and we overcame that. This is exciting! From this understanding, we are ready to take the next steps and handle “real world” situations that involve line *segments* and triangles instead of mathematical lines and planes.

(Note: this article was originally written in and transcribed to WordPress, so forgive the equation alignment. Get the original).

In a previous post, I outlined an analytical solution for intersecting lines and ellipses. In this post I’m doing much the same thing, but rather with lines on lines. I’ll point out why the normal slope-intercept form for a line is a poor representation, and what we can do about that.

In computer graphics, or just geometry in general, you often find yourself in a scenario where you want to know if and how two or more objects intersect. For example, in the latest shooting game you’re playing perhaps a bullet is represented as a sphere and a target is represented as a disc. You want to know if the bullet you’ve fired has struck the target, and if so, was it a bulls-eye?

In this post, I’ll step you through one way we can accomplish discovering intersection points between two lines, being sure to carefully walk through each step of the calculation so that you don’t get lost.

We’re going to fly in the face of many approaches to line-line intersection here and try to ultimately wind up with a system of linear equations to solve. This is different from the “usual” way of finding intersections by crafting an equation to represent each geometric body, and then somehow setting those equations equal to each other. For example, if we used the familiar slope-intercept form to represent a line, we’d end up representing the lines as

$y = m_1 x + b_1$ **(1)**

$y = m_2 x + b_2$ **(2)**

Then afterwards we could solve for $x$ by setting the equations equal to each other:

$m_1 x + b_1 = m_2 x + b_2$

And with some algebraic fiddling, we could get $x$, and then take that and insert it into either line equation above to get $y$.

But here are some questions to think about.

Slope-intercept form assumes two things:

- Every line has a slope
- Every line has a y-intercept

This is all well and good for most lines. Even horizontal lines have a slope of 0, and a y-intercept somewhere. Our problem is with perfectly vertical lines:

What is the slope of a vertical line, since the “run” part of rise-over-run is zero? Vertical lines don’t touch the y-axis either unless they’re collinear with it, and even then there wouldn’t be just a single y-intercept.

Vertical lines are better represented as a function of $x$, like $x = 5$. The $y$ part isn’t even here; it just doesn’t exist! That’s because there are infinite values of $y$ for our given $x$.

Vertical lines are the bane of slope-intercept form’s existence. If both lines were vertical, we could test for that and then not bother with testing for intersection, but what if just one of them were vertical? How would we check for the intersection point then?

Well, let’s examine an alternative representation of a line. Those of you who have ever taken a linear algebra course should be familiar with it:

$ax + by = c$ **(3)**

Where $a$, $b$, and $c$ are known constant values, and we’re solving for $x$ and $y$.

Okay, what the heck is that? In the next section I’ll explain exactly how this solves our vertical line problem, but first I need to demonstrate to you how we can even *get* a line into that format.

“What? We don’t always use slope-intercept form??? All my teachers have lied to me!” In fact, there are many ways we can represent a line, and like any tool there’s a time and a place for each of them.

The representation we’re interested in here is called **Two-point form**. And we can derive it if we already have two points on the line (which is common if you have a bunch of line segments in the plane, like sketch strokes).

Given two points on the line, $(x_1, y_1)$ and $(x_2, y_2)$, we can represent the line in the following form, parameterized on $x$ and $y$:

$y - y_1 = \frac{y_2 - y_1}{x_2 - x_1}(x - x_1)$ **(4)**

Still doesn’t seem to help us, does it? We’ve gotten rid of the intercept, but we still have a slope, which becomes a big problem when $x_1 = x_2$. So let’s fix that by multiplying both sides of the equation by $x_2 - x_1$:

$(y - y_1)(x_2 - x_1) = (y_2 - y_1)(x - x_1)$ **(5)**

(This is called **Symmetric form**.) Let’s manipulate Equation 5 so that $x$ and $y$ (no subscripts) appear only once. Start by multiplying everything through:

$y x_2 - y x_1 - y_1 x_2 + y_1 x_1 = x y_2 - x y_1 - x_1 y_2 + x_1 y_1$

We notice that the $y_1 x_1$ terms can cancel out:

$y x_2 - y x_1 - y_1 x_2 = x y_2 - x y_1 - x_1 y_2$

Now we rearrange the left side in terms of $y$:

$(x_2 - x_1) y - y_1 x_2 = x y_2 - x y_1 - x_1 y_2$

And the right side in terms of $x$:

$(x_2 - x_1) y - y_1 x_2 = (y_2 - y_1) x - x_1 y_2$

Subtract the $(y_2 - y_1) x$ term from both sides:

$-(y_2 - y_1) x + (x_2 - x_1) y - y_1 x_2 = -x_1 y_2$

and add the $y_1 x_2$ term:

$-(y_2 - y_1) x + (x_2 - x_1) y = y_1 x_2 - x_1 y_2$

Notice now how similar this equation is to Equation 3.

We can define:

$a = -(y_2 - y_1), \quad b = x_2 - x_1, \quad c = y_1 x_2 - x_1 y_2$

Distributing the negative through for $a$:

$a = y_1 - y_2, \quad b = x_2 - x_1, \quad c = y_1 x_2 - x_1 y_2$ **(6)**

To be left with the same equation (restated here):

$a x + b y = c$

Now that we can represent a single line as a linear equation in two variables, $x$ and $y$, we can represent the intersection of two lines as a system of linear equations in two variables:

$a_1 x + b_1 y = c_1$

$a_2 x + b_2 y = c_2$

Where we compute each line’s $a$, $b$, and $c$ by using the two-point form mathematics from the previous section. With two equations and two unknowns, we can compute $x$ and $y$.

In the next part I will slowly walk through how we will solve this system using basic techniques from linear algebra (Warning! Lots of math incoming!).

The idea behind solving a system of equations like this is to get it into something called *row-echelon form*, which is a fancy way of saying “I want the coefficient on $x$ in the top equation to be $1$, and the coefficient of $y$ in the bottom equation to be $1$, with the other coefficients of $x$ and $y$ to be zero”:

$a_1 x + b_1 y = c_1$

$a_2 x + b_2 y = c_2$

Let’s begin with getting the coefficient on $x$ in the top row to be $1$. First divide through the top equation by $a_1$:

$\frac{a_1}{a_1} x + \frac{b_1}{a_1} y = \frac{c_1}{a_1}$

Simplifying:

$x + \frac{b_1}{a_1} y = \frac{c_1}{a_1}$

Notice that at this point, we’ve succeeded in getting a coefficient of $1$ on $x$ in the top equation.

The next thing we do is notice that, since we have a bare $x$ in the top equation, we could multiply it by $a_2$ and then subtract it from the $a_2 x$ in the bottom equation to get the bottom one’s coefficient on $x$ to be zero. However, we cannot do this solely on $x$. Instead, we’re restricted to what are called *Elementary Row Operations* that limit what we can do while preserving the correctness of the equations. So what we have to do is multiply the *entire* top row by $a_2$, and then subtract the top row from the bottom row, which looks like this:

$a_2 x + b_2 y - a_2\left(x + \frac{b_1}{a_1} y\right) = c_2 - \frac{a_2 c_1}{a_1}$

Which gives us a $0$ coefficient on $x$ in the second row!

$\left(b_2 - \frac{a_2 b_1}{a_1}\right) y = c_2 - \frac{a_2 c_1}{a_1}$

Let’s simplify the $y$ and constant terms in the second row to have a common denominator:

$\frac{a_1 b_2 - a_2 b_1}{a_1} y = \frac{a_1 c_2 - a_2 c_1}{a_1}$

The next step is to get a coefficient of $1$ on the $y$ term in the second row by dividing through the row by $\frac{a_1 b_2 - a_2 b_1}{a_1}$, which results in us replacing the coefficient on $y$ with a $1$, and then multiplying the right-hand term by the flipped fraction (remember that when we divide two fractions, we flip the second one and then multiply):

$y = \frac{a_1 c_2 - a_2 c_1}{a_1} \cdot \frac{a_1}{a_1 b_2 - a_2 b_1}$

We can cancel the $a_1$ terms in the multiplication, and the resulting equation becomes:

$y = \frac{a_1 c_2 - a_2 c_1}{a_1 b_2 - a_2 b_1}$

At this point, we’ve solved for $y$.

We *could* substitute for $y$ in the top row to solve for $x$, but a more linear-algebra-ish way would be to perform another elementary row operation — multiply the bottom row by $\frac{b_1}{a_1}$, and then subtract it from the top row. Here’s what that looks like:

$x + \frac{b_1}{a_1} y - \frac{b_1}{a_1} y = \frac{c_1}{a_1} - \frac{b_1}{a_1} \cdot \frac{a_1 c_2 - a_2 c_1}{a_1 b_2 - a_2 b_1}$

Now we’ve achieved a coefficient of $0$ on the $y$ term in the top row!

We’ve essentially solved for $x$ at this point, but the term on the other side of the $=$ is a little ugly, and we can simplify it. Let’s start by making the two parts of the term have the same denominator, which means we need to multiply $\frac{c_1}{a_1}$ by $\frac{a_1 b_2 - a_2 b_1}{a_1 b_2 - a_2 b_1}$ (which is really just multiplying by 1!):

$x = \frac{c_1(a_1 b_2 - a_2 b_1)}{a_1(a_1 b_2 - a_2 b_1)} - \frac{b_1(a_1 c_2 - a_2 c_1)}{a_1(a_1 b_2 - a_2 b_1)}$

Distributing the $c_1$ term in the numerator, and combining the terms because they have the same denominator:

$x = \frac{a_1 b_2 c_1 - a_2 b_1 c_1 - b_1(a_1 c_2 - a_2 c_1)}{a_1(a_1 b_2 - a_2 b_1)}$

When we distribute the $-b_1$ in the numerator, the $a_2 b_1 c_1$ term becomes positive:

$x = \frac{a_1 b_2 c_1 - a_2 b_1 c_1 - a_1 b_1 c_2 + a_2 b_1 c_1}{a_1(a_1 b_2 - a_2 b_1)}$

Which leaves us with both $-a_2 b_1 c_1$ and $+a_2 b_1 c_1$ in the numerator, which cancel:

$x = \frac{a_1 b_2 c_1 - a_1 b_1 c_2}{a_1(a_1 b_2 - a_2 b_1)}$

We can factor out $a_1$ in the numerator:

$x = \frac{a_1(b_2 c_1 - b_1 c_2)}{a_1(a_1 b_2 - a_2 b_1)}$

Finally, the $a_1$ terms in the numerator and denominator cancel, leaving us with:

$x = \frac{b_2 c_1 - b_1 c_2}{a_1 b_2 - a_2 b_1}$

Now we’re ready to say we’ve solved the linear system for $x$ and $y$, leaving us with

$x = \frac{b_2 c_1 - b_1 c_2}{a_1 b_2 - a_2 b_1}, \qquad y = \frac{a_1 c_2 - a_2 c_1}{a_1 b_2 - a_2 b_1}$ **(7)**

Substituting the values from Equation 6 into Equation 7, with line 1 through points $(x_1, y_1)$ and $(x_2, y_2)$ and line 2 through points $(x_3, y_3)$ and $(x_4, y_4)$, yields:

$x = \frac{(x_4 - x_3)(x_2 y_1 - x_1 y_2) - (x_2 - x_1)(x_4 y_3 - x_3 y_4)}{(y_1 - y_2)(x_4 - x_3) - (y_3 - y_4)(x_2 - x_1)}$

$y = \frac{(y_1 - y_2)(x_4 y_3 - x_3 y_4) - (y_3 - y_4)(x_2 y_1 - x_1 y_2)}{(y_1 - y_2)(x_4 - x_3) - (y_3 - y_4)(x_2 - x_1)}$ **(8)**

This formulation is identical to the one you’ll find on Wikipedia (although they’ve arranged the denominators slightly differently, but still mathematically equivalent).
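Equations 6 and 7 translate almost directly into code. Here is a sketch in Python; the coefficient convention follows my reading of the derivation above, and any sign-flipped variant of $a$, $b$, and $c$ describes the same line:

```python
def line_coeffs(p, q):
    """Two points -> (a, b, c) with a*x + b*y = c (Equation 6)."""
    (x1, y1), (x2, y2) = p, q
    return y1 - y2, x2 - x1, x2 * y1 - x1 * y2

def intersect(p1, q1, p2, q2, eps=1e-9):
    """Intersection point of the line through p1, q1 and the line
    through p2, q2, or None if the lines are parallel."""
    a1, b1, c1 = line_coeffs(p1, q1)
    a2, b2, c2 = line_coeffs(p2, q2)
    det = a1 * b2 - a2 * b1            # zero when the lines are parallel
    if abs(det) < eps:
        return None
    x = (c1 * b2 - b1 * c2) / det      # Equation 7
    y = (a1 * c2 - c1 * a2) / det
    return x, y

# The vertical line x = 2 meets y = x at (2, 2): no slope-intercept trouble.
print(intersect((2, 0), (2, 5), (0, 0), (1, 1)))
```

Note that the vertical line is handled with no special casing at all, which is the whole point of the $ax + by = c$ representation.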

Ironically, despite all that extra math, we’ll still have problems with vertical lines, but only when those lines are *parallel*, which means there are either infinitely many solutions (lines are collinear) or zero solutions (lines are parallel but not collinear, like in the figure below).

Checking for parallel lines is fortunately pretty simple.

Let’s say we have two line segments floating around the Cartesian plane:

To check for whether these are parallel, first imagine translating the lines so that they both have an endpoint at the origin $(0, 0)$:

There will be some angle $\theta$ between the two lines:

If this angle is $0$, or $0$ radians ($0°$), then $\sin(\theta)$ will be $0$.

Given that we are representing our lines using two points, we can use those points to create *vectors*, which are like our lines drawn from the origin: they have a magnitude and a direction.

To create a vector given points $P$ and $Q$, we just subtract one from the other (for our purposes here it doesn’t matter the order!):

$\vec{v} = Q - P$ **(9)**

We draw them with little arrows to indicate a direction:

After we’ve created vectors for both our lines, we can take advantage of a mathematical relationship between them called the cross product to find whether they’re parallel or not. For two vectors $\vec{u}$ and $\vec{v}$, the magnitude of their cross product is defined as

$\|\vec{u} \times \vec{v}\| = \|\vec{u}\| \, \|\vec{v}\| \sin(\theta)$ **(10)**

Where $\|\vec{u}\|$ and $\|\vec{v}\|$ are the **magnitudes** of the vectors, which can be thought of as a vector’s “length”. For some vector $\vec{v} = (v_x, v_y)$, it’s defined as

$\|\vec{v}\| = \sqrt{v_x^2 + v_y^2}$ **(11)**

However, for the sake of testing for parallel lines, what we’re really interested in is the $\sin(\theta)$ term in the cross product, where $\theta$ is the smallest angle between the vectors (the vectors are simultaneously separated by $\theta$ and $2\pi - \theta$ radians).

Assuming our vectors don’t have magnitudes of zero, when the cross product between them is zero (or really really close to zero) we consider the lines to be parallel because the angle between them must be zero. This tells us that the lines either never collide, or they’re infinitely colliding because they’re the same line (collinear). I leave checking for collinearity as an exercise to you.
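In code, the parallel test boils down to the scalar 2D cross product $u_x v_y - u_y v_x$, which equals $\|\vec{u}\| \, \|\vec{v}\| \sin(\theta)$ up to sign. A sketch:

```python
def cross2d(u, v):
    """z-component of the 3D cross product of two 2D vectors."""
    return u[0] * v[1] - u[1] * v[0]

def are_parallel(p1, q1, p2, q2, eps=1e-9):
    """True when the line through p1, q1 is parallel to the one through p2, q2."""
    v1 = (q1[0] - p1[0], q1[1] - p1[1])   # Equation 9: point pair -> vector
    v2 = (q2[0] - p2[0], q2[1] - p2[1])
    return abs(cross2d(v1, v2)) < eps

print(are_parallel((0, 0), (1, 1), (2, 0), (4, 2)))   # same direction
print(are_parallel((0, 0), (1, 1), (0, 0), (1, 2)))   # different directions
```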

One might say we solved our system of linear equations the long way. In fact, since we had exactly two equations and exactly two unknowns, we could have leveraged a mathematical technique known as Cramer’s rule. The method is actually not so hard to apply, but perhaps its correctness is a little more difficult to understand.

Checking for line-line intersections is a harder problem than it appears to be on the surface, but once we give it a little thought it all boils down to algebra.

One application for line-line intersection testing is in computer graphics. Testing for intersections is one of the foundational subroutines for computing Binary Space Partitioning (BSP) Trees, which are a way to efficiently represent a graphical scene, or even an individual object within a scene. BSP Trees are perhaps most famously known for their use by John Carmack in the Doom games.

In closing, we can consider a line in 2D to actually be a hyperplane that partitions the plane into areas “on one side or the other” of the line. Therefore, there are analogs in 3D space where we use a 3D hyperplane (what we typically think of as a plane) to partition the space again into areas “on one side or the other” of the plane.

The triangular numbers are an interesting mathematical phenomenon that appears constantly in computer science. When you, the programmer, talk about the Big-Oh complexity of a nested for loop that gets executed $1 + 2 + \cdots + n$ times, you might just slap $O(n^2)$ on it and call it a day.

But do you ever think about what that summation *actually* is? In this article I’ll present an alternative formulation of the series that I think is satisfying from a programmer’s point of view, and also present some interesting results from looking at the series in different ways.

(Note: this article was originally written in and transcribed to WordPress. Get it as a .pdf here)

Consider the following algorithm:
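In rough Python form, the algorithm in question is a doubly nested loop; `foo` here is a stand-in for whatever work the real Foo() does:

```python
def foo():
    pass  # stand-in for whatever work Foo() does

def run(n):
    """On outer iteration i, the inner loop calls foo() exactly i times."""
    calls = 0
    for i in range(1, n + 1):
        for j in range(1, i + 1):
            foo()
            calls += 1
    return calls

print(run(2))  # 1 + 2 = 3 calls
```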

How many times does Foo() execute? Let’s just walk through the case where $n = 2$.

- $i = 1, j = 1$, Foo();
- $i = 2, j = 1$, Foo();
- $i = 2, j = 2$, Foo();
- $i = 3$, stop

So, when $n = 2$, Foo() will be executed 3 times. What sort of pattern did it follow, though? Let’s break it down:

- first iteration: 1 time ($i = 1$ case)
- second iteration: 2 times ($i = 2$ case)

It would be reasonable to assume that this pattern continues. That is, for $n$ iterations we will see the following pattern:

- iteration 1: 1 time
- iteration 2: 2 times
- iteration 3: 3 times
- iteration $n - 1$: $n - 1$ times
- iteration $n$: $n$ times

So the total number of times Foo() gets called is

$1 + 2 + 3 + \cdots + (n - 1) + n$

Which we’ll wrap up nicely as the following summation:

$\sum_{i=1}^{n} i$

At this point you may be saying “Yes, I understand what the summation *looks like*, but what does it sum up to!?”, and I’m getting to that. If you will, though, please allow me one more aside. Let’s look at a visualization of this series.

Now you see why they’re called the triangular numbers. Most visualizations will show you them in this format.

Perhaps the most common visual approach to computing triangular numbers is with some geometric help. You see, the triangular numbers, being a Figurate Number (sometimes called a Polygonal Number), are easy to visualize. To compute the triangular numbers using some geometric help, let us first recognize that a (right) triangle is half of a square.

Next, we recognize that the number of dots in the square is equal to its area, or simply the number of dots along one side multiplied by itself. More formally, the area of an $n$ by $n$ square is $n^2$. For our 5×5 square, this gives 25 dots.

$n^2$ gives us the area of a square, and $\frac{n^2}{2}$ gives us half of it. But check out the figure below! Taking just half of the area only gives us half of the dots along the diagonal of the square! We want all of the dots along the diagonal of the square.

How many dots are in the diagonal of a square? Well, there’s one dot for each row in the triangle, therefore there are $n$ dots, and we only captured half of their areas when we got $\frac{n^2}{2}$, so we need to add in that other $\frac{n}{2}$. This gives us a final computed number of dots as $\frac{n^2}{2} + \frac{n}{2}$, or simplified as $\frac{n(n+1)}{2}$, which is the commonly used formula for computing triangular numbers, and is the exact summation we will get in the next section when approaching the summation from another angle.
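The geometric argument (half the square, plus the half of the diagonal we cut away) can be checked numerically against brute-force counting:

```python
def triangular_geometric(n):
    """Half the n-by-n square of dots, plus the half of the
    diagonal that taking n^2 / 2 cut away."""
    return n * n / 2 + n / 2

# Agrees with the literal summation 0 + 1 + ... + n for every n we try.
assert all(triangular_geometric(n) == sum(range(n + 1)) for n in range(100))
print(triangular_geometric(5))  # 15.0
```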

Pretty awesome, right?

In this section we’ll actually derive the result of the summation in a novel way. A way that might appeal to a programmer who has to constantly look at loops all day.

I’ll pose a question to you:

“How many times can you make $n$ from the series $1 + 2 + \cdots + n$?”

For the purpose of our “programmer’s proof”, let’s consider a slightly different visualization of the series than we saw in the previous section. First, we’ll make our equilateral triangle a right triangle.

Next, we’ll take the top of the triangle and align it to the gaps below.

A little weird, huh? However, I think it helps us to answer our question. (We’ll play Tetris with it in the next subsection!) Let’s answer the question of “How many times can we get $n$ dots from our triangle?” Trivially, we can do it at least once, because the very last row has $n$ dots in it. The second last row has $n - 1$, and conveniently we can see that the first row only has $1$, so we can combine them to obtain $n$ for a second time.

Finally, we notice that the second and third rows added together also equal $n$. So we’ve made $n$ three times for $n = 5$. The implication is that the triangular number summation for $n = 5$ is $5 \cdot 3 = 15$. Nice. Let’s try to derive a formula.

For $n$ elements, how many times can we make $n$? Start pulling terms simultaneously off the left and right ends of

$0 + 1 + 2 + \cdots + (n - 1) + n$

Notice I made a bit of a modification to the series by adding $0$ to it. Now we have $n + 1$ terms in our series instead of $n$. Adding the $0$ term does not change the value of the summation; all we’re saying is that the triangular numbers for $n$ will result in a series containing $n + 1$ terms.

Pulling from both sides at once, we get the following summations:

$0 + n = n, \quad 1 + (n - 1) = n, \quad 2 + (n - 2) = n, \quad \ldots$

So, for a sequence of n items, we can make a sum of $n$ exactly $\frac{n + 1}{2}$ times, leading to the solution of

$n \cdot \frac{n + 1}{2} = \frac{n(n + 1)}{2}$

As a programmer, I find this a very satisfying approach. It feels like we’re writing a method to perform the computation in $O(1)$ time instead of $O(n)$ time.
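The pairing trick can be sketched directly: pull terms off both ends of the padded series $0 + 1 + \cdots + n$ and add them in pairs.

```python
def triangular_by_pairing(n):
    """Each (left, right) pair of 0 + 1 + ... + n sums to exactly n."""
    terms = list(range(n + 1))               # n + 1 terms, including the 0
    total = 0
    while len(terms) >= 2:
        total += terms.pop(0) + terms.pop()  # 0 + n, then 1 + (n - 1), ...
    if terms:                                # odd count leaves a middle term
        total += terms.pop()
    return total

print(triangular_by_pairing(5))  # three pairs of 5 -> 15
```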

Let’s visualize what we just did by playing Tetris.

- Take our alternative figure of a triangle from above and horizontally flip the portion we set aside
- Then we’ll flip it vertically. Now we can see that it will fit very nicely in the gaps with the rest of the dots.
- Translate the portion down, giving us a nice rectangle that is $n$ long by $\frac{n + 1}{2}$ tall, whose area is $n \cdot \frac{n + 1}{2}$ (look familiar!?!?).

*(Author’s note: Feel free to skip this section, it gets a little mathy)*

Approaching the summation from the other side:

$n + (n - 1) + (n - 2) + \cdots + 2 + 1$

is a little more challenging, but it yields an interesting series that is equivalent to our $\frac{n(n+1)}{2}$ derived from previous sections.

Let’s first try to factor out $n$:

And the last term evaluates to , so we have:

Which may not appear very interesting at first. Let’s begin combining terms in the interior and see what we can come up

with:

Aha! Now we’re getting somewhere. Let’s simplify the first term in the parenthesis:

And add the next term into the first term (we’re starting to see a pattern emerging)

As we combine our terms, the coefficient on the leading term within the parentheses grows by $1$ for each new term, and given that there are $n$ terms in parentheses (we factored out $n$, we didn’t remove it altogether), we can deduce that the final coefficient will be $n$.

As for the right hand side of the difference, it appears to be growing as $1 + 2 + 3 + \cdots$, which is again our triangular numbers, just up to $n - 1$! The formula begins to look like this:

$n\left(n - \frac{1}{n}\sum_{i=1}^{n-1} i\right)$

Which we can simplify by distributing the $n$ term through the parenthesis, giving a new solution of

$n^2 - \sum_{i=1}^{n-1} i$

Which ostensibly will cause a repetition of what we just did, giving:

$n^2 - \left((n-1)^2 - \sum_{i=1}^{n-2} i\right)$

And so on. Eventually we stop at the last term, which is $1$, so, for $n = 5$ our series looks like this:

$5^2 - (4^2 - (3^2 - (2^2 - 1)))$

And when we distribute the subtraction, we get an interesting alternating series that looks like this:

$n^2 - (n-1)^2 + (n-2)^2 - (n-3)^2 + \cdots$

Just as a sanity check, we can fill in the actual values:

$25 - 16 + 9 - 4 + 1$

Which clearly sums to $15$, the same as if we applied our magic formula $\frac{5(5+1)}{2} = 15$.
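We can sanity-check the equivalence numerically for many values of $n$; here I take the alternating series to be $n^2 - (n-1)^2 + (n-2)^2 - \cdots$:

```python
def alternating_squares(n):
    """n^2 - (n-1)^2 + (n-2)^2 - ... down to the final +/- 1."""
    return sum((-1) ** k * (n - k) ** 2 for k in range(n))

assert alternating_squares(5) == 25 - 16 + 9 - 4 + 1   # the n = 5 case
assert all(alternating_squares(n) == n * (n + 1) // 2 for n in range(1, 200))
print(alternating_squares(5))  # 15
```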

Let’s formalize our alternating series as

$\sum_{k=0}^{n-1} (-1)^k (n - k)^2$

And now formally state the equivalence between our two series:

$\sum_{i=1}^{n} i = \sum_{k=0}^{n-1} (-1)^k (n - k)^2$

*(Proof by the commutative property of addition)*

Which is really a fancy way of saying

$\frac{n(n+1)}{2} = n^2 - (n-1)^2 + (n-2)^2 - \cdots$

I think this revelation is the most interesting part of the entire article! (Okay, playing Tetris to prove the summation of the triangular numbers was also pretty awesome.)

The triangular numbers are a series that appears all the time in your average programmer’s life, without them really realizing it. It’s easy to write off the summation’s value as “some value less-than-or-equal-to $n^2$”, but understanding the summation a little more deeply can also bear fruit, like the solution to a common interviewing problem.

Thank you for reading and sharing a love of all things technical.

An ellipse is defined by a long axis and a short axis, called the semi-major and semi-minor axes, respectively. Usually people use the variable $a$ to represent the length of the semi-major axis, and $b$ to represent the length of the semi-minor axis. In this article I’ll use $a$ to represent only the **horizontal** axis and $b$ to represent only the **vertical** axis. That said, the formal equation for an ellipse is this:

$\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1$ **(1)**

And the equation for a line is this: $y = mx + b$. To avoid confusion about what $b$ means, I’ll use the term $c$ to represent the y-intercept instead.

$y = mx + c$ **(2)**

To find your potentially two intersecting points, you need to solve for $x$ and then use the values you found for $x$ (there will be two) to find corresponding values for $y$. That is, you need to simultaneously solve equations 1 and 2. But first, let’s discuss our line.

You’re given two points, $(x_1, y_1)$ and $(x_2, y_2)$, and you need to find values for slope and y-intercept like in Eqn. 2. Well slope, $m$, is simply the change in $y$ over the change in $x$:

$m = \frac{y_2 - y_1}{x_2 - x_1}$ **(3)**

The actual order of the points in Eqn. 3 doesn’t matter — you can have $\frac{y_1 - y_2}{x_1 - x_2}$ or vice versa and you’ll get the same slope. Now to find the y-intercept, which we’re referring to as $c$, we just take one of our points (arbitrarily choose $(x_1, y_1)$) and plug it into our equation to solve for $c$:

$y_1 = m x_1 + c$

subtracting $m x_1$ from both sides leaves us with our y-intercept $c$:

$c = y_1 - m x_1$ **(4)**

Now that we know our values for $a$, $b$, $m$, and $c$, we are ready to solve for the intersection points between the line and the ellipse. First, substitute the line equation (Eqn. 2) into the ellipse equation (Eqn. 1) so that we can solve for $x$:

$\frac{x^2}{a^2} + \frac{(mx + c)^2}{b^2} = 1$

Expanding the square:

$\frac{x^2}{a^2} + \frac{m^2 x^2 + 2mcx + c^2}{b^2} = 1$

We want to have a common denominator for both fractions on the left-hand side, so we’ll multiply the first term by $\frac{b^2}{b^2}$ and the second term by $\frac{a^2}{a^2}$:

$\frac{b^2 x^2 + a^2 m^2 x^2 + 2a^2 mcx + a^2 c^2}{a^2 b^2} = 1$

Now we can multiply both sides of the equation by $a^2 b^2$ so we don’t have a fraction on the left-hand side:

$b^2 x^2 + a^2 m^2 x^2 + 2a^2 mcx + a^2 c^2 = a^2 b^2$

Notice how the first two terms on the left-hand side have a common term: $x^2$, let’s factor that out:

$(b^2 + a^2 m^2)x^2 + 2a^2 mcx + a^2 c^2 = a^2 b^2$

Now let’s notice that the terms $b^2 + a^2 m^2$ and $2a^2 mc$ both consist of only our known constants. To make the rest of our solution simpler, let’s label these constants $A$ and $B$. That is:

$A = b^2 + a^2 m^2$ **(5)**

$B = 2a^2 mc$ **(6)**

With our constant-naming out of the way, let’s re-examine our equation:

$A x^2 + B x + a^2 c^2 = a^2 b^2$

That’s much cleaner isn’t it? Okay, next let’s move the $a^2 c^2$ term to the other side:

$A x^2 + B x = a^2 b^2 - a^2 c^2$

The left-hand side is very clean now: just a quadratic equation. I’m going to use a trick called completing the square to help us solve for $x$. If we first divide everything by $A$ we get:

$x^2 + \frac{B}{A} x = \frac{a^2 b^2 - a^2 c^2}{A}$

Which is of the form $x^2 + bx = c$. ($b$ and $c$ in this case refer to constants of a quadratic equation, not the same variables we’re using.) Because we have it of this form, we know that if we add $\left(\frac{B}{2A}\right)^2$ to both sides, then we can easily factor the left side:

$x^2 + \frac{B}{A} x + \left(\frac{B}{2A}\right)^2 = \frac{a^2 b^2 - a^2 c^2}{A} + \frac{B^2}{4A^2}$

Becomes:

$\left(x + \frac{B}{2A}\right)^2 = \frac{a^2 b^2 - a^2 c^2}{A} + \frac{B^2}{4A^2}$

Now, since we’re interested in finding the value of $x$ we need to take a square root of both sides:

$\sqrt{\left(x + \frac{B}{2A}\right)^2} = \pm\sqrt{\frac{a^2 b^2 - a^2 c^2}{A} + \frac{B^2}{4A^2}}$

Evaluating the left hand side could give us $\pm\left(x + \frac{B}{2A}\right)$; we want the value for $x + \frac{B}{2A}$ itself, not its negation, so let’s only keep the positive root on the left:

$x + \frac{B}{2A} = \pm\sqrt{\frac{a^2 b^2 - a^2 c^2}{A} + \frac{B^2}{4A^2}}$

Let’s get $x$ by itself on the left-hand side by subtracting $\frac{B}{2A}$ from both sides:

$x = -\frac{B}{2A} \pm \sqrt{\frac{a^2 b^2 - a^2 c^2}{A} + \frac{B^2}{4A^2}}$

Because things are getting kind of messy with that big square root, I’m going to notice that it’s simply a square root of constants that we already know, and label the whole thing $D$. That is,

$D = \sqrt{\frac{a^2 b^2 - a^2 c^2}{A} + \frac{B^2}{4A^2}}$ **(7)**

This makes our equation much cleaner:

$x = -\frac{B}{2A} \pm D$

Later I’ll resubstitute for those constants, but bear with me as I use them to solve for $x$ and $y$. We now know that $x$ has two solutions:

$x = -\frac{B}{2A} + D \quad\text{and}\quad x = -\frac{B}{2A} - D$

If we take these values for $x$ along with our equation for a line (Eqn. 2), then we can solve for $y$, which yields solutions for $y$:

$y = m\left(-\frac{B}{2A} + D\right) + c \quad\text{and}\quad y = m\left(-\frac{B}{2A} - D\right) + c$

This gives us our final intersection points of

$\left(-\frac{B}{2A} + D,\; m\left(-\frac{B}{2A} + D\right) + c\right)$ and $\left(-\frac{B}{2A} - D,\; m\left(-\frac{B}{2A} - D\right) + c\right)$

If we resubstitute back in for $D$ we can simplify it ever so slightly. From Eqn. 7, substituting back in for $A$ and $B$ (refer to Eqns. 5 and 6) and putting everything over the common denominator $4A^2$:

$D = \sqrt{\frac{4a^2(b^2 - c^2)(b^2 + a^2 m^2) + 4a^4 m^2 c^2}{4A^2}}$

Notice how we have a common term in the numerator, let’s factor it out:

$D = \sqrt{\frac{4a^2\left[(b^2 - c^2)(b^2 + a^2 m^2) + a^2 m^2 c^2\right]}{4A^2}}$

Expanding the product inside the brackets, the $a^2 m^2 c^2$ terms cancel, leaving $b^2(a^2 m^2 + b^2 - c^2)$. Now we can factor a $b^2$ out as well, and get an $\frac{ab}{A}$ term outside the radical:

$D = \frac{ab\sqrt{a^2 m^2 + b^2 - c^2}}{A}$

Finally, to resubstitute everything back into our point equations (noting that $\frac{B}{2A} = \frac{a^2 mc}{A}$), our two potential intersection points are:

$\left(\frac{-a^2 mc + ab\sqrt{a^2 m^2 + b^2 - c^2}}{A},\; m \cdot \frac{-a^2 mc + ab\sqrt{a^2 m^2 + b^2 - c^2}}{A} + c\right)$ **(8)**

And

$\left(\frac{-a^2 mc - ab\sqrt{a^2 m^2 + b^2 - c^2}}{A},\; m \cdot \frac{-a^2 mc - ab\sqrt{a^2 m^2 + b^2 - c^2}}{A} + c\right)$ **(9)**

The final equation for the points isn’t really the cleanest, is it? I myself prefer to keep the constants $A$, $B$, and $D$ that I defined. Note that the points you’ve discovered won’t necessarily lie on the ellipse if the line doesn’t intersect the ellipse at all; you should be able to substitute your discovered $x$ and $y$ values into Equation 1 and see if it still equals 1. So there you have it, an analytic solution for the intersection points of a line with an ellipse in a convenient equation for you to translate into code. Thanks for reading!
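The closed-form points translate directly into code. A sketch, where `c` is my name for the y-intercept and the sign of the quantity under the radical tells us whether the line hits the ellipse at all:

```python
import math

def line_ellipse_intersections(a, b, m, c):
    """Intersect x^2/a^2 + y^2/b^2 = 1 with y = m*x + c.
    Returns 0, 1, or 2 intersection points."""
    A = b * b + a * a * m * m            # Equation 5
    B = 2 * a * a * m * c                # Equation 6
    radicand = a * a * m * m + b * b - c * c
    if radicand < 0:
        return []                        # the line misses the ellipse entirely
    D = a * b * math.sqrt(radicand) / A  # Equation 7, simplified form
    xs = sorted({-B / (2 * A) - D, -B / (2 * A) + D})
    return [(x, m * x + c) for x in xs]  # Equations 8 and 9

# Unit circle (a = b = 1) with the horizontal line y = 0: hits (-1, 0), (1, 0).
print(line_ellipse_intersections(1, 1, 0, 0))
```

A tangent line collapses the two roots into one, which the set de-duplicates into a single returned point.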


Our game features Noah, a biologist with an animal philanthropist streak (so a bio-philo-zoo-thropist?). Noah is travelling to the world Xion because the planet’s integrity has been compromised (by human mining operations) and the whole thing is about to explode! Perhaps a bit overly dramatic, but you, Noah, have come to Xion to save its creatures. Being a biologist, you can’t stand to see this kind of fauna go extinct. So you chartered a huge ship to bring all the creatures aboard for preservation and future study, and conveniently named it the Ark.

The only thing is, these creatures have no idea that you’re trying to *save* them. In fact, they’ve come to dislike humans in general, so they won’t be too receptive of your heroism; they’ll try to kill you. Armed with your Bio-Rifle, you must stun the creatures, and then get close enough to them so you can beam them aboard the Ark.

While I may not be very good at art, animation, or sound, I am at least a decent programmer! (Or clever enough to fool the other team members that I am.) They let me take on the role of AI Programmer, which was very exciting.

If you take a peek at the in-game screenshot from above, you’ll notice that there are at least 9 enemies on-screen. In fact, our game needed to handle about 30 enemies on-screen simultaneously, and on iPad hardware. On top of rendering the models, doing shading and lighting calculations, sound, physics, and special effects, AI doesn’t usually get a lot of room to work. I was fortunate enough to have a bit of leeway on this game because AI is the crux of the gameplay, but I still wanted to keep things simple and quick.

I’m separating this topic into three parts, to be covered here and in two more blog posts. First, how to efficiently handle a pool of objects, like enemies (this post). Next post will be about finite state machines and behavior trees, and then finally I’ll talk a bit about implementation details using Unity’s powerful scripting engine in the last post.

What is an object pool? The name is quite intuitive; it’s a notion of a container full of *things* that you pull from when you want to use one, and when you’re done you put it back for re-use later. To use an example from the Unity forums, let’s say your character has a gun. Guns shoot bullets. Each time you shoot a bullet, you want to show that bullet travel along its path. After a bullet has hit something, it goes away.

A really inefficient solution to this problem is to simply create a new “bullet” object each time the player pulls the trigger, and then destroy it when it has completed its trajectory. A much more efficient solution is to create, say, 1000 bullet objects at game start, hide them offscreen somewhere, and then teleport them in as they’re needed. When the bullet finishes its trajectory, *Poof!* it’s deactivated and teleported back into the bullet pool.
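Our scripts lived in Unity’s scripting engine, but the pool idea is language-agnostic. Here is a minimal sketch in Python, with a plain `dict` standing in for a bullet object and all names my own:

```python
class ObjectPool:
    """Pre-allocate everything at start; spawn() checks an object out and
    despawn() returns it. Nothing is created or destroyed mid-game."""

    def __init__(self, size, factory):
        self._free = [factory() for _ in range(size)]

    def spawn(self):
        if not self._free:
            return None            # pool exhausted; tune `size` so this is rare
        return self._free.pop()

    def despawn(self, obj):
        self._free.append(obj)     # deactivated and back in the pool

pool = ObjectPool(1000, dict)      # a dict stands in for a bullet object
bullet = pool.spawn()              # "teleport in" a pre-made bullet
pool.despawn(bullet)               # trajectory done: back to the pool
```

The design choice is simply trading a fixed up-front allocation for the elimination of per-shot allocation and destruction costs.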

Having no experience in object pools myself, I turned to the Unity forums, which had a few good posts on the matter.

Our gameplay draws inspiration from survival-type games like the (in)famous I MAED A GAM3 W1TH ZOMBIES 1N IT!!!1. I wanted the player to be initially unchallenged by only a few basic enemies, but eventually be overwhelmed by a horde as time progressed. For that, I needed an enemy pool, and an intelligent spawning strategy.

Like The Director in Valve and Turtle Rock Studios’ Left 4 Dead series, I wanted enemy spawning decisions to be handled intelligently and on the fly. The AI Manager was the solution to that.

The AI Manager has only a few very simple tasks:

- Create all our creature pools
- Spawn enemies at appropriate times and in appropriate locations
- Return creatures to the pool when they’ve been teleported or killed

While we discouraged the player from killing the creatures on the planet by subtracting points, it did happen. I suppose players today have been overexposed to games focused on killing things.

Creating the creature pools was pretty easy; creatures are objects with predefined (tuned) attributes like speed, attack damage, attack range, and health. That means all the AI manager had to do was create a bunch of instances of them and plop them into a list.

Spawning the creatures was a little more difficult. We want to spawn the creatures off-screen, on the terrain, and not intersecting with other creatures or objects. Finding an area off-screen was not too difficult. Instead of the more difficult calculation of the portion of the terrain not viewable in the camera frustum, we settled on a variable minimum distance from the player’s current position, so long as the camera didn’t zoom way out (it didn’t), we were fine.

Spawning the creatures at least a minimum radius away from the player solves the issue of spawning them off-screen, but we still had to worry about them being **on** the terrain. Our world wasn’t a totally flat plane; it had small hills here and there. If I spawned an enemy on the same plane as the player, it could either be a little above the ground (not a problem with physics that will bring it down), or a little below the ground (a much bigger problem, as the physics will cause the creature to fall to its doom).

To solve this issue I did the following:

- Define some point above the highest point on the terrain. Say, 60m above the origin.
- At the planar point where I want to spawn the creature, project that point vertically 60m so it’s guaranteed to be above all the ground.
- Cast a ray downward from the point and until it intersects the terrain. At the intersection point is where we want to spawn our creature.

Thankfully the Unity engine has support in place for this technique in its Physics.Raycast function.
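Outside of Unity, the same ray-march can be sketched in plain Python; here `terrain_height` is a stand-in for whatever collision geometry your engine’s raycast would hit:

```python
def find_spawn_point(x, z, terrain_height, ceiling=60.0, step=0.1):
    """March a ray straight down from a point guaranteed to be above
    the terrain; return the first point on the surface, or None if
    the ray never hits terrain. A real engine raycast (e.g. Unity's
    Physics.Raycast) does this against collision geometry for you."""
    y = ceiling
    while y > -ceiling:
        surface = terrain_height(x, z)
        if y <= surface:
            return (x, surface, z)
        y -= step
    return None
```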

Now we’ve taken care of spawning our creature off camera and on top of the terrain; all that’s left is making sure it’s not in the middle of a tree or another creature or something. How can we tell if we’re intersecting a creature or not? Well, Unity has a nice physics system in place, so let’s let **it** tell us!

I *could* move the entire creature object to my desired spawn position, test for collision, and then move it back to the pool upon failure, but there are a few problems with that. The physics engine in Unity runs on a different, slower timestep than the rendering engine. Moving the creature model into place, even briefly, risks an unsightly visual artifact while we test for collision. Additionally, there’s no reason to test collision on the whole creature; testing collisions against spheres is faster.

A better solution, then, is to define an invisible sphere that could contain our creature, and use that instead. For each differently-sized creature, I created a differently-sized sphere, which I called a *spawn tester*. Now I can invisibly place a sphere that can collide with other objects (but not vice versa! We don’t want the player running into a mystical invisible teleporting ball), and I can know for certain whether an area is clear for spawning.
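Stripped of the engine, the spawn tester’s check is just a sphere-sphere distance test; a minimal sketch:

```python
import math

def area_clear(candidate, radius, obstacles):
    """True if a sphere of `radius` centred at `candidate` (x, y, z)
    overlaps none of the given (position, radius) obstacles."""
    for pos, r in obstacles:
        # Two spheres overlap when their centres are closer than
        # the sum of their radii.
        if math.dist(candidate, pos) < radius + r:
            return False
    return True
```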

After a creature is teleported to the Ark, we shouldn’t be able to see it anymore. When this happens, a creature tells the AI Manager that it has been teleported. The AI Manager waits a second or two for the animation and sound to play, disables the creature, and then literally teleports it to our offscreen holding area. The deactivated enemy is added back into the creature pool, where it is now available to be reset and reactivated later.

This is a common thing to do in games, but it has an interesting philosophical implication; **you are constantly killing the same enemies over and over again!** It’s like they’re stuck in a cruel universe where they don’t ever really die, they only go to some weird catatonic limbo until a maleficent force reanimates them. In our game at least, most of the creatures aren’t being killed, merely stunned.

So that wraps up Part 1 of this series, I hope it was informative and exciting!

Dev Team (in alphabetical order)

- Stephen Aldriedge
- Cameron Coker
- Rachel Cunningham
- Jack Eggebrecht
- Me (Andy G)
- Jay Jackson
- Chiang Leng
- Sterling Morris
- John Pettingill
- Chris Potter
- Jake Ross
- Brian Smith
- Sterling Smith
- Jacob Zimmer

Computers can seem pretty dumb sometimes, can’t they? Why can’t they just learn how to do things like we do? Learning comes so effortlessly to us humans; we don’t even remember learning something as extraordinarily complicated as speech – it just sort of happened. If I showed you 10 pictures, 5 with cats in them and 5 without (actually this is the internet, so 11 of those 10 pictures would have cats in them, but bear with me) you could easily identify which images contained cats. Because computers are basically math machines, unless you can very precisely define what a cat *is*, then a computer will not be very good at such a task. That’s where neural networks come in – what if we could simulate a human brain? And like a human brain, what if we could purpose our simulation to only look at cats?

My previous post, Decision Tree Learning, briefly alluded to neural networks as an alternative machine learning technique. At their core, neural networks seek to very coarsely emulate a brain. You probably know that the brain has neurons in it. Neurons are little cells that can send an electrical signal to another neuron. Some neurons are more strongly connected to each other than others and therefore the messages they send to each other have a larger effect. If a neuron receives a strong enough message from all its neighbors, it will in turn “activate” and send a message. So how can we use this simplified concept of neurons to identify cats?

Let’s begin with the simplest of neural networks: 1 neuron. In machine learning terms, this is called a Perceptron. The neuron receives a signal, and based on that signal it either fires or it doesn’t. Let’s talk a little more about this “signal” the neuron receives.

In our neural network, a neuron can receive a signal from a variety of sources. Imagine your brain was only a single neuron instead of the billions it actually has. This neuron receives signals from your eyes, your nose, your mouth, your ears, etc. Your eyes tell you that you are sensing something with four legs. Your nose smells kitty litter. Your ears hear a “meow”, and your mouth….? Let’s hope you’re not using that sense to identify whether something is a cat.

Anyway, all these senses get passed directly into your brain, or single neuron in this case. When a neuron receives a signal, or combination of signals, this may cause it to fire. That is, the signals it receives combine in some way to form a message that means “you too, should fire”. If your brain, like many human brains, is made for saying “YES, that’s a kitty!” or “No, not a kitty.” then the neuron firing is akin to saying “Yes”, while not firing is akin to saying “No”.

Let’s take away some of the mystique surrounding the perceptron, and neural networks in general. In a nutshell, a perceptron is a function that converts your input values into some output value.

That’s it! We make a fancy function, then feed it some numbers that describe the object we’re trying to identify, and it spits out some numbers, which we then interpret as a classification prediction. In slightly more formal terms, the “signal” that our lonely perceptron receives is a list of values (called a vector) from all the attributes of the object we’re trying to classify. Based on those values, we classify the object as “Cat” or “Not a cat”.

On top of simply receiving the values, we may also want to weight certain values higher than others. For example, there are thousands of animals that have four legs, but very few will regularly smell like kitty litter. Whether the object smells like kitty litter or not, then, should probably be more important to our final decision. That is, it has more *weight*. Giving an attribute more weight is analogous to thickening the synapse between two neurons, strengthening the connection between them.

Here’s where I start getting a bit technical, but I’ll try to explain everything clearly. Feel free to skip all the math; I’ve tried to write this article to give you an intuition on how things work without it. Head to the comments to tell me what I need to clarify!

For each input attribute we need to determine a weight such that the sum of all weighted attributes is higher than the neuron’s threshold when it **should** fire, and below that threshold when the neuron **should not** fire.

Now, like all machine learners, we need a set of training data. That is, a bunch of objects that *are* cats, and a bunch of objects that *aren’t* cats. Associated with each example are attribute values like shape, smell, sound, etc. Let’s call the set of training data **T**.

Each individual training example can be labeled t_{i}, where *i* indicates the example’s position in **T**’s list. If we have 3 attributes (shape, smell, sound) for each training example, then the attributes for t_{i} can be labeled a_{i1}, a_{i2}, a_{i3}. More generally, there may be *k* attributes for each training example, so we can refer to them as a_{i1}, a_{i2}, …, a_{ij}, …, a_{ik}.

Now that we know what our input looks like, the objective is to determine how we should weight attributes. Some attributes may help our classification, and some may hurt it. Those that help it should be given a positive weight, and those that hurt it should be given a negative weight. Usually we start with some really small nonzero weights that we increase or decrease over time.

Now, the easiest way to apply a numerical weight to some attribute like “smell” would be to multiply them together. The only problem is, what does it mean to multiply English by some number? It doesn’t make any sense!

Therefore, we should **convert** these English descriptions to numbers. For example, let’s say you have values for your attribute “Smell” of “Kitty Litter”, “Grass”, “Hay”, and “Mud” that you want to change to numbers. The most obvious thing to do is assign them values of 1, 2, 3, and 4, because that’s the order they appear in. I’ll talk shortly about why this is a bad idea and how we’ll change the values, but for now let’s go with it.

**Recap**: Okay, at this point we have training examples, which themselves have numeric attributes, and we have weights on each attribute. How does all this come together to give us “Cat” or “Not Cat”?

What we’ll do to actually get an answer to the question “Is it a Cat?” is combine our attributes with the weights we have on them through multiplication. Then we’ll sum up each product to get a final value, which we’ll interpret as our answer (more on the interpretation part in a sec).

Clearly, the animal we’re trying to classify is a horse, not a cat. How do we interpret that output value of 3.1 as “Not a cat”, though? What a perceptron does is this: define some number to be a *threshold* value; values that lie on one side of it are Cats, and values on the other side are Not Cats. Usually this value is 0.0:

From the figures above you can see what a perceptron does: the multiplication of our attributes by weights forms a line. Some of the elements on this line are above 0, and some are below 0. Those above 0 we’ll say are cats, and those below we’ll say aren’t. For funsies, we add another weight to our perceptron:

This new attribute, Attribute 0 is considered to always be on. The reason for adding it (and an associated weight) is so that our line from above doesn’t always necessarily go exactly through (0, 0):

Notice how our line has moved up, but the area we classify as Cats has not? That’s right: more items now map to a point on the line above 0, so more items get classified as cats. We could just as easily have shifted the line the other way to classify fewer items as cats.

For one training example let’s compute a sum of all weights multiplied by their attributes:

Because I don’t like the term “Sum”, and it’s not really used in the literature, let’s replace it with the term “*y*”, and give it a subscript to indicate that it’s the sum for training example *i*:

This equation is actually the same as the equation for a line through the origin. This observation helps us to realize that what we’re really doing is trying to draw a line that best separates our “cat” and “not cat” examples. (See figures above).

To add a little extra ability to this function, it’s common to add one more “bias” weight that we’ll call weight 0 (w_0). What this term does is give us the y-intercept of our line, allowing it to not be forced to go through (0,0). In neural network terms, we’re adding a new attribute that always takes on a value of 1, and w_0 is our weight for it.

Those familiar with vector math might see that this summation of products is really the same as a vector multiplication:

**To recap**: For a training example that has attributes a_{i1}, …, a_{ik}, and neural network weights w_1, …, w_k on each attribute, we generate a linear sum y_{i}. What does this sum mean? It’s just a number, after all. Let’s say this: if y_{i} is greater than 0, we’ll say that the neuron fires (yes, the object is a cat!), else it doesn’t (no, it’s not a cat). That is, we can just check the sign of y_{i}, s(y_{i}):
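Putting the recap into code, the whole perceptron is just a weighted sum and a sign check; here’s a minimal sketch (function names are mine), with the always-on bias attribute prepended:

```python
def predict(weights, attributes):
    """Classify one example: weights[0] is the bias weight w_0,
    paired with the always-on attribute 0; the remaining weights
    pair with the example's attribute values."""
    inputs = [1.0] + list(attributes)               # prepend attribute 0
    y = sum(w * a for w, a in zip(weights, inputs)) # linear sum y_i
    return 1 if y > 0 else 0                        # fire (cat) or not
```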

Alright, we now know how to classify our object if we have the numerical attribute values, and appropriate weights on those attributes. The purpose of a neural network is to learn these appropriate weights, which I’ll get to in a minute. First, let’s return to why the numerical attribute values we chose before were bad.

When we’re performing our multiplication of attribute values by their weights, won’t “Mud” (4) inherently contribute more than “Kitty Litter” (1)? Our neural network should theoretically be able to compensate by adjusting the weights on these attributes, but we should help it out ourselves; it’s generally accepted that neural networks work better on normalized input. That is, we need to “normalize” our attribute values.

Normalization means that we change our input attribute values from the ones given into ones relative to each other. Quite often storing them relative to each other causes them to be in a smaller range. For example, if you have values that go from -1000000 to 1000000, it might be better to just divide everything by a million to get the range (-1.0, 1.0) (this is called Min-Max normalization).

There’s a variety of other ways to normalize data. The way I was taught, and the way I’ll describe here, is *z-score*, or *statistical*, normalization. This kind of normalization treats all the given values of attributes in our training set as a population in the statistical sense. When I say “population” you might think of humans, and that’s an appropriate intuition here: a population of humans has an average height, an average weight, etc. You can think of each human’s height in terms relative to this average. That’s exactly what z-score normalization does; it replaces a value with one relative to the average.

In more technical terms, the z-score is defined as:

z is our z-score, m is our mean, s is our standard deviation, and x is the attribute value we are replacing. Let’s change our mapping from above to reflect our new normalized values (assume that mean = 1.4, std. deviation = 0.25, calculated from some training set not shown here):
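Using the (made-up) mean of 1.4 and standard deviation of 0.25 from above, the conversion is a one-liner:

```python
def z_score(x, mean, std):
    # Replace a raw value with its distance from the mean,
    # measured in standard deviations.
    return (x - mean) / std

smells = {"Kitty Litter": 1, "Grass": 2, "Hay": 3, "Mud": 4}
normalized = {name: z_score(v, 1.4, 0.25) for name, v in smells.items()}
# roughly: Kitty Litter -> -1.6, Grass -> 2.4, Hay -> 6.4, Mud -> 10.4
```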

The fact that normalization usually puts numbers into a smaller range can be beneficial because computers have a limit on the biggest number they can represent. Go over that number, and you come around the other side – a really really high positive number will become a really big negative number. Because we do a lot of multiplication and squaring operations in our neural network, it’s good to keep attribute values small.

In our case, normalization actually expanded the range, but notice that Kitty Litter became the only negative value. With a negative weight on the Smell attribute, we can easily compensate for this to show that most objects being classified that smell like Kitty Litter are cats.

Okay, at this point we have our normalized, numerical input attributes, and a way to combine them with weights to generate a classification of “Cat” or “Not a cat”. It’s about time we learn how to assign the proper weights!

Here’s an idea: let’s start with some random weights on our synapses. If we run a training example through the neuron, and the neuron says “Cat!”, but we know it’s not really a cat, then we’ll modify the weights a little so that next time we might get it right.

So how do we provide some negative reinforcement to our neuron? Imagine if you will, a hill. You are told that you need to get to the bottom of this hill, and you have to do so with a blindfold on. It’s actually pretty easy, right? You feel the pull of gravity in one direction so you walk that way until the pull of gravity goes away. You *descended* the hill. Based on how steep the *gradient* of the hill was, it took you longer or shorter to get to the bottom. Training our perceptron to use the appropriate weights follows the same principle! In fact, it’s called *gradient descent.* We start with some random initial weight, and this weight gets pulled towards its true value.

In the figure above, we have some random initial weight that has an error value assigned to it. We want to “pull” that weight to the right so as to decrease its error measure, until we reach a point where we can’t improve it anymore.

More formally, we define a quadratic error function for our prediction. Taking the derivative of this function at a point gives us the slope of the function at that point. We want the slope to be 0, and if it is, we know that we’ve minimized our error (reached a local or global min). Now, this next part requires you to know what a partial derivative is.

Given a training example and weights on its attributes, we already know that the function we use to compute a prediction is the summation of attributes multiplied by their weights:

After we *threshold* the output using the sign function s(y_{i}), we have a prediction of 1 (it’s a cat!) or 0 (it’s not a cat…). Well, because all the training examples are labeled with what we **should** have predicted, we know whether the output *should* have been 1, or *should* have been 0. Let’s let the **real classification** of training example *i* be denoted *r _{i}*. And now let’s define our error as:

What the Error equation is saying is that we define the error on our weight vector (given the attributes and real classification of training example i) as proportional to the square of the difference between the real classification and our prediction. Because we only have 2 possible classes (cat, not a cat), the squared difference is only ever 1 or 0, but if we had many different classes this error could get much larger. The ½ that gets multiplied to it is completely man-made; it was only put there to make the next part (taking a derivative) easy.

Turning this error equation into a weight update equation requires us to take a partial derivative with respect to the weight term. Taking the derivative with respect to w_h then requires us to use the chain rule twice, eventually leaving us with:

What this equation is saying is that we modify weight *h* based on the difference between the real classification and our prediction, multiplied by the value of attribute *h*. For example, this could be the weight on “smell”, where the attribute value was “Mud”.

One problem with the equation is that it assumes equal responsibility for the error on behalf of all weights, when in fact this may not be the case. This is an artifact of us trying to optimize potentially many weights at the same time. Changing each weight entirely by equation 7, then, could make our weights “bounce around” their correct values. To address this issue, let’s only change each weight by a small percentage of what the equation tells us. That is, let’s multiply the right-hand side of the equation by a value in the range [0.0, 1.0), which we’ll label η:

η is typically small, but the best value of η is yet another optimization problem. A different value tends to work better on different data sets, so knowing it beforehand is nigh impossible. Programmers use a few strategies here: specifying a really small value (which makes training take much longer), specifying a slightly larger value and decreasing it over time, or letting the program adapt η based on the network’s ongoing performance. I myself have tried all of the above, and found that an adaptive value seems to be the best generic strategy. For simplicity I won’t get into how to adapt η in this article, but once you get the basics of neural networks down, it’s not too hard to implement.
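A single learning step can be sketched like so (names are mine, not from any particular library), with `eta` playing the role of η and attribute 0 as the always-on bias input:

```python
def update_weights(weights, attributes, real, eta=0.01):
    """One gradient-descent step: nudge each weight by
    eta * (real classification - prediction) * attribute value."""
    inputs = [1.0] + list(attributes)               # attribute 0 is the bias
    y = sum(w * a for w, a in zip(weights, inputs))
    predicted = 1 if y > 0 else 0                   # thresholded prediction
    return [w + eta * (real - predicted) * a
            for w, a in zip(weights, inputs)]
```

Note that when the prediction is already correct, `real - predicted` is 0 and the weights are left alone.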

**Recap:**

Alright, at this point we know the following:

- How to change qualitative attribute values into quantitative ones
- How to normalize quantitative attribute values
- How to combine a training example’s attribute values with weights to generate a classification prediction
- How to specify the error on our prediction
- How to update our weights in response to error on our prediction

The only question that remains is this: how do we know when we’ve finished training our perceptron?

What we do is this: we take a portion of all our training data, up to 1/3 of it, and we set it aside. Let’s call this our *validation set*. So now we have our training set, which is 2/3 of our training examples, and our validation set, which is 1/3 of our training examples.

What we do is this:

- Run each training example through our perceptron, calculating prediction error and updating weights in response to that error
- After we’ve done that, run each example from the validation set through our perceptron, calculating prediction error but **NOT** updating weights in response to that error
- Repeat

Each time we repeat the above is called a “training epoch”. We’ll stop training our perceptron when it stops improving. That is, when the error on the validation set stops decreasing for a few epochs. At this point, we know that we’re not getting any better, so we might as well stop. The perceptron’s performance over time will look something like this:

Notice how the early stages have a lot of variation, but the performance is generally improving. Near the end, though, it actually **decreases**! At this point we know that we’ve overfit the network; it’s too well trained for the training set, and so it performs poorly on the validation set.

When we compute the prediction error on the validation set, we do it on the set as a whole; we add up the error for each validation set example, and then divide by the size of the validation set. This is called the mean squared error (MSE):

The equation for the MSE is very similar to the error equation we used for deriving our weight update equation above.

The logic I used for training my neural network was this:

- Run for at least 100 epochs to get out of the typically noisy beginning
- Each time we run our validation set through the perceptron, compute the perceptron’s mean squared error (MSE) on the validation set. Track the epoch at which we saw the lowest MSE.
- Let Epoch_{best} be the training epoch at which we saw the lowest MSE. If 2 * Epoch_{best} epochs have passed without finding a new minimum MSE, OR 10,000 epochs total have passed, terminate. (I also experimented with up to 100k training epochs, but saw no difference.)
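That stopping logic can be sketched like so (a sketch under my own names; `train_step` and `validation_mse` stand in for the real per-epoch work and MSE computation):

```python
def train(train_step, validation_mse, max_epochs=10_000, min_epochs=100):
    """Early-stopping loop: train_step() runs one epoch of weight
    updates; validation_mse() scores the held-out validation set."""
    best_mse, best_epoch = float("inf"), 0
    for epoch in range(1, max_epochs + 1):
        train_step()
        mse = validation_mse()
        if mse < best_mse:
            best_mse, best_epoch = mse, epoch
        # Past the noisy start, give up once 2 * best_epoch epochs
        # have passed without a new minimum MSE.
        if epoch >= min_epochs and epoch >= 2 * best_epoch:
            break
    return best_epoch, best_mse
```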

For some perspective, I compared the performance of our lowly perceptron against a decision tree approach. I also compared against a Multi-Layer Perceptron (MLP), which is a more complex neural network consisting of multiple perceptron layers. What I found may surprise you: despite the fact that the perceptron is effectively a brain consisting of only a single neuron, it performs nearly as well as the other approaches.

The data sets we compared on were taken from the UCI Machine Learning Repository. Some are more difficult than others; predicting Heart Disease, for example, is a difficult task on which even the MLP only got about 65% accuracy.

What’s a perceptron good for? You don’t exactly think of a brain as being composed of a single neuron, and on top of that, a perceptron can only learn a linear discriminant. They’re surprisingly accurate classifiers, though, and due to their size they’re fast too.

If you want to make your perceptron bigger and learn more complex things (and multiple class labels), it takes a little more work. You have to add more layers to the network, creating a *multi-layer perceptron* (MLP). You have an input layer that receives your training examples, then a series of *hidden layers* that feed into each other, and finally an *output layer* that produces your prediction for each class you’re interested in (dog, cat, human, etc.). Also, you can’t simply use a linear function as the output of your neurons; the output of your hidden layers needs to be thresholded using a nonlinear function like tanh or the logistic function. Finally, all this added complexity makes defining your weight update equations more difficult with each additional layer:

The above figure shows an example MLP network with a hidden layer size half that of the input layer size. The network is trying to learn four class labels: Cat, Dog, Horse, and Pig. The σ symbol implies I’m using the logistic (sigmoid) thresholding function in the hidden layers. The next layer outputs some value for each class label that implies how certain the network is about the example belonging to that particular class. The final prediction takes the most likely of these using the softmax function.

In my own experiments I found that sometimes a perceptron performs just as well as an MLP, and other times the MLP significantly outperforms the perceptron. It all comes down to the problem domain you’re trying to learn. That means messing around with different parameters like the number of hidden layers, neurons per hidden layer, values for η, training epochs, and thresholding functions until you end up with something that suits your needs.

Neural networks as a whole are very useful, and the subject of much research. It’s coincidental that I wrote this article just as Google is telling the world that it has made a neural network with a billion nodes in it! It’s using networks like this to identify arbitrary objects in pictures (yes, including cats!). The biggest network I created had fewer than 50 neurons in it.


Remember my post on the Dubin’s car? It’s a car that can either go forward, turn fully left, or turn fully right. In that post I explained that the shortest path from one point to another via the Dubin’s car could be described by one of six control sequences. The Reeds-Shepp car is essentially the same thing as the Dubin’s car, except it can also move backwards! This post isn’t quite the comprehensive step-by-step guide the Dubin’s one was, but I’ll give you a good overview of the paths, point you to some great online resources, and give you some tips if you plan on implementing them yourself!

The Reeds-Shepp car is named after two guys, J.A. Reeds and L.A. Shepp, who in 1990 published a really comprehensive paper that showed we could classify all the control sequences that account for the shortest paths of the Dubin’s car if we permitted it to go in reverse. However, instead of 6 paths, they showed there to be 48, which is quite a few more if you’re implementing them one by one. (Fortunately they also showed an easier way to implement things such that we’d only have to write 8 or 9 functions and reuse them cleverly).

However, a year later two *other* people (H.J. Sussmann and G. Tang) showed that 48 is actually **too many** control sequences, and the real number is 46. The proofs for all these things are quite complicated, so I won’t try to explain them here (and I don’t fully understand all of them myself!).

The paths/control sequences I’ve been mentioning are fairly well explained on Steven LaValle’s webpage. Also, Mark Moll at Lydia Kavraki’s lab wrote an implementation of the Reeds-Shepp paths for their Open Motion Planning Library. Steven LaValle also has his own implementation that became part of NASA’s CLARAty project.

Clearly there’s tons of information, and even code, online about the Reeds-Shepp cars, so why write about them? Well, because I implemented them myself, and learned a few things that might help you:

- There are typos in the original Reeds-Shepp paper
  - This is important. Besides some formatting mistakes that they make here and there, the major issue is with the paths described by equations 8.3 and 8.4. These paths include C | C | C (for example Left forwards, Right backwards, Left forwards) and C | C C (Left forwards, Right backwards, Left backwards). If you try to implement things as they stand it **WILL NOT WORK**!
- (Update 5-30-2013: I have been in contact with the OMPL developers and have determined that the following claim was made in error, so I am retracting it.) ~~The OMPL library tries to address these issues, but in my experience **these paths don’t work either**! The lengths of the paths being returned by their implementation seem correct, but the actual controls for those paths seem incorrect at times (or didn’t work for me, at least).~~

- Steven LaValle’s implementations of 8.3 and 8.4 **do** work, though with some minor modifications.
  - I don’t claim to be the expert on how his code was supposed to be used, as I was doing my own thing, but if you try to grab the source, you may need to change some “eta” values to “y – 1.0 + cos(phi)”.
- You can be more efficient than all of these implementations!
  - For one, Dr. LaValle’s implementation has many redundant functions that you could ignore by reusing the other ones, as recommended in the Reeds-Shepp paper (see the section on reflecting, timeflipping, and backwards paths).
  - For another, yet another paper, published in 1998 by P. Soueres and J. Boissonnat, gave us some strategies for ignoring entire swaths of potential paths, because we can know *a priori* that they won’t be optimal. Basically, this works by defining theta to be the angle from the start to the goal configuration, and then performing some really simple checks on the value of theta.

- Finally, a word of caution — the formulas provided by Reeds and Shepp assume the start configuration to be at position (0, 0) with rotation 0. If you use a global coordinate system, you need to convert the goal coordinates to relative ones, which is not just as simple as (goal.x – start.x), (goal.y – start.y), (goal.theta – start.theta). Well, it’s almost that simple. You need to rotate the (relativeX, relativeY) point by -start.theta so that it’s truly relative to (0, 0, 0).
- Also, the formulas all assume a unit turning radius, but you can easily account for this. (sidenote: they also assume unit velocity, i.e. no accelerating or braking, but you can also fake this afterwards!)
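The coordinate conversion from the first bullet can be sketched as follows (a sketch under my own names, not code from any of the implementations above), including the turning-radius scaling from the second:

```python
import math

def to_local(start, goal, turning_radius=1.0):
    """Express goal = (x, y, theta) relative to start: translate so the
    start is at the origin, rotate by -start.theta so the start faces
    along +x, and scale down to a unit turning radius."""
    dx, dy = goal[0] - start[0], goal[1] - start[1]
    c, s = math.cos(-start[2]), math.sin(-start[2])
    x = (dx * c - dy * s) / turning_radius
    y = (dx * s + dy * c) / turning_radius
    theta = (goal[2] - start[2]) % (2.0 * math.pi)
    return (x, y, theta)
```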

That’s pretty much it. The academic publishing scene is a great venue for showing people what **does** work, but not so great for discussing what **doesn’t** work, or for providing tips and tricks to get things working better. That’s what blogs are for.

One word of advice: if you do use someone else’s open source code, always make sure to provide any disclaimers they require, as well as attribute the pieces you used to the original authors!
