Nov 21 2015

STABILITY

Today: some fun bug-hunting stories.

In preparation for the launch on Monday I’ve been working through the bug backlog, making the game as stable and smooth as possible before lots of new people try it out. If something kills you, I want it to be on purpose!

The thing is, when you’ve been fixing the obvious bugs in a codebase for several years, any bugs that are still around have to be pretty weird. All the obvious stuff has been tested a million times and works fine, so problems only happen in massive coincidences involving several rare things at once. That means the game-breaking bugs and crashes I’m fixing now are happening in situations like:

  • Equipping a Thorns buff (e.g. Holy Armour), then using Restless Blade to charge at an enemy with very low health so that it attacks you first, killing itself, immediately before Restless Blade hits and kills it again
  • Playing an Alchemist and shooting the final boss for its last few HP with a Pistol shot that knocks you backwards into a pit, killing you instantly while winning the game
  • Playing the Pugilist on Act 1, buying the Knockback talent and then punching a Lieutenant through the right half of the only double door in the game. Yes, the left half worked fine.

The toughest bug I’ve had lately wasn’t gameplay-related at all, though.

Sometimes, on desktop builds, the screen would go black after a few minutes. This would happen at random. It didn’t happen on my computer; it only happened to other people. I asked for a few details, and they all had really nice nVidia graphics cards. Could it be an obscure driver bug? How would I even start tracking that down?

Then it got weirder. When I sent them a debug build to find out more, the problem went away. That meant no debugging at all, even remotely.

I ended up logging loads of rendering data and adding shortcuts to reset things that might have become invalid. As it turned out, the graphics card didn’t matter, and it wasn’t uninitialised memory. Instead, the very fast CPUs they had (alongside their nice graphics cards!) could technically process some game frames in zero milliseconds. So these zero-length frames? Each would cause a divide-by-zero in camera code from 2013 and replace half the camera data with NaNs. Cthulhu numbers, figures so horribly wrong they break everything that looks at them.
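As a rough sketch of how a zero-length frame can poison camera data: this is not the game's actual camera code, which isn't shown here. The `Camera` struct, its fields, and both update functions are made up for illustration, and the behaviour assumes ordinary IEEE 754 floats (no fast-math compiler flags).

```cpp
#include <cmath>

// Hypothetical one-axis camera; names and fields are illustrative only.
struct Camera {
    float position = 0.0f;
    float velocity = 0.0f;
};

// Unguarded update: derives a velocity by dividing by the frame time.
// With dtSeconds == 0 the division gives infinity (or NaN for 0/0),
// and multiplying that by the zero frame time gives NaN.
inline void updateUnsafe(Camera& cam, float target, float dtSeconds) {
    cam.velocity = (target - cam.position) / dtSeconds;
    cam.position += cam.velocity * dtSeconds * 0.5f;
}

// Guarded update: skip any frame with no measurable elapsed time.
// Writing the test as !(dt > 0) also rejects a NaN frame time.
inline void updateSafe(Camera& cam, float target, float dtSeconds) {
    if (!(dtSeconds > 0.0f)) {
        return;  // leave the camera exactly as it was
    }
    updateUnsafe(cam, target, dtSeconds);
}
```

Once `position` is NaN, every later update reads it back in and produces NaN again, which is why one bad frame is enough to blank the screen until the camera is reset.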

With the bug backlog almost clear it’s starting to look solid now, at last. Tomorrow I’ll be adding the last few touches on Steam (like trading cards!), we’ll do a little more testing and I’ll write up the differences between the Steam and F2P versions so it’s as clear as possible.

Until then. 🙂

3 Comments on “STABILITY”

  1. Cheers! Usually, debugging is a two-way process; it’s about chasing a bug from both its cause and its effect. When a bug doesn’t have a clear cause and you only have the effect to work from, debugging usually starts from learning as much as possible about those effects. Everything you can learn about what happens when the bug strikes is really important, however weird or minor.

    In the “black screen bug” case, someone found out that the problem would clear up if you quit back to the menu (using memorised keypresses to navigate the menu). That turned out to be crucial! It meant I could hunt the problem down by trying each line of the “quit to menu” code. After I found out that resetting the camera specifically did the trick, I added a way to make the camera write out all its data when it was broken. This showed the NaNs.

    I then went through the code looking for situations which could have caused the NaNs, adding logging and safety checks to each, and finally caught this zero-millisecond bug when it happened.

    It was a long bug hunt, for sure. But if that player hadn’t found the detail about quitting to the menu, it could have been a lot longer. 🙂
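The "write out all the camera data when it's broken" idea from the comment above could look something like the sketch below. It is only an illustration under assumptions: `CameraState`, its fields, and the two helper functions are hypothetical names, not anything from the game.

```cpp
#include <cmath>
#include <cstdio>

// Hypothetical camera state; field names are illustrative only.
struct CameraState {
    float x = 0.0f;
    float y = 0.0f;
    float zoom = 1.0f;
};

// True when every field is a finite number (no NaN, no infinity).
inline bool isValid(const CameraState& c) {
    return std::isfinite(c.x) && std::isfinite(c.y) && std::isfinite(c.zoom);
}

// Safety check to call after each suspect piece of code.
// If the state is broken, it logs where that was noticed plus all fields,
// then returns false so the caller can decide whether to reset the camera.
inline bool checkCamera(const CameraState& c, const char* where) {
    if (isValid(c)) {
        return true;
    }
    std::fprintf(stderr, "camera invalid after %s: x=%f y=%f zoom=%f\n",
                 where, c.x, c.y, c.zoom);
    return false;
}
```

Dropping a call like `checkCamera(cam, "follow player")` after each candidate block narrows the search to the first place the data goes bad, even in a release build where a debugger isn't an option.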
