OpenNeo

Dress to Impress - Neopets wearables made easy

Jan 13

Adventures in Maintenance

Here’s the short version: more than a year ago I made a huge mistake in how Dress to Impress handles information. Today I fixed it.

Here’s the longer version:

Thursday night, we brought the main Dress to Impress server down for about an hour and a half, temporarily sending visitors to the old server. The “enter a pet name” feature on the homepage was also spotty for the following 24 hours. Sorry for any trouble this may have caused, and thanks for understanding.

Usually when we push an update to the site, the server takes a few seconds to restart and most users are none the wiser. This time, however, we needed to make some very fundamental changes to our database structure that took quite some time to process, which would have made some sections of the site extremely unresponsive. Thankfully, the old server seemed to handle the job admirably. And everything seems to be running smoothly now, hooray!

And here’s the very long, very technical version:

First off, Dress to Impress tracks two different types of SWF assets: “biology” assets (like a Blue Shoyru’s head) and “object” assets (like how an Altador Cup Wig looks on that head). These SWFs have a unique ID number among assets of the same type: though there can be both a biology asset #123 and an object asset #123, there can’t be two biology assets with ID #123. Since both types of SWFs have a similar structure and need to do similar jobs (like produce PNGs of themselves for Image Mode), it seemed reasonable to store them in the same database table. Then, whenever we wanted to access information about an SWF, we’d specify that we wanted, say, object asset #456 or biology asset #789. That was sufficient information for lookup, and everything seemed solid.

However, I’ll just be blunt about this bit: due to a painful inattention to detail, even though accessing SWF data works perfectly, updating that data has been critically broken for more than a year now. Specifically, after I would fetch biology asset #1337 and update it (for example, if TNT changed its zone ID), sometimes it would instead update object asset #1337. Ouch.

I can only assume that this sort of data corruption has been going on for quite some time now. For example, the Shadow Shoyru’s torso, biology asset #598, kept changing zone over the past few weeks (thanks again to everyone who reported the error!), because every time we saw the Mystery Island Lutari Anklets (which, umm, are bracelets, TNT), the site would attempt to update the zone for object asset #598. And update the biology asset instead. Sigh. This is also why so many Image Mode PNGs are broken: the wrong SWF was marked as converted, even though it had never actually gone through the conversion process. And it stands to reason that other data has been corrupted over the years, though I suspect that most errors are quickly patched automatically as time goes by, since every time a pet’s name is entered on the homepage, we update our data to match that pet’s appearance. Yay, crowd-sourcing!
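For the curious, here is a minimal sketch of how this kind of mix-up can happen when two asset types share one table. It is an illustration only, not the actual Dress to Impress code: the table name, column names, zone numbers, and helper functions are all made up, and it uses Python with SQLite rather than the site's Rails stack. The point is simply that an update keyed on the ID alone can touch the wrong type's row, while an update keyed on type plus ID cannot.

```python
import sqlite3


def make_db():
    """Build an in-memory table where (type, id) together identify an asset."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE swf_assets ("
        " type TEXT NOT NULL,"
        " id INTEGER NOT NULL,"
        " zone_id INTEGER NOT NULL,"
        " PRIMARY KEY (type, id))"
    )
    # Two different assets that happen to share ID 598 (zone numbers are invented).
    db.executemany(
        "INSERT INTO swf_assets (type, id, zone_id) VALUES (?, ?, ?)",
        [("biology", 598, 15), ("object", 598, 40)],
    )
    return db


def zones(db, asset_id):
    """Return {type: zone_id} for every asset with the given ID."""
    rows = db.execute(
        "SELECT type, zone_id FROM swf_assets WHERE id = ?", (asset_id,)
    )
    return dict(rows.fetchall())


def update_zone_by_id_only(db, asset_id, zone_id):
    # Risky: the type is missing from the WHERE clause, so every asset
    # with this ID is rewritten, whatever its type.
    db.execute(
        "UPDATE swf_assets SET zone_id = ? WHERE id = ?", (zone_id, asset_id)
    )


def update_zone(db, asset_type, asset_id, zone_id):
    # Safe: the full (type, id) key pins down exactly one row.
    db.execute(
        "UPDATE swf_assets SET zone_id = ? WHERE type = ? AND id = ?",
        (zone_id, asset_type, asset_id),
    )
```

Calling `update_zone_by_id_only(db, 598, 99)` changes both rows, while `update_zone(db, "object", 598, 99)` leaves the biology asset alone.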

(By the way, the whole 24-hour thing where loading pets by name was misbehaving: that was an unforeseen side-effect of the bug fix, and it has since been patched. Phew.)

So, we restructured our database, and everything now seems to be in order. We should never see that type of data corruption ever again, yay! However, I would like to point out that, though all seemed well in my basic testing—and I’m no longer receiving automatic error reports every single minute of the day, woot!—it’s very possible that I missed something. Please let me know at webmaster@openneo.net if you notice anything suspicious going on, and bear in mind that we may have to roll back the database if we discover a particularly large error.

So, that’s that. Thanks for being willing to handle a little downtime and for all the super-helpful bug reports that helped me track down this nasty little issue. It always makes me so happy to see that, even when I’m off at college and don’t really have the time to fully manage a big site like Dress to Impress, I don’t really need to: you guys take care of everything by yourselves. All the data you see on the site is added and updated by the community simply by entering pets’ names on the homepage, and it’s beautiful. Thank you for making my life so easy. You’re the best.

Happy new year!
Matchu 


Jan 4

New color: Eventide!

A nice, pretty skyscape, to be sure. I’m not convinced that we really need more new colors, but hey! At least they look nice.


Dec 11

Image converters back online - and everything’s smooth!

UPDATE: Wow! The converters cleared the backlog much faster than expected. Everything should be smooth sailing now :)

So, as it happens, the Image Mode converters have been down for a while now, and finals week prevented me from looking into it too closely. However, I’ve since managed to track down the issue, and the converters are now back online and gleefully plowing through a backlog of SWF-to-PNG conversions. Hooray!

Unfortunately, this will likely result in a heavy server load over the next 24 hours. The converters are always on, waiting for conversion jobs, but those jobs only rarely appear. Today, however, the converters will be working non-stop to process their backlog, which may cause delays to the main site. Sorry for the trouble, and thanks for understanding.

For the technologically minded, here’s a brief run-through of the issue at play:

Conversion jobs failed to run due to a permissions error on the server: the converter would try to save some temporary files, but would be told that it’s not allowed to save files to that folder, because that folder for some reason belonged to my personal account instead of the converter’s account. I removed the folder, allowing the converter to replace it with a folder that belonged to the correct user, and all was well. Crazy how one small thing like that can bring down an entire feature, eh?
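A quick way to spot this sort of problem is to have the job check its scratch folder before it starts. The sketch below is an assumption about how one might do that in Python on a Unix-like system; `describe_dir` and `ensure_scratch_dir` are hypothetical helpers, not part of the real converter.

```python
import os
import tempfile


def describe_dir(path):
    """Report who owns a directory and whether this process can write to it."""
    st = os.stat(path)
    return {
        "owner_uid": st.st_uid,
        "owned_by_me": st.st_uid == os.getuid(),  # Unix-only call
        # Creating files in a directory needs both write and search permission.
        "writable": os.access(path, os.W_OK | os.X_OK),
    }


def ensure_scratch_dir(path):
    """Create the directory if it is missing, then fail loudly if unusable."""
    os.makedirs(path, exist_ok=True)
    info = describe_dir(path)
    if not info["writable"]:
        raise PermissionError(
            f"{path} is owned by uid {info['owner_uid']} and is not writable"
        )
    return info


# Example run against a throwaway directory.
with tempfile.TemporaryDirectory() as parent:
    print(ensure_scratch_dir(os.path.join(parent, "tmp")))
```

A folder created by the job itself belongs to the job's own user, which is why removing the stray folder and letting the converter recreate it was enough.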


Nov 15

Aug 22

Botox round two

Sigh. Just like last week, Dreamhost started pointing impress.openneo.net to another customer’s website.

We emailed them asking them both to resolve the issue immediately and to find a more permanent solution. The Dreamhost staff implemented a short-term solution within minutes, though we haven’t heard back yet about why this seems to keep happening. In the meantime, newimpress.openneo.net still works whenever impress.openneo.net goes down, in case anyone asks.

Thanks for understanding. Happy Monday!


Aug 14

Sorry for the Botox downtime!

Well, that was odd. I woke up this morning and found that at some point while I was sleeping http://impress.openneo.net/ started forwarding to some cosmetic consultant’s website. Umm, weird.

To be clear: this was not any sort of attack against DTI, and the Botox site was totally unrelated to Neopets. Dreamhost had some internal trouble, and started pointing that domain to another customer’s website. The Dreamhost support team quickly resolved the issue after we reported it.

Sorry for the trouble, and happy Sunday!


Jul 31

Jul 30

Happy teaser day! While Image Mode is pending a server upgrade, we’ve still been hard at work…


Jun 4

Upgrade successful!

Today we had some downtime here and there, but it was for a worthy cause :) In short, we swapped out some of the technology that runs Dress to Impress in order to reduce memory usage (that’s how much RAM the site is using) so that we can do more things at once.

Note that these changes involved moving to a different version of Ruby. I think I caught all of the major compatibility issues, but please keep an eye out for errors and report them when you see them.

Thanks for supporting Dress to Impress, both by spreading the word and donating to keep our super cool server running. You guys make it all possible :D

For the more technically minded…

Here’s the lowdown on our stack before the change. We were using the nginx web server (that’s kinda like Apache) as a proxy to share HTTP requests among three separate instances of Thin, a web server that runs Ruby on Rails applications like Dress to Impress. I originally chose this setup because Thin supports asynchronous connections: most servers can only handle one request at a time, which can be a problem if there’s one type of request that’s a serious bottleneck, like how typing a pet’s name into the box on the homepage means waiting on a request to Neopets.com, processing all that data, inserting it into the database, etc. However, asynchronous requests turned out to be too messy to implement, causing errors on various pages at random intervals. It would seem that the technology isn’t quite as polished as it needs to be. So, I instead left it at just having three Thin instances, since if one were busy loading a pet, the other two were probably free. (And loading a pet doesn’t really take that long, anyway.)
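To see why a few one-request-at-a-time servers soften the pet-loading bottleneck, here is a toy model. It is not how nginx actually schedules requests, and the durations are invented; it just hands each request to whichever worker frees up first and records when each request finishes.

```python
import heapq


def finish_times(durations_ms, workers):
    """Return the finish time of each request, in arrival order.

    All requests are assumed to arrive at time 0, and each worker
    handles exactly one request at a time.
    """
    free_at = [0] * workers  # the time at which each worker next becomes idle
    heapq.heapify(free_at)
    finished = []
    for duration in durations_ms:
        start = heapq.heappop(free_at)  # earliest available worker
        end = start + duration
        finished.append(end)
        heapq.heappush(free_at, end)
    return finished


# One slow pet load followed by three quick page views (made-up numbers).
requests = [5000, 100, 100, 100]
print(finish_times(requests, workers=1))
print(finish_times(requests, workers=3))
```

With a single worker, every quick request queues up behind the slow one; with three, the quick requests are served by the two workers that are still free.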

However, those Thin processes were running out of memory. Each Thin instance loaded up its own copy of Dress to Impress and all the libraries the site uses (and there are quite a few big ones!), which require a lot of RAM. This usually wasn’t a big problem, though once or twice a day Monit, our program that keeps an eye on the server, would restart a Thin process for taking up too much memory. Since it was only one at a time, y’all usually didn’t notice, since the other two stayed up; however, falling back on other servers is just our backup plan, and it’s not acceptable for that to happen regularly. Also, we’re working on a big feature that will take some serious processing power over the next few days—guess what it is!—so it’s important that we have significantly more RAM available without sacrificing the concurrency we get from multiple Thins.

So, we moved to Ruby Enterprise Edition via Phusion Passenger. That’s two important things there. The first is another version of the Ruby interpreter. Ruby is the programming language we use to make Dress to Impress, and there are a few different programs out there that run Ruby code. Ruby Enterprise Edition (REE) is a modification of the standard Ruby 1.8 interpreter, designed specifically to reduce memory usage. Phusion Passenger is another open-source project by the same company that replaces our Thin servers. Its main selling point is that it’s easy to plug into servers like Apache or nginx, but its best feature for us is that, when used with REE, Passenger also significantly reduces memory usage. It’s a clever trick: since every instance of Dress to Impress loads exactly the same fundamental application code, Passenger uses REE’s memory-sharing feature to only load one copy of the application code and share it among the application instances. So, instead of loading three separate copies of the app, we load three mini-servers that use the same application code. Much better for our memory usage by far :D
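The general idea behind that trick is "load once, then fork": the operating system lets child processes share the parent's memory pages until one of them writes to a page. The sketch below shows the pattern in Python on a Unix-like system. It is only an analogy for what Passenger and REE do, not their implementation, and `load_app` and `spawn_workers` are hypothetical names.

```python
import os


def load_app():
    """Stand-in for the expensive step of loading the app and its libraries."""
    return {"routes": ["/", "/items"], "libs": list(range(100_000))}


def spawn_workers(count):
    """Load the app once, fork `count` workers, and return their exit statuses."""
    app = load_app()  # loaded a single time, in the parent process
    pids = []
    for _ in range(count):
        pid = os.fork()
        if pid == 0:
            # Child: the app is already in memory, inherited from the parent.
            # The pages stay shared until something writes to them.
            ok = len(app["libs"]) == 100_000
            os._exit(0 if ok else 1)
        pids.append(pid)
    # Parent: wait for every worker; a status of 0 means a clean exit.
    return [os.waitpid(pid, 0)[1] for pid in pids]


print(spawn_workers(3))
```

How much memory actually stays shared depends on the language runtime. REE's garbage collector was modified to be friendly to this kind of sharing, whereas in CPython reference counting writes to objects and gradually un-shares pages, so treat this as a picture of the pattern rather than a memory benchmark.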

Hooray! We’re stable! Now the server is running more smoothly, and is able to also run a super cool background process or two…and those who stalk the Dress to Impress source code repository should know what they are for ;) It’ll be a few days until all that heavy-duty processing is done, but it will be so totally worth it. I’m excited.

Thanks for your continued support! It’s tons of fun to be able to make all this cool stuff for y’all. Here’s hoping you’ve enjoyed it :)


May 21

Weekend updates :)

We’ve had some fun this weekend fixing some minor issues around the site. Yay!

  • The first gender/emotion state for the Tyrannian Uni no longer has a big, fat White Koi sticking out of it. Please report any other glitched combinations you see, and note that, usually, a good temporary fix is to click one of the other numbered gender/emotion buttons.
  • Entries in the Infinite Closet items database now have prettier URLs and are significantly more search-engine-friendly.
  • We noticed that the NC Mall spider that automatically grabs previews of items being sold in the Mall (which, by the way, is 100% legit by Viacom’s terms) hasn’t been running for quite a few weeks. Turns out our deploy script wasn’t updating the crontab to refer to the app release’s specific path (</techspeak>), so it was trying to run the mall spider on an older version of Dress to Impress. This has since been fixed, and the spider is now working overtime to catch up on some things it missed.
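For the technically minded, the crontab problem in the last item can be pictured with a small sketch. Everything here is hypothetical (the paths, the schedule, the `spider:run` task name, and both helper functions); it only shows the idea of regenerating the cron entry on each deploy and flagging entries that still point at an older release.

```python
def cron_entry(release_path, schedule="0 * * * *", task="spider:run"):
    """Render one crontab line that runs a task from a specific release."""
    return f"{schedule} cd {release_path} && rake {task}"


def stale_entries(crontab_text, release_path):
    """Return crontab lines that do not mention the current release path."""
    return [
        line
        for line in crontab_text.splitlines()
        if line.strip()
        and not line.startswith("#")
        and release_path not in line
    ]


old_line = cron_entry("/srv/app/releases/old")
new_line = cron_entry("/srv/app/releases/new")
print(stale_entries(old_line, "/srv/app/releases/new"))
```

A deploy script could run a check like `stale_entries` after installing the new crontab and refuse to finish if anything still refers to a previous release.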

In short, bug fixes and minor upgrades, since major features, exciting as they are, can get exhausting to work on at great length. (But, man, are they exciting ;D)

Thanks for using Dress to Impress!
—Matchu 

