The inside story of a London-based startup


tl;dr: You should compose your web app with IO streams

On dump.ly, one of our most loved features is the download button, which creates a zip file with all the original images in an album.

The original solution was hacked up very quickly (i.e. it was pretty ghetto). It simply downloaded each file from S3 into a temp folder on the server, zipped up that folder via a shell exec, and then sent the resulting archive down the pipe.

While it worked correctly, there were some problems:

Initial latency

The user has to wait while each file is downloaded to the server and zipped up. This results in bad UX: the user expects an immediate response (the browser download popup), but they’re left waiting 10-30 seconds while the server does its work.

Spiky Server Load

The S3 downloads and the zipping are both CPU and IO intensive, and the load arrives in spikes. This is the worst kind of load: our EC2 instances, which can usually handle hundreds of thousands of requests, can very easily be brought to their knees.

Complexity

While this was supposed to be the quick and simple solution, it very quickly got messy as we needed to ensure all temporary files got deleted in case of errors. We also had to add a work queue to limit the number of concurrent requests to avoid overloading the server.

Repeated work

Each and every request would generate the whole zip file again, even if the download was aborted.

We really needed an improvement, so the first thought was to add caching (the magical solution to everything). With a cache, there would only be a big hit on the first download, and so the work to create an archive could be amortized across multiple requests. As users can also select which images they want in a zip, we would have to use a hash of the image ids as a key for the cache. We would also have to store the cache files in S3, so that all front end servers could use them, and work out an expiry strategy.
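To make the cache key idea concrete, here is a minimal sketch of hashing a selection of image ids. The helper name `zipCacheKey`, the choice of SHA-1, and the decision to sort the ids (so the same selection in a different order maps to the same key) are all assumptions for illustration, not something we actually shipped.

```javascript
const crypto = require('crypto');

// Hypothetical helper: derives a cache key from the selected image ids.
function zipCacheKey(imageIds) {
  // Sort a copy so the key does not depend on the order the ids arrived in.
  // (Assumption: two archives with the same images count as the same archive.)
  const canonical = JSON.stringify([...imageIds].map(String).sort());
  return crypto.createHash('sha1').update(canonical).digest('hex');
}
```

Every distinct selection produces its own key, which is a big part of why the cache would have needed an expiry strategy.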

While this seemed like a sane idea, it reminded us of the proverbial ‘putting lipstick on a pig’. Then we thought: why can’t we just generate the zip on the fly without ever touching disk?

Streams to the rescue

Well, we can. Node has built-in support for downloading data in chunks (e.g. files from S3), running chunks of data through deflate, and firing those chunks back at the user. All are exposed through the beautiful stream interface, and so can be composed to create a pipeline.
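As a rough illustration of that composition, the sketch below wires a chunked source through deflate into a destination using only Node core modules. `fakeDownload`, `collector` and `compressTo` are made-up names: the first stands in for an S3 response body and the second for the client’s HTTP response.

```javascript
const { Readable, Writable, pipeline } = require('stream');
const zlib = require('zlib');

// Stand-in for a chunked download (e.g. an S3 response body).
function fakeDownload(chunks) {
  return Readable.from(chunks.map((c) => Buffer.from(c)));
}

// Stand-in for the client's HTTP response: stores every chunk it receives.
function collector(sink) {
  return new Writable({
    write(chunk, _encoding, done) {
      sink.push(chunk);
      done();
    },
  });
}

// source -> raw deflate -> destination, composed as a single pipeline.
// `callback` receives an error if any stage fails.
function compressTo(source, destination, callback) {
  pipeline(source, zlib.createDeflateRaw(), destination, callback);
}
```

Each stage only ever holds a few chunks at a time, so memory use stays flat no matter how large the download is.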

One immediate problem is that Node’s zlib module only compresses raw data. To actually create a zip container we need to write out a bunch of headers, a few checksums, and an envelope for each file. Luckily, GitHub user wellawaretech had created a module, zipstream, which I’ve forked, to wrap all this magic up.
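The checksum part is friendlier to streaming than it sounds. Zip entries carry a CRC-32 of the file data, and a CRC-32 can be updated one chunk at a time as the bytes fly past. The sketch below is a plain table-based CRC-32 written for illustration; it is not taken from zipstream, and `crc32Update` is a name made up here.

```javascript
// Lookup table for the reflected CRC-32 polynomial 0xEDB88320.
const CRC_TABLE = new Uint32Array(256).map((_, n) => {
  let c = n;
  for (let k = 0; k < 8; k++) {
    c = c & 1 ? 0xedb88320 ^ (c >>> 1) : c >>> 1;
  }
  return c >>> 0;
});

// Folds one more chunk (Buffer or Uint8Array) into a running CRC-32.
// Start with 0 and pass the previous result back in for every later chunk.
function crc32Update(crc, chunk) {
  let c = crc ^ 0xffffffff;
  for (const byte of chunk) {
    c = CRC_TABLE[(c ^ byte) & 0xff] ^ (c >>> 8);
  }
  return (c ^ 0xffffffff) >>> 0;
}
```

Feeding the chunks through in order gives the same result as hashing the whole file at once, so nothing needs to be buffered to produce the checksum.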

Now, when a user clicks the download button, the server:

  1. Enumerates all the requested images in that album.
  2. Immediately writes the HTTP headers to the client’s HTTP response, to say it’s a download and the file name is .zip.
  3. zipstream writes the header bytes of the zip container.
  4. Creates an http request to the first image in S3.
  5. Pipes that into zipstream (we don’t actually need to run deflate as the images are already compressed).
  6. Pipes that into the client’s http response.
  7. Repeats for each image, with zipstream correctly writing envelopes for each file.
  8. zipstream writes the footer bytes for the zip container.
  9. Ends the http response.
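The steps above can be sketched roughly as follows. This is not our production code and does not use zipstream’s real API: `sendAlbum`, `openImage` and the `archive` object (with `addEntry`, `finalize` and `output`) are hypothetical stand-ins, and `toyArchive` writes text markers instead of real zip envelopes so the sequencing can be run and inspected.

```javascript
const { PassThrough } = require('stream');

// Hypothetical interfaces (assumptions, not a real library API):
//   openImage(image)               -> Readable of that image's bytes (e.g. an S3 GET)
//   archive.addEntry(name, source) -> Promise resolved once the entry is fully written
//   archive.finalize()             -> writes the container footer and ends the output
//   archive.output                 -> Readable carrying the archive bytes
async function sendAlbum(images, openImage, archive, res, fileName) {
  // Headers go out first, so the browser shows its download popup immediately.
  res.writeHead(200, {
    'Content-Type': 'application/zip',
    'Content-Disposition': `attachment; filename="${fileName}"`,
  });
  archive.output.pipe(res); // archive bytes flow straight to the client
  for (const image of images) {
    // Only one upstream request is open at any moment.
    await archive.addEntry(image.name, openImage(image));
  }
  archive.finalize(); // footer is written, then pipe() ends the response
}

// Toy archive for experimenting: NOT a zip writer, it just frames each entry.
function toyArchive() {
  const output = new PassThrough();
  return {
    output,
    addEntry(name, source) {
      return new Promise((resolve, reject) => {
        output.write(`<${name}>`); // stands in for the per-file envelope
        source.on('error', reject);
        source.on('end', resolve);
        source.pipe(output, { end: false }); // keep output open for the next entry
      });
    },
    finalize() {
      output.end('<end>'); // stands in for the container footer
    },
  };
}
```

Swapping `toyArchive()` for a real streaming zip writer should leave `sendAlbum` unchanged, as long as the writer can accept one source stream at a time.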

This is so much better than before, as:

  1. The download is now immediate, with only a second or two of latency.
  2. The pipeline ensures that the whole process only runs as fast as the slowest bottleneck, which is usually the client download speed. It’s auto throttling.
  3. Everything is in memory, and nothing ever touches disk. Only as much work as needed is ever done, e.g. aborting a 1 GB download at 1 MB will only waste 1 MB of CPU processing and IO bandwidth.
  4. We can run many thousands of downloads concurrently on one server, as at each point in time a download only takes minimal resources: 2 http requests and a few JS stream objects.
  5. Alternatively, we can run smaller cheaper servers, and get the same experience.
  6. The code is significantly simpler: no need to manually throttle, no need to clean up temp directories afterwards, no work queues.
  7. No need for a cache, as we only stream what we need. Again, less code to maintain.
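The auto throttling in point 2 comes from stream backpressure, and it can be observed with two toy streams. In this sketch (all names are made up) `countingSource` can produce data instantly, `slowSink` takes a fixed delay per chunk like a client on a slow connection, and `stats.maxLead` records how far the producer ever got ahead of the consumer.

```javascript
const { Readable, Writable } = require('stream');

// Producer that could finish instantly; counts how many chunks it was asked for.
function countingSource(total, stats) {
  let sent = 0;
  return new Readable({
    highWaterMark: 1024, // buffer at most about one 1 KB chunk
    read() {
      if (sent >= total) return this.push(null); // no more data
      sent += 1;
      stats.produced = sent;
      this.push(Buffer.alloc(1024));
    },
  });
}

// Consumer that needs `delayMs` per chunk, like a slow client download.
function slowSink(delayMs, stats) {
  return new Writable({
    highWaterMark: 1024,
    write(chunk, _encoding, done) {
      stats.consumed = (stats.consumed || 0) + 1;
      // How many chunks ahead of the consumer has the producer run?
      stats.maxLead = Math.max(stats.maxLead || 0, stats.produced - stats.consumed);
      setTimeout(done, delayMs);
    },
  });
}
```

Connecting the two with `stream.pipeline(countingSource(50, stats), slowSink(5, stats), callback)` should leave `stats.maxLead` at a handful of chunks rather than 50: the source is paused whenever the sink’s buffer is full, so the whole chain runs at the sink’s pace.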

So just overall better engineering. The only downside is that it’s conceptually more complicated, and requires some understanding of underlying components (zip files, http responses, streams). While IO streaming is node’s bread and butter, and this implementation was relatively trivial, this may not be true for other frameworks.

Future improvements

So what next? Well, we can try to make everything streaming: our upload process currently waits for the whole image before processing it. It would be really cool if the processing code could run AS the image is coming in, though this is harder to implement as it would require support in the underlying graphics library (GraphicsMagick). What about our API servers: can we write out JSON as it gets generated from the DB? Probably, but how do we expose our DB as a composable stream? Let me know in the comments if you’ve done something like this, or can suggest improvements.
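One possible shape for the JSON idea, assuming the DB driver can hand back rows as an async iterable (that part is the open question, so it is purely hypothetical here): serialise each row as it arrives and expose the result as a readable stream. `jsonArrayChunks` and `jsonStream` are names invented for this sketch.

```javascript
const { Readable } = require('stream');

// Turns any sync or async iterable of rows into the text of a JSON array,
// one piece at a time, without holding the full result set in memory.
async function* jsonArrayChunks(rows) {
  let first = true;
  yield '[';
  for await (const row of rows) {
    yield (first ? '' : ',') + JSON.stringify(row);
    first = false;
  }
  yield ']';
}

// Wraps the generator in a Readable so it can be piped like any other stream.
// `rows` would come from a DB cursor that supports async iteration (assumption).
function jsonStream(rows) {
  return Readable.from(jsonArrayChunks(rows));
}
```

With something like this, `jsonStream(rows).pipe(res)` would start sending the response as soon as the first row is available.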

Dharmesh Malam

CEO, dump.ly


“A unique opportunity to meet the hottest Berlin startups” was the promise, but did it deliver?

I wasn’t bursting with excitement at the prospect of attending “Connecting Digital Startups between UK and Germany.” Why, such an overly functional title couldn’t possibly be interesting. Thankfully, I can tell you I was wrong, and so my mother must have been right (she won’t be reading this).

As was so clearly explained in its title, the emphasis of the event was to build relationships. The informal style, stand-whilst-you-eat bar tables and a change-seats-between-courses dining policy ensured this happened. It’s easy to come home with a stack of business cards, but to feel comfortable enough to drop someone a mail and continue the conversation the next day is the true sign of a successful ‘networking event.’ Sure enough, my outbox was full the following morning, as was my inbox; it continues to surprise me how selfless the startup community is, and UK-German relations were no different.

To entertain us between the networking, several speakers, both entrepreneurs and VCs, took to the stand and were comedic and inspirational in equal measure. The one takeaway lesson from the entrepreneurs? Success in business requires some serious hustling, and sometimes a little rule bending! (That’s two lessons, I know.) As an example, when you’re next 14 and changing schools, look around at your new classmates and see what you can sell them (sale of pirate video games not endorsed here), and in the process turn a $60k profit and hide it under your bed! Or how about this? When you’re next at the bar, find the nearest small business owner, ply him with drinks, offer him some technology consulting with your (as-yet-non-existent) consultancy firm, and worry about how to deliver when you sober up the next morning with a fat cheque in your pocket. Like I said, hustle and throw yourself in way over your head; you could be surprised where you come out on the other side.

If the entrepreneurs inspired, the “sharks” (aka VCs) sobered with some frank insights into their world. I’m not going to bore you with the details here; [this](http://www.danshapiro.com/blog/2010/08/vc-insanity-economics) is a great blog post on how VCs think. The key takeaway was probably that VCs aren’t (always) sharks, but it’s essential to understand how their industry works and what their investors expect of them. If this is daunting, don’t worry: as suggested by dailydeals.de (just acquired by Google, so they must know a thing or two), bring on an angel investor before you seek VC funding - angels know the industry and their interests are aligned with yours. On the other hand, seeking VC funding may not always be the best thing for your business - our $60k schoolyard wholesaler has never taken VC funding in his life!

So, we’re now 8 hours into what became a 20 hour day, but I think it’s evident that the promise was definitely delivered. I’d highly recommend this event, and in fact hope the UKTI organise them more frequently. We in the UK treat Europe like our holiday home and love it, yet for some reason technology startups forget that the holiday convenience could also extend to business. Whilst expanding to the US works for some, it may not work for everyone. Perhaps the grass isn’t greener. Events like this should at least remind us of what’s on our doorstep and prompt us to consider all the options.

If you’ve made it this far, then thank you. This may be where you want to stop reading if our antics around Berlin are of no interest; otherwise, you’re nearly there, I promise!

Berlin (East Berlin to be precise) is now my favourite city, and I assure you I’m not one to frivolously switch allegiances. I can only think to describe it as a quirkier, friendlier, cheaper version of Shoreditch or Greenwich Village; a 10-year-younger version would also work. The bars range from speakeasy-style cocktail lounges to hardcore techno clubs and everything quirky in between. The food, although very diverse, is for the most part dominated by Vietnamese noodle houses complete with al fresco dining (blankets provided), Wurst fast-fooderies (think anything with fried Wurst) and Kebab Restaurants. Yes, Restaurant was intended: the London-style Kebab House has no place in the city that invented the doner, and the kebabs in Berlin are undeniably better than what we’re used to in London.

For all its quirkiness, morning still follows the night before in Berlin, and when it does, head over to where you’ll find yourself surrounded by creatives fuelled by coffee or fresh mint tea, staring intently at their Macs (required for membership to the club). Get in a few solid hours’ work and you’ll be ready to do it all over again as the evening approaches. If the night really was just too heavy, I’d suggest you head to for a breakfast that will extend into brunch and watch as the city goes by.

If I could speak German, put up with the smoke indoors (everyone smokes), and had enough self-control to not eat kebabs every day, I’d probably relocate. For now, there’s easyJet.


Dumply is a new private way to send images. Check it out now

Source: dump.ly