Review: Big Data Book by Nathan Marz

Recently, I finished reading the latest “early access” version of the Big Data Book by Nathan Marz.

What is Big Data

Let’s look up Wikipedia:

In information technology, big data is a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. The challenges include capture, curation, storage, search, sharing, analysis, and visualization.

So, Big Data is relevant for any technical or business person whose company deals with lots of information and wants to make use of it. Gmail search is one example.

Why this book is awesome

The book has been a fascinating and engaging learning experience for me, for two reasons:

First, it takes a strong and simple “first principles” approach to an architecture and scalability problem, as opposed to the mushrooming complexity (confusing, to me) of the Big Data world and its habit of treating Hadoop as a panacea.

Second, Nathan Marz was one of only three engineers who built the BackType search engine (the company was acqui-hired by Twitter):

BackType captures online conversations, everything from tweets to blog comments to checkins and Facebook interactions. Its business is aimed at helping marketers and others understand those conversations by measuring them in a lot of ways, which means processing a massive amount of data.

To give you an idea of the scale of its task, it has about 25 terabytes of compressed binary data on its servers, holding over 100 billion individual records. Its API serves 400 requests per second on average, and it has 60 EC2 servers around at all times, scaling up to 150 for peak loads.

It has pulled this off with only seed funding and just three employees: Christopher Golda, Michael Montano and Nathan Marz. They’re all engineers, so there’s not even any sysadmins to take some of the load.

Note: BackType’s (now open sourced) real time data processing engine Storm powers Twitter’s analytics product and real-time trends among other things.


Wrote an EDN format reader and writer in Python

I was reading about the EDN format over the weekend. EDN (pronounced like in “eden garden”) is a data format in the same league as JSON but is supposed to have some nifty features such as sets, keywords, date-time type, custom types, and also being a proper subset of Clojure.
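
To make those features concrete, here is a minimal sketch in Python of how the EDN types mentioned above (keywords, sets, date-times) could map onto Python values. It is not the code of the implementation mentioned in this post or of any existing library: the `Keyword` class and the `to_edn` function are hypothetical names made up for illustration, the sample record is invented, and only a tiny subset of EDN is covered.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Keyword:
    """Hypothetical stand-in for an EDN keyword such as :title."""
    name: str

    def __str__(self) -> str:
        return ":" + self.name


def to_edn(value) -> str:
    """Write a small subset of Python values as EDN text (illustrative only)."""
    if isinstance(value, Keyword):
        return str(value)
    if isinstance(value, str):
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return '"' + escaped + '"'
    if isinstance(value, datetime):
        # Assumes a timezone-aware datetime; EDN's built-in #inst tag
        # carries an RFC 3339 timestamp.
        utc = value.astimezone(timezone.utc)
        return '#inst "' + utc.strftime("%Y-%m-%dT%H:%M:%SZ") + '"'
    if isinstance(value, (set, frozenset)):
        # Sets are unordered; sort the rendered items for stable output.
        return "#{" + " ".join(sorted(to_edn(v) for v in value)) + "}"
    if isinstance(value, dict):
        pairs = (to_edn(k) + " " + to_edn(v) for k, v in value.items())
        return "{" + " ".join(pairs) + "}"
    raise TypeError("unsupported type in this sketch: " + type(value).__name__)


# Invented sample data: a map with keyword keys, a set and a date-time.
record = {
    Keyword("title"): "Big Data",
    Keyword("tags"): frozenset({Keyword("review"), Keyword("book")}),
    Keyword("read-on"): datetime(2013, 1, 1, tzinfo=timezone.utc),
}

print(to_edn(record))
```

JSON has no direct counterpart for the set, the keyword or the date-time here, so each would have to be encoded by convention (for example as a list or a string); that gap is what makes these EDN features attractive.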

Having a date-time type as well as custom types seems useful to me, so I took a look at the current Python implementations of the EDN format. I didn’t find them satisfactory: one of the listed ones had all-custom parsing code that was difficult to read, and another was not even a real implementation, just boilerplate code.

So I thought, why not create a better implementation? And I did – it is up on GitHub.

It has been a long time since I did lex and yacc, so it was a fun weekend project :)


Created an app for live preview of Pandoc

When my wife was editing my books, she used a live preview tool so that she knew what the output was going to look like. The caveat was that the tool handles plain Markdown and not the Pandoc format, which meant the preview would be screwed up whenever there was a code block, etc. So, this morning, I hacked up an app called “Kalam” which does exactly that – live preview for Pandoc text.

The app is built on top of node-webkit (which I came across when I was wondering what Light Table is built upon), created by Roger Wang and others at Intel China Open Source centre. They’ve basically integrated node.js into webkit and disabled all the security restrictions, which makes it the almost-perfect cross-platform desktop toolkit – write HTML, CSS, JavaScript and use any node.js module!

Update: There’s also AppJS which is the same concept as node-webkit but looks more polished (via @aravindavk)


Delhi By Cycle

I had an interesting morning today cycling around Old Delhi, guided by Arkash of the Delhi by Cycle tour.

Seeing a city by cycle was a great incentive by itself and was fun. But I ended up with a case of the Paris Shock Syndrome because of the state of filth of the city: even the grand Red Fort is nestled in a pile of garbage, and we saw the city’s poo and pee being dumped into the “holy Yamuna river.”

The fun part for me was cycling through parantheywali gully which is impossible to imagine in the evenings.


No matter how many times I have heard about the Taj Mahal, it’s startling how big it really is.

It’s amazing that it is built on top of water wells and special wood, because the wet wood makes it resistant to vibrations and hence earthquake-resistant. Of course, Shah Jahan didn’t anticipate that subsequent generations would destroy the river and hence endanger the Taj Mahal.

Thanks to a great guide, Raju ( +919917371773 ), we had a fact-filled tour of the Taj Mahal and Agra Fort, and his stories made us imagine and visualize the Mughal days vividly:

  • the underground 6-floor well structure for the women to bathe children and be cool during summers
  • the treasure holes where the princesses could store their jewels
  • the optical illusions
  • the design of the diwan-i-aam, where the emperor sat to interact with the public, and where one can see everyone from only that spot and nowhere else because of the pillars
  • why a Mughal emperor’s daughters were never married (so that they never have to bow down before the son-in-law and in-laws)
  • why only 16 of the 500 palaces within Agra Fort are open to the public
  • how fountains work without electricity and motors
  • why there are gardens in 4s / squares
  • why there are only flowers and no animals in the carvings
  • how much Shah Jahan really loved Mumtaz (he spent 22 years spending a significant portion of the empire’s money on it because he made a promise to her that he would build a memorial for her, but he eventually lost his zest for life and could not focus on the empire’s strength and health, and hence Aurangzeb wanted to take over to keep the empire intact, but he was not considered a good man himself)
  • why there are 14 verses of the Quran inscribed on the walls of the Taj Mahal (Mumtaz Mahal died giving birth to their 14th child)


After our previous flying fox / zipline experience, we were looking forward to the 6-zipline Flying Fox experience in the Mehrangarh Fort at Jodhpur, and we were not disappointed. It was a picturesque location, and the exciting ziplines went over water, fort walls and across the blue city scenery, ranging from 80m to 310m in length each!

P.S. Mehrangarh Fort is the same fort you may have seen in The Dark Knight Rises (Batman) movie.


We had an interesting camel ride and a fantastic night in the Thar Desert, thanks to Trotters, Jaisalmer, Rajasthan.


Investing in the Tamenglong-Haflong Road

Today, I made a small contribution of Rs. 5000 to the Tamenglong-Haflong Road a.k.a. The Great Indian Road.

What’s so great about it? It connects Assam, Manipur and Nagaland, and it is crowd-funded by Indians all over. The Government of India has always ignored the North-Eastern region of India, so a road funded by Indians and NRIs is being created now.

I first came to know about this initiative through a random tweet pointing to this Times of India article about a Naga IAS officer, Armstrong Pame, building a 100km road without government help. Later, I read about his life story leading up to achieving the venerated position of an IAS officer.

I was surprised to read that funds were being raised via Facebook. I had an “Is this genuine?” question in my head, so I joined the “Tamenglong-Haflong Road Construction” Facebook group to learn more about it.

Stories about support and progress were pouring into the group. It was really heartening to see great things being achieved through social media in the midst of all the anti-internet-freedom and anti-citizen activities by the government of the day. For example, just today, Jeremiah Pame gave an update in the Facebook group that a Mr. Thomas Riamei, from a village called Saramram, has given his bulldozer free for use in the construction of the road till completion. And a few days ago, Taranbir Singh, an IIT-graduate NRI based in New York, donated 1.5 lakh rupees to the cause.

Having seen all these developments, I made a small contribution today to the cause and am looking forward to the day that this road becomes a reality and we never have to read such stories again:

Last December, then Union home minister P Chidambaram visited Manipur and asked what happened to the road.

The state government declared that it would be ‘done soon’, but nothing moved on the ground. Then in June-July this year, there was an outbreak of tropical diseases like typhoid and malaria. It takes two days for anyone in the village to make it to the nearest hospital on foot in the absence of a motorable road. Hundreds of patients had to be carried on makeshift bamboo stretchers, but very few made it to the town alive.


Why is Whatsapp so pervasive?

I’m astounded at how popular Whatsapp (a messaging app for phones) really is:

  • My wife’s friend who runs a boutique went to an old market to buy cloth material for her shop – the salesman asked her to send the specific color she wants with a picture via Whatsapp. Think of an old dusty market and think of this again.
  • My wife’s friend who is a recent mom talks to her paediatrician via Whatsapp for advice and general questions, and the doctor replies (regardless of location).
  • An uncle and aunt in the US go shopping in the big malls and send photos to each other to decide whether they should pick up an item or not.
  • My uncle and aunt were in town and we went shopping – again, we sent photos of the T-shirts to their son in another town to ask whether he likes the shirt enough to buy it – decision done, shirt bought, no risk of a T-shirt going unused.
  • Recently, there was an incident in Bangalore because of which SMSes were restricted, which is like a heart attack for teenagers, including my sister – they all downloaded Whatsapp and shifted to it in an instant. “SMS costs – be gone!” Luckily, Whatsapp was free for a day or so in the iTunes app store around the same time and my sister, who is using my old iPhone (which is still working after 4 years) and does not have a credit card, grabbed it with eager fingers and is loving it.

I could go on and on, but the point remains that its pervasiveness still surprises me. And it surprises many of my peers who grew up with email, Yahoo! Messenger, Google Talk, etc.

So I was imagining what could be the reasons that Whatsapp is so popular, and here are some wild guesses:

  • 2G on mobile is finally affordable? Now that 3G is more common and has been around for a couple of years, the slower predecessor has finally become cheap enough.
  • WiFi is more common now?
  • BlackBerry “BB-PIN” popularized the concept of instant messaging to a new phone-using generation, but people needed something cross-platform and Whatsapp was in the right place at the right time?
  • Whatsapp is available on most mobile operating systems including many older generation platforms such as Symbian, so people are not left out of the conversation.
  • Why didn’t people simply use GTalk? I’m guessing it’s because of the “create a Google account” barrier as well as GTalk not being as feature-full?
  • Talking about features – groups, photo sharing and video sharing are natural extensions that were just meant to happen, and Whatsapp makes them free (as opposed to SMS/MMS)
  • The details in Whatsapp are great – for example, every message has two ticks – one that says it has gone from your phone to the server, the second tick shows that it has gone from the server to the other person’s phone – an in-built message delivery status as opposed to guessing whether the SMS has reached the other person
  • Did I mention how useful the groups feature is? I’m keeping in touch with friends all over the world through it – in particular, one group has people in the USA, Singapore and India all having a conversation at the same time.
  • So why didn’t email do the same? Because people have a work email/personal email distinction whereas a phone is undoubtedly personal? Because people don’t like to differentiate between a subject line and a body line (don’t laugh, I did too until I realized this is actually a barrier), they just want to “chat” because people are already familiar with SMS?

This is just my rough line of thought about this whole thing, and I wanted to write it down because many of my friends have asked the same question.

What are your thoughts?


Nostalgia in a startup vs. Anu Aunty book

On a long bus ride, I read How I braved Anu Aunty and made a million-dollar company and I loved the book. The stories in the book are especially familiar to those who have faced the ire of family and sometimes friends for wanting to do a startup.

In the midst of the book, there is a passionate explanation by Varun Agarwal of why his idea of alumni T-shirts and alumni hoodies is important to people:

… The strangest thing was that my long-forgotten cupboard kept yielding one memory after another. I ran into a lot of my stuff from school that had got lost in the decade gone by. I started thinking of all those wonderful days. And that is when it hit me. That is what Alma Mater was about! It was about bringing those good ol’ days back. It was about taking you down that memory lane that leads to the wonderful times of school and college.


… We didn’t have Facebook then but we did have ICQ. One line none of us from that ‘era’ can ever forget is ‘ASL (age/sex/location) please,’ when meeting someone new on ICQ.  We had atrociously funny-sounding email ids –, and the like and even funnier names in the ‘chat rooms.’ You couldn’t Google but had to go to or approach Mr Jeeves for any queries and clarifications.


You still had to call a girl on her landline and muster all the courage to ask for her. The only place you could hang out at was Wimpy’s or McD and one still stayed away from the solitary Coffee Day on Brigade Road. Galaxy was where all the movies played and one had to stand in a long queue to buy tickets for Mission Impossible 2.


TV still played The Wonder Years and The Crystal Maze and the world seemed far smarter minus the Saas-Bahu soaps and the reality shows.


You could still find the time to read a book in the evenings  and play cricket in your ‘gully’ on Sundays. ‘Canada Dry’ was the only source to get high and sweet, candy cigarettes were puffed at most of the times.


VSNL ensured porn still loaded one byte at a time and VCDs were all the rage. Hulk Hogan was perpetually rank one on all the ‘Trump Cards’ and Cameron Diaz from The Mask was in every puberty-hitting youngters’ dreams. The only operating system we knew of was Windows 98.


Anyone with a printer was treated with respect and the World Book Encyclopedia was the only source of information for projects. Hero Pen was the original Chinese nib was still preferred over the brash new ‘Pilot’ pen.


Azharuddin was still our captain and Jadeja and Robin Singh were our pinch hitters. Venkatesh Prasad was the only one with the balls to mess with the Pakis and we still lost all the test matches.


And I definitely cannot miss out wearing a ‘colour’ dress to school on your birthday and distributing Eclairs to everyone.


I could go on and on. but I guess you get the drift.


As I cleaned my room, I ran into my long forgotten collection of Tinkle. Gosh, how I used to love those comics.


I guess some of us might hate to admit it now but everyone of us have read a Tinkle at some point or the other in our childhood. Even though it would be really un-cool to talk about ‘Suppandi’ now, he was the coolest character we knew in junior school. Before there was cartoon network, before Swat Cats took over, there was Uncle Scrooge on Doordarshan and there was Tinkle.


… I guess Tinkle comics have long been forgotten but they will always remain with us in our memories and will always remind us of times when things were simpler, when Bangalore was greener, when one would get up at 7a.m. on Sundays to catch Talespin on DD, when Phantom cigarettes ruled and chakra was more than just wheels. When we wouldn’t worry about deadlines, meetings, Facebook and everything else that our lives have become today. We would only worry about when the next Tinkle comic would be out. Sadly, Uncle Pai, the creator of the series passed away recently. RIP Uncle Pai and thanks for the memories. We owe you way more than one.


So, you see, Alma Mater was not just about starting another company. It was about starting a whole new subculture. Of making you feel like you were in school or college again – that wonderfully delicious feeling.

Reading those words flooded my mind with wonderful memories – I could have written those words! I could relate to almost every single word – right from ICQ to funny-sounding email IDs to Wimpy’s to The Crystal Maze to gully cricket to candy cigarettes to Cameron Diaz in The Mask to Windows 98 to World Book to Venkatesh Prasad to Eclairs to Tinkle to Talespin. Phew!

Thank you, Varun Agarwal, for the nostalgia as well as a wonderfully written, hilarious story on entrepreneurial struggle vs. Indian family culture. I especially love the way his bargaining skills with the auto rickshaw walla improved as he went further down his entrepreneurial journey!

Go read the book, it’s a perfect Sunday read.

Update: Based on the book’s recommendation, I watched Dead Poets Society, the 1989 movie featuring Robin Williams as a teacher, and absolutely loved it – Carpe Diem!