Tuesday, September 18, 2012

Data

Anyone ever tried to use Excel to view a large CSV file and gotten really sad? Like frustratingly sad?

Here's how this came about. Rahm Emanuel, in all his wisdom, opened up the City of Chicago's data as much as possible. One dataset is crime, 2001 to present, here. Well, if you go to this and export to CSV, you get ~1.08 GB of information.

Loading this into Excel was where the fun began. First, my Excel for Mac 2011 seems to have a worksheet size limit of about one million rows. Second, a helpful dialog pops up saying that the import failed, but that I'll get as many rows as can fit, and that I should import the remainder using "start import at row: n" in the CSV dialog. Which makes a lot of sense. However, the largest integer the dialog box accepts for the row number is 32767, i.e. 2^15 - 1. A signed 16-bit integer, still alive in a modern Excel running on a 64-bit host? I'm awestruck by the quality.
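If you just want to know how big the file Excel is choking on actually is, streaming it works fine. Here's a minimal Python sketch; `crimes.csv` is a stand-in for whatever filename the portal's export actually produces:

```python
import csv

# Stream the ~1 GB export one row at a time instead of loading it
# all into memory. "crimes.csv" is an assumed name for the export.
with open("crimes.csv", newline="") as f:
    reader = csv.reader(f)
    header = next(reader)            # first row is the column names
    rows = sum(1 for _ in reader)    # count the remaining data rows

print("%d columns, %d rows" % (len(header), rows))
```

No row-number dialogs, no 16-bit anything.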

I will try this same experiment later in Windows using Excel 2010 to see what gives.

In defense of Excel, bringing up eshell in Emacs 24 and typing `cat Crimes... | grep NARCOTICS` gave me three lines before it froze the program. There should have been over 511,000 lines of output.
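For the record, the same filter done as a stream, outside both Excel and eshell. Again, `crimes.csv` and `narcotics.csv` are assumed names, and the match is a plain substring test, just like the grep above:

```python
# Copy NARCOTICS rows from the big export into a smaller file that
# Excel can actually open. Filenames are assumed, and the substring
# test mirrors `grep NARCOTICS` (so it would also match the word
# appearing in any other column).
matches = 0
with open("crimes.csv") as src, open("narcotics.csv", "w") as dst:
    dst.write(src.readline())        # keep the header row
    for line in src:
        if "NARCOTICS" in line:
            dst.write(line)
            matches += 1
print(matches, "matching rows")
```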

This is going to be fun.

1 comment:

BadPirate said...

You should try importing it into Google Docs... Just for fun :)