Posts

Does LeBron Stand Out?

You decide.

  


Statistics taken from the 2018 NBA Post-season, courtesy of espn.com
Made using Tableau.
The data is openly available and free for public use here.

 

To open the full-sized graph on Mobile click here.

Arrays and ArrayLists

The differences between Arrays, ArrayLists, and Linked Lists can be confusing for a beginning programmer, especially if you come from a language like Python which abstracts these concepts away from you. The purpose of this post is to explain Arrays and ArrayLists – we’ll leave Linked Lists for a future post.

These are extremely important concepts if you want to be able to do well in a technical interview or be a good software engineer.

By the end of this post you will be able to answer:

What is the time complexity of removing an item from the end of an ArrayList?

What about inserting an item onto the end?

If you can’t answer the above questions, then this post is for you.


The elements of an Array all live next to each other in memory.

Creating two arrays looks like this:

Array my_birthday = [7, 23, 1996];
Array squares = [4, 9];

Inside your computer’s memory, this looks like:

Arrays and Strings Images (2).jpg

The computer’s memory is a big table of numbers. Given a memory address, our computer can return the item at that address almost instantly.

Arrays and Strings Images (3)

 

We can also store new values at a given location in a similar manner.

Why do we care about how an array is stored in memory? This mental model is key to understanding the properties of arrays intuitively.

Element Accessing

When I type the following in a script, what happens under the hood?

my_birthday[0]; // 7
my_birthday[1]; // 23
my_birthday[2]; // 1996

To get the 0th item in my array, the computer gets the first 32 bits (because each element is a 32 bit integer) after 100 (the starting address of the array). The bits at this location would be 00000000000000000000000000000111 in binary, which is 7 in decimal.
Because each element is a 32 bit integer, I know that the 1st item starts 32 bits after the 0th, at address 132. The 2nd item is at location 100 + 2*32 = 164. If there were 100 items in the array, the 100th item would be location 100 + 100*32 = 3300. Figuring out the location, and thus accessing the 1000th item of an array takes nearly the same time as accessing the 1st.

The time to access an element in an array is independent of its position in the array, or the length of the array.

This is pretty sweet. Even if your array is millions of elements long, retrieving the first item takes the same amount of time as retrieving the last. In further articles, we’ll look at some other data structures that do not have this property.

Element Setting

Lets say we want to make me 4 years older. We can update the value of an array with the following syntax in most languages:

my_birthday[2] = 1992;

Under the hood, this works very similarly as element accessing. First we calculate that we need to change the 32 bits starting at 100 + 2*32 = 164, and we set those bits to the binary representation of the integer “1992.” Like before, the time this takes is independent of the length of the array.

Arrays are really great at setting and getting values, but there is one large restriction that I haven’t yet mentioned.


Arrays are fixed-length.

The number of items in an array is set in stone as soon as it is created. Arrays have no insert or delete method. Lets say I want to add 16 onto my array of squares. I can’t add it directly to the end, because in reality the data in our memory is very tightly packed and there is likely a conflict with some other piece of data. Instead, I need to create a new array with enough room on the end, and then copy each element of the array over to an unallocated section of memory.

Arrays and Strings Images (5)

This will take much longer for an array of 10,000 elements than for an array of 100 elements, because many more elements must be copied over. This is horribly inefficient and no-good. Luckily this is where our friend the ArrayList comes to the rescue.

The ArrayList

Under the hood, an ArrayList is just an Array, except it has methods for inserting and deleting elements. In addition, this insert method works in a way that drastically reduces the amount of times we’ll need to copy the underlying Array to a new location. It works like this:

Whenever you need to add an item onto the end, if there is no previously allocated space, all the elements are copied to a new location and twice as much memory is allocated as before.

Arrays and Strings Images (8).jpg

Most of the time there will be unallocated space on the end, in which case the time cost is next to nothing. Whenever the array needs to be expanded, all of the items are copied over, which is still slow. However, these expansions are fairly rare, so the on average appending onto an ArrayList is very fast. For example, if 100 items were added onto the end of a 4 element array, it would only have to be expanded 5 times, at sizes 4, 8, 16, 32, and 64. The other 95 appends will be practically free.

A Word on Time Complexity

When talking about the running time of algorithms, we often use “big O” notation. Big O means “on the order of”. The time complexity of copying every element from an array of length n is O(n) because the running time is proportional to n, the length of the array.

If the running time of an operation is independent of it’s the length of the array, then we say that it’s O(1). This is the case for accessing an element of an array. When there is extra unallocated space on the end of an ArrayList, appending is O(1). In fact, we say that appending onto an ArrayList is O(1) on average because it needs to be expanded so rarely (Building up an array of length 1 billion requires just 30 expansions).

Deleting

Deleting an element from an ArrayList requires moving over every other element.

Arrays and Strings Images (7)

Deleting the 2nd element of a 5 element ArrayList is much quicker than if the ArrayList were 1000 elements longer, so the time it takes to delete an element is (on average) proportional to the length of the ArrayList, or O(n).


When would you use an ArrayList?

All the time! Whenever you’re gathering data, an ArrayList will usually be your friend. It has all the great fast getting/setting properties of Arrays, except you can also add items to it really fast.

ArrayLists are hugely important in many languages. In Python, the basic List is actually an ArrayList. 

Consider the following:

  1. In what situations is an ArrayList better than an Array?
  2. In what situations is an ArrayList similar to an Array?
  3. Are there any situations in which an ArrayList could be worse than an Array?

In Conclusion

You’ve hopefully learned the following:

  1. The mechanics of getting/setting items from an Array.
  2. A useful memory model for thinking about Arrays and other data structures.
  3. The Basics of Big O Notation.
  4. What ArrayLists are and how they differ from Arrays.

 

See a mistake? Have any questions? Please leave a comment below!

 

Quantifying a Summer of Learning

Author’s Note: This is a modified version of the presentation I gave on the last day of my summer internship at Tulip. If you’re into impractical but cool data viz, then this may be for you.

What does learning look like?

It was 8PM, the night before my final intern presentation, and I couldn’t get this question of out of my head. Tomorrow I supposed to present the culmination of all my work this summer by doing a demo of my main project. But something about that felt wrong.

This summer had been the most intense period of learning that I could recall. My 5 years of programming in languages like Python, Java, and Haskell left me completely unprepared for the convoluted, crazy world that is modern web programming.

The hardest thing I did this summer was learning how to be a web programmer. The scheduler I wrote was simply a result of all that learning, not the learning itself.

What if there was a way to quantify learning? If there was some way to put into numbers all the things that I had learned this summer, maybe I could present that.

 Google Searches

As a programmer, Google is arguable my most useful tool. The Hypothesis is that as I learned more, I would have to look up less and less. Turns out you can download all the searches you’ve ever done from Chrome. My brain is split about 50/50 on whether this is crazy scary or crazy cool. Probably both.

output_7_0

Although there the line points downward, the trend wasn’t very big. After mulling this graph over some more, it started to make more sense why there wasn’t a bigger trend.

When I started out at Tulip, I had to constantly get help from my mentor because I didn’t even know what questions to ask. As I learned more, I was able to leap-frog my learning and seek out my own answers. This data didn’t seem helpful, but I realized that it was useful for something else. Quantitatively proving my lack-of-sleep schedule this summer.

output_9_0

Note Data

Every time I learned something new this summer, I wrote it down on a google doc. By the end of my time at Tulip, these notes had grown into a giant encyclopedia totaling just under 100 pages. It felt like at the beginning of my internship I was writing pages and pages a day, and by the end I barely had to write down anything. Maybe this was the statistic I’d been looking for.

output_14_0

Turns out, the length of my note entries were pretty much constant throughout. On second thought, doesn’t that mean that I was encountering new challenges and learning new things constantly? It was about midnight, I was really tired, and I could have stopped here. In fact, I probably should have stopped here, but at this point I was hooked. I had one last idea and I had to try it out.

Commits

For non-technical folk, is a commit is like a save point in a video game. It contains all the lines of code you’ve changed in a project since the last commit.

Commits are definitely not a perfect metric. They don’t track the 4 hours you spent digging through your codebase trying to find a mystery bug that resulted in a 1-line change. That just looks like one, tiny commit.

Still, I was hopeful that when viewed cohesively, these commits could tell a powerful story. Unfortunately, there were no pre-existing libraries that could plot my commits in a nice, simple view, showing when each commit was made and how big each commit was. So, I had to write my own.

 output_22_0

Now, this is what I was looking for.

This graph told the story of how I started learning about our codebase with a bunch of small, self-contained projects. You can even see the time period that I start working on my main project, which I finally land on July 5th. At that point I had fluent enough to be able to see where code could be made better, and as a result my commits again drop to almost zero while I’m working on a major refactoring which finally lands on August 4th. And, in the last month I really hit my stride and pump out refactorings and updates to my main project consistently up until the last day of my internship.

I was happy. I was looking for something quantitative which I could point to and say this is what learning looks like, and I found it. My final presentation ended up consisting of a demo of my major project followed by the story that you just heard, and people loved it! They suggested I put it up online, which is what I’m doing now.

All the source code for these graphs is available on GitHub.