ARN

How to work with the Python list data type

Use Python lists to store data in one-dimensional rows, access them by indexes, and sort them any which way you like

Python comes with a collection of built-in data types that make common data-wrangling operations easy. Among them is the list, a simple but versatile collection type.

With a Python list, you can group Python objects together in a one-dimensional row that allows objects to be accessed by position, added, removed, sorted, and subdivided.

Python list basics

Defining a list in Python is easy: just use square brackets to enclose the items in the list.

list_of_ints = [1, 2, 3]

Items in a list do not all have to be the same type, either. They can be any Python object. (Here, assume TWO is a variable and Three is a function.)

list_of_objects = ["One", TWO, Three, {"Four":4}, None]

Note that having mixed objects in a list can have implications for sorting the list. We’ll go into this later.

The biggest reason to use a list is to be able to find objects by their position in the list. To do this, you use Python’s index notation: a number in brackets, starting at 0, that indicates the position of the item in the list.

In the examples above, list_of_ints[0] yields 1 and list_of_ints[1] yields 2, while list_of_objects[4] is the None object.
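Here is a minimal runnable sketch of the examples above. The function three() is a hypothetical stand-in for the Three function the text assumes, and the integer 2 stands in for TWO:

```python
# Hypothetical stand-in for the function assumed in the text
def three():
    return 3

list_of_ints = [1, 2, 3]
list_of_objects = ["One", 2, three, {"Four": 4}, None]

print(list_of_ints[0])           # 1
print(list_of_ints[1])           # 2
print(list_of_objects[4])        # None

# Each item keeps its own type: the function can be fetched and called
print(list_of_objects[2]())      # 3
print(type(list_of_objects[3]))  # <class 'dict'>
```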

Python list indexing

If you use a positive integer for the index, the integer indicates the position of the item to look for. But if you use a negative integer, then the integer indicates the position starting from the end of the list. For example, using an index of -1 is a handy way to grab the last item from a list no matter the size of the list.

list_of_ints[-1] yields 3, and list_of_objects[-1] yields None.

You can also use an integer variable as your index. If x = 0, list_of_ints[x] yields 1, and so on.
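A short sketch of negative and variable indexing on the same list. It also shows, as a side note, what happens when an index falls outside the list:

```python
list_of_ints = [1, 2, 3]

print(list_of_ints[-1])   # 3 (the last item)
print(list_of_ints[-2])   # 2 (second from the end)

# An integer variable works as an index too
x = 0
print(list_of_ints[x])    # 1

# Indexing past the end of the list raises IndexError
try:
    list_of_ints[3]
except IndexError as err:
    print("IndexError:", err)  # IndexError: list index out of range
```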

Adding and removing Python list items

Python has several ways you can add or remove items from a list.

  • .append() inserts an item at the end of the list. For example, list_of_ints.append(4) would turn list_of_ints into the list [1,2,3,4]. Appends are fast and efficient; it takes about the same amount of time to append one item to a list no matter how long the list is.
  • .pop() removes and returns the last item from the list. If we ran x = list_of_ints.pop() on the original list_of_ints, x would contain the value 3. (You don’t have to assign the result of .pop() to a variable, though, if you don’t need it.) .pop() operations are also fast and efficient.
  • .insert() inserts an item at some arbitrary position in the list. For example, list_of_ints.insert(0,10) would turn list_of_ints into [10,1,2,3]. Note that the closer you insert to the front of the list, the slower this operation will be, though you won’t see much of a slowdown unless your list has many thousands of elements or you’re doing the inserts in a tight loop.
  • .pop(x) removes the item at the index x. So list_of_ints.pop(0) would remove the item at index 0. Again, the closer you are to the front of the list, the slower this operation can be.
  • .remove(item) removes an item from a list, but not based on its index. Rather, .remove() removes the first occurrence of the object you specify, searching from the start of the list. For [3,7,7,9,8].remove(7), the first 7 would be removed, resulting in the list [3,7,9,8]. This operation can also slow down for a large list, since in the worst case it has to scan the entire list.
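The five operations above can be seen together in one sketch. The list starts as [1, 2, 3], and each step builds on the previous one. The last step shows, as an aside, that .remove() raises ValueError when the item isn’t present:

```python
list_of_ints = [1, 2, 3]

list_of_ints.append(4)         # add to the end
print(list_of_ints)            # [1, 2, 3, 4]

last = list_of_ints.pop()      # remove and return the last item
print(last, list_of_ints)      # 4 [1, 2, 3]

list_of_ints.insert(0, 10)     # insert at index 0, shifting the rest right
print(list_of_ints)            # [10, 1, 2, 3]

first = list_of_ints.pop(0)    # remove and return the item at index 0
print(first, list_of_ints)     # 10 [1, 2, 3]

values = [3, 7, 7, 9, 8]
values.remove(7)               # removes only the first 7
print(values)                  # [3, 7, 9, 8]

try:
    values.remove(42)          # not in the list
except ValueError:
    print("42 is not in the list")
```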

Slicing a Python list

Lists can be divided up into new lists, a process called slicing. Python’s slice syntax lets you specify which part of a list to carve off, and how to manipulate the carved-off portion.

You saw above how to use the bracket notation to get a single item from a list: my_list[2], for example. Slices use a variant of the same index notation (and follow the same indexing rules): list_object[start:stop:step].

  • start is the position in the list to start the slice. 
  • stop is the position in the list where we stop slicing. In other words, that position and everything after it is omitted.
  • step is an optional “every nth element” indicator for the slice. By default this is 1, so the slice retains every element from the list it’s slicing from. Set step to 2, and you’ll select every second element, and so on.

Here are some examples. Consider this list:

slice_list = [1,2,3,4,5,6,7,8,9,0]
slice_list[0:5] yields [1, 2, 3, 4, 5]

(Note that we’re stopping at index 4, not index 5!)

slice_list[0:5:2] yields [1, 3, 5]

If you omit a particular slice index, Python assumes a default. Leave off the start index, and Python assumes the start of the list:

slice_list[:5] yields [1, 2, 3, 4, 5]

Leave off the stop index, and Python assumes the end of the list:

slice_list[4:] yields [5, 6, 7, 8, 9, 0]

The step element can also be negative. This lets us take slices that are reversed copies of the original:

slice_list[::-1] yields [0, 9, 8, 7, 6, 5, 4, 3, 2, 1]

You can also slice a range in reverse by combining a negative step with start and stop indexes that run backwards, so that start is greater than stop:

slice_list[5:2:-1] yields [6, 5, 4]

Also keep in mind that a slice of a list is a new list (a shallow copy of the selected items). The original list remains unchanged.
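A small sketch of what copying means for a slice. Changing the new list leaves the original alone. The copy is shallow, though, so objects nested inside the list (here, inner lists) are shared rather than duplicated:

```python
slice_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]

first_half = slice_list[:5]
first_half[0] = 100            # changes only the new list
print(first_half)              # [100, 2, 3, 4, 5]
print(slice_list[:5])          # [1, 2, 3, 4, 5] (original untouched)

# Shallow copy: the inner lists are the same objects in both lists
nested = [[1, 2], [3, 4]]
nested_copy = nested[:]
nested_copy[0].append(99)
print(nested)                  # [[1, 2, 99], [3, 4]]
print(nested_copy is nested)   # False
```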

Sorting a Python list

Python provides two ways to sort lists: You can generate a new, sorted list from the old one, or you can sort an existing list in-place. These options have different behaviours and different usage scenarios.

To create a new, sorted list, use the sorted() function on the old list:

new_list = sorted(old_list)

This will sort the contents of the list using Python’s default ordering. For numbers, that means ascending values. For strings, it means ordering by Unicode code point, which is roughly alphabetical except that uppercase letters sort before lowercase ones. Note that the items in the list must be comparable with one another for this to work. For instance, you can sort a list that is all integers or all strings, but trying to sort a mix of integers and strings raises a TypeError.
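A quick sketch of sorted() in action. It returns a new list and leaves the old one alone. Strings are ordered by code point, so "Cherry" lands before "apple". Mixing integers and strings raises TypeError:

```python
old_list = [3, 1, 2]
new_list = sorted(old_list)   # builds and returns a new list
print(new_list)               # [1, 2, 3]
print(old_list)               # [3, 1, 2] (unchanged)

# Uppercase letters have lower code points than lowercase ones
fruits = sorted(["banana", "apple", "Cherry"])
print(fruits)                 # ['Cherry', 'apple', 'banana']

# Items that can't be compared with each other raise TypeError
try:
    sorted([1, "2", 3])
except TypeError as err:
    print("TypeError:", err)
```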

If you want to sort a list in reverse, pass the reverse parameter:

new_list = sorted(old_list, reverse=True)

The other way to sort, in-place sorting, performs the sort operation directly on the original list. To do this, use the list’s .sort() method:

old_list.sort()

.sort() also takes reverse as a parameter, allowing you to sort in reverse.
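A sketch contrasting in-place sorting with sorted(). One pitfall worth seeing: .sort() returns None, so writing new_list = old_list.sort() leaves new_list as None:

```python
old_list = [3, 1, 2]
result = old_list.sort()        # sorts the list in place
print(old_list)                 # [1, 2, 3]
print(result)                   # None: .sort() doesn't return the list

old_list.sort(reverse=True)     # in-place, descending
print(old_list)                 # [3, 2, 1]

print(sorted([3, 1, 2], reverse=True))  # [3, 2, 1]
```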

Both sorted() and .sort() also take a key parameter. The key parameter lets you provide a function that can be used to perform a custom sorting operation. When the list is sorted, each element is passed to the key function, and the resulting value is used for sorting. For instance, if we had a mix of integers and strings, and we wanted to sort them, we could use key like this:

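mixed_list = [1, "2", 3, "4", None]
def sort_mixed(item):
    # Use each item's integer value as its sort key
    try:
        return int(item)
    except (TypeError, ValueError):
        # int(None) raises TypeError; a non-numeric string raises ValueError
        return 0
sorted_list = sorted(mixed_list, key=sort_mixed)
print(sorted_list)  # [None, 1, '2', 3, '4']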

Note that this code wouldn’t convert each element of the list into an integer! Rather, it would use the integer value of each item as its sort value. Also note how we use a try/except block to trap any values that don’t translate cleanly into an integer, and return 0 for them by default.
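The key parameter accepts any one-argument callable, not just a function you write yourself. Here is a short sketch using the built-ins str.lower (for a case-insensitive sort) and len (to sort by length). The word list is made up for illustration:

```python
words = ["pear", "Fig", "banana", "apple"]

print(sorted(words))                 # ['Fig', 'apple', 'banana', 'pear']
print(sorted(words, key=str.lower))  # ['apple', 'banana', 'Fig', 'pear']
print(sorted(words, key=len))        # ['Fig', 'pear', 'apple', 'banana']
```

As with the sort_mixed example, the key only decides the order; the items themselves come back unchanged.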

Python lists are not arrays

One important thing to know about lists in Python is that they aren’t “arrays.” Other languages, like C, have one-dimensional or multi-dimensional constructions called arrays that accept values of a single type. Lists are heterogeneous; they can accept objects of any type.

What’s more, Python does have a separate array type, provided by the array module in the standard library. It emulates the behaviour of an array in C: every element must be of a single declared type, and the values are stored compactly. It’s meant chiefly for cases where Python needs to work with C-style arrays. The array type is useful in those cases, but in almost every pure-Python case you’ll want to use lists.
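A brief sketch of the difference, using the standard library’s array module. The type code "i" declares C signed integers, and appending a string is rejected, whereas a list accepts it without complaint:

```python
from array import array

# 'i' is the type code for a C signed int
ints = array("i", [1, 2, 3])
ints.append(4)
print(ints)                # array('i', [1, 2, 3, 4])

try:
    ints.append("five")    # an array accepts only its declared type
except TypeError as err:
    print("TypeError:", err)

# A list happily holds objects of any type
mixed = [1, "five", None]
print(mixed)               # [1, 'five', None]
```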

When to use Python lists (and when not to)

So when are Python lists most useful? A list is best when:

  • You need to find things quickly by their position in a collection. Accessing any position in a list takes the same amount of time, so there is no performance penalty for looking up even the millionth item in a list.
  • You’re adding items to and removing items from the collection mainly at the end, in the manner of a stack. Again, these operations take about the same amount of time regardless of the length of the list.
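A minimal sketch of a list used as a stack. .append() pushes onto the end, index -1 peeks at the top, and .pop() removes from the end. All of these are the fast operations described above:

```python
stack = []
stack.append("a")    # push
stack.append("b")
stack.append("c")

print(stack[-1])     # c (peek at the top without removing it)
print(stack.pop())   # c
print(stack.pop())   # b
print(stack)         # ['a']
```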

A Python list is less suitable when:

  • You want to find an item in a list, but you don’t know its position. You can do this with the .index() method. For instance, list_of_ints.index(1) returns the index of the first occurrence of the number 1 in list_of_ints. Speed shouldn’t be an issue if your list is only a few items long, but for lists thousands of items long, Python may have to search the entire list to find the item. For a scenario like this, use a dictionary, where each item can be found using a key and the lookup time is roughly the same for every key.
  • You want to add or remove items anywhere but the end. Each time you do this, Python must shift every item after the added or removed item. The longer the list, the bigger the performance cost. Python’s deque object (from the collections module) is a better fit if you want to add or remove items freely at either the start or the end of the collection.
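A sketch of the two alternatives mentioned above. The names list is hypothetical and assumes each name is unique, since the dictionary keeps only one position per key. Building the dictionary costs one pass over the list. After that, each lookup avoids a scan. collections.deque supports fast appends and pops at both ends:

```python
from collections import deque

names = ["ada", "grace", "linus"]       # hypothetical data, unique items
print(names.index("grace"))             # 1 (found by scanning from the start)

# Build a dict once; later lookups don't scan the whole collection
positions = {name: i for i, name in enumerate(names)}
print(positions["grace"])               # 1

# deque: fast additions and removals at either end
queue = deque([1, 2, 3])
queue.appendleft(0)
queue.append(4)
print(queue.popleft())                  # 0
print(queue.pop())                      # 4
print(list(queue))                      # [1, 2, 3]
```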